T5 Model

T5Model

The T5Model class is used for any NLP task performed with a T5 model or an mT5 model.

To create a T5Model, you must specify the model_type and model_name.

- model_type should be one of the model types from the supported models (t5 or mt5).
- model_name specifies the exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

Note: For a list of standard pre-trained models, see here.

Note: For a list of community models, see here.

You may use any of these models provided they are a T5 model.
```python
from simpletransformers.t5 import T5Model

model = T5Model(
    "t5",
    "t5-base",
)
```
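The same pattern applies to mT5. For example, a minimal sketch loading the google/mt5-base checkpoint from Hugging Face (the checkpoint name is an illustrative choice, not one mandated by Simple Transformers):

```python
from simpletransformers.t5 import T5Model

# model_type "mt5" selects the multilingual T5 architecture
model = T5Model(
    "mt5",
    "google/mt5-base",
)
```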
Note: For more information on working with Simple Transformers models, please refer to the General Usage section.
Configuring a T5Model

T5Model has the following task-specific configuration options.
| Argument | Type | Default | Description |
|---|---|---|---|
| dataset_class | Dataset | None | A custom dataset class to use. (Subclass of PyTorch Dataset) |
| do_sample | bool | False | If set to False, greedy decoding is used. Otherwise, sampling is used. Defaults to False as defined in configuration_utils.PretrainedConfig. |
| early_stopping | bool | True | If set to True, beam search is stopped when at least num_beams sentences are finished per batch. |
| evaluate_generated_text | bool | False | Generate sequences for evaluation. |
| length_penalty | float | 2.0 | Exponential penalty to the length. Defaults to 2.0. |
| max_length | int | 20 | The maximum length of the sequence to be generated. Between 0 and infinity. Defaults to 20. |
| max_steps | int | -1 | Maximum number of training steps. Will override the effect of num_train_epochs. |
| num_beams | int | 1 | Number of beams for beam search. Must be between 1 and infinity. 1 means no beam search. Defaults to 1. |
| num_return_sequences | int | 1 | The number of samples to generate. |
| preprocess_inputs | bool | True | Automatically add `: ` and `</s>` tokens to train_model() and eval_model() inputs. Automatically add `</s>` to each string in to_predict in predict(). |
| repetition_penalty | float | 1.0 | The parameter for repetition penalty. Between 1.0 and infinity. 1.0 means no penalty. Defaults to 1.0. |
| special_tokens_list | list | [] | The list of special tokens to be added to the model tokenizer. |
| top_k | int | None | Filter the top-k tokens before sampling (<=0: no filtering). |
| top_p | float | None | Nucleus (top-p) filtering before sampling (<=0.0: no filtering). |
| use_multiprocessed_decoding | bool | True | Use multiprocessing when decoding outputs. Significantly speeds up decoding (CPU intensive). |
```python
from simpletransformers.t5 import T5Model, T5Args

model_args = T5Args()
model_args.num_train_epochs = 3

model = T5Model(
    "t5",
    "t5-base",
    args=model_args,
)
```
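The generation-related options from the table above are set the same way. A brief sketch (the particular values are illustrative, not recommendations):

```python
from simpletransformers.t5 import T5Model, T5Args

model_args = T5Args()
model_args.max_length = 50       # longest sequence to generate
model_args.num_beams = 5         # beam search with 5 beams
model_args.length_penalty = 1.0  # no exponential length penalty
model_args.do_sample = False     # keep greedy/beam decoding rather than sampling

model = T5Model("t5", "t5-base", args=model_args)
```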
Note: For configuration options common to all Simple Transformers models, please refer to the Configuring a Simple Transformers Model section.
Class T5Model
simpletransformers.t5.T5Model(self, model_type, model_name, args=None, use_cuda=True, cuda_device=-1, **kwargs)
Initializes a T5Model model.
Parameters
- model_type (str) - The type of model (t5, mt5).

- model_name (str) - The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

- args (dict, optional) - Default args will be used if this parameter is not provided. If provided, it should be a dict containing the args that should be changed in the default args, or a T5Args object.

- use_cuda (bool, optional) - Use GPU if available. Setting to False will force model to use CPU only. (See here)

- cuda_device (int, optional) - Specific GPU that should be used. Will use the first available GPU by default. (See here)

- kwargs (optional) - For providing proxies, force_download, resume_download, cache_dir and other options specific to the ‘from_pretrained’ implementation where this will be supplied. (See here)
Returns
None
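For instance, a minimal sketch of forcing CPU-only execution or pinning the model to a specific GPU using the parameters above:

```python
from simpletransformers.t5 import T5Model

# Force the model onto the CPU even if a GPU is available
cpu_model = T5Model("t5", "t5-base", use_cuda=False)

# Use the GPU with device index 1 instead of the first available GPU
gpu_model = T5Model("t5", "t5-base", cuda_device=1)
```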
Training a T5Model
The train_model() method is used to train the model.
```python
model.train_model(train_data)
```
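Here, train_data is a Pandas DataFrame with the three columns described below. A minimal sketch of building such a DataFrame (the rows are illustrative examples, not part of any official dataset):

```python
import pandas as pd

train_data = pd.DataFrame(
    [
        ["binary classification", "Anakin was Luke's father", "1"],
        ["generate question", "Star Wars is an American epic space opera franchise.", "What is Star Wars?"],
    ],
    columns=["prefix", "input_text", "target_text"],
)
```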
simpletransformers.t5.T5Model.train_model(self, train_data, output_dir=None, show_running_loss=True, args=None, eval_data=None, verbose=True, **kwargs)
Trains the model using ‘train_data’
Parameters
- train_data - Pandas DataFrame containing the 3 columns - prefix, input_text, target_text.
  - prefix: A string indicating the task to perform. (E.g. "question", "stsb")
  - input_text: The input text sequence. prefix is automatically prepended to form the full input. (<prefix>: <input_text>)
  - target_text: The target sequence
- output_dir (str, optional) - The directory where model files will be saved. If not given, self.args['output_dir'] will be used.

- show_running_loss (bool, optional) - If True, the running loss (training loss at current step) will be logged to the console.

- args (dict, optional) - A dict of configuration options for the T5Model. Any changes made will persist for the model.

- eval_data (optional) - Evaluation data (same format as train_data) against which evaluation will be performed when evaluate_during_training is enabled. Required if evaluate_during_training is enabled.

- kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics section. E.g. f1=sklearn.metrics.f1_score. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.
Returns
None
Note: For more details on evaluating T5 models with custom metrics, please refer to the Evaluating Generated Sequences section.
Note: For more details on training models with Simple Transformers, please refer to the Tips and Tricks section.
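As a sketch, when evaluate_during_training is enabled in the model args, eval_data (in the same format as train_data) must be passed to train_model():

```python
from simpletransformers.t5 import T5Model, T5Args

model_args = T5Args()
model_args.evaluate_during_training = True

model = T5Model("t5", "t5-base", args=model_args)

# train_data and eval_data are DataFrames in the prefix/input_text/target_text format shown above
model.train_model(train_data, eval_data=eval_data)
```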
Evaluating a T5Model
The eval_model() method is used to evaluate the model.
The following metrics will be calculated by default:
- eval_loss - Model loss over the evaluation data
```python
result = model.eval_model(eval_data)
```
simpletransformers.t5.T5Model.eval_model(self, eval_data, output_dir=None, verbose=True, silent=False, **kwargs)
Evaluates the model using ‘eval_data’
Parameters
- eval_data - Pandas DataFrame containing the 3 columns - prefix, input_text, target_text.
  - prefix: A string indicating the task to perform. (E.g. "question", "stsb")
  - input_text: The input text sequence. prefix is automatically prepended to form the full input. (<prefix>: <input_text>)
  - target_text: The target sequence
- output_dir (str, optional) - The directory where model files will be saved. If not given, self.args['output_dir'] will be used.

- verbose (bool, optional) - If verbose, results will be printed to the console on completion of evaluation.

- silent (bool, optional) - If silent, tqdm progress bars will be hidden.

- kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics section. E.g. f1=sklearn.metrics.f1_score. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.
Returns
- result (dict) - Dictionary containing evaluation results.
Note: For more details on evaluating T5 models with custom metrics, please refer to the Evaluating Generated Sequences section.
Note: For more details on evaluating models with Simple Transformers, please refer to the Tips and Tricks section.
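For example, a hypothetical exact-match metric passed to eval_model() as a keyword argument (this assumes evaluate_generated_text is enabled in the model args so that sequences are generated during evaluation):

```python
def count_matches(labels, preds):
    # Number of generated sequences that exactly match the target text
    return sum(1 for label, pred in zip(labels, preds) if label == pred)

result = model.eval_model(eval_data, matches=count_matches)
```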
Making Predictions With a T5Model
The predict() method is used to make predictions with the model.
```python
to_predict = [
    "binary classification: Luke blew up the first Death Star",
    "generate question: In 1971, George Lucas wanted to film an adaptation of the Flash Gordon serial, but could not obtain the rights, so he began developing his own space opera.",
]

predictions = model.predict(to_predict)
```
Note: The input must be a List even if there is only one sentence.
simpletransformers.t5.T5Model.predict(to_predict)
Performs predictions on a list of text to_predict.
Parameters
- to_predict - A Python list of text (str) to be sent to the model for prediction.
Returns
- preds (list) - A Python list of the generated sequences.
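A simple way to inspect the results is to pair each input with its generated sequence:

```python
# Print each input text alongside the sequence the model generated for it
for text, pred in zip(to_predict, predictions):
    print(f"{text} -> {pred}")
```

Note that when num_return_sequences is greater than 1, each entry in predictions may itself contain multiple generated alternatives for the corresponding input.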