Tips and Tricks
This section contains various tips and tricks applicable to most tasks in the library.
Visualization support
The Weights & Biases framework is supported for visualizing model training.
To use this, simply set a project name for W&B in the `wandb_project` attribute of the `args` dictionary. This will log all hyperparameter values, training losses, and evaluation metrics to the given project.
```python
model = ClassificationModel('roberta', 'roberta-base', args={'wandb_project': 'project-name'})
```
For a complete example, see here.
Using early stopping
Early stopping is a technique used to prevent model overfitting. In a nutshell, the idea is to periodically evaluate the performance of a model against a test dataset and terminate the training once the model stops improving on the test data.
The exact conditions for early stopping can be adjusted as needed using a model’s configuration options.
Note: Refer to the configuration options table for more details (`early_stopping_consider_epochs`, `early_stopping_delta`, `early_stopping_metric`, `early_stopping_metric_minimize`, `early_stopping_patience`).
You must set `use_early_stopping` to `True` in order to use early stopping.
```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs

model_args = ClassificationArgs()
model_args.use_early_stopping = True
model_args.early_stopping_delta = 0.01
model_args.early_stopping_metric = "mcc"
model_args.early_stopping_metric_minimize = False
model_args.early_stopping_patience = 5
model_args.evaluate_during_training_steps = 1000

model = ClassificationModel("bert", "bert-base-cased", args=model_args)
```
With this configuration, the training will terminate if the `mcc` score of the model on the test data does not improve upon the best `mcc` score by at least `0.01` for 5 consecutive evaluations. An evaluation will occur once for every `1000` training steps.
Pro tip: You can use the evaluation during training functionality without invoking early stopping by setting `evaluate_during_training` to `True` while keeping `use_early_stopping` as `False`.
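For instance, a minimal sketch of that setup might look as follows (it reuses the argument names from the early stopping example above and assumes `train_df` and `eval_df` dataframes have been prepared as elsewhere in these docs):

```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs

model_args = ClassificationArgs()
model_args.use_early_stopping = False  # default; training runs the full schedule
model_args.evaluate_during_training = True
model_args.evaluate_during_training_steps = 1000  # evaluate every 1000 training steps

model = ClassificationModel("bert", "bert-base-cased", args=model_args)

# An eval dataframe must be passed to train_model() when evaluate_during_training is enabled
model.train_model(train_df, eval_df=eval_df)
```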
Additional Evaluation Metrics
Task-specific Simple Transformers models each have their own default metrics that will be calculated when a model is evaluated on a dataset. The default metrics have been chosen according to the task, usually by looking at the metrics used in standard benchmarks for that task.
However, it is likely that you will wish to calculate your own metrics depending on your particular use case. To facilitate this, the `eval_model()` and `train_model()` methods of all Simple Transformers models accept keyword arguments consisting of the name of the metric (str) and the metric function itself. The metric function should accept two inputs: the true labels and the model predictions (sklearn format).
```python
from simpletransformers.classification import ClassificationModel
import sklearn


model = ClassificationModel("bert", "bert-base-cased")

model.train_model(train_df, acc=sklearn.metrics.accuracy_score)

model.eval_model(eval_df, acc=sklearn.metrics.accuracy_score)
```
Pro tip: You can combine the additional evaluation metrics functionality with early stopping by setting the name of your metric function as the `early_stopping_metric`.
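As a rough sketch combining the two features from the examples above (the `acc` keyword is carried over from the previous snippet and must match the keyword passed to `train_model()`):

```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import sklearn

model_args = ClassificationArgs()
model_args.use_early_stopping = True
model_args.early_stopping_metric = "acc"  # name of the custom metric keyword below
model_args.early_stopping_metric_minimize = False  # higher accuracy is better
model_args.evaluate_during_training = True

model = ClassificationModel("bert", "bert-base-cased", args=model_args)
model.train_model(train_df, eval_df=eval_df, acc=sklearn.metrics.accuracy_score)
```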
Simple-Viewer (Visualizing Model Predictions with Streamlit)
Simple Viewer is a web-app built with the Streamlit framework which can be used to quickly try out trained models.
To start Simple Viewer, run the command `simple-viewer`.
When Simple Viewer is started, it will look for Simple Transformers models in the current directory and any subdirectories. All detected models can be found in the Choose Model dropdown. Alternatively, you can load a model by specifying the Simple Transformers task, model type, and model name (the model type and model name follow the usual Simple Transformers conventions). The model name may be the path to a local model, or it may be the model name of a model from the Hugging Face model hub.
The following Simple Transformers tasks are currently supported:
- Classification
- Multi-Label Classification
- Named Entity Recognition
- Question Answering
Hyperparameter Optimization
Machine learning models can be very sensitive to the hyperparameters used to train them. While large models like Transformers can perform well across a relatively wide range of hyperparameters, they can also break down completely under certain conditions (like training with a large learning rate for many iterations).
Hint: We can distinguish two kinds of parameters used to train Transformer models. The first is the learned parameters (such as the model weights), and the second is the hyperparameters. At a high level, the hyperparameters (learning rate, batch sizes, etc.) control the process by which the learned parameters are learned.
Choosing a good set of hyperparameter values plays a huge role in developing a state-of-the-art model. Because of this, Simple Transformers has native support for the excellent W&B Sweeps feature for automated hyperparameter optimization.
How to perform hyperparameter optimization with Simple Transformers and W&B Sweeps (Adapted from W&B docs):
1. Set up the sweep
The sweep can be configured through a Python dictionary (`sweep_config`). The dictionary contains at least 3 keys:
- `method` – Specifies the search strategy.

  | method | Meaning |
  | --- | --- |
  | grid | Grid search iterates over all possible combinations of parameter values. |
  | random | Random search chooses random sets of values. |
  | bayes | Bayesian optimization uses a Gaussian process to model the function and then chooses parameters to optimize the probability of improvement. This strategy requires a `metric` key to be specified. |
- `metric` – Specifies the metric to be optimized. This should be a metric that is logged to W&B by the training script.

  The `metric` key of the `sweep_config` points to another Python dictionary containing the `name`, `goal`, and (optionally) `target`.

  | sub-key | Meaning |
  | --- | --- |
  | name | Name of the metric to optimize. |
  | goal | `"minimize"` or `"maximize"` (default is `"minimize"`). |
  | target | Value that you'd like to achieve for the metric you're optimizing. When any run in the sweep achieves that target value, the sweep's state will be set to "Finished." This means all agents with active runs will finish those jobs, but no new runs will be launched in the sweep. |
- `parameters` – Specifies the hyperparameters and their values to explore.

  The `parameters` key of the `sweep_config` points to another Python dictionary which contains all the hyperparameters to be optimized and their possible values. Generally, these will be any combination of the `model_args` for the particular Simple Transformers model.

  W&B offers a variety of ways to define the possible values for each parameter, all of which can be found in the W&B docs. The possible values are also represented using a Python dictionary. Two common methods are given below.

  - Discrete values – A dictionary with the key `values` pointing to a Python list of discrete values.
  - Range of values – A dictionary with the two keys `min` and `max` which specify the minimum and maximum values of the range. The range is continuous if `min` and `max` are floats and discrete if `min` and `max` are ints.
Example `sweep_config`:
```python
sweep_config = {
    "method": "bayes",  # grid, random
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "num_train_epochs": {"values": [2, 3, 5]},
        "learning_rate": {"min": 5e-5, "max": 4e-4},
    },
}
```
2. Initialize the sweep
Initialize a W&B sweep with the config defined earlier.
```python
sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")
```
3. Prepare the data and default model configuration
In order to run our sweep, we must get our data ready. This is identical to how you would normally set up datasets for training a Simple Transformers model.
For example:
```python
# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", "true"],
    ["Frodo was the heir of Isildur", "false"],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", "true"],
    ["Merry was the king of Rohan", "false"],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]
```
Next, we can set up the default configuration for the Simple Transformers model. This would include any `args` that are not being optimized through the sweep.
Hint: As a rule of thumb, it might be a good idea to set all of `reprocess_input_data`, `overwrite_output_dir`, and `no_save` to `True` when running sweeps.
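A quick sketch of that hint (note that the full configuration example below sets the first two flags but leaves `no_save` at its default):

```python
model_args = ClassificationArgs()
model_args.reprocess_input_data = True   # reprocess input data even if a cached file exists
model_args.overwrite_output_dir = True   # allow runs to reuse the same output directory
model_args.no_save = True                # skip saving model files during sweep runs
```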
```python
model_args = ClassificationArgs()
model_args.reprocess_input_data = True
model_args.overwrite_output_dir = True
model_args.evaluate_during_training = True
model_args.manual_seed = 4
model_args.use_multiprocessing = True
model_args.train_batch_size = 16
model_args.eval_batch_size = 8
model_args.labels_list = ["true", "false"]
model_args.wandb_project = "Simple Sweep"
```
4. Set up the training function
W&B will call this function to run the training for a particular sweep run. This function must perform 3 critical tasks.
- Initialize the `wandb` run.
- Initialize a Simple Transformers model and pass in `sweep_config=wandb.config` as a `kwarg`.
- Run the training for the Simple Transformers model.
`wandb.config` contains the hyperparameter values for the current sweep run. Simple Transformers will update the model `args` accordingly.
An example training function is shown below.
```python
def train():
    # Initialize a new wandb run
    wandb.init()

    # Create a TransformerModel
    model = ClassificationModel(
        "roberta",
        "roberta-base",
        use_cuda=True,
        args=model_args,
        sweep_config=wandb.config,
    )

    # Train the model
    model.train_model(train_df, eval_df=eval_df)

    # Evaluate the model
    model.eval_model(eval_df)

    # Sync wandb
    wandb.join()
```
In addition to the 3 tasks outlined earlier, the function also performs an evaluation and manually syncs the W&B run.
Hint: This function can be reused across any Simple Transformers task by simply replacing `ClassificationModel` with the appropriate model class.
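For example, a rough sketch of the same training function adapted for named entity recognition is shown below. The NER-specific names (`ner_model_args`, `ner_train_df`, `ner_eval_df`) and the `eval_data` keyword are assumptions for illustration; consult the NERModel documentation for the exact data format and argument names.

```python
from simpletransformers.ner import NERModel


def train():
    # Initialize a new wandb run
    wandb.init()

    # Create an NERModel instead of a ClassificationModel
    model = NERModel(
        "roberta",
        "roberta-base",
        use_cuda=True,
        args=ner_model_args,
        sweep_config=wandb.config,
    )

    # Train and evaluate the model (NERModel takes eval_data rather than eval_df)
    model.train_model(ner_train_df, eval_data=ner_eval_df)
    model.eval_model(ner_eval_df)

    # Sync wandb
    wandb.join()
```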
5. Run the sweeps
The following line will execute the sweeps.
```python
wandb.agent(sweep_id, train)
```
6. Putting it all together
```python
import logging

import pandas as pd
import sklearn
import wandb

from simpletransformers.classification import (
    ClassificationArgs,
    ClassificationModel,
)

sweep_config = {
    "method": "bayes",  # grid, random
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "num_train_epochs": {"values": [2, 3, 5]},
        "learning_rate": {"min": 5e-5, "max": 4e-4},
    },
}

sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", "true"],
    ["Frodo was the heir of Isildur", "false"],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", "true"],
    ["Merry was the king of Rohan", "false"],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]

model_args = ClassificationArgs()
model_args.reprocess_input_data = True
model_args.overwrite_output_dir = True
model_args.evaluate_during_training = True
model_args.manual_seed = 4
model_args.use_multiprocessing = True
model_args.train_batch_size = 16
model_args.eval_batch_size = 8
model_args.labels_list = ["true", "false"]
model_args.wandb_project = "Simple Sweep"


def train():
    # Initialize a new wandb run
    wandb.init()

    # Create a TransformerModel
    model = ClassificationModel(
        "roberta",
        "roberta-base",
        use_cuda=True,
        args=model_args,
        sweep_config=wandb.config,
    )

    # Train the model
    model.train_model(train_df, eval_df=eval_df)

    # Evaluate the model
    model.eval_model(eval_df)

    # Sync wandb
    wandb.join()


wandb.agent(sweep_id, train)
```
Hint: This script can also be found in the `examples` directory of the GitHub repo.
To visualize your sweep results, open the project on W&B. Please refer to W&B docs for more details on understanding the results.
Guide: A guide to hyperparameter optimization can be found here.
Custom Parameter Groups (Freezing Layers)
Simple Transformers supports custom parameter groups which can be used to set different learning rates for different layers in a model, freeze layers, train only the final layer, etc.
All Simple Transformers models support the following three configuration options for setting up custom parameter groups.
Custom parameter groups
`custom_parameter_groups` offers the most granular configuration option. This should be a list of Python dicts where each dict contains a `params` key and any other optional keys matching the keyword arguments accepted by the optimizer (e.g. `lr`, `weight_decay`). The value for the `params` key should be a list of named parameters (e.g. `["classifier.weight", "bert.encoder.layer.10.output.dense.weight"]`).
Hint: All Simple Transformers models have a `get_named_parameters()` method that returns a list of all parameter names in the model.
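For instance, a quick sketch of how you might inspect the available parameter names before setting up custom groups (the `custom_parameter_groups` example itself follows below):

```python
from simpletransformers.classification import ClassificationModel

# Create a model and list the parameter names that can be referenced
# in custom_parameter_groups or matched against layer numbers
model = ClassificationModel("bert", "bert-base-cased")
for name in model.get_named_parameters():
    print(name)
```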
```python
model_args = ClassificationArgs()
model_args.custom_parameter_groups = [
    {
        "params": ["classifier.weight", "bert.encoder.layer.10.output.dense.weight"],
        "lr": 1e-2,
    }
]
```
Custom layer parameters
`custom_layer_parameters` makes it more convenient to set the optimizer options for a given layer or set of layers. This should be a list of Python dicts where each dict contains a `layer` key and any other optional keys matching the keyword arguments accepted by the optimizer (e.g. `lr`, `weight_decay`). The value for the `layer` key should be an `int` (must be numeric) which specifies the layer (e.g. `0`, `1`, `11`).
```python
model_args = ClassificationArgs()
model_args.custom_layer_parameters = [
    {
        "layer": 10,
        "lr": 1e-3,
    },
    {
        "layer": 0,
        "lr": 1e-5,
    },
]
```
Note: Any named parameters specified through `custom_layer_parameters` with `bias` or `LayerNorm.weight` in the name will have their `weight_decay` set to `0.0`. This also happens for any parameters not specified in either `custom_parameter_groups` or in `custom_layer_parameters`, but does not happen for parameters specified through `custom_parameter_groups`.
Order of precedence:
Note that `custom_parameter_groups` has higher priority than `custom_layer_parameters`, as `custom_parameter_groups` is more specific. If a parameter specified in `custom_parameter_groups` also happens to be in a layer specified in `custom_layer_parameters`, that particular parameter will be assigned to the parameter group specified in `custom_parameter_groups`.
For example:
```python
model_args = ClassificationArgs()
model_args.custom_layer_parameters = [
    {
        "layer": 10,
        "lr": 1e-3,
    },
    {
        "layer": 0,
        "lr": 1e-5,
    },
]
model_args.custom_parameter_groups = [
    {
        "params": ["classifier.weight", "bert.encoder.layer.10.output.dense.weight"],
        "lr": 1e-2,
    }
]
```
Here, `"bert.encoder.layer.10.output.dense.weight"` is specified in both the `custom_parameter_groups` and the `custom_layer_parameters`. However, `"bert.encoder.layer.10.output.dense.weight"` will have a `lr` of `1e-2` due to the higher precedence of `custom_parameter_groups`.
Hint: Any parameters not specified in either `custom_parameter_groups` or in `custom_layer_parameters` will be assigned the general values from the model args.
Train custom parameters only
The `train_custom_parameters_only` option is used to facilitate the training of specific parameters only. If `train_custom_parameters_only` is set to `True`, only the parameters specified in either `custom_parameter_groups` or in `custom_layer_parameters` will be trained.
For example, to train only the classification layers of a `ClassificationModel`:
```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pandas as pd
import logging


logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", 1],
    ["Frodo was the heir of Isildur", 0],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", 1],
    ["Merry was the king of Rohan", 0],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]

# Train only the classifier layers
model_args = ClassificationArgs()
model_args.train_custom_parameters_only = True
model_args.custom_parameter_groups = [
    {
        "params": ["classifier.weight"],
        "lr": 1e-3,
    },
    {
        "params": ["classifier.bias"],
        "lr": 1e-3,
        "weight_decay": 0.0,
    },
]

# Create a ClassificationModel
model = ClassificationModel(
    "bert", "bert-base-cased", args=model_args
)

# Train the model
model.train_model(train_df)
```
Options For Downloading Pre-Trained Models
Most Simple Transformers models will use the `from_pretrained()` method from the Hugging Face Transformers library to download pre-trained models. You can pass `kwargs` to this method to configure things like proxies and forced downloading (refer to the `from_pretrained()` documentation for the available options).
You can pass these `kwargs` when initializing a Simple Transformers task-specific model to access the same functionality. For example, if you are behind a firewall and need to set the proxy settings:
```python
model = ClassificationModel(
    "bert",
    "bert-base-cased",
    proxies={"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
)
```
ONNX Support (Beta)
Simple Transformers has ONNX support for Classification and NER tasks. These models can be converted to an ONNX model and run through the ONNX-runtime.
Heads up: ONNX support should be considered experimental at this time. If you encounter any problems, please open an issue in the repo. Please provide a detailed explanation and the minimal code necessary to replicate the issue.
ONNX setup
Please refer to the ONNX and ONNX Runtime documentation for instructions on installing ONNX.
Converting a Simple Transformers model to the ONNX format.
The following models are currently compatible:
- ClassificationModel
- NERModel
These models can be converted by calling the `convert_to_onnx()` method. You can change the output directory by specifying `output_dir` when calling this method.
```python
from simpletransformers.classification import (
    ClassificationModel,
    ClassificationArgs,
)


model = ClassificationModel(
    "roberta",
    "roberta-base",
)

model.convert_to_onnx("onnx_outputs")
```
Loading a converted ONNX model
You can load the ONNX model just as you would load any other model in Simple Transformers.
```python
from simpletransformers.classification import (
    ClassificationModel,
    ClassificationArgs,
)


model = ClassificationModel(
    "roberta",
    "onnx_outputs",
)
```
After the model is loaded, you can use the `predict()` method to make predictions.
Code example
```python
from time import time

from simpletransformers.classification import (
    ClassificationModel,
    ClassificationArgs,
)


model_args = ClassificationArgs()
model_args.overwrite_output_dir = True

# Create a TransformerModel
model = ClassificationModel(
    "roberta",
    "roberta-base",
    use_cuda=False,
    args=model_args,
)

start = time()
print(model.predict(["test " * 450]))
end = time()
print(f"Pytorch CPU: {end - start}")

model.convert_to_onnx("onnx_outputs")

model_args.dynamic_quantize = True

model = ClassificationModel(
    "roberta",
    "onnx_outputs",
    args=model_args,
)

start = time()
print(model.predict(["test " * 450]))
end = time()
print(f"ONNX CPU (Cold): {end - start}")

start = time()
print(model.predict(["test " * 450]))
end = time()
print(f"ONNX CPU (Warm): {end - start}")
```
Execution Providers
ONNX-Runtime supports many different Execution Providers.
If `use_cuda` is `True`, the `CUDAExecutionProvider` will be used. If it is `False`, the `CPUExecutionProvider` will be used.
You can manually specify the provider using the `onnx_execution_provider` argument when loading a model.
```python
model = ClassificationModel(
    "roberta",
    "onnx_outputs",
    args=model_args,
    onnx_execution_provider="CPUExecutionProvider",
)
```
Note that the library is only tested with the CPU and CUDA Execution Providers.
Saving checkpoints
Don’t save model checkpoints
When training takes little time, we may want to skip saving intermediate checkpoints to reduce disk space usage and training time.
Note that the model artifacts will still be saved to `output_dir` when the training process finishes.
We can prevent the model from saving intermediate checkpoints by setting `save_steps` to `-1` and `save_model_every_epoch` to `False`.
```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs

model_args = ClassificationArgs()
model_args.save_steps = -1
model_args.save_model_every_epoch = False

model = ClassificationModel("bert", "bert-base-cased", args=model_args)
```
Save model checkpoint every 3 epochs
Every model checkpoint takes the same disk space as a final model. When training transformer models for a large number of epochs, we may not want to save a checkpoint for every single epoch, since this would take a lot of disk space. The following example shows how to save a checkpoint every 3 epochs.
The procedure just requires two steps:
- Turn off the automatic save after every epoch by setting the `save_model_every_epoch` arg to `False`.
- Set `save_steps` to N (to save every N epochs) times the number of training steps the model performs per epoch.
```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import math

SAVE_EVERY_N_EPOCHS = 3

model_args = ClassificationArgs()

# Steps per epoch = number of training examples / batch size, rounded up
# (divide further by gradient_accumulation_steps if it is greater than 1)
steps_per_epoch = math.ceil(len(train_df) / model_args.train_batch_size)

model_args.save_steps = steps_per_epoch * SAVE_EVERY_N_EPOCHS
model_args.save_model_every_epoch = False

model = ClassificationModel("bert", "bert-base-cased", args=model_args)
```