Classification Models

There are two task-specific Simple Transformers classification models, ClassificationModel and MultiLabelClassificationModel. The two are mostly identical except for the specific use-case and a few other minor differences detailed below.

ClassificationModel

The ClassificationModel class is used for all text classification tasks except for multi label classification.

To create a ClassificationModel, you must specify a model_type and a model_name.

  • model_type should be one of the model types from the supported models (e.g. bert, electra, xlnet)
  • model_name specifies the exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

    Note: For a list of standard pre-trained models, see here.

    Note: For a list of community models, see here.

    You may use any of these models provided the model_type is supported.

1
2
3
4
5
6
from simpletransformers.classification import ClassificationModel


model = ClassificationModel(
    "roberta", "roberta-base"
)

Note: For more information on working with Simple Transformers models, please refer to the General Usage section.

Class ClassificationModel

simpletransformers.classification.ClassificationModel(self, model_type, model_name, num_labels=None, weight=None, args=None, use_cuda=True, cuda_device=-1, **kwargs,)

Initializes a ClassificationModel model.

Parameters

  • model_type (str) - The type of model to use (model types)

  • model_name (str) - The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

  • num_labels (int, optional) - The number of labels or classes in the dataset. (See here)

  • weight (list, optional) - A list of length num_labels containing the weights to assign to each label for loss calculation. (See here)

  • args (dict, optional) - Default args will be used if this parameter is not provided. If provided, it should be a dict containing the args that should be changed in the default args.

  • use_cuda (bool, optional) - Use GPU if available. Setting to False will force model to use CPU only. (See here)

  • cuda_device (int, optional) - Specific GPU that should be used. Will use the first available GPU by default. (See here)

  • kwargs (optional) - For providing proxies, force_download, resume_download, cache_dir and other options specific to the ‘from_pretrained’ implementation where this will be supplied. (See here)

Returns

  • None

Specifying the number of classes/labels

By default, a ClassificationModel will behave as a binary classifier. You can specify the number of classes/labels to use it as a multi-class classifier or as a regression model.

Binary classification

1
2
3
model = ClassificationModel(
    "roberta", "roberta-base"
)

Multi-class classification

1
2
3
model = ClassificationModel(
    "roberta", "roberta-base", num_labels=4
)

Regression

1
2
3
4
5
6
7
8
model = ClassificationModel(
    "roberta",
    "roberta-base",
    num_labels=1,
    args={
        "regression": True
    }
)

Note: When performing regression, you must configure the model’s args dict and set regression to True in addition to specifying num_labels=1.

Setting class weights

A commonly used tactic to deal with imbalanced datasets is to assign weights to each label. This can be done by passing in a list of weights. The list must contain a weight value for each label.

1
2
3
4
5
6
model = ClassificationModel(
    "roberta",
    "roberta-base",
    num_labels=4,
    weight=[1, 0.5, 1, 2]
)

Configuring a Classification model

ClassificationModel has the following task-specific configuration options.

Argument Type Default Description
lazy_delimiter str \t The delimiter used to separate column in the file containing the lazy loading dataset
lazy_loading_start_line int 1 The line number where the dataset starts (1 means header row is skipped)
lazy_labels_column int 0 The column (based on the delimiter) containing the labels for lazy loading single sentence datasets
lazy_text_a_column int None The column (based on the delimiter) containing the first sentence (text_a) for lazy loading sentence-pair datasets
lazy_text_b_column int None The column (based on the delimiter) containing the second sentence (text_a) for lazy loading sentence-pair datasets
lazy_text_column int 0 The column (based on the delimiter) containing text for lazy loading single sentence datasets
regression int False Set True when doing regression. num_labels parameter in the model must also be set to 1.
sliding_window bool False Whether to use sliding window technique to prevent truncating longer sequences
special_tokens_list list [] The list of special tokens to be added to the model tokenizer
stride float/int 0.8 The distance to move the window when generating sub-sequences using a sliding window. Can be a fraction of the max_seq_length OR a number of tokens
tie_value int 1 The tie_value will be used as the prediction label for any samples where the sliding window predictions are tied
1
2
3
4
5
6
7
8
9
10
from simpletransformers.classification import ClassificationModel, ClassificationArgs


model_args = ClassificationArgs(sliding_window=True)

model = ClassificationModel(
    "roberta",
    "roberta-base",
    args=model_args,
)

Note: For configuration options common to all Simple Transformers models, please refer to the Configuring a Simple Transformers Model section.

MultiLabelClassificationModel

The MultiLabelClassificationModel is used for multi-label classification tasks.

To create a MultiLabelClassificationModel, you must specify a model_type and a model_name.

  • model_type should be one of the model types from the supported models (e.g. bert, electra, xlnet)
  • model_name specifies the exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

    Note: For a list of standard pre-trained models, see here.

    Note: For a list of community models, see here.

    You may use any of these models provided the model_type is supported.

1
2
3
4
5
6
from simpletransformers.classification import MultiLabelClassificationModel


model = MultiLabelClassificationModel(
    "roberta", "roberta-base"
)

Note: For more information on working with Simple Transformers models, please refer to the General Usage section.

Class MultiLabelClassificationModel

simpletransformers.classification.MultiLabelClassificationModel(self, model_type, model_name, num_labels=None, pos_weight=None, args=None, use_cuda=True, cuda_device=-1, **kwargs,)

Initializes a MultiLabelClassification model.

Parameters

  • model_type (str) - The type of model to use (model types)

  • model_name (str) - The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

  • num_labels (int, optional) - The number of labels or classes in the dataset. (See here)

  • pos_weight (list, optional) - A list of length num_labels containing the weights to assign to each label for loss calculation. (See here)

  • args (dict, optional) - Default args will be used if this parameter is not provided. If provided, it should be a dict containing the args that should be changed in the default args.

  • use_cuda (bool, optional) - Use GPU if available. Setting to False will force model to use CPU only. (See here)

  • cuda_device (int, optional) - Specific GPU that should be used. Will use the first available GPU by default. (See here)

  • kwargs (optional) - For providing proxies, force_download, resume_download, cache_dir and other options specific to the ‘from_pretrained’ implementation where this will be supplied. (See here)

Returns

  • None

Specifying the number of labels

The default number of labels in a MultiLabelClassificationModel is 2. This can be changed by passing in the number of values to num_labels.

1
2
3
model = MultiLabelClassificationModel(
    "roberta", "roberta-base", num_labels=4
)

Setting class weights

Setting class weights in the MultiLabelClassificationModel is done through the pos_weight parameter.

1
2
3
4
5
6
model = MultiLabelClassificationModel(
    "roberta",
    "roberta-base",
    num_labels=4,
    pos_weight=[1, 0.5, 1, 2]
)

Configuring a Multi-Label Classification Model

MultiLabelClassificationModel has the following task-specific configuration options.

Argument Type Default Description
threshold float 0.5 The threshold is the value at which a given label flips from 0 to 1 when predicting. The threshold may be a single value or a list of value with the same length as the number of labels. This enables the use of separate threshold values for each label.
1
2
3
4
5
6
7
8
9
model_args = {
    "threshold": 0.8
}

model = MultiLabelClassificationModel(
    "roberta",
    "roberta-base",
    args=model_args,
)

Note: For configuration options common to all Simple Transformers models, please refer to the Configuring a Simple Transformers Model section.

Training a Classification Model

The train_model() method is used to train the model. The train_model() method is identical for ClassificationModel and MultiLabelClassificationModel, except for the multi_label argument being True by default for the latter.

1
model.train_model(train_df)

simpletransformers.classification.ClassificationModel.train_model(self, train_df, multi_label=False, output_dir=None, show_running_loss=True, args=None, eval_df=None, verbose=True, **kwargs)

Trains the model using ‘train_df’

Parameters

  • train_df - Pandas DataFrame containing the train data. Refer to Data Format.

  • output_dir (str, optional) - The directory where model files will be saved. If not given, self.args['output_dir'] will be used.

  • show_running_loss (bool, optional) - If True, the running loss (training loss at current step) will be logged to the console.

  • args (dict, optional) - A dict of configuration options for the ClassificationModel. Any changes made will persist for the model.

  • eval_df (dataframe, optional) - A DataFrame against which evaluation will be performed when evaluate_during_training is enabled. Is required if evaluate_during_training is enabled.

  • kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics section. E.g. f1=sklearn.metrics.f1_score. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.

Returns

  • None

Note: For more details on training models with Simple Transformers, please refer to the Tips and Tricks section.

Evaluating a Classification Model

The eval_model() method is used to evaluate the model. The eval_model() method is identical for ClassificationModel and MultiLabelClassificationModel, except for the multi_label argument being True by default for the latter.

The following metrics will be calculated by default:

1
result, model_outputs, wrong_predictions = model.eval_model(eval_df)

simpletransformers.classification.ClassificationModel.eval_model(self, eval_df, multi_label=False, output_dir=None, verbose=True, silent=False, **kwargs)

Evaluates the model using ‘eval_df’

Parameters

  • eval_df - Pandas DataFrame containing the evaluation data. Refer to Data Format.

  • output_dir (str, optional) - The directory where model files will be saved. If not given, self.args['output_dir'] will be used.

  • verbose (bool, optional) - If verbose, results will be printed to the console on completion of evaluation.

  • silent (bool, optional) - If silent, tqdm progress bars will be hidden.

  • kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics. E.g. f1=sklearn.metrics.f1_score section. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.

Returns

  • result (dict) - Dictionary containing evaluation results.

  • model_outputs (list) - List of model outputs for each row in eval_df

  • wrong_preds (list) - List of InputExample objects corresponding to each incorrect prediction by the model

Note: For more details on evaluating models with Simple Transformers, please refer to the Tips and Tricks section.

Making Predictions With a Classification Model

The predict() method is used to make predictions with the model. The predict() method is identical for ClassificationModel and MultiLabelClassificationModel, except for the multi_label argument being True by default for the latter.

1
2
3
4
5
6
7
8
predictions, raw_outputs = model.predict(["Sample sentence 1", "Sample sentence 2"])
# For LayoutLM
predictions, raw_outputs = model.predict(
    [
        ["Sample page text 1", [1, 2, 3, 4], [11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]],
        ["Sample page text 2", [1, 2, 3, 4], [11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]],
    ]
)

Note: The input must be a List (or list of lists) even if there is only one sentence.

simpletransformers.classification.ClassificationModel.predict(to_predict, multi_label=False)

Performs predictions on a list of text (list of lists for model types layoutlm and layoutlmv2) to_predict.

Parameters

  • to_predict - A python list of text (str) to be sent to the model for prediction. For layoutlm and layoutlmv2 model types, this should be a list of lists: [ [text1, [x0], [y0], [x1], [y1]], [text2, [x0], [y0], [x1], [y1]], … [text3, [x0], [y0], [x1], [y1]] ]

Returns

  • preds (list) - A python list of the predictions (0 or 1) for each text.
  • model_outputs (list) - A python list of the raw model outputs for each text.

Tip: You can also make predictions using the Simple Viewer web app. Please refer to the Simple Viewer section.

Updated: