Multi-Modal Classification Model
MultiModalClassificationModel
The MultiModalClassificationModel
class is used for Multi-Modal Classification.
To create a MultiModalClassificationModel
, you must specify a model_type
and a model_name
.
model_type
should be one of the model types from the supported models (e.g. bert)-
model_name
specifies the exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.Note: For a list of standard pre-trained models, see here.
Note: For a list of community models, see here.
You may use any of these models provided the
model_type
is supported.
1
2
3
4
5
6
from simpletransformers.classification import MultiModalClassificationModel
model = MultiModalClassificationModel(
"bert", "bert-base-uncased"
)
Note: For more information on working with Simple Transformers models, please refer to the General Usage section.
Configuring a MultiModalClassificationModel
MultiModalClassificationModel
has several task-specific configuration options.
Argument | Type | Default | Description |
---|---|---|---|
regression | bool | False |
Set to True if predicting a continuous value. |
num_image_embeds | int | 1 |
Number of image embeddings from the image encoder. |
text_label | str | text |
The name of the text column/field in the dataset. |
labels_label | str | labels |
The name of the labels column/field in the dataset. |
images_label | str | images |
The name of the images column/field in the dataset. |
image_type_extension | str | "" |
The file type extension for image files (e.g. .json, .jpg). Only required if filepaths in the datasets do not include the file extensions. |
data_type_extension | str | "" |
The file type extension for text files (e.g. .json, .jpg). Only required if filepaths in the datasets do not include the file extensions. |
special_tokens_list | list | [] | The list of special tokens to be added to the model tokenizer |
1
2
3
4
5
6
7
8
9
10
from simpletransformers.classification import MultiModalClassificationModel, MultiModalClassificationArgs
model_args = MultiModalClassificationArgs()
model = MultiModalClassificationModel(
"bert",
"bert-base-uncased",
args=model_args,
)
Note: For configuration options common to all Simple Transformers models, please refer to the Configuring a Simple Transformers Model section.
Class MultiModalClassificationModel
simpletransformers.classification.MultiModalClassificationModel(self, model_type, model_name, multi_label=False, label_list=None, num_labels=None, pos_weight=None, args=None, use_cuda=True, cuda_device=-1, **kwargs,)
Initializes a MultiModalClassificationModel model.
Parameters
-
model_type (
str
) - The type of model to use (model types) -
model_name (
str
) - The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files. -
multi_label (
bool
, optional) - Set to True for multi label tasks. -
label_list (
list
, optional) - A list of all the labels (str) in the dataset. -
num_labels (
int
, optional) - The number of labels or classes in the dataset. -
pos_weight (
list
, optional) - A list of lengthnum_labels
containing the weights to assign to each label for loss calculation. -
args (
dict
, optional) - Default args will be used if this parameter is not provided. If provided, it should be a dict containing the args that should be changed in the default args. -
use_cuda (
bool
, optional) - Use GPU if available. Setting to False will force model to use CPU only. (See here) -
cuda_device (
int
, optional) - Specific GPU that should be used. Will use the first available GPU by default. (See here) -
kwargs (optional) - For providing proxies, force_download, resume_download, cache_dir and other options specific to the ‘from_pretrained’ implementation where this will be supplied. (See here)
Returns
None
Note: For configuration options common to all Simple Transformers models, please refer to the Configuring a Simple Transformers Model section.
Training a MultiModalClassificationModel
The train_model()
method is used to train the model.
1
model.train_model(train_data)
simpletransformers.classification.MultiModalClassificationModel(self, train_data, files_list=None, image_path=None, text_label=None, labels_label=None, images_label=None, image_type_extension=None, data_type_extension=None, auto_weights=False, output_dir=None, show_running_loss=True, args=None, eval_data=None, verbose=True, **kwargs)
Trains the model using ‘train_data’
Parameters
-
train_data - Path to data directory containing text files (JSON) and image files OR a Pandas DataFrame. If a DataFrame is given, it should contain the columns [text, labels, images]. When using a DataFrame, image_path MUST be specified. The image column of the DataFrame should contain the relative path from image_path to the image. E.g: For an image file 1.jpeg located in
"data/train/"
;
image_path = "data/train/"
images = "1.jpeg"
-
files_list (
list
orstr
, optional) - If given, only the files specified in this list will be taken from data directory.files_list
can be a Python list or the path (str
) to a JSON file containing a list of files. -
image_path (
str
, optional) - Must be specified when using DataFrame as input. Path to the directory containing the images. -
text_label (
str
, optional) - Column name to look for instead of the default"text"
. Note that the change will persist even after the method call terminates. That is, theargs
dictionary of the model itself will be modified. -
labels_label (
str
, optional) - Column name to look for instead of the default"labels"
. Note that the change will persist even after the method call terminates. That is, theargs
dictionary of the model itself will be modified. -
images_label (
str
, optional) - Column name to look for instead of the default"images"
. Note that the change will persist even after the method call terminates. That is, theargs
dictionary of the model itself will be modified. -
image_type_extension (
str
, optional) - If given, this will be added to the end of each value in “images”. -
data_type_extension (
str
, optional) - If given, this will be added to the end of each value in “files_list”. -
auto_weights (
str
, optional) - If True, weights will be used to balance the classes. Only implemented for multi label tasks currently. -
output_dir (
bool
, optional) - The directory where model files will be saved. If not given,self.args['output_dir']
will be used. -
show_running_loss (
bool
, optional) - If True, the running loss (training loss at current step) will be logged to the console. -
args (
dict
, optional) - A dict of configuration options for theMultiModalClassificationModel
. Any changes made will persist for the model. -
eval_data (optional) - Evaluation data (same format as train_data) against which evaluation will be performed when evaluate_during_training is enabled. Is required if evaluate_during_training is enabled.
-
kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics section. E.g.
f1=sklearn.metrics.f1_score
. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.
Returns
-
global_step (
int
) - Number of global steps trained -
training_details (
list
) - Average training loss ifevaluate_during_training
is False or full training progress scores ifevaluate_during_training
is True
Note: For more details on training models with Simple Transformers, please refer to the Tips and Tricks section.
Evaluating a MultiModalClassificationModel
The eval_model()
method is used to evaluate the model.
The following metrics will be calculated by default:
mcc
- Matthews correlation coefficienttp
- True positivestn
- True negativesfp
- False positivesfn
- False negativeseval_loss
- Cross Entropy Loss for eval_data
1
result, model_outputs = model.eval_model(eval_data)
simpletransformers.classification.MultiModalClassificationModel.eval_model(self, eval_data, files_list=None, image_path=None, text_label=None, labels_label=None, images_label=None, image_type_extension=None, data_type_extension=None, auto_weights=False, output_dir=None, show_running_loss=True, args=None, eval_data=None, verbose=True, **kwargs)
Evaluates the model using ‘eval_data’
Parameters
-
eval_data - Path to data directory containing text files (JSON) and image files OR a Pandas DataFrame. If a DataFrame is given, it should contain the columns [text, labels, images]. When using a DataFrame, image_path MUST be specified. The image column of the DataFrame should contain the relative path from image_path to the image. E.g: For an image file 1.jpeg located in
"data/train/"
;
image_path = "data/train/"
images = "1.jpeg"
-
files_list (
list
orstr
, optional) - If given, only the files specified in this list will be taken from data directory.files_list
can be a Python list or the path (str
) to a JSON file containing a list of files. -
image_path (
str
, optional) - Must be specified when using DataFrame as input. Path to the directory containing the images. -
text_label (
str
, optional) - Column name to look for instead of the default"text"
. Note that the change will persist even after the method call terminates. That is, theargs
dictionary of the model itself will be modified. -
labels_label (
str
, optional) - Column name to look for instead of the default"labels"
. Note that the change will persist even after the method call terminates. That is, theargs
dictionary of the model itself will be modified. -
images_label (
str
, optional) - Column name to look for instead of the default"images"
. Note that the change will persist even after the method call terminates. That is, theargs
dictionary of the model itself will be modified. -
image_type_extension (
str
, optional) - If given, this will be added to the end of each value in “images”. -
data_type_extension (
str
, optional) - If given, this will be added to the end of each value in “files_list”. -
output_dir (
str
, optional) - The directory where model files will be saved. If not given,self.args['output_dir']
will be used. -
verbose (
bool
, optional) - If verbose, results will be printed to the console on completion of evaluation. -
silent (
bool
, optional) - If silent, tqdm progress bars will be hidden. -
kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics section. E.g.
f1=sklearn.metrics.f1_score
. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.
Returns
-
result (
dict
) - Dictionary containing evaluation results. (Matthews correlation coefficient, tp, tn, fp, fn) -
model_outputs (
list
) - List of model outputs for each row in eval_data
Note: For more details on evaluating models with Simple Transformers, please refer to the Tips and Tricks section.
Making Predictions With a MultiModalClassificationModel
The predict()
method is used to make predictions with the model.
simpletransformers.classification.MultiModalClassificationModel.predict(to_predict, image_path, image_type_extension=None)
Performs predictions on a list of text to_predict
.
Parameters
-
to_predict - A python dictionary to be sent to the model for prediction. The dictionary should be of the form
{"text": [<list of sentences>], "images": [<list of images>]}
. -
image_path (
str
) - Path to the directory containing the image or images. -
image_type_extension (
str
, optional) - If given, this will be added to the end of each value in “images”.
Returns
-
preds (
list
) - A python list of the predictions for each text. -
model_outputs (
list
) - A python list of the raw model outputs for each text.