Multi-Modal Classification Specifics

Multi-Modal Classification fuses text and image data. This is performed using multi-modal bitransformer models introduced in the paper Supervised Multimodal Bitransformers for Classifying Images and Text.

Usage Steps

The process of performing Multi-Modal Classification in Simple Transformers does not deviate from the standard pattern.

Initialize a Model
Train the model with train_model()
Evaluate the model with eval_model()
Make predictions on (unlabelled) data with predict()

Supported Model Types

Model	Model code for `Model`
BERT	bert

Tip: The model code is used to specify the model_type in a Simple Transformers model.

Label formats

With Multi-Modal Classification, labels are always given as strings. You may specify a list of labels by passing in the list to label_list argument when creating the model. If label_list is given, num_labels is not required.

If label_list is not given, num_labels is required and the labels should be strings starting from "0" up to "<num_labels>".