Multi-Modal Classification Specifics
Multi-Modal Classification fuses text and image data. This is done using the multi-modal bitransformer models introduced in the paper Supervised Multimodal Bitransformers for Classifying Images and Text.
Usage Steps
The process of performing Multi-Modal Classification in Simple Transformers does not deviate from the standard pattern (a minimal sketch follows the list below).
- Initialize a `MultiModalClassificationModel`
- Train the model with `train_model()`
- Evaluate the model with `eval_model()`
- Make predictions on (unlabelled) data with `predict()`
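As a rough sketch of this pattern, assuming `train_df` and `eval_df` are Pandas DataFrames already prepared in the format Simple Transformers expects and `to_predict` holds unlabelled data in the same format; the exact return values and the way images are supplied to `predict()` are simplified here:

```python
from simpletransformers.classification import MultiModalClassificationModel

# Initialize a MultiModalClassificationModel; "bert" is the model code (model_type)
# and "bert-base-uncased" is an illustrative pretrained checkpoint
model = MultiModalClassificationModel("bert", "bert-base-uncased")

# Train the model on the prepared training data
model.train_model(train_df)

# Evaluate the model on held-out data
result, model_outputs, wrong_predictions = model.eval_model(eval_df)

# Make predictions on (unlabelled) data prepared in the same format
# (the exact arguments for supplying images to predict() may differ)
predictions, raw_outputs = model.predict(to_predict)
```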
Supported Model Types
Model | Model code for MultiModalClassificationModel |
---|---|
BERT | `bert` |
Tip: The model code is used to specify the `model_type` in a Simple Transformers model.
Label formats
With Multi-Modal Classification, labels are always given as strings. You may specify a list of labels by passing it to the `label_list` argument when creating the model. If `label_list` is given, `num_labels` is not required.

If `label_list` is not given, `num_labels` is required and the labels should be strings starting from `"0"` up to `"<num_labels>"`.
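For instance, the two options might look like this sketch (the label names and counts here are purely illustrative):

```python
from simpletransformers.classification import MultiModalClassificationModel

# Option 1: give the labels explicitly via label_list; num_labels is then not required
model = MultiModalClassificationModel(
    "bert",
    "bert-base-uncased",
    label_list=["positive", "neutral", "negative"],
)

# Option 2: give num_labels only; the labels in the data must then be the
# strings "0", "1", ... matching the configured number of labels
model = MultiModalClassificationModel(
    "bert",
    "bert-base-uncased",
    num_labels=3,
)
```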