Seq2Seq Model

The `Seq2SeqModel` class is used for Sequence-to-Sequence tasks.
Currently, five main types of Sequence-to-Sequence models are available.

- Encoder-Decoder (Generic)
- MBART (Translation)
- MarianMT (Translation)
- BART (Summarization)
- RAG (Retrieval-Augmented Generation - e.g. Question Answering)
Generic Encoder-Decoder Models

The following rules currently apply to generic Encoder-Decoder models (they do not apply to BART and Marian):

- The decoder must be a `bert` model.
- The encoder can be one of `[bert, roberta, distilbert, camembert, electra]`.
- The encoder and the decoder must be of the same "size". (E.g. a `roberta-base` encoder and a `bert-base-uncased` decoder)
To create a generic Encoder-Decoder model with `Seq2SeqModel`, you must provide the three parameters below.

- `encoder_type`: The type of model to use as the encoder.
- `encoder_name`: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.
- `decoder_name`: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

Note: For a list of standard pre-trained models, see here.

Note: For a list of community models, see here.

Note: There is no `decoder_type` parameter, as the decoder must be a `bert` model.
```python
from simpletransformers.seq2seq import Seq2SeqModel

# Initialize a generic Encoder-Decoder model with a RoBERTa encoder and a BERT decoder
model = Seq2SeqModel(
    "roberta",
    "roberta-base",
    "bert-base-cased",
)
```
MarianMT Models

MarianMT models are translation models with support for a huge variety of languages.

The following information is taken from the Hugging Face docs here.

- Each model is about 298 MB on disk, and there are 1,000+ models.
- The list of supported language pairs can be found here.
- The 1,000+ models were originally trained by Jörg Tiedemann using the Marian C++ library, which supports fast training and translation.
- All models are transformer encoder-decoders with 6 layers in each component. Each model's performance is documented in a model card.
- The 80 opus models that require BPE preprocessing are not supported.
To create a MarianMT translation model, you must provide the two parameters below.

- `encoder_decoder_type`: This should be `"marian"`.
- `encoder_decoder_name`: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

Note: Please refer to the Naming and Multilingual Models sections of the MarianMT docs on Hugging Face for more information on choosing the `encoder_decoder_name`.
```python
from simpletransformers.seq2seq import Seq2SeqModel

# Initialize a Seq2SeqModel for English to German translation
model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="Helsinki-NLP/opus-mt-en-de",
)
```
BART Models

To create a BART model, you must provide the two parameters below.

- `encoder_decoder_type`: This should be `"bart"`.
- `encoder_decoder_name`: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

Note: For a list of standard pre-trained models, see here.

Note: For a list of community models, see here.
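A minimal initialization sketch, assuming the pre-trained `facebook/bart-large` checkpoint (any compatible BART checkpoint or local model directory could be used instead):

```python
from simpletransformers.seq2seq import Seq2SeqModel

# Initialize a Seq2SeqModel with a pre-trained BART checkpoint.
# "facebook/bart-large" is one possible choice; a path to a local
# directory containing model files would also work.
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
)
```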
MBART Models

To create an MBART model, you must provide the two parameters below.

- `encoder_decoder_type`: This should be `"mbart"`.
- `encoder_decoder_name`: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

Note: For a list of standard pre-trained models, see here.

Note: For a list of community models, see here.
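A minimal initialization sketch, assuming the pre-trained `facebook/mbart-large-cc25` checkpoint and English-to-Romanian language codes (set via `src_lang` and `tgt_lang` in the configuration options described below):

```python
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Language codes are only relevant to MBART models
# (see the configuration table in the next section)
model_args = Seq2SeqArgs()
model_args.src_lang = "en_XX"
model_args.tgt_lang = "ro_RO"

# "facebook/mbart-large-cc25" is one possible pre-trained MBART checkpoint
model = Seq2SeqModel(
    encoder_decoder_type="mbart",
    encoder_decoder_name="facebook/mbart-large-cc25",
    args=model_args,
)
```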
RAG Models

Note: You must have Faiss (GPU or CPU) installed to use RAG models. Faiss installation instructions can be found here.

To create a RAG model, you must provide the two parameters below. Several optional parameters are also available for configuring the knowledge dataset and index.

- `encoder_decoder_type`: Either `"rag-token"` or `"rag-sequence"`.
- `encoder_decoder_name`: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

Note: For a list of standard pre-trained models, see here.

Note: For a list of community models, see here.

- `index_name` (optional): Name of the index to use - `hf` for a canonical dataset from the datasets library, `custom` for a local index, or `legacy` for the original index. This defaults to `custom` (the parameter does not need to be specified) when a local knowledge dataset is used.
- `knowledge_dataset` (optional): Path to a TSV file (two columns - `title`, `text`) containing a knowledge dataset for RAG, or the path to a directory containing a saved Hugging Face dataset for RAG. If this is not given for a RAG model, a dummy dataset will be used.
- `index_path` (optional): Path to the Faiss index of the custom knowledge dataset. If this is not given and `knowledge_dataset` is given, the index will be computed.
- `dpr_ctx_encoder_model_name` (optional): The DPR context encoder model to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files. This is required when using a custom `knowledge_dataset`.
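A minimal initialization sketch, assuming the pre-trained `facebook/rag-token-nq` and `facebook/dpr-ctx_encoder-single-nq-base` checkpoints and a hypothetical local knowledge dataset at `data/knowledge.tsv`:

```python
from simpletransformers.seq2seq import Seq2SeqModel

# Initialize a RAG model with a custom knowledge dataset.
# "facebook/rag-token-nq" and "facebook/dpr-ctx_encoder-single-nq-base" are
# possible pre-trained checkpoints; "data/knowledge.tsv" is a hypothetical
# two-column (title, text) TSV file. The Faiss index is computed
# automatically since index_path is not given.
model = Seq2SeqModel(
    encoder_decoder_type="rag-token",
    encoder_decoder_name="facebook/rag-token-nq",
    knowledge_dataset="data/knowledge.tsv",
    dpr_ctx_encoder_model_name="facebook/dpr-ctx_encoder-single-nq-base",
)
```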
Configuring a Seq2SeqModel

`Seq2SeqModel` has the following task-specific configuration options.
| Argument | Type | Default | Description |
|---|---|---|---|
| base_marian_model_name | str | None | Name of the base Marian model used to load the tokenizer. |
| dataset_class | Dataset | None | A custom dataset class to use. (Subclass of PyTorch Dataset) |
| do_sample | bool | False | If set to False, greedy decoding is used; otherwise, sampling is used. Defaults to False as defined in configuration_utils.PretrainedConfig. |
| early_stopping | bool | True | If set to True, beam search is stopped when at least num_beams sentences are finished per batch. |
| evaluate_generated_text | bool | False | Generate sequences for evaluation. |
| length_penalty | float | 2.0 | Exponential penalty applied to the sequence length. Defaults to 2.0. |
| max_length | int | 20 | The maximum length of the sequence to be generated. Between 0 and infinity. Defaults to 20. |
| max_steps | int | -1 | Maximum number of training steps. Will override the effect of num_train_epochs. |
| num_beams | int | 1 | Number of beams for beam search. Must be between 1 and infinity. 1 means no beam search. Defaults to 1. |
| num_return_sequences | int | 1 | The number of samples to generate. |
| rag_embed_batch_size | int | 1 | The batch size used when generating embeddings for RAG models. |
| repetition_penalty | float | 1.0 | The parameter for repetition penalty. Between 1.0 and infinity. 1.0 means no penalty. Defaults to 1.0. |
| top_k | float | None | Filter top-k tokens before sampling (<=0: no filtering). |
| top_p | float | None | Nucleus filtering (top-p) before sampling (<=0.0: no filtering). |
| use_multiprocessed_decoding | bool | False | Use multiprocessing when decoding outputs. Significantly speeds up decoding but is CPU intensive. Turn off if multiprocessing causes instability. |
| save_knowledge_dataset | bool | True | Save the knowledge dataset when saving a RAG model. |
| save_knowledge_dataset_with_checkpoints | bool | False | Save the knowledge dataset when saving a RAG model training checkpoint. |
| split_text_character | str | " " | The character used to split text on when splitting text in a RAG model knowledge dataset. |
| split_text_n | int | 100 | Split text into a new doc every split_text_n occurrences of split_text_character when splitting text in a RAG model knowledge dataset. |
| src_lang | str | en_XX | Code for the source language. Only relevant to MBART models. |
| tgt_lang | str | ro_RO | Code for the target language. Only relevant to MBART models. |
```python
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

model_args = Seq2SeqArgs()
model_args.num_train_epochs = 3

# Generic Encoder-Decoder model with custom args
model = Seq2SeqModel(
    "roberta",
    "roberta-base",
    "bert-base-cased",
    args=model_args,
)
```
Note: For configuration options common to all Simple Transformers models, please refer to the Configuring a Simple Transformers Model section.
Class Seq2SeqModel
simpletransformers.seq2seq.Seq2SeqModel(self, encoder_type=None, encoder_name=None, decoder_name=None, encoder_decoder_type=None, encoder_decoder_name=None, config=None, args=None, use_cuda=True, cuda_device=-1, **kwargs)
Initializes a Seq2SeqModel model.
Parameters
- encoder_type (`str`, optional) - The type of model to use as the encoder.
- encoder_name (`str`, optional) - The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.
- decoder_name (`str`, optional) - The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.
- encoder_decoder_type (`str`, optional) - The type of encoder-decoder model. (E.g. bart)
- encoder_decoder_name (`str`, optional) - The path to a directory containing the saved encoder and decoder of a Seq2SeqModel (E.g. "outputs/"), OR a valid BART or MarianMT model.
- config (`dict`, optional) - A configuration file to build an EncoderDecoderModel. See here.
- args (`dict`, optional) - Default args will be used if this parameter is not provided. If provided, it should be a dict containing the args that should be changed in the default args, or a `Seq2SeqArgs` object.
- use_cuda (`bool`, optional) - Use GPU if available. Setting to False will force the model to use CPU only. (See here)
- cuda_device (`int`, optional) - Specific GPU that should be used. Will use the first available GPU by default. (See here)
- kwargs (optional) - For providing proxies, force_download, resume_download, cache_dir and other options specific to the `from_pretrained` implementation where this will be supplied. (See here)
Returns
None
Training a Seq2SeqModel
The `train_model()` method is used to train the model.
```python
model.train_model(train_data)
```
simpletransformers.seq2seq.Seq2SeqModel.train_model(self, train_data, output_dir=None, show_running_loss=True, args=None, eval_data=None, verbose=True, **kwargs)
Trains the model using `train_data`.
Parameters
- train_data - Pandas DataFrame containing the 2 columns - `input_text`, `target_text` (see the sketch at the end of this section).
  - `input_text`: The input text sequence.
  - `target_text`: The target text sequence.
- output_dir (`str`, optional) - The directory where model files will be saved. If not given, `self.args['output_dir']` will be used.
- show_running_loss (`bool`, optional) - If True, the running loss (training loss at current step) will be logged to the console.
- args (`dict`, optional) - A dict of configuration options for the `Seq2SeqModel`. Any changes made will persist for the model.
- eval_data (optional) - Evaluation data (same format as train_data) against which evaluation will be performed when `evaluate_during_training` is enabled. Required if `evaluate_during_training` is enabled.
- kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics section. E.g. `f1=sklearn.metrics.f1_score`. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.
Returns
None
Note: For more details on evaluating Seq2Seq models with custom metrics, please refer to the Evaluating Generated Sequences section.
Note: For more details on training models with Simple Transformers, please refer to the Tips and Tricks section.
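A rough end-to-end training sketch, assuming a toy in-memory DataFrame and the `facebook/bart-large` checkpoint (both purely illustrative):

```python
import pandas as pd

from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Hypothetical toy training data in the required two-column format
train_data = pd.DataFrame(
    [
        ["convert to pig latin: hello", "ellohay"],
        ["convert to pig latin: world", "orldway"],
    ],
    columns=["input_text", "target_text"],
)

model_args = Seq2SeqArgs()
model_args.num_train_epochs = 1

# "facebook/bart-large" is one possible checkpoint choice
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    args=model_args,
)

model.train_model(train_data)
```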
Evaluating a Seq2SeqModel
The `eval_model()` method is used to evaluate the model.

The following metrics will be calculated by default:

- `eval_loss` - Model loss over the evaluation data
```python
result = model.eval_model(eval_data)
```
simpletransformers.seq2seq.Seq2SeqModel.eval_model(self, eval_data, output_dir=None, verbose=True, silent=False, **kwargs)
Evaluates the model using `eval_data`.
Parameters
- eval_data - Pandas DataFrame containing the 2 columns - `input_text`, `target_text`.
  - `input_text`: The input text sequence.
  - `target_text`: The target text sequence.
- output_dir (`str`, optional) - The directory where model files will be saved. If not given, `self.args['output_dir']` will be used.
- verbose (`bool`, optional) - If verbose, results will be printed to the console on completion of evaluation.
- silent (`bool`, optional) - If silent, tqdm progress bars will be hidden.
- kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics section. E.g. `f1=sklearn.metrics.f1_score`. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.
Returns
- result (`dict`) - Dictionary containing evaluation results.
Note: For more details on evaluating Seq2Seq models with custom metrics, please refer to the Evaluating Generated Sequences section.
Note: For more details on evaluating models with Simple Transformers, please refer to the Tips and Tricks section.
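A sketch of passing a custom metric over generated sequences, assuming `evaluate_generated_text` is enabled so that sequences are generated during evaluation (the exact-match metric, toy data, and checkpoint name below are illustrative only):

```python
import pandas as pd

from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Sequences must be generated during evaluation for text-based metrics
model_args = Seq2SeqArgs()
model_args.evaluate_generated_text = True

# "facebook/bart-large" is one possible checkpoint choice
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    args=model_args,
)

# Hypothetical evaluation data in the required two-column format
eval_data = pd.DataFrame(
    [["convert to pig latin: simple", "implesay"]],
    columns=["input_text", "target_text"],
)

# A custom metric is any callable taking (true labels, predictions)
def count_matches(labels, preds):
    return sum(1 for label, pred in zip(labels, preds) if label == pred)

result = model.eval_model(eval_data, matches=count_matches)
```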
Making Predictions With a Seq2SeqModel
The `predict()` method is used to make predictions with the model.
```python
to_predict = [
    "Tyson is a Cyclops, a son of Poseidon, and Percy Jackson's half brother. He is the current general of the Cyclopes army."
]

predictions = model.predict(to_predict)
```
Note: The input must be a List even if there is only one sentence.
simpletransformers.seq2seq.Seq2SeqModel.predict(to_predict)
Performs predictions on a list of text `to_predict`.
Parameters
- to_predict - A python list of text (str) to be sent to the model for prediction.
Returns
- preds (`list`) - A python list of the generated sequences.