Seq2Seq Model

Seq2SeqModel

The Seq2SeqModel class is used for Sequence-to-Sequence tasks.

Currently, five main types of Sequence-to-Sequence models are available.

  • Encoder-Decoder (Generic)
  • MBART (Translation)
  • MarianMT (Translation)
  • BART (Summarization)
  • RAG (Retrieval-Augmented Generation - e.g. Question Answering)

Generic Encoder-Decoder Models

The following rules currently apply to generic Encoder-Decoder models (they do not apply to BART or MarianMT models):

  • The decoder must be a bert model.
  • The encoder can be one of [bert, roberta, distilbert, camembert, electra].
  • The encoder and the decoder must be of the same “size”. (E.g. a roberta-base encoder with a bert-base-uncased decoder)

To create a generic Encoder-Decoder model with Seq2SeqModel, you must provide the three parameters below.

  • encoder_type: The type of model to use as the encoder.
  • encoder_name: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.
  • decoder_name: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

    Note: For a list of standard pre-trained models, see here.

    Note: For a list of community models, see here.

    Note: There is no decoder_type parameter as the decoder must be a bert model.

from simpletransformers.seq2seq import Seq2SeqModel


model = Seq2SeqModel(
    "roberta",
    "roberta-base",
    "bert-base-cased",
)

MarianMT Models

MarianMT models are translation models with support for a huge variety of languages.

The following information is taken from the Hugging Face docs here.

  • Each model is about 298 MB on disk; there are 1,000+ models.

  • The list of supported language pairs can be found here.

  • The 1,000+ models were originally trained by Jörg Tiedemann using the Marian C++ library, which supports fast training and translation.

  • All models are transformer encoder-decoders with 6 layers in each component. Each model’s performance is documented in a model card.

  • The 80 opus models that require BPE preprocessing are not supported.

To create a MarianMT translation model, you must provide the two parameters below.

  • encoder_decoder_type: This should be "marian".
  • encoder_decoder_name: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

    Note: Please refer to the Naming and Multilingual Models sections of the MarianMT docs on Hugging Face for more information on choosing the encoder_decoder_name.

from simpletransformers.seq2seq import Seq2SeqModel


# Initialize a Seq2SeqModel for English to German translation
model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="Helsinki-NLP/opus-mt-en-de",
)


BART Models

To create a BART model, you must provide the two parameters below (an initialization example follows the notes).

  • encoder_decoder_type: This should be "bart".
  • encoder_decoder_name: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

    Note: For a list of standard pre-trained models, see here.

    Note: For a list of community models, see here.
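
A minimal initialization sketch, assuming the facebook/bart-large checkpoint as an example:

from simpletransformers.seq2seq import Seq2SeqModel


# Initialize a Seq2SeqModel with a pre-trained BART checkpoint
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
)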

MBART Models

To create an MBART model, you must provide the two parameters below (an initialization example follows the notes).

  • encoder_decoder_type: This should be "mbart".
  • encoder_decoder_name: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

    Note: For a list of standard pre-trained models, see here.

    Note: For a list of community models, see here.
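
A minimal initialization sketch, assuming the facebook/mbart-large-cc25 checkpoint and English-to-Romanian language codes (src_lang and tgt_lang are described in the configuration options below):

from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs


# Language codes are set through the model args (see the configuration options below)
model_args = Seq2SeqArgs()
model_args.src_lang = "en_XX"
model_args.tgt_lang = "ro_RO"

# Initialize a Seq2SeqModel with a pre-trained MBART checkpoint
model = Seq2SeqModel(
    encoder_decoder_type="mbart",
    encoder_decoder_name="facebook/mbart-large-cc25",
    args=model_args,
)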

RAG Models

Note: You must have Faiss (GPU or CPU) installed to use RAG Models. Faiss installation instructions can be found here.

To create a RAG model, you must provide the parameters below (an initialization example follows this list).

  • encoder_decoder_type: Either "rag-token" or "rag-sequence".
  • encoder_decoder_name: The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

    Note: For a list of standard pre-trained models, see here.

    Note: For a list of community models, see here.

  • index_name (optional): Name of the index to use - hf for a canonical dataset from the datasets library, custom for a local index, or legacy for the original index. This defaults to custom when a local knowledge dataset is used, so the parameter does not need to be specified in that case.
  • knowledge_dataset (optional): Path to a TSV file (two columns - title, text) containing a knowledge dataset for RAG, or the path to a directory containing a saved Hugging Face dataset for RAG. If this is not given for a RAG model, a dummy dataset will be used.
  • index_path (optional): Path to the Faiss index of the custom knowledge dataset. If this is not given and knowledge_dataset is given, it will be computed.
  • dpr_ctx_encoder_model_name (optional): The DPR context encoder model to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files. This is required when using a custom knowledge_dataset.
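
Putting these parameters together, a RAG model with a custom knowledge dataset could be initialized roughly as follows. The facebook/rag-token-nq and facebook/dpr-ctx_encoder-single-nq-base model names are examples, and the TSV path is an illustrative placeholder:

from simpletransformers.seq2seq import Seq2SeqModel


# Initialize a RAG model with a custom knowledge dataset
# (model names and the TSV path below are illustrative)
model = Seq2SeqModel(
    encoder_decoder_type="rag-token",
    encoder_decoder_name="facebook/rag-token-nq",
    knowledge_dataset="data/knowledge_dataset.tsv",  # TSV with title and text columns
    dpr_ctx_encoder_model_name="facebook/dpr-ctx_encoder-single-nq-base",
)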

Configuring a Seq2SeqModel

Seq2SeqModel has the following task-specific configuration options.

| Argument | Type | Default | Description |
|---|---|---|---|
| base_marian_model_name | str | None | Name of the base Marian model used to load the tokenizer. |
| dataset_class | Dataset | None | A custom dataset class to use. (Subclass of PyTorch Dataset) |
| do_sample | bool | False | If set to False, greedy decoding is used; otherwise sampling is used. (Defaults to False as defined in configuration_utils.PretrainedConfig.) |
| early_stopping | bool | True | If set to True, beam search is stopped when at least num_beams sentences are finished per batch. |
| evaluate_generated_text | bool | False | Generate sequences for evaluation. |
| length_penalty | float | 2.0 | Exponential penalty to the length. |
| max_length | int | 20 | The maximum length of the sequence to be generated. Between 0 and infinity. |
| max_steps | int | -1 | Maximum number of training steps. Will override the effect of num_train_epochs. |
| num_beams | int | 1 | Number of beams for beam search. Must be between 1 and infinity. 1 means no beam search. |
| num_return_sequences | int | 1 | The number of samples to generate. |
| rag_embed_batch_size | int | 1 | The batch size used when generating embeddings for RAG models. |
| repetition_penalty | float | 1.0 | The parameter for repetition penalty. Between 1.0 and infinity. 1.0 means no penalty. |
| top_k | float | None | Filter the top-k tokens before sampling (<=0: no filtering). |
| top_p | float | None | Nucleus (top-p) filtering before sampling (<=0.0: no filtering). |
| use_multiprocessed_decoding | bool | False | Use multiprocessing when decoding outputs. Significantly speeds up decoding (CPU intensive). Turn off if multiprocessing causes instability. |
| save_knowledge_dataset | bool | True | Save the knowledge dataset when saving a RAG model. |
| save_knowledge_dataset_with_checkpoints | bool | False | Save the knowledge dataset when saving a RAG model training checkpoint. |
| split_text_character | str | " " | The character used to split text on when splitting text in a RAG model knowledge dataset. |
| split_text_n | int | 100 | Split text into a new doc every split_text_n occurrences of split_text_character when splitting text in a RAG model knowledge dataset. |
| src_lang | str | en_XX | Code for the source language. Only relevant to MBART models. |
| tgt_lang | str | ro_RO | Code for the target language. Only relevant to MBART models. |

from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs


model_args = Seq2SeqArgs()
model_args.num_train_epochs = 3

model = Seq2SeqModel(
    "roberta",
    "roberta-base",
    "bert-base-cased",
    args=model_args,
)
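
The generation-related options from the table above are set the same way; the values below are purely illustrative:

from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs


# Illustrative generation settings drawn from the table above
model_args = Seq2SeqArgs()
model_args.num_beams = 5
model_args.max_length = 50
model_args.evaluate_generated_text = True

model = Seq2SeqModel(
    "roberta",
    "roberta-base",
    "bert-base-cased",
    args=model_args,
)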

Note: For configuration options common to all Simple Transformers models, please refer to the Configuring a Simple Transformers Model section.

Class Seq2SeqModel

simpletransformers.seq2seq.Seq2SeqModel(self, encoder_type=None, encoder_name=None, decoder_name=None, encoder_decoder_type=None, encoder_decoder_name=None, config=None, args=None, use_cuda=True, cuda_device=-1, **kwargs)

Initializes a Seq2SeqModel model.

Parameters

  • encoder_type (str, optional) - The type of model to use as the encoder.

  • encoder_name (str, optional) - The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

  • decoder_name (str, optional) - The exact architecture and trained weights to use. This may be a Hugging Face Transformers compatible pre-trained model, a community model, or the path to a directory containing model files.

  • encoder_decoder_type (str, optional) - The type of encoder-decoder model. (E.g. bart)

  • encoder_decoder_name (str, optional) - The path to a directory containing the saved encoder and decoder of a Seq2SeqModel (e.g. “outputs/” - see the reload example below) OR a valid pre-trained model for the given encoder_decoder_type (e.g. a BART, MBART, MarianMT, or RAG model).

  • config (dict, optional) - A configuration file to build an EncoderDecoderModel. See here.

  • args (dict, optional) - Default args will be used if this parameter is not provided. If provided, it should be a dict containing the args that should be changed in the default args or a Seq2SeqArgs object.

  • use_cuda (bool, optional) - Use GPU if available. Setting to False will force model to use CPU only. (See here)

  • cuda_device (int, optional) - Specific GPU that should be used. Will use the first available GPU by default. (See here)

  • kwargs (optional) - For providing proxies, force_download, resume_download, cache_dir and other options specific to the ‘from_pretrained’ implementation where this will be supplied. (See here)

Returns

  • None
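
For instance, a previously trained and saved model can be reloaded by pointing encoder_decoder_name at its output directory; this sketch assumes a BART-based model saved to the default "outputs/" directory:

from simpletransformers.seq2seq import Seq2SeqModel


# Reload a saved Seq2SeqModel from its output directory
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="outputs/",
)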

Training a Seq2SeqModel

The train_model() method is used to train the model.

model.train_model(train_data)

simpletransformers.seq2seq.Seq2SeqModel.train_model(self, train_data, output_dir=None, show_running_loss=True, args=None, eval_data=None, verbose=True, **kwargs)

Trains the model using ‘train_data’

Parameters

  • train_data - Pandas DataFrame containing the 2 columns - input_text, target_text.
    • input_text: The input text sequence.
    • target_text: The target text sequence.
  • output_dir (str, optional) - The directory where model files will be saved. If not given, self.args['output_dir'] will be used.

  • show_running_loss (bool, optional) - If True, the running loss (training loss at current step) will be logged to the console.

  • args (dict, optional) - A dict of configuration options for the Seq2SeqModel. Any changes made will persist for the model.

  • eval_data (optional) - Evaluation data (same format as train_data) against which evaluation will be performed. Required if evaluate_during_training is enabled.

  • kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics section. E.g. f1=sklearn.metrics.f1_score. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.

Returns

  • None

Note: For more details on evaluating Seq2Seq models with custom metrics, please refer to the Evaluating Generated Sequences section.

Note: For more details on training models with Simple Transformers, please refer to the Tips and Tricks section.
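
For reference, a minimal sketch of preparing training data in the expected two-column format and training the model created earlier (the sentences are placeholders):

import pandas as pd


# train_data must contain input_text and target_text columns
train_data = pd.DataFrame(
    [
        ["input text for the first example", "target text for the first example"],
        ["input text for the second example", "target text for the second example"],
    ],
    columns=["input_text", "target_text"],
)

model.train_model(train_data)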

Evaluating a Seq2SeqModel

The eval_model() method is used to evaluate the model.

The following metrics will be calculated by default:

  • eval_loss - Model loss over the evaluation data
result = model.eval_model(eval_data)

simpletransformers.seq2seq.Seq2SeqModel.eval_model(self, eval_data, output_dir=None, verbose=True, silent=False, **kwargs)

Evaluates the model using ‘eval_data’

Parameters

  • eval_data - Pandas DataFrame containing the 2 columns - input_text, target_text.
    • input_text: The input text sequence.
    • target_text: The target text sequence.
  • output_dir (str, optional) - The directory where model files will be saved. If not given, self.args['output_dir'] will be used.

  • verbose (bool, optional) - If verbose, results will be printed to the console on completion of evaluation.

  • silent (bool, optional) - If silent, tqdm progress bars will be hidden.

  • kwargs (optional) - Additional metrics that should be calculated. Pass in the metrics as keyword arguments (name of metric: function to calculate metric). Refer to the additional metrics section. E.g. f1=sklearn.metrics.f1_score. A metric function should take in two parameters. The first parameter will be the true labels, and the second parameter will be the predictions.

Returns

  • result (dict) - Dictionary containing evaluation results.

Note: For more details on evaluating Seq2Seq models with custom metrics, please refer to the Evaluating Generated Sequences section.

Note: For more details on evaluating models with Simple Transformers, please refer to the Tips and Tricks section.
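
As a sketch of the custom metric mechanism described above, an exact-match count could be passed to eval_model as follows. The metric name matches is arbitrary, and generating sequences for evaluation typically requires evaluate_generated_text to be set to True (see the Evaluating Generated Sequences section):

def count_matches(labels, preds):
    # Count generated sequences that exactly match the target text
    return sum(1 for label, pred in zip(labels, preds) if label == pred)


result = model.eval_model(eval_data, matches=count_matches)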

Making Predictions With a Seq2SeqModel

The predict() method is used to make predictions with the model.

to_predict = [
    "Tyson is a Cyclops, a son of Poseidon, and Percy Jackson’s half brother. He is the current general of the Cyclopes army."
]

predictions = model.predict(to_predict)

Note: The input must be a List even if there is only one sentence.

simpletransformers.seq2seq.Seq2SeqModel.predict(self, to_predict)

Performs predictions on a list of text to_predict.

Parameters

  • to_predict - A python list of text (str) to be sent to the model for prediction.

Returns

  • preds (list) - A python list of the generated sequences.
