Text Representation Examples

Minimal example for generating word embeddings

Generate contextual word embeddings for every token of every sentence in a list

from simpletransformers.language_representation import RepresentationModel

sentences = ["Example sentence 1", "Example sentence 2"]
model = RepresentationModel(
    model_type="bert",
    model_name="bert-base-uncased",
    use_cuda=False,
)
# combine_strategy=None returns one vector per token in each sentence.
word_vectors = model.encode_sentences(sentences, combine_strategy=None)
# BERT-based models add 2 tokens per sentence by default ([CLS] and [SEP]),
# so each 3-token sentence yields 5 token vectors of size 768.
assert word_vectors.shape == (2, 5, 768)
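The shape asserted above can be read as (sentences, tokens, hidden size). The following is a minimal sketch of how such an array might be indexed. It uses a random NumPy array as a stand-in for real model output, and it assumes the token layout [CLS], example, sentence, 1, [SEP] with no padding, which holds here only because both sentences tokenize to the same length.

```python
import numpy as np

# Stand-in for the array returned with combine_strategy=None.
# Random values, NOT real BERT output; only the shape matches.
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(2, 5, 768))

num_sentences, num_tokens, hidden_size = word_vectors.shape

# Assumed layout for sentence 0: [CLS], example, sentence, 1, [SEP]
cls_vector = word_vectors[0, 0]          # vector at the [CLS] position
content_vectors = word_vectors[0, 1:-1]  # vectors between [CLS] and [SEP]

print(cls_vector.shape)       # (768,)
print(content_vectors.shape)  # (3, 768)
```

With sentences of different lengths the shorter ones would presumably be padded, so the `1:-1` slice would no longer isolate the content tokens.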

Minimal example for generating sentence embeddings

The code is the same as for generating word embeddings; the only difference is that we pass the combine_strategy="mean" parameter, which yields one embedding per sentence.

from simpletransformers.language_representation import RepresentationModel

sentences = ["Example sentence 1", "Example sentence 2"]
model = RepresentationModel(
    model_type="bert",
    model_name="bert-base-uncased",
    use_cuda=False,
)
# combine_strategy="mean" returns one embedding per sentence.
sentence_vectors = model.encode_sentences(sentences, combine_strategy="mean")
assert sentence_vectors.shape == (2, 768)
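To make the change in shape concrete, here is a rough sketch of mean pooling with NumPy. It assumes that the "mean" strategy averages the token vectors of each sentence along the token axis. The input is a random stand-in array and the code is an illustration of the idea, not the library's implementation.

```python
import numpy as np

# Stand-in for per-token vectors of shape (sentences, tokens, hidden size).
# Random values, NOT real BERT output.
rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(2, 5, 768))

# Average over the token axis (axis=1) to get one vector per sentence.
sentence_vectors = token_vectors.mean(axis=1)

print(sentence_vectors.shape)  # (2, 768)
```

Averaging collapses the token dimension, which is why the result goes from (2, 5, 768) to (2, 768).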
