Text Representation Examples
Minimal example for generating word embeddings
Generate a list of contextual word embeddings for every sentence in a list
from simpletransformers.language_representation import RepresentationModel

sentences = ["Example sentence 1", "Example sentence 2"]
model = RepresentationModel(
    model_type="bert",
    model_name="bert-base-uncased",
    use_cuda=False,
)
# combine_strategy=None keeps one vector per token instead of pooling them
word_vectors = model.encode_sentences(sentences, combine_strategy=None)
# One token vector for every token in each sentence; BERT-based models add
# 2 tokens per sentence by default ([CLS] and [SEP])
assert word_vectors.shape == (2, 5, 768)
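To make the shape in the assertion concrete, here is a minimal sketch that uses a zero-filled NumPy array as a stand-in for the returned word_vectors, so it runs without downloading a model. It assumes the special tokens sit at the first and last positions of each sentence. That holds here because both example sentences tokenize to the same length; in a batch with padding, the [SEP] position would differ.

```python
import numpy as np

# Stand-in for the array returned by encode_sentences(..., combine_strategy=None).
# Real values come from the model; only the shape matters for this sketch.
word_vectors = np.zeros((2, 5, 768))

# Vector at the [CLS] position of each sentence
cls_vectors = word_vectors[:, 0, :]

# Drop the first and last positions ([CLS] and [SEP]), keeping the 3 word tokens
content_vectors = word_vectors[:, 1:-1, :]

print(cls_vectors.shape)      # (2, 768)
print(content_vectors.shape)  # (2, 3, 768)
```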
Minimal example for generating sentence embeddings
The code is the same as for generating word embeddings; the only difference is that we pass the combine_strategy="mean" parameter.
from simpletransformers.language_representation import RepresentationModel

sentences = ["Example sentence 1", "Example sentence 2"]
model = RepresentationModel(
    model_type="bert",
    model_name="bert-base-uncased",
    use_cuda=False,
)
# combine_strategy="mean" pools the token vectors into one vector per sentence
sentence_vectors = model.encode_sentences(sentences, combine_strategy="mean")
# One sentence embedding per sentence
assert sentence_vectors.shape == (2, 768)
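The shape change from (2, 5, 768) to (2, 768) can be illustrated with a rough sketch of mean pooling. It assumes that the "mean" strategy averages the token vectors along the token axis; the library's handling of special tokens and padding is not shown in this section and may differ. The random array is a stand-in for real token vectors.

```python
import numpy as np

# Stand-in for per-token vectors: 2 sentences, 5 tokens each, 768 dimensions.
rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(2, 5, 768))

# Average over the token axis (axis=1) to get one vector per sentence.
# Assumption: this mirrors what combine_strategy="mean" does conceptually.
sentence_vectors = token_vectors.mean(axis=1)

print(sentence_vectors.shape)  # (2, 768)
```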