Named Entity Recognition Specifics

The goal of Named Entity Recognition (NER) is to locate and classify named entities in a sequence. Named entities belong to pre-defined categories chosen according to the use case, such as names of people, organizations, places, codes, time notations, and monetary values. Essentially, NER assigns a class to each token (usually a single word) in a sequence. Because of this, NER is also referred to as token classification.
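
To make the token-classification view concrete, the following minimal sketch pairs each word of a sentence with a tag and then groups the tags into entity spans. It assumes the common B-/I-/O tagging scheme (B- begins an entity, I- continues it, O marks a token outside any entity). The helper group_entities is a hypothetical name used for illustration and is not part of Simple Transformers.

```python
def group_entities(words, tags):
    """Group BIO-tagged words into (entity_text, entity_type) pairs."""
    entities = []
    current_words, current_type = [], None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity in progress.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [word], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the current entity.
            current_words.append(word)
        else:
            # "O" (or an inconsistent I- tag) ends any entity in progress;
            # the word itself is not added to any entity.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


words = ["Harry", "Potter", "studied", "at", "Hogwarts"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
entities = group_entities(words, tags)
print(entities)  # [('Harry Potter', 'PER'), ('Hogwarts', 'LOC')]
```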

Usage Steps

The process of performing Named Entity Recognition in Simple Transformers does not deviate from the standard pattern.

  1. Initialize a NERModel
  2. Train the model with train_model()
  3. Evaluate the model with eval_model()
  4. Make predictions on (unlabelled) data with predict()
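
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than a definitive recipe: the wrapper function run_ner_steps and its parameter names are hypothetical, the pre-trained model name "bert-base-cased" is just one possible choice, and the training and evaluation inputs are assumed to be paths to CoNLL-format text files.

```python
def run_ner_steps(train_file, eval_file, sentences):
    """Run the four steps: initialize, train, evaluate, predict."""
    # Imported inside the function so this sketch can be loaded
    # even where simpletransformers is not installed.
    from simpletransformers.ner import NERModel

    # 1. Initialize a NERModel (model type, then pre-trained model name).
    #    use_cuda=False keeps the sketch runnable on a CPU-only machine.
    model = NERModel("bert", "bert-base-cased", use_cuda=False)

    # 2. Train the model.
    model.train_model(train_file)

    # 3. Evaluate the model.
    result, model_outputs, eval_predictions = model.eval_model(eval_file)

    # 4. Make predictions on unlabelled sentences.
    predicted_tags, raw_outputs = model.predict(sentences)
    return result, predicted_tags
```

A call might look like run_ner_steps("train.txt", "eval.txt", ["Some unlabelled sentence"]), where both file names are placeholders for your own data.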

Supported Model Types

New model types are regularly added to the library. Named Entity Recognition tasks currently support the model types given below.

Model          Model code for NERModel
ALBERT         albert
BERT           bert
BERTweet       bertweet
CamemBERT      camembert
DistilBERT     distilbert
ELECTRA        electra
LayoutLM       layoutlm
Longformer     longformer
MPNet          mpnet
MobileBERT     mobilebert
RoBERTa        roberta
SqueezeBert    squeezebert
XLM-RoBERTa    xlmroberta
XLNet          xlnet

Tip: The model code is used to specify the model_type in a Simple Transformers model.

Custom Labels

The default list of labels used in the NERModel is from the CoNLL dataset which uses the following tags/labels.

["O", "B-MISC", "I-MISC", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

However, named entity recognition is a very versatile task and has many different applications. It is highly likely that you will wish to define and use your own token tags/labels.

This can be done by passing your list of labels to the labels parameter when creating the NERModel.

from simpletransformers.ner import NERModel

custom_labels = ["O", "B-SPELL", "I-SPELL", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-PLACE", "I-PLACE"]

# Pass the custom labels when creating the model
model = NERModel(
    "bert", "bert-base-cased", labels=custom_labels
)

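Because the model can only assign tags from the list it is given, it may be worth checking before training that every tag in your data appears in the custom list. The sketch below shows one way to do that in plain Python. The helper find_unknown_labels and the sample data are hypothetical and not part of Simple Transformers.

```python
custom_labels = ["O", "B-SPELL", "I-SPELL", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-PLACE", "I-PLACE"]


def find_unknown_labels(tagged_sentences, labels):
    """Return the sorted tags that occur in the data but not in `labels`."""
    allowed = set(labels)
    return sorted(
        {tag for sentence in tagged_sentences for _, tag in sentence if tag not in allowed}
    )


# Each sentence is a list of (word, tag) pairs.
train_data = [
    [("Harry", "B-PER"), ("cast", "O"), ("Expelliarmus", "B-SPELL")],
    [("Hermione", "B-PER"), ("visited", "O"), ("London", "B-LOC")],
]
unknown = find_unknown_labels(train_data, custom_labels)
print(unknown)  # ['B-LOC'] because the custom list uses B-PLACE instead
```
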
Prediction Caveats

By default, NERModel splits each input sequence passed to predict() on spaces and assigns an NER tag to each “word” of the split sequence. This might not be desirable in languages that do not separate words with spaces (e.g. Chinese). To avoid this, you can specify split_on_spaces=False when calling NERModel.predict(). In this case, you must provide a list of lists as the to_predict input. Each inner list holds the already-split “words” of a single sequence, and the outer list holds all sequences.
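
The list-of-lists input can be sketched as follows. The sentences are made-up examples, and splitting into single characters is only an assumption chosen for illustration; a real pipeline would more likely use a proper word segmenter. The final call is shown as a comment because it needs a trained NERModel, here assumed to be named model.

```python
sentences = ["我爱北京", "他在上海工作"]

# Outer list: all sequences. Inner lists: the "words" of one sequence.
# Splitting into characters is for illustration only.
to_predict = [list(sentence) for sentence in sentences]
print(to_predict)
# [['我', '爱', '北', '京'], ['他', '在', '上', '海', '工', '作']]

# With a trained NERModel named `model`, the call would then be:
# predictions, raw_outputs = model.predict(to_predict, split_on_spaces=False)
```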

Lazy Loading Data

The system memory required to keep a large dataset in memory can be prohibitively large. In such cases, the data can be lazy loaded from disk to minimize memory consumption.

To enable lazy loading, you must set the lazy_loading flag to True in NERArgs.

from simpletransformers.ner import NERArgs

model_args = NERArgs()
model_args.lazy_loading = True

Note: The data must be input as a path to a file in the CoNLL format to use lazy loading. See the NER data formats documentation for the correct format.
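
As a rough illustration, the sketch below writes tagged sentences to a text file and reads them back one sentence at a time. It assumes a space-separated layout with one word and its label per line and an empty line between sentences; check the data format documentation for the exact requirements. The helpers write_conll and iter_conll are hypothetical, and the generator only illustrates the idea of reading lazily, not how the library implements it.

```python
import os
import tempfile


def write_conll(tagged_sentences, path):
    """Write one 'word label' pair per line, with a blank line between sentences."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in tagged_sentences:
            for word, label in sentence:
                f.write(f"{word} {label}\n")
            f.write("\n")


def iter_conll(path):
    """Yield one sentence at a time so the whole file never sits in memory."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # The label is the last space-separated field on the line.
                word, label = line.rsplit(" ", 1)
                sentence.append((word, label))
            elif sentence:
                yield sentence
                sentence = []
    if sentence:
        yield sentence


data = [
    [("Harry", "B-PER"), ("Potter", "I-PER"), ("smiled", "O")],
    [("London", "B-LOC"), ("is", "O"), ("rainy", "O")],
]
path = os.path.join(tempfile.mkdtemp(), "train.txt")
write_conll(data, path)
loaded = list(iter_conll(path))
print(loaded == data)  # True
```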

Note: This will typically be slower as the feature conversion is done on the fly. However, the tradeoff between speed and memory consumption should be reasonable.

Tip: See Configuring a NER model for information on configuring the model to read the lazy loading data file correctly.