medcat.ner.transformers_ner

Module Contents

Classes

TransformersNER

TODO: Add documentation

Attributes

logger

medcat.ner.transformers_ner.logger
class medcat.ner.transformers_ner.TransformersNER(cdb, config=None, training_arguments=None)

Bases: object

TODO: Add documentation

Parameters:

config (Optional[medcat.config_transformers_ner.ConfigTransformersNER]) –

name = 'transformers_ner'
__init__(cdb, config=None, training_arguments=None)
Parameters:

config (Optional[medcat.config_transformers_ner.ConfigTransformersNER]) –

Return type:

None

create_eval_pipeline()
get_hash()

A partial hash trying to catch differences between models.

Returns:

str – The hex hash.

Return type:

str
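
The description of get_hash() does not say which parts of the model feed the hash. As a rough illustration of what a partial hash can look like, the sketch below hashes a configuration dictionary and a list of label names with hashlib. The function partial_hash and both of its inputs are hypothetical and are not part of the MedCAT API.

```python
import hashlib
import json


def partial_hash(config: dict, label_names: list) -> str:
    """Hypothetical helper: hash a few model properties into a hex digest."""
    hasher = hashlib.sha256()
    # Serialise with sorted keys so that dict ordering does not change the hash.
    hasher.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    # Sort the labels for the same reason.
    hasher.update(json.dumps(sorted(label_names)).encode("utf-8"))
    return hasher.hexdigest()


# Same content in a different order gives the same hash.
hash_a = partial_hash({"lr": 5e-5, "epochs": 3}, ["NAME", "DATE"])
hash_b = partial_hash({"epochs": 3, "lr": 5e-5}, ["DATE", "NAME"])
# A changed setting gives a different hash.
hash_c = partial_hash({"lr": 5e-5, "epochs": 4}, ["NAME", "DATE"])
```

The hash is "partial" in the sense that it only covers the properties that are passed in, so two models that differ elsewhere (for example in their weights) would still compare as equal.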

_prepare_dataset(json_path, ignore_extra_labels, meta_requirements, file_name='data.json')
train(json_path=None, ignore_extra_labels=False, dataset=None, meta_requirements=None, trainer_callbacks=None)

Train or continue training a model given a json_path containing a MedCATtrainer export. Training continues if an existing model is loaded, or starts from scratch if the model is blank/new.

Parameters:
  • json_path (str or list) – Path, or list of paths, to a MedCATtrainer export containing the meta_annotations we want to train for.

  • ignore_extra_labels – Only makes sense when an existing deid model was loaded and labels in the new data that did not exist in the old model should be ignored.

  • dataset – Defaults to None.

  • meta_requirements – Defaults to None

  • trainer_callbacks (List[TrainerCallback]) – A list of trainer callbacks for collecting metrics during the training at the client side. The transformers Trainer object will be passed in when each callback is called.

Returns:

Tuple – The dataframe, examples, and the dataset

Return type:

Tuple

eval(json_path=None, dataset=None, ignore_extra_labels=False, meta_requirements=None)
Parameters:

json_path (Union[str, list, None]) –

save(save_dir_path)

Save all components of this class to a directory.

Parameters:

save_dir_path (str) – Path to the directory where everything will be saved.

Return type:

None

classmethod load(save_dir_path, config_dict=None)

Load a TransformersNER object.

Parameters:
  • save_dir_path (str) – The directory where everything was saved.

  • config_dict (dict) – Can be used to overwrite saved parameters for this instance. This is needed in certain cases, such as automatic deployment.

Returns:

TransformersNER – The loaded object.

Return type:

TransformersNER
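
The save/load pair above follows a common pattern: write everything to a directory, then read it back and optionally overwrite saved parameters with config_dict. The sketch below shows that pattern with a stand-in class and a plain JSON file. TinyComponent and the file name config.json are hypothetical and say nothing about how TransformersNER stores its files.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class TinyComponent:
    """Hypothetical stand-in used only to illustrate the pattern."""

    def __init__(self, config: dict):
        self.config = config

    def save(self, save_dir_path: str) -> None:
        # Create the directory if needed and write the config into it.
        path = Path(save_dir_path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "config.json").write_text(json.dumps(self.config))

    @classmethod
    def load(cls, save_dir_path: str, config_dict: Optional[dict] = None) -> "TinyComponent":
        config = json.loads((Path(save_dir_path) / "config.json").read_text())
        # Values passed at load time take precedence over the saved ones.
        if config_dict is not None:
            config.update(config_dict)
        return cls(config)


with tempfile.TemporaryDirectory() as tmp_dir:
    TinyComponent({"batch_size": 8, "device": "cpu"}).save(tmp_dir)
    restored = TinyComponent.load(tmp_dir)
    overridden = TinyComponent.load(tmp_dir, config_dict={"device": "cuda"})
```

Overriding at load time is useful when the environment that loads the model differs from the one that saved it, which matches the deployment use case mentioned for config_dict.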

static batch_generator(stream, batch_size_chars)
Parameters:
  • stream (Iterable[spacy.tokens.Doc]) –

  • batch_size_chars (int) –

Return type:

Iterable[List[spacy.tokens.Doc]]
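
No description is given for batch_generator, but its signature suggests that documents are grouped into batches limited by a character count. The sketch below shows one way such batching could work. It uses plain strings in place of spaCy Doc objects, and the function name batch_by_chars and the exact cut-off rule are assumptions, so the real method may split batches differently.

```python
from typing import Iterable, Iterator, List


def batch_by_chars(stream: Iterable[str], batch_size_chars: int) -> Iterator[List[str]]:
    """Hypothetical helper: group texts so each batch stays within a character budget."""
    batch: List[str] = []
    current_size = 0
    for text in stream:
        # Close the current batch before adding a text that would exceed the budget.
        if batch and current_size + len(text) > batch_size_chars:
            yield batch
            batch, current_size = [], 0
        batch.append(text)
        current_size += len(text)
    # Emit whatever is left once the stream is exhausted.
    if batch:
        yield batch


texts = ["aaaa", "bbb", "cccccc", "d"]
batches = list(batch_by_chars(texts, batch_size_chars=8))
```

With this rule, a single text longer than the budget still ends up in a batch of its own rather than being dropped or split.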

pipe(stream, *args, **kwargs)

Process many documents at once.

Parameters:
  • stream (Iterable[spacy.tokens.Doc]) – List of spacy documents.

  • *args – Extra arguments (not used here).

  • **kwargs – Extra keyword arguments (not used here).

Yields:

Doc – The same document.

Returns:

Iterator[Doc] – If the stream is None or empty.

Return type:

Iterator[spacy.tokens.Doc]

_process(stream, batch_size_chars)
Parameters:
  • stream (Iterable[Union[spacy.tokens.Doc, None]]) –

  • batch_size_chars (int) –

Return type:

Iterator[Optional[spacy.tokens.Doc]]

__call__(doc)

Process one document, used in the spacy pipeline for sequential document processing.

Parameters:

doc (Doc) – A spacy document

Returns:

Doc – The same spacy document.

Return type:

spacy.tokens.Doc