medcat.utils.ner.deid

De-identification model.

This describes a wrapper on the regular CAT model. The idea is to simplify the use of a DeId-specific model.

It tackles two use cases:

  1) Creation of a deid model

  2) Loading and use of a deid model

That is, for use case 1:

Instead of:
    cat = CAT(cdb=ner.cdb, addl_ner=ner)

You can use:
    deid = DeIdModel.create(ner)

And for use case 2:

Instead of:
    cat = CAT.load_model_pack(model_pack_path)
    anon_text = deid_text(cat, text)

You can use:
    deid = DeIdModel.load_model_pack(model_pack_path)
    anon_text = deid.deid_text(text)

Or, when structured output is desired:
    deid = DeIdModel.load_model_pack(model_pack_path)
    anon_doc = deid(text)  # the spaCy document

The wrapper also exposes some CAT parts directly:

  • config

  • cdb
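
As a rough sketch of use case 2, the helper below runs the same text through deid_text twice, once with the default settings and once with redact=True, so the two outputs can be compared. The names deid_both_ways and SupportsDeidText are hypothetical and not part of MedCAT; the helper only assumes that the object passed in offers deid_text(text, redact=False) as documented for DeIdModel on this page.

```python
from typing import Dict, Protocol


class SupportsDeidText(Protocol):
    """Any object offering deid_text as documented for DeIdModel."""

    def deid_text(self, text: str, redact: bool = False) -> str: ...


def deid_both_ways(deid: SupportsDeidText, text: str) -> Dict[str, str]:
    """Hypothetical helper: return the default and the redacted output."""
    return {
        "default": deid.deid_text(text),
        "redacted": deid.deid_text(text, redact=True),
    }


# Intended use with a real model pack (requires medcat to be installed):
#   from medcat.utils.ner.deid import DeIdModel
#   deid = DeIdModel.load_model_pack(model_pack_path)
#   outputs = deid_both_ways(deid, text)
```

Typing the argument as a protocol keeps the helper importable without medcat, which is convenient for testing with a stand-in object.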

Module Contents

Classes

DeIdModel

The DeID model.

Attributes

logger

medcat.utils.ner.deid.logger
class medcat.utils.ner.deid.DeIdModel(cat)

Bases: medcat.utils.ner.model.NerModel

The DeID model.

This wraps a CAT instance and simplifies its use as a de-identification model.

It provides methods for creating one from a TransformersNER as well as loading from a model pack (along with some validation).

It also exposes some useful parts of the CAT it wraps such as the config and the concept database.

Parameters:

cat (medcat.cat.CAT) –

__init__(cat)
Parameters:

cat (medcat.cat.CAT) –

Return type:

None

train(json_path, *args, **kwargs)

Train the underlying transformers NER model.

All the extra arguments are passed to the TransformersNER train method.

Parameters:
  • json_path (Union[str, list, None]) – The JSON file path to read the training data from.

  • train_nr (int) – The number of the NER object in cat._addl_train to train. Defaults to 0.

  • *args – Additional arguments for TransformersNER.train .

  • **kwargs – Additional keyword arguments for TransformersNER.train .

Returns:

Tuple[Any, Any, Any] – df, examples, dataset

Return type:

Tuple[Any, Any, Any]

deid_text(text, redact=False)

Deidentify text and potentially redact information.

Parameters:
  • text (str) – The text to deidentify.

  • redact (bool) – Whether to redact the information.

Returns:

str – The deidentified text.

Return type:

str
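
This page does not spell out what the de-identified text looks like. The sketch below shows one plausible reading of the redact flag: each detected span is replaced by a bracketed label by default, or by a bracketed run of asterisks of the same length when redact is True. The function replace_spans, the entity dictionary layout, and the bracketed output style are all assumptions made for illustration, not the documented behaviour of deid_text.

```python
from typing import Callable, Dict


def replace_spans(text: str, entities: Dict[int, dict],
                  label_for: Callable[[str], str], redact: bool = False) -> str:
    """Illustrative only: swap each detected span for a bracketed placeholder."""
    new_text = text
    # Work from the last span to the first so earlier offsets stay valid.
    for ent in sorted(entities.values(), key=lambda e: e["start"], reverse=True):
        start, end = ent["start"], ent["end"]
        filler = "*" * (end - start) if redact else label_for(ent["cui"])
        new_text = new_text[:start] + f"[{filler}]" + new_text[end:]
    return new_text


# Hypothetical entities: {id: {"start", "end", "cui"}} with character offsets.
text = "Seen by Dr Smith on 01/02/2020."
entities = {
    0: {"start": 11, "end": 16, "cui": "NAME"},
    1: {"start": 20, "end": 30, "cui": "DATE"},
}
labelled = replace_spans(text, entities, label_for=lambda cui: cui)
redacted = replace_spans(text, entities, label_for=lambda cui: cui, redact=True)
```

Under these assumptions, labelled is "Seen by Dr [NAME] on [DATE]." and redacted is "Seen by Dr [*****] on [**********].". Replacing spans from the end of the text backwards means no offset needs recomputing after an earlier replacement changes the string length.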

deid_multi_texts(texts, redact=False, addl_info=['cui2icd10', 'cui2ontologies', 'cui2snomed'], n_process=None, batch_size=None)

Deidentify multiple texts, optionally using several processes.

Parameters:
  • texts (Union[Iterable[str], Iterable[Tuple]]) – Text to be annotated

  • redact (bool) – Whether to redact the information.

  • addl_info (List[str], optional) – Additional info. Defaults to ['cui2icd10', 'cui2ontologies', 'cui2snomed'].

  • n_process (Optional[int], optional) – Number of processes. Defaults to None.

  • batch_size (Optional[int], optional) – The size of a batch. Defaults to None.

Raises:

ValueError – In case of unsupported input.

Returns:

List[str] – List of deidentified documents.

Return type:

List[str]
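
The texts parameter accepts either plain strings or tuples, and unsupported input raises ValueError. A minimal sketch of how such input could be normalised is shown below. The helper extract_texts is hypothetical, and the assumption that tuples are (identifier, text) pairs is not stated on this page.

```python
from typing import Iterable, List, Tuple, Union

TextItem = Union[str, Tuple]


def extract_texts(items: Iterable[TextItem]) -> List[str]:
    """Illustrative only: pull the raw text out of str or tuple inputs."""
    texts: List[str] = []
    for item in items:
        if isinstance(item, tuple):
            # Assumption: tuples are (identifier, text) pairs.
            texts.append(item[1])
        elif isinstance(item, str):
            texts.append(item)
        else:
            # Mirrors the documented ValueError for unsupported input.
            raise ValueError(f"Unsupported input: {type(item)}: {item!r}")
    return texts
```

For example, extract_texts(["first note", ("doc-2", "second note")]) yields the two note strings in order, while an integer in the input raises ValueError.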

classmethod load_model_pack(model_pack_path, config=None)

Load DeId model from model pack.

The method first loads the CAT instance.

It then makes sure that the model pack corresponds to a valid DeId model.

Parameters:
  • config (Optional[Dict]) – Config for DeId model pack (primarily for stride of overlap window)

  • model_pack_path (str) – The model pack path.

Raises:

ValueError – If the model pack does not correspond to a DeId model.

Returns:

DeIdModel – The resulting DeId model.

Return type:

DeIdModel

classmethod _is_deid_model(cat)
Parameters:

cat (medcat.cat.CAT) –

Return type:

bool

classmethod _get_reason_not_deid(cat)
Parameters:

cat (medcat.cat.CAT) –

Return type:

str
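
The two private class methods above return a bool and a str respectively, but how they relate is not described here. One plausible arrangement, sketched below with stand-in objects, is that the reason method returns an empty string for a valid model and the boolean check is derived from it, with the loader raising ValueError that carries the reason. All names in the sketch are hypothetical, and the single check it performs is an assumption, not the documented validation criteria.

```python
from types import SimpleNamespace


def get_reason_not_valid(model) -> str:
    """Return why `model` fails the check, or "" if it passes."""
    # Hypothetical criterion for illustration only.
    if len(model.addl_ner) != 1:
        return f"Expected exactly one NER component, got {len(model.addl_ner)}"
    return ""


def is_valid(model) -> bool:
    # The boolean check is derived from the reason: empty string means valid.
    return not get_reason_not_valid(model)


def load_checked(model):
    """Return the model if valid, else raise ValueError with the reason."""
    if not is_valid(model):
        raise ValueError(f"Not a valid model: {get_reason_not_valid(model)}")
    return model


# Stand-in objects; a real check would inspect a loaded CAT instance.
good = SimpleNamespace(addl_ner=["ner"])
bad = SimpleNamespace(addl_ner=[])
```

Keeping the reason in one place means the predicate and the error message cannot drift apart.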