medcat.utils.ner.deid
De-identification model.
This module provides a wrapper around the regular CAT model. The idea is to simplify the use of a DeId-specific model.
It covers two use cases: 1) creating a deid model, and 2) loading and using a deid model.
For use case 1:
Instead of:
    cat = CAT(cdb=ner.cdb, addl_ner=ner)
You can use:
    deid = DeIdModel.create(ner)
And for use case 2:
Instead of:
    cat = CAT.load_model_pack(model_pack_path)
    anon_text = deid_text(cat, text)
You can use:
    deid = DeIdModel.load_model_pack(model_pack_path)
    anon_text = deid.deid_text(text)
Or if/when structured output is desired:
    deid = DeIdModel.load_model_pack(model_pack_path)
    anon_doc = deid(text)  # the spacy document
The wrapper also exposes some CAT parts directly:
- config
- cdb
Module Contents
Classes
DeIdModel – The DeID model.
Attributes
- medcat.utils.ner.deid.logger
- class medcat.utils.ner.deid.DeIdModel(cat)
Bases:
medcat.utils.ner.model.NerModel
The DeID model.
This wraps a CAT instance and simplifies its use as a de-identification model.
It provides methods for creating one from a TransformersNER as well as loading from a model pack (along with some validation).
It also exposes some useful parts of the CAT it wraps such as the config and the concept database.
- Parameters:
cat (medcat.cat.CAT) – The CAT instance to wrap.
- __init__(cat)
- Parameters:
cat (medcat.cat.CAT) – The CAT instance to wrap.
- Return type:
None
- train(json_path, *args, **kwargs)
Train the underlying transformers NER model.
All the extra arguments are passed to the TransformersNER train method.
- Parameters:
json_path (Union[str, list, None]) – The JSON file path to read the training data from.
train_nr (int) – The number of the NER object in cat._addl_train to train. Defaults to 0.
*args – Additional arguments for TransformersNER.train .
**kwargs – Additional keyword arguments for TransformersNER.train .
- Returns:
Tuple[Any, Any, Any] – df, examples, dataset
- Return type:
Tuple[Any, Any, Any]
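As a rough illustration of the return value described above, the following sketch unpacks the three-element tuple into named fields. The helper `run_training` and the `TrainingResult` container are hypothetical names introduced here; only the `train(json_path, *args, **kwargs)` call and the `df, examples, dataset` return order are taken from this reference.

```python
from typing import Any, NamedTuple


class TrainingResult(NamedTuple):
    """Hypothetical container naming the three values returned by train()."""
    df: Any
    examples: Any
    dataset: Any


def run_training(model: Any, json_path: str, **train_kwargs: Any) -> TrainingResult:
    """Call model.train and label its outputs.

    `model` is assumed to be a DeIdModel (or any object with a compatible
    train method). Extra keyword arguments are forwarded unchanged.
    """
    df, examples, dataset = model.train(json_path, **train_kwargs)
    return TrainingResult(df=df, examples=examples, dataset=dataset)
```

With a loaded model this would be called as `run_training(deid, json_path)`, where `json_path` points at the training data file.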
- deid_text(text, redact=False)
Deidentify text and optionally redact information.
- Parameters:
text (str) – The text to deidentify.
redact (bool) – Whether to redact the information.
- Returns:
str – The deidentified text.
- Return type:
str
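To make the effect of the `redact` flag easier to inspect, a small helper can run the same text through both modes. This is only a sketch: `DeidLike` and `deid_both_ways` are hypothetical names, and the exact form of the output in each mode is not specified here, so the helper simply returns both strings for comparison.

```python
from typing import Protocol, Tuple


class DeidLike(Protocol):
    """Any object offering the documented deid_text(text, redact=False)."""

    def deid_text(self, text: str, redact: bool = False) -> str: ...


def deid_both_ways(model: DeidLike, text: str) -> Tuple[str, str]:
    """Return (default_output, redacted_output) for the same input text."""
    default_output = model.deid_text(text, redact=False)
    redacted_output = model.deid_text(text, redact=True)
    return default_output, redacted_output
```

A loaded `DeIdModel` can be passed as `model`, since it provides `deid_text` with this signature.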
- deid_multi_texts(texts, redact=False, addl_info=['cui2icd10', 'cui2ontologies', 'cui2snomed'], n_process=None, batch_size=None)
Deidentify multiple texts, optionally using multiple processes.
- Parameters:
texts (Union[Iterable[str], Iterable[Tuple]]) – Text to be annotated
redact (bool) – Whether to redact the information.
addl_info (List[str], optional) – Additional info. Defaults to ['cui2icd10', 'cui2ontologies', 'cui2snomed'].
n_process (Optional[int], optional) – Number of processes. Defaults to None.
batch_size (Optional[int], optional) – The size of a batch. Defaults to None.
- Raises:
ValueError – In case of unsupported input.
- Returns:
List[str] – List of deidentified documents.
- Return type:
List[str]
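When documents carry identifiers, it can be convenient to map the returned list back onto them. The sketch below does this for plain string inputs. `deid_by_id` is a hypothetical helper, and it assumes that the returned list has one entry per input text in the same order, which this reference does not state explicitly; the length check guards against a mismatch.

```python
from typing import Any, Dict, Optional


def deid_by_id(
    model: Any,
    docs: Dict[str, str],
    redact: bool = False,
    n_process: Optional[int] = None,
    batch_size: Optional[int] = None,
) -> Dict[str, str]:
    """De-identify a mapping of document id -> text in one batched call.

    Assumption: deid_multi_texts returns results in input order.
    """
    ids = list(docs)
    texts = [docs[doc_id] for doc_id in ids]
    cleaned = model.deid_multi_texts(
        texts, redact=redact, n_process=n_process, batch_size=batch_size
    )
    if len(cleaned) != len(ids):
        raise RuntimeError(
            f"Expected {len(ids)} de-identified texts, got {len(cleaned)}"
        )
    return dict(zip(ids, cleaned))
```

Passing `n_process` and `batch_size` through unchanged keeps the defaults of `deid_multi_texts` in effect unless the caller overrides them.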
- classmethod load_model_pack(model_pack_path, config=None)
Load DeId model from model pack.
The method first loads the CAT instance.
It then makes sure that the model pack corresponds to a valid DeId model.
- Parameters:
config (Optional[Dict]) – Config for DeId model pack (primarily for stride of overlap window)
model_pack_path (str) – The model pack path.
- Raises:
ValueError – If the model pack does not correspond to a DeId model.
- Returns:
DeIdModel – The resulting DeId model.
- Return type:
DeIdModel
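Because loading raises `ValueError` when the model pack is not a DeId model, callers may want to handle that case explicitly. The following is a minimal sketch of one way to do so: `try_load_deid` is a hypothetical helper, and the loader is passed in as a callable so that the sketch does not depend on anything beyond the documented `load_model_pack(model_pack_path)` call.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def try_load_deid(load: Callable[[str], Any], model_pack_path: str) -> Optional[Any]:
    """Load a DeId model pack, returning None if validation fails.

    `load` is expected to be DeIdModel.load_model_pack or a compatible
    callable that raises ValueError for a non-DeId model pack.
    """
    try:
        return load(model_pack_path)
    except ValueError as err:
        # The pack loaded as a CAT model but is not a valid DeId model.
        logger.warning("Not a DeId model pack: %s (%s)", model_pack_path, err)
        return None
```

In practice this would be called as `try_load_deid(DeIdModel.load_model_pack, model_pack_path)`. Other exceptions, such as a missing file, are deliberately left to propagate.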
- classmethod _is_deid_model(cat)
- Parameters:
cat (medcat.cat.CAT) –
- Return type:
bool
- classmethod _get_reason_not_deid(cat)
- Parameters:
cat (medcat.cat.CAT) –
- Return type:
str