medcat.utils.ner.helpers

Module Contents

Functions

_deid_text(cat, text[, redact])

De-identify text.

replace_entities_in_text(text, entities, get_cui_name[, redact])

Replace the given entities in a text.

deid_text(*args, **kwargs)

make_or_update_cdb(json_path[, cdb, min_count])

Creates a new CDB or updates an existing one with new concepts.

medcat.utils.ner.helpers._deid_text(cat, text, redact=False)

De-identify text.

The de-identified text. If redaction is enabled, identifiable entities are replaced with stars (e.g. *****). Otherwise, each entity is replaced with its CUI, that is, the type of information that was hidden (e.g. [PATIENT]).

Parameters:
  • cat (CAT) – The CAT object to use for de-identification.

  • text (str) – The input document.

  • redact (bool) – Whether to redact. Defaults to False.

Returns:

str – The de-identified document.

Return type:

str
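The flow described above can be sketched without MedCAT installed: a CAT-like object finds entities, and each span is then swapped for stars or for its CUI. StubCAT and deid_sketch are hypothetical stand-ins, not MedCAT's API. The regex "detector" and the entity layout ({"entities": {id: {"start", "end", "cui"}}}) are assumptions chosen for illustration, and the output format follows the wording above rather than the library's exact output.

```python
import re
from typing import Dict


class StubCAT:
    """Hypothetical stand-in for a CAT model (not MedCAT's API).

    It "detects" a patient name as the capitalised word after "Mr ".
    """

    def get_entities(self, text: str) -> Dict:
        ents = {}
        for i, m in enumerate(re.finditer(r"(?<=Mr )[A-Z][a-z]+", text)):
            ents[i] = {"start": m.start(), "end": m.end(), "cui": "PATIENT"}
        return {"entities": ents}


def deid_sketch(cat: StubCAT, text: str, redact: bool = False) -> str:
    """Detect entities with `cat`, then mask each span (sketch only)."""
    found = cat.get_entities(text)["entities"].values()
    # Work right-to-left so earlier character offsets are not shifted.
    for ent in sorted(found, key=lambda e: e["start"], reverse=True):
        start, end = ent["start"], ent["end"]
        repl = "*" * (end - start) if redact else f"[{ent['cui']}]"
        text = text[:start] + repl + text[end:]
    return text


note = "Mr Smith was reviewed by Mr Jones."
print(deid_sketch(StubCAT(), note))               # Mr [PATIENT] was reviewed by Mr [PATIENT].
print(deid_sketch(StubCAT(), note, redact=True))  # Mr ***** was reviewed by Mr *****.
```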

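medcat.utils.ner.helpers.replace_entities_in_text(text, entities, get_cui_name, redact=False)

Replace the given entities in a text.

Parameters:
  • text (str) – The input document.

  • entities (Dict) – The entities found in the document.

  • get_cui_name (Callable[[str], str]) – A function that maps a CUI to its name.

  • redact (bool) – Whether to redact. Defaults to False.

Returns:

str – The document with the entities replaced.

Return type:

str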
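Unlike _deid_text, this function receives the entities and a get_cui_name callable directly, so any lookup (a dict, a CDB method) can supply the names. The sketch below is a hypothetical re-implementation, not MedCAT's code; it assumes entities has the shape {"entities": {id: {"start": ..., "end": ..., "cui": ...}}} and wraps each name in brackets, as in the [PATIENT] example above.

```python
from typing import Callable, Dict


def replace_entities_sketch(text: str, entities: Dict,
                            get_cui_name: Callable[[str], str],
                            redact: bool = False) -> str:
    """Hypothetical sketch of entity replacement (assumed entity layout)."""
    # Sort by start offset, last span first, regardless of dict order.
    spans = sorted(entities["entities"].values(),
                   key=lambda ent: ent["start"], reverse=True)
    out = text
    for ent in spans:
        start, end = ent["start"], ent["end"]
        if redact:
            replacement = "*" * (end - start)  # one star per hidden character
        else:
            replacement = f"[{get_cui_name(ent['cui'])}]"
        out = out[:start] + replacement + out[end:]
    return out


text = "Jane Doe lives in Leeds."
entities = {"entities": {
    1: {"start": 18, "end": 23, "cui": "LOC"},
    0: {"start": 0, "end": 8, "cui": "NAME"},
}}
names = {"NAME": "PATIENT", "LOC": "LOCATION"}
print(replace_entities_sketch(text, entities, lambda cui: names[cui]))
# [PATIENT] lives in [LOCATION].
```

Processing spans from last to first matters because a replacement whose length differs from the original span (here "[PATIENT]" is longer than "Jane Doe") would shift the offsets of every entity after it.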

medcat.utils.ner.helpers.deid_text(*args, **kwargs)
Return type:

str

medcat.utils.ner.helpers.make_or_update_cdb(json_path, cdb=None, min_count=0)

Creates a new CDB, or updates the existing one if the cdb argument is provided, with the concepts read from json_path. Concepts that occur fewer than min_count times are ignored.

Parameters:
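  • json_path (str) – Path to the JSON file to read the concepts from.

  • cdb (Optional[CDB]) – An existing CDB to update. If None, a new CDB is created. Defaults to None.

  • min_count (int) – Minimum number of occurrences for a concept to be included. Defaults to 0.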

Returns:

CDB – The updated CDB if one was provided, otherwise the newly created CDB.

Return type:

medcat.cdb.CDB
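The min_count rule and the create-or-update behaviour can be illustrated with a small sketch. It is hypothetical throughout: make_or_update_names_sketch takes already-parsed JSON instead of json_path, assumes a nested layout (projects, then documents, then annotations, each annotation carrying "cui" and "value" keys), and uses a plain dict of CUI to names in place of a real CDB.

```python
import json
from collections import Counter
from typing import Dict, Optional, Set


def make_or_update_names_sketch(data: Dict,
                                names: Optional[Dict[str, Set[str]]] = None,
                                min_count: int = 0) -> Dict[str, Set[str]]:
    """Hypothetical sketch: a CUI -> names dict stands in for the CDB."""
    if names is None:
        names = {}  # analogous to creating a new CDB
    annotations = [ann
                   for proj in data["projects"]
                   for doc in proj["documents"]
                   for ann in doc["annotations"]]
    counts = Counter(ann["cui"] for ann in annotations)
    for ann in annotations:
        # Skip concepts that occur fewer than min_count times.
        if counts[ann["cui"]] < min_count:
            continue
        names.setdefault(ann["cui"], set()).add(ann["value"])
    return names  # the same object if one was passed in


raw = '''{"projects": [{"documents": [{"annotations": [
  {"cui": "C1", "value": "Fever"},
  {"cui": "C1", "value": "pyrexia"},
  {"cui": "C2", "value": "cough"}
]}]}]}'''

# C2 occurs only once, so with min_count=2 only C1 is kept.
filtered = make_or_update_names_sketch(json.loads(raw), min_count=2)

# Passing an existing mapping updates it in place instead of creating a new one.
existing = {"C9": {"rash"}}
updated = make_or_update_names_sketch(json.loads(raw), names=existing)
```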