medcat.utils.helpers

Module Contents

Functions

get_important_config_parameters(config)

cui2tuis(cdb, cui)

tui2name(cdb, tui)

to_json_simple(docs, cdb)

output: [{'text': <text>, 'entities': [<start,end,type>, ]}]

to_json_sumithra(docs, cdb)

output: [[text, {'entities': [<start,end,type>, ]}], ...]

doc2html(doc)

json2html(doc)

prepare_name(cat, name[, version])

Cleans up the name.

get_all_from_name(name, nlp, source_value[, SEP, version])

tkn_inds_from_doc(spacy_doc[, text_inds, source_val])

tkns_from_doc(spacy_doc, start, end)

filter_cdb_by_icd10(cdb)

Filters an existing CDB to only contain concepts that have an associated ICD-10 code.

umls_to_icd10cm(cdb, csv_path)

umls_to_icd10_over_snomed(cdb, pickle_path)

umls_to_icd10_ext(cdb, pickle_path)

umls_to_icd10(cdb, csv_path)

Map UMLS CDB to ICD10 concepts.

umls_to_snomed(cdb, pickle_path)

Map UMLS CDB to SNOMED concepts.

snomed_to_umls(cdb, pickle_path)

Map SNOMED CDB to UMLS concepts.

snomed_to_icd10(cdb, csv_path)

Add map from cui to icd10 for concepts.

snomed_to_desc(cdb, csv_path)

Add descriptions to the concepts.

filter_only_icd10(doc, cat)

add_names_icd10(csv_path, cat)

add_names_icd10cm(cdb, csv_path, cat)

remove_icd10_ranges(cdb)

dep_check_scispacy()

run_cv(cdb_path, data_path, vocab_path[, cv, nepochs, ...])

has_new_spacy()

Figures out whether or not a newer version of spacy is installed.

has_spacy_model(model_name)

Checks if the spacy model is available.

ensure_spacy_model(model_name)

Ensure the specified spacy model exists.

Attributes

logger

medcat.utils.helpers.logger
medcat.utils.helpers.get_important_config_parameters(config)
medcat.utils.helpers.cui2tuis(cdb, cui)
Parameters:
Return type:

Set[str]

medcat.utils.helpers.tui2name(cdb, tui)
Parameters:
Return type:

str

medcat.utils.helpers.to_json_simple(docs, cdb)

output: [{'text': <text>, 'entities': [<start,end,type>, ]}]

Parameters:
  • docs – The list of documents.

  • cdb (CDB) – The concept database

Returns:

List[Dict] – The output list of dicts.

Return type:

List[Dict]
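
The output shape described above can be illustrated with plain Python data. This is only a sketch of the structure, not MedCAT's implementation: the `AnnotatedDoc` alias, the `to_simple_shape` helper and the sample annotation are hypothetical, and real inputs would be annotated documents rather than tuples.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an annotated document: (text, [(start, end, type), ...]).
# Plain tuples are used here only to show the output shape.
AnnotatedDoc = Tuple[str, List[Tuple[int, int, str]]]


def to_simple_shape(docs: List[AnnotatedDoc]) -> List[Dict]:
    """Build [{'text': <text>, 'entities': [(start, end, type), ...]}, ...]."""
    return [{"text": text, "entities": list(entities)} for text, entities in docs]


# One document with a single entity covering the word "diabetes".
# The type label "T047" is an illustrative value only.
docs = [("Patient has diabetes", [(12, 20, "T047")])]
simple = to_simple_shape(docs)
```

Each entry keeps the raw text next to character offsets, so an entity can be recovered by slicing: `simple[0]["text"][12:20]` gives the annotated span.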

medcat.utils.helpers.to_json_sumithra(docs, cdb)

output: [[text, {'entities': [<start,end,type>, ]}], ...]

Parameters:
  • docs – The list of documents.

  • cdb (CDB) – The concept database

Returns:

List[List] – The output list.

Return type:

List[List]
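
The difference from the dict-based format is that each document becomes a two-element list: the text first, then a dict holding the entities. A minimal sketch of that shape, assuming the input is already in the `{'text': ..., 'entities': ...}` form; the `simple_to_list_shape` helper is hypothetical and is not part of MedCAT.

```python
from typing import Dict, List


def simple_to_list_shape(simple: List[Dict]) -> List[List]:
    """Convert [{'text': t, 'entities': e}, ...] into [[t, {'entities': e}], ...]."""
    return [[item["text"], {"entities": item["entities"]}] for item in simple]


# Illustrative input: one document with one (start, end, type) entity.
simple_docs = [{"text": "Patient has diabetes", "entities": [(12, 20, "T047")]}]
list_docs = simple_to_list_shape(simple_docs)
```

Here `list_docs[0][0]` is the document text and `list_docs[0][1]["entities"]` is its entity list.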

medcat.utils.helpers.doc2html(doc)
medcat.utils.helpers.json2html(doc)
medcat.utils.helpers.prepare_name(cat, name, version='CLEAN')

Cleans up the name.

Parameters:
  • cat – The model pack.

  • name – The name to prepare.

  • version – The version of prepare to use.

Returns:

Tuple[str, List] – The name and tokens.

Return type:

Tuple[str, List]

medcat.utils.helpers.get_all_from_name(name, nlp, source_value, SEP='', version='clean')
medcat.utils.helpers.tkn_inds_from_doc(spacy_doc, text_inds=None, source_val=None)
medcat.utils.helpers.tkns_from_doc(spacy_doc, start, end)
medcat.utils.helpers.filter_cdb_by_icd10(cdb)

Filters an existing CDB to only contain concepts that have an associated ICD-10 code. Can be used for SNOMED or UMLS CDBs.

Parameters:

cdb (CDB) – The input CDB

Returns:

CDB – The filtered CDB

Return type:

medcat.cdb.CDB
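
The filtering idea can be sketched with plain dictionaries. This is an assumption-laden illustration rather than the library's code: the real function operates on a `CDB` object, whereas the `cui2names` and `cui2icd10` mappings, the `filter_by_icd10` helper and all identifiers below are hypothetical placeholders.

```python
from typing import Dict, List, Set


def filter_by_icd10(
    cui2names: Dict[str, Set[str]],
    cui2icd10: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Keep only the concepts that map to at least one ICD-10 code."""
    return {cui: names for cui, names in cui2names.items() if cui2icd10.get(cui)}


# Placeholder concept IDs and codes, not real UMLS/SNOMED/ICD-10 values.
cui2names = {"CUI_A": {"name a"}, "CUI_B": {"name b"}, "CUI_C": {"name c"}}
cui2icd10 = {"CUI_A": ["CODE_1"], "CUI_C": []}

filtered = filter_by_icd10(cui2names, cui2icd10)
```

Concepts with no mapping (`CUI_B`) and concepts with an empty mapping (`CUI_C`) are both dropped in this sketch; whether the library treats empty mappings the same way is not stated in this reference.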

medcat.utils.helpers.umls_to_icd10cm(cdb, csv_path)
medcat.utils.helpers.umls_to_icd10_over_snomed(cdb, pickle_path)
medcat.utils.helpers.umls_to_icd10_ext(cdb, pickle_path)
medcat.utils.helpers.umls_to_icd10(cdb, csv_path)

Map UMLS CDB to ICD10 concepts.

Parameters:
  • cdb (CDB) – The CDB

  • csv_path (str) – The path to the file to load.

medcat.utils.helpers.umls_to_snomed(cdb, pickle_path)

Map UMLS CDB to SNOMED concepts.

Parameters:
  • cdb (CDB) – The CDB

  • pickle_path – The path to the file to load.

medcat.utils.helpers.snomed_to_umls(cdb, pickle_path)

Map SNOMED CDB to UMLS concepts.

Parameters:
  • cdb (CDB) – The concept database.

  • pickle_path (str) – The file path.

medcat.utils.helpers.snomed_to_icd10(cdb, csv_path)

Add map from cui to icd10 for concepts.

Parameters:
  • cdb (CDB) – The concept database.

  • csv_path (str) – The file path.

medcat.utils.helpers.snomed_to_desc(cdb, csv_path)

Add descriptions to the concepts.

Parameters:
  • cdb (CDB) – The concept database.

  • csv_path (str) – The file path.

medcat.utils.helpers.filter_only_icd10(doc, cat)
medcat.utils.helpers.add_names_icd10(csv_path, cat)
medcat.utils.helpers.add_names_icd10cm(cdb, csv_path, cat)
medcat.utils.helpers.remove_icd10_ranges(cdb)
medcat.utils.helpers.dep_check_scispacy()
medcat.utils.helpers.run_cv(cdb_path, data_path, vocab_path, cv=100, nepochs=16, test_size=0.1, lr=1, groups=None, **kwargs)
medcat.utils.helpers.has_new_spacy()

Figures out whether or not a newer version of spacy is installed.

This plays a role in how some parts of the Span need to be interacted with.

As of writing, the new version starts at v3.3.1.

Returns:

bool – Whether new version was detected.

Return type:

bool
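
A version threshold check of this kind can be sketched as follows. The threshold `(3, 3, 1)` comes from the description above; the `parse_version` and `is_new_version` helpers are hypothetical and are not how MedCAT necessarily implements the check.

```python
import re
from typing import Tuple

# From the description above: the "new" behaviour starts at v3.3.1.
NEW_SPACY_THRESHOLD = (3, 3, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def is_new_version(version: str, threshold: Tuple[int, ...] = NEW_SPACY_THRESHOLD) -> bool:
    """Return True if the version is at or above the threshold."""
    return parse_version(version) >= threshold
```

Comparing integer tuples rather than raw strings matters here: as strings, `"3.10.0"` sorts before `"3.3.1"`, while as tuples `(3, 10, 0)` is correctly treated as newer.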

medcat.utils.helpers.has_spacy_model(model_name)

Checks if the spacy model is available.

Parameters:

model_name (str) – The model name.

Returns:

bool – True if the model is available, False otherwise.

Return type:

bool
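
spaCy pipelines installed through pip are importable Python packages, so one way to approximate such a check is to ask the import system whether a package with that name can be found. This is a sketch under that assumption, not the library's implementation; the `package_is_installed` helper is hypothetical, and a model referenced by filesystem path would not be detected this way.

```python
import importlib.util


def package_is_installed(name: str) -> bool:
    """Return True if a top-level package with this name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or broken module specs.
        return False
```

spaCy also ships its own helper for this purpose, `spacy.util.is_package`, which may be preferable when spaCy is already imported.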

medcat.utils.helpers.ensure_spacy_model(model_name)

Ensure the specified spacy model exists.

If the model does not currently exist, it will attempt downloading it.

Parameters:

model_name (str) – The spacy model name.

Return type:

None
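
The check-then-download behaviour described above can be sketched with the check and the download step passed in as callables. The `ensure_available` helper and its parameters are hypothetical, chosen so the logic can be exercised without network access; this is not MedCAT's actual code.

```python
from typing import Callable


def ensure_available(
    name: str,
    is_available: Callable[[str], bool],
    download: Callable[[str], None],
) -> None:
    """Download the named resource only if it is not already available."""
    if is_available(name):
        return
    download(name)


# Stand-in "installed models" registry used purely for demonstration.
installed = {"model_a"}
download_log = []


def fake_download(name: str) -> None:
    download_log.append(name)
    installed.add(name)


ensure_available("model_a", installed.__contains__, fake_download)  # already present
ensure_available("model_b", installed.__contains__, fake_download)  # triggers download
```

With spaCy installed, the same pattern could plausibly be wired up as `ensure_available(model_name, spacy.util.is_package, spacy.cli.download)`, although whether the library does exactly this is an assumption.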