medcat.preprocessing.cleaners

Text cleaners of various levels, from removing only garbage to pretty much everything that is not a word.

Module Contents

Functions

prepare_name(raw_name, nlp, names, config)

Generates different forms of a name. Will edit the provided names dictionary and add information generated from the name.

basic_clean(text)

Remove almost everything from text.

clean_text(text)

Remove almost everything from text.

clean_drugs_uk(text[, stopwords, umls])

clean_name(text[, stopwords, umls])

clean_umls(text[, stopwords])

clean_def(text)

clean_snt(text)

clean_snomed_name(text)

Attributes

BR_U4

CB

CB_D

BR

PH_RM

SKIP_CHARS

medcat.preprocessing.cleaners.prepare_name(raw_name, nlp, names, config)

Generates different forms of a name. Will edit the provided names dictionary and add information generated from the name.

Parameters:
  • raw_name (str) – The raw name to prepare.

  • nlp (Language) – spaCy nlp model.

  • names (Dict) – Dictionary of existing names for this concept in this row of a CSV. The new generated name versions and other required information will be added here.

  • config (Config) – Global config for MedCAT.

Returns:

names (Dict) – The new dictionary of prepared names.

Return type:

Dict
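
The in-place update described for prepare_name can be illustrated with a minimal, standard-library-only sketch. It is not MedCAT's implementation: the function name `prepare_name_sketch`, the whitespace tokenizer, the `~` separator, and the `tokens` / `raw_name` / `is_upper` keys are all assumptions made for illustration. The real function takes a spaCy model and a MedCAT config instead.

```python
from typing import Dict


def prepare_name_sketch(
    raw_name: str, names: Dict[str, dict], separator: str = "~"
) -> Dict[str, dict]:
    """Hypothetical stand-in for prepare_name.

    Adds a normalised form of raw_name to `names` in place and
    returns that same dictionary.
    """
    # Assumption: a plain whitespace split stands in for the spaCy tokenizer.
    tokens = raw_name.lower().split()
    if not tokens:
        return names  # nothing to add for empty input

    name = separator.join(tokens)
    # Only add forms that are not already recorded for this concept.
    if name not in names:
        names[name] = {
            "tokens": tokens,
            "raw_name": raw_name,
            "is_upper": raw_name.isupper(),
        }
    return names


names: Dict[str, dict] = {}
result = prepare_name_sketch("Kidney Failure", names)
```

Because the dictionary is modified in place, `result` and `names` refer to the same object here, so a caller can pass one dictionary through several calls to accumulate every name form for a concept.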

medcat.preprocessing.cleaners.basic_clean(text)

Remove almost everything from text.

Parameters:

text (str) – Text to be cleaned.

Returns:

str – The cleaned text.

Return type:

str

medcat.preprocessing.cleaners.clean_text(text)

Remove almost everything from text.

Parameters:

text (str) – Text to be cleaned.

Returns:

str – The cleaned text.

Return type:

str
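
The module description mentions cleaners of various levels, from removing only garbage to removing nearly everything that is not a word. The sketch below shows what two such levels could look like with plain regular expressions. The patterns and the function names `light_clean_sketch` and `aggressive_clean_sketch` are hypothetical; they are not the patterns or functions defined in this module, and the actual behaviour of basic_clean and clean_text may differ.

```python
import re

# Hypothetical patterns, not the ones defined in medcat.preprocessing.cleaners.
MULTI_SPACE = re.compile(r"[ \t]+")
NON_WORD = re.compile(r"[^a-z0-9]+")


def light_clean_sketch(text: str) -> str:
    """Light level: collapse runs of spaces/tabs and trim both ends."""
    return MULTI_SPACE.sub(" ", text).strip()


def aggressive_clean_sketch(text: str) -> str:
    """Aggressive level: lowercase, then keep only ASCII letters and digits,
    separated by single spaces."""
    return NON_WORD.sub(" ", text.lower()).strip()


sample = "  Chronic   kidney-disease (CKD), stage 3! "
light = light_clean_sketch(sample)
aggressive = aggressive_clean_sketch(sample)

print(light)       # Chronic kidney-disease (CKD), stage 3!
print(aggressive)  # chronic kidney disease ckd stage 3
```

The light variant preserves punctuation and casing, which suits text that will still be tokenized downstream; the aggressive variant discards both, which suits name matching where only the word content matters.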

medcat.preprocessing.cleaners.BR_U4
medcat.preprocessing.cleaners.CB
medcat.preprocessing.cleaners.CB_D
medcat.preprocessing.cleaners.BR
medcat.preprocessing.cleaners.PH_RM
medcat.preprocessing.cleaners.SKIP_CHARS
medcat.preprocessing.cleaners.clean_drugs_uk(text, stopwords=None, umls=False)
Parameters:
  • text (str) –

  • stopwords (Optional[List[str]]) –

  • umls (bool) –

Return type:

str

medcat.preprocessing.cleaners.clean_name(text, stopwords=None, umls=False)
Parameters:
  • text (str) –

  • stopwords (Optional[List[str]]) –

  • umls (bool) –

Return type:

str

medcat.preprocessing.cleaners.clean_umls(text, stopwords=None)
Parameters:
  • text (str) –

  • stopwords (Optional[List[str]]) –

Return type:

str

medcat.preprocessing.cleaners.clean_def(text)
Parameters:

text (str) –

Return type:

str

medcat.preprocessing.cleaners.clean_snt(text)
Parameters:

text (str) –

Return type:

str

medcat.preprocessing.cleaners.clean_snomed_name(text)
Parameters:

text (str) –

Return type:

str