`medcat.utils.normalizers`

Module Contents

Classes

`BasicSpellChecker`
`TokenNormalizer`	Normalizes all tokens in a spaCy document.

Attributes

CONTAINS_NUMBER

medcat.utils.normalizers.CONTAINS_NUMBER

class medcat.utils.normalizers.BasicSpellChecker(cdb_vocab, config, data_vocab=None)

Bases: object

__init__(cdb_vocab, config, data_vocab=None)

P(word): Probability of word.

__contains__(word)

fix(word): Most probable spelling correction for word.

candidates(word): Generate possible spelling corrections for word.

known(words): The subset of words that appear in the dictionary of WORDS.

edits1(word): All edits that are one edit away from word.

edits2(word): All edits that are two edits away from word.

edits3(word): All edits that are two edits away from word.

class medcat.utils.normalizers.TokenNormalizer(config, spell_checker=None)

Bases: medcat.pipeline.pipe_runner.PipeRunner

Normalizes all tokens in a spaCy document.

Parameters:

config –
spell_checker –

name = 'token_normalizer'

__init__(config, spell_checker=None)

__call__(doc)