`medcat.utils.normalizers`

Module Contents

Classes

`BasicSpellChecker`
`TokenNormalizer`	Normalizes all tokens in a spaCy document.

Attributes

CONTAINS_NUMBER

medcat.utils.normalizers.CONTAINS_NUMBER

class medcat.utils.normalizers.BasicSpellChecker(cdb_vocab, config, data_vocab=None)

Bases: object

__init__(cdb_vocab, config, data_vocab=None)

P(word)

Probability of word.

Parameters: word (str) – The word in question.
Returns: float – The probability.
Return type: float
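This page does not say how the probability is computed. As an illustration only, one common estimate is relative frequency over a table of word counts. The sketch below is standalone and does not use MedCAT; `WORD_COUNTS` and `word_probability` are hypothetical names, and the real `P` may score words differently.

```python
from collections import Counter

# Hypothetical word counts standing in for a real vocabulary (assumption).
WORD_COUNTS = Counter({"kidney": 6, "failure": 3, "renal": 1})


def word_probability(word: str, counts: Counter = WORD_COUNTS) -> float:
    """Relative frequency of `word`; words not in `counts` get 0.0."""
    total = sum(counts.values())
    # Counter returns 0 for missing keys, so unseen words score 0.0.
    return counts[word] / total if total else 0.0
```

With a score like this, a spell checker can rank candidate corrections and prefer the more frequent one.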

__contains__(word)

fix(word)

Most probable spelling correction for word.

Parameters: word (str) – The word.
Returns: Optional[str] – Fixed word, or None if no fixes were applied.
Return type: Optional[str]

candidates(word)

Generate possible spelling corrections for word.

Parameters: word (str) – The word.
Returns: Iterable[str] – The list of candidate words.
Return type: Iterable[str]

known(words)

The subset of words that appear in the dictionary of WORDS.

Parameters: words (Iterable[str]) – The words.
Returns: Set[str] – The set of candidates.
Return type: Set[str]
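Filtering candidate strings against a vocabulary can be sketched as a set comprehension. This is a standalone illustration, not MedCAT's code; `VOCAB` and `known_sketch` are hypothetical names.

```python
from typing import Collection, Iterable, Set

# Hypothetical vocabulary standing in for the checker's dictionary (assumption).
VOCAB = {"kidney", "renal", "failure"}


def known_sketch(words: Iterable[str], vocab: Collection[str] = VOCAB) -> Set[str]:
    """Keep only the words that appear in `vocab`, dropping duplicates."""
    return {w for w in words if w in vocab}
```

For example, `known_sketch(["kidney", "kidny", "renal"])` keeps `kidney` and `renal` and drops the misspelling.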

edits1(word)

All edits that are one edit away from word.

Parameters: word (str) – The word.
Returns: Set[str] – The set of all edits.
Return type: Set[str]
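A one-edit neighbourhood is typically built from deletions, transpositions, replacements, and insertions, as in Peter Norvig's well-known spelling corrector. The sketch below follows that approach. It is standalone and may differ from what `edits1` does internally; the function name and the lowercase ASCII alphabet are assumptions.

```python
import string
from typing import Set


def edits1_sketch(word: str, letters: str = string.ascii_lowercase) -> Set[str]:
    """All strings reachable from `word` by one delete, swap, replace, or insert."""
    # Every way of cutting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)
```

Note that the result also contains `word` itself, because replacing a letter with the same letter counts as a replacement here.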

edits2(word)

All edits that are two edits away from word.

Parameters: word (str) – The word to start from.
Returns: Iterator[str] – All 2-away edits.
Return type: Iterator[str]
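Because the return type is an iterator, two-away edits can be produced lazily by applying a one-edit function to each of its own results. The sketch below shows that composition. To stay short it uses a hypothetical, deletion-only one-edit function (`deletions`); a full checker would use all four edit types, and `edits2_sketch` is not MedCAT's implementation.

```python
from typing import Callable, Iterator, Set


def deletions(word: str) -> Set[str]:
    """Hypothetical one-edit function: single-character deletions only."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def edits2_sketch(
    word: str, one_edit: Callable[[str], Set[str]] = deletions
) -> Iterator[str]:
    """Lazily yield every string two applications of `one_edit` away from `word`."""
    # A generator expression avoids building the full (possibly huge) set up front.
    return (e2 for e1 in one_edit(word) for e2 in one_edit(e1))
```

The generator may yield the same string more than once, so callers that need unique values can wrap it in `set(...)`.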

edits3(word): All edits that are two edits away from word.

class medcat.utils.normalizers.TokenNormalizer(config, spell_checker=None)

Bases: medcat.pipeline.pipe_runner.PipeRunner

Normalizes all tokens in a spaCy document.

Parameters:

config –
spell_checker –

name = 'token_normalizer'

__init__(config, spell_checker=None)

__call__(doc)
