medcat.utils.normalizers
Module Contents
Classes
BasicSpellChecker
TokenNormalizer – Will normalize all tokens in a spaCy document.
Attributes
- medcat.utils.normalizers.CONTAINS_NUMBER
- class medcat.utils.normalizers.BasicSpellChecker(cdb_vocab, config, data_vocab=None)
Bases:
object
- __init__(cdb_vocab, config, data_vocab=None)
- P(word)
Probability of word.
- Parameters:
word (str) – The word in question.
- Returns:
float – The probability.
- Return type:
float
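This page does not say how the probability is computed. As a rough illustration only, one common choice in dictionary-based spell checkers is the relative frequency of the word in a table of counts. The sketch below assumes that definition; the function name word_probability and the counts table are hypothetical and are not part of medcat.

```python
from collections import Counter


def word_probability(word: str, counts: Counter) -> float:
    """Relative frequency of `word`; 0.0 for unseen words or an empty table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Counter returns 0 for missing keys, so unseen words score 0.0.
    return counts[word] / total


# Hypothetical toy counts, for illustration only.
counts = Counter({"patient": 6, "fever": 3, "cough": 1})
print(word_probability("fever", counts))    # 3 of 10 occurrences
print(word_probability("unknown", counts))  # unseen word
```

Any scoring function that ranks more frequent words higher could play the same role when choosing between candidate corrections.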
- __contains__(word)
- fix(word)
Most probable spelling correction for word.
- Parameters:
word (str) – The word.
- Returns:
Optional[str] – Fixed word, or None if no fixes were applied.
- Return type:
Optional[str]
- candidates(word)
Generate possible spelling corrections for word.
- Parameters:
word (str) – The word.
- Returns:
Iterable[str] – The list of candidate words.
- Return type:
Iterable[str]
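How known, candidates and fix might fit together can be sketched as follows. This assumes a fallback order in the style of Peter Norvig's well-known spelling corrector (the word itself if known, otherwise known one-edit variants, otherwise the word unchanged); it is not taken from the medcat source. The free functions, the vocab dictionary and the toy swaps generator are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Set


def known(words: Iterable[str], vocab: Dict[str, int]) -> Set[str]:
    """Keep only the words that appear in the vocabulary."""
    return {w for w in words if w in vocab}


def candidates(word: str, vocab: Dict[str, int],
               variants: Callable[[str], Iterable[str]]) -> Iterable[str]:
    """Prefer the word itself, then known variants, then the word unchanged."""
    return known([word], vocab) or known(variants(word), vocab) or [word]


def fix(word: str, vocab: Dict[str, int],
        variants: Callable[[str], Iterable[str]]) -> Optional[str]:
    """Return the highest-count candidate, or None if nothing changed."""
    best = max(candidates(word, vocab, variants), key=lambda w: vocab.get(w, 0))
    return best if best != word else None


def swaps(word: str) -> Set[str]:
    """Toy variant generator: adjacent-letter transpositions only."""
    return {word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)}


vocab = {"the": 10, "then": 2}    # hypothetical word counts
print(fix("teh", vocab, swaps))   # the
print(fix("the", vocab, swaps))   # None (already known)
print(fix("xyz", vocab, swaps))   # None (no known variant)
```

Returning None when the best candidate equals the input matches the documented Optional[str] return type of fix and lets a caller distinguish "no correction" from "corrected".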
- known(words)
The subset of words that appear in the dictionary of WORDS.
- Parameters:
words (Iterable[str]) – The words.
- Returns:
Set[str] – The set of candidates.
- Return type:
Set[str]
- edits1(word)
All edits that are one edit away from word.
- Parameters:
word (str) – The word.
- Returns:
Set[str] – The set of all edits.
- Return type:
Set[str]
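A one-edit neighbourhood is usually built from four operations: deleting a character, transposing two adjacent characters, replacing a character, and inserting a character. The following is a minimal sketch of that idea, assuming a lowercase ASCII alphabet; the function name one_edit_variants and the letters parameter are hypothetical, and the alphabet used by BasicSpellChecker may differ.

```python
import string
from typing import Set


def one_edit_variants(word: str, letters: str = string.ascii_lowercase) -> Set[str]:
    """All strings reachable from `word` with a single edit."""
    # Every way of cutting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


variants = one_edit_variants("teh")
print("the" in variants)   # True, via a transposition
print("te" in variants)    # True, via a deletion
print("tech" in variants)  # True, via an insertion
```

Because a replacement may substitute a letter with itself, the original word also appears in the result.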
- edits2(word)
All edits that are two edits away from word.
- Parameters:
word (str) – The word to start from.
- Returns:
Iterator[str] – All 2-away edits.
- Return type:
Iterator[str]
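Two-away edits can be produced by applying a one-edit generator twice, and returning an iterator avoids materialising the much larger second-level set up front. The sketch below shows that composition with a deliberately tiny one-edit generator (deletions only) so the output stays readable; both function names are hypothetical and the real edits1 covers more operations.

```python
from typing import Iterator, Set


def deletions(word: str) -> Set[str]:
    """Toy one-edit generator: every string obtained by dropping one character."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def two_away(word: str) -> Iterator[str]:
    """Lazily yield every string reachable by two successive toy edits."""
    # A generator expression: nothing is computed until it is iterated.
    return (e2 for e1 in deletions(word) for e2 in deletions(e1))


print(sorted(set(two_away("abc"))))  # ['a', 'b', 'c']
```

The generator can yield the same string more than once (here each letter is reachable by two different paths), so callers that need unique values should collect the results into a set.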
- edits3(word)
All edits that are three edits away from word.
- class medcat.utils.normalizers.TokenNormalizer(config, spell_checker=None)
Bases:
medcat.pipeline.pipe_runner.PipeRunner
Will normalize all tokens in a spaCy document.
- Parameters:
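config – The configuration object.
spell_checker – Optional spell checker (for example a BasicSpellChecker). Defaults to None.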
- name = 'token_normalizer'
- __init__(config, spell_checker=None)
- __call__(doc)