medcat.utils.normalizers
Module Contents
Classes
- BasicSpellChecker
- TokenNormalizer: Normalizes all tokens in a spaCy document.
Attributes
- medcat.utils.normalizers.CONTAINS_NUMBER
- class medcat.utils.normalizers.BasicSpellChecker(cdb_vocab, config, data_vocab=None)
Bases: object
- __init__(cdb_vocab, config, data_vocab=None)
- P(word)
Probability of word.
- __contains__(word)
- fix(word)
Most probable spelling correction for word.
- candidates(word)
Generate possible spelling corrections for word.
- known(words)
The subset of words that appear in the vocabulary.
- edits1(word)
All edits that are one edit away from word.
- edits2(word)
All edits that are two edits away from word.
- edits3(word)
All edits that are three edits away from word.
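The method names above follow the familiar edit-distance approach to spelling correction: generate strings a few edits away from the input, keep those found in a vocabulary, and pick the most probable. The following is a minimal, standalone sketch of that idea, not MedCAT's actual code. The toy `WORD_COUNTS` vocabulary, the lowercase-only `LETTERS` alphabet, and the module-level functions are assumptions made for illustration; the real class takes `cdb_vocab`, `config`, and `data_vocab`, and its behavior may differ.

```python
from collections import Counter

# Hypothetical toy vocabulary (word -> count); stands in for a real vocab.
WORD_COUNTS = Counter({"patient": 50, "patent": 5, "kidney": 30, "failure": 20})
LETTERS = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet for this sketch


def probability(word):
    """Relative frequency of `word` in the toy vocabulary (0 if unseen)."""
    return WORD_COUNTS[word] / sum(WORD_COUNTS.values())


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away: apply edits1 to every edits1 result."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """The subset of `words` present in the toy vocabulary."""
    return {w for w in words if w in WORD_COUNTS}


def candidates(word):
    """Prefer the word itself, then known words at distance 1, then 2."""
    return known([word]) or known(edits1(word)) or known(edits2(word)) or {word}


def fix(word):
    """Most probable candidate; in this sketch an unknown word is returned unchanged."""
    return max(candidates(word), key=probability)
```

With this toy vocabulary, `fix("patiant")` returns `"patient"`, and `fix("patint")` also returns `"patient"`: both `"patient"` and `"patent"` are one edit away, and the more frequent word wins. Checking closer edit distances first keeps the expensive two-edit expansion as a fallback.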
- class medcat.utils.normalizers.TokenNormalizer(config, spell_checker=None)
Bases: medcat.pipeline.pipe_runner.PipeRunner
Normalizes all tokens in a spaCy document.
- Parameters:
config –
spell_checker –
- name = 'token_normalizer'
- __init__(config, spell_checker=None)
- __call__(doc)