:py:mod:`medcat.utils.normalizers`
==================================

.. py:module:: medcat.utils.normalizers


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.normalizers.BasicSpellChecker
   medcat.utils.normalizers.TokenNormalizer


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.normalizers.CONTAINS_NUMBER


.. py:data:: CONTAINS_NUMBER


.. py:class:: BasicSpellChecker(cdb_vocab, config, data_vocab=None)

   Bases: :py:obj:`object`

   .. py:method:: __init__(cdb_vocab, config, data_vocab=None)


   .. py:method:: P(word)

      Probability of `word`.

      :param word: The word in question.
      :type word: str

      :Returns: **float** -- The probability.


   .. py:method:: __contains__(word)


   .. py:method:: fix(word)

      Most probable spelling correction for `word`.

      :param word: The word.
      :type word: str

      :Returns: **Optional[str]** -- Fixed word, or None if no fixes were applied.


   .. py:method:: candidates(word)

      Generate possible spelling corrections for `word`.

      :param word: The word.
      :type word: str

      :Returns: **Iterable[str]** -- The candidate words.


   .. py:method:: known(words)

      The subset of `words` that appear in the dictionary of WORDS.

      :param words: The words.
      :type words: Iterable[str]

      :Returns: **Set[str]** -- The set of candidates.


   .. py:method:: edits1(word)

      All edits that are one edit away from `word`.

      :param word: The word.
      :type word: str

      :Returns: **Set[str]** -- The set of all edits.


   .. py:method:: edits2(word)

      All edits that are two edits away from `word`.

      :param word: The word to start from.
      :type word: str

      :Returns: **Iterator[str]** -- All 2-away edits.


   .. py:method:: edits3(word)

      All edits that are three edits away from `word`.


.. py:class:: TokenNormalizer(config, spell_checker=None)

   Bases: :py:obj:`medcat.pipeline.pipe_runner.PipeRunner`

   Will normalize all tokens in a spacy document.

   :param config:
   :param spell_checker:

   .. py:attribute:: name
      :value: 'token_normalizer'


   .. py:method:: __init__(config, spell_checker=None)


   .. py:method:: __call__(doc)
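The ``edits1``, ``known``, ``P`` and ``fix`` methods of ``BasicSpellChecker`` follow the familiar edit-distance pattern for spelling correction. The sketch below is a standalone illustration of that pattern, not MedCAT's implementation: the ``VOCAB`` dictionary and every ``sketch_*`` function are hypothetical names introduced here, and it assumes a plain word-count dictionary in place of the ``cdb_vocab`` and ``data_vocab`` arguments. It also assumes that a word already in the vocabulary needs no fix, which is one reading of "None if no fixes were applied".

```python
import string
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy vocabulary (word -> count); not MedCAT data.
VOCAB: Dict[str, int] = {"kidney": 50, "kidneys": 20, "failure": 30}


def sketch_edits1(word: str) -> Set[str]:
    """All strings one delete, transpose, replace, or insert away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def sketch_known(words: Iterable[str]) -> Set[str]:
    """The subset of `words` present in the toy vocabulary."""
    return {w for w in words if w in VOCAB}


def sketch_probability(word: str) -> float:
    """Relative frequency of `word` in the toy vocabulary (0.0 if unseen)."""
    return VOCAB.get(word, 0) / sum(VOCAB.values())


def sketch_fix(word: str) -> Optional[str]:
    """Most probable one-edit correction, or None if nothing was changed."""
    if word in VOCAB:
        # Assumption: a known word is left alone.
        return None
    candidates = sketch_known(sketch_edits1(word))
    if not candidates:
        return None
    return max(candidates, key=sketch_probability)


if __name__ == "__main__":
    print(sketch_fix("kidny"))   # one insert away from "kidney"
    print(sketch_fix("kidney"))  # already known, so no fix is applied
```

This sketch only searches candidates one edit away. Methods such as ``edits2`` and ``edits3`` widen the search by applying the one-edit step repeatedly, at the cost of a much larger candidate set.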