medcat.utils.normalizers

Module Contents

Classes

BasicSpellChecker

TokenNormalizer

Normalizes all tokens in a spaCy document.

Attributes

CONTAINS_NUMBER

medcat.utils.normalizers.CONTAINS_NUMBER
class medcat.utils.normalizers.BasicSpellChecker(cdb_vocab, config, data_vocab=None)

Bases: object

__init__(cdb_vocab, config, data_vocab=None)
P(word)

Probability of word.

__contains__(word)
fix(word)

Most probable spelling correction for word.

candidates(word)

Generate possible spelling corrections for word.

known(words)

The subset of words that appear in the vocabulary.
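The following is a minimal sketch of how a word probability, a known-word filter, and a "most probable candidate" choice can fit together. It is not MedCAT's implementation: the toy vocabulary WORD_COUNTS and the helpers probability and pick_best are hypothetical names introduced for illustration, and scoring by relative frequency is an assumption.

```python
from collections import Counter

# Hypothetical toy vocabulary (word -> count); not MedCAT's real data structure.
WORD_COUNTS = Counter({"diabetes": 50, "diabetic": 20, "dialysis": 10})
TOTAL = sum(WORD_COUNTS.values())


def probability(word):
    """Relative frequency of word in the toy vocabulary (0.0 if unseen)."""
    return WORD_COUNTS[word] / TOTAL


def known(words):
    """Subset of words that occur in the toy vocabulary."""
    return {w for w in words if w in WORD_COUNTS}


def pick_best(candidate_words):
    """Most probable known candidate, or None if no candidate is known."""
    hits = known(candidate_words)
    return max(hits, key=probability) if hits else None
```

In this sketch an unknown word simply scores zero and is filtered out by known before any candidate is ranked.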

edits1(word)

All edits that are one edit away from word.

edits2(word)

All edits that are two edits away from word.

edits3(word)

All edits that are three edits away from word.
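As a rough illustration of what "one edit away" and "two edits away" can mean, the sketch below generates edits by deletion, transposition, replacement, and insertion, in the style of Peter Norvig's well-known spelling corrector. It assumes a lowercase ASCII alphabet and is written as standalone functions; the actual BasicSpellChecker methods may differ in alphabet, return type, and details.

```python
import string


def edits1(word, letters=string.ascii_lowercase):
    """All strings one edit away from word (assumed lowercase ASCII alphabet)."""
    # Every way of cutting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings reachable by applying edits1 twice."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}
```

A third level would follow the same pattern by applying edits1 to every result of edits2, at the cost of a much larger candidate set.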

class medcat.utils.normalizers.TokenNormalizer(config, spell_checker=None)

Bases: medcat.pipeline.pipe_runner.PipeRunner

Normalizes all tokens in a spaCy document.

Parameters:
  • config

  • spell_checker

name = 'token_normalizer'
__init__(config, spell_checker=None)
__call__(doc)