`medcat.pipe`

Module Contents

Classes

Pipe

A wrapper around the standard spacy pipeline.

Attributes

`logger`
`DEFAULT_SPACY_MODEL`

medcat.pipe.logger

medcat.pipe.DEFAULT_SPACY_MODEL = 'en_core_web_md'

class medcat.pipe.Pipe(tokenizer, config)

Bases: object

A wrapper around the standard spacy pipeline.

Parameters:

tokenizer (spacy.tokenizer.Tokenizer) – Used to split text into tokens; can be anything built as a spaCy tokenizer.
config (medcat.config.Config) – Global config for medcat.

Properties:

nlp (spacy.language.<lng>): The base spaCy NLP pipeline.

property spacy_nlp: spacy.language.Language

The spaCy Language object.

Return type: spacy.language.Language

__init__(tokenizer, config)

Parameters:

tokenizer (spacy.tokenizer.Tokenizer) –
config (medcat.config.Config) –

Return type:

None

_init_nlp(config)

Parameters: config (medcat.config.Config) –
Return type: spacy.language.Language

add_tagger(tagger, name=None, additional_fields=[])

Add any kind of a tagger for tokens.

Parameters:

tagger (Callable) – Any object or function that takes a spaCy Doc as input, processes it, and returns the same Doc.
name (Optional[str], optional) – Name for this component in the pipeline. (Default value = None)
additional_fields (List[str], optional) – Fields to be added to the _ properties of a token. (Default value = [])

Return type:

None
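The tagger contract described above (take a doc, modify it, return the same doc) can be sketched as follows. This is a minimal illustration, not MedCAT's own tagger: `tag_all_caps` and the `is_all_caps` field are hypothetical names, and the stand-in objects exist only so the sketch runs without spaCy installed.

```python
from types import SimpleNamespace


def tag_all_caps(doc):
    """Flag every token whose text is fully upper-case, then return the same doc."""
    for token in doc:
        # `is_all_caps` is a hypothetical custom field. On real spaCy tokens it
        # must be registered first, which is what `additional_fields` is for.
        token._.is_all_caps = token.text.isupper()
    return doc


# Stand-in "doc": a list of objects exposing `.text` and `._` like spaCy tokens.
fake_doc = [SimpleNamespace(text=t, _=SimpleNamespace()) for t in ("MRI", "of", "brain")]
tagged = tag_all_caps(fake_doc)
flags = [tok._.is_all_caps for tok in tagged]
print(flags)  # [True, False, False]
```

Going by the signature above, registering such a tagger on a real pipeline would presumably look like `pipe.add_tagger(tagger=tag_all_caps, name="all_caps", additional_fields=["is_all_caps"])`.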

add_token_normalizer(config, name=None, spell_checker=None)

Parameters:

config (medcat.config.Config) –
name (Optional[str]) –
spell_checker (Optional[medcat.utils.normalizers.BasicSpellChecker]) –

Return type:

None

add_ner(ner, name=None)

Add NER from CAT to the pipeline. Also adds the necessary fields to the Doc and Span objects.

Parameters:

ner (NER) – The NER instance
name (Optional[str], optional) – The pipeline name (Default value = None)

Return type:

None

add_linker(linker, name=None)

Add an entity linker to the pipeline. Also adds the necessary fields to the Span object.

Parameters:

linker (Linker) – Any object or function that meets the requirements for a spaCy pipeline component. See https://spacy.io/usage/processing-pipelines#custom-components
name (Optional[str], optional) – The component name (Default value = None)

Return type:

None

add_meta_cat(meta_cat, name=None)

Parameters:

meta_cat (medcat.meta_cat.MetaCAT) –
name (Optional[str]) –

Return type:

None

add_addl_ner(addl_ner, name=None)

Parameters:

addl_ner (medcat.ner.transformers_ner.TransformersNER) –
name (Optional[str]) –

Return type:

None

batch_multi_process(texts, n_process=None, batch_size=None)

Batch process a list of texts in parallel.

Parameters:

texts (Iterable[str]) – The input sequence of texts to process.
n_process (int) – The number of processes running in parallel. Defaults to max(mp.cpu_count() - 1, 1).
batch_size (int) – The number of texts to buffer. Defaults to 1000.
total (int) – The number of texts in total.

Returns:

Generator[Doc] – The output sequence of spaCy documents with the extracted entities.

Return type:

Iterable[spacy.tokens.Doc]
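The documented defaults can be illustrated with a small helper. This is a sketch of the fallback rule stated in the parameter descriptions, not the library's implementation; `resolve_batch_defaults` is a hypothetical name.

```python
import multiprocessing as mp


def resolve_batch_defaults(n_process=None, batch_size=None):
    """Apply the documented fallbacks when an argument is left as None."""
    if n_process is None:
        # Leave one core free, but never drop below a single process.
        n_process = max(mp.cpu_count() - 1, 1)
    if batch_size is None:
        batch_size = 1000
    return n_process, batch_size


n_process, batch_size = resolve_batch_defaults()
print(n_process >= 1, batch_size)  # True 1000
```

Because the method yields documents lazily, a caller would typically consume it in a loop, for example `for doc in pipe.batch_multi_process(texts, n_process=2, batch_size=100): ...`.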

set_error_handler(error_handler)

Parameters: error_handler (Callable) –
Return type: None
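The expected shape of `error_handler` is not documented here. The sketch below assumes it is forwarded to spaCy's `Language.set_error_handler`, whose handlers are called as `(proc_name, proc, docs, e)`. That forwarding is an assumption, and `log_and_continue` and `failures` are hypothetical names.

```python
import logging

logger = logging.getLogger("pipe_errors")
failures = []  # hypothetical in-memory record of component failures


def log_and_continue(proc_name, proc, docs, e):
    """Record the failure and return, so the remaining texts still get processed."""
    failures.append((proc_name, len(docs), type(e).__name__))
    logger.warning("Component %r failed on %d doc(s): %s", proc_name, len(docs), e)


# Simulated call with stand-in arguments, since no pipeline is built here.
log_and_continue("ner", None, ["doc-1", "doc-2"], ValueError("bad input"))
print(failures)  # [('ner', 2, 'ValueError')]
```

Under that assumption, the handler would be installed with `pipe.set_error_handler(log_and_continue)` and removed again with `pipe.reset_error_handler()`.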

reset_error_handler()

Return type: None

force_remove(component_name)

Parameters: component_name (str) –
Return type: None

destroy()

Return type: None

static _ensure_serializable(doc)

Parameters: doc (spacy.tokens.Doc) –
Return type: spacy.tokens.Doc

__call__(text)

Parameters: text (Union[str, Iterable[str]]) –
Return type: Union[spacy.tokens.Doc, List[spacy.tokens.Doc]]
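Since the return type depends on the input (a single document or a list of documents), calling code may want to normalise the result. A minimal sketch, where `as_doc_list` is a hypothetical helper and plain strings stand in for spaCy documents:

```python
def as_doc_list(result):
    """Return `result` as a list, wrapping a single document if needed."""
    # Check for `list` specifically: a spaCy Doc is itself iterable (over its
    # tokens), so a generic iterable test would misclassify a single document.
    if isinstance(result, list):
        return result
    return [result]


single = as_doc_list("doc-a")            # stand-in for one document
many = as_doc_list(["doc-a", "doc-b"])   # stand-in for a list of documents
print(len(single), len(many))  # 1 2
```

With a real pipeline this would be used as `docs = as_doc_list(pipe(text_or_texts))`.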
