medcat.pipe

Module Contents

Classes

Pipe

A wrapper around the standard spaCy pipeline.

Attributes

logger

DEFAULT_SPACY_MODEL

medcat.pipe.logger
medcat.pipe.DEFAULT_SPACY_MODEL = 'en_core_web_md'
class medcat.pipe.Pipe(tokenizer, config)

Bases: object

A wrapper around the standard spaCy pipeline.

Parameters:
  • tokenizer (spacy.tokenizer.Tokenizer) – Used to split text into tokens; can be any object built as a spaCy tokenizer.

  • config (medcat.config.Config) – Global config for medcat.

Properties:
  • nlp (spacy.language.<lng>) – The base spaCy NLP pipeline.

property spacy_nlp: spacy.language.Language

The spaCy Language object.

Returns:

Language – The spaCy model/language.

Return type:

spacy.language.Language

__init__(tokenizer, config)
Parameters:
  • tokenizer (spacy.tokenizer.Tokenizer) –

  • config (medcat.config.Config) –

Return type:

None

_init_nlp(config)
Parameters:

config (medcat.config.Config) –

Return type:

spacy.language.Language

add_tagger(tagger, name=None, additional_fields=[])

Add any kind of tagger for tokens.

Parameters:
  • tagger (Callable) – Any callable that takes a spaCy Doc as input, processes it, and returns the same Doc.

  • name (Optional[str], optional) – Name for this component in the pipeline. (Default value = None)

  • additional_fields (List[str], optional) – Fields to be added to the _ (underscore) extension attributes of a token. (Default value = [])

Return type:

None
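The tagger contract described above (take a doc, annotate it, return the same doc) can be sketched without spaCy. This is only an illustration: FakeToken, FakeDoc, and tag_is_upper are hypothetical stand-ins, not MedCAT or spaCy names, and a real tagger would receive a spacy.tokens.Doc instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy Token (not a real spaCy class)."""
    text: str
    # Plays the role of the `_` extension attributes on a real token.
    extra: Dict[str, object] = field(default_factory=dict)


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a spaCy Doc."""
    tokens: List[FakeToken]


def tag_is_upper(doc: FakeDoc) -> FakeDoc:
    """Annotate every token in place and return the same doc object."""
    for token in doc.tokens:
        token.extra["is_upper"] = token.text.isupper()
    return doc


doc = FakeDoc([FakeToken("MRI"), FakeToken("scan")])
result = tag_is_upper(doc)
```

With a real pipeline, a field name such as "is_upper" would presumably be the kind of entry passed in additional_fields, though this page does not spell that out.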

add_token_normalizer(config, name=None, spell_checker=None)
Parameters:
  • config –

  • name – (Default value = None)

  • spell_checker – (Default value = None)

Return type:

None

add_ner(ner, name=None)

Add the NER component from CAT to the pipeline. Also adds the necessary fields to the Doc and Span objects.

Parameters:
  • ner (NER) – The NER instance

  • name (Optional[str], optional) – The pipeline name (Default value = None)

Return type:

None

add_linker(linker, name=None)

Add an entity linker to the pipeline. Also adds the necessary fields to the Span object.

Parameters:
  • linker –

  • name – (Default value = None)

Return type:

None

add_meta_cat(meta_cat, name=None)
Parameters:
  • meta_cat –

  • name – (Default value = None)

Return type:

None

add_rel_cat(rel_cat, name=None)
Parameters:
  • rel_cat –

  • name – (Default value = None)

Return type:

None

add_addl_ner(addl_ner, name=None)
Parameters:
  • addl_ner –

  • name – (Default value = None)

Return type:

None

batch_multi_process(texts, n_process=None, batch_size=None)

Batch process a list of texts in parallel.

Parameters:
  • texts (Iterable[str]) – The input sequence of texts to process.

  • n_process (int) – The number of processes running in parallel. Defaults to max(mp.cpu_count() - 1, 1).

  • batch_size (int) – The number of texts to buffer. Defaults to 1000.

Returns:

Generator[Doc] – The output sequence of spaCy documents with the extracted entities.

Return type:

Iterable[spacy.tokens.Doc]
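The documented defaults for n_process and batch_size can be illustrated with a small standalone sketch. The helper names resolve_defaults and batched are hypothetical and are not part of MedCAT; the sketch only mirrors the default values stated above and shows one plausible way of buffering texts, not how the method is actually implemented.

```python
import multiprocessing as mp
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple


def resolve_defaults(n_process: Optional[int] = None,
                     batch_size: Optional[int] = None) -> Tuple[int, int]:
    """Apply the documented defaults when an argument is None."""
    if n_process is None:
        # All cores but one, and never fewer than one process.
        n_process = max(mp.cpu_count() - 1, 1)
    if batch_size is None:
        batch_size = 1000
    return n_process, batch_size


def batched(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size texts (hypothetical helper)."""
    it = iter(texts)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


n_process, batch_size = resolve_defaults()
batches = list(batched(["a", "b", "c"], 2))
```

Because texts is typed as Iterable[str], a generator can be passed as well as a list; the sketch consumes it lazily for the same reason.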

set_error_handler(error_handler)
Parameters:

error_handler (Callable) –

Return type:

None

reset_error_handler()
Return type:

None
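This page does not document the expected signature of error_handler. The sketch below assumes the handler is forwarded to spaCy's Language.set_error_handler, which expects a callable taking the component name, the component itself, the list of docs being processed, and the raised exception. The handler name log_and_skip and the skipped list are hypothetical, and the failure is simulated by hand so the block runs without spaCy.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger("pipe_example")
skipped: List[str] = []  # names of components whose errors were swallowed


def log_and_skip(proc_name: str, proc: Callable[..., Any],
                 docs: List[Any], e: Exception) -> None:
    """Log the failing component and carry on instead of re-raising."""
    logger.warning("Component %s failed on %d doc(s): %s",
                   proc_name, len(docs), e)
    skipped.append(proc_name)


# Simulate what a pipeline might do when a component raises.
try:
    raise ValueError("bad input")
except ValueError as err:
    log_and_skip("ner", lambda d: d, ["some text"], err)
```

Under that assumption, a handler like this would be installed with set_error_handler(log_and_skip), and reset_error_handler() would restore the previous behaviour.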

force_remove(component_name)
Parameters:

component_name (str) –

Return type:

None

destroy()
Return type:

None

static _ensure_serializable(doc)
Parameters:

doc (spacy.tokens.Doc) –

Return type:

spacy.tokens.Doc

__call__(text)
Parameters:

text (Union[str, Iterable[str]]) –

Return type:

Union[spacy.tokens.Doc, List[spacy.tokens.Doc]]
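The type signature above suggests that a single string yields a single Doc, while an iterable of strings yields a list of Docs. A minimal sketch of that dispatch follows; call_like_pipe and process_one are hypothetical names, and str.upper stands in for the real spaCy processing so the block runs on its own.

```python
from typing import Callable, Iterable, List, Union


def call_like_pipe(text: Union[str, Iterable[str]],
                   process_one: Callable[[str], object]
                   ) -> Union[object, List[object]]:
    """Return one result for a str, or a list of results for an iterable."""
    # Check for str first: a str is itself iterable, so order matters here.
    if isinstance(text, str):
        return process_one(text)
    return [process_one(t) for t in text]


single = call_like_pipe("mri scan", str.upper)
many = call_like_pipe(["mri", "ct"], str.upper)
```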