medcat.pipe
Module Contents
Classes
- Pipe: A wrapper around the standard spaCy pipeline.
Attributes
- medcat.pipe.logger
- medcat.pipe.DEFAULT_SPACY_MODEL = 'en_core_web_md'
- class medcat.pipe.Pipe(tokenizer, config)
Bases: object
A wrapper around the standard spaCy pipeline.
- Parameters:
tokenizer (spacy.tokenizer.Tokenizer) – The tokenizer used to split text into tokens. Any tokenizer built as a spaCy tokenizer can be used.
config (medcat.config.Config) – Global config for medcat.
- Properties:
- nlp (spacy.language.<lng>):
The base spaCy NLP pipeline.
- property spacy_nlp: spacy.language.Language
The spaCy Language object.
- Return type:
spacy.language.Language
- __init__(tokenizer, config)
- Parameters:
tokenizer (spacy.tokenizer.Tokenizer) –
config (medcat.config.Config) –
- Return type:
None
- _init_nlp(config)
- Parameters:
config (medcat.config.Config) –
- Return type:
spacy.language.Language
- add_tagger(tagger, name=None, additional_fields=[])
Add any kind of tagger for tokens.
- Parameters:
tagger (Callable) – Any object or function that takes a spaCy Doc as input, processes it, and returns the same Doc.
name (Optional[str], optional) – Name for this component in the pipeline. (Default value = None)
additional_fields (List[str], optional) – Fields to add to the _ (custom extension) attributes of a token. (Default value = [])
- Return type:
None
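The tagger contract described above (take a Doc, annotate it, return the same Doc) can be illustrated with a minimal sketch. The tagger name `upper_case_tagger` and the field `is_upper_case` are hypothetical. With real spaCy tokens the field must first be registered as a token extension; presumably that is what `additional_fields` is for, but this is an assumption. To keep the sketch runnable without spaCy, the tokens are `SimpleNamespace` stand-ins exposing only `.text` and `._`.

```python
from types import SimpleNamespace
from typing import Iterable


def upper_case_tagger(doc: Iterable) -> Iterable:
    """Flag all-caps tokens (e.g. abbreviations) and return the same doc object."""
    for token in doc:
        # With spaCy, `token._` is the custom extension namespace; the
        # hypothetical field `is_upper_case` must be registered beforehand.
        token._.is_upper_case = token.text.isupper()
    return doc  # same object, as the tagger contract requires


# Hypothetical stand-ins so the sketch runs without spaCy installed.
def make_token(text: str) -> SimpleNamespace:
    return SimpleNamespace(text=text, _=SimpleNamespace(is_upper_case=False))


doc = [make_token(t) for t in "CT scan shows no PE".split()]
tagged = upper_case_tagger(doc)
print([t._.is_upper_case for t in tagged])  # [True, False, False, False, True]
```

With a Pipe instance, registration would presumably look like `pipe.add_tagger(upper_case_tagger, name="upper_case_tagger", additional_fields=["is_upper_case"])`, using the documented signature.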
- add_token_normalizer(config, name=None, spell_checker=None)
- Parameters:
config (medcat.config.Config) –
name (Optional[str]) –
spell_checker (Optional[medcat.utils.normalizers.BasicSpellChecker]) –
- Return type:
None
- add_ner(ner, name=None)
Add the NER component from CAT to the pipeline. This also adds the necessary fields to the Doc and Span objects.
- Parameters:
ner (NER) – The NER instance
name (Optional[str], optional) – The pipeline name (Default value = None)
- Return type:
None
- add_linker(linker, name=None)
Add an entity linker to the pipeline. This also adds the necessary fields to the Span object.
- Parameters:
linker (Linker) – Any object or function that meets the requirements for a spaCy pipeline component. See https://spacy.io/usage/processing-pipelines#custom-components
name (Optional[str], optional) – The component name (Default value = None)
- Return type:
None
- add_meta_cat(meta_cat, name=None)
- Parameters:
meta_cat (medcat.meta_cat.MetaCAT) –
name (Optional[str]) –
- Return type:
None
- add_addl_ner(addl_ner, name=None)
- Parameters:
addl_ner (medcat.ner.transformers_ner.TransformersNER) –
name (Optional[str]) –
- Return type:
None
- batch_multi_process(texts, n_process=None, batch_size=None)
Batch process a list of texts in parallel.
- Parameters:
texts (Iterable[str]) – The input sequence of texts to process.
n_process (int) – The number of processes running in parallel. Defaults to max(mp.cpu_count() - 1, 1).
batch_size (int) – The number of texts to buffer. Defaults to 1000.
total (int) – The number of texts in total.
- Returns:
Generator[Doc] – The output sequence of spaCy documents with the extracted entities.
- Return type:
Iterable[spacy.tokens.Doc]
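The documented defaults of batch_multi_process can be reproduced with the standard library. This is a sketch of how the defaults and the text buffer behave, not MedCAT's implementation; the helper names `resolve_n_process` and `buffered` are hypothetical. In practice the parallel work is presumably delegated to spaCy's `Language.pipe`, which accepts `n_process` and `batch_size` arguments, but that is an assumption here.

```python
import multiprocessing as mp
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def resolve_n_process(n_process: Optional[int] = None) -> int:
    # Documented default: leave one core free, but never use fewer than one process.
    return n_process if n_process is not None else max(mp.cpu_count() - 1, 1)


def buffered(texts: Iterable[str], batch_size: Optional[int] = None) -> Iterator[List[str]]:
    # Documented default: buffer 1000 texts at a time.
    size = batch_size if batch_size is not None else 1000
    it = iter(texts)
    # Pull up to `size` texts per chunk until the input is exhausted.
    while chunk := list(islice(it, size)):
        yield chunk


batches = list(buffered([f"note {i}" for i in range(5)], batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]
```

Because `buffered` consumes its input lazily, a large corpus never has to be held in memory all at once, which is the point of buffering a fixed number of texts.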
- set_error_handler(error_handler)
- Parameters:
error_handler (Callable) –
- Return type:
None
- reset_error_handler()
- Return type:
None
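The error_handler parameter is only typed as Callable. A reasonable assumption is that set_error_handler forwards to spaCy's `Language.set_error_handler`, whose handlers are called as `handler(proc_name, proc, docs, e)` when a component fails during batch processing. The sketch below shows a hypothetical handler, `log_and_skip`, with that signature that records the failure instead of re-raising, and simulates one failing call without spaCy.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger("medcat_pipe_example")  # hypothetical logger name
failures: List[str] = []


def log_and_skip(proc_name: str, proc: Callable, docs: List[Any], e: Exception) -> None:
    """Record which component failed and swallow the error instead of raising."""
    failures.append(proc_name)
    logger.warning("Component %r failed on %d doc(s): %s", proc_name, len(docs), e)


# Simulate a pipeline component failing on a batch of one document.
def failing_component(doc: Any) -> Any:
    raise ValueError("bad input")


try:
    failing_component("doc")
except Exception as exc:
    log_and_skip("failing_component", failing_component, ["doc"], exc)

print(failures)  # ['failing_component']
```

Under that assumption, usage would be `pipe.set_error_handler(log_and_skip)` before a large batch run and `pipe.reset_error_handler()` afterwards to restore the default behaviour.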
- force_remove(component_name)
- Parameters:
component_name (str) –
- Return type:
None
- destroy()
- Return type:
None
- static _ensure_serializable(doc)
- Parameters:
doc (spacy.tokens.Doc) –
- Return type:
spacy.tokens.Doc
- __call__(text)
- Parameters:
text (Union[str, Iterable[str]]) –
- Return type:
Union[spacy.tokens.Doc, List[spacy.tokens.Doc]]
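The type signature of __call__ implies a dispatch rule: a single string yields a single Doc, while an iterable of strings yields a list of Docs. The sketch below illustrates that rule with a hypothetical `call_pipeline` helper in which `process` stands in for the spaCy pipeline. How MedCAT actually processes the iterable case (for example, whether it batches internally) is not stated here and is left open.

```python
from typing import Callable, Iterable, List, TypeVar, Union

T = TypeVar("T")


def call_pipeline(process: Callable[[str], T], text: Union[str, Iterable[str]]) -> Union[T, List[T]]:
    # A str is itself iterable, so it must be checked before the iterable case.
    if isinstance(text, str):
        return process(text)
    return [process(t) for t in text]


print(call_pipeline(str.upper, "aspirin"))                 # ASPIRIN
print(call_pipeline(str.upper, ["aspirin", "ibuprofen"]))  # ['ASPIRIN', 'IBUPROFEN']
```

The isinstance check comes first because iterating over a plain string would otherwise process it one character at a time.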