:py:mod:`medcat.pipe`
=====================

.. py:module:: medcat.pipe


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.pipe.Pipe


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.pipe.logger
   medcat.pipe.DEFAULT_SPACY_MODEL


.. py:data:: logger

.. py:data:: DEFAULT_SPACY_MODEL
   :value: 'en_core_web_md'

.. py:class:: Pipe(tokenizer, config)

   Bases: :py:obj:`object`

   A wrapper around the standard spaCy pipeline.

   :param tokenizer: What will be used to split text into tokens; can be anything built as a spaCy tokenizer.
   :type tokenizer: spacy.tokenizer.Tokenizer
   :param config: Global config for medcat.
   :type config: medcat.config.Config

   Properties:
       nlp (spacy.language.):
           The base spaCy NLP pipeline.

   .. py:property:: spacy_nlp
      :type: spacy.language.Language

      The spaCy Language object.

      :Returns: **Language** -- The spaCy model/language.

   .. py:method:: __init__(tokenizer, config)

   .. py:method:: _init_nlp(config)

   .. py:method:: add_tagger(tagger, name = None, additional_fields = [])

      Add any kind of tagger for tokens.

      :param tagger: Any object/function that takes a spaCy doc as input, does something, and returns the same doc.
      :type tagger: Callable
      :param name: Name for this component in the pipeline. (Default value = None)
      :type name: Optional[str], optional
      :param additional_fields: Fields to be added to the ``_`` properties of a token. (Default value = [])
      :type additional_fields: List[str], optional

   .. py:method:: add_token_normalizer(config, name = None, spell_checker = None)

   .. py:method:: add_ner(ner, name = None)

      Add NER from CAT to the pipeline. This also adds the necessary fields to the document and Span objects.

      :param ner: The NER instance.
      :type ner: NER
      :param name: The pipeline name. (Default value = None)
      :type name: Optional[str], optional

   .. py:method:: add_linker(linker, name = None)

      Add an entity linker to the pipeline. This also adds the necessary fields to the Span object.

      :param linker: Any object/function that meets the requirements for spaCy pipeline components.
                     See https://spacy.io/usage/processing-pipelines#custom-components
      :type linker: Linker
      :param name: The component name. (Default value = None)
      :type name: Optional[str], optional

   .. py:method:: add_meta_cat(meta_cat, name = None)

   .. py:method:: add_rel_cat(rel_cat, name = None)

   .. py:method:: add_addl_ner(addl_ner, name = None)

   .. py:method:: batch_multi_process(texts, n_process = None, batch_size = None)

      Batch process a list of texts in parallel.

      :param texts: The input sequence of texts to process.
      :type texts: Iterable[str]
      :param n_process: The number of processes running in parallel. Defaults to ``max(mp.cpu_count() - 1, 1)``.
      :type n_process: int
      :param batch_size: The number of texts to buffer. Defaults to 1000.
      :type batch_size: int

      :Returns: **Generator[Doc]** -- The output sequence of spaCy documents with the extracted entities.

   .. py:method:: set_error_handler(error_handler)

   .. py:method:: reset_error_handler()

   .. py:method:: force_remove(component_name)

   .. py:method:: destroy()

   .. py:method:: _ensure_serializable(doc)
      :staticmethod:

   .. py:method:: __call__(text)
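
The defaults documented for ``batch_multi_process`` can be sketched with the standard library alone. The helper names ``resolve_n_process`` and ``iter_batches`` below are hypothetical and are not part of ``medcat.pipe``; the sketch only illustrates the documented default values (``max(mp.cpu_count() - 1, 1)`` processes, 1000 texts per buffer) and makes no claim about how ``Pipe`` applies them internally.

.. code-block:: python

   import multiprocessing as mp
   from itertools import islice
   from typing import Iterable, Iterator, List, Optional


   def resolve_n_process(n_process: Optional[int] = None) -> int:
       """Return n_process, or the documented default when it is None."""
       if n_process is None:
           # Documented default: leave one core free, but use at least one process.
           return max(mp.cpu_count() - 1, 1)
       return n_process


   def iter_batches(
       texts: Iterable[str], batch_size: Optional[int] = None
   ) -> Iterator[List[str]]:
       """Yield lists of at most batch_size texts (documented default: 1000)."""
       size = 1000 if batch_size is None else batch_size
       if size < 1:
           raise ValueError("batch_size must be a positive integer")
       iterator = iter(texts)
       while True:
           # islice consumes up to `size` items without loading the whole input.
           batch = list(islice(iterator, size))
           if not batch:
               return
           yield batch

Because ``iter_batches`` reads from an iterator, it works with generators as well as lists, which matches the ``Iterable[str]`` type documented for ``texts``.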