medcat.cat
Module Contents
Classes
CAT – The main MedCAT class used to annotate documents; it is built on top of spaCy.
Attributes
- medcat.cat.logger
- medcat.cat.HAS_NEW_SPACY
- class medcat.cat.CAT(cdb, vocab=None, config=None, meta_cats=[], rel_cats=[], addl_ner=[])
Bases:
object
The main MedCAT class used to annotate documents. It is built on top of spaCy and works as a spaCy pipeline: it creates an instance of a spaCy pipeline that can be used as a spaCy nlp model.
- Parameters:
cdb (medcat.cdb.CDB) – The concept database that will be used for NER+L
config (medcat.config.Config) – Global configuration for medcat
vocab (medcat.vocab.Vocab, optional) – Vocabulary used for vector embeddings and spelling. Default: None
meta_cats (list of medcat.meta_cat.MetaCAT, optional) – A list of models that will be applied sequentially on each detected annotation.
rel_cats (list of medcat.rel_cat.RelCAT, optional) – List of models applied sequentially on all detected annotations.
addl_ner (Union[medcat.ner.transformers_ner.TransformersNER, List[medcat.ner.transformers_ner.TransformersNER]]) –
- Attributes (limited):
- cdb (medcat.cdb.CDB):
Concept database used with this CAT instance, please do not assign this value directly.
- config (medcat.config.Config):
The global configuration for medcat. Usually cdb.config will be used for this field. WILL BE REMOVED - TEMPORARY PLACEHOLDER
- vocab (medcat.vocab.Vocab):
The vocabulary object used with this instance, please do not assign this value directly.
Examples
>>> cat = CAT(cdb, vocab)
>>> spacy_doc = cat("Put some text here")
>>> print(spacy_doc.ents)  # Detected entities
- DEFAULT_MODEL_PACK_NAME = 'medcat_model_pack'
- __init__(cdb, vocab=None, config=None, meta_cats=[], rel_cats=[], addl_ner=[])
- Parameters:
cdb (medcat.cdb.CDB) –
vocab (Union[medcat.vocab.Vocab, None]) –
config (Optional[medcat.config.Config]) –
meta_cats (List[medcat.meta_cat.MetaCAT]) –
rel_cats (List[medcat.rel_cat.RelCAT]) –
addl_ner (Union[medcat.ner.transformers_ner.TransformersNER, List[medcat.ner.transformers_ner.TransformersNER]]) –
- Return type:
None
- _create_pipeline(config)
- Parameters:
config (medcat.config.Config) –
- get_spacy_nlp()
Returns the spacy pipeline with MedCAT
- Returns:
Language – The spacy Language being used.
- Return type:
spacy.language.Language
- get_hash(force_recalc=False)
Will not be a deep hash but will try to catch all the changing parts during training.
Able to force recalculation of hash. This is relevant for CDB the hash for which is otherwise only recalculated if it has changed.
- Parameters:
force_recalc (bool) – Whether to force recalculation. Defaults to False.
- Returns:
str – The resulting hash
- Return type:
str
- get_model_card(as_dict=False)
A minimal model card for MedCAT model packs.
- Parameters:
as_dict (bool) – Whether to return the model card as a dictionary instead of a str (Default value False).
- Returns:
str – The string representation of the JSON object.
OR
dict – The dict JSON object.
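Because the return value is either a JSON string or a dict depending on as_dict, calling code may want to normalise it. The following is a minimal sketch that relies only on the description above; the helper name normalise_model_card is hypothetical (not part of MedCAT) and the sample card contents are invented for illustration.

```python
import json
from typing import Union


def normalise_model_card(card: Union[str, dict]) -> dict:
    """Return the model card as a dict, whichever form was received.

    Hypothetical helper, not part of MedCAT.
    """
    if isinstance(card, dict):
        return card
    # A str card is documented as the string representation of the JSON object.
    return json.loads(card)


# Stand-in values (invented). With a real model these would come from
# cat.get_model_card() and cat.get_model_card(as_dict=True).
card_as_str = '{"example_field": "example value"}'
card_as_dict = {"example_field": "example value"}
```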
- _versioning(force_rehash=False)
- Parameters:
force_rehash (bool) –
- create_model_pack(save_dir_path, model_pack_name=DEFAULT_MODEL_PACK_NAME, force_rehash=False, cdb_format='dill')
Will create a .zip file containing all the models in the current running instance of MedCAT. This is not the most efficient way, but it is good enough for now.
- Parameters:
save_dir_path (str) – An id will be appended to this name
model_pack_name (str) – The model pack name. Defaults to DEFAULT_MODEL_PACK_NAME.
force_rehash (bool) – Force recalculation of hash. Defaults to False.
cdb_format (str) – The format of the saved CDB in the model pack. The available formats are ‘dill’ and ‘json’. Defaults to ‘dill’.
- Returns:
str – Model pack name
- Return type:
str
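Since cdb_format accepts only the two values listed above, a thin wrapper can reject anything else before the pack is written. This is a sketch only: save_model_pack and ALLOWED_CDB_FORMATS are hypothetical names, and cat is assumed to be a medcat.cat.CAT instance.

```python
from typing import Any

ALLOWED_CDB_FORMATS = ("dill", "json")  # the two formats listed in the parameter docs


def save_model_pack(cat: Any, save_dir_path: str, cdb_format: str = "dill") -> str:
    """Validate cdb_format, then delegate to cat.create_model_pack.

    Hypothetical wrapper; `cat` is expected to be a medcat.cat.CAT instance.
    """
    if cdb_format not in ALLOWED_CDB_FORMATS:
        raise ValueError(
            f"cdb_format must be one of {ALLOWED_CDB_FORMATS}, got {cdb_format!r}"
        )
    # create_model_pack is documented to return the model pack name as a str.
    return cat.create_model_pack(save_dir_path, cdb_format=cdb_format)
```

The returned name can then be used to locate the resulting .zip file under save_dir_path.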
- classmethod attempt_unpack(zip_path)
Attempt to unpack the zip to a folder and get the model pack path.
If the folder already exists, no unpacking is done.
- Parameters:
zip_path (str) – The ZIP path
- Returns:
str – The model pack path
- Return type:
str
- classmethod load_model_pack(zip_path, meta_cat_config_dict=None, ner_config_dict=None, load_meta_models=True, load_addl_ner=True, load_rel_models=True)
Load everything within the ‘model pack’, i.e. the CDB, config, vocab and any MetaCAT models (if present)
- Parameters:
zip_path (str) – The path to model pack zip.
meta_cat_config_dict (Optional[Dict]) – A config dict that will overwrite existing configs in meta_cat. e.g. meta_cat_config_dict = {‘general’: {‘device’: ‘cpu’}}. Defaults to None.
ner_config_dict (Optional[Dict]) – A config dict that will overwrite existing configs in transformers ner. e.g. ner_config_dict = {‘general’: {‘chunking_overlap_window’: 6}}. Defaults to None.
load_meta_models (bool) – Whether to load MetaCAT models if present (Default value True).
load_addl_ner (bool) – Whether to load additional NER models if present (Default value True).
load_rel_models (bool) – Whether to load RelCAT models if present (Default value True).
- Returns:
CAT – The resulting CAT object.
- Return type:
CAT
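The two override dictionaries for load_model_pack can be assembled ahead of the call. The sketch below reuses the example keys quoted in the parameter descriptions; build_overrides and load_cat are hypothetical helper names, and MedCAT is imported lazily so that the dictionary logic can be read and exercised without the package installed.

```python
from typing import Any, Dict, Optional


def build_overrides(
    device: str = "cpu", chunking_overlap_window: Optional[int] = None
) -> Dict[str, Optional[dict]]:
    """Build keyword arguments for CAT.load_model_pack (hypothetical helper)."""
    ner_config_dict = None
    if chunking_overlap_window is not None:
        ner_config_dict = {
            "general": {"chunking_overlap_window": chunking_overlap_window}
        }
    return {
        "meta_cat_config_dict": {"general": {"device": device}},
        "ner_config_dict": ner_config_dict,
    }


def load_cat(zip_path: str, **overrides: Any) -> Any:
    """Load a model pack with the given overrides (requires MedCAT)."""
    from medcat.cat import CAT  # lazy import: only needed when actually loading

    return CAT.load_model_pack(zip_path, **overrides)


# Usage, assuming a model pack exists at this (made-up) path:
# cat = load_cat("path/to/model_pack.zip", **build_overrides("cpu", 6))
```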
- __call__(text, do_train=False)
Push the text through the pipeline.
- Parameters:
text (Optional[str]) – The text to be annotated, if the text length is longer than self.config.preprocessing[‘max_document_length’] it will be trimmed to that length.
do_train (bool) – Whether to train while annotating. Leaving this unset has caused many problems, so training is forced to False by default. To run training, it is much better to use self.train(); the flag is kept here for some special cases. Defaults to False.
- Returns:
Optional[Doc] – A single spacy document or multiple spacy documents with the extracted entities
- Return type:
Optional[spacy.tokens.Doc]
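Two behaviours described here are easy to overlook: long texts are cut to a maximum length, and the return value may be None. The sketch below illustrates both under stated assumptions. trim_to_max_length and entity_texts are hypothetical helpers, passing None through unchanged is a choice made for this example, and MedCAT's own trimming may differ in detail.

```python
from typing import Any, List, Optional


def trim_to_max_length(text: Optional[str], max_document_length: int) -> Optional[str]:
    """Cut text to at most max_document_length characters (None passes through)."""
    if text is None:
        return None
    return text[:max_document_length]


def entity_texts(cat: Any, text: Optional[str]) -> List[str]:
    """Annotate text and return the surface form of each detected entity.

    `cat` is expected to be callable like a medcat.cat.CAT instance.
    """
    doc = cat(text)
    if doc is None:  # the documented return type is Optional[Doc]
        return []
    return [ent.text for ent in doc.ents]
```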
- __repr__()
Returns the model card for this CAT instance.
- Returns:
str – the ‘Model Card’ for this CAT instance. This includes NER+L config and any MetaCATs
- Return type:
str
- _print_stats(data, epoch=0, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None, do_print=True)
TODO: Refactor and make nice.
Print metrics on a dataset (F1, P, R); it will also print the concepts that have the most FP, FN, TP.
- Parameters:
data (Dict) – The json object that we get from MedCATtrainer on export.
epoch (int) – Used during training, so we know which epoch it is.
use_project_filters (bool) – Each project in MedCATtrainer can have filters; whether to respect those filters when calculating metrics.
use_overlaps (bool) – Allow overlapping entities, nearly always False as it is very difficult to annotate overlapping entities.
use_cui_doc_limit (bool) – If True the metrics for a CUI will be only calculated if that CUI appears in a document, in other words if the document was annotated for that CUI. Useful in very specific situations when during the annotation process the set of CUIs changed.
use_groups (bool) – If True concepts that have groups will be combined and stats will be reported on groups.
extra_cui_filter (Optional[Set]) – This filter will be intersected with all other filters, or if all others are not set then only this one will be used.
do_print (bool) – Whether to print stats out. Defaults to True.
- Returns:
fps (dict) – False positives for each CUI.
fns (dict) – False negatives for each CUI.
tps (dict) – True positives for each CUI.
cui_prec (dict) – Precision for each CUI.
cui_rec (dict) – Recall for each CUI.
cui_f1 (dict) – F1 for each CUI.
cui_counts (dict) – Number of occurrences for each CUI.
examples (dict) – Examples for each of the fp, fn, tp. Format will be examples[‘fp’][‘cui’][<list_of_examples>].
- Return type:
Tuple
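The per-CUI precision, recall and F1 dictionaries relate to the count dictionaries in the usual way. A minimal sketch, assuming the standard definitions and a value of 0.0 whenever a denominator is zero; per_cui_metrics is a hypothetical helper and MedCAT's own handling of edge cases may differ.

```python
from typing import Dict, Tuple


def per_cui_metrics(
    tps: Dict[str, int], fps: Dict[str, int], fns: Dict[str, int]
) -> Tuple[Dict[str, float], Dict[str, float], Dict[str, float]]:
    """Compute precision, recall and F1 for every CUI seen in any count dict."""
    cui_prec: Dict[str, float] = {}
    cui_rec: Dict[str, float] = {}
    cui_f1: Dict[str, float] = {}
    for cui in set(tps) | set(fps) | set(fns):
        tp, fp, fn = tps.get(cui, 0), fps.get(cui, 0), fns.get(cui, 0)
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        cui_prec[cui], cui_rec[cui], cui_f1[cui] = prec, rec, f1
    return cui_prec, cui_rec, cui_f1
```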
- _init_ckpts(is_resumed, checkpoint)
- train(data_iterator, nepochs=1, fine_tune=True, progress_print=1000, checkpoint=None, is_resumed=False)
Runs training on the data, note that the maximum length of a line or document is 1M characters. Anything longer will be trimmed.
- Parameters:
data_iterator (Iterable) – Simple iterator over sentences/documents, e.g. an open file or an array or anything that we can use in a for loop.
nepochs (int) – Number of epochs for which to run the training.
fine_tune (bool) – If False old training will be removed.
progress_print (int) – Print progress after N lines.
checkpoint (Optional[medcat.utils.checkpoint.CheckpointUT]) – The MedCAT checkpoint object
is_resumed (bool) – If True resume the previous training; If False, start a fresh new training.
- Return type:
None
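Anything usable in a for loop can serve as data_iterator. The sketch below wraps a text file with one document per line in a small re-iterable class, so that the file is reopened on every pass; that matters if the data is traversed more than once (for example once per epoch, which is an assumption about the implementation). LineDocuments is a hypothetical name.

```python
from typing import Iterator


class LineDocuments:
    """Re-iterable view over a text file holding one document per line."""

    def __init__(self, path: str, encoding: str = "utf-8") -> None:
        self.path = path
        self.encoding = encoding

    def __iter__(self) -> Iterator[str]:
        # Reopen the file on every iteration so repeated passes see all lines.
        with open(self.path, encoding=self.encoding) as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield line


# Usage, assuming `cat` is a loaded CAT instance and the path exists:
# cat.train(LineDocuments("notes.txt"), nepochs=2, progress_print=1000)
```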
- add_cui_to_group(cui, group_name)
Adds a CUI to a group, will appear in cdb.addl_info[‘cui2group’]
- Parameters:
cui (str) – The concept to be added.
group_name (str) – The group to which the concept will be added.
- Return type:
None
Examples
>>> cat.add_cui_to_group("S-17", 'pain')
- unlink_concept_name(cui, name, preprocessed_name=False)
Unlink a concept name from the CUI (or all CUIs if full_unlink), removes the link from the Concept Database (CDB). As a consequence medcat will never again link the name to this CUI - meaning the name will not be detected as a concept in the future.
- Parameters:
cui (str) – The CUI from which the name will be removed.
name (str) – The span of text to be removed from the linking dictionary.
preprocessed_name (bool) – Whether the name being used is preprocessed.
- Return type:
None
Examples
>>> # To never again link C0020538 to HTN
>>> cat.unlink_concept_name('C0020538', 'htn', False)
- add_and_train_concept(cui, name, spacy_doc=None, spacy_entity=None, ontologies=set(), name_status='A', type_ids=set(), description='', full_build=True, negative=False, devalue_others=False, do_add_concept=True)
Add a name to an existing concept, or add a new concept, or do not do anything if the name or concept already exists. Perform training if spacy_entity and spacy_doc are set.
- Parameters:
cui (str) – CUI of the concept.
name (str) – Name to be linked to the concept (in the case of MedCATtrainer this is simply the selected value in text, no preprocessing or anything needed).
spacy_doc (spacy.tokens.Doc) – Spacy representation of the document that was manually annotated.
spacy_entity (Optional[Union[List[Token], Span]]) – Given the spacy document, this is the annotated span of text - list of annotated tokens that are marked with this CUI.
ontologies (Set[str]) – ontologies in which the concept exists (e.g. SNOMEDCT, HPO)
name_status (str) – One of P, N, A
type_ids (Set[str]) – Semantic type identifier (have a look at TUIs in UMLS or SNOMED-CT)
description (str) – Description of this concept.
full_build (bool) – If True the dictionary self.addl_info will also be populated. It contains a lot of extra information about concepts, but can be very memory consuming. This is not necessary for normal functioning of MedCAT. Defaults to True.
negative (bool) – Is this a negative or positive example.
devalue_others (bool) – If set, CUIs to which this name is assigned, other than cui, will receive negative training, provided that negative=False.
do_add_concept (bool) – Whether to add concept to CDB.
- Return type:
None
- train_supervised(data_path, reset_cui_count=False, nepochs=1, print_stats=0, use_filters=False, terminate_last=False, use_overlaps=False, use_cui_doc_limit=False, test_size=0, devalue_others=False, use_groups=False, never_terminate=False, train_from_false_positives=False, extra_cui_filter=None, retain_extra_cui_filter=False, checkpoint=None, retain_filters=False, is_resumed=False)
Train supervised by reading data from a json file.
Refer to train_supervised_from_json and/or train_supervised_raw for further details.
- Parameters:
data_path (str) –
reset_cui_count (bool) –
nepochs (int) –
print_stats (int) –
use_filters (bool) –
terminate_last (bool) –
use_overlaps (bool) –
use_cui_doc_limit (bool) –
test_size (int) –
devalue_others (bool) –
use_groups (bool) –
never_terminate (bool) –
train_from_false_positives (bool) –
extra_cui_filter (Optional[Set]) –
retain_extra_cui_filter (bool) –
checkpoint (Optional[medcat.utils.checkpoint.Checkpoint]) –
retain_filters (bool) –
is_resumed (bool) –
- Return type:
Tuple
- train_supervised_from_json(data_path, reset_cui_count=False, nepochs=1, print_stats=0, use_filters=False, terminate_last=False, use_overlaps=False, use_cui_doc_limit=False, test_size=0, devalue_others=False, use_groups=False, never_terminate=False, train_from_false_positives=False, extra_cui_filter=None, retain_extra_cui_filter=False, checkpoint=None, retain_filters=False, is_resumed=False)
Run supervised training on a dataset from MedCATtrainer in JSON format.
Refer to train_supervised_raw for more details.
- Parameters:
data_path (str) –
reset_cui_count (bool) –
nepochs (int) –
print_stats (int) –
use_filters (bool) –
terminate_last (bool) –
use_overlaps (bool) –
use_cui_doc_limit (bool) –
test_size (int) –
devalue_others (bool) –
use_groups (bool) –
never_terminate (bool) –
train_from_false_positives (bool) –
extra_cui_filter (Optional[Set]) –
retain_extra_cui_filter (bool) –
checkpoint (Optional[medcat.utils.checkpoint.Checkpoint]) –
retain_filters (bool) –
is_resumed (bool) –
- Return type:
Tuple
- train_supervised_raw(data, reset_cui_count=False, nepochs=1, print_stats=0, use_filters=False, terminate_last=False, use_overlaps=False, use_cui_doc_limit=False, test_size=0, devalue_others=False, use_groups=False, never_terminate=False, train_from_false_positives=False, extra_cui_filter=None, retain_extra_cui_filter=False, checkpoint=None, retain_filters=False, is_resumed=False)
Train supervised based on the raw data provided.
The raw data is expected in the following format:
{'projects': [  # list of projects
    {  # project 1
        'name': '<some name>',
        'documents': [  # list of documents
            {  # document 1
                'name': '<some name>',
                'text': '<text of the document>',
                'annotations': [  # list of annotations
                    {'start': -1, 'end': 1, 'cui': 'cui', 'value': '<text value>'},  # annotation 1
                    …
                ],
            },
            …
        ],
    },
    …
]}
Please take care that this is more a simulated online training than supervised.
When filtering, the filters within the CAT model are used first, then the ones from MedCATtrainer (MCT) export filters, and finally the extra_cui_filter (if set). That is to say, the expectation is: extra_cui_filter ⊆ MCT filter ⊆ Model/config filter.
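The raw data layout described above can be assembled programmatically. The sketch below uses only the keys named in that layout; make_raw_data is a hypothetical helper, and treating start and end as character offsets with an exclusive end is an assumption made for this example. Real MedCATtrainer exports may carry additional fields.

```python
from typing import Any, Dict, List, Tuple

# (start, end, cui) with character offsets into the document text (assumed)
Span = Tuple[int, int, str]


def make_raw_data(
    project_name: str, documents: List[Tuple[str, str, List[Span]]]
) -> Dict[str, List[Dict[str, Any]]]:
    """Build a single-project raw data dict from (name, text, spans) triples."""
    docs = []
    for doc_name, text, spans in documents:
        annotations = [
            {"start": start, "end": end, "cui": cui, "value": text[start:end]}
            for start, end, cui in spans
        ]
        docs.append({"name": doc_name, "text": text, "annotations": annotations})
    return {"projects": [{"name": project_name, "documents": docs}]}


data = make_raw_data(
    "demo project",
    [("doc 1", "Patient has hypertension.", [(12, 24, "C0020538")])],
)
# With a loaded CAT instance: cat.train_supervised_raw(data, nepochs=1)
```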
- Parameters:
data (Dict[str, List[Dict[str, dict]]]) – The raw data, e.g from MedCATtrainer on export.
reset_cui_count (bool) – Used for training with weight_decay (annealing). Each concept has a count that is there from the beginning of the CDB, that count is used for annealing. Resetting the count will significantly increase the training impact. This will reset the count only for concepts that exist in the training data.
nepochs (int) – Number of epochs for which to run the training.
print_stats (int) – If > 0 it will print stats every print_stats epochs.
use_filters (bool) – Each project in MedCATtrainer can have filters; whether to respect those filters when calculating metrics.
terminate_last (bool) – If true, concept termination will be done after all training.
use_overlaps (bool) – Allow overlapping entities, nearly always False as it is very difficult to annotate overlapping entities.
use_cui_doc_limit (bool) – If True the metrics for a CUI will be only calculated if that CUI appears in a document, in other words if the document was annotated for that CUI. Useful in very specific situations when during the annotation process the set of CUIs changed.
test_size (float) – If > 0 the data set will be split into train and test sets based on this ratio. Should be between 0 and 1. Usually 0.1 is fine.
devalue_others (bool) – Check add_name for more details.
use_groups (bool) – If True concepts that have groups will be combined and stats will be reported on groups.
never_terminate (bool) – If True no termination will be applied
train_from_false_positives (bool) – If True it will use false positive examples detected by medcat and train from them as negative examples.
extra_cui_filter (Optional[Set]) – This filter will be intersected with all other filters, or if all others are not set then only this one will be used.
retain_extra_cui_filter (bool) – Whether to retain the extra filters instead of the MedCATtrainer export filters. This will only have an effect if/when retain_filters is set to True. Defaults to False.
checkpoint (Optional[medcat.utils.checkpoint.CheckpointST]) – The MedCAT CheckpointST object
retain_filters (bool) – If True, retain the filters in the MedCATtrainer export within this CAT instance. In other words, the filters defined in the input file will henceforth be saved within config.linking.filters. This only makes sense if there is only one project in the input data. If that is not the case, a ValueError is raised. The merging is done in the first epoch.
is_resumed (bool) – If True resume the previous training; If False, start a fresh new training.
- Raises:
ValueError – If attempting to retain filters while training over multiple projects.
- Returns:
Tuple – Consisting of the following parts:
fp (dict) – False positives for each CUI.
fn (dict) – False negatives for each CUI.
tp (dict) – True positives for each CUI.
p (dict) – Precision for each CUI.
r (dict) – Recall for each CUI.
f1 (dict) – F1 for each CUI.
cui_counts (dict) – Number of occurrences for each CUI.
examples (dict) – FP/FN examples of sentences for each CUI.
- Return type:
Tuple
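The filter behaviour described for train_supervised_raw (model filter, then MedCATtrainer export filter, then extra_cui_filter, each intersected with the others when set) can be pictured as a set intersection. This is an illustrative sketch, not MedCAT's implementation: effective_cui_filter is a hypothetical helper, and representing an unset filter as None is an assumption.

```python
from typing import Optional, Set


def effective_cui_filter(*filters: Optional[Set[str]]) -> Optional[Set[str]]:
    """Intersect every filter that is set; None stands for 'not set'.

    Returns None when no filter is set at all.
    """
    result: Optional[Set[str]] = None
    for cui_filter in filters:
        if cui_filter is None:
            continue
        result = set(cui_filter) if result is None else result & set(cui_filter)
    return result


model_filter = {"C1", "C2", "C3"}
mct_filter = {"C1", "C2"}
extra_cui_filter = {"C1"}
allowed = effective_cui_filter(model_filter, mct_filter, extra_cui_filter)
```

When only extra_cui_filter is set, the result is that filter alone, matching the parameter description.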
- get_entities(text, only_cui=False, addl_info=['cui2icd10', 'cui2ontologies', 'cui2snomed'])
- Parameters:
text (str) –
only_cui (bool) –
addl_info (List[str]) –
- Return type:
Dict
- get_entities_multi_texts(texts, only_cui=False, addl_info=['cui2icd10', 'cui2ontologies', 'cui2snomed'], n_process=None, batch_size=None)
Get entities
- Parameters:
texts (Union[Iterable[str], Iterable[Tuple]]) – Text to be annotated
only_cui (bool) – Whether to only return CUIs. Defaults to False.
addl_info (List[str]) – Additional info. Defaults to [‘cui2icd10’, ‘cui2ontologies’, ‘cui2snomed’].
n_process (Optional[int]) – Number of processes. Defaults to None.
batch_size (Optional[int]) – The size of a batch. Defaults to None.
- Raises:
ValueError – If there’s a known issue with multiprocessing.
RuntimeError – If there’s an unknown issue with multiprocessing.
- Returns:
List[Dict] – List of entity documents.
- Return type:
List[Dict]
- get_json(text, only_cui=False, addl_info=['cui2icd10', 'cui2ontologies'])
Get output in json format
- Parameters:
text (str) – Text to be annotated
only_cui (bool) – Whether to only get CUIs. Defaults to False.
addl_info (List[str]) – Additional info. Defaults to [‘cui2icd10’, ‘cui2ontologies’].
- Returns:
str – Json with fields {‘entities’: <>, ‘text’: text}.
- Return type:
str
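Since get_json returns a string, it needs decoding before use. The sketch below relies only on the two documented top-level fields; the inner layout of 'entities' is not specified here, so the example merely counts its items. summarise_json_output is a hypothetical helper and the sample string stands in for real output.

```python
import json
from typing import Any, Dict


def summarise_json_output(json_str: str) -> Dict[str, Any]:
    """Decode get_json output and report basic sizes (hypothetical helper)."""
    payload = json.loads(json_str)
    return {
        "n_entities": len(payload["entities"]),  # works for a dict or a list
        "n_chars": len(payload["text"]),
    }


# Stand-in for cat.get_json("Put some text here") with no entities found.
sample = json.dumps({"entities": {}, "text": "Put some text here"})
summary = summarise_json_output(sample)
```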
- static _get_training_start(train_set, latest_trained_step)
- _separate_nn_components()
- _run_nn_components(docs, nn_components, id2text)
This will add meta_anns in-place to the docs dict.
- Parameters:
docs (Dict) –
nn_components (List) –
id2text (Dict) –
- Return type:
None
- _batch_generator(data, batch_size_chars, skip_ids=set())
- Parameters:
data (Iterable) –
batch_size_chars (int) –
skip_ids (Set) –
- _save_docs_to_file(docs, annotated_ids, save_dir_path, annotated_ids_path, part_counter=0)
- Parameters:
docs (Iterable) –
annotated_ids (List[str]) –
save_dir_path (str) –
annotated_ids_path (Optional[str]) –
part_counter (int) –
- Return type:
int
- multiprocessing(data, nproc=2, batch_size_chars=5000 * 1000, only_cui=False, addl_info=['cui2icd10', 'cui2ontologies', 'cui2snomed'], separate_nn_components=True, out_split_size_chars=None, save_dir_path=os.path.abspath(os.getcwd()), min_free_memory=0.1)
- Parameters:
data (Union[List[Tuple], Iterable[Tuple]]) –
nproc (int) –
batch_size_chars (int) –
only_cui (bool) –
addl_info (List[str]) –
separate_nn_components (bool) –
out_split_size_chars (Optional[int]) –
save_dir_path (str) –
- Return type:
Dict
- multiprocessing_batch_char_size(data, nproc=2, batch_size_chars=5000 * 1000, only_cui=False, addl_info=[], separate_nn_components=True, out_split_size_chars=None, save_dir_path=os.path.abspath(os.getcwd()), min_free_memory=0.1, min_free_memory_size=None, enabled_progress_bar=True)
Run multiprocessing for inference. If save_dir_path and out_split_size_chars are used, this will also continue annotating documents if something is already saved in that directory.
This method batches the data based on the number of characters as specified by user.
PS: This method is unlikely to work on a Windows machine.
- Parameters:
data (Union[List[Tuple], Iterable[Tuple]]) – Iterator or array with format: [(id, text), (id, text), …]
nproc (int) – Number of processors. Defaults to 2.
batch_size_chars (int) – Size of a batch in number of characters, this should be around: NPROC * average_document_length * 200. Defaults to 5000000 (5000 * 1000).
only_cui (bool) – Whether to only return the CUIs rather than the full annotations. Defaults to False.
addl_info (List[str]) – The additional information. Defaults to [].
separate_nn_components (bool) – If set the medcat pipe will be broken up into NN and not-NN components and they will be run sequentially. This is useful as the NN components have batching and like to process many docs at once, while the rest of the pipeline runs the documents one by one. Defaults to True.
out_split_size_chars (Optional[int]) – If set once more than out_split_size_chars are annotated they will be saved to a file (save_dir_path) and the memory cleared. Recommended value is 20*batch_size_chars.
save_dir_path (str) – Where to save the annotated documents if splitting. Defaults to the current working directory.
min_free_memory (float) – If set, a process will not start unless at least this fraction of RAM is free; the value should be in the range [0, 1]. Helps when annotating very large datasets because spacy is not the best with memory management and multiprocessing. If both min_free_memory and min_free_memory_size are set, a ValueError is raised. Defaults to 0.1.
min_free_memory_size (Optional[str]) – If set, the process will not start unless there’s the specified amount of memory available. For reference, we would recommend at least 5GB of memory for a full SNOMED model. You can use human readable sizes (e.g 2GB, 2000MB and so on). If both min_free_memory and min_free_memory_size are set, a ValueError is raised. Defaults to None.
enabled_progress_bar (bool) – Whether to enable the progress bar. Defaults to True.
- Raises:
Exception – If multiprocessing cannot be done.
ValueError – If both free memory specifiers are provided.
- Returns:
Dict – {id: doc_json, id2: doc_json2, …}, in case out_split_size_chars is used the last batch will be returned while that and all previous batches will be written to disk (save_dir_path).
- Return type:
Dict
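Batching by character count, as this method describes, can be pictured with a short generator over (id, text) pairs. This is a sketch of the idea only: batch_by_chars and suggested_batch_size_chars are hypothetical helpers, and the library's internal batching may cut batches at different points.

```python
from typing import Any, Iterable, Iterator, List, Tuple


def batch_by_chars(
    data: Iterable[Tuple[Any, str]], batch_size_chars: int
) -> Iterator[List[Tuple[Any, str]]]:
    """Group (id, text) pairs; a batch closes once it holds enough characters."""
    batch: List[Tuple[Any, str]] = []
    n_chars = 0
    for doc_id, text in data:
        batch.append((doc_id, text))
        n_chars += len(text)
        if n_chars >= batch_size_chars:
            yield batch
            batch, n_chars = [], 0
    if batch:  # emit the final, possibly smaller batch
        yield batch


def suggested_batch_size_chars(nproc: int, average_document_length: int) -> int:
    """Rule of thumb from the parameter docs: NPROC * average_document_length * 200."""
    return nproc * average_document_length * 200


docs = [("1", "aaaa"), ("2", "bb"), ("3", "cccccc"), ("4", "d")]
batches = list(batch_by_chars(docs, 5))
```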
- _multiprocessing_batch(data, nproc=8, batch_size_chars=1000000, only_cui=False, addl_info=[], nn_components=[], min_free_memory=0.1, min_free_memory_size=None)
Run multiprocessing on one batch.
- Parameters:
data (Union[List[Tuple], Iterable[Tuple]]) – Iterator or array with format: [(id, text), (id, text), …].
nproc (int) – Number of processors. Defaults to 8.
batch_size_chars (int) – Size of a batch in number of characters. Defaults to 1000000.
only_cui (bool) – Whether to get only CUIs. Defaults to False.
addl_info (List[str]) – Additional info. Defaults to [].
nn_components (List) – NN components in case there’s a separation. Defaults to [].
min_free_memory (float) – If set, a process will not start unless at least this fraction of RAM is free; the value should be in the range [0, 1]. Helps when annotating very large datasets because spacy is not the best with memory management and multiprocessing. Defaults to 0.1.
min_free_memory_size (Optional[int]) – The minimum human readable memory size required.
- Returns:
Dict – {id: doc_json, id2: doc_json2, …}
- Return type:
Dict
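min_free_memory_size is described as a human readable size (e.g. 2GB, 2000MB). A parser for such strings might look like the sketch below. It is not MedCAT's implementation: parse_size is a hypothetical helper, and decimal units (1GB = 1000MB) are assumed because the two examples in the documentation are presented as equivalent.

```python
import re

# Decimal multipliers (assumption: 2GB and 2000MB denote the same amount).
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(size: str) -> int:
    """Convert strings such as '2GB' or '2000 MB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", size.upper())
    if match is None:
        raise ValueError(f"Unrecognised size: {size!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def has_enough_memory(available_bytes: int, min_free_memory_size: str) -> bool:
    """True when the available memory meets the requested minimum."""
    return available_bytes >= parse_size(min_free_memory_size)
```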
- multiprocessing_pipe(in_data, nproc=None, batch_size=None, only_cui=False, addl_info=[], return_dict=True, batch_factor=2)
- Parameters:
in_data (Union[List[Tuple], Iterable[Tuple]]) –
nproc (Optional[int]) –
batch_size (Optional[int]) –
only_cui (bool) –
addl_info (List[str]) –
return_dict (bool) –
batch_factor (int) –
- Return type:
Union[List[Tuple], Dict]
- multiprocessing_batch_docs_size(in_data, nproc=None, batch_size=None, only_cui=False, addl_info=['cui2icd10', 'cui2ontologies', 'cui2snomed'], return_dict=True, batch_factor=2)
Run multiprocessing NOT FOR TRAINING.
This method batches the data based on the number of documents as specified by the user.
PS: This method supports Windows.
- Parameters:
in_data (Union[List[Tuple], Iterable[Tuple]]) – List with format: [(id, text), (id, text), …]
nproc (Optional[int]) – The number of processors. Defaults to None.
batch_size (Optional[int]) – The number of texts to buffer. Defaults to None.
only_cui (bool) – Whether to get only CUIs. Defaults to False.
addl_info (List[str]) – Additional info. Defaults to [‘cui2icd10’, ‘cui2ontologies’, ‘cui2snomed’].
return_dict (bool) – Flag for returning either a dict or a list of tuples. Defaults to True.
batch_factor (int) – Batch factor. Defaults to 2.
- Raises:
ValueError – When number of processes is 0.
- Returns:
Union[List[Tuple], Dict] – {id: doc_json, id: doc_json, …} or if return_dict is False, a list of tuples: [(id, doc_json), (id, doc_json), …]
- Return type:
Union[List[Tuple], Dict]
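Because the result shape depends on return_dict, code that consumes it may want to accept either form. A minimal sketch with hypothetical helper names (results_as_dict, results_as_list), using invented sample results:

```python
from typing import Any, Dict, List, Tuple, Union

Results = Union[List[Tuple[Any, Dict]], Dict[Any, Dict]]


def results_as_dict(results: Results) -> Dict[Any, Dict]:
    """Return {id: doc_json, ...} whichever shape was received."""
    if isinstance(results, dict):
        return results
    return dict(results)


def results_as_list(results: Results) -> List[Tuple[Any, Dict]]:
    """Return [(id, doc_json), ...] whichever shape was received."""
    if isinstance(results, dict):
        return list(results.items())
    return list(results)


# Invented sample mimicking the return_dict=False shape.
sample_list = [("doc1", {"entities": {}}), ("doc2", {"entities": {}})]
sample_dict = results_as_dict(sample_list)
```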
- _mp_cons(in_q, out_list, min_free_memory, lock, min_free_memory_size=None, pid=0, only_cui=False, addl_info=[])
- Parameters:
in_q (multiprocess.queues.Queue) –
out_list (List) –
min_free_memory (float) –
lock (multiprocess.synchronize.Lock) –
min_free_memory_size (Optional[int]) –
pid (int) –
only_cui (bool) –
addl_info (List) –
- Return type:
None
- _add_nested_ent(doc, _ents, _ent)
- Parameters:
doc (spacy.tokens.Doc) –
_ents (List[spacy.tokens.Span]) –
_ent (Union[Dict, spacy.tokens.Span]) –
- Return type:
None
- _doc_to_out(doc, only_cui, addl_info, out_with_text=False)
- Parameters:
doc (spacy.tokens.Doc) –
only_cui (bool) –
addl_info (List[str]) –
out_with_text (bool) –
- Return type:
Dict
- _get_trimmed_text(text)
- Parameters:
text (Optional[str]) –
- Return type:
str
- _generate_trimmed_texts(texts)
- Parameters:
texts (Union[Iterable[str], Iterable[Tuple]]) –
- Return type:
Iterable[str]
- _get_trimmed_texts(texts)
- Parameters:
texts (Union[Iterable[str], Iterable[Tuple]]) –
- Return type:
List[str]
- static _pipe_error_handler(proc_name, proc, docs, e)
- Parameters:
proc_name (str) –
proc (medcat.pipe.Pipe) –
docs (List[spacy.tokens.Doc]) –
e (Exception) –
- Return type:
None
- static _get_doc_annotations(doc)
- Parameters:
doc (spacy.tokens.Doc) –
- destroy_pipe()