medcat.stats.stats
Module Contents
Classes
- StatsBuilder
Functions
- get_stats(cat, data[, epoch, use_project_filters, ...]) – TODO: Refactor and make nice
- class medcat.stats.stats.StatsBuilder(filters, addl_info, doc_getter, doc_annotation_getter, cui2group, cui2preferred_name, cui2names, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None)
- Parameters:
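filters (medcat.config.LinkingFilters) – Linking filters used when computing the statistics.
addl_info (dict) –
doc_getter (Callable[[Optional[str], bool], Optional[spacy.tokens.Doc]]) – Callable that takes a text (or None) and a boolean flag and returns a spaCy Doc (or None).
doc_annotation_getter (Callable[[dict], list]) – Callable that returns the list of annotations for a document dict.
cui2group (Dict[str, str]) – Maps each CUI to its group; relevant when use_groups is True.
cui2preferred_name (Dict[str, str]) – Maps each CUI to its preferred name.
cui2names (Dict[str, Set[str]]) – Maps each CUI to its set of names.
use_project_filters (bool) – Same meaning as in get_stats().
use_overlaps (bool) – Same meaning as in get_stats().
use_cui_doc_limit (bool) – Same meaning as in get_stats().
use_groups (bool) – Same meaning as in get_stats().
extra_cui_filter (Optional[Set]) – Same meaning as in get_stats().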
- __init__(filters, addl_info, doc_getter, doc_annotation_getter, cui2group, cui2preferred_name, cui2names, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None)
- Return type:
None
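The cui2group mapping is what allows grouped reporting: per the get_stats description, when use_groups is True, concepts that have groups are combined and stats are reported on groups. A minimal sketch of that remapping follows, assuming a CUI with a group is counted under its group and a CUI without one is counted under its own identifier. The helpers to_group_key and count_by_group, and the toy CUIs, are hypothetical and not part of medcat.

```python
from collections import Counter
from typing import Dict, Iterable


def to_group_key(cui: str, cui2group: Dict[str, str], use_groups: bool) -> str:
    """Return the key a CUI is counted under (hypothetical helper).

    With use_groups=True, a CUI that has a group is replaced by that group;
    a CUI without a group keeps its own identifier.
    """
    if use_groups:
        return cui2group.get(cui, cui)
    return cui


def count_by_group(cuis: Iterable[str], cui2group: Dict[str, str],
                   use_groups: bool) -> Counter:
    """Count annotated CUIs, merging CUIs that share a group into one key."""
    return Counter(to_group_key(c, cui2group, use_groups) for c in cuis)


if __name__ == "__main__":
    cui2group = {"C001": "diabetes", "C002": "diabetes"}  # toy mapping
    anns = ["C001", "C002", "C003", "C001"]
    print(count_by_group(anns, cui2group, use_groups=True))
    # Counter({'diabetes': 3, 'C003': 1})
```

Falling back to the CUI itself means ungrouped concepts still appear in the report instead of being silently dropped.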
- _reset_stats()
- process_project(project)
- Parameters:
project (dict) –
- Return type:
None
- process_document(project_name, project_id, doc)
- Parameters:
project_name (str) –
project_id (str) –
doc (dict) –
- Return type:
None
- _process_anns_norm(doc, anns_norm, p_anns_norm, anns_examples)
- Parameters:
doc (dict) –
anns_norm (list) –
p_anns_norm (list) –
anns_examples (list) –
- Return type:
None
- _process_p_anns(project_name, project_id, doc, p_anns)
- Parameters:
project_name (str) –
project_id (str) –
doc (dict) –
p_anns (list) –
- Return type:
Tuple[list, list]
- _count_p_anns_norm(doc, anns_norm, anns_norm_neg, p_anns_norm, p_anns_examples)
- Parameters:
doc (dict) –
anns_norm (list) –
anns_norm_neg (list) –
p_anns_norm (list) –
p_anns_examples (list) –
- Return type:
None
- _create_annoation(project_name, project_id, cui, doc, ann)
- Parameters:
project_name (str) –
project_id (str) –
cui (str) –
doc (dict) –
ann (Dict) –
- Return type:
Dict
- _create_annoation_2(project_name, project_id, cui, doc, ann)
- Parameters:
project_name (str) –
project_id (str) –
cui (str) –
doc (dict) –
ann –
- Return type:
Dict
- _preprocess_annotations(project_name, project_id, doc, anns)
- Parameters:
project_name (str) –
project_id (str) –
doc (dict) –
anns (List[Dict]) –
- Return type:
Tuple[list, list, list, list]
- finalise_report(epoch, do_print=True)
- Parameters:
epoch (int) –
do_print (bool) –
- unwrap()
- Return type:
Tuple
- classmethod from_cat(cat, local_filters, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None)
- Parameters:
cat (CAT) – The model pack.
local_filters (medcat.config.LinkingFilters) –
use_project_filters (bool) –
use_overlaps (bool) –
use_cui_doc_limit (bool) –
use_groups (bool) –
extra_cui_filter (Optional[Set]) –
- Return type:
StatsBuilder
- medcat.stats.stats.get_stats(cat, data, epoch=0, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None, do_print=True)
TODO: Refactor and make nice.
Print metrics on a dataset (F1, P, R). It also prints the concepts with the most FPs, FNs, and TPs.
- Parameters:
cat (CAT) – The model pack.
data (Dict) – The JSON object exported from MedCATtrainer.
epoch (int) – The current epoch; used during training to record which epoch the metrics belong to.
use_project_filters (bool) – Whether to respect the filters that each MedCATtrainer project can define when calculating metrics.
use_overlaps (bool) – Whether to allow overlapping entities. This is nearly always False, because overlapping entities are very difficult to annotate.
use_cui_doc_limit (bool) – If True, the metrics for a CUI are calculated only on documents in which that CUI appears, i.e. documents that were annotated for that CUI. Useful in specific situations where the set of CUIs changed during the annotation process.
use_groups (bool) – If True, concepts that have groups are combined and stats are reported per group.
extra_cui_filter (Optional[Set]) – This filter is intersected with all other filters; if no other filters are set, only this one is used.
do_print (bool) – Whether to print stats out. Defaults to True.
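The extra_cui_filter rule (intersect with every other filter, or use it alone when nothing else is set) can be sketched as follows. This assumes each filter is a set of CUIs and that an unset filter is None or empty; combine_cui_filters and is_cui_allowed are hypothetical helpers, not medcat functions.

```python
from typing import Optional, Set


def combine_cui_filters(*filters: Optional[Set[str]]) -> Optional[Set[str]]:
    """Intersect every CUI filter that is set (hypothetical helper).

    None or an empty set means "filter not set" and places no restriction.
    Returns None when no filter is set, meaning every CUI is allowed.
    """
    active = [set(f) for f in filters if f]
    if not active:
        return None
    # With a single active filter this is simply a copy of that filter.
    return set.intersection(*active)


def is_cui_allowed(cui: str, cui_filter: Optional[Set[str]]) -> bool:
    """A CUI passes if no filter is set or if it is in the combined filter."""
    return cui_filter is None or cui in cui_filter


if __name__ == "__main__":
    project_cuis = {"C001", "C002", "C003"}  # toy per-project filter
    extra_cui_filter = {"C002", "C003", "C004"}
    print(sorted(combine_cui_filters(project_cuis, extra_cui_filter)))
    # ['C002', 'C003']
    # Only extra_cui_filter is set, so it is used on its own.
    print(sorted(combine_cui_filters(None, extra_cui_filter)))
    # ['C002', 'C003', 'C004']
```

Returning None rather than an empty set for "no restriction" keeps that case distinct from an empty intersection, which should let no CUI through.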
- Returns:
fps (dict) – False positives for each CUI.
fns (dict) – False negatives for each CUI.
tps (dict) – True positives for each CUI.
cui_prec (dict) – Precision for each CUI.
cui_rec (dict) – Recall for each CUI.
cui_f1 (dict) – F1 for each CUI.
cui_counts (dict) – Number of occurrences of each CUI.
examples (dict) – Examples for each of FP, FN, and TP. Format: examples['fp'][cui] is the list of FP examples for that CUI (likewise for 'fn' and 'tp').
- Return type:
Tuple
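get_stats returns per-CUI precision, recall, and F1 alongside the raw TP/FP/FN counts. Assuming the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean), the sketch below shows how those dicts relate. per_cui_metrics is a hypothetical helper, and reporting 0.0 for a ratio with a zero denominator is an assumption, not documented medcat behaviour.

```python
from typing import Dict, Tuple

Metrics = Dict[str, float]


def per_cui_metrics(tps: Dict[str, int], fps: Dict[str, int],
                    fns: Dict[str, int]) -> Tuple[Metrics, Metrics, Metrics]:
    """Per-CUI precision, recall and F1 from TP/FP/FN counts.

    A CUI absent from one dict is treated as having a count of 0 there.
    Ratios with a zero denominator are reported as 0.0 (an assumption).
    """
    prec: Metrics = {}
    rec: Metrics = {}
    f1: Metrics = {}
    for cui in set(tps) | set(fps) | set(fns):
        tp, fp, fn = tps.get(cui, 0), fps.get(cui, 0), fns.get(cui, 0)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        prec[cui], rec[cui] = p, r
        # F1 is the harmonic mean of precision and recall.
        f1[cui] = 2 * p * r / (p + r) if p + r else 0.0
    return prec, rec, f1


if __name__ == "__main__":
    prec, rec, f1 = per_cui_metrics({"C001": 3}, {"C001": 1}, {"C002": 2})
    print(prec["C001"], rec["C001"], round(f1["C001"], 3))  # 0.75 1.0 0.857
```

Iterating over the union of the three dicts keeps CUIs that only ever appear as false negatives in the output, with all three metrics at 0.0.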