medcat.stats.stats

Module Contents

Classes

StatsBuilder

Functions

get_stats(cat, data[, epoch, use_project_filters, ...])

Print precision (P), recall (R), and F1 metrics for a dataset.

class medcat.stats.stats.StatsBuilder(filters, addl_info, doc_getter, doc_annotation_getter, cui2group, cui2preferred_name, cui2names, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None)
Parameters:
  • filters (medcat.config.LinkingFilters) –

  • addl_info (dict) –

  • doc_getter (Callable[[Optional[str], bool], Optional[spacy.tokens.Doc]]) –

  • doc_annotation_getter (Callable[[dict], list]) –

  • cui2group (Dict[str, str]) –

  • cui2preferred_name (Dict[str, str]) –

  • cui2names (Dict[str, Set[str]]) –

  • use_project_filters (bool) –

  • use_overlaps (bool) –

  • use_cui_doc_limit (bool) –

  • use_groups (bool) –

  • extra_cui_filter (Optional[Set]) –

__init__(filters, addl_info, doc_getter, doc_annotation_getter, cui2group, cui2preferred_name, cui2names, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None)
Return type:

None

_reset_stats()

process_project(project)
Parameters:

project (dict) –

Return type:

None

process_document(project_name, project_id, doc)
Parameters:
  • project_name (str) –

  • project_id (str) –

  • doc (dict) –

Return type:

None

_process_anns_norm(doc, anns_norm, p_anns_norm, anns_examples)
Parameters:
  • doc (dict) –

  • anns_norm (list) –

  • p_anns_norm (list) –

  • anns_examples (list) –

Return type:

None

_process_p_anns(project_name, project_id, doc, p_anns)
Parameters:
  • project_name (str) –

  • project_id (str) –

  • doc (dict) –

  • p_anns (list) –

Return type:

Tuple[list, list]

_count_p_anns_norm(doc, anns_norm, anns_norm_neg, p_anns_norm, p_anns_examples)
Parameters:
  • doc (dict) –

  • anns_norm (list) –

  • anns_norm_neg (list) –

  • p_anns_norm (list) –

  • p_anns_examples (list) –

Return type:

None

_create_annoation(project_name, project_id, cui, doc, ann)
Parameters:
  • project_name (str) –

  • project_id (str) –

  • cui (str) –

  • doc (dict) –

  • ann (Dict) –

Return type:

Dict

_create_annoation_2(project_name, project_id, cui, doc, ann)
Parameters:
  • project_name (str) –

  • project_id (str) –

  • cui (str) –

  • doc (dict) –

Return type:

Dict

_preprocess_annotations(project_name, project_id, doc, anns)
Parameters:
  • project_name (str) –

  • project_id (str) –

  • doc (dict) –

  • anns (List[Dict]) –

Return type:

Tuple[list, list, list, list]

finalise_report(epoch, do_print=True)
Parameters:
  • epoch (int) –

  • do_print (bool) –

unwrap()
Return type:

Tuple

classmethod from_cat(cat, local_filters, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None)
Parameters:
  • local_filters (medcat.config.LinkingFilters) –

  • use_project_filters (bool) –

  • use_overlaps (bool) –

  • use_cui_doc_limit (bool) –

  • use_groups (bool) –

  • extra_cui_filter (Optional[Set]) –

Return type:

StatsBuilder
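
The public methods listed above suggest a simple driving sequence: build the object, feed it each project, finalise the report, then unwrap the results. The sketch below illustrates that sequence under stated assumptions rather than documenting the library's actual flow: `run_builder` is a hypothetical helper, the `"projects"` key on the export dict is assumed from the MedCATtrainer export format, and `RecordingBuilder` is a stand-in so the example runs without MedCAT installed. With MedCAT available, the object would presumably come from `StatsBuilder.from_cat(cat, local_filters)` instead.

```python
from typing import Any, Dict, Tuple


def run_builder(builder: Any, data: Dict, epoch: int = 0, do_print: bool = True) -> Tuple:
    """Hypothetical helper: drive a StatsBuilder-like object over an export."""
    # Assumption: the export keeps its projects in a list under "projects".
    for project in data.get("projects", []):
        builder.process_project(project)
    # Compute (and optionally print) the report, then hand back the results.
    builder.finalise_report(epoch, do_print=do_print)
    return builder.unwrap()


class RecordingBuilder:
    """Stand-in with the same three public method names, for illustration only."""

    def __init__(self) -> None:
        self.calls = []

    def process_project(self, project: dict) -> None:
        self.calls.append(("process_project", project.get("name")))

    def finalise_report(self, epoch: int, do_print: bool = True) -> None:
        self.calls.append(("finalise_report", epoch, do_print))

    def unwrap(self) -> Tuple:
        # The real unwrap() returns the collected statistics; this one
        # returns the recorded call order so the sequence is visible.
        return tuple(self.calls)


demo_export = {"projects": [{"name": "p1"}, {"name": "p2"}]}
demo_result = run_builder(RecordingBuilder(), demo_export, epoch=1, do_print=False)
```

Passing the builder in as an argument keeps the helper independent of how the object was constructed, so the same loop would work with a real `StatsBuilder`.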

medcat.stats.stats.get_stats(cat, data, epoch=0, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None, do_print=True)

Print metrics for a dataset (precision, recall, and F1), along with the concepts that have the most false positives (FP), false negatives (FN), and true positives (TP).

TODO: Refactor and make nice.

Parameters:
  • cat (CAT) – The model pack.

  • data (Dict) – The JSON object exported from MedCATtrainer.

  • epoch (int) – Used during training to indicate which epoch the stats belong to.

  • use_project_filters (bool) – Each project in MedCATtrainer can have filters. If True, those filters are respected when calculating metrics.

  • use_overlaps (bool) – Allow overlapping entities. Nearly always False, as it is very difficult to annotate overlapping entities.

  • use_cui_doc_limit (bool) – If True, the metrics for a CUI are calculated only on documents in which that CUI appears, in other words only on documents that were annotated for that CUI. Useful in very specific situations where the set of CUIs changed during the annotation process.

  • use_groups (bool) – If True, concepts that have groups will be combined and stats will be reported on groups.

  • extra_cui_filter (Optional[Set]) – This filter is intersected with all other filters; if no other filters are set, only this one is used.

  • do_print (bool) – Whether to print stats out. Defaults to True.
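
The `extra_cui_filter` description can be read as plain set logic. The sketch below illustrates that reading and is not MedCAT's own code: `combine_filters` is a hypothetical helper, and an empty or missing set is assumed to mean "not set".

```python
from typing import Optional, Set


def combine_filters(other: Optional[Set[str]], extra: Optional[Set[str]]) -> Optional[Set[str]]:
    """Hypothetical helper: combine an existing CUI filter with an extra one."""
    if not extra:
        # No extra filter: whatever was already configured stays in force.
        return other or None
    if not other:
        # No other filter is set, so only the extra filter is used.
        return set(extra)
    # Both are set: keep only the CUIs allowed by both.
    return other & extra


both = combine_filters({"C001", "C002"}, {"C002", "C003"})
only_extra = combine_filters(None, {"C003"})
```

Here `both` keeps only the CUI shared by the two sets, while `only_extra` is just the extra filter because nothing else was set.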

Returns:
  • fps (dict) – False positives for each CUI.

  • fns (dict) – False negatives for each CUI.

  • tps (dict) – True positives for each CUI.

  • cui_prec (dict) – Precision for each CUI.

  • cui_rec (dict) – Recall for each CUI.

  • cui_f1 (dict) – F1 for each CUI.

  • cui_counts (dict) – Number of occurrences for each CUI.

  • examples (dict) – Examples for each of the fp, fn, tp. Format will be examples['fp']['cui'][<list_of_examples>].

Return type:

Tuple
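
To make the returned dictionaries concrete, the sketch below derives per-CUI precision, recall, and F1 from count dictionaries shaped like `tps`, `fps`, and `fns`. It assumes the standard definitions (P = TP / (TP + FP), R = TP / (TP + FN), F1 as their harmonic mean). The page above does not spell out MedCAT's exact formulas or its handling of zero denominators, so treat this as an illustration only; `per_cui_metrics` and the CUIs used are hypothetical.

```python
from typing import Dict, Tuple


def per_cui_metrics(
    tps: Dict[str, int], fps: Dict[str, int], fns: Dict[str, int]
) -> Tuple[Dict[str, float], Dict[str, float], Dict[str, float]]:
    """Hypothetical helper: per-CUI precision, recall, and F1 from counts."""
    prec: Dict[str, float] = {}
    rec: Dict[str, float] = {}
    f1: Dict[str, float] = {}
    for cui in set(tps) | set(fps) | set(fns):
        tp, fp, fn = tps.get(cui, 0), fps.get(cui, 0), fns.get(cui, 0)
        # Assumption: a zero denominator yields 0.0 rather than an error.
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        prec[cui] = p
        rec[cui] = r
        f1[cui] = 2 * p * r / (p + r) if p + r else 0.0
    return prec, rec, f1


demo_prec, demo_rec, demo_f1 = per_cui_metrics(
    tps={"C001": 8, "C002": 1},
    fps={"C001": 2},
    fns={"C001": 2, "C002": 3},
)
```

With MedCAT installed, the eight documented return values would be unpacked in the order listed above, for example `fps, fns, tps, cui_prec, cui_rec, cui_f1, cui_counts, examples = get_stats(cat, data)`.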