`medcat.stats.stats`

Module Contents

Classes

StatsBuilder

Functions

get_stats(cat, data[, epoch, use_project_filters, ...])

TODO: Refactor and make nice. Print metrics on a dataset (F1, precision, recall).

class medcat.stats.stats.StatsBuilder(filters, addl_info, doc_getter, doc_annotation_getter, cui2group, cui2preferred_name, cui2names, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None)

Parameters:

filters (medcat.config.LinkingFilters) –
addl_info (dict) –
doc_getter (Callable[[Optional[str], bool], Optional[spacy.tokens.Doc]]) –
doc_annotation_getter (Callable[[dict], list]) –
cui2group (Dict[str, str]) –
cui2preferred_name (Dict[str, str]) –
cui2names (Dict[str, Set[str]]) –
use_project_filters (bool) –
use_overlaps (bool) –
use_cui_doc_limit (bool) –
use_groups (bool) –
extra_cui_filter (Optional[Set]) –
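When use_groups is True, concepts that have groups are combined and stats are reported on groups (as described under get_stats). Below is a minimal sketch of that aggregation. It assumes cui2group maps each CUI to a group name. group_counts is a hypothetical helper, not part of MedCAT, and treating a missing or empty group as "report under the CUI itself" is also an assumption.

```python
from collections import Counter
from typing import Dict


def group_counts(cui_counts: Dict[str, int], cui2group: Dict[str, str]) -> Dict[str, int]:
    """Sum per-CUI counts into per-group counts (hypothetical helper).

    Assumption: a CUI that is missing from cui2group, or mapped to an
    empty string, has no group and is reported under its own CUI.
    """
    grouped: Counter = Counter()
    for cui, count in cui_counts.items():
        # Use the group name when one exists, otherwise keep the CUI
        key = cui2group.get(cui) or cui
        grouped[key] += count
    return dict(grouped)


# Two CUIs share group "G1"; C003 has no group
print(group_counts({"C001": 3, "C002": 2, "C003": 1}, {"C001": "G1", "C002": "G1"}))
# {'G1': 5, 'C003': 1}
```

The same aggregation would apply to each per-CUI dict (TP, FP, FN counts) before precision, recall and F1 are derived from the summed counts.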

__init__(filters, addl_info, doc_getter, doc_annotation_getter, cui2group, cui2preferred_name, cui2names, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None)


Return type:

None

_reset_stats()

process_project(project)

Parameters:

project (dict) –

Return type:

None

process_document(project_name, project_id, doc)

Parameters:

project_name (str) –
project_id (str) –
doc (dict) –

Return type:

None
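process_project and process_document suggest a traversal over the exported data: each project, then each of its documents, passed on together with the project name and ID. The sketch below illustrates that traversal. It assumes a MedCATtrainer-style export layout (data["projects"], each project with "name", "id" and a "documents" list). Neither that layout nor the walk_export helper is documented on this page; both are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List


def walk_export(data: Dict[str, Any],
                on_document: Callable[[str, Any, Dict[str, Any]], None]) -> int:
    """Call on_document(project_name, project_id, doc) for every document.

    Assumes a MedCATtrainer-style layout: data["projects"] is a list of
    projects, each with "name", "id" and a "documents" list.
    """
    n_docs = 0
    for project in data.get("projects", []):
        for doc in project.get("documents", []):
            # Same argument order as StatsBuilder.process_document
            on_document(project["name"], project["id"], doc)
            n_docs += 1
    return n_docs


export = {"projects": [{"name": "demo", "id": 1, "documents": [
    {"id": 10, "text": "Patient has diabetes.",
     "annotations": [{"cui": "C001", "start": 12, "end": 20, "value": "diabetes"}]},
]}]}
seen: List[str] = []
n = walk_export(export, lambda name, pid, doc: seen.append(f"{name}/{pid}/{doc['id']}"))
print(n, seen)  # 1 ['demo/1/10']
```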

_process_anns_norm(doc, anns_norm, p_anns_norm, anns_examples)

Parameters:

doc (dict) –
anns_norm (list) –
p_anns_norm (list) –
anns_examples (list) –

Return type:

None

_process_p_anns(project_name, project_id, doc, p_anns)

Parameters:

project_name (str) –
project_id (str) –
doc (dict) –
p_anns (list) –

Return type:

Tuple[list, list]

_count_p_anns_norm(doc, anns_norm, anns_norm_neg, p_anns_norm, p_anns_examples)

Parameters:

doc (dict) –
anns_norm (list) –
anns_norm_neg (list) –
p_anns_norm (list) –
p_anns_examples (list) –

Return type:

None

_create_annoation(project_name, project_id, cui, doc, ann)

Parameters:

project_name (str) –
project_id (str) –
cui (str) –
doc (dict) –
ann (Dict) –

Return type:

Dict

_create_annoation_2(project_name, project_id, cui, doc, ann)

Parameters:

project_name (str) –
project_id (str) –
cui (str) –
doc (dict) –
ann –

Return type:

Dict

_preprocess_annotations(project_name, project_id, doc, anns)

Parameters:

project_name (str) –
project_id (str) –
doc (dict) –
anns (List[Dict]) –

Return type:

Tuple[list, list, list, list]

finalise_report(epoch, do_print=True)

Parameters:

epoch (int) –
do_print (bool) –

unwrap()

Return type:

Tuple

classmethod from_cat(cat, local_filters, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None)

Parameters:

cat (CAT) – The model pack.
local_filters (medcat.config.LinkingFilters) –
use_project_filters (bool) –
use_overlaps (bool) –
use_cui_doc_limit (bool) –
use_groups (bool) –
extra_cui_filter (Optional[Set]) –

Return type:

StatsBuilder

medcat.stats.stats.get_stats(cat, data, epoch=0, use_project_filters=False, use_overlaps=False, use_cui_doc_limit=False, use_groups=False, extra_cui_filter=None, do_print=True)

TODO: Refactor and make nice. Print metrics on a dataset (F1, precision P, recall R); it also prints the concepts with the most false positives (FP), false negatives (FN) and true positives (TP).

Parameters:

cat (CAT) – The model pack.
data (Dict) – The JSON object exported from MedCATtrainer.
epoch (int) – Used during training, so we know which epoch it is.
use_project_filters (bool) – Each project in MedCATtrainer can have filters; this sets whether to respect those filters when calculating metrics.
use_overlaps (bool) – Whether to allow overlapping entities. Nearly always False, as overlapping entities are very difficult to annotate.
use_cui_doc_limit (bool) – If True, the metrics for a CUI are only calculated if that CUI appears in a document, in other words, if the document was annotated for that CUI. Useful in very specific situations where the set of CUIs changed during the annotation process.
use_groups (bool) – If True, concepts that have groups are combined and stats are reported on groups.
extra_cui_filter (Optional[Set]) – This filter is intersected with all other filters; if no other filters are set, only this one is used.
do_print (bool) – Whether to print stats out. Defaults to True.
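The extra_cui_filter and use_cui_doc_limit descriptions above define two filtering rules. The sketch below restates them as code. It assumes that None or an empty set means "no filter set" and that doc_cuis is the set of CUIs a document was annotated for. Both helper names are hypothetical; this illustrates the described semantics, not MedCAT's internal code.

```python
from typing import Optional, Set


def effective_cui_filter(other_filter: Optional[Set[str]],
                         extra_cui_filter: Optional[Set[str]]) -> Optional[Set[str]]:
    """Combine filters the way extra_cui_filter is described above.

    Assumption: None or an empty set means "no filter set".
    """
    if not extra_cui_filter:
        # No extra filter: the other filters apply unchanged
        return other_filter
    if not other_filter:
        # No other filter set: only the extra filter is used
        return set(extra_cui_filter)
    # Both set: keep only CUIs allowed by both (intersection)
    return other_filter & extra_cui_filter


def counts_in_doc(cui: str, doc_cuis: Set[str], use_cui_doc_limit: bool) -> bool:
    """With use_cui_doc_limit, a CUI only counts in documents annotated for it."""
    return (not use_cui_doc_limit) or cui in doc_cuis


print(effective_cui_filter({"C1", "C2"}, {"C2", "C3"}))     # {'C2'}
print(counts_in_doc("C1", {"C2"}, use_cui_doc_limit=True))  # False
```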

Returns:

fps (dict) – False positives for each CUI.
fns (dict) – False negatives for each CUI.
tps (dict) – True positives for each CUI.
cui_prec (dict) – Precision for each CUI.
cui_rec (dict) – Recall for each CUI.
cui_f1 (dict) – F1 for each CUI.
cui_counts (dict) – Number of occurrences for each CUI.
examples (dict) – Examples for each of the fp, fn and tp categories, in the format examples['fp']['cui'][<list_of_examples>].

Return type:

Tuple
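The returned fps, fns and tps counts relate to cui_prec, cui_rec and cui_f1 through the standard precision, recall and F1 definitions. The sketch below shows that relationship. It assumes exact matching on (start, end, CUI) spans. MedCAT's own matching rules (for example under use_overlaps or the project filters) may differ. The helpers count_matches, prf and per_cui_metrics are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, cui)


def count_matches(gold: Set[Span], pred: Set[Span]) -> Tuple[Counter, Counter, Counter]:
    """Per-CUI TP/FP/FN under exact matching on (start, end, cui)."""
    tps = Counter(cui for _, _, cui in gold & pred)  # predicted and annotated
    fps = Counter(cui for _, _, cui in pred - gold)  # predicted, not annotated
    fns = Counter(cui for _, _, cui in gold - pred)  # annotated, not predicted
    return tps, fps, fns


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1, returning 0.0 where a denominator is zero."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


def per_cui_metrics(tps: Counter, fps: Counter, fns: Counter) -> Dict[str, Tuple[float, float, float]]:
    """Apply prf to every CUI seen in any of the three counters."""
    cuis = set(tps) | set(fps) | set(fns)
    # Counter returns 0 for missing keys, so absent CUIs count as zero
    return {cui: prf(tps[cui], fps[cui], fns[cui]) for cui in cuis}


gold = {(0, 8, "C1"), (12, 20, "C2"), (30, 35, "C1")}
pred = {(0, 8, "C1"), (12, 20, "C3"), (30, 35, "C1")}
tps, fps, fns = count_matches(gold, pred)
print(per_cui_metrics(tps, fps, fns)["C1"])  # (1.0, 1.0, 1.0)
print(fps.most_common(1))                    # [('C3', 1)]
```

Using Counter.most_common on the FP, FN or TP counters is one simple way to list the concepts with the most errors or hits, which is what get_stats prints.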
