medcat.utils.ner.metrics
Module Contents
Functions
| metrics | Calculate metrics for a model's predictions, based on the tokenized output of a MedCATTrainer project. |
| _anno_within_pred_list | Check if a label is within a list of predictions. |
| evaluate_predictions | Evaluate predictions against labels collected and output from a MedCATTrainer project. |
Attributes
- medcat.utils.ner.metrics.logger
- medcat.utils.ner.metrics.metrics(p, return_df=False, plus_recall=0, tokenizer=None, dataset=None, merged_negative={0, 1, -100}, padding_label=-100, csize=15, subword_label=1, verbose=False)
Calculate metrics for a model’s predictions, based on the tokenized output of a MedCATTrainer project.
- Parameters:
p – The model’s predictions.
return_df – Whether to return a DataFrame of metrics.
plus_recall – The recall to add to the model’s predictions.
tokenizer – The tokenizer used to tokenize the texts.
dataset – The dataset used to train the model.
merged_negative – The negative labels to merge.
padding_label – The padding label.
csize – The size of the context window.
subword_label – The subword label.
verbose – Whether to print the metrics.
- Returns:
Dict – A dictionary of metrics.
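The `padding_label` parameter suggests the common convention of excluding padded token positions from scoring. The following is a minimal sketch of that idea only, not the library's implementation: the function name `token_metrics`, its arguments, and the returned keys are all hypothetical, and the real `metrics` function takes a predictions object `p` plus the other parameters listed above.

```python
from typing import Dict, List


def token_metrics(preds: List[int], labels: List[int], positive: int,
                  padding_label: int = -100) -> Dict[str, float]:
    """Token-level precision/recall/F1 for one label id (hypothetical helper).

    Positions whose gold label equals padding_label are ignored entirely.
    """
    # Drop padded positions before counting anything.
    pairs = [(p, l) for p, l in zip(preds, labels) if l != padding_label]
    tp = sum(1 for p, l in pairs if p == positive and l == positive)
    fp = sum(1 for p, l in pairs if p == positive and l != positive)
    fn = sum(1 for p, l in pairs if p != positive and l == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: label id 2 is the concept of interest, -100 marks padding.
example_labels = [-100, 2, 2, 0, 2, -100]
example_preds = [2, 2, 0, 2, 2, 2]
example_result = token_metrics(example_preds, example_labels, positive=2)
print(example_result)
```

Predictions at padded positions (the first and last tokens here) never count as false positives, because they are filtered out before the counts are taken.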
- medcat.utils.ner.metrics._anno_within_pred_list(label, preds)
Check if a label is within a list of predictions.
- Parameters:
label (Dict) – an annotation, likely from a MedCATTrainer project
preds (List[Dict]) – a list of predictions, likely from cat.__call__
- Returns:
bool – True if the label is within the list of predictions, False otherwise
- Return type:
bool
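A span-containment check of this kind can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes each annotation and prediction is a dict with integer `start` and `end` character offsets, and the name `anno_within_pred_list` (without the leading underscore) is a hypothetical stand-in.

```python
from typing import Dict, List


def anno_within_pred_list(label: Dict, preds: List[Dict]) -> bool:
    """Return True if any prediction span fully encloses the label span.

    Assumes 'start' and 'end' keys holding character offsets; this key
    layout is an assumption of the sketch.
    """
    return any(
        p["start"] <= label["start"] and label["end"] <= p["end"]
        for p in preds
    )


# Made-up spans: the prediction is wider than the label, so it encloses it.
sample_label = {"start": 10, "end": 18}
sample_preds = [{"start": 8, "end": 20}]
print(anno_within_pred_list(sample_label, sample_preds))
```

Under this definition a prediction that only partially overlaps the label does not count as a match.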
- medcat.utils.ner.metrics.evaluate_predictions(true_annotations, all_preds, texts, cui2preferred_name)
Evaluate predictions against labels collected and output from a MedCATTrainer project. A prediction counts as correct if it fully encloses the label.
- Parameters:
true_annotations (List[List[Dict]]) – Ground-truth annotations, one list per text
all_preds (List[List[Dict]]) – Model predictions, one list per text
texts (List[str]) – Original list of texts
cui2preferred_name (Dict[str, str]) – Dictionary of CUI to preferred name, likely to be cat.cdb.cui2preferred_name.
- Returns:
Tuple[pd.DataFrame, Dict] – A tuple containing a DataFrame of evaluation metrics and a dictionary of missed annotations per CUI.
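The evaluation described above can be sketched in plain Python. This is a simplified illustration rather than the library's implementation: the name `evaluate_sketch` is hypothetical, the `start`, `end`, and `cui` keys are assumed, requiring the predicted CUI to match the label's CUI is an assumption, and the sketch returns a list of row dicts (which could be passed to `pd.DataFrame`) instead of a DataFrame so that it needs no third-party packages.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def encloses(pred: Dict, label: Dict) -> bool:
    """True if the prediction span fully encloses the label span."""
    return pred["start"] <= label["start"] and label["end"] <= pred["end"]


def evaluate_sketch(
    true_annotations: List[List[Dict]],
    all_preds: List[List[Dict]],
    texts: List[str],
    cui2preferred_name: Dict[str, str],
) -> Tuple[List[Dict], Dict[str, List[str]]]:
    found: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    missed: Dict[str, List[str]] = defaultdict(list)
    for labels, preds, text in zip(true_annotations, all_preds, texts):
        for label in labels:
            cui = label["cui"]
            total[cui] += 1
            # Assumption: a hit needs the same CUI and an enclosing span.
            if any(p["cui"] == cui and encloses(p, label) for p in preds):
                found[cui] += 1
            else:
                # Record the surface text of the missed annotation.
                missed[cui].append(text[label["start"]:label["end"]])
    rows = [
        {
            "cui": cui,
            "name": cui2preferred_name.get(cui, cui),
            "recall": found[cui] / n,
            "n_labels": n,
        }
        for cui, n in total.items()
    ]
    return rows, dict(missed)


# Made-up data; the CUIs and names are for illustration only.
demo_texts = ["Patient has diabetes and asthma."]
demo_true = [[
    {"start": 12, "end": 20, "cui": "C0011849"},
    {"start": 25, "end": 31, "cui": "C0004096"},
]]
demo_preds = [[{"start": 12, "end": 20, "cui": "C0011849"}]]
demo_names = {"C0011849": "Diabetes Mellitus", "C0004096": "Asthma"}
demo_rows, demo_missed = evaluate_sketch(demo_true, demo_preds, demo_texts, demo_names)
print(demo_rows)
print(demo_missed)
```

Keeping the missed surface strings grouped by CUI makes it easy to inspect which mentions of a concept the model fails to find.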