medcat.utils.ner
Submodules
Package Contents
Functions
- medcat.utils.ner.metrics(p, return_df=False, plus_recall=0, tokenizer=None, dataset=None, merged_negative={0, 1, -100}, padding_label=-100, csize=15, subword_label=1, verbose=False)
Calculate metrics for a model’s predictions, based on the tokenized output of a MedCATTrainer project.
- Parameters:
p – The model’s predictions.
return_df – Whether to return a DataFrame of metrics.
plus_recall – The recall to add to the model’s predictions.
tokenizer – The tokenizer used to tokenize the texts.
dataset – The dataset used to train the model.
merged_negative – The negative labels to merge.
padding_label – The padding label.
csize – The size of the context window.
subword_label – The subword label.
verbose – Whether to print the metrics.
- Returns:
Dict – A dictionary of metrics.
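To make the role of padding_label more concrete, the following is a minimal pure-Python sketch of token-level precision and recall that skips padded positions. It is not MedCAT's implementation of metrics: the token_metrics helper and the example label IDs are hypothetical, and the remaining parameters (plus_recall, merged_negative, csize, subword_label) are not modelled here.

```python
from collections import Counter

PADDING_LABEL = -100  # mirrors the documented default of padding_label


def token_metrics(pred_ids, true_ids, padding_label=PADDING_LABEL):
    """Per-label precision and recall over tokens, skipping padded positions.

    Hypothetical helper for illustration only; not part of medcat.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred_seq, true_seq in zip(pred_ids, true_ids):
        for pred, true in zip(pred_seq, true_seq):
            if true == padding_label:
                continue  # padded positions carry no gold label
            if pred == true:
                tp[true] += 1
            else:
                fp[pred] += 1  # predicted label was wrong
                fn[true] += 1  # gold label was missed

    result = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        result[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
        }
    return result


# Two short sequences of made-up label IDs; -100 marks padding in the gold labels.
scores = token_metrics(
    pred_ids=[[2, 0, 3, 0], [2, 2, 0, 0]],
    true_ids=[[2, 0, 0, -100], [2, 3, 0, -100]],
)
```

In this sketch the padded positions never contribute to any count, so predictions made at those positions cannot inflate or deflate the scores.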
- medcat.utils.ner.make_or_update_cdb(json_path, cdb=None, min_count=0)
Creates a new CDB, or updates an existing one with new concepts if the cdb argument is provided. Concepts that occur fewer than min_count times are ignored.
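The effect of min_count can be illustrated with a small self-contained sketch. It does not call MedCAT and is not how make_or_update_cdb is implemented: the frequent_concepts helper, the (cui, name) pairs, and the concept identifiers are all hypothetical stand-ins for annotations read from a JSON file.

```python
from collections import Counter


def frequent_concepts(annotations, min_count=0):
    """Return {cui: set of names} for concepts seen at least min_count times.

    Hypothetical helper for illustration only; not part of medcat.
    """
    counts = Counter(cui for cui, _ in annotations)
    names = {}
    for cui, name in annotations:
        if counts[cui] >= min_count:  # concepts below the threshold are ignored
            names.setdefault(cui, set()).add(name)
    return names


# Made-up (cui, name) pairs; "C001" appears twice, "C002" once.
annotations = [("C001", "fever"), ("C001", "pyrexia"), ("C002", "cough")]

kept_all = frequent_concepts(annotations)             # default min_count=0 keeps everything
kept = frequent_concepts(annotations, min_count=2)    # drops the concept seen only once
```

With the default min_count of 0 no concept is filtered out; raising the threshold to 2 leaves only the concept that was annotated twice.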
- Parameters:
json_path (str) – The path to the JSON file.
cdb (Optional[CDB]) – The CDB if present. Defaults to None.
min_count (int) – Minimum count to include. Defaults to 0.
- Returns:
CDB – The same or new CDB.
- Return type: