medcat.utils.ner

Submodules

Package Contents

Functions

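metrics(p[, return_df, plus_recall, tokenizer, dataset, ...])

Calculate metrics for a model’s predictions, based on the tokenized output of a MedCATTrainer project.

make_or_update_cdb(json_path[, cdb, min_count])

Creates a new CDB or updates an existing one with new concepts if the cdb argument is provided.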

medcat.utils.ner.metrics(p, return_df=False, plus_recall=0, tokenizer=None, dataset=None, merged_negative={0, 1, -100}, padding_label=-100, csize=15, subword_label=1, verbose=False)

Calculate metrics for a model’s predictions, based on the tokenized output of a MedCATTrainer project.

Parameters:
  • p – The model’s predictions.

  • return_df – Whether to return a DataFrame of metrics.

  • plus_recall – The recall to add to the model’s predictions.

  • tokenizer – The tokenizer used to tokenize the texts.

  • dataset – The dataset used to train the model.

  • merged_negative – The negative labels to merge.

  • padding_label – The padding label.

  • csize – The size of the context window.

  • subword_label – The subword label.

  • verbose – Whether to print the metrics.

Returns:

Dict – A dictionary of metrics.
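The sketch below illustrates one plausible reading of the merged_negative and padding_label parameters: positions carrying the padding label are skipped, and every label in merged_negative is treated as a single negative class when counting true and false positives. The helper token_metrics is hypothetical and written only for illustration; it is not the library's implementation, and the real function may compute its metrics differently.

```python
from typing import Dict, Iterable, Sequence, Set


def token_metrics(
    preds: Sequence[int],
    labels: Sequence[int],
    merged_negative: Iterable[int] = (0, 1, -100),
    padding_label: int = -100,
) -> Dict[str, float]:
    """Hypothetical helper: micro precision/recall/F1 over non-padding tokens.

    Assumption: labels listed in `merged_negative` all count as "no entity".
    """
    negative: Set[int] = set(merged_negative)
    tp = fp = fn = 0
    for pred, gold in zip(preds, labels):
        if gold == padding_label:
            continue  # padding positions are not scored
        pred_pos = pred not in negative
        gold_pos = gold not in negative
        if pred_pos and gold_pos and pred == gold:
            tp += 1
        else:
            if pred_pos:
                fp += 1  # predicted an entity label that is wrong here
            if gold_pos:
                fn += 1  # missed (or mislabelled) a gold entity token
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up token labels: -100 is padding, 0 and 1 are negative labels.
gold = [-100, 0, 2, 2, 1, 3, -100]
pred = [0, 0, 2, 0, 1, 2, 2]
print(token_metrics(pred, gold))
```

Merging the negative labels keeps subword and "outside" tokens from counting as correct entity predictions, so the scores reflect entity labels only.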

medcat.utils.ner.make_or_update_cdb(json_path, cdb=None, min_count=0)

Creates a new CDB or updates an existing one with new concepts if the cdb argument is provided. All concepts that are less frequent than min_count will be ignored.

Parameters:
  • json_path (str) – Path to the JSON file to read concepts from.

  • cdb (Optional[CDB]) – The CDB if present. Defaults to None.

  • min_count (int) – Minimum count to include. Defaults to 0.

Returns:

CDB – The CDB that was passed in, updated with the new concepts, or a newly created CDB if none was given.

Return type:

medcat.cdb.CDB
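The min_count rule described above can be sketched in plain Python. The helper filter_concepts_by_count and the concept identifiers are hypothetical, and the sketch does not reproduce how the library reads the JSON file or builds the CDB; it only shows the filtering step, assuming a concept is kept when it occurs at least min_count times.

```python
from collections import Counter
from typing import Dict, Iterable


def filter_concepts_by_count(
    concept_ids: Iterable[str], min_count: int = 0
) -> Dict[str, int]:
    """Hypothetical helper: keep concepts seen at least `min_count` times."""
    counts = Counter(concept_ids)
    # Concepts less frequent than min_count are ignored.
    return {cui: n for cui, n in counts.items() if n >= min_count}


# Made-up concept identifiers, one entry per annotation.
annotations = ["C001", "C002", "C001", "C003", "C001", "C002"]
kept = filter_concepts_by_count(annotations, min_count=2)
print(kept)  # C003 occurs only once, so it is dropped
```

With the default min_count=0, every concept passes the filter, which matches the documented default of including all concepts.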