:py:mod:`medcat.utils.ner.deid`
===============================

.. py:module:: medcat.utils.ner.deid

.. autoapi-nested-parse::

   De-identification model.

   This describes a wrapper on the regular CAT model.
   The idea is to simplify the use of a DeId-specific model.

   It tackles two use cases:

   1) Creation of a deid model
   2) Loading and use of a deid model

   I.e. for use case 1, instead of::

      cat = CAT(cdb=ner.cdb, addl_ner=ner)

   you can use::

      deid = DeIdModel.create(ner)

   And for use case 2, instead of::

      cat = CAT.load_model_pack(model_pack_path)
      anon_text = deid_text(cat, text)

   you can use::

      deid = DeIdModel.load_model_pack(model_pack_path)
      anon_text = deid.deid_text(text)

   Or if/when structured output is desired::

      deid = DeIdModel.load_model_pack(model_pack_path)
      anon_doc = deid(text)  # the spacy document

   The wrapper also exposes some CAT parts directly:

   - config
   - cdb


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.ner.deid.DeIdModel


Functions
~~~~~~~~~

.. autoapisummary::

   medcat.utils.ner.deid.match_rules
   medcat.utils.ner.deid.merge_all_preds
   medcat.utils.ner.deid.merge_preds


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.ner.deid.logger


.. py:data:: logger


.. py:class:: DeIdModel(cat)

   Bases: :py:obj:`medcat.utils.ner.model.NerModel`

   The DeID model.

   This wraps a CAT instance and simplifies its use as a
   de-identification model.

   It provides methods for creating one from a TransformersNER
   as well as loading from a model pack (along with some validation).

   It also exposes some useful parts of the CAT it wraps such as
   the config and the concept database.

   .. py:method:: __init__(cat)

   .. py:method:: train(json_path = None, *args, **kwargs)

      Train the underlying transformers NER model.

      All the extra arguments are passed to the TransformersNER train method.

      :param json_path: The JSON file path to read the training data from.
      :type json_path: Union[str, list, None]
      :param train_nr: The number of the NER object in cat._addl_train to train. Defaults to 0.
      :type train_nr: int
      :param \*args: Additional arguments for TransformersNER.train.
      :param \*\*kwargs: Additional keyword arguments for TransformersNER.train.

      :Returns: **Tuple[Any, Any, Any]** -- df, examples, dataset

   .. py:method:: eval(json_path, *args, **kwargs)

      Evaluate the underlying transformers NER model.

      All the extra arguments are passed to the TransformersNER eval method.

      :param json_path: The JSON file path to read the evaluation data from.
      :type json_path: Union[str, list, None]
      :param train_nr: The number of the NER object in cat._addl_train to evaluate. Defaults to 0.
      :type train_nr: int
      :param \*args: Additional arguments for TransformersNER.eval.
      :param \*\*kwargs: Additional keyword arguments for TransformersNER.eval.

      :Returns: **Tuple[Any, Any, Any]** -- df, examples, dataset

   .. py:method:: deid_text(text, redact = False)

      Deidentify text and potentially redact information.

      If redaction is enabled, identifiable entities will be
      replaced with stars (e.g. `*****`). Otherwise, the replacement
      will be the CUI or, in other words, the type of information
      that was hidden (e.g. [PATIENT]).

      :param text: The text to deidentify.
      :type text: str
      :param redact: Whether to redact the information.
      :type redact: bool

      :Returns: **str** -- The deidentified text.

   .. py:method:: deid_multi_texts(texts, redact = False, addl_info = ['cui2icd10', 'cui2ontologies', 'cui2snomed'], n_process = None, batch_size = None)

      Deidentify multiple texts, optionally using multiple processes.

      :param texts: Texts to be annotated.
      :type texts: Union[Iterable[str], Iterable[Tuple]]
      :param redact: Whether to redact the information.
      :type redact: bool
      :param addl_info: Additional info. Defaults to ['cui2icd10', 'cui2ontologies', 'cui2snomed'].
      :type addl_info: List[str], optional
      :param n_process: Number of processes. Defaults to None.
      :type n_process: Optional[int], optional
      :param batch_size: The size of a batch. Defaults to None.
      :type batch_size: Optional[int], optional

      :raises ValueError: In case of unsupported input.

      :Returns: **List[str]** -- List of deidentified documents.

   .. py:method:: load_model_pack(model_pack_path, config = None)
      :classmethod:

      Load DeId model from model pack.

      The method first loads the CAT instance.

      It then makes sure that the model pack corresponds to a
      valid DeId model.

      :param config: Config for DeId model pack (primarily for stride of overlap window)
      :param model_pack_path: The model pack path.
      :type model_pack_path: str

      :raises ValueError: If the model pack does not correspond to a DeId model.

      :Returns: **DeIdModel** -- The resulting DeId model.

   .. py:method:: _is_deid_model(cat)
      :classmethod:

   .. py:method:: _get_reason_not_deid(cat)
      :classmethod:


.. py:function:: match_rules(rules, texts, cui2preferred_name)

   Match a set of rules - pat / cui combos as post processing labels.

   Uses a cat DeID model for pretty name mapping.

   :param rules: List of tuples of pattern and cui
   :type rules: List[Tuple[str, str]]
   :param texts: List of texts to match rules on
   :type texts: List[str]
   :param cui2preferred_name: Dictionary of CUI to preferred name, likely to be cat.cdb.cui2preferred_name.
   :type cui2preferred_name: Dict[str, str]

   .. rubric:: Examples

   >>> cat = CAT.load_model_pack(model_pack_path)
   ...
   >>> rules = [
   ...     ('(123) 456-7890', '134'),
   ...     ('1234567890', '134'),
   ...     ('123.456.7890', '134'),
   ...     ('1234567890', '134'),
   ...     ('1234567890', '134'),
   ... ]
   >>> texts = [
   ...     'My phone number is (123) 456-7890',
   ...     'My phone number is 1234567890',
   ...     'My phone number is 123.456.7890',
   ...     'My phone number is 1234567890',
   ... ]
   >>> matches = match_rules(rules, texts, cat.cdb.cui2preferred_name)

   :Returns: **List[List[Dict]]** -- List of lists of predictions from `match_rules`


.. py:function:: merge_all_preds(model_preds_by_text, rule_matches_per_text, accept_preds = True)

   Convenience method to merge rule-based matches with deID model predictions.
   :param model_preds_by_text: List of predictions from `cat.get_entities()`, converted per text with `[list(m['entities'].values()) for m in model_preds]`
   :type model_preds_by_text: List[Dict]
   :param rule_matches_per_text: List of predictions from the output of running `match_rules`
   :type rule_matches_per_text: List[Dict]
   :param accept_preds: If True, the label predicted by the model (model_preds_by_text) takes precedence over a rule match when the two overlap. Defaults to True (model predictions over rules).
   :type accept_preds: bool

   :Returns: **List[List[Dict]]** -- List of lists of predictions from `merge_all_preds`


.. py:function:: merge_preds(model_preds, rule_matches, accept_preds = True)

   Merge rule-based matches with deID model predictions.

   :param model_preds: Predictions from `cat.get_entities()`
   :type model_preds: List[Dict]
   :param rule_matches: Predictions from the output of running `match_rules` on a text
   :type rule_matches: List[Dict]
   :param accept_preds: If True, the label predicted by the model (model_preds) takes precedence over a rule match when the two overlap. Defaults to True (model predictions over rules).
   :type accept_preds: bool

   .. rubric:: Examples

   >>> # a list of predictions from `cat.get_entities()`
   >>> model_preds = [
   ...     {'cui': '134', 'start': 10, 'end': 20, 'acc': 1.0, 'pretty_name': 'Phone Number'},
   ...     {'cui': '134', 'start': 25, 'end': 35, 'acc': 1.0, 'pretty_name': 'Phone Number'},
   ... ]
   >>> # a list of predictions from `match_rules`
   >>> rule_matches = [
   ...     {'cui': '134', 'start': 10, 'end': 20, 'acc': 1.0, 'pretty_name': 'Phone Number'},
   ...     {'cui': '134', 'start': 25, 'end': 35, 'acc': 1.0, 'pretty_name': 'Phone Number'},
   ... ]
   >>> merged_preds = merge_preds(model_preds, rule_matches)

   :Returns: **List[Dict]** -- List of predictions from `merge_preds`
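
The precedence rule that ``accept_preds`` controls can be illustrated with a small standalone sketch. It does not import MedCAT and should not be read as the library's implementation: the helper names ``spans_overlap`` and ``merge_spans`` are hypothetical, the sketch assumes every prediction is a dict with half-open ``start``/``end`` character offsets, and sorting the result by ``start`` is a choice made here for readability.

```python
from typing import Dict, List


def spans_overlap(a: Dict, b: Dict) -> bool:
    """Return True if two half-open [start, end) spans share a character."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def merge_spans(model_preds: List[Dict], rule_matches: List[Dict],
                accept_preds: bool = True) -> List[Dict]:
    """Hypothetical illustration of overlap-aware merging (not MedCAT code).

    The preferred source is kept whole; entries from the other source are
    kept only if they overlap none of the preferred entries.
    """
    if accept_preds:
        preferred, other = model_preds, rule_matches
    else:
        preferred, other = rule_matches, model_preds
    kept = [o for o in other
            if not any(spans_overlap(o, p) for p in preferred)]
    # Sorting by start offset is an assumption of this sketch.
    return sorted(preferred + kept, key=lambda d: d["start"])


model_preds = [{"cui": "134", "start": 10, "end": 20}]
rule_matches = [
    {"cui": "999", "start": 15, "end": 25},  # overlaps the model prediction
    {"cui": "134", "start": 30, "end": 40},  # no overlap, kept either way
]

merged = merge_spans(model_preds, rule_matches)
print([(m["cui"], m["start"], m["end"]) for m in merged])
# → [('134', 10, 20), ('134', 30, 40)]
```

Calling ``merge_spans(model_preds, rule_matches, accept_preds=False)`` in this sketch keeps both rule matches and drops the overlapping model prediction instead.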