`medcat.config_meta_cat`

Module Contents

Classes

`General`	The General part of the MetaCAT config
`Model`	The model part of the MetaCAT config
`Train`	The train part of the MetaCAT config
`ConfigMetaCAT`	The MetaCAT part of the config

class medcat.config_meta_cat.General

Bases: medcat.config.MixingConfig, medcat.config.BaseModel

The General part of the MetaCAT config

class Config

extra

validate_assignment = True

device: str = 'cpu'

disable_component_lock: bool = False

seed: int = 13

description: str = 'No description': Should provide a basic description of this MetaCAT model

category_name: medcat.config.Optional[str]: The category this MetaCAT model predicts and is trained on

category_value2id: Dict: Map from category values to IDs; if empty, it will be calculated automatically during training
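When category_value2id is left empty, it is filled in during training. The sketch below shows one way such a map could be derived from the training labels. The function name build_category_value2id and the choice to assign IDs in sorted order are assumptions for illustration, not MetaCAT's documented behavior.

```python
from typing import Dict, Iterable, Optional


def build_category_value2id(
    values: Iterable[str], existing: Optional[Dict[str, int]] = None
) -> Dict[str, int]:
    """Hypothetical helper: reuse a non-empty map, else derive one from the labels."""
    if existing:
        # A user-provided (non-empty) map is kept as-is.
        return dict(existing)
    # Assumption: IDs are assigned to the unique values in sorted order.
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


print(build_category_value2id(["Negated", "Affirmed", "Negated"]))
# {'Affirmed': 0, 'Negated': 1}
```

Sorting makes the mapping deterministic regardless of the order in which labels appear in the data.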

vocab_size: medcat.config.Optional[int]: Set automatically if the tokenizer is provided when MetaCAT is initialized

lowercase: bool = True: If True, all input text will be lowercased

cntx_left: int = 15: Number of tokens to take from the left of the concept

cntx_right: int = 10: Number of tokens to take from the right of the concept

replace_center: medcat.config.Optional[Any]: If set, the center (concept) will be replaced with this string
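cntx_left, cntx_right and replace_center together define the window of tokens taken around a concept. Below is a minimal token-level sketch, assuming the concept occupies tokens[start:end] and that replace_center substitutes a single placeholder token for the whole concept. MetaCAT itself works on the output of its configured tokenizer (often subword tokens), and the function name context_window is hypothetical.

```python
from typing import List, Optional


def context_window(
    tokens: List[str],
    start: int,
    end: int,
    cntx_left: int = 15,
    cntx_right: int = 10,
    replace_center: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: concept tokens plus up to cntx_left/cntx_right neighbors."""
    left = tokens[max(0, start - cntx_left):start]
    # Assumption: the whole concept span is swapped for one placeholder token.
    center = [replace_center] if replace_center is not None else tokens[start:end]
    right = tokens[end:end + cntx_right]
    return left + center + right


tokens = "the patient denies chest pain today".split()
print(context_window(tokens, 3, 5, cntx_left=2, cntx_right=1))
# ['patient', 'denies', 'chest', 'pain', 'today']
print(context_window(tokens, 3, 5, cntx_left=2, cntx_right=1, replace_center="[CONCEPT]"))
# ['patient', 'denies', '[CONCEPT]', 'today']
```

Replacing the center hides the concept's own wording, so the model has to rely on the surrounding context.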

batch_size_eval: int = 5000: Number of annotations to be meta-annotated at once in eval

annotate_overlapping: bool = False: If set, meta_anns will be calculated for doc._.ents; otherwise for doc.ents

tokenizer_name: str = 'bbpe': Name of the tokenizer used by MetaCAT

save_and_reuse_tokens: bool = False: This is a dangerous option; if unsure, ALWAYS set it to False. If set, MetaCAT will try to share the pre-calculated context tokens between MetaCAT models when serving. Because this ignores differences in tokenizer and context size, you must be sure that all models for which it is turned on use the same tokenizer and context size during a deployment.

pipe_batch_size_in_chars: int = 20000000: How many characters are piped at once into the meta_cat class
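pipe_batch_size_in_chars caps how much text is piped into MetaCAT at once. A sketch of character-budget batching follows, assuming texts are grouped greedily in their original order and that a single text longer than the budget forms its own batch; batch_by_chars is a hypothetical name, not a MedCAT function.

```python
from typing import Iterable, Iterator, List


def batch_by_chars(texts: Iterable[str], max_chars: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield batches whose total length stays within max_chars."""
    batch: List[str] = []
    size = 0
    for text in texts:
        # Close the current batch if adding this text would exceed the budget.
        if batch and size + len(text) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(text)
        size += len(text)
    if batch:
        yield batch


print(list(batch_by_chars(["aaaa", "bbb", "cc", "d"], 5)))
# [['aaaa'], ['bbb', 'cc'], ['d']]
```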

span_group: medcat.config.Optional[str]: If set, the spaCy span group to which the MetaCAT model will assign annotations. Otherwise, annotations default to doc._.ents or doc.ents, depending on the annotate_overlapping setting
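span_group and annotate_overlapping together decide which spans receive meta-annotations. The precedence described above can be sketched as follows. target_spans is a hypothetical helper, and the SimpleNamespace object is a stand-in for a spaCy Doc that only mimics the doc.spans, doc.ents and doc._.ents attributes.

```python
from types import SimpleNamespace
from typing import Any, Optional


def target_spans(doc: Any, span_group: Optional[str], annotate_overlapping: bool) -> Any:
    """Hypothetical helper mirroring the precedence described for span_group."""
    if span_group is not None:
        return doc.spans[span_group]
    if annotate_overlapping:
        return doc._.ents  # per the description above, may include overlapping entities
    return doc.ents


# Stand-in for a spaCy Doc; only the attributes used above are modelled.
doc = SimpleNamespace(
    ents=["chest pain"],
    spans={"metacat": ["pain"]},
    _=SimpleNamespace(ents=["chest pain", "pain"]),
)
print(target_spans(doc, None, False))       # ['chest pain']
print(target_spans(doc, None, True))        # ['chest pain', 'pain']
print(target_spans(doc, "metacat", False))  # ['pain']
```

Note that a set span_group takes priority, so annotate_overlapping has no effect in that case.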

class medcat.config_meta_cat.Model

Bases: medcat.config.MixingConfig, medcat.config.BaseModel

The model part of the MetaCAT config

class Config

extra

validate_assignment = True

model_name: str = 'lstm': NOTE: When changing the model, make sure to change the tokenizer as well

model_variant: str = 'bert-base-uncased'

model_freeze_layers: bool = True

num_layers: int = 2

input_size: int = 300

hidden_size: int = 300

dropout: float = 0.5

phase_number: int = 0: Indicates whether two-phase learning is being performed:
  1: Phase 1 - train the model on undersampled data
  2: Phase 2 - continue training on the full data
  0: None - two-phase learning is not performed
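A sketch of how phase_number could select the training data, assuming phase 1 undersamples by capping every class at a fixed number of examples. The cap is passed as a hypothetical max_per_class argument because this reference does not say how MetaCAT derives it; the Sample tuple, the helper names and the seed default (borrowed from General.seed) are likewise illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (text, label); a simplified stand-in for a training example


def undersample(data: Sequence[Sample], max_per_class: int, seed: int = 13) -> List[Sample]:
    """Hypothetical helper: keep at most max_per_class randomly chosen samples per label."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in data:
        by_label[sample[1]].append(sample)
    out: List[Sample] = []
    for samples in by_label.values():
        if len(samples) > max_per_class:
            samples = rng.sample(samples, max_per_class)
        out.extend(samples)
    rng.shuffle(out)
    return out


def data_for_phase(data: Sequence[Sample], phase_number: int, max_per_class: int) -> List[Sample]:
    """Hypothetical helper: pick the training data for phase_number 0, 1 or 2."""
    if phase_number == 1:
        return undersample(data, max_per_class)  # phase 1: undersampled data
    if phase_number in (0, 2):
        return list(data)  # phase 2 continues on full data; 0 means no two-phase learning
    raise ValueError(f"unsupported phase_number: {phase_number}")


data = [("a", "X")] * 5 + [("b", "Y")] * 2
print(len(data_for_phase(data, 1, max_per_class=2)))  # 4
print(len(data_for_phase(data, 2, max_per_class=2)))  # 7
```

Training first on balanced data and then on the full set is a common way to keep a majority class from dominating early learning.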

category_undersample: str = ''

model_architecture_config: Dict

num_directions: int = 2: 2 - bidirectional model, 1 - unidirectional

nclasses: int = 2: Number of classes that this model will output

padding_idx: int

emb_grad: bool = True: If True, the embeddings will also be trained

ignore_cpos: bool = False: If True, center positions will be ignored when calculating the representation

class medcat.config_meta_cat.Train

Bases: medcat.config.MixingConfig, medcat.config.BaseModel

The train part of the MetaCAT config

class Config

extra

validate_assignment = True

batch_size: int = 100

nepochs: int = 50

lr: float = 0.001

test_size: float = 0.1

shuffle_data: bool = True: Used only during training; if set, the dataset will be shuffled before the train/test split

class_weights: medcat.config.Optional[Any]

compute_class_weights: bool = False: If True and class weights are not provided, the class weights will be calculated from the data
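One common way to compute class weights from the data is the "balanced" heuristic, weight_c = n_samples / (n_classes * count_c), which is the formula scikit-learn uses for class_weight='balanced'. Whether MetaCAT uses exactly this rule is an assumption here; balanced_class_weights is a hypothetical stdlib sketch.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Hypothetical helper: rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


print(balanced_class_weights([0, 0, 0, 1]))
```

With three examples of class 0 and one of class 1, class 1 receives three times the weight of class 0 (2.0 versus about 0.667).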

score_average: str = 'weighted': What to use for averaging F1/P/R across labels
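With score_average = 'weighted', per-label scores are averaged using each label's support (its number of true instances) as the weight, so frequent labels count more than rare ones; this matches scikit-learn's definition of average='weighted'. A stdlib sketch for F1, with hypothetical helper names:

```python
from collections import Counter
from typing import Hashable, Sequence


def f1_for_label(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 for a single label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average per-label F1, weighting each label by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_for_label(y_true, y_pred, lbl) * n / total for lbl, n in support.items())


print(weighted_f1([0, 0, 0, 1], [0, 0, 1, 1]))
```

Here label 0 has F1 = 0.8 (support 3) and label 1 has F1 = 2/3 (support 1), giving 0.8 * 0.75 + (2/3) * 0.25 = 23/30, whereas an unweighted (macro) average would give 11/15.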

prerequisites: dict

cui_filter: medcat.config.Optional[Any]: If set, only these CUIs will be used for training
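A minimal sketch of what cui_filter does to the training data, assuming each annotation is a dict with a "cui" key; the helper name filter_by_cui and the placeholder CUIs are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def filter_by_cui(
    annotations: Iterable[Dict[str, Any]], cui_filter: Optional[Set[str]]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only annotations whose CUI is in cui_filter."""
    if cui_filter is None:
        # Not set: every annotation is used for training.
        return list(annotations)
    return [ann for ann in annotations if ann["cui"] in cui_filter]


anns = [{"cui": "C001", "value": "chest pain"}, {"cui": "C002", "value": "fever"}]
print(filter_by_cui(anns, {"C001"}))  # [{'cui': 'C001', 'value': 'chest pain'}]
```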

auto_save_model: bool = True: Whether the best-performing model should be saved automatically during training

last_train_on: medcat.config.Optional[int]: When the last training run took place

metric: Dict[str, str]: The metric used to choose the best model
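metric determines what is compared across training epochs to pick the best model, and auto_save_model decides whether that model is saved. A sketch of that selection loop, assuming (not confirmed by this reference) that metric holds a 'base' and a 'score' key indexing a per-epoch report shaped like scikit-learn's classification_report(output_dict=True). save_points is a hypothetical helper that only records the epochs at which a save would happen.

```python
from typing import Dict, List

Report = Dict[str, Dict[str, float]]  # e.g. {"weighted avg": {"f1-score": 0.7}}


def save_points(epoch_reports: List[Report], metric: Dict[str, str]) -> List[int]:
    """Hypothetical helper: epochs where the chosen metric reaches a new best."""
    saved: List[int] = []
    best = float("-inf")
    for epoch, report in enumerate(epoch_reports):
        value = report[metric["base"]][metric["score"]]
        if value > best:
            best = value
            saved.append(epoch)  # an auto_save_model=True run would save here
    return saved


reports = [{"weighted avg": {"f1-score": s}} for s in (0.5, 0.7, 0.6, 0.8)]
print(save_points(reports, {"base": "weighted avg", "score": "f1-score"}))  # [0, 1, 3]
```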

loss_funct: str = 'cross_entropy': Loss function for the model

gamma: int = 2: Focal loss only: how strongly the loss focuses on hard-to-classify examples
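The standard focal loss for one example is FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. With gamma = 0 it reduces to cross-entropy, and larger gamma shrinks the loss on examples the model already classifies confidently. A stdlib sketch follows; whether MetaCAT's implementation adds a class-balancing (alpha) term is not stated in this reference, so none is included.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Standard focal loss for one example; gamma=0 gives plain cross-entropy."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


logits = [2.0, 0.0]
print(focal_loss(logits, 0, gamma=0))  # equals cross-entropy
print(focal_loss(logits, 0, gamma=2))  # an easy example is down-weighted
```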

class medcat.config_meta_cat.ConfigMetaCAT

Bases: medcat.config.MixingConfig, medcat.config.BaseModel

The MetaCAT part of the config

class Config

extra

validate_assignment = True

general: General

model: Model

train: Train
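ConfigMetaCAT simply nests the three sections above, so each setting is reached as config.<section>.<field>. The sketch below mirrors that layout with stdlib dataclasses and only a few of the fields listed above. It is an illustrative stand-in, not MedCAT's classes, and unlike the real models (whose Config sets validate_assignment = True) it does not type-check assignments.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class General:  # stand-in for a subset of medcat.config_meta_cat.General
    category_name: Optional[str] = None
    cntx_left: int = 15
    cntx_right: int = 10
    lowercase: bool = True


@dataclass
class Model:  # stand-in for a subset of medcat.config_meta_cat.Model
    model_name: str = "lstm"
    nclasses: int = 2


@dataclass
class Train:  # stand-in for a subset of medcat.config_meta_cat.Train
    nepochs: int = 50
    lr: float = 0.001


@dataclass
class ConfigMetaCAT:  # stand-in: one instance of each section per config
    general: General = field(default_factory=General)
    model: Model = field(default_factory=Model)
    train: Train = field(default_factory=Train)


config = ConfigMetaCAT()
config.general.category_name = "Status"  # hypothetical category name
print(config.general.cntx_left, config.model.model_name)  # 15 lstm
```

Using default_factory gives every config its own section objects, so changing one config never leaks into another.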
