medcat.config_meta_cat
Module Contents
Classes
General | The General part of the MetaCAT config
Model | The model part of the MetaCAT config
Train | The train part of the MetaCAT config
ConfigMetaCAT | The MetaCAT part of the config
- class medcat.config_meta_cat.General
Bases: medcat.config.MixingConfig, medcat.config.BaseModel
The General part of the MetaCAT config
- device: str = 'cpu'
- disable_component_lock: bool = False
- seed: int = 13
- description: str = 'No description'
Should provide a basic description of this MetaCAT model
- category_name: medcat.config.Optional[str]
The category this MetaCAT model predicts or is trained on
- category_value2id: Dict
Map from category values to IDs; if empty, it will be calculated automatically during training
- vocab_size: medcat.config.Optional[int]
Will be set automatically if the tokenizer is provided during meta_cat init
- lowercase: bool = True
If True, all input text will be lowercased
- cntx_left: int = 15
Number of tokens to take from the left of the concept
- cntx_right: int = 10
Number of tokens to take from the right of the concept
- replace_center: medcat.config.Optional[Any]
If set, the center (concept) will be replaced with this string
- batch_size_eval: int = 5000
Number of annotations to be meta-annotated at once in eval
- annotate_overlapping: bool = False
If set, meta_anns will be calculated for doc._.ents, otherwise for doc.ents
- tokenizer_name: str = 'bbpe'
Name of the tokenizer used by MetaCAT
- save_and_reuse_tokens: bool = False
This is a dangerous option; if unsure, ALWAYS set it to False. If set, MetaCAT will try to share the pre-calculated context tokens between MetaCAT models when serving. Differences in tokenizer and context size are ignored, so you must be certain that all models with this option enabled use the same tokenizer and context size during a deployment.
- pipe_batch_size_in_chars: int = 20000000
How many characters are piped into the MetaCAT class at once
- span_group: medcat.config.Optional[str]
If set, the spaCy span group to which the MetaCAT model will assign annotations. Otherwise, annotations default to doc._.ents or doc.ents depending on the annotate_overlapping setting
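As a quick illustration of the General section, the sketch below sets a few of these fields on a fresh ConfigMetaCAT. The category name 'Status' and the window sizes are hypothetical example values, not defaults:

```python
from medcat.config_meta_cat import ConfigMetaCAT

config = ConfigMetaCAT()

# 'Status' is a hypothetical meta-annotation task name, used here
# only as an example value for category_name.
config.general.category_name = 'Status'

# Widen the context window around the concept (defaults: 15 left, 10 right).
config.general.cntx_left = 20
config.general.cntx_right = 15

# Lowercase all input text before tokenization (the default).
config.general.lowercase = True
```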
- class medcat.config_meta_cat.Model
Bases: medcat.config.MixingConfig, medcat.config.BaseModel
The model part of the MetaCAT config
- model_name: str = 'lstm'
NOTE: When changing the model, make sure to change the tokenizer as well
- model_variant: str = 'bert-base-uncased'
- model_freeze_layers: bool = True
- num_layers: int = 2
- input_size: int = 300
- dropout: float = 0.5
- phase_number: int = 0
Indicates whether two-phase learning is performed.
0: Two-phase learning is not performed
1: Phase 1 - train the model on undersampled data
2: Phase 2 - continue training on the full data
- category_undersample: str = ''
- model_architecture_config: Dict
- num_directions: int = 2
2 - bidirectional model, 1 - unidirectional
- nclasses: int = 2
Number of classes that this model will output
- padding_idx: int
- emb_grad: bool = True
If True, the embeddings will also be trained
- ignore_cpos: bool = False
If set to True, center positions will be ignored when calculating the representation
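Following the NOTE on model_name, switching from the default LSTM to a BERT-based model also means updating the tokenizer name in the General section. A minimal sketch; 'bert' as the model_name value and 'bert-tokenizer' as the matching tokenizer name are assumptions here, so verify them against your MedCAT version:

```python
from medcat.config_meta_cat import ConfigMetaCAT

config = ConfigMetaCAT()

# Switch from the default LSTM to a BERT model.
# 'bert' and 'bert-tokenizer' are assumed values; check what your
# MedCAT version expects.
config.model.model_name = 'bert'
config.model.model_variant = 'bert-base-uncased'
config.general.tokenizer_name = 'bert-tokenizer'

# Three output classes instead of the default two.
config.model.nclasses = 3
```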
- class medcat.config_meta_cat.Train
Bases: medcat.config.MixingConfig, medcat.config.BaseModel
The train part of the MetaCAT config
- batch_size: int = 100
- nepochs: int = 50
- lr: float = 0.001
- test_size: float = 0.1
- shuffle_data: bool = True
Used only during training; if set, the dataset will be shuffled before the train/test split
- class_weights: medcat.config.Optional[Any]
- compute_class_weights: bool = False
If True and class weights are not provided, the class weights will be calculated based on the data
- score_average: str = 'weighted'
What to use for averaging F1/P/R across labels
- prerequisites: dict
- cui_filter: medcat.config.Optional[Any]
If set, only these CUIs will be used for training
- auto_save_model: bool = True
Should the model be saved during training for best results
- last_train_on: medcat.config.Optional[int]
When was the last training run
- metric: Dict[str, str]
What metric should be used for choosing the best model
- loss_funct: str = 'cross_entropy'
Loss function for the model
- gamma: int = 2
Focal loss gamma: controls how much the loss focuses on hard-to-classify examples.
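To tie the training options together, here is a hedged sketch of a training configuration using automatically computed class weights and focal loss. Treating 'focal_loss' as the accepted alternative to 'cross_entropy' is an assumption based on the gamma field above, so confirm it against your MedCAT version:

```python
from medcat.config_meta_cat import ConfigMetaCAT

config = ConfigMetaCAT()

config.train.nepochs = 30
config.train.lr = 0.0005
config.train.test_size = 0.2

# Derive class weights from the data instead of passing them explicitly.
config.train.class_weights = None
config.train.compute_class_weights = True

# 'focal_loss' is an assumed value; gamma only applies when
# focal loss is used.
config.train.loss_funct = 'focal_loss'
config.train.gamma = 2
```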