:py:mod:`medcat.utils.meta_cat.ml_utils`
========================================

.. py:module:: medcat.utils.meta_cat.ml_utils


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medcat.utils.meta_cat.ml_utils.FocalLoss


Functions
~~~~~~~~~

.. autoapisummary::

   medcat.utils.meta_cat.ml_utils.set_all_seeds
   medcat.utils.meta_cat.ml_utils.create_batch_piped_data
   medcat.utils.meta_cat.ml_utils.predict
   medcat.utils.meta_cat.ml_utils.split_list_train_test
   medcat.utils.meta_cat.ml_utils.print_report
   medcat.utils.meta_cat.ml_utils.train_model
   medcat.utils.meta_cat.ml_utils.eval_model


Attributes
~~~~~~~~~~

.. autoapisummary::

   medcat.utils.meta_cat.ml_utils.logger


.. py:data:: logger


.. py:function:: set_all_seeds(seed)


.. py:function:: create_batch_piped_data(data, start_ind, end_ind, device, pad_id)

   Creates a batch from ``data`` using the start and end indices that delimit
   the batch. Also pads the inputs and moves the result to the given device.

   :param data: Data in the format ``[[<[input_ids]>, <cpos>, Optional[int]], ...]``;
                the third column is optional and holds the output label.
   :type data: List[Tuple[List[int], int, Optional[int]]]
   :param start_ind: Start index of this batch.
   :type start_ind: int
   :param end_ind: End index of this batch.
   :type end_ind: int
   :param device: Where to move the data.
   :type device: torch.device
   :param pad_id: Padding index.
   :type pad_id: int

   :Returns:

      * **x** -- Same as data, but subsetted and as a tensor
      * **cpos** -- Center positions for the data
      * **attention_mask** -- Padding mask for the data
      * **y** -- Class label of the data


.. py:function:: predict(model, data, config)

   Predict on data; used in ``meta_cat.pipe``.

   :param model: The model.
   :type model: nn.Module
   :param data: Data in the format ``[[<input_ids>, <cpos>], ...]``.
   :type data: List[Tuple[List[int], int, Optional[int]]]
   :param config: Configuration for this meta_cat instance.
   :type config: ConfigMetaCAT

   :Returns:

      * **predictions** (*List[int]*) -- For each row of input data, a prediction
      * **confidence** (*List[float]*) -- For each prediction, a confidence value


.. py:function:: split_list_train_test(data, test_size, shuffle = True)

   Shuffle and randomly split data into train and test sets.

   :param data: The data.
   :type data: List
   :param test_size: The test size.
   :type test_size: float
   :param shuffle: Whether to shuffle the data. Defaults to True.
   :type shuffle: bool

   :Returns: **Tuple** -- The train data and the test data.


.. py:function:: print_report(epoch, running_loss, all_logits, y, name = 'Train')

   Prints some basic stats during training.

   :param epoch: Number of epochs.
   :type epoch: int
   :param running_loss: The loss.
   :type running_loss: List
   :param all_logits: List of logits.
   :type all_logits: List
   :param y: The y array.
   :type y: Any
   :param name: The name of the report. Defaults to Train.
   :type name: str


.. py:class:: FocalLoss(alpha=None, gamma=2)

   Bases: :py:obj:`torch.nn.Module`

   Base class for all neural network modules.

   Your models should also subclass this class.

   Modules can also contain other Modules, allowing them to be nested in
   a tree structure. You can assign the submodules as regular attributes::

       import torch.nn as nn
       import torch.nn.functional as F

       class Model(nn.Module):
           def __init__(self):
               super().__init__()
               self.conv1 = nn.Conv2d(1, 20, 5)
               self.conv2 = nn.Conv2d(20, 20, 5)

           def forward(self, x):
               x = F.relu(self.conv1(x))
               return F.relu(self.conv2(x))

   Submodules assigned in this way will be registered, and will have their
   parameters converted too when you call :meth:`to`, etc.

   .. note::
       As per the example above, an ``__init__()`` call to the parent class
       must be made before assignment on the child.

   :ivar training: Boolean that represents whether this module is in training
                   or evaluation mode.
   :vartype training: bool

   .. py:method:: __init__(alpha=None, gamma=2)

      Initialize internal Module state, shared by both nn.Module and ScriptModule.


   .. py:method:: forward(inputs, targets)


.. py:function:: train_model(model, data, config, save_dir_path = None)

   Trains an LSTM or BERT model, with automatic checkpoints.

   :param model: The model.
   :type model: nn.Module
   :param data: The data.
   :type data: List
   :param config: MetaCAT config.
   :type config: ConfigMetaCAT
   :param save_dir_path: The save dir path if required. Defaults to None.
   :type save_dir_path: Optional[str]

   :Returns: **Dict** -- The classification report for the winner.

   :raises Exception: If auto-save is enabled but no save dir path is provided.


.. py:function:: eval_model(model, data, config, tokenizer)

   Evaluate a trained model on the provided data.

   :param model: The model.
   :type model: nn.Module
   :param data: The data.
   :type data: List
   :param config: The MetaCAT config.
   :type config: ConfigMetaCAT
   :param tokenizer: The tokenizer.
   :type tokenizer: TokenizerWrapperBase

   :Returns: **Dict** -- Results (precision, recall, f1, examples, confusion matrix)
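

The batching behaviour described for ``create_batch_piped_data`` (slice the rows, pad the inputs, build a padding mask, collect center positions and optional labels) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation. The helper name ``sketch_batch`` is hypothetical, plain lists stand in for tensors, the ``device`` argument is omitted, and the mask convention (1 for real tokens, 0 for padding) is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def sketch_batch(
    data: Sequence[Sequence],
    start_ind: int,
    end_ind: int,
    pad_id: int,
) -> Tuple[List[List[int]], List[int], List[List[int]], Optional[List[int]]]:
    """Hypothetical stand-in for the documented batching step.

    Each row is [input_ids, cpos] or [input_ids, cpos, label].
    Returns plain lists (x, cpos, attention_mask, y) instead of tensors.
    """
    batch = data[start_ind:end_ind]
    if not batch:
        return [], [], [], None

    # Pad every row of token ids up to the longest row in this batch.
    max_len = max(len(row[0]) for row in batch)
    x = [list(row[0]) + [pad_id] * (max_len - len(row[0])) for row in batch]

    # Center position of the entity in each row.
    cpos = [row[1] for row in batch]

    # Assumed convention: 1 marks a real token, 0 marks padding.
    attention_mask = [
        [1] * len(row[0]) + [0] * (max_len - len(row[0])) for row in batch
    ]

    # Labels are only present when rows carry a third column.
    y = [row[2] for row in batch] if len(batch[0]) == 3 else None
    return x, cpos, attention_mask, y


# Usage: three labelled rows, batch made from the first two.
rows = [[[1, 2, 3], 1, 0], [[4, 5], 0, 1], [[6], 0, 1]]
x, cpos, attention_mask, y = sketch_batch(rows, 0, 2, pad_id=0)
# x == [[1, 2, 3], [4, 5, 0]]
# attention_mask == [[1, 1, 1], [1, 1, 0]]
```

Padding to the longest row in the batch, instead of a global maximum length, keeps each batch as small as its contents allow. Whether the library pads this way is not stated in this reference.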