tasks.classification#

All text classification models share the same API.

__init__#

def __init__(self,
             embedding: Optional[Embedding] = None,
             hyper_parameters: Optional[Dict[str, Dict[str, Any]]] = None)

Args:

  • embedding: model embedding
  • hyper_parameters: a dict of hyper_parameters.

You can customize hyper_parameters like this:

# import the model class
from kashgari.tasks.classification import BiLSTM_Model

# get default hyper_parameters
hyper_parameters = BiLSTM_Model.get_default_hyper_parameters()
# change lstm hidden unit to 12
hyper_parameters['layer_blstm']['units'] = 12
# init new model with customized hyper_parameters
model = BiLSTM_Model(hyper_parameters=hyper_parameters)
# x: tokenized training samples, y: their labels
model.fit(x, y)

Properties#

token2idx#

Returns model's token index map, type: Dict[str, int]

label2idx#

Returns model's label index map, type: Dict[str, int]
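
As an illustration of what these maps hold, the sketch below encodes a tokenized sentence and a label with two small hand-written dictionaries. The vocabulary, the `<UNK>` fallback entry, and the `encode_tokens` helper are assumptions made for this example, not values or functions taken from the library.

```python
from typing import Dict, List

# Hypothetical maps with the same type as token2idx / label2idx: Dict[str, int].
token2idx: Dict[str, int] = {"<PAD>": 0, "<UNK>": 1, "hello": 2, "world": 3}
label2idx: Dict[str, int] = {"chat": 0, "video": 1}


def encode_tokens(tokens: List[str], mapping: Dict[str, int]) -> List[int]:
    """Map each token to its index, falling back to the assumed <UNK> entry."""
    unk = mapping["<UNK>"]
    return [mapping.get(token, unk) for token in tokens]


encoded = encode_tokens(["hello", "there", "world"], token2idx)
print(encoded)             # [2, 1, 3]
print(label2idx["video"])  # 1
```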

Methods#

get_default_hyper_parameters#

Returns the default hyper parameters.

You must implement this function when customizing your own model.

Customization example: customize-your-own-mode

@classmethod
def get_default_hyper_parameters(cls) -> Dict[str, Dict[str, Any]]:

Returns:

  • dict of the default hyper parameters
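
The contract can be sketched without the library: a class exposes its defaults through a classmethod, and the constructor falls back to them when no hyper_parameters argument is given. `SketchModel`, its layer names, and its default values are hypothetical stand-ins, not the library's base class or real defaults.

```python
from typing import Any, Dict, Optional


class SketchModel:
    """Hypothetical stand-in for a customizable model class."""

    @classmethod
    def get_default_hyper_parameters(cls) -> Dict[str, Dict[str, Any]]:
        # One inner dict per layer, keyed by an assumed layer name.
        # A fresh dict is built on every call, so callers can edit it freely.
        return {
            "layer_blstm": {"units": 128, "return_sequences": False},
            "layer_dense": {"activation": "softmax"},
        }

    def __init__(self, hyper_parameters: Optional[Dict[str, Dict[str, Any]]] = None):
        # Use the caller's settings when given, otherwise the class defaults.
        self.hyper_parameters = hyper_parameters or self.get_default_hyper_parameters()


params = SketchModel.get_default_hyper_parameters()
params["layer_blstm"]["units"] = 12
model = SketchModel(hyper_parameters=params)
print(model.hyper_parameters["layer_blstm"]["units"])  # 12
```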

build_model_arc#

Builds the model architecture. Define the model's structure in this function.

You must implement this function when customizing your own model.

Customization example: customize-your-own-mode

def build_model_arc(self):

build_model#

Builds the model with a corpus.

def build_model(self,
                x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
                y_train: Union[List[List[str]], List[str]],
                x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
                y_validate: Union[List[List[str]], List[str]] = None)

Args:

  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data
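
The two accepted layouts for the feature data can be illustrated with a small check that treats single-input and multi-input data alike. This is a sketch only: `as_input_tuple` and `check_lengths` are hypothetical helpers, and the second input in `x_multi` is made-up data.

```python
from typing import List, Sequence, Tuple, Union

Tokens = List[List[str]]


def as_input_tuple(x: Union[Tuple[Tokens, ...], Tokens]) -> Tuple[Tokens, ...]:
    """Wrap single-input data in a 1-tuple so both layouts can be handled alike."""
    return x if isinstance(x, tuple) else (x,)


def check_lengths(x: Union[Tuple[Tokens, ...], Tokens], y: Sequence) -> int:
    """Return the sample count, raising if any input disagrees with the labels."""
    for features in as_input_tuple(x):
        if len(features) != len(y):
            raise ValueError("feature and label arrays must have the same length")
    return len(y)


x_single = [["hello", "world"], ["good", "morning"]]  # single-input model
x_multi = (x_single, [["UH", "NN"], ["JJ", "NN"]])    # multi-input model
y = ["chat", "chat"]
print(check_lengths(x_single, y))  # 2
print(check_lengths(x_multi, y))   # 2
```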

build_multi_gpu_model#

Build multi-GPU model with corpus

def build_multi_gpu_model(self,
                            gpus: int,
                            x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
                            y_train: Union[List[List[str]], List[str]],
                            cpu_merge: bool = True,
                            cpu_relocation: bool = False,
                            x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
                            y_validate: Union[List[List[str]], List[str]] = None):

Args:

  • gpus: Integer >= 2, number of GPUs on which to create model replicas.
  • cpu_merge: A boolean value to identify whether to force merging model weights under the scope of the CPU or not.
  • cpu_relocation: A boolean value to identify whether to create the model's weights under the scope of the CPU. If the model is not defined under any preceding device scope, you can still rescue it by activating this option.
  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data

build_tpu_model#

Build TPU model with corpus

def build_tpu_model(self, strategy: tf.contrib.distribute.TPUStrategy,
                    x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
                    y_train: Union[List[List[str]], List[str]],
                    x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
                    y_validate: Union[List[List[str]], List[str]] = None):

Args:

  • strategy: tf.contrib.distribute.TPUStrategy. The strategy to use for replicating the model across multiple TPU cores.
  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data

compile_model#

Configures the model for training.

Using compile() function of tf.keras.Model

def compile_model(self, **kwargs):

Args:

  • **kwargs: arguments passed to compile() function of tf.keras.Model

Defaults:

  • loss: categorical_crossentropy
  • optimizer: adam
  • metrics: ['accuracy']
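
One way to picture how these defaults interact with **kwargs is a dictionary merge in which caller-supplied values win. This is a sketch of the idea only: `resolve_compile_kwargs` is a hypothetical helper, and the library's actual merging logic may differ.

```python
from typing import Any, Dict

# Defaults as listed above.
DEFAULT_COMPILE_KWARGS: Dict[str, Any] = {
    "loss": "categorical_crossentropy",
    "optimizer": "adam",
    "metrics": ["accuracy"],
}


def resolve_compile_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Return the defaults with any caller-supplied values taking precedence."""
    resolved = dict(DEFAULT_COMPILE_KWARGS)
    resolved.update(kwargs)
    return resolved


resolved = resolve_compile_kwargs(optimizer="sgd")
print(resolved["optimizer"])  # sgd
print(resolved["loss"])       # categorical_crossentropy
```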

get_data_generator#

Returns a data generator for use with fit_generator.

def get_data_generator(self,
                        x_data,
                        y_data,
                        batch_size: int = 64,
                        shuffle: bool = True)

Args:

  • x_data: Array of feature data (if the model has a single input), or tuple of feature data array (if the model has multiple inputs)
  • y_data: Array of label data
  • batch_size: Number of samples per gradient update, defaults to 64.
  • shuffle: Bool, whether to shuffle the data, defaults to True.

Returns:

  • data generator
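
The general shape of such a generator can be sketched in plain Python: it loops over the data indefinitely and yields one batch at a time, reshuffling on each pass when shuffle is enabled. `batch_generator` is a hypothetical function that only slices the raw data; the library's generator presumably also converts tokens and labels into tensors, which is not shown here.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def batch_generator(x_data: Sequence, y_data: Sequence,
                    batch_size: int = 64,
                    shuffle: bool = True) -> Iterator[Tuple[List, List]]:
    """Yield (x_batch, y_batch) pairs forever, one pass over the data at a time."""
    indices = list(range(len(x_data)))
    while True:
        if shuffle:
            random.shuffle(indices)  # new sample order on every pass
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield [x_data[i] for i in batch], [y_data[i] for i in batch]


x = [["a"], ["b"], ["c"], ["d"], ["e"]]
y = ["l1", "l2", "l3", "l4", "l5"]
gen = batch_generator(x, y, batch_size=2, shuffle=False)
first_x, first_y = next(gen)
print(first_x, first_y)  # [['a'], ['b']] ['l1', 'l2']
```

With five samples and a batch size of 2, each pass yields three batches, the last one holding a single sample.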

fit#

Trains the model for a given number of epochs with fit_generator (iterations on a dataset).

def fit(self,
        x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
        y_train: Union[List[List[str]], List[str]],
        x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
        y_validate: Union[List[List[str]], List[str]] = None,
        batch_size: int = 64,
        epochs: int = 5,
        callbacks: List[keras.callbacks.Callback] = None,
        fit_kwargs: Dict = None):

Args:

  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data
  • batch_size: Number of samples per gradient update, defaults to 64.
  • epochs: Integer. Number of epochs to train the model, defaults to 5.
  • callbacks: List of keras.callbacks.Callback instances to apply during training.
  • fit_kwargs: additional arguments passed to fit_generator() function from tensorflow.keras.Model

Returns:

  • A tf.keras.callbacks.History object.

fit_without_generator#

Trains the model for a given number of epochs (iterations on a dataset) without using a data generator. This has a high memory cost.

def fit_without_generator(self,
                            x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
                            y_train: Union[List[List[str]], List[str]],
                            x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
                            y_validate: Union[List[List[str]], List[str]] = None,
                            batch_size: int = 64,
                            epochs: int = 5,
                            callbacks: List[keras.callbacks.Callback] = None,
                            fit_kwargs: Dict = None):

Args:

  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data
  • batch_size: Number of samples per gradient update, defaults to 64.
  • epochs: Integer. Number of epochs to train the model, defaults to 5.
  • callbacks: List of keras.callbacks.Callback instances to apply during training.
  • fit_kwargs: additional arguments passed to fit() function from tensorflow.keras.Model

Returns:

  • A tf.keras.callbacks.History object.

predict#

Generates output predictions for the input samples. Computation is done in batches.

def predict(self,
            x_data,
            batch_size=None,
            multi_label_threshold: float = 0.5,
            debug_info=False,
            predict_kwargs: Dict = None):

Args:

  • x_data: The input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
  • batch_size: Integer. If unspecified, it will default to 32.
  • multi_label_threshold: Float. Confidence threshold used for multi-label classification, defaults to 0.5.
  • debug_info: Bool, whether to print out the logging info.
  • predict_kwargs: Dict, arguments passed to predict() function of tensorflow.keras.Model

Returns:

  • array of predictions.
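
For multi-label models, a threshold such as multi_label_threshold is typically applied by keeping every label whose score reaches it. The sketch below shows that idea using the scores from the multi-label sample under predict_top_k_class. `labels_above_threshold` is a hypothetical helper, and the >= comparison is an assumption about how the threshold is applied.

```python
from typing import Dict, List


def labels_above_threshold(scores: Dict[str, float],
                           threshold: float = 0.5) -> List[str]:
    """Keep every label whose score reaches the threshold (assumed >= rule)."""
    return [label for label, score in scores.items() if score >= threshold]


scores = {"toxic": 0.9959336, "obscene": 0.9358089, "insult": 0.6882098,
          "severe_toxic": 0.13540423, "identity_hate": 0.017219543}
print(labels_above_threshold(scores))        # ['toxic', 'obscene', 'insult']
print(labels_above_threshold(scores, 0.95))  # ['toxic']
```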

predict_top_k_class#

Generates output predictions with confidence for the input samples.

Computation is done in batches.

def predict_top_k_class(self,
                        x_data,
                        top_k=5,
                        batch_size=32,
                        debug_info=False,
                        predict_kwargs: Dict = None) -> List[Dict]:

Args:

  • x_data: The input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
  • top_k: Integer. Number of top-ranked labels to return, defaults to 5.
  • batch_size: Integer. If unspecified, it will default to 32.
  • debug_info: Bool, whether to print out the logging info.
  • predict_kwargs: Dict, arguments passed to predict() function of tensorflow.keras.Model

Returns:

A list of prediction result dicts, one per input sample.

  • sample result of single-label classification:
[
  {
    "label": "chat",
    "confidence": 0.5801531,
    "candidates": [
      { "label": "cookbook", "confidence": 0.1886314 },
      { "label": "video", "confidence": 0.13805099 },
      { "label": "health", "confidence": 0.013852648 },
      { "label": "translation", "confidence": 0.012913573 }
    ]
  }
]
  • sample result of multi-label classification:
[
  {
    "candidates": [
      { "confidence": 0.9959336, "label": "toxic" },
      { "confidence": 0.9358089, "label": "obscene" },
      { "confidence": 0.6882098, "label": "insult" },
      { "confidence": 0.13540423, "label": "severe_toxic" },
      { "confidence": 0.017219543, "label": "identity_hate" }
    ]
  }
]
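
The single-label layout above can be reproduced from a plain probability vector: rank the classes, report the best one as label and confidence, and list the remaining top_k - 1 as candidates. This is a sketch under that assumption; `top_k_result` and `idx2label` are hypothetical names, not part of the library.

```python
from typing import Dict, List


def top_k_result(probabilities: List[float], idx2label: Dict[int, str],
                 top_k: int = 5) -> Dict:
    """Build one single-label result dict from a probability vector."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    entries = [{"label": idx2label[i], "confidence": p} for i, p in ranked[:top_k]]
    best, rest = entries[0], entries[1:]
    return {"label": best["label"], "confidence": best["confidence"], "candidates": rest}


idx2label = {0: "video", 1: "chat", 2: "cookbook"}
result = top_k_result([0.2, 0.7, 0.1], idx2label, top_k=2)
print(result)
# {'label': 'chat', 'confidence': 0.7, 'candidates': [{'label': 'video', 'confidence': 0.2}]}
```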

evaluate#

Evaluates the model.

def evaluate(self,
            x_data,
            y_data,
            batch_size=None,
            digits=4,
            debug_info=False) -> Tuple[float, float, Dict]:

Args:

  • x_data:
  • y_data:
  • batch_size:
  • digits:
  • debug_info:

save#

Saves the model info JSON and the model weights to the given folder path.

def save(self, model_path: str):

Args:

  • model_path: target model folder path

info#

Returns a dictionary containing the configuration of the model.

def info(self)