# Language Embeddings
Kashgari provides several embeddings for language representation. An embedding layer converts an input sequence into a tensor for the downstream task. Available embeddings:
| class name | description |
| --- | --- |
| BareEmbedding | randomly initialized `tf.keras.layers.Embedding` layer for text sequence embedding |
| WordEmbedding | pre-trained Word2Vec embedding |
| BERTEmbedding | pre-trained BERT embedding |
| GPT2Embedding | pre-trained GPT-2 embedding |
| NumericFeaturesEmbedding | randomly initialized `tf.keras.layers.Embedding` layer for numeric feature embedding |
| StackedEmbedding | stacks other embeddings for multi-input models |
All embedding classes inherit from the `Embedding` class and implement the `embed()` method, which embeds your input sequence, and the `embed_model` property, which you need for building your own model. By providing the `embed()` function and the `embed_model` property, Kashgari hides the complexity of the different language embeddings from users; all you need to care about is which language embedding you need.
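Because every embedding implements these same two members, a downstream model never needs to know which representation it sits on. Here is a minimal sketch of that shared interface using `BareEmbedding`; the toy corpus and the `analyze_corpus()` call (which builds the vocabulary for a randomly initialized embedding) are assumptions about typical setup, while pre-trained embeddings read their vocabulary from model files instead:

```python
import kashgari
from kashgari.embeddings import BareEmbedding

# toy corpus; in practice use your real training data
corpus = [['hello', 'world'], ['hello', 'kashgari']]
labels = ['greeting', 'greeting']

embedding = BareEmbedding(task=kashgari.CLASSIFICATION,
                          sequence_length=10,
                          embedding_size=100)

# build the vocabulary from the corpus so the random embedding layer
# knows its input dimension (pre-trained embeddings read this from model files)
embedding.analyze_corpus(corpus, labels)

# the two members every embedding exposes:
tensor = embedding.embed(corpus)   # numpy array of shape (2, 10, 100)
model = embedding.embed_model      # a tf.keras model you can extend
```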
## Quick start
### Feature Extraction from a Pre-trained Embedding
Feature extraction is one of the major ways to use a pre-trained language embedding, and Kashgari provides a simple API for this task. All you need to do is initialize an embedding object and then call its `embed` function. Here is an example; all embeddings share the same `embed` API.
```python
import kashgari
from kashgari.embeddings import BERTEmbedding

# need to specify the downstream task;
# when using the embedding only for feature extraction, just set `task=kashgari.CLASSIFICATION`
bert = BERTEmbedding('<BERT_MODEL_FOLDER>',
                     task=kashgari.CLASSIFICATION,
                     sequence_length=100)

# call for bulk embedding
embed_tensor = bert.embed([['语', '言', '模', '型']])

# call for single embedding
embed_tensor = bert.embed_one(['语', '言', '模', '型'])
print(embed_tensor)
# array([[-0.5001117 ,  0.9344998 , -0.55165815, ...,  0.49122602,
#         -0.2049343 ,  0.25752577],
#        [-1.05762   , -0.43353617,  0.54398274, ..., -0.61096823,
#          0.04312163,  0.03881482],
#        [ 0.14332692, -0.42566583,  0.68867105, ...,  0.42449307,
#          0.41105768,  0.08222893],
#        ...,
#        [-0.86124015,  0.08591427, -0.34404194, ...,  0.19915134,
#         -0.34176797,  0.06111742],
#        [-0.73940575, -0.02692179, -0.5826528 , ...,  0.26934686,
#         -0.29708537,  0.01855129],
#        [-0.85489404,  0.007399  , -0.26482674, ...,  0.16851354,
#         -0.36805922, -0.0052386 ]], dtype=float32)
```
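Because the `embed` API is shared, switching representations only changes the constructor. As a sketch, under the assumption that `WordEmbedding` takes a gensim-compatible word2vec file path as its first argument (the `<WORD2VEC_FILE_PATH>` placeholder stands in for your own file):

```python
import kashgari
from kashgari.embeddings import WordEmbedding

# same `embed` interface, different representation; only the constructor changes
word2vec = WordEmbedding('<WORD2VEC_FILE_PATH>',
                         task=kashgari.CLASSIFICATION,
                         sequence_length=100)

embed_tensor = word2vec.embed_one(['语', '言', '模', '型'])
# shape: (sequence_length, embedding size of the word2vec model)
```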
### Classification and Labeling
See details in the classification and labeling tutorials.
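Before diving into those tutorials, it may help to see how an embedding plugs into a task model. A minimal sketch, assuming the `BiLSTM_Model` classifier from `kashgari.tasks.classification` and a toy corpus:

```python
import kashgari
from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.classification import BiLSTM_Model

bert = BERTEmbedding('<BERT_MODEL_FOLDER>',
                     task=kashgari.CLASSIFICATION,
                     sequence_length=100)

# toy training data; replace with your real corpus
train_x = [['语', '言', '模', '型'], ['深', '度', '学', '习']]
train_y = ['tech', 'tech']

# any task model accepts an embedding object
model = BiLSTM_Model(embedding=bert)
model.fit(train_x, train_y)
```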
### Customized model
You can access the underlying tf.keras model of an embedding and add your own layers or any other kind of customization; just access the `embed_model` property of the embedding object.
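For example, here is a minimal sketch that stacks a small classification head on top of `embed_model`; the pooling and layer sizes are illustrative choices, not part of Kashgari's API:

```python
import tensorflow as tf
import kashgari
from kashgari.embeddings import BERTEmbedding

bert = BERTEmbedding('<BERT_MODEL_FOLDER>',
                     task=kashgari.CLASSIFICATION,
                     sequence_length=100)

# `embed_model` is a plain tf.keras model whose output is the embedding tensor
base_model = bert.embed_model

# stack a small task head on top of the embedding output
# (the pooling and layer sizes here are illustrative, not prescribed by Kashgari)
tensor = tf.keras.layers.GlobalAveragePooling1D()(base_model.output)
tensor = tf.keras.layers.Dense(64, activation='relu')(tensor)
output = tf.keras.layers.Dense(2, activation='softmax')(tensor)

model = tf.keras.Model(base_model.inputs, output)
model.summary()
```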