dlpy.layers.MultiHeadAttention

class dlpy.layers.MultiHeadAttention(n, n_attn_heads, name=None, act='AUTO', init=None, std=None, mean=None, truncation_factor=None, dropout=None, attn_dropout=None, include_bias=True, src_layers=None, **kwargs)

Bases: dlpy.layers.Layer
Multi-head attention layer from “Attention Is All You Need” (Vaswani et al., NIPS 2017).
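For reference, the standard formulation from the cited paper (not part of the generated API text) is:

.. math::

   \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

.. math::

   \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O},
   \qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})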
Parameters:
- n : int
  Specifies the number of neurons.
- n_attn_heads : int
  Specifies the number of attention heads.
- name : string, optional
  Specifies the name of the layer.
- act : string, optional
  Specifies the activation function.
  Valid Values: AUTO, IDENTITY, LOGISTIC, SIGMOID, EXP, TANH, RECTIFIER, RELU, GELU
  Default: AUTO
- init : string, optional
  Specifies the initialization scheme for the layer.
  Valid Values: XAVIER, UNIFORM, NORMAL, CAUCHY, XAVIER1, XAVIER2, MSRA, MSRA1, MSRA2
  Default: XAVIER
- std : float, optional
  Specifies the standard deviation value when the init parameter is set to NORMAL.
- mean : float, optional
  Specifies the mean value when the init parameter is set to NORMAL.
- truncation_factor : float, optional
  Specifies the truncation threshold (truncationFactor x std) when the init parameter is set to NORMAL.
- dropout : float, optional
  Specifies the dropout rate.
  Default: 0
- attn_dropout : float, optional
  Specifies the attention dropout rate.
  Default: 0
- include_bias : bool, optional
  Specifies whether to include bias neurons.
  Default: True
- src_layers : iter-of-Layers, optional
  Specifies the layers directed to this layer.
Returns: MultiHeadAttention
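A minimal construction sketch using the signature above; the neuron count, head count, and dropout values are illustrative assumptions, not defaults from this page::

    from dlpy.layers import MultiHeadAttention

    # 512 neurons split across 8 heads (64 per head), mirroring the base
    # configuration in Vaswani et al. (2017); all values are illustrative.
    attn = MultiHeadAttention(n=512, n_attn_heads=8, name='self_attn',
                              dropout=0.1, attn_dropout=0.1)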
__init__(n, n_attn_heads, name=None, act='AUTO', init=None, std=None, mean=None, truncation_factor=None, dropout=None, attn_dropout=None, include_bias=True, src_layers=None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.
Methods

__init__(n, n_attn_heads[, name, act, init, …])
  Initialize self.
count_instances()
format_name([block_num, local_count])
  Format the name of the layer.
get_number_of_instances()
to_model_params()
  Convert the model configuration to CAS action parameters.
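A hedged sketch of to_model_params(), which the table above describes only as converting the layer configuration to CAS action parameters; the concrete layout of the result is a DLPy implementation detail and is not asserted here::

    # Convert the layer configuration to CAS action parameters.
    params = attn.to_model_params()
    print(type(params))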