dlpy.layers.MultiHeadAttention

class dlpy.layers.MultiHeadAttention(n, n_attn_heads, name=None, act='AUTO', init=None, std=None, mean=None, truncation_factor=None, dropout=None, attn_dropout=None, include_bias=True, src_layers=None, **kwargs)

Bases: dlpy.layers.Layer

Multi-head attention layer from “Attention Is All You Need” (Vaswani et al., NIPS 2017).

Parameters:
n : int

Specifies the number of neurons.

n_attn_heads : int

Specifies the number of attention heads.

name : string, optional

Specifies the name of the layer.

act : string, optional

Specifies the activation function.
Valid Values: AUTO, IDENTITY, LOGISTIC, SIGMOID, EXP, TANH, RECTIFIER, RELU, GELU
Default: AUTO

init : string, optional

Specifies the initialization scheme for the layer.
Valid Values: XAVIER, UNIFORM, NORMAL, CAUCHY, XAVIER1, XAVIER2, MSRA, MSRA1, MSRA2
Default: XAVIER

std : float, optional

Specifies the standard deviation value when the init parameter is set to NORMAL.

mean : float, optional

Specifies the mean value when the init parameter is set to NORMAL.

truncation_factor : float, optional

Specifies the truncation threshold (truncation_factor x std) when the init parameter is set to NORMAL.

dropout : float, optional

Specifies the dropout rate.
Default: 0

attn_dropout : float, optional

Specifies the attention dropout rate.
Default: 0

include_bias : bool, optional

Specifies whether to include bias neurons.
Default: True

src_layers : iter-of-Layers, optional

Specifies the layers that feed into this layer.

Returns:
MultiHeadAttention

__init__(n, n_attn_heads, name=None, act='AUTO', init=None, std=None, mean=None, truncation_factor=None, dropout=None, attn_dropout=None, include_bias=True, src_layers=None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.
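Example (a minimal sketch of the constructor above; the layer sizes, the layer name, and the dropout rates are illustrative assumptions, not values taken from this reference):

>>> from dlpy.layers import MultiHeadAttention
>>> # 8 attention heads over a 512-neuron projection (assumed sizes).
>>> attn = MultiHeadAttention(
...     n=512,
...     n_attn_heads=8,
...     name='self_attn1',
...     act='AUTO',          # default activation
...     dropout=0.1,         # dropout on the layer output
...     attn_dropout=0.1,    # dropout on the attention weights
...     include_bias=True)
>>> # To wire the layer into a model graph, pass the upstream layer(s)
>>> # through src_layers, e.g. MultiHeadAttention(..., src_layers=[prev_layer]).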

Methods

__init__(n, n_attn_heads[, name, act, init, …])
    Initialize self.
count_instances()
format_name([block_num, local_count])
    Format the name of the layer.
get_number_of_instances()
to_model_params()
    Convert the model configuration to CAS action parameters.
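A brief sketch of using to_model_params(), continuing the example layer above; this assumes the method returns a plain dictionary of CAS action parameters, and the exact keys depend on the DLPy release:

>>> # Convert the layer configuration into CAS action parameters.
>>> params = attn.to_model_params()
>>> print(params)   # contents vary with the DLPy release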