dlpy.layers.MultiHeadAttention

class dlpy.layers.MultiHeadAttention(n, n_attn_heads, name=None, act='AUTO', init=None, std=None, mean=None, truncation_factor=None, dropout=None, attn_dropout=None, include_bias=True, src_layers=None, **kwargs)

Bases: dlpy.layers.Layer

Multi-head attention layer from “Attention Is All You Need” (Vaswani et al., NIPS 2017).

Parameters:
n : int

Specifies the number of neurons.

n_attn_heads : int

Specifies the number of attention heads.

name : string, optional

Specifies the name of the layer.

act : string, optional

Specifies the activation function.
Valid Values: AUTO, IDENTITY, LOGISTIC, SIGMOID, EXP, TANH, RECTIFIER, RELU, GELU
Default: AUTO

init : string, optional

Specifies the initialization scheme for the layer.
Valid Values: XAVIER, UNIFORM, NORMAL, CAUCHY, XAVIER1, XAVIER2, MSRA, MSRA1, MSRA2
Default: XAVIER

std : float, optional

Specifies the standard deviation value when the init parameter is set to NORMAL.

mean : float, optional

Specifies the mean value when the init parameter is set to NORMAL.

truncation_factor : float, optional

Specifies the truncation threshold (truncation_factor x std) when the init parameter is set to NORMAL.

dropout : float, optional

Specifies the dropout rate.
Default: 0

attn_dropout : float, optional

Specifies the attention dropout rate.
Default: 0

include_bias : bool, optional

Specifies whether to include bias neurons.
Default: True

src_layers : iter-of-Layers, optional

Specifies the layers that feed into this layer.

Returns:
MultiHeadAttention

__init__(n, n_attn_heads, name=None, act='AUTO', init=None, std=None, mean=None, truncation_factor=None, dropout=None, attn_dropout=None, include_bias=True, src_layers=None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.
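Example (a minimal sketch of the constructor above; the layer sizes, the layer name, and the dropout rates are illustrative assumptions, not values taken from this reference):

>>> from dlpy.layers import MultiHeadAttention
>>> # 8 attention heads over a 512-neuron projection (assumed sizes).
>>> attn = MultiHeadAttention(
...     n=512,
...     n_attn_heads=8,
...     name='self_attn1',
...     act='AUTO',          # default activation
...     dropout=0.1,         # dropout on the layer output
...     attn_dropout=0.1,    # dropout on the attention weights
...     include_bias=True)
>>> # To wire the layer into a model graph, pass the upstream layer(s)
>>> # through src_layers, e.g. MultiHeadAttention(..., src_layers=[prev_layer]).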

Methods

__init__(n, n_attn_heads[, name, act, init, …])
    Initialize self.
count_instances()
format_name([block_num, local_count])
    Format the name of the layer.
get_number_of_instances()
to_model_params()
    Convert the model configuration to CAS action parameters.
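A brief sketch of using to_model_params(), continuing the example layer above; this assumes the method returns a plain dictionary of CAS action parameters, and the exact keys depend on the DLPy release:

>>> # Convert the layer configuration into CAS action parameters.
>>> params = attn.to_model_params()
>>> print(params)   # contents vary with the DLPy release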