dlpy.model.Optimizer
class dlpy.model.Optimizer(algorithm=<dlpy.model.VanillaSolver object>, mini_batch_size=1, seed=0, max_epochs=1, reg_l1=0, reg_l2=0, dropout=0, dropout_input=0, dropout_type='standard', stagnation=0, threshold=1e-08, f_conv=0, snapshot_freq=0, log_level=0, bn_src_layer_warnings=True, freeze_layers_to=None, flush_weights=False, total_mini_batch_size=None, mini_batch_buf_size=None, freeze_layers=None, freeze_batch_norm_stats=None)
Bases: dlpy.utils.DLPyDict
Optimizer object
Parameters: - algorithm : Algorithm, optional
Specifies the deep learning algorithm.
- mini_batch_size : int, optional
Specifies the number of observations per thread in a mini-batch. You can use this parameter to control the number of observations that the action uses on each worker for each thread to compute the gradient prior to updating the weights. Larger values use more memory. When synchronous SGD is used (the default), the total mini-batch size is equal to miniBatchSize * number of threads * number of workers. When asynchronous SGD is used (by specifying the elasticSyncFreq parameter), each worker trains its own local model. In this case, the total mini-batch size for each worker is miniBatchSize * number of threads.
- seed : double, optional
Specifies the random number seed for the random number generator in SGD. The default value, 0, and negative values indicate to use random number streams based on the computer clock. Specify a value that is greater than 0 for a reproducible random number sequence.
- max_epochs : int, optional
Specifies the maximum number of epochs. For SGD with a single-machine server or a session that uses one worker on a distributed server, one epoch is reached when the action passes through the data one time. For a session that uses more than one worker, one epoch is reached when all the workers exchange the weights with the controller one time. The syncFreq parameter specifies the number of times each worker passes through the data before exchanging weights with the controller. For L-BFGS with full batch, each L-BFGS iteration might process more than one epoch, so the final number of epochs might exceed the specified maximum.
- reg_l1 : double, optional
Specifies the weight for the L1 regularization term. The default value of 0 disables L1 regularization. Begin with small values such as 1e-6. L1 regularization can be combined with L2 regularization.
- reg_l2 : double, optional
Specifies the weight for the L2 regularization term. The default value of 0 disables L2 regularization. Begin with small values such as 1e-3. L2 regularization can be combined with L1 regularization.
- dropout : double, optional
Specifies the probability that the output of a neuron in a fully connected layer will be set to zero during training. The specified probability is recalculated each time an observation is processed.
- dropout_input : double, optional
Specifies the probability that an input variable will be set to zero during training. The specified probability is recalculated each time an observation is processed.
- dropout_type : string, optional
Specifies what type of dropout to use.
Valid Values: STANDARD, INVERTED
Default: STANDARD
- stagnation : int, optional
Specifies the number of successive iterations without improvement before stopping the optimization early. When the validTable parameter is not specified, the loss error is monitored for stagnation. When the validTable parameter is specified, the validation scores are monitored for stagnation.
- threshold : double, optional
Specifies the threshold that is used to determine whether the loss error or validation score is improving or is stagnating. When abs(current_score - previous_score) <= abs(current_score)*threshold, the current iteration does not improve the optimization and the stagnation counter is incremented. Otherwise, the stagnation counter is set to zero.
- f_conv : double, optional
Specifies the relative function convergence criterion. If the relative change in the loss, abs(previous_loss - current_loss) / abs(previous_loss), is smaller than this value, the objective function is considered converged and the optimization stops. By default, relative function convergence is not checked.
- snapshot_freq : int, optional
Specifies the frequency for generating snapshots of the neural weights and storing the weights in a weight table during the training process. When asynchronous SGD is used, the action synchronizes all the weights before writing out the weights.
- log_level : int, optional
Specifies how progress messages are sent to the client. The default value, 0, indicates that no messages are sent. Specify 1 to receive start and end messages. Specify 2 to include the iteration history.
- bn_src_layer_warnings : bool, optional
Turns warnings on or off when a batch normalization source layer has an atypical type, activation, or include_bias setting.
Default: True
- total_mini_batch_size : int, optional
Specifies the number of observations in a mini-batch. You can use this parameter to control the number of observations that the action uses to compute the gradient prior to updating the weights. Larger values use more memory. If the specified size cannot be evenly divided by the number of threads (when using asynchronous SGD), or by the number of threads * number of workers (when using synchronous SGD), then the action terminates with an error unless the round parameter is set to TRUE, in which case the total mini-batch size is rounded up so that it divides evenly.
- flush_weights : bool, optional
Specifies whether to flush the weight table to disk.
Default: False
- mini_batch_buf_size : int, optional
Specifies the size of a buffer that is used to save input data and intermediate calculations. By default, each layer allocates an input buffer that is equal to the number of input channels multiplied by the input feature map size multiplied by the bufferSize value. You can reduce memory usage by specifying a value that is smaller than the bufferSize. The only disadvantage of specifying a small value is that run time can increase, because multiple smaller matrices must be multiplied instead of performing a single large matrix multiplication.
- freeze_layers_to : string
Specifies the name of a layer; that layer and all layers before it are frozen during training.
- freeze_batch_norm_stats : Boolean
When set to True, freezes the statistics of all batch normalization layers.
- freeze_layers : list of string
Specifies a list of layer names whose trainable parameters will be frozen.
Returns: Optimizer
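The following is a minimal sketch of how an Optimizer might be configured. MomentumSolver, its learning_rate argument, and the Model.fit call shown in the comment are illustrative assumptions; check the Solver and Model documentation for the exact signatures.

    from dlpy.model import Optimizer, MomentumSolver

    # Train for up to 20 epochs with mini-batches of 4 observations per thread,
    # mild L2 regularization, and iteration-level progress messages (log_level=2).
    solver = MomentumSolver(learning_rate=0.001)   # assumed solver and argument
    optimizer = Optimizer(
        algorithm=solver,
        mini_batch_size=4,
        max_epochs=20,
        reg_l2=0.0005,
        log_level=2,
    )
    # The optimizer is then typically passed to training, e.g.
    # model.fit(data=train_tbl, optimizer=optimizer)   # assumed usage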
__init__(algorithm=<dlpy.model.VanillaSolver object>, mini_batch_size=1, seed=0, max_epochs=1, reg_l1=0, reg_l2=0, dropout=0, dropout_input=0, dropout_type='standard', stagnation=0, threshold=1e-08, f_conv=0, snapshot_freq=0, log_level=0, bn_src_layer_warnings=True, freeze_layers_to=None, flush_weights=False, total_mini_batch_size=None, mini_batch_buf_size=None, freeze_layers=None, freeze_batch_norm_stats=None)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__([algorithm, mini_batch_size, seed, …]) : Initialize self.
- add_optimizer_mode([solver_mode_type, …]) : Sets the mode of the solver.
- clear()
- get(k[, d])
- items()
- keys()
- pop(k[, d]) : If key is not found, d is returned if given, otherwise a KeyError is raised.
- popitem() : Remove and return some (key, value) pair as a 2-tuple; raise a KeyError if the dictionary is empty.
- setdefault(k[, d])
- update([E, ]**F) : If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.
- values()
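Because Optimizer derives from DLPyDict, a configured instance supports the standard mapping methods listed above. A brief sketch (the exact key spelling stored internally may vary by DLPy version):

    from dlpy.model import Optimizer

    opt = Optimizer(mini_batch_size=4, max_epochs=20, log_level=1)

    # Inspect the options that will be sent to the deep learning action.
    for key, value in opt.items():
        print(key, value)

    # Standard dict helpers such as get(), keys(), and pop() are available too.
    print(list(opt.keys()))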