dlpy.applications.Faster_RCNN

dlpy.applications.Faster_RCNN(conn, model_table='Faster_RCNN', n_channels=3, width=1000, height=496, scale=1, norm_stds=None, offsets=(102.9801, 115.9465, 122.7717), random_mutation=None, n_classes=20, anchor_num_to_sample=256, anchor_ratio=[0.5, 1, 2], anchor_scale=[8, 16, 32], base_anchor_size=16, coord_type='coco', max_label_per_image=200, proposed_roi_num_train=2000, proposed_roi_num_score=300, roi_train_sample_num=128, roi_pooling_height=7, roi_pooling_width=7, nms_iou_threshold=0.3, detection_threshold=0.5, max_object_num=50, number_of_neurons_in_fc=4096, backbone='vgg16', random_flip=None, random_crop=None)

Generates a deep learning model with the Faster R-CNN architecture.

Parameters:
conn : CAS

Specifies the CAS connection object.

model_table : string, optional

Specifies the name of the CAS table to store the model.

n_channels : int, optional

Specifies the number of channels (i.e., depth) of the input layer.
Default: 3

width : int, optional

Specifies the width of the input layer.
Default: 1000

height : int, optional

Specifies the height of the input layer.
Default: 496

scale : double, optional

Specifies a scaling factor to be applied to each pixel intensity value.
Default: 1

norm_stds : double or iter-of-doubles, optional

Specifies a standard deviation for each channel in the input data. The final input data is normalized with the specified means and standard deviations.

offsets : double or iter-of-doubles, optional

Specifies an offset for each channel in the input data. The final input data is computed by applying the scaling factor and then subtracting the specified offsets.

random_mutation : string, optional

Specifies how to apply data augmentations/mutations to the data in the input layer.
Valid Values: ‘none’, ‘random’

n_classes : int, optional

Specifies the number of classes. If None is assigned, the model will automatically detect the number of classes based on the training set.
Default: 20

anchor_num_to_sample : int, optional

Specifies the number of anchors to sample for training the region proposal network.
Default: 256

anchor_ratio : iter-of-float, optional

Specifies the anchor height and width ratios (h/w) used.
Default: [0.5, 1, 2]

anchor_scale : iter-of-float, optional

Specifies the anchor scales used, based on base_anchor_size.
Default: [8, 16, 32]

base_anchor_size : int, optional

Specifies the basic anchor size in width and height (in pixels) in the original input image dimension; see the anchor geometry sketch after this parameter list.
Default: 16

coord_type : string, optional

Specifies the coordinate format type in the input label and detection result.
Valid Values: RECT, COCO, YOLO
Default: COCO

proposed_roi_num_score : int, optional

Specifies the number of ROIs (Regions of Interest) to propose in the scoring phase.
Default: 300

proposed_roi_num_train : int, optional

Specifies the number of ROIs (Regions of Interest) to propose during RPN training; this also serves as the pool from which ROIs are sampled for Fast R-CNN training.
Default: 2000

roi_train_sample_num : int, optional

Specifies the number of ROIs (Regions of Interest) to sample after NMS (Non-Maximum Suppression) is performed in the training phase.
Default: 128

roi_pooling_height : int, optional

Specifies the output height of the region pooling layer.
Default: 7

roi_pooling_width : int, optional

Specifies the output width of the region pooling layer.
Default: 7

max_label_per_image : int, optional

Specifies the maximum number of labels per image during training.
Default: 200

nms_iou_threshold : float, optional

Specifies the IoU (Intersection over Union) threshold used in non-maximum suppression for object detection.
Default: 0.3

detection_threshold : float, optional

Specifies the threshold for object detection.
Default: 0.5

max_object_num : int, optional

Specifies the maximum number of objects to detect.
Default: 50

number_of_neurons_in_fc : int or list-of-ints, optional

Specifies the number of neurons in the last two fully connected layers. If a single int is given, both layers have that number of neurons; if a list is given, the two layers can have different numbers of neurons (see the usage example below).
Default: 4096

backbone: string, optional

Specifies the architecture to be used as the feature extractor.
Valid Values: vgg16, resnet50, resnet18, resnet34, mobilenetv1, mobilenetv2
Default: vgg16

random_flip : string, optional

Specifies how to flip the data in the input layer when image data is used. Approximately half of the input data is subject to flipping.
Valid Values: ‘h’, ‘hv’, ‘v’, ‘none’

random_crop : string, optional

Specifies how to crop the data in the input layer when image data is used. Images are cropped to the values that are specified in the width and height parameters. Only images with one or both dimensions larger than those sizes are cropped.
Valid Values: ‘none’, ‘unique’, ‘randomresized’, ‘resizethencrop’
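
As an illustration of how the anchor parameters above interact, here is a minimal sketch of deriving anchor widths and heights from base_anchor_size, anchor_scale, and anchor_ratio. It follows the convention of the original Faster R-CNN code (nominal side length of base_anchor_size * scale, aspect set by the h/w ratio), which may differ from the exact convention used internally by DLPy.

    import numpy as np

    # Hedged sketch of typical Faster R-CNN anchor geometry, using the
    # documented defaults; not necessarily DLPy's internal implementation.
    base_anchor_size = 16        # pixels, in the original image dimension
    anchor_scale = [8, 16, 32]
    anchor_ratio = [0.5, 1, 2]   # interpreted as h/w

    for scale in anchor_scale:
        for ratio in anchor_ratio:
            side = base_anchor_size * scale   # nominal side length in pixels
            w = side / np.sqrt(ratio)         # wider when h/w < 1
            h = side * np.sqrt(ratio)         # taller when h/w > 1
            print(f"scale={scale:2d}, ratio={ratio}: {w:.0f} x {h:.0f} (w x h)")

With the defaults this yields nine anchor shapes per spatial location, e.g. 128 x 128 for scale 8 with ratio 1, up to roughly 362 x 724 for scale 32 with ratio 2.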

Returns:
Sequential

References

Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. https://arxiv.org/abs/1506.01497
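
Examples

A minimal usage sketch, assuming an active CAS session created with the swat package; the host, port, and the list passed to number_of_neurons_in_fc are illustrative placeholders, not requirements.

    import swat
    from dlpy.applications import Faster_RCNN

    # Connect to CAS (host and port are placeholders).
    conn = swat.CAS('cas-server.example.com', 5570)

    # Build a Faster R-CNN detector with the default VGG16 backbone.
    model = Faster_RCNN(conn,
                        model_table='Faster_RCNN',
                        width=1000,
                        height=496,
                        n_classes=20,
                        backbone='vgg16',
                        number_of_neurons_in_fc=[4096, 4096])  # list form: one value per FC layer

    # Inspect the generated architecture.
    model.print_summary()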