dlpy.timeseries.TimeseriesTable.prepare_subsequences

TimeseriesTable.prepare_subsequences(seq_len, target, predictor_timeseries=None, timeid=None, groupby=None, input_length_name='xlen', target_length_name='ylen', missing_handling='drop')

Prepare the subsequences that will be passed into the RNN.

Parameters:
seq_len : int

Length of each subsequence that will be passed to the RNN.

target : string

The target variable for the RNN. Currently only a univariate target is supported, so only a string is accepted here, not a list of strings.

predictor_timeseries : string or list-of-strings, optional

Timeseries that will be used to predict the target. They are preprocessed into subsequences as well. If None, the target timeseries is used as the predictor, which corresponds to an auto-regressive model.
Default: None

timeid : string, optional

Specifies the column name for the timeid. If None, it will take the timeid specified in timeseries_accumlation.
Default: None

groupby : string or list-of-strings, optional

The groupby variables. If None, it will take the groupby specified in timeseries_accumlation.
Default: None

input_length_name : string, optional

The column name in the CASTable specifying input sequence length.
Default: xlen

target_length_name : string, optional

The column name in the CASTable specifying target sequence length. Currently, only a target length of 1 is supported for numeric sequences.
Default: ylen

missing_handling : string, optional

How to handle missing values in the subsequences.
Default: drop

Examples

>>> from swat import CAS
>>> from dlpy.timeseries import TimeseriesTable
>>> s = CAS("cloud.example.com", 5570)
>>> time_tbl = TimeseriesTable.from_localfile(s, "path/to/file.csv", casout=dict(name='time_tbl', replace=True))
>>> time_tbl.timeseries_formatting(timeid='datetime',
...                              timeseries='series',
...                              timeid_informat='ANYDTDTM19.',
...                              timeid_format='DATETIME19.')
>>> time_tbl.timeseries_accumlation(acc_interval='day',
...                               timeseries='series',
...                               groupby=['id1var', 'id2var'])
>>> time_tbl.prepare_subsequences(seq_len=3,
...                             target='series',
...                             predictor_timeseries=['series', 'covar'],
...                             missing_handling='drop')
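
To give a feel for what seq_len and missing_handling='drop' mean, here is a minimal pure-Python sketch of one common way to cut a series into fixed-length subsequences with a target of length 1. It is an illustration only and is not taken from DLPy: the helper make_subsequences and its drop_missing flag are hypothetical names, and the exact alignment of lags that prepare_subsequences uses on the CAS server may differ.

```python
import math


def make_subsequences(series, seq_len, drop_missing=True):
    """Pair each run of `seq_len` consecutive values with the value that follows it.

    Hypothetical helper for illustration; not part of DLPy.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    rows = []
    for end in range(seq_len, len(series)):
        window = list(series[end - seq_len:end])  # the seq_len inputs
        target = series[end]                      # the single value to predict
        # Assumed meaning of 'drop': skip any row that contains a missing value.
        if drop_missing and any(is_missing(v) for v in window + [target]):
            continue
        rows.append((window, target))
    return rows


series = [1.0, 2.0, 3.0, 4.0, None, 6.0, 7.0, 8.0, 9.0]
subsequences = make_subsequences(series, seq_len=3)
for window, target in subsequences:
    print(window, "->", target)
```

In this sketch the series serves as both predictor and target, which mirrors the auto-regressive case described for predictor_timeseries=None. Every window that touches the missing value is discarded, so only the first and last windows survive.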