dlpy.timeseries.TimeseriesTable.prepare_subsequences

TimeseriesTable.prepare_subsequences(seq_len, target, predictor_timeseries=None, timeid=None, groupby=None, input_length_name='xlen', target_length_name='ylen', missing_handling='drop')

Prepare the subsequences that will be passed into an RNN.

Parameters:

seq_len : int: Length of each subsequence passed to the RNN.
target : string: The target variable for the RNN. Currently only a univariate target is supported, so this must be a single string, not a list of strings.
predictor_timeseries : string or list-of-strings, optional: Time series used to predict the target. They are preprocessed into subsequences as well. If None, the target time series is used as the predictor, which corresponds to an autoregressive model.
Default: None
timeid : string, optional: Specifies the column name for the timeid. If None, the timeid specified in timeseries_accumlation is used.
Default: None
groupby : string or list-of-strings, optional: The groupby variables. If None, the groupby variables specified in timeseries_accumlation are used.
Default: None
input_length_name : string, optional: The column name in the CASTable specifying input sequence length.
Default: xlen
target_length_name : string, optional: The column name in the CASTable specifying target sequence length. Currently only a target length of 1 is supported for numeric sequences.
Default: ylen
missing_handling : string, optional: How to handle missing values in the subsequences.
Default: drop

Examples

>>> from swat import CAS
>>> from dlpy.timeseries import TimeseriesTable
>>> s = CAS("cloud.example.com", 5570)
>>> time_tbl = TimeseriesTable.from_localfile(s, "path/to/file.csv", casout=dict(name='time_tbl', replace=True))
>>> time_tbl.timeseries_formatting(timeid='datetime',
...                              timeseries='series',
...                              timeid_informat='ANYDTDTM19.',
...                              timeid_format='DATETIME19.')
>>> time_tbl.timeseries_accumlation(acc_interval='day',
...                               timeseries='series',
...                               groupby=['id1var', 'id2var'])
>>> time_tbl.prepare_subsequences(seq_len=3,
...                             target='series',
...                             predictor_timeseries=['series', 'covar'],
...                             missing_handling='drop')
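
To make the idea of a subsequence concrete, the following is a minimal, library-free sketch of sliding-window preparation for the autoregressive case (the target series is its own predictor). It is an illustration only and is not DLPy's implementation: the function name sliding_subsequences and the drop_missing flag are hypothetical, the exact lag layout DLPy writes to the CASTable may differ, and the sketch assumes that "drop" means discarding any window that contains a missing value.

```python
def sliding_subsequences(series, seq_len, drop_missing=True):
    """Build (inputs, target) pairs from a single series.

    Hypothetical helper for illustration: each pair holds `seq_len`
    consecutive past values as inputs and the next value as the target
    (target length 1).
    """
    pairs = []
    for end in range(seq_len, len(series)):
        window = series[end - seq_len:end]  # the seq_len values before position `end`
        target = series[end]                # the value to predict
        # Assumed meaning of "drop": skip pairs that contain a missing value.
        if drop_missing and any(v is None for v in window + [target]):
            continue
        pairs.append((window, target))
    return pairs


# A series with one missing observation (None) and seq_len=3.
pairs = sliding_subsequences(
    [1.0, 2.0, 3.0, 4.0, None, 6.0, 7.0, 8.0, 9.0], seq_len=3
)
for inputs, target in pairs:
    print(inputs, "->", target)
```

In this sketch a single missing observation removes every window that touches it, so only the first and last windows survive. This is one reason to accumulate and clean a series before preparing subsequences.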