pipefitter.transformer.imputer.Imputer.transform

Imputer.transform(table, value=None)

Perform the imputation on the given data set

Parameters:

table : data set

The data set to impute

value : ImputerMethod or scalar or dict, optional

Same as the value parameter of the constructor

Returns:

data set

Data set of the same type as table, with missing values imputed

Examples

Sample data set used in the imputation examples (blank cells in columns F and G are missing values):

>>> data.head()
      A     B     C     D     E  F  G  H
0   1.0   2.0   3.0   4.0   5.0  a  b  c
1   6.0   NaN   8.0   9.0   NaN  j  e  f
2  11.0   NaN  13.0  14.0   NaN     h  i
3  16.0  17.0  18.0   NaN  20.0  j     l
4   NaN  22.0  23.0  24.0   NaN     n  o

Impute values using the mean:

>>> meanimp = Imputer(Imputer.MEAN)
>>> newdata = meanimp.transform(data)
>>> newdata.head()
      A          B     C      D     E  F  G  H
0   1.0   2.000000   3.0   4.00   5.0  a  b  c
1   6.0  13.666667   8.0   9.00  12.5  j  e  f
2  11.0  13.666667  13.0  14.00  12.5     h  i
3  16.0  17.000000  18.0  12.75  20.0  j     l
4   8.5  22.000000  23.0  24.00  12.5     n  o
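The imputed values above can be reproduced by hand: each missing cell receives the mean of the non-missing cells in its column. The check below uses only Python's `statistics` module, not pipefitter, with the observed values copied from the sample data set. It verifies the arithmetic and makes no claim about how pipefitter computes the statistic internally.

```python
from statistics import fmean

# Non-missing cells of the numeric columns that contain NaN in the sample data.
observed = {
    'A': [1.0, 6.0, 11.0, 16.0],
    'B': [2.0, 17.0, 22.0],
    'D': [4.0, 9.0, 14.0, 24.0],
    'E': [5.0, 20.0],
}

# Each column's mean is the value written into its missing cells.
fill = {name: fmean(cells) for name, cells in observed.items()}

# Round to six places to match the display in the output above.
print({name: round(mean, 6) for name, mean in fill.items()})
# {'A': 8.5, 'B': 13.666667, 'D': 12.75, 'E': 12.5}
```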

Impute values using the mode:

>>> modeimp = Imputer(Imputer.MODE)
>>> newdata = modeimp.transform(data)
>>> newdata.head()
      A     B     C     D     E  F  G  H
0   1.0   2.0   3.0   4.0   5.0  a  b  c
1   6.0   2.0   8.0   9.0   5.0  j  e  f
2  11.0   2.0  13.0  14.0   5.0  j  h  i
3  16.0  17.0  18.0   4.0  20.0  j  b  l
4   1.0  22.0  23.0  24.0   5.0  j  n  o
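When no value repeats, as in column B where 2, 17, and 22 each appear once, the output fills with the smallest value. The plain-Python sketch below reproduces that by breaking ties toward the smallest value. The tie-break rule is inferred from the example output (it matches the sorted order returned by pandas `Series.mode()`), so treat it as an assumption rather than documented pipefitter behavior; `mode_with_smallest_tie` is a hypothetical helper.

```python
from collections import Counter


def mode_with_smallest_tie(cells):
    """Return the most common non-missing cell; ties go to the smallest value.

    Assumption: the tie-break rule is inferred from the example output,
    not taken from pipefitter's source. None marks a missing cell.
    """
    counts = Counter(c for c in cells if c is not None)
    top = max(counts.values())
    return min(value for value, n in counts.items() if n == top)


# Columns from the sample data set, with None in place of the missing cells.
columns = {
    'B': [2.0, None, None, 17.0, 22.0],
    'D': [4.0, 9.0, 14.0, None, 24.0],
    'F': ['a', 'j', None, 'j', None],
    'G': ['b', 'e', 'h', None, 'n'],
}

for name, cells in columns.items():
    print(name, mode_with_smallest_tie(cells))
```

Column F is the only one with a genuine most-common value ('j' appears twice); every other column above is resolved by the tie-break.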

Impute a constant value:

>>> cimp = Imputer(100)
>>> newdata = cimp.transform(data)
>>> newdata.head()
       A      B     C      D      E  F  G  H
0    1.0    2.0   3.0    4.0    5.0  a  b  c
1    6.0  100.0   8.0    9.0  100.0  j  e  f
2   11.0  100.0  13.0   14.0  100.0     h  i
3   16.0   17.0  18.0  100.0   20.0  j     l
4  100.0   22.0  23.0   24.0  100.0     n  o

Impute values only in the specified columns:

>>> dimp = Imputer({'A': 1, 'B': 100,
...                 'F': 'none', 'G': 'miss'})
>>> newdata = dimp.transform(data)
>>> newdata.head()
      A      B     C     D     E     F     G  H
0   1.0    2.0   3.0   4.0   5.0     a     b  c
1   6.0  100.0   8.0   9.0   NaN     j     e  f
2  11.0  100.0  13.0  14.0   NaN  none     h  i
3  16.0   17.0  18.0   NaN  20.0     j  miss  l
4   1.0   22.0  23.0  24.0   NaN  none     n  o
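A dict value imputes only the columns it names; columns D and E are not keys, so they keep their NaN cells in the output above. The plain-Python sketch below models that per-column rule on rows stored as dicts, with `None` standing in for a missing cell. Both the row representation and the helper `fill_columns` are hypothetical and are not part of pipefitter's API.

```python
def fill_columns(rows, fills):
    """Return copies of rows with missing cells (None) replaced per column.

    rows:  list of dicts mapping column name -> cell value.
    fills: dict mapping column name -> fill value; columns not listed
           keep their missing cells, as D and E do in the example.
    """
    return [
        {name: fills.get(name, cell) if cell is None else cell
         for name, cell in row.items()}
        for row in rows
    ]


# Rows 1-4 of the sample data set (columns A, B, E, F, G only).
rows = [
    {'A': 6.0, 'B': None, 'E': None, 'F': 'j', 'G': 'e'},
    {'A': 11.0, 'B': None, 'E': None, 'F': None, 'G': 'h'},
    {'A': 16.0, 'B': 17.0, 'E': 20.0, 'F': 'j', 'G': None},
    {'A': None, 'B': 22.0, 'E': None, 'F': None, 'G': 'n'},
]
fills = {'A': 1, 'B': 100, 'F': 'none', 'G': 'miss'}

for row in fill_columns(rows, fills):
    print(row)
```

Building new dicts rather than assigning into the input keeps the original rows unchanged, mirroring how transform returns a new data set.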