pipefitter.transformer.imputer.Imputer
class pipefitter.transformer.imputer.Imputer(value=mean)
Bases: pipefitter.base.BaseImputer
Impute missing values in a data set.
The replacement values can be computed statistics or constant values. To specify a statistic, use one of the following predefined constants on the Imputer class:
- Imputer.MAX
- Imputer.MEAN
- Imputer.MEDIAN
- Imputer.MIDRANGE
- Imputer.MIN
- Imputer.MODE
- Imputer.RANDOM
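As a rough illustration of what each constant computes, the sketch below derives a replacement value from the non-missing values of a single column. It is not pipefitter's implementation: the string labels, the use of None for missing values, and the tie-breaking rule for the mode (smallest of the most frequent values) are assumptions. RANDOM is left out because the text above does not say how the random values are drawn.

```python
import statistics
from collections import Counter


def column_statistic(values, method):
    """Return the replacement for missing entries (None) in one column.

    `method` is a hypothetical string label standing in for the Imputer
    constants; pipefitter itself uses ImputerMethod objects.
    """
    observed = [v for v in values if v is not None]
    if method == "max":
        return max(observed)
    if method == "min":
        return min(observed)
    if method == "mean":
        return statistics.mean(observed)
    if method == "median":
        return statistics.median(observed)
    if method == "midrange":
        # Midpoint between the smallest and largest observed values.
        return (max(observed) + min(observed)) / 2
    if method == "mode":
        # Most frequent value; ties go to the smallest value (assumption).
        counts = Counter(observed)
        top = max(counts.values())
        return min(v for v, c in counts.items() if c == top)
    raise ValueError(f"unsupported method: {method!r}")


# Column D of the sample data set in the examples, with None for the missing entry.
column_d = [4.0, 9.0, 14.0, None, 24.0]
for method in ["max", "mean", "median", "midrange", "min", "mode"]:
    print(method, column_statistic(column_d, method))
```

For column D this gives a mean of 12.75 and a mode of 4.0, the same values that appear for that column in the mean and mode examples below.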
Parameters: value : ImputerMethod or scalar or dict, optional
- Specifies the value to use in place of missing values.
- If an ImputerMethod is specified, that method is used for all missing values.
- If a scalar is specified, that value is substituted for all missing values.
- If a dict is specified, the keys correspond to the columns and the values are the substitution values (which may also be ImputerMethod instances).
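The three accepted forms of value can be pictured as a small dispatch that decides, per column, what replaces the missing entries. This is a hypothetical sketch, not pipefitter code: the Method class stands in for ImputerMethod, only the mean and median are wired up, and tables are plain dicts of lists with None for missing values.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical stand-in for an ImputerMethod such as Imputer.MEAN."""
    name: str


MEAN = Method("mean")
MEDIAN = Method("median")
_STATISTICS = {"mean": statistics.mean, "median": statistics.median}


def _fill_for(values, spec):
    # A Method is evaluated on the column's observed values; anything else is a constant.
    if isinstance(spec, Method):
        return _STATISTICS[spec.name]([v for v in values if v is not None])
    return spec


def resolve_fill_values(columns, value):
    """Return {column name: replacement} for the columns that will be imputed."""
    if isinstance(value, dict):
        # Only the listed columns are imputed; each entry may be a constant or a Method.
        return {name: _fill_for(columns[name], spec) for name, spec in value.items()}
    # A single Method or scalar applies to every column.
    return {name: _fill_for(vals, value) for name, vals in columns.items()}


columns = {"A": [1.0, 6.0, 11.0, 16.0, None], "E": [5.0, None, None, 20.0, None]}
print(resolve_fill_values(columns, MEAN))                    # {'A': 8.5, 'E': 12.5}
print(resolve_fill_values(columns, 100))                     # {'A': 100, 'E': 100}
print(resolve_fill_values(columns, {"A": MEDIAN, "E": 0}))   # {'A': 8.5, 'E': 0}
```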
Examples
Sample data set used in the imputation examples (blank cells in the character columns F and G are missing values):
>>> data.head()
      A     B     C     D     E  F  G  H
0   1.0   2.0   3.0   4.0   5.0  a  b  c
1   6.0   NaN   8.0   9.0   NaN  j  e  f
2  11.0   NaN  13.0  14.0   NaN     h  i
3  16.0  17.0  18.0   NaN  20.0  j     l
4   NaN  22.0  23.0  24.0   NaN     n  o
Impute values using the mean:
>>> meanimp = Imputer(Imputer.MEAN)
>>> newdata = meanimp.transform(data)
>>> newdata.head()
      A          B     C      D     E  F  G  H
0   1.0   2.000000   3.0   4.00   5.0  a  b  c
1   6.0  13.666667   8.0   9.00  12.5  j  e  f
2  11.0  13.666667  13.0  14.00  12.5     h  i
3  16.0  17.000000  18.0  12.75  20.0  j     l
4   8.5  22.000000  23.0  24.00  12.5     n  o
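The imputed values above can be checked by hand: each one is the mean of the observed values in its column, for example (2 + 17 + 22) / 3 ≈ 13.666667 for column B. The sketch below redoes that arithmetic in plain Python, assuming the data set holds only the five rows shown by head() (the output is consistent with that) and representing missing values as None. Column C has no missing values and the character columns are not filled by the mean, so they are omitted.

```python
import statistics

# Numeric columns of the sample data set that contain missing values (None).
data = {
    "A": [1.0, 6.0, 11.0, 16.0, None],
    "B": [2.0, None, None, 17.0, 22.0],
    "D": [4.0, 9.0, 14.0, None, 24.0],
    "E": [5.0, None, None, 20.0, None],
}


def mean_impute(column):
    """Replace each None with the mean of the column's non-missing values."""
    fill = statistics.mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


imputed = {name: mean_impute(col) for name, col in data.items()}
for name, col in imputed.items():
    print(name, [round(v, 6) for v in col])
```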
Impute values using the mode:
>>> modeimp = Imputer(Imputer.MODE)
>>> newdata = modeimp.transform(data)
>>> newdata.head()
      A     B     C     D     E  F  G  H
0   1.0   2.0   3.0   4.0   5.0  a  b  c
1   6.0   2.0   8.0   9.0   5.0  j  e  f
2  11.0   2.0  13.0  14.0   5.0  j  h  i
3  16.0  17.0  18.0   4.0  20.0  j  b  l
4   1.0  22.0  23.0  24.0   5.0  j  n  o
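In the mode output, columns A and G have no repeated observed values, yet their missing cells were filled with 1.0 and b. That is consistent with ties being resolved by taking the smallest (or the first-listed) value. The sketch below assumes the smallest value wins, which is an inference from this example rather than documented behavior, and treats the blank cells in F and G as missing (None), as the output indicates.

```python
from collections import Counter


def mode_impute(column):
    """Fill None with the most frequent observed value.

    Ties are broken by taking the smallest value; this is an assumption
    that reproduces the example output, not documented pipefitter behavior.
    """
    counts = Counter(v for v in column if v is not None)
    top = max(counts.values())
    fill = min(v for v, c in counts.items() if c == top)
    return [fill if v is None else v for v in column]


sample = {
    "A": [1.0, 6.0, 11.0, 16.0, None],
    "B": [2.0, None, None, 17.0, 22.0],
    "F": ["a", "j", None, "j", None],
    "G": ["b", "e", "h", None, "n"],
}
for name, col in sample.items():
    print(name, mode_impute(col))
```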
Impute a constant value:
>>> cimp = Imputer(100)
>>> newdata = cimp.transform(data)
>>> newdata.head()
       A      B     C      D      E  F  G  H
0    1.0    2.0   3.0    4.0    5.0  a  b  c
1    6.0  100.0   8.0    9.0  100.0  j  e  f
2   11.0  100.0  13.0   14.0  100.0     h  i
3   16.0   17.0  18.0  100.0   20.0  j     l
4  100.0   22.0  23.0   24.0  100.0     n  o
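In the output above, the constant 100 replaced missing values only in the numeric columns; the blank cells in the character columns F and G were left as they were. The sketch below mimics that by matching the constant's kind against the column's observed values. The matching rule, including the idea that a string constant would fill only character columns, is an assumption inferred from this example, not documented behavior.

```python
from numbers import Number


def constant_impute(columns, constant):
    """Replace None with `constant` in columns whose observed values match its kind.

    Numeric constants go only into numeric columns and other constants only
    into non-numeric columns (assumption inferred from the example output).
    """
    constant_is_numeric = isinstance(constant, Number)
    result = {}
    for name, col in columns.items():
        observed = [v for v in col if v is not None]
        column_is_numeric = all(isinstance(v, Number) for v in observed)
        compatible = column_is_numeric == constant_is_numeric
        result[name] = [constant if v is None and compatible else v for v in col]
    return result


sample = {
    "B": [2.0, None, None, 17.0, 22.0],
    "F": ["a", "j", None, "j", None],
}
print(constant_impute(sample, 100))
```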
Impute values in specified columns:
>>> dimp = Imputer({'A': 1, 'B': 100, 'F': 'none', 'G': 'miss'})
>>> newdata = dimp.transform(data)
>>> newdata.head()
      A      B     C     D     E     F     G  H
0   1.0    2.0   3.0   4.0   5.0     a     b  c
1   6.0  100.0   8.0   9.0   NaN     j     e  f
2  11.0  100.0  13.0  14.0   NaN  none     h  i
3  16.0   17.0  18.0   NaN  20.0     j  miss  l
4   1.0   22.0  23.0  24.0   NaN  none     n  o
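With a dict, only the listed columns are filled; in the output above, columns D and E keep their NaN values because they are not keys of the dict. A minimal sketch of that per-column behavior, using plain dicts of lists with None for missing values (a hypothetical representation, not pipefitter's), and handling only constant fill values:

```python
def dict_impute(columns, fills):
    """Replace None only in the columns named in `fills`.

    Columns missing from `fills` keep their missing values, and in this
    sketch names in `fills` that are not columns are ignored. Only constant
    fill values are handled; the Imputer also accepts ImputerMethod values.
    """
    return {
        name: [fills[name] if v is None and name in fills else v for v in col]
        for name, col in columns.items()
    }


sample = {
    "A": [1.0, 6.0, 11.0, 16.0, None],
    "E": [5.0, None, None, 20.0, None],
    "F": ["a", "j", None, "j", None],
    "G": ["b", "e", "h", None, "n"],
}
out = dict_impute(sample, {"A": 1, "B": 100, "F": "none", "G": "miss"})
for name, col in out.items():
    print(name, col)
```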
__init__(value=mean)
Methods
- __init__([value])
- get_combined_params(*args, **kwargs): Merge all parameters and verify that they are valid
- get_filtered_params(*args, **kwargs): Merge the parameters whose keys belong to self
- get_param(*names): Return a copy of the requested parameters
- get_params(*names): Return a copy of the requested parameters
- has_param(name): Does the parameter exist?
- set_param(*args, **kwargs): Set one or more parameters
- set_params(*args, **kwargs): Set one or more parameters
- transform(table[, value]): Perform the imputation on the given data set
Attributes
- MAX: Constant that indicates the maximum data value of the column
- MEAN: Constant that indicates the mean data value of the column
- MEDIAN: Constant that indicates the median data value of the column
- MIDRANGE: Constant that indicates the midrange data value of the column
- MIN: Constant that indicates the minimum data value of the column
- MODE: Constant that indicates the mode data value of the column
- RANDOM: Constant that indicates that random data should be used
- param_defs
- static_params
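To picture how the constructor's value relates to the parameter methods and transform, here is a toy class. It is hypothetical and is not pipefitter's implementation: tables are dicts of lists with None for missing values, only the mean and constants are supported, set_params accepts only keyword arguments, and treating the optional value argument of transform as a per-call override is an assumption based on the transform(table[, value]) signature.

```python
import statistics


class SketchImputer:
    """Hypothetical, minimal stand-in for Imputer; not pipefitter's implementation."""

    MEAN = "mean"  # hypothetical label standing in for Imputer.MEAN

    def __init__(self, value=MEAN):
        self._params = {"value": value}

    def get_params(self, *names):
        # Return a copy so callers cannot change the stored parameters through it.
        if not names:
            return dict(self._params)
        return {name: self._params[name] for name in names}

    def set_params(self, **kwargs):
        self._params.update(kwargs)

    def transform(self, table, value=None):
        # Assumption: a value passed here overrides the stored one for this call only.
        spec = self._params["value"] if value is None else value
        result = {}
        for name, col in table.items():
            if spec == self.MEAN:
                fill = statistics.mean(v for v in col if v is not None)
            else:
                fill = spec
            result[name] = [fill if v is None else v for v in col]
        return result


table = {"E": [5.0, None, None, 20.0, None]}
imp = SketchImputer()
print(imp.transform(table))       # {'E': [5.0, 12.5, 12.5, 20.0, 12.5]}
print(imp.transform(table, 0))    # {'E': [5.0, 0, 0, 20.0, 0]}
imp.set_params(value=-1)
print(imp.get_params("value"))    # {'value': -1}
```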