pipefitter.transformer.imputer.Imputer
class pipefitter.transformer.imputer.Imputer(value=mean)
Bases: pipefitter.base.BaseImputer
Impute missing values in a data set.
Missing values can be replaced with either a statistic or a constant value. To specify a statistic, use one of the following predefined constants on the Imputer class.
- Imputer.MAX
- Imputer.MEAN
- Imputer.MEDIAN
- Imputer.MIDRANGE
- Imputer.MIN
- Imputer.MODE
- Imputer.RANDOM
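To make the statistics concrete, here is a minimal sketch in plain Python of the replacement value each one would yield for a single column. It is not pipefitter's implementation, and the helper name `replacement_value` and the string method names are hypothetical. The column is column B from the sample data set shown in the Examples on this page, with None standing in for a missing value. RANDOM is left out because this page does not say how random values are drawn.

```python
import statistics

# Column B from the sample data set on this page; None marks a missing value.
column_b = [2.0, None, None, 17.0, 22.0]


def replacement_value(values, method):
    """Return the value a given statistic would substitute for missing entries.

    Hypothetical helper for illustration only; not part of pipefitter.
    """
    present = [v for v in values if v is not None]
    if method == "MAX":
        return max(present)
    if method == "MIN":
        return min(present)
    if method == "MEAN":
        return statistics.mean(present)
    if method == "MEDIAN":
        return statistics.median(present)
    if method == "MIDRANGE":
        # Midpoint between the smallest and largest observed values.
        return (min(present) + max(present)) / 2
    if method == "MODE":
        # On ties, statistics.mode returns the first value encountered
        # (Python 3.8 and later).
        return statistics.mode(present)
    raise ValueError(f"unknown method: {method}")


for name in ("MAX", "MEAN", "MEDIAN", "MIDRANGE", "MIN", "MODE"):
    print(name, replacement_value(column_b, name))
```

The mean of the three observed values is 41/3, which agrees with the 13.666667 shown for column B in the mean example on this page.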
Parameters:
value : ImputerMethod or scalar or dict, optional
Specifies the value to use in place of missing values.
- If an ImputerMethod is specified, that method is used for all missing values.
- If a scalar is specified, that value is substituted for all missing values.
- If a dict is specified, the keys are column names and the values are the substitution values for those columns (which may also be ImputerMethod instances).
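The three accepted forms of value can be pictured with the following sketch in plain Python. It is an illustration under stated assumptions, not pipefitter's code: the functions `resolve` and `impute` are hypothetical, a table is modeled as a dict of column name to list, None stands in for a missing value, and a plain callable such as `statistics.mean` stands in for an ImputerMethod.

```python
import statistics


def resolve(value, column, values):
    """Pick the replacement for one column from a scalar, callable, or dict."""
    if isinstance(value, dict):
        if column not in value:
            return None  # column not listed: leave its missing values alone
        value = value[column]
    if callable(value):
        # Stand-in for an ImputerMethod: compute a statistic from the
        # observed (non-missing) values of this column.
        return value([v for v in values if v is not None])
    return value  # scalar: use it as-is


def impute(table, value):
    """Return a copy of table (column name -> list) with None entries filled."""
    result = {}
    for column, values in table.items():
        fill = resolve(value, column, values)
        result[column] = [fill if v is None else v for v in values]
    return result


# Numeric columns A, B and E from the sample data set on this page.
table = {
    "A": [1.0, 6.0, 11.0, 16.0, None],
    "B": [2.0, None, None, 17.0, 22.0],
    "E": [5.0, None, None, 20.0, None],
}

constant = impute(table, 100)                       # scalar for every column
by_mean = impute(table, statistics.mean)            # one method for every column
mixed = impute(table, {"A": 1, "B": statistics.mean})  # per-column dict
```

In the dict case, column E is not listed, so its missing entries are left untouched, which mirrors the NaN values that remain in column E in the last example on this page.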
Examples
Sample data set used for imputing examples:
>>> data.head()
      A     B     C     D     E  F  G  H
0   1.0   2.0   3.0   4.0   5.0  a  b  c
1   6.0   NaN   8.0   9.0   NaN  j  e  f
2  11.0   NaN  13.0  14.0   NaN     h  i
3  16.0  17.0  18.0   NaN  20.0  j     l
4   NaN  22.0  23.0  24.0   NaN     n  o
Impute values using the mean:
>>> meanimp = Imputer(Imputer.MEAN)
>>> newdata = meanimp.transform(data)
>>> newdata.head()
      A          B     C      D     E  F  G  H
0   1.0   2.000000   3.0   4.00   5.0  a  b  c
1   6.0  13.666667   8.0   9.00  12.5  j  e  f
2  11.0  13.666667  13.0  14.00  12.5     h  i
3  16.0  17.000000  18.0  12.75  20.0  j     l
4   8.5  22.000000  23.0  24.00  12.5     n  o
Impute values using the mode:
>>> modeimp = Imputer(Imputer.MODE)
>>> newdata = modeimp.transform(data)
>>> newdata.head()
      A     B     C     D     E  F  G  H
0   1.0   2.0   3.0   4.0   5.0  a  b  c
1   6.0   2.0   8.0   9.0   5.0  j  e  f
2  11.0   2.0  13.0  14.0   5.0  j  h  i
3  16.0  17.0  18.0   4.0  20.0  j  b  l
4   1.0  22.0  23.0  24.0   5.0  j  n  o
Impute a constant value:
>>> cimp = Imputer(100)
>>> newdata = cimp.transform(data)
>>> newdata.head()
       A      B     C      D      E  F  G  H
0    1.0    2.0   3.0    4.0    5.0  a  b  c
1    6.0  100.0   8.0    9.0  100.0  j  e  f
2   11.0  100.0  13.0   14.0  100.0     h  i
3   16.0   17.0  18.0  100.0   20.0  j     l
4  100.0   22.0  23.0   24.0  100.0     n  o
Impute values in specified columns:
>>> dimp = Imputer({'A': 1, 'B': 100, 'F': 'none', 'G': 'miss'})
>>> newdata = dimp.transform(data)
>>> newdata.head()
      A      B     C     D     E     F     G  H
0   1.0    2.0   3.0   4.0   5.0     a     b  c
1   6.0  100.0   8.0   9.0   NaN     j     e  f
2  11.0  100.0  13.0  14.0   NaN  none     h  i
3  16.0   17.0  18.0   NaN  20.0     j  miss  l
4   1.0   22.0  23.0  24.0   NaN  none     n  o
__init__(value=mean)
Methods
__init__([value])
get_combined_params(*args, **kwargs)  Merge all parameters and verify that they are valid
get_filtered_params(*args, **kwargs)  Merge parameters whose keys belong to self
get_param(*names)                     Return a copy of the requested parameters
get_params(*names)                    Return a copy of the requested parameters
has_param(name)                       Does the parameter exist?
set_param(*args, **kwargs)            Set one or more parameters
set_params(*args, **kwargs)           Set one or more parameters
transform(table[, value])             Perform the imputation on the given data set
Attributes
MAX            Constant that indicates the maximum data value of the column
MEAN           Constant that indicates the mean data value of the column
MEDIAN         Constant that indicates the median data value of the column
MIDRANGE       Constant that indicates the midrange data value of the column
MIN            Constant that indicates the minimum data value of the column
MODE           Constant that indicates the mode data value of the column
RANDOM         Constant that indicates that random data should be used
param_defs
static_params