The built-in mlxtend.feature_selection

Apr 19, 2020

The built-in mlxtend.feature_selection module does not provide a way to access the p-values of the significant features at each step. However, the score of the chosen evaluation measure (R-squared) at each step can be accessed using the get_metric_dict() method of the SequentialFeatureSelector object.

Alternatively, the p-values themselves can be printed at each step by adding a simple print statement to the significance comparison block of the user-defined functions: print("p-value of added feature - {} is {}".format(new_pval.idxmin(), min_p_value)) in forward_selection(), and print("p-value of dropped feature - {} is {}".format(p_values.idxmax(), max_p_value)) in backward_elimination(). For example:

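import pandas as pd
import statsmodels.api as sm

def forward_selection(data, target, significance_level=0.05):
    initial_features = data.columns.tolist()
    best_features = []
    while len(best_features) < len(initial_features):
        # Candidates not yet selected, kept in column order for reproducible results
        remaining_features = [f for f in initial_features if f not in best_features]
        # Explicit float dtype avoids an object-typed empty Series in recent pandas
        new_pval = pd.Series(index=remaining_features, dtype=float)
        for new_column in remaining_features:
            # Fit OLS on the features chosen so far plus one candidate
            model = sm.OLS(target, sm.add_constant(data[best_features + [new_column]])).fit()
            new_pval[new_column] = model.pvalues[new_column]
        min_p_value = new_pval.min()
        if min_p_value < significance_level:
            # Report and keep the candidate with the smallest p-value
            print("p-value of added feature - {} is {}".format(new_pval.idxmin(), min_p_value))
            best_features.append(new_pval.idxmin())
        else:
            break
    return best_features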

Written by vikashraj luhaniwal