The built-in mlxtend.feature_selection module does not provide a way to access the p-values of the significant features at each step. However, the score of the chosen evaluation measure (R-squared) at each step can be accessed using the get_metric_dict() method of the SequentialFeatureSelector object.
Alternatively, the p-values can be printed at each step by adding a simple print("p-value of added feature - {} is {}".format(new_pval.idxmin(), min_p_value)) statement to the significance comparison block of the user-defined forward_selection() function, and a print("p-value of dropped feature - {} is {}".format(p_values.idxmax(), max_p_value)) statement to the same block of the backward_elimination() function, for example:
import pandas as pd
import statsmodels.api as sm

def forward_selection(data, target, significance_level=0.05):
    initial_features = data.columns.tolist()
    best_features = []
    while len(initial_features) > 0:
        remaining_features = list(set(initial_features) - set(best_features))
        # p-value of each candidate feature when added to the current model
        new_pval = pd.Series(index=remaining_features, dtype=float)
        for new_column in remaining_features:
            model = sm.OLS(target, sm.add_constant(data[best_features + [new_column]])).fit()
            new_pval[new_column] = model.pvalues[new_column]
        min_p_value = new_pval.min()
        if min_p_value < significance_level:
            print("p-value of added feature - {} is {}".format(new_pval.idxmin(), min_p_value))
            best_features.append(new_pval.idxmin())
        else:
            # Also reached when no candidates remain (min of an empty Series is NaN)
            break
    return best_features