Feature selection is essentially a model selection problem. If we take a frequentist maximum likelihood approach, we will, in the limit, select all features, since adding a feature can never decrease the maximized training likelihood (unless, as is typical, we apply some sort of “early stopping” criterion). Additionally, if we choose the next feature to select solely on the basis of standard measures such as likelihood gain, we fail to account for the variance of the estimate of this fe...
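
As a rough illustration of the first point, the following Python sketch (not taken from the original text) runs greedy forward selection on synthetic data, scoring each candidate by the maximized Gaussian log-likelihood of a least-squares fit. The function names, the synthetic data, and the `min_gain` stopping threshold are all assumptions introduced here for illustration. Because the candidate models are nested, the likelihood gain is never negative, so a stopping rule of "stop when the gain is not positive" effectively never fires.

```python
import numpy as np


def gaussian_loglik(X, y):
    """Maximized Gaussian log-likelihood of a least-squares fit of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / n  # maximum likelihood estimate of the noise variance
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)


def greedy_forward_selection(X, y, min_gain=0.0):
    """Repeatedly add the feature with the largest likelihood gain.

    `min_gain` is a hypothetical early-stopping threshold: selection stops
    once the best available gain is <= min_gain.
    """
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected, gains = [], []
    remaining = list(range(p))
    current = gaussian_loglik(intercept, y)
    while remaining:
        # Score every candidate by the log-likelihood of the enlarged model.
        scores = {
            j: gaussian_loglik(np.hstack([intercept, X[:, selected + [j]]]), y)
            for j in remaining
        }
        best = max(scores, key=scores.get)
        gain = scores[best] - current
        if gain <= min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        gains.append(gain)
        current = scores[best]
    return selected, gains


# Synthetic data (an assumption): only features 0 and 1 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

selected, gains = greedy_forward_selection(X, y)
print("selection order:", selected)
print("likelihood gains:", np.round(gains, 4))
```

With `min_gain=0.0` the procedure keeps adding the eight pure-noise features, each with a small but positive gain. Raising `min_gain` acts as a crude early-stopping criterion, although the gain alone still says nothing about how reliably each feature's contribution has been estimated.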