Statistical Tests for Optimization Efficiency
Abstract
Learning problems, such as logistic regression, are typically formulated as pure optimization problems defined on some loss function. We argue that this view ignores the fact that the loss function depends on stochastically generated data, which in turn determines an intrinsic scale of precision for statistical estimation. By considering the statistical properties of the update variables used during the optimization (e.g., gradients), we can construct frequentist hypothesis tests to determine the reliability of these updates. We use subsets of the data to compute updates, and use the hypothesis tests to determine when the batch size needs to be increased. This provides computational benefits and avoids overfitting by stopping when the batch size has become equal to the size of the full dataset. Moreover, the proposed algorithms depend on a single interpretable parameter (the probability for an update to be in the wrong direction), which is set to a single value across all algorithms and datasets. In this paper, we illustrate these ideas on three L1-regularized coordinate descent algorithms: L1-regularized L2-loss SVMs, L1-regularized logistic regression, and the Lasso, but we emphasize that the underlying methods are much more generally applicable.
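To make the batch-size test concrete, here is a minimal sketch of one way such a test could look for a single coordinate update. It assumes that the partial derivative is an average of per-example contributions, that those contributions are pre-shuffled so a prefix is a random subset, and that the mini-batch mean is approximately normal (a CLT approximation). The names `wrong_direction_prob` and `choose_batch_size`, the default `eps=0.05`, and the doubling schedule are hypothetical illustrations, not the paper's exact procedure (which may, for example, use a different test statistic or a finite-population correction).

```python
import math
from statistics import mean, stdev


def wrong_direction_prob(per_example_grads):
    """Estimate P(sign of the batch mean differs from the full-data sign).

    Assumes a normal approximation to the batch mean (hypothetical choice).
    """
    n = len(per_example_grads)
    m = mean(per_example_grads)
    se = stdev(per_example_grads) / math.sqrt(n)  # standard error of the mean
    if se == 0.0:
        # No spread: the sign is certain unless the mean is exactly zero.
        return 0.0 if m != 0.0 else 0.5
    z = abs(m) / se
    # Phi(-z), the normal lower tail, computed via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def choose_batch_size(per_example_grads, batch_size, eps=0.05, growth=2):
    """Grow the batch until the update passes the test or the data runs out.

    Returns (batch_size, passed). passed=False at full size is the signal
    a caller could use to stop optimizing (hypothetical interface).
    """
    if batch_size < 2:
        raise ValueError("need at least two examples to estimate a standard error")
    n_total = len(per_example_grads)
    batch_size = min(batch_size, n_total)
    while True:
        p = wrong_direction_prob(per_example_grads[:batch_size])
        if p <= eps:
            return batch_size, True
        if batch_size == n_total:
            # The test still fails on the full dataset: stop.
            return batch_size, False
        batch_size = min(n_total, batch_size * growth)
```

For example, with contributions `[1.0, -1.0, 1.0, -1.0] + [1.0] * 96` and a starting batch of 4, the first two batches are too noisy to trust the sign, and the test first passes at a batch of 16. With pure noise such as `[1.0, -1.0] * 10`, the batch grows to the full 20 examples and the test still fails, which corresponds to the stopping condition described in the abstract.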
Similar resources
Experimental investigation, modeling, and optimization of combined electro-(fenton/coagulation/flotation) process: design of experiments and artificial intelligence systems
In this study, a combined electro-(Fenton/coagulation/flotation) (EF/EC/El) process was studied via degradation of Disperse Orange 25 (DO25) organic dye as a case study. Influences of seven operational parameters on the dye removal efficiency (DR%) were measured: initial pH of the solution (pH0), applied voltage between the anode and cathode (V), initial ferrous ion concentration (CFe), initial...
Optimization and increase production and efficiency of gas turbines GE-F9 using Media evaporative cooler in Fars combined cycle power plant
Gas turbines play an important role in supplying power for the country, especially at peak electricity load. Their main disadvantage is that their output changes considerably with climate: in warm months, which coincide with peak grid demand, the amount of power gas turbines can produce is considerably reduced by the high ambient temperature. The method...
Statistical Comparison of Classifiers for Multi-objective Feature Selection in Instrument Recognition
Many published articles in automatic music classification deal with the development and experimental comparison of algorithms; however, the final statements are often based on figures and simple statistics in tables, and only a few related studies apply proper statistical testing for a reliable discussion of results and measurements of the propositions’ significance. Therefore, we provide two simpl...
A statistical test for outlier identification in data envelopment analysis
In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these “deterministic” frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the prese...
An empirical study on statistical analysis and optimization of EDM process parameters for inconel 718 super alloy using D-optimal approach and genetic algorithm
Among the several non-conventional processes, electrical discharge machining (EDM) is the most widely and successfully applied for the machining of conductive parts. In this technique, the tool has no mechanical contact with the workpiece, and the hardness of the workpiece has no effect on the machining pace. Hence, this technique could be employed to machine hard materials such as super allo...