On Learning Decision Heuristics
Abstract
This document provides details on the random forest implementation and describes the 56 public data sets used in the empirical analysis.

Random forest. We used the implementation of random forests in the R package randomForest (Liaw and Wiener, 2002). Typically, the only parameter tuned when using random forests is mtry, which specifies how many attributes are randomly selected for consideration when splitting a branch (Hastie et al., 2009). We used 10-fold cross-validation to find the best value of mtry from the range 1, 2, ..., k, where k is the number of available attributes.

A second important parameter is ntree, which specifies the number of trees built. The default setting in the package (and the one used most frequently in the literature) is 500. We repeated our experiments with ntree set to 500 and to 1,000 and observed no difference; we report results with ntree set to 1,000.

Random forests also have parameters that control how each individual tree is built. These parameters are typically not tuned in the literature, and we used the package defaults.

In his original paper on random forests, Breiman (2001) recommended setting mtry to ⌊√k⌋, where k is the number of attributes. We tried this value as well. With few exceptions, the performance on individual learning curves either declined or remained the same, consistent with the earlier study by Fernández-Delgado et al. (2014). We report only the results obtained when setting mtry using cross-validation.

1. Data Sets

AFL. Objects: 41 Australian Football League (AFL) games at the Melbourne Cricket Ground in 1993 and 1994. Criterion: attendance. Attributes: forecasted maximum temperature on the day of the game, total attendance at other AFL games in Melbourne and Geelong on the day of the game, total membership in the two clubs whose teams were playing, number of players in the top 50 who participated in the game, and number of days since the earliest game of the season.
Source: This data set was assembled by Rowan Todd and Mark McNaughton for a class project at the University of Queensland in a statistics course taught by Margaret Mackisack. The data sources were The Football Bible ’94 by Rex Hunt, The Weekend Australian, Inside Football, and Football Record. The data set is
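The mtry selection procedure described above (choose the candidate value with the best mean 10-fold cross-validated accuracy, with Breiman's ⌊√k⌋ as the conventional default) can be sketched as follows. The experiments themselves use the R randomForest package; this Python sketch uses hypothetical per-fold accuracies and illustrates only the selection logic, not the forest training.

```python
import math

def breiman_default_mtry(k):
    # Breiman (2001): consider floor(sqrt(k)) attributes at each split,
    # where k is the total number of attributes.
    return math.floor(math.sqrt(k))

def select_mtry(cv_scores):
    # cv_scores maps each candidate mtry value to its list of per-fold
    # accuracies; return the candidate with the highest mean accuracy.
    return max(cv_scores, key=lambda m: sum(cv_scores[m]) / len(cv_scores[m]))

# Hypothetical per-fold accuracies for a data set with k = 9 attributes
# (illustrative numbers only, not results from the paper).
scores = {1: [0.70, 0.72], 2: [0.75, 0.74], 3: [0.78, 0.76], 9: [0.71, 0.73]}

print(breiman_default_mtry(9))  # → 3
print(select_mtry(scores))      # → 3
```

Here the cross-validated choice happens to coincide with Breiman's default; in the experiments reported above, tuning over the full range 1, 2, ..., k generally matched or beat the fixed ⌊√k⌋ setting.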