LOGIT Modelling
Abstract
The main feature of the CoIL challenge data is that the observed response is discrete: we only observe whether or not a customer buys caravan insurance. Models for such data, in particular LOGIT and PROBIT, were developed in the 1940s and 1950s in the biostatistics literature (see the references in Cox and Snell, 1989). In the 1970s these techniques became popular in economics, where a utility-maximizing interpretation was added. By now they are part of the standard statistical toolbox.

LOGIT models assume that an underlying, unobserved continuous variable determines the response. This latent variable is modelled as a linear function of observed characteristics plus a random term. Together these determine the predicted probability of a particular response ("buy" or "not buy"). The coefficients on the observed characteristics in the linear function are estimated by maximum likelihood.

Our objective was to make a submission with as little effort as possible, which is why LOGIT modelling was adopted. The main problem we faced was the large number of poorly documented attributes (to our benefit, one of the authors is a native Dutch speaker). The other problem, which is common in marketing data, is the low incidence of positive outcomes.

The basic procedure was standard LOGIT analysis as implemented in version 10 beta of PcGive (Hendry and Doornik, 1999). Because of the large number of attributes in the data set, this was complemented with the PcGets automatic search procedure. Roughly, the following steps were taken:

1) LOGIT modelling of CARAVAN using the whole data set, to get a feel for the data. Splitting attributes according to their value gave potential improvements for PPERSAUTO, APERSAUTO, PBRAND and PPLEZIER.
2) The product ownership attributes (44-85) were regressed on all zip-code attributes (1-43). These regressions showed that the zip-code attributes did not have much explanatory power.
3) PcGets (Hendry and Krolzig, 1999) was used to reduce the model. Strictly speaking, using regression methods when the dependent variable is discrete is incorrect, but it was found to be very helpful.
4) Starting from the most general LOGIT model, the model was simplified, partially guided by the PcGets results. With hindsight, we found that the model was over-simplified for forecasting purposes; this might be an avenue for future research.

Note that the first two steps are based on standard statistical procedures, while the remaining two steps involve machine learning methods (for example, general-to-specific model search as implemented in PcGets). The model (see figure 1) was used to select 800 records from the evaluation data by ranking customer records according to their predicted probability of purchasing caravan insurance.
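The latent-variable model and the final ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the PcGive or PcGets procedure: the data are synthetic, the two regressors, their coefficients and all function names (`fit_logit`, `select_top`, `simulate`) are hypothetical, and plain gradient ascent stands in for whatever optimizer the original software uses to maximize the likelihood.

```python
import math
import random


def sigmoid(z):
    """Logistic CDF, written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(beta, row):
    """Predicted probability of a positive response for one record."""
    xs = [1.0] + list(row)  # prepend the intercept term
    return sigmoid(sum(b * x for b, x in zip(beta, xs)))


def fit_logit(X, y, steps=500, lr=1.0):
    """Maximize the mean log-likelihood by gradient ascent (an assumption,
    chosen for brevity). Returns [intercept, b1, ..., bk]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            xs = [1.0] + list(row)
            resid = target - predict_proba(beta, row)
            for j in range(k + 1):
                grad[j] += resid * xs[j]  # score contribution of this record
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def select_top(beta, X, k):
    """Indices of the k records with the highest predicted probability."""
    order = sorted(range(len(X)),
                   key=lambda i: predict_proba(beta, X[i]),
                   reverse=True)
    return order[:k]


def simulate(n, rng):
    """Synthetic data: the response is 1 when a latent index
    (linear function plus logistic noise) is positive."""
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise = math.log(u / (1.0 - u))  # standard logistic draw
        latent = -3.0 + 1.5 * x1 - 1.0 * x2 + noise  # hypothetical coefficients
        X.append((x1, x2))
        y.append(1 if latent > 0 else 0)
    return X, y


rng = random.Random(0)
X, y = simulate(1000, rng)
beta = fit_logit(X, y)
chosen = select_top(beta, X, 100)  # analogous to picking 800 evaluation records
hit_rate = sum(y[i] for i in chosen) / len(chosen)
base_rate = sum(y) / len(y)
print("coefficients:", [round(b, 2) for b in beta])
print("base rate:", round(base_rate, 3), "hit rate in selection:", round(hit_rate, 3))
```

The negative intercept in the simulation mimics the low incidence of positive outcomes noted in the abstract. Ranking by predicted probability only needs the ordering of the fitted values, so the selection is unchanged by any monotone transformation of the fitted index.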