Constrained linear regression models for symbolic interval-valued variables

Authors

  • Eufrasio de Andrade Lima Neto
  • Francisco de A. T. de Carvalho
Abstract

This paper introduces an approach to fitting a constrained linear regression model to interval-valued data. Each example in the learning set is described by a feature vector in which every feature value is an interval. The new approach fits a constrained linear regression model to the midpoints and ranges of the interval values taken by the variables in the learning set. The lower and upper boundaries of the interval value of the dependent variable are predicted from its midpoint and range, which are in turn estimated from the fitted linear regression models applied to the midpoint and range of each interval value of the independent variables. The new method shows the importance of range information for prediction performance, as well as the use of inequality constraints to ensure mathematical coherence between the predicted values of the lower (ŷLi) and upper (ŷUi) boundaries of the interval. The authors also propose an expression for a goodness-of-fit measure, the coefficient of determination. The proposed prediction method is assessed by estimating the average behavior of the root-mean-square error and of the square of the correlation coefficient in a Monte Carlo experiment with different data set configurations. Among other aspects, the synthetic data sets take into account the dependence, or lack thereof, between the midpoint and range of the intervals. The bias produced by imposing inequality constraints on the vector of parameters is also examined in terms of the mean-square error of the parameter estimates. Finally, the approaches proposed in this paper are applied to a real data set and their performances are compared. © 2009 Elsevier B.V. All rights reserved.
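The midpoint-and-range idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single interval-valued predictor, and it assumes that the inequality constraints are non-negativity constraints on the coefficients of the range model, which is one way to guarantee that the predicted lower boundary never exceeds the predicted upper boundary. The function names and the toy data are hypothetical.

```python
from statistics import mean


def ols_simple(x, y):
    """Unconstrained least squares for y ~ b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def sse(coefs, x, y):
    """Sum of squared residuals of the line with coefficients (b0, b1)."""
    b0, b1 = coefs
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def nonneg_simple(x, y):
    """Least squares for y ~ b0 + b1 * x subject to b0 >= 0 and b1 >= 0.

    Assumption of this sketch: with only two parameters, every active
    set is enumerated and the best feasible candidate is kept.
    Assumes x is not all zeros.
    """
    slope_through_origin = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    candidates = [
        ols_simple(x, y),             # neither constraint active
        (0.0, slope_through_origin),  # intercept fixed at zero
        (mean(y), 0.0),               # slope fixed at zero
        (0.0, 0.0),                   # both fixed at zero
    ]
    feasible = [c for c in candidates if c[0] >= 0 and c[1] >= 0]
    return min(feasible, key=lambda c: sse(c, x, y))


def fit_interval_model(x_intervals, y_intervals):
    """Fit one model on midpoints and a constrained one on ranges."""
    x_mid = [(lo + hi) / 2 for lo, hi in x_intervals]
    x_rng = [hi - lo for lo, hi in x_intervals]
    y_mid = [(lo + hi) / 2 for lo, hi in y_intervals]
    y_rng = [hi - lo for lo, hi in y_intervals]
    return ols_simple(x_mid, y_mid), nonneg_simple(x_rng, y_rng)


def predict_interval(model, x_interval):
    """Rebuild (lower, upper) from the predicted midpoint and range."""
    (a0, a1), (c0, c1) = model
    lo, hi = x_interval
    mid = a0 + a1 * (lo + hi) / 2
    rng = c0 + c1 * (hi - lo)  # non-negative because c0, c1, hi - lo >= 0
    return mid - rng / 2, mid + rng / 2


# Hypothetical toy data: (lower, upper) pairs for one predictor and the response.
x_intervals = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
y_intervals = [(2.0, 2.1), (3.5, 4.5), (5.0, 7.2), (6.5, 9.4)]

model = fit_interval_model(x_intervals, y_intervals)
print(predict_interval(model, (2.5, 5.0)))
```

On this toy data the unconstrained fit of the range model has a negative intercept, so a sufficiently narrow input interval would produce a negative predicted range and therefore a lower boundary above the upper one. The constrained fit rules this out, at the cost of some bias in the parameter estimates, which is the trade-off the abstract says is examined.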

Similar resources

Não preencher

Symbolic Data Analysis is concerned with data tables where the values in each cell are not single values but elements that express the variability of the records, e.g., intervals or histograms. Symbolic linear regression aims at investigating the linear relationship between histogram or interval-valued variables. In this paper, we study two real data problems: in a first one, symbolic models ar...

Multiple Fuzzy Regression Model for Fuzzy Input-Output Data

A novel approach to the problem of regression modeling for fuzzy input-output data is introduced. In order to estimate the parameters of the model, a distance on the space of interval-valued quantities is employed. By minimizing the sum of squared errors, a class of regression models is derived based on the interval-valued data obtained from the $\alpha$-level sets of fuzzy input-output data. Then,...

Linear regression model with histogram-valued variables

Histogram-valued variables are a particular kind of variables studied in Symbolic Data Analysis where to each entity under analysis corresponds a distribution that may be represented by a histogram or by a quantile function. Linear regression models for this type of data are necessarily more complex than a simple generalization of the classical model: the parameters cannot be negative; still th...

Linear regression analysis for interval-valued data based on the Lasso technique

A new method for linear regression analysis of interval-valued data is proposed. In particular, the linear relationship between an interval-valued response variable and a set of interval-valued explanatory variables is investigated by considering two regression models, one for the midpoints (the locations of the intervals) of the response and explanatory variables and the other one for the radi...

A Suggested Approach for Stochastic Interval-Valued Linear Fractional Programming problem

In this paper, we consider a Stochastic Interval-Valued Linear Fractional Programming (SIVLFP) problem. In this problem, the coefficients and scalars in the objective function are fractional intervals, and the technological coefficients and the right-hand-side quantities of the constraints are random variables with a specific distribution. Here we changed a Stochastic Interval-Valued Fractiona...


Journal title:
  • Computational Statistics & Data Analysis

Volume 54  Issue

Pages  -

Publication date 2010