A review on consistency and robustness properties of support vector machines for heavy-tailed distributions
Authors
Abstract
Support vector machines (SVMs) belong to the class of modern statistical machine learning techniques and can be described as M-estimators with a Hilbert norm regularization term for functions. SVMs are consistent and robust for classification and regression if they are based on a Lipschitz continuous loss and a bounded continuous kernel with a dense reproducing kernel Hilbert space. For regression, one of the conditions used is that the output variable Y has a finite first absolute moment. This assumption, however, excludes heavy-tailed distributions. Recently, the applicability of SVMs was extended to these distributions by considering shifted loss functions. In this review paper, we briefly describe the approach of SVMs based on shifted loss functions and list some properties of such SVMs. We then prove that SVMs based on a bounded continuous kernel and on a convex and Lipschitz continuous, but not necessarily differentiable, shifted loss function have a bounded Bouligand influence function for all distributions, even for heavy-tailed distributions including extreme value distributions and Cauchy distributions. SVMs are thus robust in this sense. Our result covers two important loss functions, the ε-insensitive loss for regression and the pinball loss for quantile regression, which were not covered by earlier results on the influence function. We demonstrate the usefulness of SVMs for heavy-tailed distributions by applying them to a simulated data set with Cauchy errors and to a data set of large fire insurance claims from Copenhagen Re.
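To make the shifted-loss idea concrete, here is a minimal Python sketch. It assumes the usual definition L*(y, t) = L(y, t) - L(y, 0) from the related SVM literature (the abstract does not spell it out), and all helper names (eps_insensitive, pinball, shifted, cauchy_sample, within_lipschitz_bound) and parameter values are hypothetical. It checks numerically the fact the approach relies on: if L is Lipschitz in its prediction argument t with constant |L|_1, then |L*(y, t)| <= |L|_1 |t| for every y, so E|L*(Y, f(X))| <= |L|_1 E|f(X)| stays finite even when Y is Cauchy and E|Y| is infinite.

```python
import math
import random
from functools import partial


def eps_insensitive(y: float, t: float, eps: float = 0.1) -> float:
    """epsilon-insensitive loss max(0, |y - t| - eps); Lipschitz in t with constant 1."""
    return max(0.0, abs(y - t) - eps)


def pinball(y: float, t: float, tau: float = 0.5) -> float:
    """Pinball loss for the tau-quantile; Lipschitz in t with constant max(tau, 1 - tau)."""
    r = y - t
    return tau * r if r >= 0 else (tau - 1.0) * r


def shifted(loss):
    """Return the shifted loss L*(y, t) = L(y, t) - L(y, 0) (assumed definition)."""
    return lambda y, t: loss(y, t) - loss(y, 0.0)


def cauchy_sample(n: int, seed: int = 0) -> list[float]:
    """Standard Cauchy draws via the inverse CDF tan(pi * (u - 1/2))."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def within_lipschitz_bound(l_star, lip: float, ys: list[float], t: float) -> bool:
    """Check |L*(y, t)| <= lip * |t| for every y, allowing for float rounding."""
    return all(abs(l_star(y, t)) <= lip * abs(t) + 1e-9 * (1.0 + abs(y)) for y in ys)


if __name__ == "__main__":
    ys = cauchy_sample(10_000)
    t = 1.5  # hypothetical fixed prediction value f(x)
    cases = [
        ("eps-insensitive (eps=0.1)", eps_insensitive, 1.0),
        ("pinball (tau=0.9)", partial(pinball, tau=0.9), 0.9),
    ]
    for name, loss, lip in cases:
        l_star = shifted(loss)
        # The raw loss has no finite expectation under Cauchy Y.
        raw_mean = sum(loss(y, t) for y in ys) / len(ys)
        # The shifted loss is bounded by lip * |t| for every y.
        star_mean = sum(l_star(y, t) for y in ys) / len(ys)
        print(f"{name}: raw mean {raw_mean:.3g}, shifted mean {star_mean:.3g}, "
              f"bound holds: {within_lipschitz_bound(l_star, lip, ys, t)}")
```

Since a bounded kernel gives a bounded predictor (sup-norm of f at most the sup-norm of the kernel times the RKHS norm of f), the shifted risk is finite for every distribution of Y. The raw empirical mean, in contrast, is driven by the few most extreme Cauchy draws and has no finite population value to converge to.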
Similar resources
On consistency and robustness properties of Support Vector Machines for heavy-tailed distributions
Support Vector Machines (SVMs) are known to be consistent and robust for classification and regression if they are based on a Lipschitz continuous loss function and on a bounded kernel with a dense and separable reproducing kernel Hilbert space. These facts hold even in the regression context with unbounded output spaces if the target function f is integrable with respect to the marginal di...
Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM
Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the fold and functionality of the corresponding sequence. The largest class of repetitive subsequences is “Transposable Elements”, which has its own subclasses based on the structures of their contexts. Much research has been performed to critically determine the structure and function of repetitiv...
A Comparative Approximate Economic Behavior Analysis Of Support Vector Machines And Neural Networks Models
Qualitative Robustness of Support Vector Machines
Support vector machines have attracted much attention in theoretical and in applied statistics. Main topics of recent interest are consistency, learning rates and robustness. In this article, it is shown that support vector machines are qualitatively robust. Since support vector machines can be represented by a functional on the set of all probability measures, qualitative robustness is proven ...
Nonparametric Shewhart signed-rank control chart with variable sampling interval
A nonparametric control chart based on ranks is used for detecting changes in the median (mean). In this article, the signed-rank control chart is considered with a variable sampling interval. We compared the performance of the signed-rank chart with variable sampling interval (VSI-SR) to the signed-rank chart with fixed sampling interval (FSI-SR); the numerical results demonstrated that the VSI feature is very useful. Bakir[1] showed ...
Journal title:
- Adv. Data Analysis and Classification
Volume: 4, Issue:
Pages: -
Publication date: 2010