Discovery in Complex or Massive Datasets: Common Statistical Themes

Not recorded
Abstract


Related resources

Large Datasets Lead to Overly Complex Models: An Explanation and a Solution

This paper explores unexpected results that lie at the intersection of two common themes in the KDD community: large datasets and the goal of building compact models. Experiments with many different datasets and several model construction algorithms (including tree learning algorithms such as C4.5 with three different pruning methods, and rule learning algorithms such as C4.5RULES and RIPPER) s...


Automated discovery of relationships, models and principles in ecology

Ecological systems are the quintessential complex systems, involving numerous high-order interactions and non-linear relationships. The most commonly used statistical modelling techniques can hardly reflect the complexity of ecological patterns and processes. Finding hidden relationships in complex data is now possible through the use of massive computational power, particularly by means of Art...


Automated Detection of Terrorist Activities through Link Discovery within Massive Datasets

This paper describes link discovery technology that is designed to detect threat activities by extracting and piecing together transactional evidence from massive datasets that are composed mostly of noise and clutter. The approach is an integration of several innovative component technologies, including partial pattern matching, hypothesis evaluation and hypothesis merging.


Workshop on current challenges in statistical learning (11w5051)

In recent years, statistical learning has seen rapid growth within statistics and computer sciences. This growth has been driven primarily by the need to analyze data of complex structures and process massive amounts of data from scientific investigations. In a discovery process, statistical uncertainty is usually high, given the limited amount of information contained in the data. In gene func...


Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS.

Tandem mass spectrometry has been used increasingly for high-throughput analysis of complex protein samples. A major challenge lies in the consistent, objective and transparent analysis of the large amounts of data generated by such experiments and in their dissemination and publication. Here, we review currently available computational tools and discuss the need for statistical criteria in the...




Publication date: 2010