scale mining

Mining Traces of Large Scale Systems

2005

Christophe Cérin Michel Koskas

Large scale distributed computing infrastructure involves a high number of nodes, poor communication performance and continuously varying resources that are not available at all times. In this paper, we focus on the different tools available for mining traces of the activities of such architectures. We propose new techniques for fast management of a frequent i...

Large-Scale Image Mining with Flickr Groups

2015

Alexandru-Lucian Gînsca Adrian Popescu Hervé Le Borgne Nicolas Ballas Dinh-Phong Vo Ioannis Kanellos

The availability of large annotated visual resources, such as ImageNet, recently led to important advances in image mining tasks. However, the manual annotation of such resources is cumbersome. Exploiting Web datasets as a substitute or complement is an interesting but challenging alternative. The main problems to solve are the choice of the initial dataset and the noisy character of Web text-i...

Automated Pattern Mining with a Scale Dimension

1996

Jan M. Zytkow Robert Zembowicz

An important but neglected aspect of automated data mining is discovering patterns at different scale in the same data. Scale plays the role analogous to error. It can be used to focus the search for patterns on differences that exceed the given scale and to disregard those smaller. We introduce a discovery mechanism that applies to bi-variate data. It combines search for maxima and minima with s...

Large-Scale Multimedia Retrieval and Mining

2011

Rong Yan Benoit Huet Rahul Sukthankar

Recent years have witnessed an explosive growth of multimedia data due to higher processor speeds, faster networks, wider availability of high-capacity mass-storage devices, and the advent of cloud computing. Stimulated by current work in scalable machine learning, feature indexing and multimodal analysis techniques, researchers are increasingly interested in exploring challenges and new oppor...

Petabyte Scale Data Mining: Dream or Reality?

Journal: CoRR 2002

Alexander S. Szalay Jim Gray Jan vandenBerg

Science is becoming very data intensive. Today’s astronomy datasets with tens of millions of galaxies already present substantial challenges for data mining. In less than 10 years the catalogs are expected to grow to billions of objects, and image archives will reach Petabytes. Imagine having a 100GB database in 1996, when disk scanning speeds were 30MB/s, and database tools were immature. Such...

Towards Web Search Engine Scale Data Mining

2009

Jian Pei

Data mining is one of the most critical driving technologies behind Web search engines. Web search engine scale data mining poses many grand challenges, ranging from efficiency and scalability to diversity and adaptability. In this talk, I will review our recent effort on mining a very large amount of data accumulated in one of the major commercial search engines. Particularly, we tackle the pr...

Mining Billion-Scale Graphs: Patterns and Algorithms

2012

Christos Faloutsos U Kang

Graphs are everywhere: social networks, the World Wide Web, biological networks, and many more. The sizes of graphs are growing at an unprecedented rate, spanning millions and billions of nodes and edges. What are the patterns in large graphs, spanning gigabytes, terabytes, and heading toward petabytes? What are the best tools, and how can they help us solve graph mining problems? How do we scale up algori...

Large Scale Data Mining: Challenges and Responses

1997

Jaturon Chattratichat John Darlington Moustafa Ghanem Yike Guo Harald Frank Hüning Martin Köhler Janjao Sutiwaraphun Hing Wing To Dan Yang

Data mining over large data sets is important due to its obvious commercial potential. However, it is also a major challenge due to its computational complexity. Exploiting the inherent parallelism of data mining algorithms provides a direct solution by utilising the large data retrieval and processing power of parallel architectures. In this paper, we present some results of our intensive rese...

Mining Tables from Large Scale HTML Texts

2000

Hsin-Hsi Chen Shih-Chung Tsai Jin-He Tsai

Tables are a very common presentation scheme, but few papers touch on table extraction in text data mining. This paper focuses on mining tables from large-scale HTML texts. Table filtering, recognition, interpretation, and presentation are discussed. Heuristic rules and cell similarities are employed to identify tables. The F-measure of table recognition is 86.50%. We also propose an algorithm to...

Large Scale Data Mining: The Challenges and The

1997

Jaturon Chattratichat John Darlington Moustafa Ghanem Yike Guo Janjao Sutiwaraphun Hing Wing To Dan Yang

Data mining over large data sets is considered to be a very important research subject due to its obvious commercial potential. However, it is also a major challenge due to its complexity and computational intensity. Exploiting the inherent parallelism of data mining algorithms provides a direct solution by utilising the large data retrieval and processing power of parallel architectures. In th...
