Empirical comparison of network sampling techniques
Authors
Abstract
In the past few years, the storage and analysis of large-scale and fast-evolving networks have presented a great challenge. Therefore, a number of different techniques have been proposed for sampling large networks. In general, network exploration techniques approximate the original networks more accurately than random node and link selection. Yet, link selection with an additional subgraph induction step outperforms most other techniques. In this paper, we also apply subgraph induction to random walk and forest-fire sampling. We analyze different real-world networks and the changes in their properties introduced by sampling. We compare several sampling techniques based on the match between the original networks and their sampled variants. The results reveal that the techniques with subgraph induction underestimate the degree and clustering distributions, while they overestimate the average degree and density of the original networks. Techniques without the subgraph induction step exhibit exactly the opposite behavior. Hence, the performance of the random selection techniques does not differ significantly from that of network exploration sampling, while clear differences exist between the techniques with the subgraph induction step and those without it.
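The difference between link selection with and without subgraph induction can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network generator, the 10% sample size, and all function names (`random_link_sample`, `induce`, `toy_network`, and so on) are assumptions made for illustration, and the printed numbers say nothing about the real-world networks studied in the paper.

```python
import random


def random_link_sample(edges, m, rng):
    """Random link selection: keep m links chosen uniformly at random."""
    return set(rng.sample(sorted(edges), m))


def induce(edges, nodes):
    """Subgraph induction: keep every original link whose two endpoints were both sampled."""
    return {(u, v) for (u, v) in edges if u in nodes and v in nodes}


def endpoints(edges):
    """Set of nodes touched by at least one link."""
    return {u for e in edges for u in e}


def average_degree(nodes, edges):
    return 2 * len(edges) / len(nodes) if nodes else 0.0


def density(nodes, edges):
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0


def toy_network(n, p, rng):
    """Hypothetical test network: each possible undirected link is kept with probability p."""
    return {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}


rng = random.Random(0)  # fixed seed so the run is repeatable
original = toy_network(200, 0.05, rng)
sampled = random_link_sample(original, len(original) // 10, rng)  # assumed 10% sample
nodes = endpoints(sampled)
induced = induce(original, nodes)

rows = [
    ("original", endpoints(original), original),
    ("links only", nodes, sampled),
    ("links + induction", nodes, induced),
]
for name, ns, es in rows:
    print(f"{name:18s} nodes={len(ns):4d} links={len(es):5d} "
          f"avg degree={average_degree(ns, es):.2f} density={density(ns, es):.4f}")
```

Both sampled variants share the same node set, so the induced variant can only gain links relative to plain link selection. That is the mechanism by which induction raises the average degree and density of a sample.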
Similar resources
Network Planning Using Iterative Improvement Methods and Heuristic Techniques
The problem of minimum-cost expansion of power transmission network is formulated as a genetic algorithm with the cost of new lines and security constraints and Kirchhoff’s Law at each bus bar included. A genetic algorithm (GA) is a search or optimization algorithm based on the mechanics of natural selection and genetics. An applied example is presented. The results from a set of tests carried ...
Comparison of large networks with sub-sampling strategies
Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide...
Daily Pan Evaporation Modelling With ANFIS and NNARX
Evaporation, as a major component of the hydrologic cycle, plays a key role in water resources development and management in arid and semi-arid climatic regions. Although there are empirical formulas available, their performances are not all satisfactory due to the complicated nature of the evaporation process and the data availability. This paper explores evaporation estimation methods based o...
Comparison of Random Walk Based Techniques for Estimating Network Averages
Function estimation on Online Social Networks (OSN) is an important field of study in complex network analysis. An efficient way to do function estimation on large networks is to use random walks. We can then defer to the extensive theory of Markov chains to do error analysis of these estimators. In this work we compare two existing techniques, Metropolis-Hastings MCMC and Respondent-Driven Sam...
Comprising the Empirical Equations of Runoff- Sediment Resulted from Sediment Rating Curves and Artificial Neural Network (Case Study: Ghadarkhosh Watershed, Ilam Province)
The availability of accurate data on transported sediment is an important factor in decisions about constructing river structures and determining dam life. To this end, a number of methods have been proposed, among which the sediment rating curve has been developed as a hydrological method. Ignoring differences between seasonal values lowers the pr...
Journal title:
- CoRR
Volume abs/1506.02449, Issue -
Pages -
Publication date 2015