Efficient Bulk Loading of Large High-Dimensional Indexes
Abstract
Efficient index construction in multidimensional data spaces is important for many knowledge discovery algorithms, because construction time typically must be amortized by performance gains in query processing. In this paper, we propose a generic bulk-loading method that allows user-defined split strategies to be applied during index construction, so that the index properties can be adapted to the requirements of a specific knowledge discovery algorithm. Because large data sets do not fit in main memory, the algorithm is based on external sorting. The split strategy can make its decisions from a sample of the data set, which is selected automatically. The sort algorithm is a variant of the well-known Quicksort algorithm, enhanced to work on secondary storage. Index construction has a runtime complexity of O(n log n). We show both analytically and experimentally that the algorithm outperforms traditional index construction methods by large factors.
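The idea of a bulk load driven by a pluggable, sample-based split strategy can be illustrated with a minimal sketch. This is not the paper's algorithm: it works entirely in main memory, whereas the paper partitions data on secondary storage with a Quicksort variant, and every name here (`bulk_load`, `max_variance_median_split`, `leaves`, `capacity`, `sample_size`) is a hypothetical choice made for illustration.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]
# A split strategy inspects a sample and returns (dimension, split value).
SplitStrategy = Callable[[Sequence[Point]], Tuple[int, float]]


def max_variance_median_split(sample: Sequence[Point]) -> Tuple[int, float]:
    """Hypothetical strategy: pick the dimension with the largest variance
    in the sample and split it at the sample median."""
    def variance(d: int) -> float:
        vals = [p[d] for p in sample]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)

    dim = max(range(len(sample[0])), key=variance)
    vals = sorted(p[dim] for p in sample)
    return dim, vals[len(vals) // 2]


def bulk_load(points: List[Point], capacity: int,
              strategy: SplitStrategy = max_variance_median_split,
              sample_size: int = 32, rng: random.Random = None):
    """Recursively partition `points` until each partition fits on one page.

    Returns a list of points (leaf page) or a dict (inner node).
    In-memory stand-in for an external, Quicksort-style partitioning.
    """
    rng = rng or random.Random(0)
    if len(points) <= capacity:
        return list(points)  # leaf page

    # The strategy decides from a sample only, not from the full partition.
    sample = rng.sample(points, min(sample_size, len(points)))
    dim, value = strategy(sample)

    # One Quicksort-like partitioning pass around the chosen split value.
    left = [p for p in points if p[dim] < value]
    right = [p for p in points if p[dim] >= value]

    if not left or not right:
        # Degenerate split (e.g. many duplicates): fall back to splitting
        # by position so that the recursion always makes progress.
        ordered = sorted(points, key=lambda p: p[dim])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        value = ordered[mid][dim]

    return {
        "dim": dim,
        "value": value,
        "left": bulk_load(left, capacity, strategy, sample_size, rng),
        "right": bulk_load(right, capacity, strategy, sample_size, rng),
    }


def leaves(node) -> Iterator[List[Point]]:
    """Yield every leaf page of the tree built by bulk_load."""
    if isinstance(node, list):
        yield node
    else:
        yield from leaves(node["left"])
        yield from leaves(node["right"])
```

A different index layout would be obtained by passing another function as `strategy`, for example one that always splits the dimension with the widest range; the partitioning loop itself does not change.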
Similar resources
Improving the Query Performance of High-Dimensional Index Structures Using Bulk-Load Operations
In this paper, we propose a new bulk-loading technique for high-dimensional indexes which represent an important component of multimedia database systems. Since it is very inefficient to construct an index for a large amount of data by dynamic insertion of single objects, there is an increasing interest in bulk-loading techniques. In contrast to previous approaches, our technique exploits a pri...
Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations
In this paper, we propose a new bulk-loading technique for high-dimensional indexes which represent an important component of multimedia database systems. Since it is very inefficient to construct an index for a large amount of data by dynamic insertion of single objects, there is an increasing interest in bulk-loading techniques. In contrast to previous approaches, our technique exploits a pri...
Efficient Bulk Operations on Dynamic R-Trees
In recent years there has been an upsurge of interest in spatial databases. A major issue is how to manipulate efficiently massive amounts of spatial data stored on disk in multidimensional spatial indexes (data structures). Construction of spatial indexes (bulk loading) has been studied intensively in the database community. The continuous arrival of massive amounts of new data makes it import...
Efficient Bulk Deletes for Multi Dimensionally Clustered Tables in DB2
In data warehousing applications, the ability to efficiently delete large chunks of data from a table is very important. This feature is also known as Rollout or Bulk Deletes. Rollout is generally carried out periodically and is often done on more than one dimension or attribute. The ability to efficiently handle the updates of RID indexes while doing Rollouts is a well known problem for databa...
Generation of High Efficient Quasi-Single-Cycle 3 and 6THZ Pulses using Multilayer Structures OH1/SiO2 and DSTMS/SiO2
We propose highly efficient terahertz (THz) multilayer structures composed of DSTMS/SiO2 and OH1/SiO2 at 3 and 6 THz. We show that the efficiencies of these structures are higher than that of the DAST/SiO2 structure at both 3 and 6 THz. The OH1/SiO2 structure at 6 THz has an efficiency as large as 10^-1; at 3 THz, the DSTMS/SiO2 structure has an efficiency as large as 10-...