Automatic Reclustering of Objects in Very Large Databases for High Energy Physics
Abstract
In the very large object database systems planned for some future particle physics experiments, typical physics analysis jobs will traverse millions of read-only objects, far more than fit in the database cache. Good clustering of objects on disk is therefore critical to database performance. We present the implementation and performance measurements of a prototype reclustering mechanism that was developed to optimise I/O performance under the changing access patterns in a high energy physics database. Reclustering is done automatically and on-line. The methods used by our prototype differ greatly from those commonly found in proposed general-purpose reclustering systems. By exploiting some special characteristics of the access patterns of physics analysis jobs, the prototype keeps database I/O throughput close to the optimum throughput of raw sequential disk access.
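To illustrate why clustering matters when the objects a job reads far outnumber the cache, the sketch below counts how many disk pages a job must read under two layouts. It is a generic illustration under simplifying assumptions (equal-sized objects, a fixed hypothetical `objects_per_page`, one read per distinct page), not the prototype's reclustering algorithm; the functions `pages_touched` and `recluster` are hypothetical names introduced here.

```python
def pages_touched(layout, wanted, objects_per_page):
    """Count distinct disk pages that must be read to fetch `wanted`.

    Assumption: all objects have the same size, and `layout` lists object
    ids in on-disk order, so the object at index i lives on page
    i // objects_per_page.
    """
    position = {obj: i for i, obj in enumerate(layout)}
    return len({position[obj] // objects_per_page for obj in wanted})


def recluster(layout, access_order):
    """Hypothetical reclustering step (not the paper's method): copy the
    objects a job reads into one contiguous region, in the order the job
    reads them, and keep every other object after that region."""
    wanted = set(access_order)
    rest = [obj for obj in layout if obj not in wanted]
    return list(access_order) + rest


# 10,000 objects stored 100 per page; a job reads every 100th object.
layout = list(range(10_000))
job = list(range(0, 10_000, 100))
print(pages_touched(layout, job, 100))             # 100: one page per object
print(pages_touched(recluster(layout, job), job, 100))  # 1: one contiguous page
```

With the original layout every object the job needs sits on a different page, so the job reads 100 pages to use 100 objects; after packing those objects together in access order, the same job reads a single page sequentially. This is the gap between scattered and near-sequential I/O that the abstract refers to.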
Similar resources
Reclustering of High Energy Physics Data
Upcoming high-energy physics experiments will store petabytes of data in object databases. Analysis jobs will frequently traverse collections containing millions of stored objects. Clustering is one of the most effective means to enhance the performance of these applications. This paper presents a reclustering algorithm for independent objects contained in multiple possibly overlapping coll...
Reclustering of HEP Data in Object-Oriented Databases
The Large Hadron Collider (LHC), built at CERN, will enter operation in 2005. The experiments at the LHC will generate some 5 PB of data per year, which are stored in an ODBMS. A good object clustering on the disk drives will be critical to achieving the high data throughput required by future analysis scenarios. This paper presents a new reclustering algorithm for HEP data that maximizes the read ...
Kohonen Self-Organizing Maps for Automatic Identification of Cartographic Objects
Automatic identification and localization of cartographic objects in aerial and satellite images have gained increasing attention in recent years in digital photogrammetry and remote sensing. Although the automatic extraction of man-made objects is in essence still an unresolved issue, man-made objects can be extracted from aerial photos and satellite images. Recently, the high-resolution s...
Data clustering research in CMS
The clustering of objects in an object database is the mapping of objects to locations on physical storage media like disk farms and tapes. The performance of the database, and the physics application on top of it, depends crucially on having a good match between the object clustering and the database access patterns of the physics application. We discuss the results and conclusions of a 3-year...
Clustering and Reclustering HEP Data in Object Databases
As part of the CMS contribution to the RD45 [1] collaboration, database clustering and reclustering have been under investigation for about 1.5 years. The clustering of objects in an object database is the mapping of objects to locations on physical storage media like disk farms and tapes. The performance of the database, and the physics application on top of it, depends crucially on having a g...