
Authors

  • Alexandre Savaris
  • Theo Härder
  • Aldo von Wangenheim
Abstract

Objective: To design, build, and evaluate a storage model able to manage heterogeneous DICOM images. The model must be simple, yet flexible enough to accommodate variable content without structural modifications; it must be effective in answering query/retrieval operations according to the DICOM standard; and it must provide performance gains in querying/retrieving content that justify its adoption by image-related projects.

Materials and Methods: The proposal adapts the original Decomposed Storage Model, incorporating structural and organizational characteristics present in DICOM image files. Tag values are stored according to their data types/domains, in a schema built on top of a standard RDBMS. The evaluation includes storing heterogeneous DICOM images, querying metadata using a variable number of predicates, and retrieving full-content images at different hierarchical levels.

Results and Discussion: When compared to a well-established DICOM image archive, the proposal is from 0.6 to 7.2 times slower in storing content; however, in querying individual tags, it is about 48.0% faster. In querying groups of tags, DCMDSM is outperformed in scenarios with a large number of tags and low selectivity (being 66.5% slower); however, when the number of tags is balanced with better-selectivity predicates, the performance gains reach up to 79.1%. In executing full-content retrieval, in turn, the proposal is about 48.3% faster.

Conclusion: DCMDSM is a model built for the storage of heterogeneous DICOM content, based on a straightforward database design. The results obtained through its evaluation attest to its suitability as a storage layer for projects where DICOM images are stored once and queried/retrieved whenever necessary.

INTRODUCTION

The Digital Imaging and Communications in Medicine (DICOM) standard, first released in 1985 as ACR/NEMA 300, comprises a set of non-proprietary specifications regarding structure, format, and exchange protocols for digital-based medical images.[1-2] Combining alphanumerical and binary content in the same image file, the standard defines a self-contained approach for data storage and communication, organized through a hierarchy composed of patient, study, series, and image levels.

Once acquired, DICOM images are stored according to the particular needs of the involved stakeholders. Storage policies vary from simple file persistence in ordinary file systems to the extraction and indexing of particular image attributes in metadata catalogs and/or databases, demanding full-content parsing in the former case and index lookups in the latter when executing query/retrieval operations.[3] Usually, simpler storage strategies demand complex and time-consuming routines for searching and retrieving content.

Aiming to contribute to reducing the time spent on both query and retrieval workloads, this work defines and evaluates a data model designed to provide full-content storage for DICOM images, as well as full-metadata indexing, allowing the execution of flexible search operations. Originally conceived to accept heterogeneous content, the proposal is well suited to managing images from different examination modalities and medical device manufacturers, and is able to boost query/retrieval through the extraction/indexing of attributes according to their data types/domains.

BACKGROUND AND SIGNIFICANCE

Structure and organization of DICOM image files

Physically, the content of a DICOM image file can be seen as structured at the attribute level and as semi-structured at the file level. At the lowest organizational level, tags identified by group/element ordered pairs represent attributes. DICOM tags are characterized by Value Representations (VRs) and Value Multiplicities (VMs), which specify content data types/domains, formatting rules, and the number of data elements allowed per tag.[4] The DICOM standard defines a data dictionary composed of a set of tags with reserved group/element identifiers, allowing its expansion through the use of proprietary tags.[5] At the file level, in turn, a DICOM image is structured as a set of tags. The number of tags in a file varies according to the availability of information during examination scheduling and execution, as well as the examination modality to be performed (e.g., Computed Tomography, Magnetic Resonance) and the medical device manufacturer.
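
As an illustration of this tag-level organization (not part of the original work), the following sketch lists the group/element pair, VR, VM, and value of each data element in a DICOM file; it relies on the pydicom library and a hypothetical file name, neither of which is mentioned in the paper:

    # Illustration only (not from the paper): inspect the group/element, VR, and VM
    # of each data element in a DICOM file using the pydicom library.
    import pydicom

    ds = pydicom.dcmread("example.dcm")  # hypothetical file name

    for elem in ds:
        if elem.tag.group == 0x7FE0:     # skip bulk pixel data for brevity
            continue
        print(
            f"({elem.tag.group:04X},{elem.tag.element:04X}) "  # group/element pair
            f"VR={elem.VR} VM={elem.VM} "                      # data type/domain and multiplicity
            f"{elem.keyword or '<proprietary>'} = {elem.value!r}"
        )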
Motivations for metadata indexing and adaptive, full-content storage

Although the image-specific data stored in DICOM files are the most relevant part of the standard-defined content, the accompanying metadata play an important role as a complement or as a self-contained dataset. As a complement, metadata can be used in searching for similar images based on attribute-value matching between a source image and an image database.[6-7] The DICOM standard itself specifies query (C-FIND) and retrieval (C-GET, C-MOVE) operations in terms of comparisons performed on key attribute values.[8] The prediction of compressed-image quality is also possible, through the evaluation of specific DICOM tags.[9] As a self-contained dataset, in turn, metadata can be used in calculating and monitoring the radiation dose levels to which patients are exposed, allowing the identification of relevant variations and level shifts.[10-12] Considering that the achievement of different goals implies the use of different subsets of data, storage strategies capable of managing heterogeneous and evolving content become a necessity. Current approaches to managing DICOM image data provide limited or even nonexistent support for content-variant datasets, which reduces their suitability for scenarios where new tags can become available over time.

Strategies for managing DICOM content

The DICOM structure and organization allows the adoption of a number of strategies for the execution of storage, query, and retrieval operations, according to constraints such as hardware and software availability, volumes of data to be managed, and usage contexts (e.g., image visualization and/or manipulation, image exchange, statistical and operational analysis over metadata). In the simplest approach, content storage relies on common file system architectures running on ordinary hardware; the semantically defined DICOM hierarchy is physically constructed using directories and subdirectories, translating the patient/study/series/image levels into a directory tree.[13] It is a low-cost option for quick setups, easing expansion in terms of storage capacity. However, deep content searches are quite restricted, demanding individual file-content parsing and imposing a significant overhead for high volumes of data. Improvements using combined techniques based on enhanced data models (e.g., Hierarchical Data Format, HDF; Network Common Data Format, NetCDF) and enhanced file systems (e.g., Parallel Virtual File System, PVFS) bring distribution and partitioning to the file system strategy.[14-16] The drawback of such approaches remains the search for specific attribute values, which still demands file-content parsing.
The lack of metadata for query/retrieval in file-system-based storage can be addressed through low- and high-level strategies, using extended file attributes and distributed metadata catalogs.[17-18] Both approaches improve on the hierarchical and distributed alternatives based on file systems, allowing searches using metadata comparisons. The restrictions of these strategies are directly related to which metadata are used, considering that the extended attributes and metadata catalogs are defined in terms of a subset of the available content.

Relational Database Management Systems (RDBMSs) are alternatives available for structured storage. In these systems, the hierarchical relationship between patients, studies, series, and images is usually implemented through joins performed among tables defined for each level of the hierarchy, improving search performance through the use of indexes.[19-21] The database schemas follow the horizontal model, characterized by a reduced number of tables with numerous fields per table. Such an organization maps the conceptual data model accordingly; however, it imposes restrictions regarding its use in heterogeneous use-case scenarios.

For fixed-structure datasets and applications designed for individual healthcare institutions and/or examination modalities, the approaches mentioned above are quite consistent. However, they lack flexibility, demanding maintenance whenever new data elements become available. In scenarios of dynamic content like DICOM, such fixed structures limit the data content and, consequently, the search capabilities.

MATERIALS AND METHODS

The Decomposed Storage Model

The Decomposed Storage Model (DSM) is a storage model based on the decomposition of relations from a conceptual schema into a set of simpler, binary relations. For each attribute originally defined in the conceptual schema, the model proposes creating a binary relation composed of a surrogate key and the attribute value (clustered on the surrogate key), and a binary relation with the same structure, but clustered on the attribute.[22] This storage model can be considered the predecessor of current column-oriented architectures, providing a number of improvements (e.g., optimization of per-column access, reduction/elimination of data sparsity) and known drawbacks (e.g., reduced performance in manipulating sets of correlated attributes, for both reading and writing operations, involving a large number of inter-table joins and database insertions) when compared to row-oriented architectures.[23] Due to its simplicity, the DSM storage model is suitable for extensions and customizations. In its original/modified form, the model is adopted in a number of scenarios, including database self-tuning, multi-tenant Software as a Service (SaaS) applications, and the management of semantic Web data and heterogeneous biomedical data.[24-27] In this work, the original DSM architecture is adapted to incorporate characteristics found in DICOM image files, aiming to provide a full-content storage model with performance gains for query/retrieval operations.
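
As a point of reference for the adaptations described next, the sketch below illustrates the original DSM decomposition; the relation and attribute names are hypothetical, and the code is an editorial illustration rather than material from the paper:

    # Illustration only: decompose a small conceptual relation into per-attribute
    # binary relations, as in the original DSM. Each attribute yields (surrogate, value)
    # pairs kept in two orders: clustered on the surrogate key and clustered on the value.
    rows = [                     # hypothetical conceptual relation
        (1, "P001", "CT"),       # (surrogate, PatientID, Modality)
        (2, "P002", "MR"),
        (3, "P003", "CT"),
    ]
    attributes = ["PatientID", "Modality"]

    decomposed = {}
    for position, attr in enumerate(attributes, start=1):
        pairs = [(row[0], row[position]) for row in rows]
        decomposed[attr] = {
            "by_key": sorted(pairs, key=lambda p: p[0]),    # clustered on the surrogate key
            "by_value": sorted(pairs, key=lambda p: p[1]),  # clustered on the attribute value
        }

    # A per-column lookup touches only one binary relation:
    ct_surrogates = [k for k, v in decomposed["Modality"]["by_value"] if v == "CT"]
    print(ct_surrogates)  # -> [1, 3]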
The proposal: DICOM Decomposed Storage Model

The DICOM Decomposed Storage Model (DCMDSM) proposed in this work is based on the original DSM, adapting its characteristics to a scenario of full-content storage for medical image data organized according to the DICOM standard, and expecting a 1:n storage-to-query/retrieval ratio. Assuming that a DICOM file is stored once and its content is queried/retrieved whenever necessary, the proposal focuses on enhancing query/retrieval to reduce its execution time, to the detriment of the storage execution time. The model differs from already known approaches in the following ways:

  • All standard and proprietary tags extracted from DICOM image files are stored/indexed, aiming to provide full flexibility in query construction and execution. This generalization allows managing content from heterogeneous examination modalities, as well as content acquired from devices of different manufacturers, without schema modifications;
  • Metadata access through Hierarchical Search Methods and Relational Queries, as defined by the DICOM standard, can be performed using predicates combining any unique/required/optional search key;
  • Content retrieval can be performed at the pixel data (image) level, or at the full-content (metadata + pixel data) level.

The proposal is built on top of a standard RDBMS and is physically centered on the hierarchical_key table. This table is responsible for the management of surrogate keys (a characteristic of the original DSM) and their binding to the four tag values that identify each level of the DICOM hierarchy (i.e., patientid, studyinstanceuid, seriesinstanceuid, sopinstanceuid). For each DICOM image file stored, a new record is inserted into the hierarchical_key table, generating a new surrogate key.

Tags extracted from DICOM image files at parsing time are stored in different tables, according to their VRs. This approach modifies the definition of the original DSM, which states that values for the same attribute must be stored in particular/exclusive tables, clustered by key and value. Mapping the DICOM structure to the original DSM definition would imply the creation of a physical model with more than 6,000 tables for standard tags alone, incurring schema modifications for each new standard/proprietary tag added over time. The vertical partitioning by VR used in this proposal is simpler and quite consistent, considering that VR definitions have undergone minimal changes since their adoption as part of the DICOM standard, allowing the inclusion of new tags without further modifications to the database schema. The model is further simplified by creating a single table per VR (instead of two tables), replacing the clustering on key/value with indexes.

Another difference between the proposal and the original DSM is the number of fields per table. While the conceptual DSM is based on binary relations (composed of the surrogate key and one attribute from the conceptual schema), the physical DCMDSM uses n-ary tables, with n varying according to each VR. The surrogate key field is used as a foreign key to the hierarchical_key table, allowing the establishment of a relationship between tag values and the hierarchical level identifiers related to a specific DICOM image file. For all VR tables, indexes are created on the primary key and on the group/element fields; textual and numerical VRs are also indexed by value. An excerpt extracted from the proposed model is presented in figure 1. The model is complemented by a table designed to store the whole content of each original DICOM image file, unmodified, aiming to simplify and improve the performance of the retrieval operation.
Although it is possible to fully rebuild a file from the individually stored tag values, empirical tests show that transposing the vertical database schema to the original, horizontal representation is a very time-consuming task, unfeasible for practical purposes.
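
The sketch below summarizes the table layout described above, using SQLite purely for illustration; apart from the hierarchical_key table and its four hierarchy identifiers, which are named in the text, the table, column, and index names (vr_pn, grp, element, original_file, and so on) are assumptions rather than the paper's actual schema. A sample C-FIND-style query shows how a study-level predicate on PatientName can be answered by filtering one VR table and joining back to hierarchical_key:

    # Illustration only (assumed names, SQLite used for convenience): a cut-down
    # version of the table layout described above.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE hierarchical_key (
        surrogate_key     INTEGER PRIMARY KEY,   -- one row per stored DICOM file
        patientid         TEXT,
        studyinstanceuid  TEXT,
        seriesinstanceuid TEXT,
        sopinstanceuid    TEXT
    );

    -- One table per VR (only the Person Name VR 'PN' is shown here).
    CREATE TABLE vr_pn (
        surrogate_key INTEGER REFERENCES hierarchical_key(surrogate_key),
        grp           INTEGER,                   -- tag group
        element       INTEGER,                   -- tag element
        value         TEXT
    );
    CREATE INDEX idx_vr_pn_tag   ON vr_pn (grp, element);
    CREATE INDEX idx_vr_pn_value ON vr_pn (value);  -- textual VRs are also indexed by value

    -- Unmodified original file content, kept for fast full-content retrieval.
    CREATE TABLE original_file (
        surrogate_key INTEGER REFERENCES hierarchical_key(surrogate_key),
        content       BLOB
    );
    """)

    # Hypothetical sample: one stored image and its decomposed PatientName (0010,0010) tag.
    conn.execute("INSERT INTO hierarchical_key VALUES (1, 'P001', '1.2.3', '1.2.3.4', '1.2.3.4.5')")
    conn.execute("INSERT INTO vr_pn VALUES (1, 0x0010, 0x0010, 'DOE^JOHN')")

    # A C-FIND-style study-level query: filter one VR table, then join back to the
    # hierarchy identifiers held in hierarchical_key.
    studies = conn.execute("""
        SELECT DISTINCT h.studyinstanceuid
        FROM hierarchical_key h
        JOIN vr_pn t ON t.surrogate_key = h.surrogate_key
        WHERE t.grp = ? AND t.element = ? AND t.value LIKE ?
    """, (0x0010, 0x0010, 'DOE^%')).fetchall()
    print(studies)  # -> [('1.2.3',)]

Full-content retrieval, in turn, would read the matching row of the assumed original_file table directly, avoiding the costly transposition noted above.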


Publication date: 2014