The Macromolecular Crystallographic Information File (mmCIF)
Abstract
Introduction
The Protein Data Bank (PDB) format provides a standard representation for macromolecular structure data derived from X-ray diffraction and NMR studies. This representation has served the community well since its inception in the 1970s (Bernstein et al. 1), and a large amount of software that uses it has been written. However, it is widely recognized that the current PDB format cannot adequately express the large amount of data (content) associated with a single macromolecular structure and the experiment from which it was derived in a way (context) that is consistent and permits direct comparison with other structure entries. Structure comparison, for such purposes as better understanding biological function, assisting in the solution of new structures, drug design, and structure prediction, becomes increasingly valuable as the number of macromolecular structures continues to grow at a near-exponential rate.
It could be argued that the required content of a structure submission could be met by additional PDB record types. However, the PDB format does not permit the level of consistency, accuracy, and reproducibility required for such a large body of data to be maintained automatically. Various approaches to improved scientific data representation are being explored (IEEE 2). The approach described here, which has been developed under the auspices of the International Union of Crystallography (IUCr), is to extend the Crystallographic Information File (CIF) data representation used for describing small-molecule structures and associated diffraction experiments. This extension is referred to as the macromolecular Crystallographic Information File (mmCIF) and is the subject of this paper. The paper briefly covers the history of mmCIF, its similarities to and differences from the PDB format, the contents of the mmCIF dictionary, and how to represent structures using mmCIF.
The mmCIF home page (mmCIF 3) contains a historical description of the development of the dictionary, current versions of the dictionary in text and HTML formats, software tools, archives of the mmCIF discussion list, and a detailed online tutorial (Bourne 4).
Background
CIF was developed by the International Union of Crystallography (IUCr) Working Party on Crystallographic Information, at the behest of the IUCr Commission on Crystallographic Data and the IUCr Commission on Journals, to describe small-molecule organic structures and the crystallographic experiment. The result of this effort was a core dictionary of data items sufficient for archiving the small-molecule crystallographic experiment and its results (Hall et al. 5 , IUCr …