Database design for ecologists: Composing core entities with observations
Abstract
Article history: Received 1 January 2007; received in revised form 11 June 2007; accepted 30 July 2007.

The ecoinformatics community recognizes that ecological synthesis across studies, space, and time will require new informatics tools and infrastructure. Recent advances have been encouraging, but many problems still face ecologists who manage their own datasets, prepare data for archiving, and search data stores for synthetic research. In this paper, we describe how work by the Canopy Database Project (CDP) might enable use of database technology by field ecologists: increasing the quality of database design, improving data validation, and providing structural and semantic metadata, all of which might improve the quality of data archives and thereby help drive ecological synthesis. The CDP has experimented with conceptual components for database design, called templates, to address information technology issues facing ecologists. Templates represent forest structures and observational measurements on these structures. Using our software, researchers select templates to represent their study's data and can generate normalized relational databases. Information hidden in those databases is used by ancillary tools, including data intake forms and simple data validation, data visualization, and metadata export. The primary question we address in this paper is: which templates are the right templates? We argue for defining simple templates (with relatively few attributes) that describe the domain's major entities, and for coupling those with focused and flexible observation templates. We present a conceptual model for the observation data type, and show how we have implemented the model as an observation entity in the DataBank database designer and generator. We show how our visualization tool CanopyView exploits metadata made explicit by DataBank to help scientists with analysis and synthesis.
We conclude by presenting future plans for tools to conduct statistical calculations common to forest ecology and to enhance data mining with DataBank databases. DataBank could be extended to another domain by replacing our forest-ecology-specific templates with those for the new domain. This work extends the basic computer science idea of abstract data types and user-defined types to ecology-specific database design tools for individual users, and applies to ecoinformatics the software engineering innovations of domain-specific languages, software patterns, components, refactoring, and end-user programming. © 2007 Elsevier B.V. All rights reserved.
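The idea of coupling a simple core-entity template with a separate observation template can be illustrated with a minimal sketch. This is not DataBank's implementation or API: the class names `EntityTemplate` and `ObservationTemplate`, the function `generate_database`, the table and column names, and the sample values are all hypothetical, chosen only to show how selected templates might be turned into normalized relational tables (here with Python's standard `sqlite3` module).

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class EntityTemplate:
    """Hypothetical core entity with deliberately few attributes."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)  # column -> SQL type

    def ddl(self) -> str:
        cols = [f"{self.name}_id INTEGER PRIMARY KEY"]
        cols += [f"{col} {sqltype}" for col, sqltype in self.attributes.items()]
        return f"CREATE TABLE {self.name} ({', '.join(cols)})"


@dataclass
class ObservationTemplate:
    """Hypothetical measurement on one entity, stored in its own table."""
    name: str
    entity: EntityTemplate
    value_type: str
    unit: str  # kept on the template as metadata, not as a column

    def ddl(self) -> str:
        fk = f"{self.entity.name}_id"
        return (
            f"CREATE TABLE {self.name} ("
            f"{self.name}_id INTEGER PRIMARY KEY, "
            f"{fk} INTEGER NOT NULL REFERENCES {self.entity.name}({fk}), "
            f"observed_on TEXT NOT NULL, "
            f"value {self.value_type} NOT NULL)"
        )


def generate_database(entities, observations):
    """Create an in-memory database with one table per selected template."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce entity/observation links
    for template in [*entities, *observations]:
        conn.execute(template.ddl())
    return conn


# Assumed example: a tree entity and a stem-diameter observation on it.
tree = EntityTemplate("tree", {"species": "TEXT", "plot": "TEXT"})
dbh = ObservationTemplate("tree_dbh", tree, "REAL", "cm")
conn = generate_database([tree], [dbh])

conn.execute("INSERT INTO tree (species, plot) VALUES ('Pseudotsuga menziesii', 'A1')")
conn.execute(
    "INSERT INTO tree_dbh (tree_id, observed_on, value) VALUES (1, '2007-06-11', 84.5)"
)
rows = conn.execute(
    "SELECT t.species, o.value FROM tree t JOIN tree_dbh o ON o.tree_id = t.tree_id"
).fetchall()
```

In this sketch the entity table stays small, and each new kind of measurement adds an observation table rather than widening the entity, which is one way to read the abstract's argument for simple entity templates coupled with focused observation templates.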
Journal: Ecological Informatics
Volume: 2
Publication date: 2007