data preparation

Search results for: data preparation

Number of results: 2555826

A Realistic Data Cleansing and Preparation Project

2012

Kwok-Bun Yue

Although data cleansing and preparation are significant tasks in many real-world data projects, they are rarely found in project assignments in IS database courses. This paper describes a pilot study of a relatively open-ended project assignment in a graduate database course. The project required the students to cleanse and prepare five datasets on educational statistics from United Nations Dat...

Construction of some new families of nested orthogonal arrays

2017

Tian-fang Zhang, Guobin Wu, Aloke Dey

Nested orthogonal arrays have been used in the design of an experimental setup consisting of two experiments, the expensive one of higher accuracy being nested in a larger and relatively less expensive one of lower accuracy. In this paper, we provide new methods of construction of two types of nested orthogonal arrays. MSC: 62K15

Studies of Some Problems in Nonparametric Inference

2008

S. Sathe

The asymptotic behavior of some nonparametric test criteria used in analysis of variance (one-way and two-way classification) under the Pitman type of alternative is considered. A step-down procedure is suggested for the bivariate location parameter problem. Wald's test is used for testing hypotheses in the categorical setup.

Corpora and Data Preparation for Information Extraction

1993

Lynn Carlson, Boyan A. Onyshkevych, Mary Ellen Okurowski

The data selection and data preparation efforts which led to the TIPSTER and Fifth Message Understanding Conference (MUC-5) corpora involved substantial effort, time and resources. The Government commitment to these selection and preparation efforts stems from four TIPSTER Program objectives: (1) to provide training data that would promote the development of information extraction technology, (...

Ontology-Driven Data Preparation for Association Mining

2008

Martin Zeman, Martin Ralbovský, Vojtěch Svátek, Jan Rauch

Ontologies can convey domain semantics to various phases of a KDD application through a mapping established between ontology entities and columns of the data matrix. The approach implemented in the Ferda tool focuses on providing support for the data preparation phase. Information about important data values and column groupings, once injected into a domain ontology, can be repeatedly used for ...

Data Preparation for Web Mining – A survey

2012

Amog Rajenderan

An accepted trend is to categorize web mining into three main areas: web content mining, web structure mining and web usage mining. Web content mining involves extracting details/information from the contents of webpages and performing things like knowledge synthesis. Web structure mining involves the usage of graph theory to understand website structure/hierarchy. Web usage mining involves the...

LinkedPipes ETL: Evolved Linked Data Preparation

2016

Jakub Klímek, Petr Skoda, Martin Necaský

As Linked Data gains traction, the proper support for its publication and consumption is more important than ever. Even though there is a multitude of tools for preparation of Linked Data, they are still either quite limited, difficult to use or not compliant with recent W3C Recommendations. In this demonstration paper, we present LinkedPipes ETL, a lightweight, Linked Data preparation tool. It...

Spatial Data Preparation for Knowledge Discovery

2005

Vania Bogorny, Paulo Martins Engel, Luis Otavio Alvares

There is a well known necessity to extract knowledge from spatial databases. Dozens of algorithms for data mining and knowledge discovery are reported in the specific literature to supply this necessity. However, these algorithms have some general drawbacks. Some consider only spatial data and others, only non-spatial data. Most are pseudo-codes, which are usually not implemented in toolkits, a...

BIR Pipeline for Preparation of Phylogenomic Data

2015

Surendra Kumar, Anders K Krabberød, Ralf S Neumann, Katerina Michalickova, Sen Zhao, Xiaoli Zhang, Kamran Shalchian-Tabrizi

Summary: We present a pipeline named BIR (Blast, Identify and Realign) developed for phylogenomic analyses. BIR is intended for the identification of gene sequences applicable for phylogenomic inference. The pipeline allows users to apply their own manually curated sequence alignments (seed) in search for homologous genes in sequence databases and available genomes. BIR automatically adds the id...

XML Source Preparation for Building Data Warehouses

2008

Yasser Hachaichi, Jamel Feki, Hanêne Ben-Abdallah

Faced with the high economic competition, today’s enterprises are forced to rely on decision support systems to assist them in the analysis of large data volumes. Traditionally, the analyzed data are mainly issued from the enterprise’s operational information system. However, due to the international nature of the competition, enterprises are increasingly pressed to explore other, external data...
