Analysis of Impact of stop words on domain specific document set

Authors

Swati Joshi

Dharmendra Sharma

Abstract

Data mining is a technique for evaluating raw data to discover hidden patterns in it. Real-world data is often unstructured, so data mining methods and techniques help extract meaningful content from this raw data. In this study, data mining techniques are used to find valuable content in text documents. Various text mining techniques have recently been developed to find more informative content in text directories, and these techniques give efficient results. However, finding the data relevant to a user query within a huge amount of content remains a complex problem in this domain. This work introduces a new text mining technique that returns text documents more relevant to the user query. The proposed text document retrieval methodology is developed in the Visual Studio environment, and the performance of the proposed system is evaluated in terms of the accuracy, error, and precision of the text search. According to the obtained results, the proposed system can be adopted for large text directories. We use a static approach so that we can check the efficiency of document search. We search the database with and without stop words and compare the results by taking the difference in the overall time taken by the search. The process produces a time graph and the most relevant search result.
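The comparison described in the abstract (searching with and without stop words and measuring the time taken) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract says was built in Visual Studio. The stop-word list, the sample documents, the term-overlap scoring, and the function names `tokenize`, `search`, and `timed_search` are all assumptions made for illustration.

```python
import re
import time

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "and", "to", "for"}


def tokenize(text, stop_words=None):
    """Lowercase the text, split it into word tokens, and optionally drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if stop_words:
        tokens = [t for t in tokens if t not in stop_words]
    return tokens


def search(documents, query, stop_words=None):
    """Rank documents by how many of their tokens match a query term.

    Returns (document_index, score) pairs, highest score first.
    This simple overlap score is an assumption, not the paper's method.
    """
    query_terms = set(tokenize(query, stop_words))
    scored = []
    for index, doc in enumerate(documents):
        score = sum(1 for t in tokenize(doc, stop_words) if t in query_terms)
        if score > 0:
            scored.append((index, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


def timed_search(documents, query, stop_words=None):
    """Run search() and also return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    results = search(documents, query, stop_words)
    return results, time.perf_counter() - start


# Hypothetical sample data.
DOCUMENTS = [
    "The impact of stop words on text retrieval",
    "A study of the history of the printing press",
    "Stop words in domain specific document sets",
]
QUERY = "the impact of stop words"

if __name__ == "__main__":
    with_sw, t_with = timed_search(DOCUMENTS, QUERY)
    without_sw, t_without = timed_search(DOCUMENTS, QUERY, STOP_WORDS)
    print("with stop words:   ", with_sw, f"{t_with:.6f}s")
    print("without stop words:", without_sw, f"{t_without:.6f}s")
    print("time difference:   ", f"{t_with - t_without:.6f}s")
```

In this toy data, keeping stop words lets the unrelated second document match the query through "of" and "the" alone, while filtering them leaves only the two documents that mention stop words. Timings on such a small sample are dominated by noise, so a meaningful time graph would need a much larger document set and repeated runs.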


Similar resources

A Linguistic Analysis of Conference Titles in Applied Linguistics

Over the past twenty-five years, researchers have expressed considerable interest in titles of academic publications. Unfortunately, conference paper titles (CPTs) have only recently begun to receive attention. The aim of this study, therefore, is to investigate the text length, syntactic structure, and lexicon of CPTs in Applied Linguistics. A data set of 698 titles was selected from the 2008 ...


Corpus Specific Stop Words to Improve the Textual Analysis in Scientometrics

With the availability of vast collection of research articles on internet, textual analysis is an increasingly important technique in scientometric analysis. While the context in which it is used and the specific algorithms implemented may vary, typically any textual analysis exercise involves intensive pre-processing of input text which includes removing topically uninteresting terms (stop wor...


A Probabilistic Topic Model Based on Local Word Relationships in Overlapping Windows

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...


Cultural Elements in the Translation of Children's Literature: Persian translation of Roald Dahl’s Matilda in focus

Translation can have long-term effects on all languages and cultures. It is not a mere linguistic act, but mostly a cultural act, since language is by nature one of the major carriers of cultural elements. Thus, the translator’s job is not just transferring the meaning of words and sentences from the source text to the target text. Culture-specific items often cause translation problems. Identi...



Publication date: 2014
