Analysis of Impact of stop words on domain specific document set
Authors
Abstract
Data mining is a technique for evaluating data to discover hidden patterns in raw data. Real-world data is usually unstructured, so data mining methods and techniques help extract meaningful content from it. In this study, data mining techniques are used to find valuable content in text documents. Various text mining techniques have recently been developed to find more informative content in text directories, and these techniques are efficient and deliver results efficiently. Even so, finding the data relevant to a user query within a huge volume of content remains a complex problem in this domain. This work introduces a new text mining technique that returns text documents more relevant to the user query. The proposed text document retrieval methodology is developed in the Visual Studio environment, and the performance of the system is evaluated in terms of accuracy, error, and the precision of the text search. According to the obtained results, the system's performance makes it suitable for large text directories. We use a static approach so that we can check the efficiency of document search. We search the database with and without stop words and compare the results by taking the difference in the overall time taken by the search. The process produces a time graph and the most relevant search results.
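The comparison described in the abstract, searching the same documents with and without stop words and taking the difference in search time, can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the tiny `STOP_WORDS` list, the sample `DOCUMENTS` and `QUERY`, the helper names, and the simple term-overlap scoring are all hypothetical, and Python is used although the abstract mentions only a Visual Studio environment.

```python
import re
import time

# Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "for", "to", "and", "with"}


def tokenize(text, remove_stop_words):
    """Lowercase the text, split it into word tokens, optionally drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def search(documents, query, remove_stop_words):
    """Rank documents by how many of their tokens match a query term (assumed scoring)."""
    query_terms = set(tokenize(query, remove_stop_words))
    scored = []
    for doc_id, text in enumerate(documents):
        score = sum(1 for t in tokenize(text, remove_stop_words) if t in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by the lower document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def timed_search(documents, query, remove_stop_words):
    """Run one search and return (ranking, elapsed seconds)."""
    start = time.perf_counter()
    ranking = search(documents, query, remove_stop_words)
    return ranking, time.perf_counter() - start


# Hypothetical sample data, not taken from the paper.
DOCUMENTS = [
    "The impact of stop words on the retrieval of text documents",
    "A study of the weather in the north of the country",
    "Text mining techniques for document retrieval",
]
QUERY = "the retrieval of text documents"

with_stop, t_with = timed_search(DOCUMENTS, QUERY, remove_stop_words=False)
without_stop, t_without = timed_search(DOCUMENTS, QUERY, remove_stop_words=True)

print("ranking with stop words:   ", with_stop)
print("ranking without stop words:", without_stop)
print("time difference (seconds): ", t_with - t_without)
```

On this sample data, keeping stop words lets the unrelated weather document match the query through "the" and "of" alone, while removing them drops it from the results. The timing difference on three short documents is not meaningful in itself; it only shows where the measurement described in the abstract would be taken.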
Similar resources
A Linguistic Analysis of Conference Titles in Applied Linguistics
Over the past twenty-five years, researchers have expressed considerable interest in titles of academic publications. Unfortunately, conference paper titles (CPTs) have only recently begun to receive attention. The aim of this study, therefore, is to investigate the text length, syntactic structure, and lexicon of CPTs in Applied Linguistics. A data set of 698 titles was selected from the 2008 ...
Corpus Specific Stop Words to Improve the Textual Analysis in Scientometrics
With the availability of vast collection of research articles on internet, textual analysis is an increasingly important technique in scientometric analysis. While the context in which it is used and the specific algorithms implemented may vary, typically any textual analysis exercise involves intensive pre-processing of input text which includes removing topically uninteresting terms (stop wor...
A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
Cultural Elements in the Translation of Children's Literature: Persian translation of Roald Dahl’s Matilda in focus
Translation can have long-term effects on all languages and cultures. It is not a mere linguistic act, but mostly a cultural act, since language is by nature one of the major carriers of cultural elements. Thus, the translator’s job is not just transferring the meaning of words and sentences from the source text to the target text. Culture-specific items often cause translation problems. Identi...