Finding Non-trivially Similar Documents from a Large Document Corpus: Master's Thesis (30 EAP)

Author

  • Oskar Gross

Similar resources

Textmining and Organization in Large Corpus

Nowadays a typical document corpus may contain more than 5000 documents. It is almost impossible for a reader to read through all the documents in the corpus and find the relevant information in a couple of minutes. In this master's thesis project we propose text clustering as a potential solution for organizing a large document corpus. As a sub-field of data mining, text mining is to discover...

Automatic indexing: an approach using an index term corpus and combining linguistic and statistical methods

This thesis discusses the problems and the methods of finding relevant information in large collections of documents. The contribution of this thesis to this problem is to develop better content analysis methods which can be used to describe document content with

Evaluation of EAP Programs in Iran: Document Analysis and Expert Perspectives

This study aimed to examine the policies in the Iranian English for Academic Purposes (EAP) education and the extent to which objectives match the policies and are materialized in practice. To this end, course descriptions in the syllabi for the EAP programs were evaluated through document analysis and triangulated with the experts’ perspectives through interviews to examine the current status ...

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research has been done on clustering English documents. The question arises whether those results extend to other languages. Since the goal of document clustering is grouping of docum...

Document Classification Using Machine Learning and Ontologies

This master's thesis explores a way in which documents can be automatically classified based on their contents. Automatic classification of data is one of the main applications of machine learning. With the help of already classified data, a model for the most likely class can be learned. Whether background knowledge from ontologies can be added to the model in order to improve the classi...


Publication date: 2011