Finding the Best Page using Synonyms

Authors

  • Yumao Lu
  • Xuerui Wang
  • Angelos Hliaoutakis
  • Giannis Varelas
  • Epimenidis Voutsakis
  • Euripides G. M. Petrakis
Abstract

Rating a page as the best one based only on the PageRank algorithm of Brin and Page would be insufficient, because that method relies on link information alone. With the application of Soft Computing to Data Mining and Knowledge Discovery, machines became more effective, and additional features of a page were considered, including its indexing, the terms used, capitalization, anchor texts and hit information. Formulating the task as a classification problem helped to a great extent. The complexity of dealing with the large number of web pages on the net led researchers to sample pages randomly and then analyze the features of the sampled pages. Soft Computing techniques were used for this feature analysis, including Genetic Algorithms, Neural Networks, Fuzzy Logic and Rough Sets. User profiles were created from the retrieved pages. Good and bad pages were categorized on the basis of the terms they contained, and these profiles were preserved for further reference. Pages were compared with each other for similarity using the Jaccard score and a Best-First search algorithm implemented with software agents. Adaptive methods, close in concept to Genetic Algorithm applications, were also used. The frequency at which a user visited web pages was considered as a further parameter of interest. Techniques to generate page features using co-occurrence analysis were developed, and web pages were classified using machine learning. A good method of rating a page provided benefits such as relevance and efficiency and, indirectly, a better crawl priority for a search engine. Web content as currently designed is meant for human reading and is not typically tractable for machines. The Semantic Web had to provide structured content by adding annotations, and tools were made available to perform these conversions.
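As an illustration of the Jaccard score mentioned above, the following minimal sketch compares two pages by the sets of terms they contain. The function name `jaccard` and the two sample term lists are assumptions made for this example and do not come from the paper.

```python
def jaccard(terms_a, terms_b):
    """Jaccard score: size of the intersection over size of the union."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        # Convention assumed here: two empty pages score 0.0.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical term lists standing in for two retrieved pages.
page_a = "synonyms improve web search ranking".split()
page_b = "synonyms improve document ranking".split()

# 3 shared terms out of 6 distinct terms overall.
score = jaccard(page_a, page_b)
print(score)
```

A score near 1 means the two pages share most of their terms, and a score near 0 means they share few.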
User-generated metadata that expresses a user's tastes and interests was used to personalize information for an individual user. Specifically, a machine learning method that analyzed a corpus of tagged content was used to find hidden topics. It then used these learned topics to select content that matched a user's interests, thus returning the most relevant pages. Google Scholar does not use synonyms and matches strictly against article text when searching for a document. The use of synonyms can reduce irrelevant results, although it can also cause intent drift, and synonym discovery is context sensitive. These features motivate the use of synonyms to expedite the search and to rank relevant documents at a higher position. Google and WordNet use synonyms, but no documentation mentions combining the synonyms of a term to produce a more relevant search. The present paper concentrates on presenting a search technique developed to find the best page based on synonyms. The technique is based on the concept of adaptive search using synonyms of a search keyword extracted from a dictionary. These synonyms are then combined in different sets and given to a search engine, which returns the documents most relevant to the user at a higher ranking.
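The idea of combining a keyword's synonyms into different sets can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the `SYNONYMS` table is a hand-written stand-in for a real dictionary, the function name `synonym_queries` and the `max_size` limit are hypothetical, and joining terms with `OR` is an assumed query syntax.

```python
from itertools import combinations

# Hypothetical synonym table standing in for a real dictionary lookup.
SYNONYMS = {"car": ["automobile", "vehicle", "motorcar"]}


def synonym_queries(keyword, synonyms, max_size=2):
    """Return the keyword alone, then the keyword combined with every
    subset of its synonyms containing up to max_size synonyms."""
    candidates = synonyms.get(keyword, [])
    queries = [keyword]
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            # Assumed query syntax: alternatives joined with OR.
            queries.append(" OR ".join((keyword,) + subset))
    return queries


queries = synonym_queries("car", SYNONYMS)
for q in queries:
    print(q)
```

Each generated query string could then be submitted to a search engine and the returned rankings compared. That step is left out here because the abstract does not specify how the results are merged.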

Similar resources

Mining and Ranking Biomedical Synonym Candidates from Wikipedia

Biomedical synonyms are important resources for Natural Language Processing in Biomedical domain. Existing synonym resources (e.g., the UMLS) are not complete. Manual efforts for expanding and enriching these resources are prohibitively expensive. We therefore develop and evaluate approaches for automated synonym extraction from Wikipedia. Using the inter-wiki links, we extracted the candidate ...

UBA: Using Automatic Translation and Wikipedia for Cross-Lingual Lexical Substitution

This paper presents the participation of the University of Bari (UBA) at the SemEval2010 Cross-Lingual Lexical Substitution Task. The goal of the task is to substitute a word in a language Ls, which occurs in a particular context, by providing the best synonyms in a different language Lt which fit in that context. This task has a strict relation with the task of automatic machine translation, b...

SynFinder: A System for Domain-Based Detection of Synonyms Using WordNet and the Web of Data

The detection of synonyms is a challenge that has attracted many contributions for the possible applications in many areas, including Semantic Web and Information Retrieval. An open challenge is to identify synonyms of a term that are appropriate for a specific domain, not just all the synonyms. Moreover, the execution time is critical when handling big data. Therefore, it is needed an algorith...

Using a Bilingual Resource to Add Synonyms to a Wordnet: FinnWordNet and Wikipedia as an Example

This paper presents a simple method for finding new synonym candidates for a bilingual wordnet by using another bilingual resource. Our goal is to add new synonyms to the existing synsets of the Finnish WordNet, which has direct word sense translation correspondences to the Princeton WordNet. For this task, we use Wikipedia and its links between the articles of the same topic in Finnish and Eng...

Publication date: 2013