Tag Weighting and Its Applications

Author

  • Vasudeva Varma
Abstract

The explosion of information on the internet has made it difficult for users to find the information relevant to them. A plethora of information is already available, thousands of pages are added every day, and more and more information is available on every subject. It has therefore become increasingly difficult for users to stay up to date in their own areas of interest, which has drastically increased the need for indexing documents. Indexing can take place at the user end or at the search engine end. Traditionally, however, indexing took place at an intermediate level and was carried out by authorities or librarians. With the vast amount of content available on the internet, it is practically impossible for any authority to index the data. This led to the evolution of social bookmarking systems, in which users maintain individual collections of resources and index them with tags for retrieval purposes. This model has evolved into a collaborative one in which users share their resource collections and tags with others, leading to the emergence of collaborative tagging systems, also referred to as “folksonomies”. Owing to their flexibility and ease of use, the usage of these systems has grown very rapidly. Many tagging systems, such as Delicious, Flickr, Bibsonomy and Citeseer, have evolved and become very popular. This has created a new form of rich user-generated content that can be exploited in a wide range of applications. The data in these systems typically has three dimensions: resources, users and tags. In this thesis, we focus on two basic metrics for extracting information from folksonomies, tag weight and tag similarity, and on their use in applications such as tag search and web document summarization. First, we propose a model of tag weighting that extends the traditional Vector Space Model for term weighting.
We show how tag weighting can be used in tag similarity and how we compute that similarity. We evaluate our proposed tag similarity measure by grounding it against the semantic lexicon WordNet, using the Kendall’s tau and Spearman’s rho correlation coefficients. We further demonstrate the application of our tag weighting to the summarization of web documents: we propose a summarization feature based on tag weights that can be used alongside existing features when generating summaries. We evaluate the generated summaries by comparing them with manually written summaries to find how close they are to the manual ones, and hence how readable they are. We also demonstrate the application of our tag similarity to tag search. Our experimental results show that the proposed approaches achieve a good improvement over our baselines.
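
The abstract does not give the thesis's actual weighting formula, so the following is only a minimal sketch of what extending the Vector Space Model from terms to tags could look like. It assumes a TF-IDF-style weight (how often a tag is applied to a resource, scaled by a smoothed inverse resource frequency) and a cosine similarity between tags over the resource dimension. The function names `tag_weights` and `tag_similarity`, the smoothing, and the sample triples are all illustrative assumptions, not the author's method.

```python
import math
from collections import Counter, defaultdict


def tag_weights(annotations):
    """Return {resource: {tag: weight}} from (user, resource, tag) triples.

    Assumption: a TF-IDF analogue, not the thesis's formula.
    weight = (times the tag was applied to the resource)
             * log(1 + N / number of resources carrying the tag)
    """
    tf = defaultdict(Counter)
    for _user, resource, tag in annotations:
        tf[resource][tag] += 1

    n_resources = len(tf)
    # Resource frequency: in how many resources does each tag occur?
    rf = Counter()
    for counts in tf.values():
        rf.update(counts.keys())

    return {
        resource: {
            tag: count * math.log(1 + n_resources / rf[tag])
            for tag, count in counts.items()
        }
        for resource, counts in tf.items()
    }


def tag_similarity(weights, tag_a, tag_b):
    """Cosine similarity between two tags, each seen as a vector over resources."""
    vec_a = {r: w[tag_a] for r, w in weights.items() if tag_a in w}
    vec_b = {r: w[tag_b] for r, w in weights.items() if tag_b in w}
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one tag never occurs
    dot = sum(vec_a[r] * vec_b[r] for r in vec_a.keys() & vec_b.keys())
    return dot / (norm_a * norm_b)


# Hypothetical folksonomy data: (user, resource, tag)
annotations = [
    ("u1", "r1", "python"), ("u2", "r1", "python"), ("u1", "r1", "code"),
    ("u1", "r2", "python"), ("u2", "r2", "code"),
    ("u3", "r3", "photo"),
]
weights = tag_weights(annotations)
print(tag_similarity(weights, "python", "code"))
print(tag_similarity(weights, "python", "photo"))
```

Tags that co-occur on the same resources ("python" and "code" here) score close to 1, while tags with no shared resources score 0. A ranking of tag pairs produced this way is the kind of output that could then be compared with a WordNet-based ranking using Kendall’s tau or Spearman’s rho.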


Similar resources

Assessment of Weighting Functions Used in Oppermann Codes in Polyphase Pulse Compression Radars

Polyphase codes are a common class of pulse compression waveforms in radar systems. The Oppermann code is one such code with a polyphase pattern. After compression, this code has little tolerance to Doppler shift, in addition to its high sidelobe level. This indicates that the Oppermann code is unsuitable for radar applications. This paper shows that the use of amplitude wei...


Feature Weighting Improvement of Web Text Categorization Based on Particle Swarm Optimization Algorithm

It is usually true that some structures, such as the title, can express the main content of a text, and these structures may influence the effectiveness of text categorization. However, the most common feature weighting algorithm, term frequency-inverse document frequency (TF-IDF), does not take the structural information of texts into account. To solve this problem, a new feature weighting al...


Content- and Graph-based Tag Recommendation: Two Variations

We describe two variants of our approach to tasks 1 and 2 of the ECML PKDD Discovery Challenge 2009, in which each contestant had to identify up to 5 tags for each resource in a given set of either bibtex-like references to publications or bookmarks. The quality of the results was measured against the tags that users of the data source (www.bibsonomy.org) had originally assigned to the resou...


Analyzing Tag Distributions in Folksonomies for Resource Classification

Recent research has shown the usefulness of social tags as a data source to feed resource classification. Little is known about the effect of settings on folksonomies created on social tagging systems. In this work, we consider the settings of social tagging systems to further understand tag distributions in folksonomies. We analyze in depth the tag distributions on three large-scale social tag...


An Intelligent Algorithm based Controller for Multiple Output DC-DC Converters with Voltage Mode Weighting Factor

Multiple output DC-DC converters are widely used in many applications, such as aerospace, industrial and medical equipment. The purpose of this paper is to present an intelligent control system for multiple output DC-DC converters. To this end, a double ended forward DC-DC converter with three output voltages (+5 V/50 W, +15 V/45 W and -15 V/15 W) is considered and anal...




Publication date: 2012