Tag prediction for personalization
Collaborative tagging systems are social data repositories in which users manage resources using descriptive keywords (tags). An important element of collaborative tagging systems is the tag recommender, which proposes a set of tags for each newly posted resource. In this paper, we propose five recommender systems for tag recommendation. The first system uses a hybrid approach that compiles a set of resource-specific tags, which includes tags related to the title, tags previously used to describe the same resource (resource profile), and tags previously used to describe similar resources (related resource profile). These tags are checked against user profile tags, a rich but imprecise source of information about user interests. The result is a set of tags related to both the resource and the user. Depending on the nature of the processed posts, this set can extend the most common tag recommendation sources, namely the title and the resource profile. The second system uses a slightly different hybrid approach originally developed by Melville et al. to predict movie ratings. It uses a content-based predictor to enhance existing user data, and then provides personalized suggestions through collaborative filtering. We present experimental results showing that this approach, Content-Boosted Collaborative Filtering (CBCF), performs better than the first hybrid approach as well as other previously developed recommender systems described in this work. Three other systems are built as extensions of the CBCF approach, in which a content-based preprocessing step generates a set of imprecise predictions for every user-document pair in the training dataset. This results in a dense dataset to which techniques such as Association Rule Mining (ARM) can be applied to exploit the density of content in the dataset. The first of these three systems, CB_ARM, uses the same content-based preprocessing step that is used in the CBCF algorithm mentioned above.
However, the collaborative-filtering-based recommendation step is replaced with a Weighted ARM algorithm. The second system, CB_LDA_ARM, uses Latent Dirichlet Allocation (LDA) in the content-based preprocessing step. LDA reduces the dimensionality of the dataset, after which the distance and similarity computations required for making content-based predictions can be performed on low-dimensional data. The output of the preprocessing step is fed to the Weighted ARM algorithm mentioned above. The third system, LDA_ARM, uses LDA in the recommendation step, prior to the execution of the Weighted ARM algorithm. LDA is applied to the tag sets predicted by the content-based preprocessing step. This generates a set of latent tag-topics, reducing the dimensionality of the dataset by clustering tags into tag-topics. The Weighted ARM algorithm is then applied to the new dataset consisting of tag-topics and their probabilities. The most frequent tags from the topics predicted by ARM are used to expand the existing tag set. Finally, we present results showing that our CB_ARM recommender system outperforms all other systems discussed in this work when evaluated on a subset of the challenge dataset.
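The content-boosted idea shared by CBCF and its extensions can be illustrated with a minimal sketch; it is not the implementation evaluated in this work. It assumes a small user-by-item score matrix with missing entries, uses Jaccard similarity over keyword sets as a stand-in content-based predictor, and uses cosine-weighted user-based collaborative filtering for the final step. The function names, the toy data, and both similarity measures are hypothetical choices made for illustration only.

```python
from math import sqrt

def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def content_fill(scores, keywords):
    """Replace every missing entry (None) with a content-based estimate.

    The estimate is the similarity-weighted mean of the user's own known
    scores; it falls back to the user's plain mean when no known item
    shares a keyword with the target item.
    """
    dense = []
    for row in scores:
        known = [(j, s) for j, s in enumerate(row) if s is not None]
        mean = sum(s for _, s in known) / len(known) if known else 0.0
        filled = []
        for i, s in enumerate(row):
            if s is not None:
                filled.append(float(s))
                continue
            weighted = [(jaccard(keywords[i], keywords[j]), sj) for j, sj in known]
            total = sum(w for w, _ in weighted)
            filled.append(sum(w * sj for w, sj in weighted) / total if total else mean)
        dense.append(filled)
    return dense

def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cf_predict(dense, user, item):
    """User-based collaborative filtering over the densified matrix."""
    weighted = [(cosine(dense[user], dense[v]), dense[v][item])
                for v in range(len(dense)) if v != user]
    total = sum(w for w, _ in weighted)
    return sum(w * s for w, s in weighted) / total if total else dense[user][item]

# Hypothetical toy data: 3 users x 3 items, None marks a missing score.
scores = [[5, None, 1],
          [4, 5, None],
          [None, 4, 2]]
keywords = [{"python", "web"}, {"python", "data"}, {"cooking"}]

dense = content_fill(scores, keywords)      # step 1: content-based densification
prediction = cf_predict(dense, user=0, item=1)  # step 2: collaborative filtering
print(dense)
print(prediction)
```

In the ARM-based variants described above, the second step would be replaced by rule mining over the densified data; that part is not shown here.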