A regression framework for learning to rank in web information retrieval
MetadataShow full item record
Machine learning approaches for learning ranking functions have been generating much interest from both the web information retrieval community and the machine learning community recently. It has the promise of improved relevancy of search engines and reduced demand for manual parameter tuning. We focus on developing a regression framework for learning to rank with complex loss functions. More specifically, this framework first applies functional iterative or boosting algorithm to compute updates for a given loss function and then fit the updates with a standard regression base learner. We explore supervised learning methodology from machine learning, and we distinguish two types of relevance judgments used as the training data: (1) absolute relevance judgments arising from explicit labeling of query-document pairs; and (2) relative relevance judgments extracted from user click-throughs of search results or converted from the absolute relevance judgments. Within the framework, we propose three novel ranking algorithms and illustrate their application to web search ranking. The first one is to calibrate the existing point-wise(univariant) regression loss to incorporate query difference in terms of introducing nuisance parameters in the statistical models, and we present an alternating optimization method to simultaneously learn the retrieval function and the nuisance parameters. It is an improvement over the existing approach within the category of learning to rank using point-wise regression loss. The second is an extension of gradient boosting methods for point-wise regression loss to complex(multi-variant) loss functions. It is based on optimization of quadratic upper bounds of the loss functions which allows us to present a rigorous convergence analysis of the algorithm. We illustrate an application of this approach in pair-wise preference learning to rank for Web search by combining both preference data and labeled data. The third one is a list-wise approach based on minimum effort optimization that takes into account the entire training data within a query at each iteration. We tackle this optimization problem using functional iterative methods where the update at each iteration is computed by solving an isotonic regression problem. This more global approach results in faster convergency and significantly improved performance of the learned ranking functions over existing state-of-the-art methods. Experimental results are carried out using both data sets obtained from a commercial search engine and widely used IR benchmarking data, namely OHSUMED and TREC. Our results show significant improvements of our proposed methods over existing state-of-the-art methods.