Genetic programming-based discovery of ranking functions for effective Web search

成果类型:
Article
署名作者:
Fan, WG; Gordon, MD; Pathak, P
署名单位:
Virginia Polytechnic Institute & State University; University of Michigan System; University of Michigan; State University System of Florida; University of Florida
刊物名称:
JOURNAL OF MANAGEMENT INFORMATION SYSTEMS
ISSN/ISSBN:
0742-1222
DOI:
10.1080/07421222.2005.11045828
发表日期:
2005
页码:
37-56
关键词:
information-retrieval relevance
摘要:
Web search engines have become an integral part of the daily life of a knowledge worker, who depends on these search engines to retrieve relevant information from the Web or from the company's vast document databases. Current search engines are very fast in terms of their response time to a user query. But their usefulness to the user in terms of retrieval performance leaves a lot to be desired. Typically, the user has to sift through a lot of nonrelevant documents to get only a few relevant ones for the user's information needs. Ranking functions play a very important role in the search engine retrieval performance. In this paper, we describe a methodology using genetic programming to discover new ranking functions for the Web-based information-seeking task. We exploit the content as well as structural information in the Web documents in the discovery process. The discovery process is carried out for both the ad hoc task and the routing task in retrieval. For either of the retrieval tasks, the retrieval performance of these newly discovered ranking functions has been found to be superior to the performance obtained by well-known ranking strategies in the information retrieval literature.