ABSTRACT: Web search engines have become an integral part of the daily life of knowledge workers, who depend on them to retrieve relevant information from the Web or from their company's vast document databases. Current search engines respond to user queries very quickly, but their retrieval performance leaves much to be desired: typically, the user has to sift through many nonrelevant documents to find the few that are relevant to the information need. Ranking functions play a very important role in search engine retrieval performance. In this paper, we describe a methodology that uses genetic programming to discover new ranking functions for the Web-based information-seeking task. The discovery process exploits both the content and the structural information in Web documents, and is carried out for both the ad hoc task and the routing task in retrieval. For both retrieval tasks, the newly discovered ranking functions achieve retrieval performance superior to that of well-known ranking strategies from the information retrieval literature.
Key words and phrases: business intelligence, genetic programming, information retrieval, machine learning, ranking function, search engines, text mining, Web mining