Detective Brooklynk: System for Automatic Recovery of Broken Web Links
Detective Brooklynk is an information retrieval system that provides a list of candidate web pages to substitute for the page a broken link used to point to. The system uses natural language processing techniques, such as named entity recognition, information extraction and language models, to extract information related to the broken link. This information is then used to build several queries, which are submitted to different search engines to retrieve documents related to the missing web page. To refine the results, the pages recovered in this way are ranked according to relevance measures obtained by applying information retrieval (IR) techniques, and the ordered list of results is finally presented to the user.
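The query-building and ranking steps described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the function names (`build_queries`, `rank_candidates`), the small stopword list, and the use of term-frequency cosine similarity as the relevance measure are all assumptions chosen for illustration. The sketch builds queries from the anchor text of the broken link plus frequent terms from the surrounding page, then orders candidate pages by their similarity to a reference text.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the system's actual list).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on", "is", "at"}


def tokenize(text):
    """Lowercase the text and return its word tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_queries(anchor_text, context_text, max_extra_terms=3):
    """Return the anchor-text query plus expansions with frequent context terms."""
    anchor_terms = tokenize(anchor_text)
    base = " ".join(anchor_terms)
    # Count context terms that do not already appear in the anchor text.
    context_counts = Counter(t for t in tokenize(context_text) if t not in anchor_terms)
    queries = [base]
    for term, _count in context_counts.most_common(max_extra_terms):
        queries.append(f"{base} {term}")
    return queries


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(reference_text, candidates):
    """Sort (url, text) candidates by descending similarity to the reference text."""
    scored = [(url, cosine_similarity(reference_text, text)) for url, text in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, `reference_text` could be the anchor text combined with its surrounding context, and the candidates would be the pages returned by the search engines for the generated queries.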
Technology: The system integrates IR technologies such as information extraction, ranking functions and natural language processing techniques.
Technical Requirements: The system needs an IR module that retrieves a set of relevant documents for a query constructed from the information extracted from the web pages. The system currently uses the Yahoo! BOSS API to retrieve documents from the Web, but this retrieval component is an independent module.
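One way to picture the retrieval component as an independent module is to treat the search backend as a plain function passed in from outside. The Python sketch below is hypothetical: `SearchBackend`, `retrieve_candidates` and `fake_backend` are illustrative names, and the canned backend stands in for a real web search API rather than reproducing any actual API.

```python
from typing import Callable, List

# A search backend is any callable that maps a query string to a list of result URLs.
SearchBackend = Callable[[str], List[str]]


def retrieve_candidates(queries: List[str], backend: SearchBackend,
                        max_per_query: int = 10) -> List[str]:
    """Run every query through the backend and merge results, dropping duplicate URLs."""
    seen = set()
    merged = []
    for query in queries:
        for url in backend(query)[:max_per_query]:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def fake_backend(query: str) -> List[str]:
    """Stand-in backend with canned results; a real one would call a web search API."""
    index = {
        "university library": ["http://example.org/lib", "http://example.org/home"],
        "university library catalog": ["http://example.org/catalog", "http://example.org/lib"],
    }
    return index.get(query, [])
```

Because `retrieve_candidates` only depends on the callable's signature, swapping search providers in this sketch means supplying a different function, with no change to the query-building or ranking code.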
Innovation: Most existing technologies for dealing with broken links rely on storing information about a site's links in advance. In contrast, Detective Brooklynk is a novel technology able to recommend, with high accuracy, candidate web pages to substitute for a broken link without needing any previously stored information.
Juan Martinez-Romo and Lourdes Araujo: “Analyzing Information Retrieval Methods to Recover Broken Web Links”. ECIR 2010: LNCS 5993, pp. 26-37, Springer (2010).
Juan Martinez-Romo and Lourdes Araujo: “Retrieving Broken Web Links using an Approach based on Contextual Information”. ACM Conference on Hypertext, Torino, Italy, June 29th - July 1st, 2009.
Juan Martinez-Romo and Lourdes Araujo: “Recommendation System for Automatic Recovery of Broken Web Links”. IBERAMIA 2008: LNCS 5290, pp. 302-311, Springer (2008).