Detective Brooklynk: System for Automatic Recovery of Broken Web Links

Detective Brooklynk is an information retrieval system that provides a list of candidate web pages to replace the page a broken link pointed to. The system uses natural language processing techniques such as named entity recognition, information extraction techniques, and language models to extract information related to the broken link. This information is used to build several queries, which are submitted to different search engines to retrieve documents related to the missing web page. To refine the results, the retrieved pages are ranked by relevance measures computed with information retrieval (IR) techniques, and the ordered list is then presented to the user.
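The query-building and ranking steps described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: it assumes the available information is the link's anchor text plus the surrounding page text, and it swaps in plain term-frequency counts and cosine similarity for the named entity recognition and language models mentioned above. The names `tokenize`, `build_queries`, `rank_candidates`, and the small stopword list are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_queries(anchor_text, context_text, top_k=3):
    """Return the anchor text as one query, plus a variant expanded with
    the most frequent terms of the page that contains the broken link."""
    anchor_terms = tokenize(anchor_text)
    context_counts = Counter(t for t in tokenize(context_text) if t not in anchor_terms)
    expansion = [term for term, _ in context_counts.most_common(top_k)]
    queries = [" ".join(anchor_terms)]
    if expansion:
        queries.append(" ".join(anchor_terms + expansion))
    return queries


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(reference_text, candidates):
    """Order (url, page_text) candidates by similarity to the reference text,
    most similar first, and return the URLs."""
    ref = Counter(tokenize(reference_text))
    scored = [(cosine(ref, Counter(tokenize(text))), url) for url, text in candidates]
    return [url for _, url in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

In this sketch, each query would be sent to a search engine, and the pages returned would be passed to `rank_candidates` together with the text used to build the queries.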

Technology: The system integrates IR technologies such as information extraction, ranking functions, and natural language processing techniques.

Technical Requirements: The system needs an IR module that retrieves a set of relevant documents for a query built from the information extracted from the web pages. The system currently uses the Yahoo! BOSS API to retrieve documents from the Web, but this retrieval component is an independent module.
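One way to keep the retrieval component independent is to hide it behind a small interface, so the rest of the system never depends on a particular search service. The sketch below is an assumption about how that could look, not the system's real design: `SearchBackend`, `InMemoryBackend`, and `retrieve` are hypothetical names, and the in-memory backend only stands in for a real web search API.

```python
from typing import Dict, List, Protocol


class SearchBackend(Protocol):
    """Hypothetical interface: anything that maps a query to a list of URLs."""

    def search(self, query: str, limit: int) -> List[str]:
        ...


class InMemoryBackend:
    """Stand-in backend for illustration; a real one would call a web search API."""

    def __init__(self, index: Dict[str, str]):
        self.index = index  # maps URL -> page text

    def search(self, query: str, limit: int) -> List[str]:
        terms = query.lower().split()
        # A page matches if it contains at least one query term.
        hits = [url for url, text in self.index.items()
                if any(term in text.lower() for term in terms)]
        return hits[:limit]


def retrieve(queries: List[str], backend: SearchBackend, limit: int = 10) -> List[str]:
    """Submit every query to the backend and merge the results,
    dropping duplicates while preserving first-seen order."""
    merged: List[str] = []
    for query in queries:
        for url in backend.search(query, limit):
            if url not in merged:
                merged.append(url)
    return merged
```

Under this arrangement, switching search providers would only require a new class with a matching `search` method.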

Innovation: Most existing technologies for dealing with broken links rely on storing information about a site's links in advance. In contrast, Detective Brooklynk is a novel technology that can recommend, with high accuracy, candidate web pages to replace a broken link without requiring any previously stored information.


