UM E-Theses Collection (University of Macau Electronic Theses Repository)

Effectiveness of Web page archiving methods

English Abstract

Web page data is ephemeral, and Web archiving plays a key role in preserving this valuable information for the future. Recent research on Web archiving has focused on consistency between the archived data held in a local system and the live data on a remote Web server. These archiving methods are designed mainly for search-engine applications. However, since archived data is preserved for future use, we argue that the completeness of archived data is a more important factor for its future utility. In this work, we study Web page archiving methods that aim to maximize the completeness of archived data given a predefined amount of resources. First, we study an archiving method that assumes complete knowledge of Web page updates. Although this assumption may be unrealistic, the performance of this method provides an upper bound for methods that have no knowledge of when pages are updated. We then propose a practical archiving method that requires no such knowledge and compare its performance against this upper bound. We also show that the proposed method significantly outperforms the periodic method traditionally used in Web archiving.
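The abstract does not define completeness formally or describe the proposed algorithm, so the following Python is only an illustrative sketch under stated assumptions, not the thesis's method. It assumes that a page's successive versions partition the time axis, that a version counts as archived if at least one capture falls within its lifetime, and that the resource budget is a fixed number of captures. The functions `completeness`, `periodic_schedule`, and `oracle_schedule` are hypothetical names. The sketch shows why a method with complete knowledge of updates yields an upper bound: each capture lies in exactly one version's lifetime, so a schedule with B captures can archive at most B versions, and capturing each version the moment it appears reaches that limit.

```python
from bisect import bisect_left
from typing import List, Sequence

# Hypothetical model (an assumption, not the thesis's definitions):
# - update_times[i] is when version i of a page appears; times are strictly
#   increasing, start at 0, and are all below `horizon`.
# - version i lives during [update_times[i], update_times[i + 1]); the last
#   version lives until `horizon`.
# - a version is "archived" if at least one capture falls inside its lifetime.


def completeness(update_times: Sequence[float],
                 capture_times: Sequence[float],
                 horizon: float) -> float:
    """Fraction of page versions archived by at least one capture."""
    captures = sorted(capture_times)
    ends = list(update_times[1:]) + [horizon]
    archived = 0
    for start, end in zip(update_times, ends):
        k = bisect_left(captures, start)  # first capture at or after `start`
        if k < len(captures) and captures[k] < end:
            archived += 1
    return archived / len(update_times)


def periodic_schedule(budget: int, horizon: float) -> List[float]:
    """Baseline: `budget` captures evenly spaced over [0, horizon)."""
    step = horizon / budget
    return [j * step for j in range(budget)]


def oracle_schedule(update_times: Sequence[float], budget: int) -> List[float]:
    """Complete-knowledge schedule: capture versions the instant they appear.

    Each capture falls in exactly one version's lifetime, so `budget`
    captures can archive at most `budget` versions; this schedule attains
    min(budget, number of versions), which is therefore an upper bound.
    """
    return list(update_times[:budget])


if __name__ == "__main__":
    updates = [0.0, 0.5, 1.0, 1.5, 2.0, 8.0]  # made-up, bursty update times
    horizon, budget = 10.0, 3
    print(completeness(updates, periodic_schedule(budget, horizon), horizon))
    print(completeness(updates, oracle_schedule(updates, budget), horizon))
```

With these made-up bursty updates, the evenly spaced periodic schedule archives only two of six versions (about 0.33), while the complete-knowledge schedule archives three (0.5), the most any three-capture schedule can achieve. The practical method proposed in the thesis, which needs no knowledge of update times, is not reproduced here.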

Issue date

Author
Huang, Ya Jun

Faculty
Faculty of Science and Technology

Department
Department of Computer and Information Science

Keywords
Web archiving
Digital preservation

Files In This Item

Full-text (Internet)

1/F Zone C