UM E-Theses Collection (澳門大學電子學位論文庫)

Title

Topic-based segmentation of web pages

English Abstract

TOPIC-BASED SEGMENTATION OF WEB PAGES
by Li Yong
Thesis Supervisor: Dr. Gong Zhiguo
Master of Science and Technology in E-Commerce

The World Wide Web is now in widespread use, and users visit web pages through web browsers. Most web pages, which carry large and varied information content, are published on the Internet as HTML. (Although some web pages are written in XML, the majority use HTML.) These HTML pages, called on-line documents, are semi-structured. However, many web pages are semantically diverse; that is, the content of a single page does not consistently address one topic. Current search engines are page-oriented rather than topic-oriented, yet most web users retrieve their target information by topic. How to partition web pages by semantics is therefore an interesting research topic.

In this thesis, we first build a tree (called Semantic Part, SP) from the web page's tags, which are its natural structural markers. We then analyze the characteristics of the words (or terms) appearing on the web page in order to build a term-weighting formula. Based on these term weights, we use a similarity formula to calculate the degree of semantic similarity between each pair of SPs. Finally, we take the balance point of precision and recall as the reference value for the similarity threshold. Through the work above, we can find the topic-related segmentation of a web page, and we achieved satisfactory results.
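The pipeline outlined in the abstract (weight the terms, compare SPs, apply a similarity threshold) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual term-weighting or similarity formulas, so everything here is an assumption for illustration only: relative term frequency stands in for the term weight, cosine similarity stands in for the similarity formula, and the function names, the flat list of segments, and the threshold value are hypothetical.

```python
import math
from collections import Counter


def term_weights(terms):
    """Assumed weighting: relative term frequency within one segment."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine_similarity(w1, w2):
    """Assumed similarity measure between two term-weight dictionaries."""
    dot = sum(w1[t] * w2[t] for t in w1.keys() & w2.keys())
    n1 = math.sqrt(sum(v * v for v in w1.values()))
    n2 = math.sqrt(sum(v * v for v in w2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)


def group_segments(segments, threshold):
    """Merge a segment into the preceding group when their similarity
    reaches the threshold; otherwise start a new group."""
    groups = []
    for terms in segments:
        if groups and cosine_similarity(
            term_weights(groups[-1]), term_weights(terms)
        ) >= threshold:
            groups[-1] = groups[-1] + list(terms)
        else:
            groups.append(list(terms))
    return groups


# Hypothetical term lists, standing in for the text of three page segments.
segments = [
    ["cheap", "flights", "paris"],
    ["flights", "paris", "hotel"],
    ["python", "tutorial"],
]
groups = group_segments(segments, threshold=0.5)
```

In this sketch the threshold is simply passed in; the abstract instead derives its reference value from the balance point of precision and recall, which would require labeled segmentations that are not shown here.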

Issue date

2005.

Author

Li, Yong

Faculty

Faculty of Science and Technology

Department

Department of Computer and Information Science

Degree

M.Sc.

Subject

Web usage mining

Data mining

Supervisor

Gong, Zhi Guo

Location

1/F Zone C

Library URL

991008400309706306