UM E-Theses Collection (澳門大學電子學位論文庫)

Title

Research on outlier mining method to web content

English Abstract

University of Macau. Abstract. RESEARCH ON OUTLIER MINING METHOD TO WEB CONTENT, by PengCheng Wang. Thesis Supervisor: Associate Professor JingZhi Guo. Master of Science in E-commerce Technology.

With the rapid development of cloud computing, big data, and Internet technology, mass data has come into wide use in work, daily life, and learning, for example in e-government, e-commerce, and online learning, and this has produced a huge volume of data resources. Mass data resources are a great convenience to people, but as the scale of network data keeps growing, the data also becomes more and more complex, which creates obstacles to using the information. It is therefore necessary to build a fast Web content data mining algorithm that filters data noise better and gives more accurate access to valuable information.

This thesis first analyzes the background and significance of outlier data mining, describes data mining, outlier mining, and Web outlier mining, and sums up the state of research and the innovation of this work. Secondly, it summarizes the concept and definition of outlier data mining, analyzes the applications of outlier mining technology, reviews the traditional outlier mining algorithms in four categories (statistics-based, proximity-based, density-based, and clustering-based), and then analyzes the content of outlier data mining. It then studies Web outlier data mining, describes the classification of Web data mining, and summarizes the framework and steps of Web data mining.

I also analyze in detail the existing problems of the algorithm based on bottom-up clustering and put forward a new dual-path Web outlier mining algorithm. The dual-path algorithm can analyze the relations among the data from multiple angles, such as mapping, according to the relevant information content that each data item contains, and then choose a reasonable data analysis and evaluation mechanism. Compared with the bottom-up clustering algorithm, the dual-path algorithm can use the collaborative relationships among the various data to identify outliers more accurately.

Outlier detection can effectively improve the accuracy of Web data mining analysis and further improve the content of Web data. It can provide a large amount of valuable information resources and improve people's ability to use data.

Keywords: Data Mining; Outlier Mining; Web Mining; Web Outlier Mining
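As an illustration of the proximity-based category named in the abstract, the following is a minimal sketch of scoring Web pages by their distance to their k-th nearest neighbor. It is not the dual-path algorithm proposed in the thesis, whose details this record does not give. The bag-of-words representation, the cosine distance, the value of `k`, and all function names and sample pages are assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one page (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty page is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def knn_outlier_scores(pages, k=2):
    """Score each page by its distance to its k-th nearest neighbor.

    A larger score means the page is farther from the rest of the
    collection, so it is a stronger outlier candidate.
    """
    vectors = [term_vector(p) for p in pages]
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(
            cosine_distance(v, w) for j, w in enumerate(vectors) if j != i
        )
        if not dists:
            scores.append(0.0)  # a single page has no neighbors to compare with
        else:
            scores.append(dists[min(k, len(dists)) - 1])
    return scores


# Hypothetical sample collection: three travel pages and one unrelated page.
pages = [
    "cheap flights hotel booking travel deals",
    "travel deals cheap hotel flights",
    "hotel booking travel flights deals",
    "quantum chromodynamics lattice gauge theory",
]
scores = knn_outlier_scores(pages, k=2)
outlier_index = scores.index(max(scores))
print(outlier_index, pages[outlier_index])
```

In this sample the fourth page shares no terms with the others, so it receives the highest score. A real Web content pipeline would need tokenization, stop-word handling, and term weighting, which this sketch leaves out.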

Issue date

2016.

Author

Wang, Peng Cheng

Faculty

Faculty of Science and Technology

Department

Department of Computer and Information Science

Degree

M.Sc.

Subject

Data mining

Web usage mining

Outliers (Statistics)

Supervisor

Guo, Jing Zhi

Files In This Item

TOC & Abstract

Location
1/F Zone C
Library URL
991001919259706306