
UM E-Theses Collection (澳門大學電子學位論文庫)

Title

A scalable framework for continuous keyword search queries

English Abstract

The efficient processing of large volumes of text plays an important role in many information retrieval (IR) systems. Emerging applications such as news-update delivery and social-networking notifications must show end users only the most relevant content, based on their preferences, because target devices have limited screen sizes. In this work, a user's preferences are represented by a set of keywords, which may either be given by the user or extracted from her behavior. Our problem is to continuously report the most relevant documents to registered users according to their preferences; we call these Continuous Keyword Search Queries (CKSQs). Answering CKSQs is challenging in the era of big data because of the potentially large volume (e.g., the number of registered users) and high velocity (e.g., the data arrival rate) of the data in emerging applications. To answer CKSQs efficiently, our solution first swaps the index target from documents to users, so that the index maintenance cost is completely insensitive to the data arrival rate. Furthermore, we improve pruning effectiveness by carefully exploiting the local effect of the indexing structure. We also develop a data partitioning technique that makes our solution well suited to parallel architectures. Our experimental study demonstrates that the proposed technique outperforms the state-of-the-art solution by an order of magnitude in response time.
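The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea of indexing users rather than documents. It is not the thesis's method. Every name in it (`QueryIndex`, `register`, `process`, `top_k`) is hypothetical, and the scoring function (summed term frequency of a user's keywords in the document) is an assumption. The sketch also leaves out sliding-window expiry, the pruning bounds, and the data partitioning that the abstract mentions. The point it illustrates is this: the inverted index maps keywords to registered users, so it changes only when users register. An arriving document simply probes the posting lists of its own terms and updates the per-user top-k heaps.

```python
import heapq
from collections import Counter, defaultdict


class QueryIndex:
    """Hypothetical sketch: an inverted index over user keyword sets, not over documents."""

    def __init__(self, k):
        self.k = k
        self.postings = defaultdict(set)  # keyword -> ids of users whose preferences contain it
        self.prefs = {}                   # user id -> set of preference keywords
        self.results = {}                 # user id -> min-heap of (score, doc_id), size <= k

    def register(self, user_id, keywords):
        # The index is modified only here, when a user registers, never when documents arrive.
        self.prefs[user_id] = {kw.lower() for kw in keywords}
        self.results[user_id] = []
        for kw in self.prefs[user_id]:
            self.postings[kw].add(user_id)

    def process(self, doc_id, text):
        tf = Counter(text.lower().split())
        # Only users who share at least one keyword with the document are touched.
        candidates = set()
        for term in tf:
            candidates |= self.postings.get(term, set())
        for uid in candidates:
            # Assumed score: total frequency of the user's keywords in the document.
            score = sum(tf[kw] for kw in self.prefs[uid])
            heap = self.results[uid]
            if len(heap) < self.k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                # Evict the user's current lowest-scoring result.
                heapq.heapreplace(heap, (score, doc_id))
        return candidates

    def top_k(self, user_id):
        # Highest score first.
        return [doc for _, doc in sorted(self.results[user_id], reverse=True)]


idx = QueryIndex(k=2)
idx.register("alice", ["macau", "news"])
idx.register("bob", ["football"])
idx.process("d1", "Macau news today")
idx.process("d2", "football news")
idx.process("d3", "macau macau news")
print(idx.top_k("alice"))  # ['d3', 'd1']
print(idx.top_k("bob"))    # ['d2']
```

In this design, processing a document costs time proportional to its length plus the number of users it matches. Documents are never inserted into the index, so a higher arrival rate adds work per document but no index maintenance.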

Issue date

2015

Author

Zhang, Jun Jie

Faculty

Faculty of Science and Technology

Department

Department of Computer and Information Science

Degree

M.Sc.

Subject

Big data

Querying (Computer science)

Files In This Item

Full-text (Internet)

Location
1/F Zone C
Library URL
991000758549706306