UM E-Theses Collection (澳門大學電子學位論文庫)
A scalable framework for continuous keyword search queries
English Abstract
The efficient processing of large text data plays an important role in different information retrieval (IR) systems. Emerging applications such as news update delivery and social networking notifications must show end users only the most relevant content (based on their preferences) because of the limited screen size of target devices. In this work, a user's preferences are represented by a set of keywords, which may either be given by the user or extracted from her behavior. Our problem is to continuously report the most relevant documents to registered users according to their preferences; we refer to this problem as Continuous Keyword Search Queries (CKSQs). Answering CKSQs becomes challenging in the era of big data because of the potentially large volume (e.g., number of registered users) and high velocity (e.g., data arrival rates) of the data in emerging applications. To answer CKSQs efficiently, our solution first swaps the index target from documents to users, so that the index maintenance cost is completely insensitive to the data arrival rate. Furthermore, we improve the pruning effectiveness by carefully exploiting the local effect of the indexing structure. We additionally develop a data partitioning technique that makes our solution friendly to parallel architectures. Our experimental study demonstrates that the proposed technique outperforms the state-of-the-art solution by an order of magnitude in terms of response time.
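The idea of indexing users rather than documents can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `QueryIndex`, its methods, and the toy relevance score (the number of a user's keywords that occur in the document) are all assumptions made for illustration, and the pruning and data partitioning techniques mentioned in the abstract are not modeled. The sketch only shows why index maintenance is independent of the document arrival rate: the index changes when a user registers, and each arriving document merely probes it.

```python
import heapq
from collections import defaultdict


class QueryIndex:
    """Hypothetical inverted index over users' keywords (not over documents)."""

    def __init__(self, k):
        self.k = k                        # number of results kept per user
        self.postings = defaultdict(set)  # keyword -> ids of users who registered it
        self.results = {}                 # user id -> min-heap of (score, doc_id)

    def register(self, user_id, keywords):
        # The index is modified only here, when a user registers.
        self.results[user_id] = []
        for kw in keywords:
            self.postings[kw.lower()].add(user_id)

    def publish(self, doc_id, text):
        # An arriving document probes the index; the index itself is not changed.
        scores = defaultdict(int)
        for term in set(text.lower().split()):
            for user_id in self.postings.get(term, ()):
                scores[user_id] += 1      # toy score (assumption): matched keywords
        for user_id, score in scores.items():
            heap = self.results[user_id]
            if len(heap) < self.k:
                heapq.heappush(heap, (score, doc_id))
            elif (score, doc_id) > heap[0]:
                # New document beats the weakest kept result, so replace it.
                heapq.heapreplace(heap, (score, doc_id))

    def top(self, user_id):
        # Best-scoring documents first; ties go to the larger doc_id.
        return [doc for _, doc in sorted(self.results[user_id], reverse=True)]
```

In this sketch, a document that shares no keyword with any registered user touches no per-user state at all, which is the simplest form of the work avoidance that a user-side index allows.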
Issue Date
Zhang, Jun Jie
Faculty of Science and Technology
Department of Computer and Information Science
Big data
Querying (Computer science)
Software Engineering -- Department of Computer and Information Science
