Indiana University Bloomington

Luddy School of Informatics, Computing, and Engineering

Technical Report TR692:
Efficient Association Discovery with Keyword-based Constraints on Large Graph Data

Mo Zhou, Yifan Pan, Yuqing Wu
(Mar 2011), 11 pages pages
[submitted to VLDB 2011]
In many domains, such as social networks, cheminformatics, bioinformatics, and health informatics, data can be represented naturally in graph model, with nodes being data entries and edges the relationships between them. The graph nature of these data brings opportunities and challenges to data storage and retrieval. In particular, it opens the doors to search problems such as semantic association discovery [13, 14, 15] and semantic search [2, 10, 11]. We study the application requirements in these domains and find that discovering Constraint Acyclic Paths is highly in demand. In this paper, we define the CAP problem and propose a set of quantitative metrics for describing keyword-based constraints. We introduce cSPARQL to integrate CAP queries into SPARQL. We propose a series of algorithms to efficiently evaluate core CAP, a critical fragment of CAP queries, on large scale graph data. Extensive experiments illustrate that our algorithms are efficient in answering CAP queries. Applying our technologies to scientific domains has draw interests from domain experts.

Available as: