Exploring Correlated Subspaces for Efficient Query Processing in Sparse Databases

Started by aruljothi, Apr 11, 2009, 06:23 PM



Sparse data is becoming increasingly common in many real-life applications. However, relatively little attention has been paid to modeling sparse data effectively, and existing approaches such as the conventional "horizontal" and "vertical" representations fail to provide satisfactory performance for both storage and query processing, because they are too rigid and generally do not consider the correlations among dimensions. In this paper, we propose a new approach, named HoVer (Horizontal representation over Vertically partitioned subspaces), to store and query sparse datasets in an unmodified RDBMS. Based on the dimension correlations of a sparse dataset, a novel mechanism vertically partitions the high-dimensional dataset into multiple lower-dimensional subspaces, such that dimensions are highly correlated within a subspace and largely uncorrelated across subspaces. The original data objects can therefore be represented in the horizontal format within their respective subspaces. With the HoVer representation, users can write SQL queries over the original horizontal view, and these queries can easily be rewritten into queries over the subspace tables. Experiments over synthetic and real-life datasets show that our approach is effective in finding correlated subspaces and yields superior performance for the storage and querying of sparse data.
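To make the idea concrete, here is a minimal sketch of the storage layout and query rewriting described above, using Python's built-in sqlite3 module. It is not the authors' code: the sample data, the table names (objects, sub_book, sub_camera, horizontal), the rewrite helper, and the hand-picked partition are all assumptions for illustration. In particular, the paper derives the partition from dimension correlations, while this sketch simply hard-codes one.

```python
import sqlite3

# Hypothetical sparse dataset: object id -> only the dimensions it defines.
rows = {
    1: {"isbn": "111", "pages": 320},
    2: {"megapixels": 12, "zoom": 3},
    3: {"isbn": "222", "pages": 150},
    4: {"megapixels": 20},
}

# Assumed partition, picked by hand; the paper derives it from correlations.
subspaces = {
    "sub_book": ["isbn", "pages"],
    "sub_camera": ["megapixels", "zoom"],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (obj_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO objects VALUES (?)", [(i,) for i in rows])

# One horizontal table per subspace.
for table, dims in subspaces.items():
    conn.execute(
        f"CREATE TABLE {table} (obj_id INTEGER PRIMARY KEY, {', '.join(dims)})"
    )
    marks = ", ".join("?" * (len(dims) + 1))
    # Store an object in a subspace only if it defines a dimension there.
    conn.executemany(
        f"INSERT INTO {table} VALUES ({marks})",
        [
            [i] + [attrs.get(d) for d in dims]
            for i, attrs in rows.items()
            if any(d in attrs for d in dims)
        ],
    )

# The original horizontal view, reassembled from the subspace tables.
joins = " ".join(
    f"LEFT JOIN {t} ON {t}.obj_id = objects.obj_id" for t in subspaces
)
all_dims = ", ".join(d for dims in subspaces.values() for d in dims)
conn.execute(
    f"CREATE VIEW horizontal AS "
    f"SELECT objects.obj_id AS obj_id, {all_dims} FROM objects {joins}"
)


def rewrite(dim, predicate):
    """Rewrite a single-dimension query on the view into a query on the
    one subspace table holding that dimension. Assumes the predicate
    rejects NULLs, so objects absent from the subspace cannot match."""
    table = next(t for t, dims in subspaces.items() if dim in dims)
    return f"SELECT obj_id, {dim} FROM {table} WHERE {predicate} ORDER BY obj_id"


via_view = conn.execute(
    "SELECT obj_id, pages FROM horizontal WHERE pages > 200 ORDER BY obj_id"
).fetchall()
via_subspace = conn.execute(rewrite("pages", "pages > 200")).fetchall()
print(via_view, via_subspace)
```

Both queries should return the same rows, but the rewritten one reads only the sub_book table. Queries that touch dimensions from several subspaces would need joins across the corresponding tables, which this sketch leaves out.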

Ajit kumar

Can I get the source code for this paper? I'm working on it as my academic project.