Wednesday, October 19, 2011

IEEE PROJECT TITLES and ABSTRACT


ANONYMOUS PUBLICATION OF SENSITIVE TRANSACTIONAL DATA

Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximizing data utility). Existing techniques work well for fixed-schema data with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket data), which involves hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1) reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using real-life data sets, that all our methods clearly outperform the existing state of the art. Among the proposed techniques, NN-search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead. The data transformation based on Gray code sorting performs best in terms of both data utility and execution time.
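To make the approximate nearest-neighbor idea concrete, here is a minimal sketch of locality-sensitive hashing over transactions treated as item sets. The abstract does not say which LSH family is used, so MinHash with banding (which approximates Jaccard similarity) is an assumption made purely for illustration. The function names, the number of hash functions, and the banding scheme are likewise hypothetical and are not taken from the paper.

```python
import random

PRIME = 2_147_483_647  # modulus for the illustrative hash family


def make_hashes(n, seed=0):
    """Build n hash functions h(x) = (a*x + b) mod PRIME (assumed family)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(items, hashes):
    """MinHash signature of a non-empty set of integer item ids."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in hashes)


def lsh_buckets(transactions, hashes, bands):
    """Hash each transaction into one bucket per band of its signature."""
    rows = len(hashes) // bands
    buckets = {}
    for tid, items in enumerate(transactions):
        sig = minhash_signature(items, hashes)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets.setdefault(key, []).append(tid)
    return buckets


def candidates(buckets, tid):
    """Transactions that share at least one bucket with transaction tid."""
    found = set()
    for members in buckets.values():
        if tid in members:
            found.update(members)
    found.discard(tid)
    return found
```

Transactions with many items in common tend to collide in at least one band, so `candidates` returns a short list of likely neighbors without comparing every pair. A group-formation step could then pick the closest candidates for each transaction; that step is not shown here.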


Existing System

Existing privacy-preserving techniques focus on anonymizing personal data, which have a fixed schema with a small number of dimensions. Through generalization or suppression, existing methods prevent attackers from re-identifying individual records. However, anonymization of personal data is not sufficient in some applications. Consider, for instance, a large retail company that sells thousands of different products and records numerous purchase transactions every day. This large volume of transactional data may contain customer spending patterns and trends that are essential for marketing and planning purposes. The company may wish to make the data available to a third party who can process the data and extract interesting patterns (e.g., perform data mining tasks). Since the most likely purpose of the data is to infer certain purchasing trends, characterized by correlations among purchased products, the personal details of the customers are not relevant and are suppressed altogether. Instead, only the contents of the shopping cart are published for each transaction. Still, there may be particular purchasing habits that disclose customer identity and expose sensitive customer information.

Proposed System
We devise an anonymized group formation strategy which relies on efficient nearest-neighbor search in high-dimensional spaces. This method directly outputs the anonymized groups, without the need for an additional data reorganization step. We introduce two representations for transactional data which take advantage of data sparseness, preserve correlations among items, and arrange transactions with similar quasi-identifiers (QIDs) in close proximity to each other. The first method relies on transformation to a band matrix format, whereas the second employs sorting with respect to binary reflected Gray codes. We devise an efficient linear-time heuristic which creates anonymized groups based on the two proposed data organization methods. We evaluate our methods experimentally with real data sets and show that they clearly outperform the existing state of the art in terms of both data utility and computational overhead.
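As a rough illustration of the Gray code organization, the sketch below sorts transactions by the position of their item bit-vector in binary reflected Gray code order, then cuts the sorted list into groups of at least k consecutive transactions. Consecutive Gray codewords differ in exactly one bit, which is why neighbors in this order tend to have similar contents. The grouping step is a deliberately simplified stand-in for the linear-time heuristic mentioned above, whose details are not given here. The function names and the parameters `n_items` and `k` are assumptions made for illustration.

```python
def gray_rank(bits):
    """Position of a 0/1 sequence in binary reflected Gray code order."""
    rank = 0
    parity = 0
    for b in bits:
        parity ^= b                  # prefix XOR decodes Gray code to binary
        rank = (rank << 1) | parity
    return rank


def gray_sort(transactions, n_items):
    """Sort item sets by the Gray rank of their bit-vector over n_items items."""
    def to_bits(t):
        return tuple(1 if i in t else 0 for i in range(n_items))
    return sorted(transactions, key=lambda t: gray_rank(to_bits(t)))


def group_consecutive(sorted_tx, k):
    """Cut a sorted list into runs of k; fold a short tail into the last run."""
    groups = [sorted_tx[i:i + k] for i in range(0, len(sorted_tx), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups
```

For two items, `gray_sort([{0}, {1}, {0, 1}, set()], 2)` yields the order `set()`, `{1}`, `{0, 1}`, `{0}`, matching the Gray sequence 00, 01, 11, 10. Materializing a dense bit-vector per transaction is only practical for small item domains; a real implementation over thousands of items would need a sparse comparison instead.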
