Embedded caching framework for privacy preserving data mining
Date of Issue2009
School of Computer Engineering
Centre for Advanced Information Systems
Most existing work on Privacy-Preserving Data Mining (PPDM) focus on enabling conventional data mining algorithms with the ability to run in a secure manner in a multi-party setting. Although various algorithms in data mining have been enhanced to incorporate secure mechanisms for data privacy preservation, their computation performance is far too high to allow them to be practically useful. This is especially true for those algorithms that make use of common cryptosystems and/or linear algebra transformation. Such cryptographic and/or transformation operations are indeed costly, particularly when they are performed repeatedly in PPDM algorithms. High computational overheads would render these algorithms less practically applicable. In this report, we address the efficiency issue of several PPDM algorithms by proposing to cache result data that are used more than once by Secure Multi-party Computation (SMC) protocols. For this to be possible, we carefully examine the micro steps of many SMC procedures to identify the repetitive or iterative portions. We then establish a general framework for SMC protocols and generalize intrinsic properties from this investigation so as to be able to evaluate their feasibility and capability for intermediate result caching. We design proper intermediate result caching interfaces for various SMC protocols, then apply them to many PPDM algorithms, thus reduce the overall computational cost by caching intermediate results/data. Our experiments show that the caching technique is generalizable to common data mining algorithms and the efficiency of PPDM algorithms can be greatly improved without compromising data privacy. Besides, we also proposed a novel Secure Scalar Product (SSP) protocol to facilitate our caching concept on SMC protocol family. This protocol is built on an efficient cryptosystem and would be able to incorporate intermediate result caching capability inside. In future iterations, these results could be utilized to derive relevant system input/output without costly cryptographic operations, hence, to further reduce overall computational overheads. Our empirical experiments showed that our proposed protocol is able to significantly reduce cryptographic overheads in SSP operation.
DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
Final Year Project (FYP)
Nanyang Technological University