S-CPM: Semantic-Similarity Cluster based Privacy Preservation Model with Cell Generalization Principle

Satish B Basapur; B S Shylaja; Venkatesh

doi:10.3844/jcssp.2022.138.150

Research Article Open Access

Satish B Basapur¹, B S Shylaja¹ and Venkatesh²

¹ Institute of Technology, India
² University Visvesvaraya College of Engineering, India

Abstract

Timely analysis of large and varied data sets can unveil valuable information and new insights, and the results can open new avenues in health care, business, e-services and other domains. However, releasing sensitive data to third parties, and storing and reusing it, can breach the privacy of individuals. To combat such privacy breaches, privacy-preserving techniques such as suppression, generalization and encryption-based privacy models have been proposed in the literature. The widely used k-anonymity model prevents record-linkage attacks but fails to satisfy the monotonicity property. It also introduces considerable data distortion and does not defend against semantic-similarity, closeness and nearest-neighborhood privacy breaches. Moreover, existing approaches do not scale to large data sets. This paper proposes a semantic-similarity, two-phase cluster-based privacy preservation model that considers both numerical and categorical attribute values for data anonymization. In the first phase, a t-centroid clustering algorithm partitions the transaction records of data set D into t centroid-based clusters according to the Euclidean distance between transaction records. In the second phase, a neighborhood-aware hierarchical clustering algorithm splits the transaction records within each cluster based on neighborhood-aware attribute values. The two-phase clustering operations are carried out in parallel and scale to Big Data sets. The proposed privacy model relies on cell generalization to combat record-linkage, semantic-similarity, closeness and nearest-neighborhood privacy breaches. All experiments are carried out on two datasets: Income-Census (KDD) and a Bank Credit Card dataset. The experimental results demonstrate that the proposed privacy model can combat these privacy breaches with the cell generalization principle, and that it is scalable and time efficient for large-scale data sets.
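
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the first phase can be approximated by a Lloyd-style centroid iteration over numeric quasi-identifiers, and that cell generalization replaces each cell with the value range of its column inside the cluster. The second (neighborhood-aware hierarchical) phase and categorical attributes are not covered. The names `t_centroid_partition` and `generalize_cells`, and the sample records, are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two numeric records of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def t_centroid_partition(records, t, iterations=10, seed=0):
    """Partition numeric records into at most t clusters (assumed Lloyd-style loop)."""
    rng = random.Random(seed)
    centroids = rng.sample(records, t)  # assumption: random initial centroids
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(t)]
        for rec in records:
            # Assign each record to its nearest centroid.
            nearest = min(range(t), key=lambda i: euclidean(rec, centroids[i]))
            clusters[nearest].append(rec)
        # Recompute each centroid as the column-wise mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return [c for c in clusters if c]


def generalize_cells(cluster):
    """Replace every cell with the (min, max) range of its column in the cluster."""
    ranges = tuple((min(col), max(col)) for col in zip(*cluster))
    return [ranges for _ in cluster]


# Hypothetical records: (age, income) quasi-identifiers.
records = [
    (25, 30000), (27, 32000), (26, 31000),
    (60, 90000), (62, 95000), (61, 88000),
]
clusters = t_centroid_partition(records, t=2)
anonymized = [row for cluster in clusters for row in generalize_cells(cluster)]
for row in anonymized:
    print(row)
```

Because every record in a cluster ends up with identical generalized cells, records within a cluster become indistinguishable on these attributes. In practice the attributes would likely need scaling first, since the unscaled income column dominates the Euclidean distance in this toy example.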

Journal of Computer Science

Volume 18 No. 3, 2022, 138-150

DOI: https://doi.org/10.3844/jcssp.2022.138.150

Submitted On: 5 September 2021 Published On: 17 March 2022

How to Cite: Basapur, S. B., Shylaja, B. S. & Venkatesh, (2022). S-CPM: Semantic-Similarity Cluster based Privacy Preservation Model with Cell Generalization Principle. Journal of Computer Science, 18(3), 138-150. https://doi.org/10.3844/jcssp.2022.138.150

Copyright: © 2022 Satish B Basapur, B S Shylaja and Venkatesh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Privacy Preservation Model
Cell Generalization
Transaction Records
Clusters
Quasi-Identifiers and Sensitive Attributes