Research Article Open Access

S-CPM: Semantic-Similarity Cluster based Privacy Preservation Model with Cell Generalization Principle

Satish B Basapur1, B S Shylaja1 and Venkatesh2
  • 1 Institute of Technology, India
  • 2 University Visvesvaraya College of Engineering, India


Timely analysis of large and varied data unveils valuable information and new insights. The results can be used to open new avenues in health care, business, e-services and other domains. However, releasing sensitive data to third parties, or storing and reusing it, can breach the privacy of individuals. To combat privacy breaches, privacy-preserving techniques such as suppression, generalization and encryption-based privacy models have been proposed in the literature. The widely used k-anonymity model prevents record-linkage attacks but fails to satisfy the monotonicity property. It also introduces considerable data distortion and does not defend against semantic-similarity, closeness and nearest-neighborhood privacy breaches. Moreover, existing approaches do not scale to large data sets. This paper proposes a semantic-similarity, two-phase cluster based privacy preservation model. The proposed model considers both numerical and categorical attribute values for data anonymization. In the first phase, a t-centroid clustering algorithm is designed and used to partition the transaction records of data set D into clusters around t centroids, based on the Euclidean distance between transaction records. In the second phase, a neighborhood-aware hierarchical clustering algorithm is designed and used to split the transaction records within each cluster based on neighborhood-aware attribute values. The two-phase clustering operations are carried out in parallel and scale to Big Data sets. The proposed privacy model relies on cell generalization to combat record-linkage, semantic-similarity, closeness and nearest-neighborhood privacy breaches. All experiments are carried out on two datasets: Income-Census (KDD) and Bank Credit Card. The experimental results demonstrate that the proposed privacy model can combat these privacy breaches using the cell generalization principle, and that it is scalable and time efficient for large-scale data sets.
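
The first clustering phase and the cell generalization step can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that t-centroid clustering behaves like a plain k-means-style loop seeded with the first t records, that all quasi-identifiers are numeric, and that a cell is generalized to the [min, max] range of its column within the cluster. The function names and the sample records are hypothetical, and the second (neighborhood-aware hierarchical) phase is omitted.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_centroids(records, centroids):
    """Put each record in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for rec in records:
        nearest = min(range(len(centroids)),
                      key=lambda i: euclidean(rec, centroids[i]))
        clusters[nearest].append(rec)
    return clusters

def update_centroids(clusters, old_centroids):
    """Move each centroid to the mean of its cluster; keep it if the cluster is empty."""
    updated = []
    for cluster, old in zip(clusters, old_centroids):
        if cluster:
            updated.append(tuple(sum(col) / len(cluster) for col in zip(*cluster)))
        else:
            updated.append(old)
    return updated

def t_centroid_clusters(records, t, rounds=10):
    """Assumed stand-in for phase one: k-means-style clustering with t centroids."""
    centroids = [tuple(r) for r in records[:t]]  # assumption: first t records as seeds
    for _ in range(rounds):
        clusters = assign_to_centroids(records, centroids)
        centroids = update_centroids(clusters, centroids)
    return assign_to_centroids(records, centroids)

def generalize_cells(cluster):
    """Replace every cell by the (min, max) range of its column within the cluster."""
    ranges = [(min(col), max(col)) for col in zip(*cluster)]
    return [list(ranges) for _ in cluster]

# Hypothetical (age, income) quasi-identifier values.
records = [(25, 30000), (27, 32000), (52, 90000), (55, 95000)]
clusters = t_centroid_clusters(records, t=2)
anonymized = [generalize_cells(c) for c in clusters if c]
```

After generalization, every record in a cluster carries the same quasi-identifier ranges, so records within one cluster cannot be told apart by those attributes. In practice the attributes would likely need rescaling first, since an unscaled income column dominates the Euclidean distance here.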

Journal of Computer Science
Volume 18 No. 3, 2022, 138-150


Submitted On: 5 September 2021 Published On: 17 March 2022

How to Cite: Basapur, S. B., Shylaja, B. S. & Venkatesh (2022). S-CPM: Semantic-Similarity Cluster based Privacy Preservation Model with Cell Generalization Principle. Journal of Computer Science, 18(3), 138-150.




  • Privacy Preservation Model
  • Cell Generalization
  • Transaction Records
  • Clusters
  • Quasi-Identifiers and Sensitive Attributes