Research Article Open Access

A Cluster Feature-Based Incremental Clustering Approach to Mixed Data

A. M. Sowjanya and M. Shashi


Problem statement: The main objective of this study is to develop an incremental clustering algorithm that can handle both numerical and categorical attributes in a given dataset. The authors previously reported a cluster feature-based algorithm, CFICA, that can handle only numerical data. Approach: Since many real-life data mining applications work with datasets that contain both numeric and categorical attributes, the earlier algorithm needs to be modified to handle such mixed datasets. The core idea is to propose a new distance measure based on an automatically generated weightage and to apply it to incremental clustering. The incremental data points are handled in two phases. In the first phase, the k-means clustering algorithm is employed for initial clustering of the static database. In the second phase, the designed distance measure is used to find the appropriate cluster for each incremental data point. The combination of the two proved more effective in handling mixed datasets. The clustering accuracy, clustering error and computational time of the proposed approach were evaluated with different k values and thresholds. Varying the threshold values gave better accuracy on different datasets. Results: The clustering error of this approach was reduced considerably with different k values and thresholds. Conclusion: The results confirm the efficiency of the proposed approach in handling real mixed datasets composed of numerical and categorical attributes only.
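
The second phase described in the abstract can be illustrated with a minimal sketch. The abstract does not give the distance formula or how the weightage is generated, so everything below is an assumption made for illustration: the `ClusterSummary` class, the `mixed_distance` formula (Euclidean distance to the numeric mean plus a categorical mismatch fraction), the fixed `weight` parameter standing in for the automatically generated weightage, and the rule of opening a new cluster when the distance exceeds `threshold`. The initial clusters are assumed to come from the k-means phase, and numeric attributes are assumed to be scaled to comparable ranges.

```python
from collections import Counter
from math import sqrt


class ClusterSummary:
    """Hypothetical cluster summary: size, numeric means, categorical counts."""

    def __init__(self, numeric, categorical):
        self.n = 1
        self.mean = list(numeric)
        self.counts = [Counter([v]) for v in categorical]

    def add(self, numeric, categorical):
        # Update the running mean and the per-attribute category counts.
        self.n += 1
        self.mean = [m + (x - m) / self.n for m, x in zip(self.mean, numeric)]
        for counter, v in zip(self.counts, categorical):
            counter[v] += 1


def mixed_distance(cluster, numeric, categorical, weight):
    """Assumed mixed distance; not the measure defined in the paper."""
    # Numeric part: Euclidean distance to the cluster mean.
    d_num = sqrt(sum((x - m) ** 2 for x, m in zip(numeric, cluster.mean)))
    # Categorical part: per attribute, the fraction of cluster members
    # whose value differs from the point's value.
    d_cat = sum(1 - c[v] / cluster.n for c, v in zip(cluster.counts, categorical))
    return weight * d_num + (1 - weight) * d_cat


def assign_incremental(clusters, numeric, categorical, weight=0.5, threshold=1.0):
    """Add the point to the nearest cluster, or open a new one if too far."""
    distances = [mixed_distance(c, numeric, categorical, weight) for c in clusters]
    best = min(range(len(clusters)), key=distances.__getitem__)
    if distances[best] <= threshold:
        clusters[best].add(numeric, categorical)
        return best
    clusters.append(ClusterSummary(numeric, categorical))
    return len(clusters) - 1


# Two single-point clusters standing in for the output of the k-means phase.
clusters = [ClusterSummary([0.0, 0.0], ["a"]), ClusterSummary([1.0, 1.0], ["b"])]
near = assign_incremental(clusters, [0.1, 0.0], ["a"])  # joins an existing cluster
far = assign_incremental(clusters, [5.0, 5.0], ["c"])   # exceeds the threshold
```

In this sketch the threshold controls how readily new clusters are created, which is one plausible reading of why the abstract reports results for several threshold values.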

Journal of Computer Science
Volume 7 No. 12, 2011, 1875-1880


Submitted On: 16 June 2011 Published On: 21 October 2011

How to Cite: Sowjanya, A. M. & Shashi, M. (2011). A Cluster Feature-Based Incremental Clustering Approach to Mixed Data. Journal of Computer Science, 7(12), 1875-1880.


Keywords

  • Data mining
  • cluster feature
  • centroid
  • farthest neighbor points
  • mixed attributes
  • numerical attributes
  • categorical attributes
  • incremental clustering
  • k-means