An Integrated Framework for Mixed Data Clustering Using Self Organizing Map
Abstract
Problem statement: Clustering plays an important role in mining large data sets and supports their analysis, which makes better clustering techniques an important research goal. Several techniques exist for clustering data of a single type, but very few exist for clustering mixed data items. A better clustering technique for mixed data is therefore required. Clusters should be formed so that the similarity of items within a cluster is maximized and the similarity of items from different clusters is minimized. Existing techniques offer several advantages but also have various disadvantages. Approach: To overcome these drawbacks, a Self-Organizing Map (SOM) combined with Extended Attribute-Oriented Induction (EAOI) can be used to cluster mixed-type data. This combination, however, takes more time for clustering, so a modified SOM based on batch learning was proposed. Results: The proposed technique was evaluated on the UCI Adult Data Set. It produced fewer clusters than the SOM, and it produced no outliers. Conclusion: The experimental results suggest that the proposed technique can cluster mixed data items with better classification accuracy.
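To illustrate what batch learning means for a SOM, the following is a minimal sketch, not the modified SOM of this article. It assumes purely numeric attributes, a small rectangular map, and a Gaussian neighbourhood with a linearly shrinking width; the function names (`find_bmu`, `batch_som`) and all parameter values are hypothetical choices made for illustration. Categorical attributes, which the article handles through EAOI, are not covered here.

```python
import numpy as np


def find_bmu(data, weights):
    """Return, for each sample, the index of its closest map unit."""
    # Squared Euclidean distance between every sample and every unit.
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def batch_som(data, rows=3, cols=3, epochs=20,
              sigma_start=1.5, sigma_end=0.5, seed=0):
    """Train a small SOM with batch updates (illustrative sketch only).

    Returns (weights, bmu) where weights has shape (rows * cols, n_features)
    and bmu holds the matching unit index of every sample.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_units = rows * cols

    # Grid coordinates of each unit and squared grid distances between units.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)],
                    dtype=float)
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)

    # Assumption: initialise the unit weights from randomly chosen samples.
    picks = rng.choice(len(data), size=n_units, replace=len(data) < n_units)
    weights = data[picks].copy()

    for epoch in range(epochs):
        # Shrink the neighbourhood width linearly over the epochs.
        frac = epoch / max(epochs - 1, 1)
        sigma = sigma_start + frac * (sigma_end - sigma_start)

        # Step 1: assign every sample to its matching unit in one pass.
        bmu = find_bmu(data, weights)

        # Step 2: neighbourhood weight of each unit for each sample,
        # shape (n_units, n_samples).
        h = np.exp(-grid_d2[:, bmu] / (2.0 * sigma ** 2))

        # Step 3: replace each unit by the neighbourhood-weighted mean
        # of all samples (no per-sample learning rate is needed).
        weights = (h @ data) / h.sum(axis=1, keepdims=True)

    return weights, find_bmu(data, weights)
```

Calling `batch_som(X)` on a numeric array `X` of shape `(n_samples, n_features)` returns the trained unit weights and the unit assigned to each sample. Unlike sequential SOM training, every unit is updated once per pass over the whole data set, which is the property usually cited for the lower training time of batch learning.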
DOI: https://doi.org/10.3844/jcssp.2011.1639.1645
Copyright: © 2011 Hari Prasad Devaraj and M. Punithavalli. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Attribute-oriented induction
- clustering technique
- data mining
- training pattern
- self-organizing map
- batch learning
- Best Matching Unit (BMU)
- numeric attributes
- scientific data analysis