Unsupervised Association-based Categorical Variable Selection Method for High-dimensional Categorical Data
- Other Titles
- Association-based Unsupervised Feature Selection for High-dimensional Categorical Data
- Authors
- Changki Lee (이창기); Uk Jung (정욱)
- Issue Date
- Sep-2019
- Publisher
- Korean Society for Quality Management (한국품질경영학회)
- Keywords
- Feature Selection; High-dimensional Categorical Data; Association-based Dissimilarity; Distance Metric; Unsupervised Learning
- Citation
- Journal of Korean Society for Quality Management (품질경영학회지), v.47, no.3, pp. 537-552
- Pages
- 16
- Indexed
- KCI
- Journal Title
- Journal of Korean Society for Quality Management (품질경영학회지)
- Volume
- 47
- Number
- 3
- Start Page
- 537
- End Page
- 552
- URI
- https://scholarworks.dongguk.edu/handle/sw.dongguk/7685
- DOI
- 10.7469/JKSQM.2019.47.3.537
- ISSN
- 1229-1889; 2287-9005
- Abstract
- Purpose: Advances in information technology have made it easy to use high-dimensional categorical data. The purpose of this study is therefore to propose a novel method for selecting the appropriate categorical variables from high-dimensional categorical data.
Methods: The proposed feature selection method consists of three steps. (1) The first step defines a goodness-to-pick measure. In this paper, a categorical variable is considered relevant if it is associated with other variables; following this definition, the goodness-to-pick measure is computed from the normalized conditional entropy between the variable and each of the other variables. (2) The second step uses this measure to decide whether each variable is relevant, yielding a relevant feature subset of the original variable set. (3) The third step eliminates redundant variables from the relevant feature subset.
Results: Experiments on high-dimensional categorical data showed that the proposed feature selection method generally yielded better classification performance than using no feature selection, especially as the number of irrelevant categorical variables increased. In addition, as the number of irrelevant categorical variables with imbalanced category values increased, the accuracy gap between the proposed method and the existing methods it was compared against widened.
Conclusion: The experimental results confirm that the proposed method consistently produces high classification accuracy on high-dimensional categorical data. The proposed method is therefore a promising approach for high-dimensional settings.
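The abstract's Methods paragraph names the three steps but not their exact formulas. The Python sketch below illustrates one plausible reading, under assumptions that are ours and not the authors': the goodness-to-pick score of a variable X_i is taken as 1 minus the average normalized conditional entropy H(X_i | X_j) / H(X_i) over all other variables X_j (higher means more strongly associated); relevance is decided by a fixed cutoff (`relevance_threshold`, hypothetical); and redundancy is removed greedily by dropping any variable that an already-kept, higher-scoring variable almost fully determines (`redundancy_tol`, hypothetical). All function and parameter names are illustrative, and the scoring assumes at least two variables.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Column = Sequence[Hashable]


def entropy(x: Column) -> float:
    """Empirical Shannon entropy H(X) in bits."""
    n = len(x)
    return -sum((c / n) * math.log2(c / n) for c in Counter(x).values())


def normalized_conditional_entropy(x: Column, y: Column) -> float:
    """H(X | Y) / H(X) in [0, 1]; 0 means Y fully determines X.

    Assumption: a constant X (H(X) = 0) is scored 1.0, i.e. treated as
    having no usable association with any other variable.
    """
    hx = entropy(x)
    if hx == 0.0:
        return 1.0
    # H(X | Y) = H(X, Y) - H(Y); clamp tiny negative floating-point error to 0.
    h_x_given_y = max(0.0, entropy(list(zip(x, y))) - entropy(y))
    return h_x_given_y / hx


def goodness_to_pick(data: Dict[str, Column]) -> Dict[str, float]:
    """Step 1 (assumed form): 1 - mean normalized conditional entropy of each
    variable given every other variable. Requires at least two variables."""
    scores = {}
    for name, col in data.items():
        others = [normalized_conditional_entropy(col, other)
                  for other_name, other in data.items() if other_name != name]
        scores[name] = 1.0 - sum(others) / len(others)
    return scores


def select_features(data: Dict[str, Column],
                    relevance_threshold: float = 0.3,
                    redundancy_tol: float = 0.05) -> List[str]:
    # Step 2: keep variables whose score clears a (hypothetical) cutoff.
    scores = goodness_to_pick(data)
    relevant = [v for v in data if scores[v] >= relevance_threshold]
    # Step 3: visit relevant variables from most to least associated and drop
    # any that is (almost) fully determined by a variable already kept.
    kept: List[str] = []
    for v in sorted(relevant, key=lambda name: scores[name], reverse=True):
        if all(normalized_conditional_entropy(data[v], data[k]) > redundancy_tol
               for k in kept):
            kept.append(v)
    return kept


# Toy example (made-up data, not from the paper).
data = {
    "a": [0, 0, 1, 1, 2, 2, 0, 1],
    "b": ["x", "x", "y", "y", "z", "z", "x", "y"],  # relabelled copy of a
    "c": [0, 0, 1, 1, 1, 1, 0, 1],                  # coarsening of a
    "d": [0, 1, 0, 1, 0, 1, 1, 0],                  # weakly related noise
}
print(select_features(data))  # ['c', 'a']
```

In this toy run, `d` scores about 0.06 and fails the relevance cutoff, `c` scores highest and is kept first, `a` is kept because `c` does not determine it, and `b` is dropped as redundant because `a` determines it exactly. The greedy order means the result depends on the scores; a different redundancy rule (for example, requiring mutual determination) is an equally valid design choice for a sketch like this.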
