Detailed Information

Association-based Unsupervised Feature Selection for High-dimensional Categorical Data

Authors
이창기; 정욱 (Jung, Uk)
Issue Date
Sep-2019
Publisher
Korean Society for Quality Management (한국품질경영학회)
Keywords
Feature Selection; High-dimensional Categorical Data; Association-based Dissimilarity; Distance Metric; Unsupervised Learning
Citation
Journal of the Korean Society for Quality Management (품질경영학회지), v.47, no.3, pp. 537-552
Pages
16
Indexed
KCI
Journal Title
Journal of the Korean Society for Quality Management (품질경영학회지)
Volume
47
Number
3
Start Page
537
End Page
552
URI
https://scholarworks.dongguk.edu/handle/sw.dongguk/7685
DOI
10.7469/JKSQM.2019.47.3.537
ISSN
1229-1889
2287-9005
Abstract
Purpose: Advances in information technology have made high-dimensional categorical data easy to use. This study proposes a novel method for selecting relevant categorical variables in such data.
Methods: The proposed feature selection method consists of three steps. (1) The first step defines the goodness-to-pick measure. In this paper, a categorical variable is considered relevant if it is associated with other variables; accordingly, the goodness-to-pick measure is computed from the normalized conditional entropy of a variable with respect to the other variables. (2) The second step finds the relevant feature subset within the original variable set by deciding whether each variable is relevant. (3) The third step eliminates redundant variables from the relevant feature subset.
Results: In the experiments, the proposed method generally yielded better classification performance than using no feature selection on high-dimensional categorical data, especially as the number of irrelevant categorical variables increased. In addition, as the number of irrelevant categorical variables with imbalanced category values increased, the accuracy gap between the proposed method and the existing methods it was compared with widened.
Conclusion: The experimental results confirm that the proposed method consistently produces high classification accuracy on high-dimensional categorical data. The method is therefore a promising option for high-dimensional settings.
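The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the averaging used in goodness_to_pick, the asymmetric redundancy check, the two threshold values, and all function and variable names below are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of category labels."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def normalized_conditional_entropy(x, y):
    """H(X|Y) / H(X), clipped to [0, 1]; 0 means Y fully determines X."""
    h_x = entropy(x)
    if h_x == 0:
        # Assumption: a constant column shows no association with anything.
        return 1.0
    h_x_given_y = entropy(list(zip(x, y))) - entropy(y)  # H(X,Y) - H(Y)
    return min(1.0, max(0.0, h_x_given_y / h_x))


def association(x, y):
    """Share of the uncertainty in x that is removed by knowing y."""
    return 1.0 - normalized_conditional_entropy(x, y)


def goodness_to_pick(data, name):
    """Step 1 (assumed form): mean association of one variable with all others.

    Requires at least two variables in `data`.
    """
    others = [k for k in data if k != name]
    return sum(association(data[name], data[k]) for k in others) / len(others)


def select_features(data, relevance_min=0.3, redundancy_min=0.9):
    """Steps 2 and 3 with hypothetical thresholds."""
    scores = {name: goodness_to_pick(data, name) for name in data}
    # Step 2: keep variables whose score reaches the relevance threshold.
    relevant = [n for n in sorted(data, key=scores.get, reverse=True)
                if scores[n] >= relevance_min]
    # Step 3: drop a variable if an already kept one nearly determines it.
    selected = []
    for name in relevant:
        if all(association(data[name], data[kept]) < redundancy_min
               for kept in selected):
            selected.append(name)
    return selected


# Toy data: "b" is a relabeled copy of "a"; "c" is independent of both.
toy = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": ["x", "x", "y", "y", "x", "x", "y", "y"],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(select_features(toy))  # prints ['a']
```

In the toy data, "c" scores 0 and is dropped as irrelevant in step 2, while "b" passes step 2 but is removed in step 3 because "a" fully determines it. A real study would tune both thresholds, and the paper may define them differently.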
Appears in Collections
Dongguk Business School > Department of Business Administration > 1. Journal Articles

Related Researcher

Jung, Uk
Dongguk Business School (Department of Business Administration)