Detailed Information

Cited 3 times in Web of Science; cited 5 times in Scopus

Resampling-Based Similarity Measures for High-Dimensional Data

Authors
Amaratunga, Dhammika; Cabrera, Javier; Lee, Yung-Seop
Issue Date
1-Jan-2015
Publisher
MARY ANN LIEBERT, INC
Keywords
feature selection; supervised classification; similarity; deep sequencing; unsupervised classification; microarrays; dissimilarity
Citation
JOURNAL OF COMPUTATIONAL BIOLOGY, v.22, no.1, pp 54 - 62
Pages
9
Indexed
SCIE
SCOPUS
Journal Title
JOURNAL OF COMPUTATIONAL BIOLOGY
Volume
22
Number
1
Start Page
54
End Page
62
URI
https://scholarworks.dongguk.edu/handle/sw.dongguk/23494
DOI
10.1089/cmb.2014.0195
ISSN
1066-5277 (print)
1557-8666 (online)
Abstract
An important issue in classification is the assessment of sample similarity. This is nontrivial in high-dimensional or megavariate datasets, that is, datasets that consist of simultaneous measurements on thousands of features, many of which carry little or no information regarding consistent sample differences. Conventional similarity measures do not work particularly well for such data. As an alternative, we propose a distance measure that is based on a resampling process: at each step of the process, a random subset of features is selected and a cluster analysis is performed using only this subset; the relative frequency with which a pair of samples clusters together across several such random subsets forms the similarity measure. The features used at any step may be drawn completely at random, or the draw may be enriched by giving the more informative features a higher chance of selection; this enrichment turns out to be particularly effective. We use actual datasets from the burgeoning genomics literature to demonstrate the superior performance of this similarity measure, especially its enriched form, compared with more conventional measures such as Euclidean distance or correlation, or, if the data are categorical, Hamming distance.
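
The abstract describes the procedure only in outline, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes k-means with a fixed number of clusters k as the per-subset cluster analysis, a fixed subset size m, and, for the enriched form, per-feature variance as a stand-in "informativeness" score (the abstract does not say how informativeness is measured). All function names and the toy data are hypothetical. The entry S[i][j] is the fraction of rounds in which samples i and j fall in the same cluster.

```python
import random


def kmeans_labels(points, k, rng, n_iter=20):
    """Lloyd's k-means on equal-length float vectors; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to its members' mean (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def variance_weights(data, eps=1e-9):
    """Hypothetical informativeness score: per-feature variance (+ eps so every weight is > 0)."""
    n = len(data)
    weights = []
    for col in zip(*data):
        mean = sum(col) / n
        weights.append(sum((x - mean) ** 2 for x in col) / n + eps)
    return weights


def sample_features(n_features, m, rng, weights=None):
    """Draw m distinct feature indices, uniformly or tilted toward larger weights."""
    if weights is None:
        return rng.sample(range(n_features), m)
    # Efraimidis-Spirakis weighted sampling without replacement: key = u ** (1 / w), keep the m largest.
    keyed = sorted(((rng.random() ** (1.0 / w), j) for j, w in enumerate(weights)), reverse=True)
    return [j for _, j in keyed[:m]]


def resampling_similarity(data, k, m, n_rounds=100, weights=None, seed=0):
    """S[i][j] = fraction of rounds in which samples i and j land in the same cluster."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    together = [[0] * n for _ in range(n)]
    for _ in range(n_rounds):
        feats = sample_features(p, m, rng, weights)
        subset = [[row[j] for j in feats] for row in data]  # cluster on the chosen features only
        labels = kmeans_labels(subset, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / n_rounds for count in row] for row in together]


def toy_data(n_per_group=5, n_features=50, n_informative=5, shift=5.0, seed=1):
    """Made-up data: two groups that differ only on the first n_informative features."""
    gen = random.Random(seed)
    rows = []
    for offset in (shift, -shift):
        for _ in range(n_per_group):
            rows.append([(offset if j < n_informative else 0.0) + gen.gauss(0, 1)
                         for j in range(n_features)])
    return rows


if __name__ == "__main__":
    data = toy_data()
    uniform = resampling_similarity(data, k=2, m=5)
    enriched = resampling_similarity(data, k=2, m=5, weights=variance_weights(data))
    # Compare a within-group pair (0, 1) with a between-group pair (0, 9).
    print("uniform: ", round(uniform[0][1], 2), round(uniform[0][9], 2))
    print("enriched:", round(enriched[0][1], 2), round(enriched[0][9], 2))
```

The matrix 1 - S can then serve as a dissimilarity for downstream clustering or nearest-neighbor classification. Weighted sampling without replacement is used so that each subset still holds m distinct features while favoring high-scoring ones; swapping in a different score (for example, a supervised two-sample statistic when class labels are available) only changes the `weights` argument.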

Related Researcher

Lee, Yung Seop
College of Natural Science (Department of Statistics)