On Cost-Efficient Learning of Data Dependency (open access)
- Authors
- Jang, Hyeryung; Song, Hyungseok; Yi, Yung
- Issue Date
- Jun-2022
- Publisher
- IEEE
- Keywords
- Costs; Distributed databases; Inference algorithms; Graphical models; Task analysis; Data models; Tree graphs; Graph structure learning; distributed inference; sample complexity; large deviation principle; belief propagation
- Citation
- IEEE/ACM Transactions on Networking, v.30, no.3, pp 1382 - 1394
- Pages
- 13
- Indexed
- SCIE; SCOPUS
- Journal Title
- IEEE/ACM Transactions on Networking
- Volume
- 30
- Number
- 3
- Start Page
- 1382
- End Page
- 1394
- URI
- https://scholarworks.dongguk.edu/handle/sw.dongguk/3122
- DOI
- 10.1109/TNET.2022.3141128
- ISSN
- 1063-6692; 1558-2566
- Abstract
- In this paper, we consider the problem of learning a tree graph structure that represents the statistical data dependency among nodes, given a set of data samples generated by those nodes; this structure provides the basis for probabilistic inference tasks. Inference on the data graph includes marginal inference and maximum a posteriori (MAP) estimation, and belief propagation (BP) is a commonly used algorithm that computes the marginal distributions of nodes via message-passing, incurring a non-negligible communication cost. A trade-off between inference accuracy and message-passing cost is inevitable because the learned data dependency structure and the physical connectivity graph often differ substantially. We formalize this trade-off as an optimization problem whose output is a data dependency graph that jointly accounts for learning accuracy and message-passing cost. We focus on two popular implementations of BP, ASYNC-BP and SYNC-BP, which have different message-passing mechanisms and cost structures. For ASYNC-BP, we propose an optimal polynomial-time learning algorithm, motivated by finding a maximum weight spanning tree of a complete graph. For SYNC-BP, we prove the NP-hardness of the problem and propose a greedy heuristic. For both BP implementations, we use the large deviation principle to quantify how the probability that the learned cost-efficient data graph differs from the ideal one decays as the number of data samples grows, which provides a guideline on how many samples are needed to obtain a given trade-off. We validate our theoretical findings through extensive simulations, which show a good match with the analysis.
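The abstract says only that the ASYNC-BP algorithm is "motivated by finding a maximum weight spanning tree of a complete graph"; it does not give the objective. As a rough illustration of that idea, and not the authors' algorithm, the sketch below builds a Chow-Liu-style tree in which each candidate edge is scored by empirical mutual information minus a weighted message-passing cost. The edge weight `MI - lam * cost`, the names `cost_aware_tree` and `lam`, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    p_i = Counter(s[i] for s in samples)
    p_j = Counter(s[j] for s in samples)
    p_ij = Counter((s[i], s[j]) for s in samples)
    return sum(
        (c / n) * math.log(c * n / (p_i[a] * p_j[b]))
        for (a, b), c in p_ij.items()
    )


def cost_aware_tree(samples, cost, lam):
    """Maximum weight spanning tree with ASSUMED weights MI(i, j) - lam * cost[i][j].

    samples: list of equal-length tuples of discrete values (one entry per node).
    cost:    symmetric matrix of hypothetical per-edge message-passing costs.
    lam:     hypothetical trade-off parameter (0 recovers the plain Chow-Liu tree).
    """
    d = len(samples[0])
    edges = sorted(
        ((mutual_information(samples, i, j) - lam * cost[i][j], i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest used by Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Toy data: node 0 and node 1 always agree; node 2 agrees with them 6 times out of 8.
samples = [(0, 0, 0)] * 3 + [(1, 1, 1)] * 3 + [(0, 0, 1), (1, 1, 0)]
cheap_12 = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # edge (0, 2) is costly
cheap_02 = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]  # edge (1, 2) is costly
tree_a = cost_aware_tree(samples, cheap_12, lam=1.0)
tree_b = cost_aware_tree(samples, cheap_02, lam=1.0)
```

In this toy case nodes 0 and 1 are equally informative about node 2, so the cost term alone decides which of them node 2 attaches to, a small instance of the accuracy/cost trade-off the abstract describes.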
- Appears in Collections
- College of Advanced Convergence Engineering > Department of Computer Science and Artificial Intelligence > 1. Journal Articles