Depth Prior-Guided 3D Voxel Feature Fusion for 3D Semantic Estimation from Monocular Videos
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Wen, Mingyun | - |
| dc.contributor.author | Cho, Kyungeun | - |
| dc.date.accessioned | 2024-08-08T13:32:27Z | - |
| dc.date.available | 2024-08-08T13:32:27Z | - |
| dc.date.issued | 2024-07 | - |
| dc.identifier.issn | 2227-7390 | - |
| dc.identifier.uri | https://scholarworks.dongguk.edu/handle/sw.dongguk/22702 | - |
| dc.description.abstract | Existing 3D semantic scene reconstruction methods utilize the same set of features extracted from deep learning networks for both 3D semantic estimation and geometry reconstruction, ignoring the differing requirements of semantic segmentation and geometry construction tasks. Additionally, current methods allocate 2D image features to all voxels along camera rays during the back-projection process, without accounting for empty or occluded voxels. To address these issues, we propose separating the features for 3D semantic estimation from those for 3D mesh reconstruction. We use a pretrained vision transformer network for image feature extraction and depth priors estimated by a pretrained multi-view stereo network to guide the allocation of image features within 3D voxels during the back-projection process. The back-projected image features are aggregated within each 3D voxel via averaging, creating coherent voxel features. The resulting 3D feature volume, composed of unified voxel feature vectors, is fed into a 3D CNN with a semantic classification head to produce a 3D semantic volume. This volume can be combined with existing 3D mesh reconstruction networks to produce a 3D semantic mesh. Experimental results on real-world datasets demonstrate that the proposed method significantly increases 3D semantic estimation accuracy. | - |
| dc.format.extent | 11 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | MDPI | - |
| dc.title | Depth Prior-Guided 3D Voxel Feature Fusion for 3D Semantic Estimation from Monocular Videos | - |
| dc.type | Article | - |
| dc.publisher.location | Switzerland | - |
| dc.identifier.doi | 10.3390/math12132114 | - |
| dc.identifier.scopusid | 2-s2.0-85198439562 | - |
| dc.identifier.wosid | 001269659300001 | - |
| dc.identifier.bibliographicCitation | Mathematics, v.12, no.13, pp 1 - 11 | - |
| dc.citation.title | Mathematics | - |
| dc.citation.volume | 12 | - |
| dc.citation.number | 13 | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 11 | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Mathematics | - |
| dc.relation.journalWebOfScienceCategory | Mathematics | - |
| dc.subject.keywordPlus | RECONSTRUCTION | - |
| dc.subject.keywordPlus | TRACKING | - |
| dc.subject.keywordAuthor | 3D semantic scene reconstruction | - |
| dc.subject.keywordAuthor | depth priors | - |
| dc.subject.keywordAuthor | vision transformer | - |
| dc.subject.keywordAuthor | multi-view stereo-network | - |
| dc.subject.keywordAuthor | voxel feature fusion | - |
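The abstract describes back-projecting 2D image features into a 3D voxel grid, but keeping a feature only for voxels that lie near the surface predicted by the depth prior, then averaging the hits per voxel. The following is a minimal illustrative sketch of that idea, not the paper's implementation: the function name, the depth tolerance `tau`, and the pinhole-projection setup are all assumptions introduced for demonstration.

```python
import numpy as np

def fuse_voxel_features(feat_2d, depth_prior, K, world_to_cam,
                        voxel_centers, tau=0.2):
    """Depth-prior-guided back-projection (illustrative sketch).

    feat_2d:       (C, H, W) per-pixel image features
    depth_prior:   (H, W) depth estimate for this view
    K:             (3, 3) camera intrinsics
    world_to_cam:  (4, 4) camera extrinsics
    voxel_centers: (N, 3) voxel centres in world coordinates
    Returns accumulated features (N, C) and hit counts (N,).
    """
    C, H, W = feat_2d.shape
    N = voxel_centers.shape[0]
    feats = np.zeros((N, C))
    counts = np.zeros(N)

    # Transform voxel centres into camera coordinates.
    homo = np.hstack([voxel_centers, np.ones((N, 1))])
    cam = (world_to_cam @ homo.T).T[:, :3]
    z = cam[:, 2]

    # Pinhole projection onto the image plane.
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / np.where(z != 0, z, 1)).astype(int)
    v = np.round(uv[:, 1] / np.where(z != 0, z, 1)).astype(int)
    in_view = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    for i in np.where(in_view)[0]:
        # Keep only voxels near the depth-prior surface; this skips
        # empty space in front of it and occluded space behind it,
        # unlike naive allocation to every voxel along the ray.
        if abs(z[i] - depth_prior[v[i], u[i]]) < tau:
            feats[i] += feat_2d[:, v[i], u[i]]
            counts[i] += 1
    return feats, counts
```

Across a monocular video, `feats` and `counts` would be accumulated over all frames and the final voxel feature taken as `feats / counts` (for voxels with at least one hit), giving the averaged, unified voxel vectors that the abstract says are fed into the 3D CNN.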
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
