Pow3r, VGGT - Presented by 권흥찬 | Seminar, Intelligent Vehicle Lab, Seoul National University


Posted by the site administrator on 25-08-01 13:41


Pow3r

We present Pow3r, a novel large 3D vision regression model that is highly versatile in the input modalities it accepts. Unlike previous feed-forward models that lack any mechanism to exploit known camera or scene priors at test time, Pow3r incorporates any combination of auxiliary information, such as intrinsics, relative pose, and dense or sparse depth, alongside input images, within a single network. Building upon the recent DUSt3R paradigm, a transformer-based architecture that leverages powerful pre-training, our lightweight and versatile conditioning acts as additional guidance for the network to predict more accurate estimates when auxiliary information is available. During training we feed the model with random subsets of modalities at each iteration, which enables the model to operate under different levels of known priors at test time. This in turn opens up new capabilities, such as performing inference in native image resolution, or point-cloud completion. Our experiments on 3D reconstruction, depth completion, multi-view depth prediction, multi-view stereo, and multi-view pose estimation tasks yield state-of-the-art results and confirm the effectiveness of Pow3r at exploiting all available information.
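The idea of training on random subsets of modalities can be illustrated with a minimal sketch. This is not the Pow3r implementation: the modality names, the independent keep probability `keep_prob`, and the helpers `sample_modalities` and `build_inputs` are all assumptions made for illustration, and the paper's actual sampling scheme may differ.

```python
import random

# Auxiliary modalities named in the abstract; the string keys are illustrative.
MODALITIES = ("intrinsics", "relative_pose", "depth")


def sample_modalities(rng, keep_prob=0.5):
    """Return a random subset of MODALITIES.

    Assumption: each modality is kept independently with probability keep_prob.
    """
    return {m for m in MODALITIES if rng.random() < keep_prob}


def build_inputs(images, priors, active):
    """Assemble one training example, hiding every prior not in `active`.

    Hidden (or unavailable) priors are passed as None, so the same network
    interface covers every level of prior knowledge.
    """
    inputs = {"images": images}
    for name in MODALITIES:
        inputs[name] = priors.get(name) if name in active else None
    return inputs


# Hypothetical usage: draw a fresh subset at every training iteration.
rng = random.Random(0)
priors = {"intrinsics": "K", "relative_pose": "T", "depth": "D"}
batch = build_inputs(["img1", "img2"], priors, sample_modalities(rng))
```

At test time the same `build_inputs` call would be made with whichever priors happen to be known, which is what lets a single model operate with or without auxiliary information.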

VGGT

We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views. This approach is a step forward in 3D computer vision, where models have typically been constrained to and specialized for single tasks. It is also simple and efficient, reconstructing images in under one second, while still outperforming alternatives that require post-processing with visual geometry optimization techniques. The network achieves state-of-the-art results in multiple 3D tasks, including camera parameter estimation, multi-view depth estimation, dense point cloud reconstruction, and point tracking. We also show that using pretrained VGGT as a feature backbone significantly enhances downstream tasks, such as non-rigid point tracking and feed-forward novel view synthesis.
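Several of the outputs listed above are geometrically related: under a pinhole camera model, a depth map plus camera intrinsics determines a point map in the camera frame. The sketch below shows that relation only. It is not VGGT code or its API; the function name `unproject_depth` and the parameters `fx`, `fy`, `cx`, `cy` are illustrative, and the pinhole model without lens distortion is an assumption.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame 3D points.

    depth is a list of rows; pixel (u, v) with depth z maps to
    ((u - cx) * z / fx, (v - cy) * z / fy, z) under a pinhole model.
    Returns a point map with the same row/column layout as depth.
    """
    points = []
    for v, row in enumerate(depth):
        points.append(
            [((u - cx) * z / fx, (v - cy) * z / fy, z) for u, z in enumerate(row)]
        )
    return points


# Hypothetical 2x2 depth map at constant depth 2.0, principal point at the center.
point_map = unproject_depth([[2.0, 2.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Moving such a camera-frame point map into a shared world frame would additionally require the camera extrinsics, which is omitted here.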

Attachment

250723_Seminar_Pow3r_VGGT.pdf (4.7M) | Date: 2025-08-01 13:41:59
