Wanruo Zhang, Mengyuan Liu, Hong Liu et al. (4 total)
2025-04-11
AAAI Conference on Artificial Intelligence
10.1609/aaai.v39i10.33101
4 citations
Abstract
Recently, transformer-based methods have been introduced to estimate 3D human pose from multiple views, aggregating the spatial-temporal information of human joints to lift 2D poses to 3D. However, previous approaches cannot model the inter-frame correspondence of each view's joints individually, nor can they directly consider all view interactions at each time step, leading to insufficient learning of multi-view associations. To address this issue, we propose a Spatial-View-Temporal tra...