About Me
I am a tenure-track assistant professor in the Department of Computer Science at the University of Virginia, where I lead the Computer Vision Lab. Before joining UVA, I was a postdoctoral researcher at Caltech advised by Georgia Gkioxari, and I completed my Ph.D. in computer science at UMass Amherst, co-advised by Subhransu Maji and Daniel Sheldon. I earned my bachelor’s degree in Computer Science from Sichuan University in 2015, where I began my research in computer vision under the guidance of Qingxiong Yang and Bin Sheng.
I am interested in self-supervised visual representation learning, 3D vision, video understanding and generation, and applying computer vision to areas such as ecology, chemistry, and neuroscience.
Prospective Students: I am actively looking for research interns and PhD students to join the lab. If you’re interested, please contact me at zc3bp@virginia.edu with your CV, or complete this form.
News
- 04/26 Received the Sony Faculty Innovation Award. Thank you, Sony!
- 04/26 Congrats to Hao Gu on receiving the NSF Graduate Research Fellowship!!!
- 04/26 WildRayZer was selected as a CVPR 2026 highlight. Congrats Xuweiyi and Wentao!
- 03/26 Received an Adobe Research Gift. Thank you, Adobe!
- 02/26 One paper accepted to CVPR 2026: WildRayZer. Congrats Xuweiyi and Wentao!
- 02/26 Gave a talk at the Symposium on Computer and Autonomous Vision Systems on Self-Supervised Learning.
- 01/26 One paper accepted to ICLR 2026: Point-MoE. Congrats Xuweiyi and Wentao!
- 01/26 Teaching Computer Vision this Spring.
- 01/26 I will be co-organizing the GPU-Accelerated Computing Research Group.
- 12/25 I'll be serving on the Senior Program Committee at IJCAI.
- 11/25 Received an Adobe Research Gift. Thank you, Adobe!
- 10/25 Two papers accepted to 3DV 2026: OVMono3D and Point-MAE-Zero. Congrats Jin and Xuweiyi!
- 10/25 Received MathWorks Research Award. Thank you, MathWorks!
- 09/25 Received NVIDIA Academic Grant Program Award. Thank you, NVIDIA!
- 09/25 Gave a talk at the NSF-Simons AI Institute for Cosmic Origins on Computer Vision for Scientific Discovery. [Recording]
- 09/25 Two papers accepted to NeurIPS 2025: LabelAny3D and Frame In-N-Out. Congrats Jin and Boyang!
- 09/25 Teaching 3D Computer Vision this Fall.
- 08/25 Received AMD’s High Performance Compute Fund. Thank you, AMD!
- 06/25 I'll be serving on the Senior Program Committee for the AAAI Social Impact Track.
- 05/25 Our NSF NAIRR Pilot project has been awarded!
- 03/25 Gave a talk at MathWorks: Recognize Anything in 3D with Minimal Human Supervision.
- 02/25 One paper accepted to CVPR 2025: Self-supervised Learning for Mid-Level Vision. Congrats Xuweiyi!
- 01/25 Teaching Computer Vision this Spring.
- 12/24 Received an Adobe Research Gift. Thank you, Adobe!
- 12/24 Gave a talk at UVA AIML Seminar: Recognize Anything in 3D with Minimal Human Supervision.
- 09/24 Teaching 3D Computer Vision this Fall.
Research & Recent Work
Thrust 1: Self-Supervised Representation Learning
We develop self-supervised methods that learn spatio-temporal and semantic representations from images, videos, and 3D point clouds, supporting a broad range of downstream applications such as multimodal and robotic systems.
- Chen et al. WildRayZer: Self-supervised Large View Synthesis in Dynamic Environments. CVPR 2026, highlight.
- Chen et al. Semantic-Free Procedural 3D Shapes Are Surprisingly Good Teachers. 3DV 2026.
- Chen et al. Probing the Mid-level Vision Capabilities of Self-Supervised Learning. CVPR 2025.
Thrust 2: Reconstruct and Recognize Anything in 3D
We build 3D systems that generalize across domains, scenes, and vocabularies while requiring minimal task-specific supervision.
- Chen et al. Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts. ICLR 2026.
- Yao et al. Open Vocabulary Monocular 3D Object Detection. 3DV 2026.
- Yao et al. LabelAny3D: Label Any Object 3D in the Wild. NeurIPS 2025.
Thrust 3: Video and Dynamic Scene Understanding
We study controllable video generation and dynamic-world perception for agents that must reason over motion, geometry, and time.
- Wang et al. OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer. arXiv 2026.
- Wang et al. Frame In-N-Out: Unbounded Controllable Image-to-Video Generation. NeurIPS 2025.
- Zhou et al. Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision. arXiv 2026.