VLG | Computer Vision and Learning Group

Available Projects

Exploring Latent Representations for Human Mesh Recovery

Type: Bachelor Thesis or Semester Project

Recent human mesh recovery pipelines (e.g., VQ-HPS, TokenHMR, GenHMR) often compress high-dimensional visual features into discrete tokens before human pose prediction. This project investigates alternative strategies for latent encoding and representation learning, aiming to understand how different encoding schemes influence the accuracy, robustness, and efficiency of 3D human mesh recovery models.
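To illustrate what "compressing features into discrete tokens" means, here is a minimal vector-quantization sketch. Each continuous feature vector is replaced by the index of its nearest codebook entry. The codebook values and the helpers `quantize` and `dequantize` are hypothetical. In real pipelines the codebook is learned (e.g., in a VQ-VAE), which this sketch does not attempt.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Sequence[float]], codebook: List[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry (its 'token')."""
    tokens = []
    for f in features:
        # argmin over codebook indices by squared distance to the feature vector
        best = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        tokens.append(best)
    return tokens


def dequantize(tokens: List[int], codebook: List[Sequence[float]]) -> List[List[float]]:
    """Replace each token by its codebook vector (the lossy reconstruction)."""
    return [list(codebook[t]) for t in tokens]


# Hypothetical 2-D codebook with four entries and three feature vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]
tokens = quantize(features, codebook)
print(tokens)  # [0, 3, 2]
```

The gap between `features` and `dequantize(tokens, codebook)` is the information lost by discretization. Alternative latent encodings would trade this loss against compactness and robustness.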

Supervisor: zhiyin.qian@inf.ethz.ch

Dense Multi-View 3D Point Tracking

Type: BSc/MSc Thesis or MSc Semester Project

Inspired by AllTracker, this project aims to extend MVTracker to dense 3D tracking at pixel-level granularity across multiple views. Efficiency and scalability will be emphasized.

Supervisor: frano.rajic@inf.ethz.ch

Better Feature Initialization in Multi-View 3D Point Tracking

Type: BSc/MSc Thesis or MSc Semester Project

MVTracker was recently introduced as the first method for multi-view 3D point tracking. It represents each tracked point with a single feature vector, which may limit the available context and makes it susceptible to noise in the backprojected depth maps. In this project, the student will explore better ways to initialize and use these features.
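Depth noise enters at the backprojection step. Under a pinhole camera model, a pixel (u, v) with depth d maps to the camera-space point ((u - cx) d / fx, (v - cy) d / fy, d). The sketch below is a minimal illustration under these assumptions: known intrinsics fx, fy, cx, cy, and depth measured along the optical axis. It is not MVTracker's code, and the function name `backproject` is hypothetical. Because every coordinate scales with d, a depth error shifts the recovered point along the camera ray.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point3D]:
    """Lift each pixel with positive depth to a 3D point in camera coordinates.

    Assumes a pinhole camera and z-depth. Pixels with depth <= 0 are
    treated as missing measurements and skipped.
    """
    points = []
    for v, row in enumerate(depth):      # v: row index (image y)
        for u, d in enumerate(row):      # u: column index (image x)
            if d <= 0:
                continue
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


depth = [[2.0, 2.0],
         [0.0, 4.0]]  # 0.0 marks a missing depth value
pts = backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts)  # [(-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (2.0, 2.0, 4.0)]
```

In a multi-view setup, each camera's points would also be mapped to a shared world frame using that camera's extrinsics. This sketch omits that step.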

Supervisor: frano.rajic@inf.ethz.ch

Dynamic 3D Gaussian Splatting with MVTracker Trajectories as Prior

Type: BSc/MSc Thesis or MSc Semester Project

This project investigates combining 3D point trajectories from MVTracker with dynamic 3D Gaussian Splatting pipelines to improve optimization-based reconstruction of moving scenes.

Supervisors: frano.rajic@inf.ethz.ch, sergey.prokudin@inf.ethz.ch

Memory-Based Re-Identification for 2D Point Tracking

Type: Master Project

The project explores adding memory and re-identification mechanisms to 2D point tracking models, enabling re-tracking of occluded or temporarily invisible points.

Supervisor: frano.rajic@inf.ethz.ch

Dataset Generation for Video Depth and Multi-View 3D Point Tracking

Type: BSc/MSc Thesis or MSc Semester Project

Point tracking is a task that was revived in 2022, but the number and diversity of training datasets remain limited. The goal of this project is to create new dataset generation pipelines or repurpose existing ones (e.g., from video games) to generate training data for multi-view depth estimation and 3D point tracking, and to train existing trackers on the new data.

Supervisors: frano.rajic@inf.ethz.ch, sergey.prokudin@inf.ethz.ch

Recovering the 2D Point Tracking Performance of 3D Models

Type: BSc/MSc Thesis or MSc Semester Project

TAPIP3D and MVTracker perform well in monocular and multi-view 3D point tracking. However, their performance on standard 2D point tracking datasets is limited by the quality of the video depth obtained from off-the-shelf depth estimators. This project aims to make a selected 3D point tracker work well in both settings, so that it can perform 2D point tracking even when depth is noisy or missing.

Supervisor: frano.rajic@inf.ethz.ch

From Graph to Geometry: Consistent 3D Scene Editing

Type: Semester Project

When you delete an object from a 3D scene, is it really gone? Pixels may disappear, but the scene graph, the hidden map of objects and their relationships, often keeps whispering clues: 'a chair once stood next to this table.' These symbolic residues pose privacy risks and break the illusion of a clean edit. This project takes on the challenge of erasing objects twice: once in the geometry, and once in the graph. You will design a closed-loop system that scrubs scene graphs of all traces of a removed object and then propagates the changes back into the 3D world, ensuring perfect consistency between symbolic reasoning and physical reality. The outcome not only advances privacy-preserving 3D editing, but also unlocks new possibilities for robotics, AR/VR, and spatial AI.

Supervisor: skocour@ethz.ch

Reconstructing the Erased: Object Re-Insertion from Semantic Traces

Type: Semester Project

Can a 'deleted' object come back to life? Even the best 3D scene editing methods often leave subtle traces that give away what was removed. This project turns those leftovers into clues: using our Remove360 dataset, you will develop a model that reconstructs the missing object—its shape, texture, and placement—so realistically that it looks like it was never gone. The work not only exposes hidden weaknesses in current removal techniques, but also pushes forward new methods for realistic scene completion with applications in AR/VR, virtual staging, and digital content creation.

Supervisor: skocour@ethz.ch

Leveraging 2D generative models for novel view synthesis

Type: Master or Semester Project

Novel view synthesis (NVS), a fundamental problem in computer vision, seeks to generate renderings from novel target viewpoints given a set of input viewpoints. Achieving this requires addressing several complex challenges: (1) inferring the geometric structure of a scene from 2D observations, (2) rendering the inferred 3D reconstruction from new viewpoints in a physically plausible manner, and (3) inpainting or extrapolating missing regions that are not observed in the input viewpoints. To tackle these challenges, diverse 3D representations, along with classical geometric constraints, advanced optimization techniques, and deep stereo priors, have been extensively studied. In recent years, diffusion generative models for 2D images and videos have demonstrated remarkable capabilities in generating photorealistic images. These advancements have opened new avenues for enhancing NVS by leveraging the priors encoded in these models. This project aims to investigate the types of prior knowledge encoded within 2D generative models that can most effectively benefit NVS. Unlike many contemporary approaches that fine-tune pretrained generative models for specific NVS tasks, this research adopts a zero-shot framework.

Supervisor: yutong.chen@inf.ethz.ch

Large Language and Vision Models for Zero-Shot Human Motion Analysis

Type: Master or Semester Project

Pre-trained large language models (LLMs) and vision-language models (VLMs) have demonstrated the ability to understand and autoregressively complete complex token sequences, enabling them to capture both the physical and semantic properties of a scene. By leveraging in-context learning, these models can function as general sequence modelers without requiring additional training. This project aims to explore how these zero-shot capabilities can be applied to human motion analysis tasks, such as motion prediction, generation, and denoising. By converting human motion data into token sequences, the project will assess the effectiveness of pre-trained foundation models in digital human modeling. Students will conduct a literature review, design experimental pipelines, and run tests to evaluate the feasibility of using LLMs and VLMs for motion analysis, while exploring optimal tokenization schemes and input modalities.
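One simple way to turn motion into a token sequence that an LLM can read is uniform quantization:

1. Clip each per-frame value to a fixed range.
2. Map it to one of N integer bins.
3. Serialize the bins as text, one line per frame.

The sketch below assumes this scheme purely for illustration. The value range, bin count, text format and all function names are hypothetical choices. Finding better tokenization schemes is part of what the project would study.

```python
from typing import List


def to_bins(values: List[float], lo: float, hi: float, n_bins: int) -> List[int]:
    """Uniformly quantize values to integers 0..n_bins-1 (values outside [lo, hi] are clipped)."""
    width = (hi - lo) / n_bins
    bins = []
    for x in values:
        x = min(max(x, lo), hi)
        b = int((x - lo) / width)
        bins.append(min(b, n_bins - 1))  # x == hi would otherwise land in bin n_bins
    return bins


def from_bins(bins: List[int], lo: float, hi: float, n_bins: int) -> List[float]:
    """Decode bins back to their bin centers."""
    width = (hi - lo) / n_bins
    return [lo + (b + 0.5) * width for b in bins]


def serialize(motion: List[List[float]], lo: float, hi: float, n_bins: int) -> str:
    """Encode a motion (frames x features) as text: one line per frame, comma-separated bins."""
    return "\n".join(",".join(str(b) for b in to_bins(frame, lo, hi, n_bins)) for frame in motion)


def deserialize(text: str, lo: float, hi: float, n_bins: int) -> List[List[float]]:
    """Parse the text format back into (approximate) per-frame values."""
    return [from_bins([int(t) for t in line.split(",")], lo, hi, n_bins)
            for line in text.strip().splitlines()]


# Hypothetical 3-frame motion with 2 joint angles per frame (radians).
motion = [[-1.0, 0.0], [0.5, 3.2], [3.9, -3.9]]
prompt = serialize(motion, lo=-4.0, hi=4.0, n_bins=16)
print(prompt)
```

A model could be asked to continue such a string (e.g., for motion prediction). Decoding maps each bin back to its center, so the round-trip error per value is at most half a bin width.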

Supervisor: sergey.prokudin@inf.ethz.ch

Archived Projects

Fix scene normalization issues in MVTracker and make the method work on unbounded scenes

Type: BSc/MSc Thesis or MSc Semester Project

Point tracking is the task of tracking arbitrary points on object surfaces throughout a video and is of interest for applications such as robotics. MVTracker introduces the first feedforward multi-view 3D point tracker, with state-of-the-art performance across multi-view datasets. However, its performance is not robust to arbitrary scene normalizations, since it learns a tracking prior at the scene scale present in the training data. In addition, it underperforms on monocular, unbounded scenes such as videos from the TAPVid-2D benchmark. In this project, the student will analyze the failure cases and limitations of MVTracker and work toward improving its performance in those cases.

Supervisor: frano.rajic@inf.ethz.ch

Advancing Multi-View Depth Estimation Methods

Type: BSc/MSc Thesis or MSc Semester Project

Having 2–4 cameras instead of one can enable more precise and robust visual perception for robotics. The student will explore monocular and multi-view video depth estimators and work toward implementing a better method.

Supervisor: frano.rajic@inf.ethz.ch

Improve the Robustness of Multi-View 3D Point Trackers to Noisy and Missing Depth Maps

Type: BSc/MSc Thesis or MSc Semester Project

Video depth estimation for sparse multi-camera input lacks precision and robustness, with frequent failures and considerable geometry misalignment. Such failures directly impact the precision of downstream applications such as multi-view 3D point tracking. The student will work toward improving the robustness of state-of-the-art multi-view 3D point trackers to imperfect depth.

Supervisor: frano.rajic@inf.ethz.ch

Depth Estimation + 3D Point Tracking

Type: BSc/MSc Thesis or MSc Semester Project

3D point tracking methods treat depth estimation as a separate problem and run point tracking on top of the depth estimates. The quality of the estimated depth can limit tracking accuracy, especially when depth estimation fails and predicts completely unreasonable geometry. However, the two problems are inherently related, and learning to solve them jointly may benefit both tasks. In this project, you will work toward developing a data-driven, feedforward method that learns, implicitly or explicitly, to solve both tasks in the same neural network.

Supervisor: frano.rajic@inf.ethz.ch

Incorporating Geometric Cues into 3D Reconstruction

Type: Master or Semester Project

Recent advances in 3D reconstruction, such as neural radiance fields (NeRF) and 3D Gaussian Splatting, have led to impressive results in high-quality novel view synthesis. However, these techniques still struggle to extract accurate geometry, particularly in scenes with reflective or transparent surfaces. At the same time, data-driven and diffusion-based monocular depth estimators have shown great promise in inferring depth from a single image. In certain controlled scenarios, access to ground-truth depth further enables a more precise understanding of scene geometry. This project aims to investigate how depth or normal cues can be integrated into 3D reconstruction pipelines to improve geometric accuracy. The student will explore various methods for incorporating monocular geometric cues, either through direct supervision or indirectly through depth-aware features, and evaluate the effectiveness of these approaches in challenging scenarios.
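As one concrete example of "direct supervision", consider comparing a monocular depth estimate to the depth rendered by the reconstruction. Many monocular estimators are only defined up to an unknown scale and shift. A common remedy is to fit that scale s and shift t by least squares before computing the loss. The sketch below shows this scale-and-shift-invariant loss on flat lists of per-pixel depths. It is only one possible choice, not a method prescribed by the project, and the function names are hypothetical.

```python
from typing import List, Tuple


def align_scale_shift(pred: List[float], target: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for s, t minimizing sum((s * p + t - q)^2)."""
    n = len(pred)
    sp, sq = sum(pred), sum(target)
    spp = sum(p * p for p in pred)
    spq = sum(p * q for p, q in zip(pred, target))
    det = n * spp - sp * sp  # zero iff pred is constant
    if det == 0:
        raise ValueError("pred is constant; scale is undetermined")
    s = (n * spq - sp * sq) / det
    t = (sq - s * sp) / n
    return s, t


def scale_shift_invariant_loss(mono_depth: List[float], rendered_depth: List[float]) -> float:
    """Mean squared error after aligning the monocular depth to the rendered depth."""
    s, t = align_scale_shift(mono_depth, rendered_depth)
    residuals = [s * m + t - r for m, r in zip(mono_depth, rendered_depth)]
    return sum(e * e for e in residuals) / len(residuals)


mono = [1.0, 2.0, 3.0]      # hypothetical relative depth from a monocular model
rendered = [3.0, 5.0, 7.0]  # hypothetical rendered depth: exactly 2 * mono + 1
print(align_scale_shift(mono, rendered))  # (2.0, 1.0)
```

In an optimization loop, the alignment would typically be recomputed per image, and only the residual would drive the geometry.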

Supervisors: johannes.weidenfeller@ai.ethz.ch, lilian.calvet@balgrist.ch

Synthesizing Large-scale Human Motions in Environments via 2D Foundation Models

Type: Master or Semester Project

In the era of autonomy, creating a 3D digital world that faithfully replicates our physical reality is becoming increasingly critical. Central to this endeavor is the incorporation of realistic human behaviors. Moreover, human behaviors are intricately rooted in environments: our movements are influenced by our interactions with various objects and by the spatial arrangement of our surroundings. Therefore, it is essential to model not only human motion itself but also how humans interact with their surroundings. Creating human motions within diverse environments has significant applications across numerous fields, including augmented reality (AR), virtual reality (VR), assistive robotics, biomechanics, filmmaking, and the gaming industry. However, capturing human motions in environments requires expensive devices, complicated hardware setups, and significant manual effort, so it does not scale to creating large human-scene interaction datasets. In this project, we explore how to leverage 2D foundation models to synthesize 3D human motions in various environments efficiently and scalably. The project starts in December 2024 or January 2025.

Supervisor: siwei.zhang@inf.ethz.ch

3D Point Tracking with Dynamic Reconstruction Methods

Type: Semester Project

This project aims to evaluate the point tracking performance of state-of-the-art dynamic 3D reconstruction methods on multi-view videos from the TAPVid-3D benchmark. In addition to performance evaluation, failure cases will be analyzed, and improvements will be explored based on the time available during the project.

Supervisor: frano.rajic@inf.ethz.ch

Capture and Synthesis of Sign Languages

Type: Master or Semester Project

Sign language is a visual means of communication that uses hand shapes, facial expressions, body movements, and gestures to convey meaning. It serves as the primary language for the deaf and hard-of-hearing communities. Technologies that capture and generate sign language can bridge communication gaps by enabling real-time translation to text or speech, providing educational tools for non-signers, and improving accessibility in public services such as healthcare. This project aims to develop a generative model that converts spoken language into 3D sign language performed by a human avatar.

Supervisor: kaifeng.zhao@inf.ethz.ch

Learning 3D Human-Scene Interactions from 2D Observations

Type: Master or Semester Project

The goal of this project is to investigate methods to learn human-scene interaction skills from 2D observations.

Supervisors: kaifeng.zhao@inf.ethz.ch, siwei.zhang@inf.ethz.ch

Diffusion Models for 3D Face Animation

Type: Master Project

The goal of this project is to investigate methods for generating 3D facial animations with diffusion models. Diffusion models have shown compelling results in human motion generation. Recent work leverages these models to synthesize full-body motions from sparse input (e.g., head and hand tracking signals). This project will explore extending this approach to facial animation, e.g., synthesizing face motion from sparse 2D/3D keypoints.

Supervisors: qianli.ma@inf.ethz.ch, fbogo@meta.com

Human motion generation in rich contextual environments

Type: Master or Semester Project

This project aims to leverage CIRCLE, a recent 3D human motion dataset, to develop a generative human motion model that synthesizes highly complex human-scene interactions.

Supervisors: gen.li@inf.ethz.ch, yan.zhang@inf.ethz.ch

Multi-Person Interaction Capture in the Interactive Design Lab

Type: Master Project

This project aims to build a system to capture interactions between people and the environment.

Supervisors: yan.zhang@inf.ethz.ch, kraus@ibk.baug.ethz.ch

Controllable 3D Image Generation

Type: Master or Semester Project

This project attempts to learn object geometry and appearance from a set of 2D images while allowing scale-specific control. Recent years have seen great progress in realistic, controllable 2D image synthesis, as well as compelling 3D image results obtained by leveraging recent advances in volume rendering. The core idea of this project is to extend a recent 3D generator so that it enables control over both appearance and geometry.

Supervisor: anpei.chen@inf.ethz.ch

Motion Generation for Hand-object Interaction

Type: Master or Semester Project

Supervisor: kkarunrat@inf.ethz.ch

Scene Reconstruction with Moving Objects

Type: Master or Semester Project

This project attempts to reconstruct the geometry and appearance of 4D scenes (a static scene plus moving objects). We will start with decomposable radiance field reconstruction in a specific setting: a medium-scale static environment (a room or an outdoor street) and one class of objects (humans or cars).

Supervisor: anpei.chen@inf.ethz.ch

Diffusion Models for 3D Scene Generation

Type: Master or Semester Project

Supervisor: Francis Engelmann (francisengelmann@ai.ethz.ch)

A Close Look at Domain Shift in Point Cloud Registration

Type: Master Project

Supervisors: Shengyu Huang (shengyu.huang@geod.baug.ethz.ch), Xuyang Bai (xbaiad@connect.ust.hk), Dr. Theodora Kontogianni (theodora.kontogianni@inf.ethz.ch), Prof. Dr. Konrad Schindler (konrad.schindler@geod.baug.ethz.ch)
