Thesis


Available Projects


In the era of autonomy, creating a 3D digital world that faithfully replicates our physical reality is becoming increasingly critical. Central to this endeavor is the incorporation of realistic human behaviors. Moreover, human behaviors are intricately rooted in environments: our movements are influenced by our interactions with various objects and by the spatial arrangement of our surroundings. Therefore, it is essential not only to model human motion itself but also to model how humans interact with the surrounding environment. Generating human motions within diverse environments has significant applications across numerous fields, including augmented reality (AR), virtual reality (VR), assistive robotics, biomechanics, filmmaking, and the gaming industry. However, capturing human motions in environments requires expensive devices, complicated hardware setups, and significant manual effort, so it does not scale to building large-scale human-scene interaction datasets. In this project, we explore how to leverage 2D foundation models to synthesize 3D human motions in various environments efficiently and scalably. The project starts in December 2024 or January 2025.

Supervisor: siwei.zhang@inf.ethz.ch


This project aims to evaluate the point-tracking performance of state-of-the-art dynamic 3D reconstruction methods on multi-view videos from the TAPVid-3D benchmark. In addition to the performance evaluation, failure cases will be analyzed and, as time permits, improvements will be explored.

Supervisor: frano.rajic@inf.ethz.ch


Pre-trained large language models (LLMs) and vision-language models (VLMs) have demonstrated the ability to understand and autoregressively complete complex token sequences, enabling them to capture both the physical and semantic properties of a scene. By leveraging in-context learning, these models can function as general sequence modelers without requiring additional training. This project aims to explore how these zero-shot capabilities can be applied to human motion analysis tasks, such as motion prediction, generation, and denoising. By converting human motion data into token sequences, the project will assess the effectiveness of pre-trained foundation models in digital human modeling. Students will conduct a literature review, design experimental pipelines, and run tests to evaluate the feasibility of using LLMs and VLMs for motion analysis, while exploring optimal tokenization schemes and input modalities.

Supervisor: sergey.prokudin@inf.ethz.ch


Sign language is a visual means of communication that uses hand shapes, facial expressions, body movements, and gestures to convey meaning. It serves as the primary language for the deaf and hard-of-hearing communities. Technologies that capture and generate sign language can bridge communication gaps by enabling real-time translation to text or speech, providing educational tools for non-signers, and improving accessibility in public services such as healthcare. This project aims to develop a generative model that converts spoken language into 3D sign language performed by a human avatar.

Supervisor: kaifeng.zhao@inf.ethz.ch

Archived Projects


The goal of this project is to investigate methods to learn human-scene interaction skills from 2D observations.

Supervisors: kaifeng.zhao@inf.ethz.ch, siwei.zhang@inf.ethz.ch


The goal of this project is to investigate methods to generate 3D facial animations using diffusion models. Diffusion models have shown compelling results in human motion generation. Recent work leverages these models to synthesize full-body motions from sparse input (e.g., head and hand tracking signals). This project will explore extending this approach to facial animation, for example, synthesizing face motion from sparse 2D/3D keypoints.

Supervisors: qianli.ma@inf.ethz.ch, fbogo@meta.com


This project aims to leverage CIRCLE, a recent 3D human motion dataset, to develop a generative human motion model that synthesizes highly complex human-scene interactions.

Supervisors: gen.li@inf.ethz.ch, yan.zhang@inf.ethz.ch


This project aims to build a system to capture interactions between people and the environment.

Supervisors: yan.zhang@inf.ethz.ch, kraus@ibk.baug.ethz.ch


This project attempts to learn object geometry and appearance from a set of 2D images while allowing scale-specific control. Recent years have seen substantial progress in realistic, controllable 2D image synthesis, as well as compelling 3D results enabled by recent advances in volume rendering. The core idea of this project is to extend a recent 3D generator so that it allows control over both appearance and geometry.

Supervisor: anpei.chen@inf.ethz.ch


This project attempts to reconstruct the geometry and appearance of 4D scenes (a static scene plus moving objects). We will start with decomposable radiance field reconstruction in a specific setting: a medium-scale static environment (a room or an outdoor street) and one class of objects (humans or cars).

Supervisor: anpei.chen@inf.ethz.ch


Supervisor: Francis Engelmann


Supervisors: Shengyu Huang, Xuyang Bai, Dr. Theodora Kontogianni, Prof. Dr. Konrad Schindler