Drafting a Ph.D. Research Statement Around 3D Perception

A Ph.D. research statement should not be a list of keywords.

It should answer a deeper question:

What research problem do I want to spend several years on, why does it matter, what have I already done, and what can I contribute next?

As I prepare for Fall 2027 Ph.D. applications, I am trying to organize my statement around one central direction:

Efficient and predictive 3D scene understanding for autonomous and embodied agents.

This note is a draft of how I think about that structure.


1. The Core Thesis

The statement needs a thesis, not just interests.

My current thesis is:

Autonomous and embodied agents need 3D perception systems that are not only accurate, but also communication-efficient, temporally aware, and predictive.

This sentence connects four ideas:

  1. 3D perception: agents act in physical space, so they need structured spatial understanding.
  2. Communication efficiency: real multi-agent systems cannot transmit everything.
  3. Temporal awareness: perception should use history, not only a single frame.
  4. Prediction: agents need to reason about future scene evolution, not only current reconstruction.

This thesis helps avoid a scattered profile. It gives the website, CV, projects, and blog a common center.


2. Why This Problem Matters

The motivation should be broader than one benchmark.

Autonomous driving and robotics both operate under partial observability and tight resource limits:

  • a camera cannot see through occlusion;
  • one vehicle cannot observe every region;
  • sensors are noisy and limited;
  • communication introduces bandwidth and latency constraints;
  • planning depends on future states.

So the research problem is not simply:

Can we get higher mIoU on an occupancy benchmark?

The broader question is:

How can an agent build a useful belief about the 3D world under limited observation, communication, computation, and time?

This framing makes the work relevant to computer vision, robotics, and embodied AI.


3. What My Current Work Shows

The statement should use current projects as evidence.

For me, the evidence can be organized into three stages.

3.1 Semantic Occupancy as a Structured Output

Semantic occupancy prediction is attractive because it combines geometry and semantics in a format that downstream systems can use.

Instead of only detecting objects or segmenting images, the model predicts a voxelized 3D scene:

\[\hat{O} \in \mathbb{R}^{X \times Y \times Z \times C},\]

where \(X \times Y \times Z\) is the voxel grid resolution and each voxel stores logits over \(C\) semantic occupancy classes.

This output is close to what a planner needs because it describes occupied space, free space, and semantic categories in one representation.
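
To make the output format concrete, here is a minimal sketch of how a downstream consumer might decode the logit grid into per-voxel labels. It uses plain nested lists rather than a tensor library, and it assumes that class index 0 means free space; the class names and the helper names `decode_occupancy` and `class_histogram` are illustrative, not part of any existing codebase.

```python
from collections import Counter

FREE = 0  # assumption: class index 0 denotes free space


def decode_occupancy(logits):
    """Map an X x Y x Z x C nested list of logits to X x Y x Z class labels."""
    return [
        [[max(range(len(voxel)), key=voxel.__getitem__) for voxel in column]
         for column in plane]
        for plane in logits
    ]


def class_histogram(labels):
    """Count how many voxels were assigned to each class."""
    return Counter(label for plane in labels for column in plane for label in column)


# Toy 1 x 2 x 2 grid with C = 3 classes (free, road, vehicle are made-up names).
logits = [[
    [[2.0, 0.1, -1.0], [0.3, 1.5, 0.2]],
    [[-0.5, 0.0, 3.0], [1.0, 0.9, 0.8]],
]]
labels = decode_occupancy(logits)
occupied = sum(n for cls, n in class_histogram(labels).items() if cls != FREE)
```

In this toy grid two of the four voxels decode to free space and two to occupied classes, which is the kind of occupied/free/semantic summary a planner could query.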

3.2 Collaborative Perception Under Bandwidth Constraints

Single-agent perception is limited by occlusion and field of view.

Collaborative perception allows neighboring agents to share complementary information, but communication is expensive. My current work studies token-based communication:

\[M_{j \rightarrow i} = \{z_k^j : k \in \mathcal{S}_{j \rightarrow i}\},\]

where \(z_k^j\) is the \(k\)-th token produced by agent \(j\), and \(\mathcal{S}_{j \rightarrow i}\) is the subset of token indices selected for transmission from agent \(j\) to ego agent \(i\).

This lets the system reason about what information is worth transmitting.
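
One simple way to picture the selection step is a top-k rule under a byte budget. The sketch below is an assumption-laden illustration, not the selection method used in my work: the per-token usefulness `scores`, the fixed `token_bytes` size, and the function names are all hypothetical.

```python
def select_tokens(scores, token_bytes, budget_bytes):
    """Pick S_{j->i}: indices of the highest-scoring tokens that fit the budget.

    scores[k] is a hypothetical usefulness score of token z_k^j for the ego agent.
    All tokens are assumed to cost the same number of bytes.
    """
    max_tokens = budget_bytes // token_bytes
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:max_tokens])


def build_message(tokens, selected):
    """Build M_{j->i}: selected tokens keyed by index so the receiver can place them."""
    return {k: tokens[k] for k in selected}


tokens = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.3], [0.7, 0.8]]  # toy 2-d tokens
scores = [0.05, 0.80, 0.10, 0.60]
selected = select_tokens(scores, token_bytes=8, budget_bytes=20)
message = build_message(tokens, selected)
```

With a 20-byte budget and 8-byte tokens, only two tokens fit, so the message carries tokens 1 and 3. A learned or receiver-driven scoring function would replace the hand-written `scores` list.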

3.3 Occupancy World Models

Current-frame occupancy is useful, but agents also need future prediction.

Occupancy world models extend perception from:

\[\hat{O}_t = f_\theta(X_{\leq t})\]

to:

\[\hat{O}_{t:t+K} = f_\theta(X_{\leq t}),\]

where the model predicts future occupancy states over a horizon \(K\).

This connects perception with temporal reasoning and planning.
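
The formulation above leaves open how the sequence is produced. One common option, sketched here as an assumption rather than a description of any specific model, is an autoregressive rollout that feeds each predicted state back into a one-step predictor. The 1-D grid and the `shift_step` function are toy stand-ins for a learned network.

```python
def shift_step(grid, velocity=1):
    """Toy stand-in for a learned one-step predictor: move occupied cells right."""
    nxt = [0] * len(grid)
    for idx, occupied in enumerate(grid):
        target = idx + velocity
        if occupied and 0 <= target < len(grid):
            nxt[target] = 1
    return nxt


def rollout(step_fn, current, horizon):
    """Return [O_t, O_{t+1}, ..., O_{t+K}] by feeding each prediction back in."""
    states = [current]
    for _ in range(horizon):
        states.append(step_fn(states[-1]))
    return states


current = [0, 1, 0, 0, 0]  # 1-D occupancy at time t
forecast = rollout(shift_step, current, horizon=3)
```

The returned list has \(K + 1\) entries, matching the index range \(t : t+K\). A rollout like this accumulates error step by step, which is one reason uncertainty over future states matters.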


4. Future Research Aims

A research statement should show forward motion.

I currently think about three possible Ph.D. research aims.

Aim 1: Efficient 3D Scene Representations

How can autonomous agents represent 3D scenes compactly while preserving task-relevant information?

This includes:

  • tokenized 3D and BEV representations;
  • semantic occupancy and free-space modeling;
  • uncertainty-aware representation learning;
  • memory-efficient temporal scene representations.

Aim 2: Communication-Aware Multi-Agent Perception

How should agents decide what to share under bandwidth, latency, and reliability constraints?

This includes:

  • receiver-driven information requests;
  • adaptive token selection;
  • rate-distortion views of perception communication;
  • robustness to pose noise, missing agents, and delayed messages.
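
As a sketch of what the rate-distortion view could look like, and not a settled formulation, token selection can be written as a constrained optimization over the transmitted subset. The symbols \(b_k\) (cost of token \(k\) in bits), \(B\) (message budget), \(\mathcal{L}\) (a task loss such as voxel-wise cross-entropy), and \(O\) (ground-truth occupancy) are illustrative assumptions introduced only for this example:

\[\min_{\mathcal{S}_{j \rightarrow i}} \; \mathbb{E}\left[\mathcal{L}\big(\hat{O}_i(M_{j \rightarrow i}),\, O\big)\right] \quad \text{s.t.} \quad \sum_{k \in \mathcal{S}_{j \rightarrow i}} b_k \leq B,\]

where \(\hat{O}_i(M_{j \rightarrow i})\) is the ego agent's occupancy estimate after receiving the message. Relaxing the constraint with a multiplier \(\lambda \geq 0\) gives the trade-off \(\mathbb{E}[\mathcal{L}] + \lambda \sum_{k \in \mathcal{S}_{j \rightarrow i}} b_k\), so sweeping \(\lambda\) traces out an accuracy-versus-bandwidth curve.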

Aim 3: Predictive Occupancy World Models

How can perception models reason about future 3D scene evolution?

This includes:

  • motion-aware token memory;
  • future occupancy forecasting;
  • uncertainty over dynamic and occluded regions;
  • links between predictive perception and downstream planning.

These aims are ambitious, but they form one coherent trajectory.


5. Matching With Advisors

A good statement should make advisor fit easy to see.

For each potential advisor or lab, I want to identify:

  • which part of my direction overlaps with their work;
  • what technical foundation I can contribute;
  • what new question I want to explore with them;
  • why their lab is a strong environment for that question.

The statement should not sound like a generic template.

For a computer vision lab, I may emphasize 3D representation learning and semantic occupancy. For a robotics lab, I may emphasize embodied perception, prediction, and planning. For an autonomous driving lab, I may emphasize collaborative perception and deployment constraints.

The core direction stays the same, but the connection point changes.


6. What Makes the Statement Credible

A research statement becomes credible when ambition is supported by evidence.

For my application materials, I want to show:

  • I have a coherent research direction;
  • I have implemented complete systems;
  • I understand the literature and open problems;
  • I can connect math, code, experiments, and writing;
  • I care about reproducibility and real constraints;
  • I have a concrete plan for future work.

This is why I also maintain research notes on the website. The blog is not separate from the application. It shows how I think.


7. A Possible Statement Structure

A concise structure could be:

  1. Opening thesis: efficient and predictive 3D scene understanding for autonomous and embodied agents.
  2. Motivation: partial observability, limited communication, temporal uncertainty, planning needs.
  3. Previous work: semantic occupancy, collaborative token communication, world models.
  4. Future aims: efficient representation, communication-aware perception, predictive occupancy world models.
  5. Advisor fit: why the target lab is the right place.
  6. Closing: what kind of researcher I hope to become.

This structure is simple, but it keeps the statement focused.


8. Closing Thoughts

The most important thing is coherence.

I do not want my application to look like a collection of unrelated topics: computer vision, autonomous driving, LLMs, robotics, and systems.

Instead, I want it to show a trajectory:

resource-constrained systems -> 3D perception -> collaborative occupancy prediction -> token communication -> temporal memory -> occupancy world models.

That trajectory is still developing, but it is becoming clearer.

The purpose of the research statement is to make that trajectory legible to others.



