A research idea is only the beginning.

In computer vision and embodied AI, the hard part is often turning a promising idea into a reliable system, which requires:

a clean problem formulation;
reproducible code;
trustworthy data processing;
stable training;
fair baselines;
meaningful ablations;
clear writing.

This note is about the engineering side of research, especially for projects in 3D perception and collaborative occupancy prediction.

1. Research Is a System

I used to think of research mainly as idea generation.

Now I think of a research project as a system:

\[\text{idea} \rightarrow \text{formulation} \rightarrow \text{implementation} \rightarrow \text{experiments} \rightarrow \text{analysis} \rightarrow \text{paper}.\]

Weakness in any stage affects the final work.

A strong idea can fail because the code is unstable. A clever model can look weak because the baseline is unfair. A good experiment can be unconvincing because the analysis is unclear.

For Ph.D. preparation, I want to become better not only at proposing ideas, but also at building complete research systems.

2. Start With a Precise Problem

Before writing code, I try to define the research problem in a compact form.

For collaborative occupancy prediction, a simplified formulation is:

\[\hat{O}_i = f_\theta(X_i, \{M_{j \rightarrow i}\}_{j \in \mathcal{N}_i}),\]

where:

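\(f_\theta\) is the model with parameters \(\theta\);
\(X_i\) is the observation of ego agent \(i\);
\(\mathcal{N}_i\) is the set of neighbors of agent \(i\);
\(M_{j \rightarrow i}\) is the message from neighbor \(j\);
\(\hat{O}_i\) is the predicted semantic occupancy grid.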

The communication budget \(B\) can be written as a constraint on the total cost of the received messages:

\[\sum_{j \in \mathcal{N}_i} \mathrm{Cost}(M_{j \rightarrow i}) \leq B.\]

This formulation forces several design questions:

What is a message?
How is message cost measured?
Which regions need collaboration?
How do we evaluate the accuracy-bandwidth trade-off?
What happens when the budget changes?

If these questions are unclear, the implementation will also become unclear.
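One way to force answers to the first two questions is to write the budget constraint as code. The sketch below is only an illustration under stated assumptions: a message is taken to be a serialized byte payload, and its cost is taken to be its length in bytes. The names `total_cost` and `satisfies_budget` and the agent names are hypothetical, and a real system might measure cost differently (for example, after compression).

```python
from typing import Callable, Dict


def total_cost(messages: Dict[str, bytes],
               cost: Callable[[bytes], int] = len) -> int:
    """Sum Cost(M_{j->i}) over all neighbors j of the ego agent."""
    return sum(cost(payload) for payload in messages.values())


def satisfies_budget(messages: Dict[str, bytes], budget: int,
                     cost: Callable[[bytes], int] = len) -> bool:
    """Check the constraint: total message cost <= B."""
    return total_cost(messages, cost) <= budget


# Hypothetical payloads: 1200 and 800 bytes from two neighbors.
messages = {"agent_1": bytes(1200), "agent_2": bytes(800)}

# Sweeping the budget shows directly what happens when B changes.
for budget in (1024, 2048, 4096):
    print(budget, satisfies_budget(messages, budget))
```

Making `cost` a parameter keeps the definition of message cost explicit and swappable, rather than buried in the model code.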

3. Make the Baseline Honest

A research contribution is only meaningful relative to a baseline.

For my projects, I try to check:

Is the baseline trained with the same data?
Does it use comparable input modalities?
Does it receive a similar communication budget?
Is the evaluation script exactly the same?
Are hyperparameters tuned fairly?
Does the baseline use test-time information that my method does not use, or vice versa?

In collaborative perception, fairness is especially important because even small changes in the communication budget can change the accuracy a method reaches.

If one method sends dense feature maps and another sends sparse tokens, I need to report both accuracy and communication cost. Otherwise the comparison is incomplete.
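To make the dense-versus-sparse comparison concrete, the sketch below estimates the payload size of each representation. All shapes and byte widths are hypothetical (a 100 x 100 BEV map with 64 channels, 500 tokens, float16 values, 4-byte token indices) and are not taken from any specific method. The one structural point it illustrates is that sparse tokens also need to carry their positions, so the index overhead belongs in the reported cost.

```python
def dense_cost_bytes(height: int, width: int, channels: int,
                     bytes_per_value: int = 2) -> int:
    """Payload of a dense feature map: every cell sends every channel."""
    return height * width * channels * bytes_per_value


def sparse_cost_bytes(num_tokens: int, channels: int,
                      bytes_per_value: int = 2,
                      bytes_per_index: int = 4) -> int:
    """Payload of sparse tokens: features plus one position index per token."""
    return num_tokens * (channels * bytes_per_value + bytes_per_index)


# Hypothetical shapes, chosen only for illustration.
dense = dense_cost_bytes(height=100, width=100, channels=64)
sparse = sparse_cost_bytes(num_tokens=500, channels=64)

print(f"dense:  {dense} bytes")
print(f"sparse: {sparse} bytes")
print(f"ratio:  {dense / sparse:.1f}x")
```

A number like this ratio only becomes meaningful next to the accuracy each method reaches at that cost, which is why the two should be reported together.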

4. Build for Reproducibility Early

Reproducibility should not be added at the end.

I want each experiment to be describable as a tuple:

\[E = (\mathcal{D}, C, \theta_0, H, S, R),\]

where:

\(\mathcal{D}\) is the dataset and split;
\(C\) is the configuration;
\(\theta_0\) is the initialization or starting checkpoint;
\(H\) is the hardware and runtime environment;
\(S\) is the random seed;
\(R\) is the resulting metrics and logs.

In practice, this means:

use configuration files rather than hidden command-line habits;
log the git commit when possible;
save exact evaluation settings;
keep dataset split definitions fixed;
record communication cost and accuracy together;
avoid silent changes to preprocessing.

This is not glamorous, but it protects the results from silent, accidental changes that make runs impossible to compare.
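The tuple above maps naturally onto a small record type. The following is a minimal sketch, not a prescribed tool: the class name `ExperimentRecord`, its field names, and the example config keys are all hypothetical. It uses only the Python standard library, and it records the git commit only when the `git` command is available and the code runs inside a checkout.

```python
import hashlib
import json
import platform
import subprocess
from dataclasses import asdict, dataclass, field
from typing import Optional


def current_git_commit() -> Optional[str]:
    """Return the current commit hash, or None outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def config_hash(config: dict) -> str:
    """Short, key-order-independent fingerprint of a configuration."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def default_environment() -> dict:
    return {"python": platform.python_version(),
            "platform": platform.platform()}


@dataclass
class ExperimentRecord:
    dataset_split: str                  # D: dataset and split
    config: dict                        # C: configuration
    init_checkpoint: Optional[str]      # theta_0: initialization or checkpoint
    seed: int                           # S: random seed
    environment: dict = field(default_factory=default_environment)  # H
    git_commit: Optional[str] = field(default_factory=current_git_commit)
    metrics: dict = field(default_factory=dict)                     # R

    def to_json(self) -> str:
        payload = asdict(self)
        payload["config_hash"] = config_hash(self.config)
        return json.dumps(payload, sort_keys=True, indent=2)


# Hypothetical config values, for illustration only.
record = ExperimentRecord(
    dataset_split="val",
    config={"voxel_size": 0.5, "max_tokens": 500},
    init_checkpoint=None,
    seed=0,
)
print(record.to_json())
```

Saving one such JSON file next to each run's logs keeps the accuracy, the communication settings, and the code version in the same place.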

5. Debug Representations, Not Only Metrics

In 3D perception, a bad metric can come from many causes:

coordinate transformation error;
camera calibration mismatch;
wrong voxel indexing;
invalid BEV projection;
unstable temporal memory;
overly aggressive token compression;
pose misalignment between agents.

So I try to visualize intermediate representations.

For occupancy and BEV models, useful debug views include:

input camera images;
projected BEV feature intensity;
occupancy logits before argmax;
predicted occupied and free-space regions;
token importance maps;
requested communication regions;
temporal memory activation;
failure cases under occlusion.

The model is easier to improve when I can see where it fails.
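Some of the causes listed above, such as wrong voxel indexing, can also be caught with a cheap round-trip check before any visualization. The sketch below assumes an axis-aligned grid with a single cubic voxel size. The function names and the grid extents are hypothetical and would need to match the actual dataset convention.

```python
import math
from typing import Optional, Sequence, Tuple

Index = Tuple[int, ...]


def point_to_voxel(point: Sequence[float], grid_min: Sequence[float],
                   voxel_size: float,
                   grid_shape: Sequence[int]) -> Optional[Index]:
    """Map a 3D point to its voxel index, or None if it lies outside the grid."""
    idx = tuple(math.floor((p - lo) / voxel_size)
                for p, lo in zip(point, grid_min))
    if all(0 <= i < n for i, n in zip(idx, grid_shape)):
        return idx
    return None


def voxel_center(idx: Index, grid_min: Sequence[float],
                 voxel_size: float) -> Tuple[float, ...]:
    """Return the center coordinate of a voxel."""
    return tuple(lo + (i + 0.5) * voxel_size for i, lo in zip(idx, grid_min))


# Hypothetical grid: 80 m x 80 m x 8 m at 0.5 m resolution.
GRID_MIN = (-40.0, -40.0, -1.0)
VOXEL_SIZE = 0.5
GRID_SHAPE = (160, 160, 16)

# Round trip: the center of every sampled voxel must map back to that voxel.
for idx in [(0, 0, 0), (80, 80, 2), (159, 159, 15)]:
    center = voxel_center(idx, GRID_MIN, VOXEL_SIZE)
    assert point_to_voxel(center, GRID_MIN, VOXEL_SIZE, GRID_SHAPE) == idx
```

A check like this does not replace looking at the projected features, but it rules out one class of geometry bugs before a bad metric has to be explained.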

6. Design Ablations Before the Final Result

Ablation should not be an afterthought.

Before a full experiment finishes, I try to list the claims I want to support.

For a token communication paper, possible claims are:

Token communication reduces bandwidth compared with dense feature sharing.
Receiver-driven requests send more useful information than sender-only ranking.
Content-aware merging preserves task-relevant information better than naive compression.
Temporal memory improves stability under partial observation.
Adaptive budget allocation improves the accuracy-cost trade-off.

Each claim needs a corresponding experiment.

This makes the paper stronger because the experiments are aligned with the argument, rather than being a collection of convenient numbers.
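The claim-to-experiment mapping can be kept as a small checked table so that an unsupported claim is visible early. In the sketch below, the claim texts come from the list above, while the keys and the experiment names are hypothetical placeholders. Two claims are deliberately left without experiments to show what the check reports.

```python
CLAIMS = {
    "token_vs_dense": "Token communication reduces bandwidth compared "
                      "with dense feature sharing.",
    "receiver_driven": "Receiver-driven requests send more useful "
                       "information than sender-only ranking.",
    "content_aware_merging": "Content-aware merging preserves task-relevant "
                             "information better than naive compression.",
    "temporal_memory": "Temporal memory improves stability under "
                       "partial observation.",
    "adaptive_budget": "Adaptive budget allocation improves the "
                       "accuracy-cost trade-off.",
}

# Hypothetical experiment names; each claim should list the runs it compares.
PLANNED_ABLATIONS = {
    "token_vs_dense": ["dense_feature_sharing", "token_communication"],
    "receiver_driven": ["sender_only_ranking", "receiver_driven_requests"],
    "content_aware_merging": ["naive_compression", "content_aware_merging"],
    "temporal_memory": [],  # nothing planned yet in this example
}


def claims_without_ablation(claims: dict, ablations: dict) -> list:
    """Return the keys of claims that have no planned experiment."""
    return sorted(key for key in claims if not ablations.get(key))


for key in claims_without_ablation(CLAIMS, PLANNED_ABLATIONS):
    print(f"No experiment planned for: {CLAIMS[key]}")
```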

7. Keep a Failure Log

Failed experiments are easy to forget, but they often contain the most useful information.

I try to record:

what I changed;
what I expected;
what happened;
why I think it failed;
whether the failure is implementation-related or idea-related;
what I would test next.

For example:

Experiment: Increase protected token ratio.
Expected: Better occupancy mIoU under low bandwidth.
Observed: Small gain in near regions, worse far-region semantics.
Hypothesis: Protected tokens are too concentrated around high-confidence regions.
Next: Add uncertainty-aware request weighting.

This kind of log helps prevent repeated mistakes and can later become useful analysis.
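A log like this is easier to search later if each entry is stored as structured data. The sketch below keeps one JSON object per line. The `FailureEntry` class, its field names, and the `kind` labels are hypothetical choices, and the example entry reuses the text of the example above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FailureEntry:
    experiment: str
    expected: str
    observed: str
    hypothesis: str
    next_step: str
    kind: str = "unknown"  # "implementation", "idea", or "unknown"


def to_line(entry: FailureEntry) -> str:
    """Serialize one entry as a single JSON line."""
    return json.dumps(asdict(entry), sort_keys=True)


def from_line(line: str) -> FailureEntry:
    """Parse one JSON line back into an entry."""
    return FailureEntry(**json.loads(line))


entry = FailureEntry(
    experiment="Increase protected token ratio.",
    expected="Better occupancy mIoU under low bandwidth.",
    observed="Small gain in near regions, worse far-region semantics.",
    hypothesis="Protected tokens are too concentrated around "
               "high-confidence regions.",
    next_step="Add uncertainty-aware request weighting.",
    kind="idea",
)

line = to_line(entry)
print(line)
```

Appending each line to a single file (for example with `open(path, "a")`) gives a chronological record that can be filtered by `kind` when writing the analysis section.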

8. Connect Code to Writing

A research project becomes publishable when implementation, experiments, and writing support the same argument.

I try to write the paper’s main claim early:

We improve collaborative semantic occupancy prediction under limited communication by representing scene information as task-aware tokens and adaptively selecting, merging, and fusing them according to ego uncertainty and bandwidth constraints.

Then I ask:

Does the method section directly implement this claim?
Do the experiments measure the key trade-off?
Do the figures make the idea easy to understand?
Do the ablations isolate each component?
Does the limitation section honestly state what remains unsolved?

Writing early often reveals missing experiments.

9. What This Means for Ph.D. Preparation

A Ph.D. is not only about having good ideas. It is also about repeatedly converting ideas into evidence.

For my own preparation, I want my research process to become more disciplined:

formulate problems clearly;
build stable codebases;
reproduce baselines carefully;
design ablations around claims;
analyze failures honestly;
write with a coherent research narrative.

This is also how I want my website to read: not just as a list of projects, but as evidence that I can think, build, evaluate, and communicate research.

10. Closing Thoughts

The best research systems are quiet.

They make experiments easier to trust. They make failures easier to understand. They make writing easier because the evidence is already organized.

For 3D perception and autonomous systems, this matters even more because the pipeline is complex. Data, geometry, learning, communication, temporal memory, and evaluation are all connected.

My goal is to make each project not only a result, but a reusable research machine that teaches me how to ask the next better question.

From Ideas to Reliable Research Systems