-
LLM Learning: From Pretraining to Decoder Inference
A structured note on how large language models are built and used: tokenization, pretraining, decoder-only Transformers, post-training, prefill, decoding, KV cache, RAG, and core LLM vocabulary.
-
Refining My PhD Research Direction Around 3D Perception
A personal research note on connecting my PhD preparation, semantic occupancy prediction, collaborative perception, token communication, and occupancy world models into a coherent research direction.
-
From Occupancy Prediction to Occupancy World Models
A research note on extending semantic occupancy prediction from current-state reconstruction to future 4D occupancy forecasting and world modeling for autonomous agents.
-
Token Communication for Multi-Agent 3D Perception
A research note on tokenized scene representations, token selection, token merging, and communication-efficient collaborative occupancy prediction.
-
Collaborative Perception: Seeing Beyond a Single Agent
A research note on collaborative perception, multi-agent scene understanding, communication constraints, pose alignment, and why collaboration is important for 3D occupancy prediction.
-
Semantic Occupancy as a Bridge Between Perception and Planning
A research note on semantic occupancy prediction, why it matters for autonomous agents, and how it connects 3D perception, occlusion reasoning, uncertainty, and downstream planning.
-
AI Agents and Embodied Intelligence
Study notes on AI agents, embodied intelligence, perception-action loops, memory, planning, world models, and their connections to computer vision, autonomous driving, and 3D scene understanding.
-
Reinforcement Learning and Decision Making
Study notes on reinforcement learning, decision making, Markov decision processes, dynamic programming, model-free and model-based RL, multi-agent RL, and their connections to embodied AI and autonomous driving.
-
Computer Graphics Foundations
Study notes on computer graphics foundations, geometry, rendering, NeRF, 3D Gaussian Splatting, differentiable rendering, and their connections to computer vision and 3D scene understanding.
-
Computer Vision Foundations
Study notes on computer vision foundations, Cornell's Introduction to Computer Vision course, multi-view geometry, 3D representations, depth estimation, and their connections to autonomous driving perception.
-
Deep Learning Foundations
Study notes on neural networks, CNNs, Transformers, CS231n, Andrew Ng's Deep Learning Specialization, and their connections to computer vision and autonomous driving perception.
-
Machine Learning Foundations
Study notes on core machine learning, Andrew Ng's Machine Learning course, PRML, statistical learning theory, representation learning, and their connections to computer vision and autonomous driving.
-
Mathematical Foundations
Study notes on matrix theory, numerical analysis, probability and statistics, optimization, and related mathematical foundations for machine learning and computer vision.
-
Building My PhD Knowledge Base for Computer Vision
A structured roadmap for building the mathematical, machine learning, computer vision, graphics, autonomous driving, and embodied AI foundations needed for PhD research.