Bandwidth-Aware Adaptive Token Communication for Collaborative Occupancy Prediction

Overview

This project investigates efficient multi-agent perception from an information-theoretic perspective, focusing on the trade-off between communication bandwidth and perception performance.

The central idea is to treat inter-agent communication as an information allocation problem. In collaborative perception, not every feature, token, or message contributes equally to the final task. Some information is critical for resolving occlusion or uncertainty, while other information is redundant because it is already visible to the ego agent or repeated across neighboring agents.

The goal is to develop a unified bandwidth-aware framework that can explain and guide the design of communication-efficient perception systems under resource constraints.

This work is ongoing and planned for submission to NeurIPS 2026.

Problem Motivation

In collaborative perception systems, agents need to exchange information to build a consistent global understanding of the environment.

However:

communication bandwidth is limited;
latency and packet size matter for real systems;
dense feature sharing is expensive;
neighboring agents often transmit redundant information;
different scene regions have different task importance;
existing methods often rely on heuristic compression ratios.

This project studies how to formally model:

what information should be transmitted, and how much is sufficient

This question is especially important for collaborative semantic occupancy prediction. Occupancy prediction requires dense 3D scene understanding, but dense 3D representations are expensive to communicate. A principled communication mechanism should allocate bandwidth to the most task-relevant and uncertainty-reducing information.

Core Perspective

The problem is formulated within a rate–distortion framework:

Rate: communication cost, such as transmitted tokens, bytes, or feature payload;
Distortion: task degradation, such as occupancy prediction error or mIoU loss.

The objective is to understand and optimize the trade-off between:

compact representations;
information completeness;
perception accuracy;
robustness under bandwidth constraints.

A simplified objective can be written as:

\[ \min_{\mathcal{M}} \; D(Y, \hat{Y}_{\mathcal{M}}) + \lambda R(\mathcal{M}) \]

where:

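\(\mathcal{M}\) is the transmitted message set;
\(R(\mathcal{M})\) measures the communication rate of \(\mathcal{M}\);
\(D(Y, \hat{Y}_{\mathcal{M}})\) measures perception distortion between the ground-truth occupancy \(Y\) and the prediction \(\hat{Y}_{\mathcal{M}}\) obtained with messages \(\mathcal{M}\);
\(\lambda \geq 0\) controls the trade-off between accuracy and bandwidth, with larger values penalizing communication more heavily.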

This formulation provides a way to reason about communication not as a fixed engineering constraint, but as a variable to be optimized jointly with perception quality.
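To make the objective concrete, the sketch below evaluates the Lagrangian D + λR over a small, hand-written set of candidate message sets and returns the minimizer. It assumes each candidate's rate and distortion have already been measured (for example, distortion as 1 - mIoU). The `Candidate` class, the function names, and all numbers are hypothetical illustrations, not results or code from this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate message set M with its (hypothetical) measured rate and distortion."""
    name: str
    rate: float        # R(M), e.g. number of transmitted tokens or kilobytes
    distortion: float  # D(Y, Y_hat_M), e.g. 1 - mIoU on a validation split


def lagrangian_cost(c: Candidate, lam: float) -> float:
    """Rate-distortion Lagrangian J(M) = D + lambda * R."""
    return c.distortion + lam * c.rate


def select_message_set(candidates: list[Candidate], lam: float) -> Candidate:
    """Return the candidate message set that minimizes D + lambda * R."""
    return min(candidates, key=lambda c: lagrangian_cost(c, lam))


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    candidates = [
        Candidate("none", rate=0.0, distortion=0.60),
        Candidate("sparse", rate=10.0, distortion=0.45),
        Candidate("dense", rate=100.0, distortion=0.40),
    ]
    for lam in (0.0001, 0.005, 0.1):
        # Selects "dense", then "sparse", then "none" as lambda grows.
        print(lam, select_message_set(candidates, lam).name)
```

Sweeping λ in this way traces out operating points along a rate-distortion curve: a small λ favors the dense message set, while a large λ favors sending nothing.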

Key Ideas

1. Information-Aware Representation

Instead of treating features as raw tensors, the system models tokens as information carriers with varying task utility.

Each token can be evaluated from multiple perspectives:

semantic relevance;
uncertainty reduction;
novelty relative to ego memory;
complementarity with neighboring observations;
expected contribution to occupancy prediction.

This allows the communication module to prioritize information that is more likely to improve the final perception output.
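One simple way to collapse the listed perspectives into a single score is a weighted sum. The sketch below assumes each cue is already normalized to [0, 1] and uses fixed, hypothetical weights; in a trained system the weights, or the whole scorer, would more plausibly be learned against occupancy quality. `TokenCues`, `DEFAULT_WEIGHTS`, and `token_utility` are illustrative names, not part of an existing codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenCues:
    """Hypothetical per-token cues, each assumed normalized to [0, 1]."""
    semantic_relevance: float
    uncertainty_reduction: float
    novelty: float          # novelty relative to ego memory
    complementarity: float  # complementarity with neighboring observations


# Hypothetical hand-set weights; a trained system would learn these.
DEFAULT_WEIGHTS = {
    "semantic_relevance": 0.3,
    "uncertainty_reduction": 0.3,
    "novelty": 0.2,
    "complementarity": 0.2,
}


def token_utility(cues: TokenCues, weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of cues, used as a proxy for the token's expected contribution
    to occupancy prediction. Cues missing from `weights` are ignored."""
    return sum(w * getattr(cues, name) for name, w in weights.items())


if __name__ == "__main__":
    redundant = TokenCues(0.5, 0.0, 0.0, 0.0)  # already visible to the ego agent
    occluded = TokenCues(0.5, 0.8, 1.0, 0.9)   # covers an ego-occluded region
    print(token_utility(redundant), token_utility(occluded))
```

A redundant token that adds no novelty or complementarity scores well below one covering an occluded region, even when both are equally relevant semantically.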

2. Selective Information Transmission

Each agent transmits only a subset of its information, selected according to:

relevance to the downstream task;
redundancy across agents;
temporal novelty;
bandwidth budget;
receiver-side requests.

Rather than broadcasting all intermediate features, agents transmit a compact set of tokens selected under a communication budget:

\[ |\mathcal{M}| \leq B \]

where \(B\) is the bandwidth or token budget.
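The budget constraint can be enforced with a top-B selection over per-token utility scores. This minimal sketch assumes a utility score per token is already available; skipping tokens with non-positive utility is a hypothetical design choice that lets a scene with few useful tokens use less than the full budget. The function name is illustrative.

```python
import heapq


def select_tokens(utilities: list[float], budget: int) -> list[int]:
    """Return indices of at most `budget` tokens with the highest utility.

    Enforces |M| <= B. Tokens with non-positive utility are never sent
    (a hypothetical design choice), so a scene with few useful tokens may
    use less than the full budget. Returned indices are in ascending order.
    """
    if budget <= 0:
        return []
    candidates = [(u, i) for i, u in enumerate(utilities) if u > 0]
    top = heapq.nlargest(budget, candidates)
    return sorted(i for _, i in top)


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.0, 0.5, 0.7]
    print(select_tokens(scores, budget=2))  # [1, 4]
```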

3. Communication–Memory Trade-off

The framework jointly considers:

communication: inter-agent information exchange;
memory: temporal information reuse;
local perception: information already available to the ego agent.

If a useful region is already represented in temporal memory, it may not need to be transmitted again. Conversely, if an occluded region is missing from ego memory, it may deserve more communication budget.

This leads to a joint view:

communicate what memory and local observation cannot explain.
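This principle can be read as a per-cell request mask. The sketch assumes the ego agent holds a local-observation confidence and a temporal-memory confidence for each cell, both in [0, 1], and requests a cell only when both fall below a threshold; a neighbor then sends only tokens that cover requested cells. The threshold value, the flat cell indexing, and the function names are hypothetical.

```python
def request_mask(local_conf: list[float], memory_conf: list[float],
                 threshold: float = 0.5) -> list[bool]:
    """Flag cells that neither local observation nor temporal memory explains.

    A cell is requested only when both its local confidence and its memory
    confidence fall below `threshold` (hypothetical default of 0.5).
    """
    if len(local_conf) != len(memory_conf):
        raise ValueError("confidence maps must have the same length")
    return [max(loc, mem) < threshold for loc, mem in zip(local_conf, memory_conf)]


def filter_by_request(token_cells: list[int], mask: list[bool]) -> list[int]:
    """Return indices of sender tokens whose cell was requested by the ego agent.

    `token_cells[t]` is the (flattened) cell index that sender token t covers.
    """
    return [t for t, cell in enumerate(token_cells) if mask[cell]]


if __name__ == "__main__":
    mask = request_mask([0.9, 0.2, 0.1, 0.3], [0.1, 0.8, 0.2, 0.4])
    print(mask)  # [False, False, True, True]
    print(filter_by_request([0, 2, 3, 1, 2], mask))  # [1, 2, 4]
```

Cell 1 is poorly observed now but well covered by memory, so it is not requested; only cells that both sources leave uncertain draw on the communication budget.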

4. Adaptive Budget Allocation

The project also studies adaptive bandwidth allocation. Easy scenes may require little communication, while complex scenes with occlusion, dynamic objects, or uncertain regions may require more.

Potential signals for adaptive allocation include:

ego uncertainty maps;
occupancy entropy;
object density;
disagreement among agents;
temporal change magnitude;
distance and viewpoint complementarity.

The long-term goal is to make communication budgets scene-aware rather than fixed.
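As one example of an adaptive signal, occupancy entropy can be mapped to a token budget. The sketch below assumes per-cell class probabilities from the ego prediction and a linear mapping between hypothetical bounds `b_min` and `b_max`: confident scenes receive a small budget and uncertain scenes a large one. The other listed signals (object density, agent disagreement, and so on) are not modeled here, and the function names are illustrative.

```python
import math


def cell_entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of one cell's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def adaptive_budget(cell_probs: list[list[float]], b_min: int, b_max: int) -> int:
    """Map the mean normalized occupancy entropy linearly to a budget in [b_min, b_max].

    Entropy is normalized by log(num_classes), so 0 means every cell is
    confidently predicted and 1 means every cell is uniformly uncertain.
    """
    if not cell_probs:
        return b_min
    num_classes = len(cell_probs[0])
    max_entropy = math.log(num_classes) if num_classes > 1 else 1.0
    mean_entropy = sum(cell_entropy(p) for p in cell_probs) / len(cell_probs)
    frac = min(max(mean_entropy / max_entropy, 0.0), 1.0)
    return b_min + round(frac * (b_max - b_min))


if __name__ == "__main__":
    confident = [[1.0, 0.0], [0.0, 1.0]]
    uncertain = [[0.5, 0.5], [0.5, 0.5]]
    print(adaptive_budget(confident, 16, 256))  # 16
    print(adaptive_budget(uncertain, 16, 256))  # 256
```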

System-Level Implications

The theoretical framework provides guidance for:

designing communication-efficient token representations;
selecting informative features under bandwidth constraints;
balancing real-time communication and temporal memory;
learning token importance scores that are aligned with downstream occupancy quality;
comparing systems through accuracy-rate curves rather than a single compression setting.

It also connects naturally with practical system designs for:

collaborative occupancy prediction;
multi-agent perception pipelines;
token merging and token pruning;
receiver-driven communication;
future occupancy world models.

This project serves as a theoretical and algorithmic extension of my work on token-based collaborative occupancy prediction.
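Comparing systems through accuracy-rate curves sometimes calls for a single scalar summary. One option, sketched here as an assumption rather than an established protocol of this project, is the trapezoidal area under mIoU plotted against log rate, divided by the log-rate span, which gives the average mIoU over the tested range. It is only meaningful when methods are evaluated over the same rate range; the function name, input format, and numbers are hypothetical.

```python
import math


def mean_miou_over_log_rate(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under an mIoU-vs-log10(rate) curve, divided by the log-rate span.

    `points` are (rate, mIoU) operating points with rate > 0, in any order.
    The result is the average mIoU over the tested rate range, so two methods
    are only comparable when evaluated over the same range.
    """
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two operating points")
    xs = [math.log10(rate) for rate, _ in pts]
    ys = [miou for _, miou in pts]
    span = xs[-1] - xs[0]
    if span <= 0:
        raise ValueError("operating points must cover more than one rate")
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))
    return area / span


if __name__ == "__main__":
    # Hypothetical operating points (rate in KB, mIoU), not measured results.
    method_a = [(1.0, 0.30), (10.0, 0.40), (100.0, 0.50)]
    method_b = [(1.0, 0.35), (10.0, 0.45), (100.0, 0.50)]
    print(mean_miou_over_log_rate(method_a), mean_miou_over_log_rate(method_b))
```

Using a log-rate axis keeps the low-bandwidth regime, where most design decisions matter, from being compressed into a small corner of the curve.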

Planned Method Components

The framework is planned around several implementable modules:

uncertainty-aware request generation: the ego agent identifies regions where local observation and memory are insufficient;
token utility estimation: neighboring agents estimate which tokens are most helpful for the receiver;
rate-constrained token selection: messages are selected under an explicit bandwidth budget;
adaptive token merging: redundant tokens are merged into compact representatives;
rate-distortion evaluation: methods are compared using mIoU versus communication cost curves.

These components connect information-theoretic principles with trainable deep perception modules.
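For the adaptive token merging component, a minimal stand-in is greedy clustering by cosine similarity: each token joins the first cluster whose running mean is similar enough, otherwise it starts a new cluster. This threshold-based scheme is chosen purely for illustration; it is not the project's merging module or a specific published method, and the threshold and names are hypothetical. Returning a count with each merged token tells the receiver how many original tokens it represents.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def merge_tokens(tokens: list[list[float]],
                 threshold: float = 0.95) -> list[tuple[list[float], int]]:
    """Greedily merge tokens whose cosine similarity to a cluster mean meets `threshold`.

    Returns (mean_feature, count) pairs in order of cluster creation.
    """
    clusters: list[tuple[list[float], int]] = []
    for tok in tokens:
        for k, (mean, count) in enumerate(clusters):
            if cosine(tok, mean) >= threshold:
                # Update the running mean with the new member.
                new_mean = [(m * count + t) / (count + 1) for m, t in zip(mean, tok)]
                clusters[k] = (new_mean, count + 1)
                break
        else:
            # No sufficiently similar cluster: start a new one.
            clusters.append((list(tok), 1))
    return clusters


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
    # The first two tokens merge; the third stays separate.
    for mean, count in merge_tokens(tokens):
        print(count, [round(x, 3) for x in mean])
```

Greedy assignment depends on token order; a learned or bipartite matching scheme would avoid that, at the cost of more computation.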

Research Significance

This work aims to:

provide a principled foundation for communication-efficient perception;
bridge information theory and deep learning systems;
explain why some tokens are worth transmitting and others are redundant;
support scalable multi-agent perception;
guide future occupancy world model learning under limited bandwidth.

The broader motivation is that collaborative autonomy cannot assume unlimited communication. A practical system must decide where its limited bandwidth yields the largest gain in perception quality.

Status

This work is ongoing and planned for submission to NeurIPS 2026. Detailed formulations and implementation specifics are omitted while the manuscript is in preparation.