arXiv Digest — Tuesday, June 9, 2026

40 papers across AI, ML, NLP, and CV from the last 24 hours.

Today's Synthesis

Memory is the dominant thread running through today's batch. Multiple papers converge on the same problem from different angles: world models fail when cameras leave a room and return to find things changed. Four papers independently address memory as the bottleneck — latent spatial caches for video world models, cognitive-science-inspired memory modules for VLA robots, a controlled ablation study isolating memory as the central failure point, and an action-to-image formulation that encodes motion history as visual state. The field is moving past treating memory as an implementation detail toward treating it as an architectural primitive.

Robotics and agent safety form a second cluster. Five robotics papers address long-horizon control, safety attribution, and imitation learning — but the most provocative framing comes from a quantum control paper asking "who earns the safety?" — pointing out that a protective filter can silently cover for an incompetent policy, making post-filter metrics measure the guardrail rather than the learned controller. The question extends directly to LLM safety filters and RLHF.

On the theory side, tight VC-dimension bounds for Transformers and a universal approximation theorem for functional-input networks signal continued appetite for sample- and parameter-efficiency foundations. Notably absent: any papers on scaling laws or new model architectures at scale — the emphasis has shifted from "bigger" to "more principled."

The batch collectively suggests the field is consolidating around reliability — memory that persists, policies that can be verified, and evaluations that can be traced — rather than raw capability.

Papers by Category

Artificial Intelligence

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

Avijit Ghosh, Anka Reuel, Jenny Chim · 2026-06-08

AI evaluation results are produced at scale but reported inconsistently across leaderboards, model cards, benchmark papers, and company blogs. The cost is interpretive: readers cannot reliably compare results across sources, identify what a report omits, or trace an aggregate claim to its underlying evidence. Recent efforts address isolated components but leave three gaps: they cover only narrow s

SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation

Matthew Ho, Brian Liu, Jixuan Chen · 2026-06-08

Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them can cost domain scientists hours to days. We study simulator setup as a problem of agent-tool interface grounding: what minimal simulator-specific adaptations are needed for an off-the-shelf coding agent to operate real scientific software? Our intuition is

Collaborative Human-Agent Protocol (CHAP)

Arsalan Shahid, Gordon Suttie, Philip Black · 2026-06-08

Foundation models are moving from response generation into operational roles. They plan across steps, call tools, request human input, coordinate with other agents, and increasingly carry responsibility for work that affects customers, claims, code, contracts, and clinical decisions. Production deployments are no longer one human supervising one model. They are multi-human, multi-agent collaborati

Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback

Rishabh Sabharwal, Hongru Wang, Amos Storkey · 2026-06-08

Existing benchmarks for deep research agents (DRAs) assess only single-shot outputs, ignoring a key question: can DRAs improve their reports when guided by feedback? To investigate this, we present a multi-turn evaluation of DRAs under two feedback settings: self-reflection, in which the agent revises its report without any external diagnostic signal, and process-level feedback, in which the agent

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Pu Ning, Quan Chen, Kun Tao · 2026-06-08

Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and return only summarized results, conserving the main agent's context budget. Howev

Machine Learning

An Agency-Transferring Model-Free Policy Enhancement Technique

Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim · 2026-06-08

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a baseline into the RL training process, simultaneously improving training efficiency

Rethinking the Divergence Regularization in LLM RL

Jiarui Yao, Xiangxin Zhou, Penghui Qi · 2026-06-08

Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control essential for stable optimization. Mainstream methods such as PPO and GRPO approximate this control with a ratio-clipping mechanism, but the importance ratio can be a po

Topological Neural Operators

Lennart Bastian, Samuel Leventhal, Mustafa Hajij · 2026-06-08

We introduce Topological Neural Operators (TNOs), a principled framework for operator learning on cell complexes that lifts neural operators (NOs) from functions on points and/or edges to topological domains. TNOs represent data as features defined on cells of varying dimension and model their interactions through Discrete Exterior Calculus, enabling explicit cross-dimensional coupling via gradien

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

Udvas Das, Waris Radji, Debabrota Basu · 2026-06-08

We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context distributions that are drifting over time. Under practitioner-friendly assumptions, we reduce this setting to linear bandit with stationary mean but heteroskedastic and non-s

Zero Touch Predictive Orchestration: Automating Time-Series Models for the Cloud-Edge Continuum

Abd Elghani Meliani, Arora Sagar, Adlen Ksentini · 2026-06-08

The Cloud-Edge Continuum (CEC) enables latency-critical applications by distributing resources to the far edge, but its extreme volatility makes proactive Zero Touch Management via time-series forecasting essential. However, orchestrators face a severe cold start problem: newly discovered nodes lack the historical data required to train localized predictive models, while generalized models fail to

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom · 2026-06-08

A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile agent benchmarks lack this kind of personalization. We introduce iOSWorld, the first interactive native iOS simulator benchmark built around a persistent user identity

Preserving Plasticity in Continual Learning via Dynamical Isometry

Andries Rosseau, Robert Müller, Ann Nowé · 2026-06-08

Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, eventually limiting further learning. We relate plasticity to the empirical Neural Tangent Kernel, and identify dynamical isometry (the condition that layer-wise Jacobian singular values remain close to one) as a key mechanism for preserving plasticity in continual learning. We revisi

Perturbative Contrastive Physical Learning

Kyungeun Kim, Amanuel Anteneh, Israel Klich · 2026-06-08

Responses to perturbations are key to understanding physical systems. The ability to contrast such responses by comparing how a system reacts under slightly different conditions provides a mechanism for learning. Here, we introduce Perturbative Contrastive Physical Learning (PCPL), a general framework in which learning emerges from measurable contrasts between physical states produced by controlle

Learning Dynamics Reveal a Hierarchy of Weight-Induced Layerwise Gram Metrics

Claudio Nordio · 2026-06-08

We study feed-forward ReLU networks with fixed readout and quadratic loss. The aim is to rewrite gradient descent not primarily as a dynamics in weight space, but as a collective dynamics closed in terms of fields defined on the training-set space. For a single hidden layer, the weight variables can be eliminated from the activation dynamics, yielding a closed equation for the residuals governed b

Tight Sample Complexity of Transformers

Chenxiao Yang, Nathan Srebro, Zhiyuan Li · 2026-06-08

We tightly characterize the VC dimension of depth-L Transformers with a total of W parameters, mapping an input sequence of length T to a single output, establishing an upper bound of O(L W log(TW)) and a nearly matching lower bound of Ω(L W log(TW / L)). We further tightly characterize the sample complexity of chain-of-thought learning using such a Transformer, showing teacher forcing (i.e. selec

Disentanglement with Holographic Reduced Representations

Jhonny J. Velasquez Olivera, Christo K. Thomas, Walid Saad · 2026-06-08

Disentanglement, the separation of factors of variation in data using neural networks, remains a long-standing challenge in machine learning. Prior work has addressed this problem with variational autoencoders and generative adversarial networks that incorporate ideas from variational inference and information-theoretic constraints. In contrast to methods that rely on continuous representations, w

Computation & Language (NLP)

Causally Evaluating the Learnability of Formal Language Tasks

Vesteinn Snaejarnarson, Anej Svete, Josef Valvoda · 2026-06-08

Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to learn a given task. Answering this for natural language is difficult: tasks are hard to delineate and can confound one another. To rigorously investigate the relationship between data frequency and learnability, we turn to a controlled setti

Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan

Alexander Chulzhanov, Soeren Eberhardt, Arjun Mukherjee · 2026-06-08

Neural machine translation for digitally low-resource Indigenous languages is often hindered by extreme data scarcity, prompting reliance on extractive web-scraping. To ensure data sovereignty, this study introduces a data synthesis methodology to bootstrap NMT models without scraping target-language parallel text. Focusing on Q'eqchi' Mayan, we transformed community-sourced dictionaries into a ma

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

Wendy K. Tam · 2026-06-08

The ambition behind alignment training is to make large language models safe and useful. The primary mechanism, reinforcement learning from human feedback (RLHF), shapes the behavior of deployed language models by aligning them with 'human values.' Yet the process is opaque. What values are being encoded; whose values are they; and how does RLHF encode them? A growing body of evidence suggests tha

Computer Vision

Latent Spatial Memory for Video World Models

Weijie Wang, Haoyu Zhao, Yifan Yang · 2026-06-08

Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel space discards rich features of the learned latent representation. In this paper, we introduce laten

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Mingxian Lin, Shengju Qian, Yuqi Liu · 2026-06-08

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs, open-weight VLMs, and specialized game policies) on the same footing. We addres

PTL-Diffusion: Manifold-Aware Diffusion with Periodic Terminal Laws

Danqi Zhuang, Jisui Huang, Xiaoyue Xi · 2026-06-08

Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analytically convenient and empirically powerful, it provides little explicit structure for data concentrated near low-dimensional manifolds, where different regions of the data distribution may correspond to distinct local geometric or semantic

Echo-Memory: A Controlled Study of Memory in Action World Models

Wayne King, Zeyue Xue, Yuxuan Bian · 2026-06-08

We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first frame, text prompt, and camera-action sequence, but their central failure is often memory rather than local image synthesis: after the camera leaves and returns, the scene or salient object may silently change. Existing memory designs are hard

Beyond Spherical Harmonics: Rethinking Appearance Models for Radiance Reconstruction

Ewa Miazga, Jorge Condor, Piotr Didyk · 2026-06-08

View-dependent appearance modeling remains a challenging problem in novel-view synthesis and reconstruction. Accurately representing complex angular effects often requires substantial memory and computational resources. For new learning-based methods, a common approach is to rely on SH. However, capturing high-frequency phenomena such as specular reflections demands high-order expansions, which in

End-to-End Optimization of Incoherent Imaging for Classification Under Detector-Limited Readout

Archer Wang, Joshua Chen, Sachin Vaidya · 2026-06-08

End-to-end co-optimization of optical front-ends (e.g. metasurfaces) and neural network back-ends has been widely applied to imaging tasks, yet a formalism characterizing when and why such systems outperform conventional lens-based imaging is largely lacking. This paper focuses on object classification, a central imaging task, and asks when end-to-end optimization of a phase mask for incoherent im

This digest is generated automatically from arXiv submissions. Not affiliated with arXiv or Cornell University.