arXiv Digest — Friday, June 5, 2026

25 papers across AI, ML, NLP, and CV from the last 24 hours.

Today's Synthesis

Three threads connect this batch. One cluster of papers treats agent infrastructure as a first-class research problem rather than an engineering afterthought: memory systems for long-horizon workloads, in-band recuse signals for access denial, and collaborative competence within multi-agent systems. A second cluster targets the cost of long-horizon reasoning: cross-layer sparse attention, latent reasoning with normalizing flows that bypasses textual CoT, and pretraining recurrent networks without recurrence all aim to make long computations cheaper. Finally, the robotics papers converge on controllability as the bottleneck for deployed systems, whether that means speed-controllable VLA policies or whole-body command spaces that planners can actually address.

The standout is Emergent Language as an Approach to Conscious AI, which proposes studying machine consciousness not through human-language checklists but by letting multi-agent RL systems develop their own communication protocols. By looking for consciousness-related structure that emerges when agents must coordinate without a shared linguistic prior, the paper aims to sidestep the circularity of evaluating AI consciousness using the language that trained it.

The batch as a whole suggests a field maturing past scale as the only lever of progress. Where 2024 was defined by bigger context windows, today's papers treat memory efficiency, sparse computation, and inter-agent protocol design as the frontier, signaling that the next gains will come from how systems think together, not how large they are individually.

Papers by Category

Computation & Language (NLP)

Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

Mandana Samiei, Eunice Yiu, Anthony GX-Chen · 2026-05-29

A long-standing finding in the causal learning literature is that adults struggle to identify conjunctive causal rules while performing better in disjunctive settings. This paper asks whether this bias persists when learners actively choose experiments. We compare human adults, children, and LLMs on an interactive causal discovery task.

Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill

Mehmet Iscan · 2026-05-29

Large language models increasingly write, review, and judge code, and a fast-growing practice equips them with prompt skills that ask the model to reason like a scientist. A prominent example tells the model to act as a Popperian falsificationist. We conduct a controlled, pre-registered study using objective test-suite evaluation and find no statistically significant benefit from the Popperian ...

Latent Reasoning with Normalizing Flows

Guancheng Tu, Xiangjun Fu, Suhao Yu · 2026-05-29

Large language models often improve reasoning by generating explicit chain-of-thought. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream. Latent reasoning offers a higher-bandwidth alternative, performing intermediate computations in continuous latent space. We parameterize latent reasoning steps via normalizing flows and show impro...

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

Heng-Jui Chang, Alexander H. Liu, Saurabhchand Bhati · 2026-05-29

Audio encoders are critical to modern audio applications as LLMs increasingly rely on a single encoder for diverse inputs. We present USAD 2.0, a universal audio encoder scaled via representation distillation across speech, music, and sound event domains.

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

Xinnong Zhang, Wanting Shan, Hanjia Lyu · 2026-05-29

Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or are highly sensitive to semantically independent changes in conversational contexts. In this work, we study counterfactual context revision as a framework for auditin...

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Hanxu Hu, Zdenek Snajdr, Pinzhen Chen · 2026-05-29

Prior work has shown that LLMs can translate unseen or low-resource languages by undergoing continued training or by encoding grammar rules in context. We use RL to train models to leverage provided linguistic resources dynamically, enabling translation of extremely low-resource languages at scale.

A Komi-Yazva--Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation

Petr Parshakov · 2026-05-29

We present the first Komi-Yazva--Russian parallel corpus together with an explicit evaluation protocol for studying LLM translation in an endangered, extremely low-resource setting. The dataset contains 457 aligned sentence pairs from 74 narrative texts, documented provenance, sentence-level alignment, and story identifiers that enable leakage-aware evaluation.
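
One way to read "leakage-aware evaluation" is that sentences from the same story should never appear both among the few-shot examples and in the test set. The sketch below illustrates that idea with a split keyed on story identifiers. The function name, the 20% test fraction, and the toy placeholder strings are assumptions for illustration, not the paper's protocol or data.

```python
import random
from collections import defaultdict

def split_by_story(pairs, test_fraction=0.2, seed=0):
    """Split (story_id, source, target) triples so that no story spans both sides."""
    by_story = defaultdict(list)
    for story_id, src, tgt in pairs:
        by_story[story_id].append((src, tgt))

    # Shuffle whole stories, not individual sentences.
    story_ids = sorted(by_story)
    random.Random(seed).shuffle(story_ids)
    n_test = max(1, round(len(story_ids) * test_fraction))
    test_ids = set(story_ids[:n_test])

    # `pool` can supply few-shot examples; `test` is held out by story.
    pool = [(s, p) for s in story_ids if s not in test_ids for p in by_story[s]]
    test = [(s, p) for s in story_ids if s in test_ids for p in by_story[s]]
    return pool, test

# Toy placeholder data: 5 hypothetical stories with 3 sentence pairs each.
toy = [(f"story{k}", f"src-{k}-{i}", f"tgt-{k}-{i}") for k in range(5) for i in range(3)]
pool, test = split_by_story(toy)
```

Splitting at the story level costs some flexibility in set sizes, but it keeps near-duplicate context from the same narrative out of the prompt when a test sentence is scored.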

CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments

Jiaju Chen, Bo Sun, Yuxuan Lu · 2026-05-29

Multi-agent systems (MAS) built on LLMs have shown growing promise, yet they often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground, maintain shared task understanding, and balance individual autonomy with group coordination. We introduce CollabSim, a methodology grounded in CSCW research for evaluating ...

Computer Vision

PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding

Shaohui Dai, Yansong Qu, You Shen · 2026-05-29

Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part structures that are essential for embodied interaction with 3D environments. In t...

Complexity-Balanced Diffusion Splitting

Noam Issachar, Dani Lischinski, Raanan Fattal · 2026-05-29

Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative timeline is inherently inefficient. In this work, we propose Complexity-Balanced Splitting (CBS),...

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Chenming Zhu, Jingli Lin, Yilin Long · 2026-05-29

While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason from alternative viewpoints when only limited egocentric observations are available. In this work, ...

A Vision-language Framework for Comparative Reasoning in Radiology

Tengfei Zhang, Ziheng Zhao, Lisong Dai · 2026-05-29

Medical imaging AI has achieved strong performance in isolated image interpretation, but remains poorly aligned with radiological practice, where diagnosis and follow-up rely on comparison across prior studies and analogous reference cases. Here we formulate radiological comparison as an entity-aware cross-image reasoning problem and introduce a framework that supports both reference-case retri...

Artificial Intelligence

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Lizhi Yang, Junheng Li, Nehar Poddar · 2026-05-29

For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and ...

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Dong Jing, Jingchen Nie, Tianqi Zhang · 2026-05-29

Robot manipulation alternates between low-risk transit phases that call for fast execution and high-risk contact stages that demand slow, precise motion. Yet existing Vision-Language-Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compression, KV-cache reuse, or reinforcement learning only shift the policy from ...

Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola · 2026-05-29

Training recurrent neural networks (RNNs) requires assigning credit across long sequences of computations. Standard backpropagation through time (BPTT) addresses this problem poorly: it is sequential in time, limiting parallelism, and suffers from vanishing or exploding gradients. We propose Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit p...

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger · 2026-05-29

Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on GRPO or modifications to steer models to produce Chain-of-Thought traces. The final answer can only be verified after the CoT trace is complete, making it a delayed reward problem. We propose RREDCoT, a segment-level reward redistribution method that allocates ...

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

Jui-Hui Chung, Ziyang Cai, Zihao Li · 2026-05-29

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. Goedel-Architect generates a blueprint of formally stated definitions and lemmas, then a tool-equipped Lean prover fills in each node.
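
The blueprint idea can be pictured as a small dependency graph that is processed in topological order, so every lemma is attempted only after the statements it relies on. The sketch below is a minimal illustration under that reading. The node names and the `prove` callback are hypothetical stand-ins, not Goedel-Architect's actual interface or its Lean tooling.

```python
from graphlib import TopologicalSorter

# Hypothetical blueprint: each node maps to the set of nodes it depends on.
blueprint = {
    "def_even": set(),
    "lemma_even_add": {"def_even"},
    "lemma_even_mul": {"def_even"},
    "main_theorem": {"lemma_even_add", "lemma_even_mul"},
}

def proof_order(graph):
    """Return an order in which every node follows all of its dependencies."""
    return list(TopologicalSorter(graph).static_order())

def fill_blueprint(graph, prove):
    """Call prove(node, proved_dependencies) on each node in dependency order."""
    proved = {}
    for node in proof_order(graph):
        proved[node] = prove(node, {dep: proved[dep] for dep in graph[node]})
    return proved

# Stand-in for a prover call: it only records which dependencies were available.
results = fill_blueprint(blueprint, lambda node, deps: sorted(deps))
```

Because `TopologicalSorter` raises `CycleError` on circular dependencies, a malformed blueprint would be rejected before any prover call is made.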

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Yutao Sun, Yanqi Zhang, Li Dong · 2026-05-29

Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings. We propose cross-layer sparse attention with shared routing: a single index computed at the first layer is reused across all subsequent layers, dramatically reducing the indexing overhead while preserving accuracy.
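
The shared-routing idea can be sketched for a single decode step: pick the relevant cache positions once at the first layer, then let every layer attend only to those positions. The code below is a simplified illustration. Top-k selection by dot product, the function names, and the use of precomputed per-layer queries are all assumptions made here; the abstract does not say how the index is computed.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k_indices(query, keys, k):
    """Indices of the k keys with the largest dot product against the query."""
    scores = [dot(query, key) for key in keys]
    best = sorted(range(len(keys)), key=lambda i: -scores[i])[:k]
    return sorted(best)

def sparse_attention(query, keys, values, indices):
    """Softmax attention restricted to the cache positions in `indices`."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in indices]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, indices)) / total
            for d in range(dim)]

def decode_step(queries, kv_cache, k):
    """queries[l] and kv_cache[l] = (keys, values) are the inputs of layer l."""
    shared = top_k_indices(queries[0], kv_cache[0][0], k)  # indexed once
    outputs = [sparse_attention(q, keys, values, shared)
               for q, (keys, values) in zip(queries, kv_cache)]
    return shared, outputs

# Toy cache: 4 positions, 3 layers that happen to share the same contents.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
shared, outputs = decode_step([[1.0, 0.0]] * 3, [(keys, values)] * 3, k=2)
```

In a real transformer each layer's query depends on the previous layer's output. Passing the queries in directly keeps the sketch short while still showing that the indexing cost is paid once per step, not once per layer.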

Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals

Thamilvendhan Munirathinam · 2026-05-29

As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. We propose a third mode: a lightweight, published in-band deny signal -- the Recuse Signal -- that a server emits over a protocol layer to instruct compliant agents to back off voluntarily.
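
To make the voluntary back-off concrete, here is a minimal sketch of the agent side, assuming the signal arrives as an HTTP-style response header. The header name `x-agent-recuse`, the `Response` type, and the `CompliantAgent` class are hypothetical; the abstract does not specify the wire format or which protocol layer carries the signal.

```python
from dataclasses import dataclass, field

# Hypothetical header name; the real Recuse Signal format is not given here.
RECUSE_HEADER = "x-agent-recuse"

@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)

class CompliantAgent:
    """Remembers resources that asked it to back off and skips them afterwards."""

    def __init__(self):
        self.denied = set()

    def observe(self, resource, response):
        # Header names are compared case-insensitively, as in HTTP.
        names = {name.lower() for name in response.headers}
        if RECUSE_HEADER in names:
            self.denied.add(resource)

    def may_access(self, resource):
        return resource not in self.denied

agent = CompliantAgent()
agent.observe("/admin", Response(200, {"X-Agent-Recuse": "1"}))
agent.observe("/public", Response(200, {"Content-Type": "text/plain"}))
```

Note that the signal is advisory in this sketch: nothing stops a non-compliant agent from ignoring it, which is presumably why compliance has to be measured.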

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

Zhuoming Chen, Xinrui Zhong, Qilong Feng · 2026-05-29

Sparse attention is becoming increasingly important for serving LLMs as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive. To address this challenge, we present Vortex, a system that combines a Python-embedded frontend language atop a page-cached sparse attention execution engine.

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

Yasmine Omri, Ziyu Gan, Zachary Broveak · 2026-05-29

LLM agents are increasingly deployed on long-horizon tasks requiring sustained reasoning over extended interaction histories. A rich ecosystem of agent memory systems has emerged spanning flat retrieval, LLM-mediated extraction, consolidating fact stores, and agentic control flows. Yet, their system-level behavior remains uncharacterized. We benchmark these approaches across real-world workloads.

Statistical ML

Causal Atlases from Entropic Inference: Bayesian Networks beyond Optimal DAGs

Hazhir Aliahmadi, Irina Babayan, Greg van Anders · 2026-05-29

Data-driven causal relationship identification is pertinent to advancing understanding of complex systems both within and beyond science. Bayesian networks offer a probabilistic method for modelling generic causal relationships via directed acyclic graphs. We propose an entropic inference approach that produces a distribution over DAGs, enabling uncertainty quantification in causal discovery.

Nonreversible Gauge Fields in Fokker--Planck Dynamics: Supersymmetric Hamiltonians and Learned Finite Forces

Masayuki Ohzeki · 2026-05-29

We formulate stationary-density-preserving nonreversible perturbations of Fokker--Planck dynamics as gauge fields that deform relaxation spectra while leaving the invariant state fixed. The framework has implications for accelerated sampling in diffusion models.
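
For orientation, the textbook version of a stationary-density-preserving nonreversible perturbation can be written as follows. The symbols $U$ (potential), $\gamma$ (added drift), and $J$ (constant antisymmetric matrix) are introduced here for illustration, and the paper's gauge-field formulation may differ.

```latex
\begin{aligned}
dX_t &= \bigl(-\nabla U(X_t) + \gamma(X_t)\bigr)\,dt + \sqrt{2}\,dW_t, \\
\partial_t p &= \nabla\!\cdot\!\bigl(p\,\nabla U\bigr) + \Delta p - \nabla\!\cdot\!\bigl(\gamma\,p\bigr), \\
\pi \propto e^{-U} &\ \text{remains stationary whenever}\ \nabla\!\cdot\!\bigl(\gamma\,e^{-U}\bigr) = 0, \\
\text{e.g. } \gamma &= J\,\nabla U \ \text{with}\ J^{\top} = -J .
\end{aligned}
```

With $\gamma = J\nabla U$ the condition holds because $\nabla U \cdot J\nabla U = 0$ and $\sum_{i,j} J_{ij}\,\partial_i\partial_j U = 0$ for antisymmetric $J$, so the drift changes how fast the dynamics relax without changing where they settle.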

Conformal Risk Sharing: Certified Cost Allocation with Participation Guarantees

Ieva Kazlauskaite · 2026-05-29

Sharing the financial impact of rare adverse events across a group can soften extreme individual burdens. We formalize this as the Certified Allocation Problem: from finite data and without parametric assumptions, construct allocation rules that guarantee both individual rationality and aggregate risk bounds using conformal prediction.
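
The distribution-free ingredient here is conformal prediction. As a rough illustration of that building block only, the sketch below computes a split-conformal upper bound on a new loss from calibration data, assuming the losses are exchangeable. The function name and the toy numbers are assumptions, and this is not the paper's allocation rule.

```python
import math

def conformal_upper_bound(calibration, alpha):
    """Bound b such that a new exchangeable loss is <= b with probability >= 1 - alpha."""
    n = len(calibration)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest value.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this level.
        return math.inf
    return sorted(calibration)[rank - 1]

# Toy calibration losses: the integers 1 through 19.
losses = [float(v) for v in range(1, 20)]
bound_90 = conformal_upper_bound(losses, alpha=0.1)
bound_99 = conformal_upper_bound(losses, alpha=0.01)
```

The infinite bound at the stricter level shows the price of making no parametric assumptions: with 19 calibration points, nothing finite can be certified at 99% coverage.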

Estimation of the sub-Gaussian parameter

Jason Liu, Min Xu, Jinchuan Xing · 2026-05-29

The sub-Gaussian parameter of a mean-zero random variable is defined. Despite the ubiquity of sub-Gaussian random variables, the estimation of the parameter has received little attention. In this work, we study a natural estimator and establish its consistency and concentration properties under general conditions.
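
For readers who want the object being estimated, one standard definition (often called the optimal variance proxy) is given below. Whether the paper reports $\sigma$ or $\sigma^2$, and whether it uses exactly this normalization, is an assumption here.

```latex
\sigma^2(X) \;=\; \inf\Bigl\{\, s^2 \ge 0 \;:\; \mathbb{E}\bigl[e^{\lambda X}\bigr] \le e^{s^2\lambda^2/2} \ \text{ for all } \lambda \in \mathbb{R} \,\Bigr\},
\qquad \mathbb{E}[X] = 0 .
```

Under this definition $\operatorname{Var}(X) \le \sigma^2(X)$, with equality for Gaussian $X$, so the parameter can be strictly larger than the variance and is not recovered by a plain variance estimate.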

This digest is generated automatically from arXiv submissions. Not affiliated with arXiv or Cornell University.