25 papers across AI, ML, NLP, and CV from the last 24 hours.
Three threads connect this batch. First, a cluster of papers treats agent infrastructure as a first-class research problem rather than an engineering afterthought: persistent memory across sessions, a formal recuse signal for access denial, and collaborative competence within multi-agent systems. A second cluster targets the cost of long-horizon reasoning: cross-layer sparse attention, latent reasoning with normalizing flows that bypasses textual CoT, and RNN training that sidesteps the limits of backpropagation through time all aim to make extended computation cheaper. Finally, robotics papers converge on controllability as the bottleneck for deployed systems, whether that means speed-modulated VLA policies or whole-body command spaces that planners can actually address.
The standout is "Emergent Language as an Approach to Conscious AI", which proposes studying machine consciousness not through human-language checklists but by letting multi-agent RL systems develop their own communication protocols. If consciousness-related structure emerges when agents must coordinate without a shared linguistic prior, the approach sidesteps the circularity of evaluating AI consciousness using the language that trained it.
The batch as a whole suggests a field maturing past model scale as the only lever of progress. Where 2024 was defined by bigger context windows, today's papers treat memory efficiency, sparse computation, and inter-agent protocol design as the frontier, suggesting that the next gains will come from how systems think together rather than from how large they are individually.
Mandana Samiei, Eunice Yiu, Anthony GX-Chen · 2026-05-29
A long-standing finding in the causal learning literature is that adults struggle to identify conjunctive causal rules while performing better in disjunctive settings. This paper asks whether this bias persists when learners actively choose experiments. We compare human adults, children, and LLMs on an interactive causal discovery task.
Mehmet Iscan · 2026-05-29
Large language models increasingly write, review, and judge code, and a fast-growing practice equips them with prompt skills that ask the model to reason like a scientist. A prominent example tells the model to act as a Popperian falsificationist. We conduct a controlled, pre-registered study using objective test-suite evaluation and find no statistically significant benefit from the Popperian ...
Guancheng Tu, Xiangjun Fu, Suhao Yu · 2026-05-29
Large language models often improve reasoning by generating explicit chain-of-thought. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream. Latent reasoning offers a higher-bandwidth alternative, performing intermediate computations in continuous latent space. We parameterize latent reasoning steps via normalizing flows and show impro...
Heng-Jui Chang, Alexander H. Liu, Saurabhchand Bhati · 2026-05-29
Audio encoders are critical to modern audio applications as LLMs increasingly rely on a single encoder for diverse inputs. We present USAD 2.0, a universal audio encoder scaled via representation distillation across speech, music, and sound event domains.
Xinnong Zhang, Wanting Shan, Hanjia Lyu · 2026-05-29
Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or are highly sensitive to semantically independent changes in conversational contexts. In this work, we study counterfactual context revision as a framework for auditin...
Hanxu Hu, Zdenek Snajdr, Pinzhen Chen · 2026-05-29
Prior work has shown that LLMs can translate unseen or low-resource languages by undergoing continued training or by encoding grammar rules in context. We use RL to train models to leverage provided linguistic resources dynamically, enabling translation of extremely low-resource languages at scale.
Petr Parshakov · 2026-05-29
We present the first Komi-Yazva--Russian parallel corpus together with an explicit evaluation protocol for studying LLM translation in an endangered, extremely low-resource setting. The dataset contains 457 aligned sentence pairs from 74 narrative texts, documented provenance, sentence-level alignment, and story identifiers that enable leakage-aware evaluation.
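One way to read "leakage-aware evaluation" is that sentences from the same story never land on both sides of a split. The sketch below shows that idea only; the record fields (`story_id`, `src`, `tgt`), the function name, and the 20% test fraction are assumptions for illustration, not the paper's protocol.

```python
import random
from collections import defaultdict


def split_by_story(pairs, test_fraction=0.2, seed=0):
    """Split sentence pairs so that no story appears in both train and test."""
    # Group sentence pairs by the story they came from.
    by_story = defaultdict(list)
    for pair in pairs:
        by_story[pair["story_id"]].append(pair)

    # Shuffle whole stories, not individual sentences.
    story_ids = sorted(by_story)
    random.Random(seed).shuffle(story_ids)

    n_test = max(1, round(len(story_ids) * test_fraction))
    test_ids = set(story_ids[:n_test])

    train = [p for s in story_ids if s not in test_ids for p in by_story[s]]
    test = [p for s in story_ids if s in test_ids for p in by_story[s]]
    return train, test


# Toy data: 20 sentence pairs spread over 5 hypothetical stories.
pairs = [
    {"story_id": f"story{i % 5}", "src": f"source {i}", "tgt": f"target {i}"}
    for i in range(20)
]
train, test = split_by_story(pairs)
```

Splitting at the story level trades some control over split sizes for the guarantee that test sentences share no narrative context with training sentences.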
Jiaju Chen, Bo Sun, Yuxuan Lu · 2026-05-29
Multi-agent systems built on LLMs have shown growing promise, yet MAS often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground, maintain shared task understanding, and balance individual autonomy with group coordination. We introduce CollabSim, a methodology grounded in CSCW research for evaluating ...
Shaohui Dai, Yansong Qu, You Shen · 2026-05-29
Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part structures that are essential for embodied interaction with 3D environments. In t...
Noam Issachar, Dani Lischinski, Raanan Fattal · 2026-05-29
Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative timeline is inherently inefficient. In this work, we propose Complexity-Balanced Splitting (CBS),...
Chenming Zhu, Jingli Lin, Yilin Long · 2026-05-29
While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason from alternative viewpoints when only limited egocentric observations are available. In this work, ...
Tengfei Zhang, Ziheng Zhao, Lisong Dai · 2026-05-29
Medical imaging AI has achieved strong performance in isolated image interpretation, but remains poorly aligned with radiological practice, where diagnosis and follow-up rely on comparison across prior studies and analogous reference cases. Here we formulate radiological comparison as an entity-aware cross-image reasoning problem and introduce a framework that supports both reference-case retri...
Lizhi Yang, Junheng Li, Nehar Poddar · 2026-05-29
For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and ...
Dong Jing, Jingchen Nie, Tianqi Zhang · 2026-05-29
Robot manipulation alternates between low-risk transit phases that call for fast execution and high-risk contact stages that demand slow, precise motion. Yet existing Vision-Language-Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compression, KV-cache reuse, or reinforcement learning only shift the policy from ...
Akarsh Kumar, Phillip Isola · 2026-05-29
Training recurrent neural networks (RNNs) requires assigning credit across long sequences of computations. Standard backpropagation through time (BPTT) addresses this problem poorly: it is sequential in time, limiting parallelism, and suffers from vanishing or exploding gradients. We propose Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit p...
Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger · 2026-05-29
Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on GRPO or its modifications to steer models to produce Chain-of-Thought traces. The final answer can only be verified after the CoT trace is complete, making it a delayed reward problem. We propose RREDCoT, a segment-level reward redistribution method that allocates ...
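To make the delayed-reward framing concrete, here is a generic difference-based redistribution of a single terminal reward over CoT segments. The abstract is truncated before it states RREDCoT's allocation rule, so this is not the paper's method; `segment_scores` stands in for a hypothetical per-segment estimate of the expected final reward.

```python
def redistribute_reward(segment_scores, final_reward):
    """Spread one terminal reward over the segments of a reasoning trace.

    segment_scores[k] is a hypothetical estimate of the expected final reward
    after segment k. Each segment is credited with the change in that estimate,
    and the leftover is spread evenly so the credits sum to final_reward.
    """
    if not segment_scores:
        return []

    credits = []
    previous = 0.0
    for score in segment_scores:
        credits.append(score - previous)
        previous = score

    # Correct for the gap between the last estimate and the verified outcome.
    residual = (final_reward - previous) / len(segment_scores)
    return [credit + residual for credit in credits]


# Three segments; the third one moves the estimate the most.
credits = redistribute_reward([0.2, 0.3, 0.9], final_reward=1.0)
```

The credits always sum to the verified final reward, so the total return of the trace is unchanged; only its timing across segments differs.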
Jui-Hui Chung, Ziyang Cai, Zihao Li · 2026-05-29
We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. Goedel-Architect generates a blueprint of formally stated definitions and lemmas, then a tool-equipped Lean prover fills in each node.
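A blueprint as described here is a dependency graph, so one natural processing order is a topological sort in which every definition and lemma is handled before anything that depends on it. The sketch below uses Python's standard `graphlib`; the node names and the `proving_order` helper are invented for illustration and say nothing about how Goedel-Architect schedules its prover.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (hypothetical example).
blueprint = {
    "def_even": set(),
    "lemma_even_add": {"def_even"},
    "lemma_even_mul": {"def_even"},
    "main_theorem": {"lemma_even_add", "lemma_even_mul"},
}


def proving_order(graph):
    """Return nodes so that every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = proving_order(blueprint)
```

`TopologicalSorter` raises `graphlib.CycleError` on circular dependencies, which is a cheap sanity check on a generated blueprint before any proving effort is spent.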
Yutao Sun, Yanqi Zhang, Li Dong · 2026-05-29
Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings. We propose cross-layer sparse attention with shared routing: a single index computed at the first layer is reused across all subsequent layers, dramatically reducing the indexing overhead while preserving accuracy.
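The shared-routing idea can be sketched for a single decoding step: pick the top-k cached tokens once, at the first layer, and let every layer attend only to those positions. This is a single-head NumPy toy under assumed shapes; the dot-product routing score, the function names, and `top_k` are illustrative assumptions, not the paper's indexer.

```python
import numpy as np


def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()


def decode_step(queries, keys, values, top_k):
    """One decoding step with a token index shared across layers.

    queries: (num_layers, d); keys and values: (num_layers, n_tokens, d).
    """
    d = queries.shape[-1]

    # Routing happens once: score every cached token at the first layer only.
    routing_scores = keys[0] @ queries[0]
    shared_index = np.argsort(routing_scores)[-top_k:]

    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Every layer reuses the same token subset instead of re-indexing.
        k_sel, v_sel = k[shared_index], v[shared_index]
        weights = softmax(k_sel @ q / np.sqrt(d))
        outputs.append(weights @ v_sel)
    return np.stack(outputs), shared_index


rng = np.random.default_rng(0)
num_layers, n_tokens, d = 4, 64, 8
queries = rng.normal(size=(num_layers, d))
keys = rng.normal(size=(num_layers, n_tokens, d))
values = rng.normal(size=(num_layers, n_tokens, d))
outputs, shared_index = decode_step(queries, keys, values, top_k=8)
```

In this toy, the indexing cost is paid once per step rather than once per layer; whether accuracy survives depends on how well first-layer routing predicts what later layers need, which is the empirical claim the paper makes.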
Thamilvendhan Munirathinam · 2026-05-29
As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. We propose a third mode: a lightweight, published in-band deny signal -- the Recuse Signal -- that a server emits over a protocol layer to instruct compliant agents to back off voluntarily.
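A compliant agent's side of such a signal might look like the sketch below: inspect response metadata and back off voluntarily when the deny signal is present. The header name `x-agent-recuse`, its accepted values, and the `fetch` callback are all hypothetical; the abstract does not specify the wire format.

```python
RECUSE_HEADER = "x-agent-recuse"  # hypothetical name, not taken from the paper
DENY_VALUES = {"1", "true", "deny"}  # hypothetical accepted values


def should_back_off(response_headers):
    """Return True if the server asked compliant agents to leave this resource."""
    normalized = {k.lower(): v.strip().lower() for k, v in response_headers.items()}
    return normalized.get(RECUSE_HEADER) in DENY_VALUES


def agent_step(resource, fetch):
    """Fetch a resource, honoring the deny signal before using the body.

    fetch is any callable returning (headers, body) for a resource.
    """
    headers, body = fetch(resource)
    if should_back_off(headers):
        # Voluntary compliance: record the refusal and do not touch the body.
        return {"resource": resource, "status": "recused", "body": None}
    return {"resource": resource, "status": "ok", "body": body}
```

Because compliance is voluntary, a check like this complements access control rather than replacing it.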
Zhuoming Chen, Xinrui Zhong, Qilong Feng · 2026-05-29
Sparse attention is becoming increasingly important for serving LLMs as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive. To address this challenge, we present Vortex, a system that combines a Python-embedded frontend language atop a page-cached sparse attention execution engine.
Yasmine Omri, Ziyu Gan, Zachary Broveak · 2026-05-29
LLM agents are increasingly deployed on long-horizon tasks requiring sustained reasoning over extended interaction histories. A rich ecosystem of agent memory systems has emerged spanning flat retrieval, LLM-mediated extraction, consolidating fact stores, and agentic control flows. Yet, their system-level behavior remains uncharacterized. We benchmark these approaches across real-world workloads.
Hazhir Aliahmadi, Irina Babayan, Greg van Anders · 2026-05-29
Data-driven causal relationship identification is pertinent to advancing understanding of complex systems both within and beyond science. Bayesian networks offer a probabilistic method for modelling generic causal relationships via directed acyclic graphs. We propose an entropic inference approach that produces a distribution over DAGs, enabling uncertainty quantification in causal discovery.
Masayuki Ohzeki · 2026-05-29
We formulate stationary-density-preserving nonreversible perturbations of Fokker--Planck dynamics as gauge fields that deform relaxation spectra while leaving the invariant state fixed. The formulation has implications for accelerated sampling in diffusion models.
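As a reference point, a textbook construction of a stationary-density-preserving nonreversible perturbation is given below. It is stated under the assumption of overdamped Langevin dynamics with potential $U$ and a constant antisymmetric matrix $J$; the paper's gauge-field formulation may be more general. Adding a drift $\gamma$ leaves $\pi \propto e^{-U}$ invariant whenever $\nabla\cdot(\gamma\pi) = 0$, and $\gamma = J\nabla U$ satisfies this condition.

```latex
\begin{aligned}
\partial_t \rho &= \nabla\cdot\bigl(\rho\,\nabla U\bigr) + \Delta\rho - \nabla\cdot\bigl(\gamma\,\rho\bigr),
\qquad \pi \propto e^{-U},\\[4pt]
\gamma = J\,\nabla U,\quad J^{\top} = -J
\;&\Longrightarrow\;
\nabla\cdot\bigl(\gamma\,\pi\bigr)
= \pi\,\Bigl[\operatorname{tr}\bigl(J\,\nabla^{2} U\bigr) - \nabla U^{\top} J\,\nabla U\Bigr] = 0 .
\end{aligned}
```

Both terms vanish because $J$ is antisymmetric while the Hessian $\nabla^{2}U$ is symmetric, so the perturbation changes how fast the dynamics relax without changing where they settle.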
Ieva Kazlauskaite · 2026-05-29
Sharing the financial impact of rare adverse events across a group can soften extreme individual burdens. We formalize this as the Certified Allocation Problem: from finite data and without parametric assumptions, construct allocation rules that guarantee both individual rationality and aggregate risk bounds using conformal prediction.
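The distribution-free ingredient here is the conformal quantile. The sketch below shows only that generic building block: an upper bound computed from calibration losses that a new exchangeable loss exceeds with probability at most `alpha`. The function name and data are illustrative, and the paper's allocation rules are not reproduced.

```python
import math


def conformal_upper_bound(calibration_losses, alpha):
    """Split-conformal upper bound on a new loss from the same population.

    Assuming exchangeability, a fresh loss exceeds the returned value with
    probability at most alpha. Returns infinity when the calibration set is
    too small to certify the requested level.
    """
    n = len(calibration_losses)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_losses)[rank - 1]


# Hypothetical past losses for one participant in the risk-sharing group.
losses = [9, 3, 7, 1, 5, 2, 8, 4, 6]
bound = conformal_upper_bound(losses, alpha=0.5)
```

The guarantee is marginal and relies only on exchangeability, which is consistent with the abstract's "finite data and without parametric assumptions" setting.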
Jason Liu, Min Xu, Jinchuan Xing · 2026-05-29
We consider the sub-Gaussian parameter of a mean-zero random variable. Despite the ubiquity of sub-Gaussian random variables, the estimation of this parameter has received little attention. In this work, we study a natural estimator and establish its consistency and concentration properties under general conditions.
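For reference, one standard definition is given below; it may differ in normalization from the one the paper adopts. A mean-zero random variable $X$ is sub-Gaussian with variance proxy $\sigma^2$ if its moment generating function is dominated by that of a Gaussian, and the optimal parameter is the smallest such $\sigma^2$.

```latex
\mathbb{E}\,e^{tX} \le \exp\!\Bigl(\tfrac{\sigma^{2} t^{2}}{2}\Bigr)
\quad \text{for all } t \in \mathbb{R},
\qquad
\sigma^{2}_{\mathrm{opt}}(X) \;=\; \sup_{t \neq 0}\; \frac{2\,\log \mathbb{E}\,e^{tX}}{t^{2}} .
```

Under this definition, estimating the parameter means estimating a supremum over the whole log moment generating function rather than a single moment, which is one reason it is harder than estimating a variance.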
This digest is generated automatically from arXiv submissions. Not affiliated with arXiv or Cornell University.