40 papers across AI, ML, NLP, and CV from the last 24 hours.
Three interconnected threads dominate today's batch: the push to make reasoning cheaper and more structured, the emergence of agent infrastructure as a first-class research area, and a renewed focus on efficiency as long-context workloads strain serving systems.
Reasoning is treated less as a free-form monologue and more as a signal to be engineered. One paper replaces serial textual chain-of-thought with latent-space updates via normalizing flows, arguing that forcing every intermediate step through discrete tokens wastes representational bandwidth. Another (RREDCoT) tackles the delayed-reward problem in RL-trained reasoning models by redistributing the final reward across segments of the trace instead of crediting the whole chain-of-thought at once. A third mines the low-confidence predictions that diffusion language models normally discard and uses them as a lookahead signal for retrieval, turning the model's own uncertainty into a useful probe.
Agent systems shift from capability demos to operational concerns. The standout proposes in-band recusal signals: a lightweight deny mechanism that servers can emit over existing protocol channels (SSH banners, PostgreSQL errors) to tell credential-holding agents that a resource is off-limits, and it measures whether LLM agents actually comply. It treats agent access control as a protocol design problem rather than an application-layer prompting issue. That matters because today's controls give an agent holding valid credentials only two outcomes: let it in, or hard-fail it like any other client.
A quieter but significant pattern: several papers attack the gap between training and serving. Work on polynomial weight preconditioning and on double preconditioning aims to decouple training stability from deployment behavior, while sparse attention systems such as Vortex and You Only Index Once (cross-layer sparse attention with shared routing) try to make long-context generation affordable without sacrificing quality. Taken together, the batch suggests an emphasis shifting from making models bigger to making their computation more legible, controllable, and cost-effective to serve.
TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning Marius Dragoi, Ioana Pintilie, Alexandra Dragomir · 2026-06-04
Parameter-efficient finetuning methods based on spectral decomposition have enabled progress in Continual Learning. In this paper we introduce TailLoR, which utilizes the singular bases U and V of the pre-trained weights as a fixed reference frame to learn a low-rank update applied to the singular value matrix. A soft spectral penalty discourages updates aligned with dominant singular directions,...
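A rough NumPy sketch of the parameterization described above: the singular bases U and V of the pre-trained weight are frozen, and only a low-rank update to the singular value matrix is trained. The excerpt does not give the exact penalty, so the weighting used here (each entry of the update weighted by its singular values relative to the largest one) is an assumption, as are the rank `r` and all function names.

```python
import numpy as np

def freeze_bases(W):
    """SVD of the pre-trained weight; U, s, Vt stay fixed during finetuning."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def adapted_weight(U, s, Vt, A, B):
    """W' = U (diag(s) + A @ B) V^T; only A (k x r) and B (r x k) are trainable."""
    return U @ (np.diag(s) + A @ B) @ Vt

def spectral_penalty(s, A, B):
    """ASSUMED penalty form: entry (i, j) of A @ B is weighted by (s_i + s_j) / s_max,
    so updates on dominant singular directions cost more than updates on the tail."""
    w = s / s[0]                      # NumPy returns s sorted in descending order
    weights = w[:, None] + w[None, :]
    return float(np.sum(weights * (A @ B) ** 2))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
U, s, Vt = freeze_bases(W)
k, r = s.shape[0], 2

# Same-size update placed on the top vs. the bottom singular direction.
A_head, B_head = np.zeros((k, r)), np.zeros((r, k))
A_head[0, 0], B_head[0, 0] = 1.0, 1.0
A_tail, B_tail = np.zeros((k, r)), np.zeros((r, k))
A_tail[-1, 0], B_tail[0, -1] = 1.0, 1.0

head_cost = spectral_penalty(s, A_head, B_head)
tail_cost = spectral_penalty(s, A_tail, B_tail)
```

With a zero update the adapted weight reproduces W exactly, and the same-size update is cheaper in the tail than on the leading direction, which is the behavior the abstract's "soft spectral penalty" is meant to encourage.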
Regret Minimization with Adaptive Opponents in Repeated Games Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu · 2026-06-04
In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce Repeated Policy Regret (RP-Regret), a game-theoretic metric that measures the difference between the...
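For reference, the standard external regret that the abstract says fails to capture adaptivity compares realized payoff with the best fixed action in hindsight while holding the opponent's recorded actions fixed. A minimal sketch for a two-player matrix game; the payoff matrix and action sequences are arbitrary examples, and RP-Regret itself is not reproduced here.

```python
def external_regret(payoff, my_actions, opp_actions):
    """Best fixed action in hindsight minus realized payoff.

    payoff[a][b] is my payoff when I play a and the opponent plays b.
    The opponent's sequence is treated as fixed, which is exactly the
    assumption that breaks when the opponent reacts to my history.
    """
    realized = sum(payoff[a][b] for a, b in zip(my_actions, opp_actions))
    best_fixed = max(
        sum(payoff[a][b] for b in opp_actions) for a in range(len(payoff))
    )
    return best_fixed - realized

payoff = [[1, 0], [0, 1]]  # simple coordination game
regret = external_regret(payoff, my_actions=[0, 1, 0, 1], opp_actions=[1, 1, 1, 1])
```

Here the best fixed action (always play 1) would have earned 4 against the recorded sequence versus 2 realized. Against an adaptive opponent, however, switching to that fixed action would also have changed the opponent's replies, so this counterfactual may not be achievable.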
Pretraining Recurrent Networks without Recurrence Akarsh Kumar, Phillip Isola · 2026-06-04
Training recurrent neural networks (RNNs) requires assigning credit across long sequences of computations. Standard backpropagation through time (BPTT) addresses this problem poorly: it is sequential in time, limiting parallelism, and suffers from vanishing or exploding gradients, making long-range associations difficult to learn. We propose Supervised Memory Training (SMT), a method for training...
RREDCoT: Segment-Level Reward Redistribution for Reasoning Models Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger · 2026-06-04
Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on the Group Relative Policy Optimization (GRPO) algorithm or modifications thereof to steer the models to produce Chain-of-Thought (CoT) traces. The final answer can only be verified, and the reward assigned, after the CoT trace is complete, making it a delayed...
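To make the delayed-reward problem concrete, here is a minimal sketch of the group-relative advantage used in GRPO-style training: each sampled completion receives one scalar reward, rewards are normalized within the group, and the resulting advantage is applied uniformly to every token of that completion. The population standard deviation and the epsilon are common choices but vary between implementations; RREDCoT's segment-level redistribution is not shown.

```python
from statistics import fmean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: (r_i - mean(r)) / (std(r) + eps)."""
    mu, sigma = fmean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def per_token_advantages(rewards, lengths):
    """Broadcast each completion's single advantage to all of its tokens."""
    return [[a] * n for a, n in zip(grpo_advantages(rewards), lengths)]

rewards = [1.0, 0.0, 0.0, 1.0]           # e.g. verifier: correct / incorrect
adv = grpo_advantages(rewards)
per_token = per_token_advantages(rewards, lengths=[5, 3, 4, 6])
```

Every token of a correct trace gets the same credit, including detours that did not help; spreading that credit unevenly across segments is the gap RREDCoT targets. A group in which every sample earns the same reward yields zero advantage for all of them.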
HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers Lizhi Yang, Junheng Li, Nehar Poddar · 2026-06-04
For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and...
TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies Dong Jing, Jingchen Nie, Tianqi Zhang · 2026-06-04
Robot manipulation alternates between low-risk transit phases that call for fast execution and high-risk contact stages that demand slow, precise motion. Yet existing Vision-Language-Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compression, KV-cache reuse, or reinforcement learning only shift the policy from...
RiskFlow: Fast and Faithful Safety-Critical Traffic Scenario Generation Qi Lan, Yining Tang, Yu Shen · 2026-06-04
Safety-critical traffic scenario generation is essential for evaluating autonomous driving systems under rare but high-risk interactions. Existing diffusion-based methods offer strong controllability in closed-loop generation, but their iterative denoising process is computationally expensive and may accumulate sampling and guidance errors over long rollouts, causing unrealistic motion artifacts...
Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution Liliana Hotsko, Yinxi Li, Yuntian Deng · 2026-06-04
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolving codebases. We introduce Code2LoRA, a hypernetwork framework that generates repository-specific...
Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill Mehmet Iscan · 2026-06-04
Large language models increasingly write, review, and judge code, and a fast-growing practice equips them with prompt 'skills' that ask the model to reason like a scientist. A prominent example tells the model to act as a Popperian falsificationist, and such skills are reported to improve generated code. But these gains are almost always read off an LLM-as-a-judge, an instrument with documented...
PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding Shaohui Dai, Yansong Qu, You Shen · 2026-06-04
Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part structures that are essential for embodied interaction with 3D environments. In...
Complexity-Balanced Diffusion Splitting Noam Issachar, Dani Lischinski, Raanan Fattal · 2026-06-04
Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative timeline is inherently inefficient. In this work, we propose Complexity-Balanced Splitting (CBS), a...
Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators Chenming Zhu, Jingli Lin, Yilin Long · 2026-06-04
While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason from alternative viewpoints when only limited egocentric observations are available. In this work, we...
A Vision-language Framework for Comparative Reasoning in Radiology Tengfei Zhang, Ziheng Zhao, Lisong Dai · 2026-06-04
Medical imaging artificial intelligence has achieved strong performance in isolated image interpretation, but remains poorly aligned with radiological practice, where diagnosis and follow-up rely on comparison across prior studies and analogous reference cases. Here we formulate radiological comparison as an entity-aware cross-image reasoning problem and introduce a framework that supports both...
Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao · 2026-06-04
As AI writing assistants become increasingly integrated into real-world drafting and revision workflows, many documents are no longer purely human-written or AI-generated, but instead result from progressive human-AI co-editing. However, existing AI-text detection benchmarks largely focus on final outputs and provide limited understanding of how AI authorship signals emerge, accumulate, or...
Self-Augmenting Retrieval for Diffusion Language Models Paul Jünger, Justin Lovelace, Linxi Zhao · 2026-06-04
Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the unconfident ones. We show that the discarded tokens are in fact a useful lookahead signal for retrieval-augmented generation: even low-confidence tokens...
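The commit/discard step described above, and the idea of reusing the discarded tokens as a retrieval probe, can be sketched as follows. The confidence threshold, the mask token, and the way the query string is assembled are assumptions for illustration, not the paper's procedure.

```python
from dataclasses import dataclass

MASK = "[MASK]"

@dataclass
class Prediction:
    token: str
    confidence: float

def denoise_step(current, predictions, threshold=0.9):
    """Commit confident predictions at masked positions; collect the rest.

    Returns the updated sequence plus the low-confidence tokens that a
    standard sampler would simply throw away.
    """
    updated, discarded = [], []
    for tok, pred in zip(current, predictions):
        if tok != MASK:
            updated.append(tok)          # already committed at an earlier step
        elif pred.confidence >= threshold:
            updated.append(pred.token)   # commit the confident prediction
        else:
            updated.append(MASK)         # stays masked for a later step
            discarded.append(pred.token)
    return updated, discarded

def lookahead_query(prompt, current, discarded):
    """HYPOTHETICAL query builder: prompt + committed tokens + tentative tokens."""
    committed = [t for t in current if t != MASK]
    return " ".join([prompt, *committed, *discarded])

preds = [Prediction("Paris", 0.95), Prediction("is", 0.97),
         Prediction("the", 0.40), Prediction("capital", 0.30)]
updated, discarded = denoise_step([MASK] * 4, preds)
query = lookahead_query("Which city hosts the Louvre?", updated, discarded)
```

The point of the design is that the tentative tokens ("the capital") reach the retriever one or more steps before they would be committed, so retrieval can anticipate what the answer is about to say.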
You Only Index Once: Cross-Layer Sparse Attention with Shared Routing Yutao Sun, Yanqi Zhang, Li Dong · 2026-06-04
Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse methods typically provide stronger acceleration but incur noticeable quality loss, while token...
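A toy NumPy sketch of block-sparse decoding attention in which the block selection ("routing") is computed once and then reused by later layers, which is one plausible reading of the "index once, shared routing" idea in the title. Block size, top-k, scoring blocks by mean-pooled keys, and single-query decoding are all assumptions; the excerpt does not describe the paper's actual router.

```python
import numpy as np

def route_blocks(q, K, block=4, top_k=2):
    """Pick the top_k key blocks for one decoding query by mean-pooled key score."""
    n_blocks = K.shape[0] // block
    pooled = K[: n_blocks * block].reshape(n_blocks, block, -1).mean(axis=1)
    scores = pooled @ q
    return np.sort(np.argsort(scores)[-top_k:])

def sparse_attend(q, K, V, blocks, block=4):
    """Softmax attention restricted to the selected key/value blocks."""
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in blocks])
    logits = K[idx] @ q / np.sqrt(q.shape[0])
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
d, n, layers = 8, 16, 3
qs = rng.standard_normal((layers, d))
Ks = rng.standard_normal((layers, n, d))
Vs = rng.standard_normal((layers, n, d))

shared = route_blocks(qs[0], Ks[0])    # index once, at the first layer
outputs = [sparse_attend(qs[l], Ks[l], Vs[l], shared) for l in range(layers)]
```

Reusing one index across layers removes the per-layer routing cost; whether a single selection stays accurate enough for every layer is exactly the quality question the abstract raises.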
Human Adults and LLMs as Scientists: Who Benefits from Active Exploration? Mandana Samiei, Eunice Yiu, Anthony GX-Chen · 2026-06-04
A long-standing finding in the causal learning literature is that adults struggle to identify conjunctive causal rules, where an effect requires the simultaneous presence of multiple causes, while performing better in disjunctive settings. However, most demonstrations of this conjunctive handicap rely on passive observation paradigms with limited evidence, where learners have no control over...
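The distinction between the two rule types, and why passive evidence can leave them confounded while a chosen intervention separates them, fits in a few lines. The toy objects A, B, C and the observation sets are made up for illustration and are not the study's stimuli.

```python
from itertools import combinations

OBJECTS = ["A", "B", "C"]

def disjunctive(causes):
    """Effect occurs if ANY of the causes is present."""
    return lambda present: any(c in present for c in causes)

def conjunctive(causes):
    """Effect occurs only if ALL of the causes are present together."""
    return lambda present: all(c in present for c in causes)

# Hypothesis space: every pair of objects under either rule type.
HYPOTHESES = {
    (kind, pair): make(pair)
    for kind, make in [("or", disjunctive), ("and", conjunctive)]
    for pair in combinations(OBJECTS, 2)
}

def consistent(observations):
    """Hypotheses that reproduce every (objects present, effect observed) pair."""
    return sorted(h for h, rule in HYPOTHESES.items()
                  if all(rule(set(objs)) == effect for objs, effect in observations))

passive = [({"A", "B"}, True), ({"C"}, False)]
active = passive + [({"A"}, False)]   # learner chooses to test A alone
```

The passive data fit both "A or B" and "A and B"; a single chosen test of A alone rules out the disjunctive reading, which is the kind of leverage active exploration can give a learner.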
DNQ: Deep Nash Q-Network for Partially Observable n-Player Games Qintong Xie, Edward Koh, Xavier Cadet · 2026-06-04
Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We study multi-turn simultaneous bidding as a controlled testbed for such problems and propose DNQ, a solver-in-the-loop equilibrium supervision framework for training...
MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery Shangheng Du, Xiangchao Yan, Jinxin Shi · 2026-06-04
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless search, and lack of hierarchical control, which together hinder long-horizon optimization. We present...
Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement Jui-Hui Chung, Ziyang Cai, Zihao Li · 2026-06-04
We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemmas, along with declared dependencies. This blueprint is optionally guided by a...
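Because a blueprint is a dependency graph of definitions and lemmas building up to the main theorem, one natural way to schedule proof attempts is topological order, so each item is attempted only after its dependencies. A small sketch with Python's standard `graphlib`; the node names are hypothetical and this is not Goedel-Architect's actual scheduler.

```python
from graphlib import CycleError, TopologicalSorter

# HYPOTHETICAL blueprint: node -> set of nodes it depends on.
blueprint = {
    "def_core": set(),
    "lemma_bound": {"def_core"},
    "lemma_parity": {"def_core"},
    "main_theorem": {"lemma_bound", "lemma_parity"},
}

def proof_order(graph):
    """Return an order in which every item follows its declared dependencies.

    Raises CycleError if the declared dependencies are circular, which a
    refinement step would have to repair before anything can be proved.
    """
    return list(TopologicalSorter(graph).static_order())

order = proof_order(blueprint)
```

Independent lemmas (here `lemma_bound` and `lemma_parity`) have no ordering constraint between them, so they could also be attempted in parallel.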
Benchmark Everything Everywhere All at Once Shiyun Xiong, Dongming Wu, Peiwen Sun · 2026-06-04
Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly reach performance saturation after their release, resulting in insufficient discrimination among...
How abundant are good interpolators? August Y. Chen, Ahmed El Alaoui · 2026-06-04
Let S be the set of unit norm linear classifiers which correctly classify every point of a labeled dataset. Under two natural data-generating distributions of the (X,y) pairs -- a Gaussian mixture model and a logistic model with Gaussian features -- and in the proportional regime n/d → α with small enough α, we establish a large deviation principle...
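The set S can be probed numerically in small dimensions: sample unit-norm directions uniformly and count how many classify every point correctly. A Monte Carlo sketch under a Gaussian mixture with labels ±1 and means ±μ; the dimension, sample size, mean vector, and number of draws are arbitrary choices, and this estimates the fraction of the sphere covered by S rather than the paper's large-deviation rate.

```python
import numpy as np

def interpolator_fraction(X, y, n_draws=20000, seed=0):
    """Monte Carlo estimate of the fraction of the unit sphere lying in S,
    i.e. unit vectors w with y_i * <w, x_i> > 0 for every sample i."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_draws, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # normalized Gaussians are uniform on the sphere
    margins = (W @ X.T) * y                          # shape (n_draws, n)
    return float(np.mean(np.all(margins > 0, axis=1)))

# Toy Gaussian mixture: x_i = y_i * mu + standard Gaussian noise.
rng = np.random.default_rng(1)
d, n = 10, 5                                         # n / d = 0.5
mu = np.full(d, 2.0 / np.sqrt(d))                    # ||mu|| = 2, an arbitrary choice
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + rng.standard_normal((n, d))
frac = interpolator_fraction(X, y)
```

With a single point exactly half the sphere qualifies, and each added point can only shrink S. Since the fraction typically becomes exponentially small as d grows with n/d fixed, direct sampling soon stops working, which is one reason to study it through large deviations.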
Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals Thamilvendhan Munirathinam · 2026-06-04
As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it has valid credentials) or hard-fail it (indistinguishable from any other client). We propose a third mode: a lightweight, published in-band deny signal -- the...
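On the agent side, honoring such a signal amounts to scanning server-provided text (a banner or an error message) for the published marker before proceeding. The excerpt cuts off before naming the actual signal, so the marker string `AGENT-RECUSE`, its optional reason suffix, and the function below are purely hypothetical placeholders.

```python
import re

# HYPOTHETICAL marker: the excerpt does not say what the published signal looks like.
DENY_MARKER = re.compile(r"\bAGENT-RECUSE\b(?::\s*(?P<reason>.+))?")

def should_recuse(server_text: str) -> tuple[bool, str]:
    """Scan a banner or error message for the deny marker.

    Returns (True, reason) if the marker is present, else (False, "").
    """
    for line in server_text.splitlines():
        match = DENY_MARKER.search(line)
        if match:
            return True, (match.group("reason") or "").strip()
    return False, ""

ssh_banner = "Welcome to db-prod-01\nAGENT-RECUSE: production data, humans only\n"
pg_error = "ERROR:  permission denied for table payroll (AGENT-RECUSE)"
```

Unlike a generic hard failure, a recognized marker lets a compliant agent stop and report back instead of retrying or searching for another way in.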
USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding Heng-Jui Chang, Alexander H. Liu, Saurabhchand Bhati · 2026-06-04
Audio encoders are critical to modern audio applications as large language models (LLMs) increasingly rely on a single encoder for diverse inputs. While self-supervised learning (SSL) has yielded strong domain-specific encoders like speech or music experts, multi-domain approaches like USAD and SPEAR remain limited in coverage and evaluation. Recent studies also suggest supervised encoders align...
Nonreversible Gauge Fields in Fokker–Planck Dynamics: Supersymmetric Hamiltonians and Learned Finite Forces Masayuki Ohzeki · 2026-06-04
We formulate stationary-density-preserving nonreversible perturbations of Fokker–Planck dynamics as gauge fields that deform relaxation spectra while leaving the invariant state fixed. When detailed balance holds, a similarity transformation maps the reversible Fokker–Planck operator to a Witten-Laplacian-type supersymmetric Hamiltonian; nonreversible gauges then appear as non-Hermitian...
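The two constructions mentioned above can be written out in their standard textbook form. The conventions here (unit diffusion, potential U, stationary density proportional to e^{-U}, a constant antisymmetric matrix A as the nonreversible part) are assumptions for illustration, and the paper's learned finite forces are not shown. The first two lines show that the antisymmetric drift leaves the stationary density unchanged; the last line is the similarity transform p = e^{-U/2}ψ in the reversible case A = 0.

```latex
\begin{aligned}
\partial_t p &= \nabla\cdot\big[(I - A)\nabla U\, p\big] + \Delta p, \qquad A^\top = -A,\\
\nabla\cdot\big(A\nabla U\, e^{-U}\big) &= -\sum_{i,j} A_{ij}\,\partial_i\partial_j e^{-U} = 0
\quad\Rightarrow\quad p_\infty \propto e^{-U} \text{ is stationary for every such } A,\\
H\psi &:= -e^{U/2}\,\nabla\cdot\big[e^{-U}\nabla\big(e^{U/2}\psi\big)\big]
= -\Delta\psi + \Big(\tfrac14|\nabla U|^2 - \tfrac12\Delta U\Big)\psi \qquad (A = 0).
\end{aligned}
```

The middle line vanishes because A is antisymmetric while the Hessian of e^{-U} is symmetric, so A changes the dynamics (and the relaxation spectrum) without moving the invariant state.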
This digest is generated automatically from arXiv submissions. Not affiliated with arXiv or Cornell University.