arXiv Digest — Monday, June 8, 2026

40 papers across AI, ML, NLP, and CV from the last 24 hours.

Today's Synthesis

Three themes run through today's batch: the push toward much longer context windows, the maturation of agentic frameworks beyond single-turn demos, and a growing appetite for peering inside model internals. Long-video and extended-temporal reasoning papers cluster together because the core bottleneck — token explosion and attention dilution over hours of input — is finally being tackled with architecture-level solutions rather than brute-force scaling. MemDreamer splits perception from reasoning and streams video into a hierarchical graph memory, while a separate line of work compresses tokens in autonomous driving pipelines by aligning them with downstream planning objectives rather than generic saliency heuristics.

The agent papers are notable for their scope. Agentopia extends LLM-powered social simulation from days to months, letting agents accumulate experience at scales that could actually support learning rather than just task completion. Socratic-SWE closes the loop by reusing an agent's own debugging traces to generate new training tasks — self-play for software engineering.

Standout paper: "A Comprehensive Anatomy of Human and DeepSeek-R1 LLM Mathematical Reasoning" exhaustively labels 10,247 reasoning steps across AIME 2025 problems and finds a structural gap — humans backtrace and reflect far more often, while the model stacks inferences without revisiting earlier decisions. It reframes the "Aha moment" debate from whether models feel insight to whether their reasoning graphs have the right topology.

Conspicuously absent: papers on reinforcement learning from human feedback. The optimization batch also leans classical (decentralized SGD, path kernel interpolation) rather than LLM-focused. Taken together, the field is shifting from "make it bigger" toward "make it work over long horizons, with real self-improvement loops, and with auditable internals."

Papers by Category

Artificial Intelligence (cs.AI)

How reliable are LLMs when it comes to playing dice?

Luca Avena, Gianmarco Bet, Bernardo Busoni · 2026-06-05

Controlled benchmarking of 8 state-of-the-art models on discrete probability problems with standard and counterintuitive exercises, testing with and without Chain-of-Thought. Models average 0.96 accuracy on standard problems but drop to 0.59 on counterintuitive ones that trigger heuristic reasoning.
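
To give a sense of what a "counterintuitive" discrete probability exercise can look like, here is a minimal sketch that solves a classic two-dice conditional by exact enumeration. The problem and the helper name `conditional_probability` are illustrative assumptions and are not taken from the paper's benchmark.

```python
from fractions import Fraction
from itertools import product


def conditional_probability(event, condition, sides=6):
    """Exact P(event | condition) over two fair dice, by enumeration."""
    outcomes = list(product(range(1, sides + 1), repeat=2))
    conditioned = [roll for roll in outcomes if condition(roll)]
    favourable = [roll for roll in conditioned if event(roll)]
    return Fraction(len(favourable), len(conditioned))


def both_sixes(roll):
    return roll == (6, 6)


def at_least_one_six(roll):
    return 6 in roll


def first_is_six(roll):
    return roll[0] == 6


# Hypothetical exercise: "at least one die shows a six" is weaker
# information than "the first die shows a six".
p_given_any_six = conditional_probability(both_sixes, at_least_one_six)
p_given_first_six = conditional_probability(both_sixes, first_is_six)
print(p_given_any_six, p_given_first_six)  # 1/11 1/6
```

The intuitive answer of 1/6 is correct only for the second conditioning; the first leaves 11 equally likely outcomes, which is the kind of gap a heuristic shortcut tends to miss.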

Agentopia: Long-Term Life Simulation and Learning in Agent Societies

Xintao Wang, Sirui Zheng, Hongqiu Wu · 2026-06-05

Extends LLM-powered agent society simulation from days to months, studying whether agents can learn from simulated social experience to better understand and replicate human behavior through long-term growth.

How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope

Jeremy Yang, Kate Zyskowski, Noah Yonack · 2026-06-05

Uses production data from Perplexity's Search and Computer products to study the transition from conversational assistants to autonomous agents that execute tasks end-to-end, and reports three key empirical findings on acceleration and scope.

Act As a Real Researcher: A Suite of Benchmarks Evaluating Frontier LLMs and Agentic Harnesses in Research Lifecycle

Jiayu Wang, Weijiang Lv, Bowen Fu · 2026-06-05

Evaluates frontier agents across the research lifecycle, revealing significant limitations in field sensitivity, research ethics, and nuanced scientific judgment despite proficiency in coding and autonomous experiment execution.

PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams

Fuqiang Wang, Song Tan, Zheng Guo · 2026-06-05

Framework organizing scientific paper recommendation into three coupled longitudinal stages — Profiling, Recommending, and Adapting — handling daily paper streams where interests shift and feedback accumulates.

Computation & Language / NLP (cs.CL)

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

Songhao Wu, Zhongxin Chen, Yuxuan Liu · 2026-06-05

Identifies why LLMs struggle as off-the-shelf embedding models: text embeddings align with frequent but uninformative tokens when projected onto vocabulary space. Proposes using the unembedding matrix as a feature lens to correct this.
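
The projection onto vocabulary space mentioned above can be sketched as a plain matrix product between an embedding and an unembedding matrix. The toy numbers, the five-token vocabulary, and the function names below are all hypothetical; the sketch shows only the projection step, not the paper's correction.

```python
def project_to_vocab(embedding, unembedding):
    """Return one logit per token: embedding (d,) times unembedding (d x V)."""
    vocab_size = len(unembedding[0])
    return [
        sum(e * row[j] for e, row in zip(embedding, unembedding))
        for j in range(vocab_size)
    ]


def top_vocab_tokens(embedding, unembedding, vocab, k=3):
    """Names of the k tokens with the highest projected logits."""
    logits = project_to_vocab(embedding, unembedding)
    ranked = sorted(range(len(vocab)), key=lambda j: logits[j], reverse=True)
    return [vocab[j] for j in ranked[:k]]


# Hand-made toy values (assumption): 4-dim hidden space, 5-token vocabulary.
vocab = ["the", "of", "dice", "probability", "glasses"]
unembedding = [
    [2.0, 1.5, 0.1, 0.2, 0.0],
    [0.0, 0.1, 1.0, 0.9, 0.0],
    [0.0, 0.0, 0.2, 0.1, 1.0],
    [0.5, 0.4, 0.0, 0.0, 0.0],
]
embedding = [1.0, 0.8, 0.0, 0.5]

top_tokens = top_vocab_tokens(embedding, unembedding, vocab)
print(top_tokens)  # ['the', 'of', 'probability']
```

In this contrived example the frequent function words outrank the content words, which mirrors the alignment problem the summary describes.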

Sycophantic Praise: Evaluating Excessive Praise in Language Models

Daniel Vennemeyer, Phan Anh Duong, Meryl Ye · 2026-06-05

Argues sycophantic praise is a distinct alignment problem beyond excessive agreement. Introduces a parameterized framework measuring whether praise is excessive relative to contribution quality and expected user ability.

The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs

Yang Zhang, Xiao Fei, Amr Mohamed · 2026-06-05

Investigates whether local cultural knowledge is better accessed through English or the local language in LLMs, using masked evaluation to separate language proficiency from language-conditioned knowledge access.

Computer Vision (cs.CV)

UniSHARP: Universal Sharp Monocular View Synthesis

Meixi Song, Dizhe Zhang, Hao Ren · 2026-06-05

Extends SHARP for universal monocular rendering across a continuum of camera systems — conventional perspective to wide-FOV, fisheye, and omnidirectional panoramic — by aligning images in a unified omnidirectional latent space.

Streaming Video Generation with Streaming Force Control

Hanhui Wang, Yiming Xie, Haiwen Feng · 2026-06-05

Introduces StreamForce, a causal and unified streaming video generation model that responds instantly to continuous, time-varying force inputs — local or global — without requiring separate models per force type.

MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding

Cong Chen, Guo Gan, Kaixiang Ji · 2026-06-05

Decouples perception and reasoning for hours-long video understanding by incrementally constructing a Hierarchical Graph Memory with three-tier semantic abstraction, shifting the task into an agentic exploration process.

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

Haoyuan Li, Zhengdong Hu, Jun Wang · 2026-06-05

Shows that MLLM agents underperform when they apply uniform tool-use strategies across diverse 3D scenes, and proposes evolving scene-aware skills so that agents select tools according to specific scene and task characteristics.

OpenGlass: Open-Source Smart Glasses for On-Device Event-Based Gesture Recognition

Pietro Bonazzi, Julian Moosmann, Ahmet Celik · 2026-06-05

Open-source smart glasses platform for rapid prototyping with event-based vision and embedded ML at scale, using a modular FPC interposer design supporting both event-based and frame-based sensors within power and compute constraints.

Machine Learning (cs.LG)

Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning

Fatema Siddika, Md Anwar Hossen, Tanwi Mallick · 2026-06-05

Proposes SETA, a framework that resolves the plasticity-stability conflict in continual LLM learning through adaptive sparse subspace routing, distinguishing task-specific knowledge from shared capabilities that existing methods treat uniformly.

Second-Order Path Kernel Interpolation Formulas in Machine Learning

Jin Guo, Roy Y. He, Jean-Michel Morel · 2026-06-05

Extends Domingos' 2020 first-order interpolation formula along optimization paths to second-order characterizations, valid for models trained with batch-based stochastic gradient descent.
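
For orientation, the first-order picture can be sketched in the idealized gradient-flow setting. The notation below is an assumption for illustration (weights $w(t)$, model output $y(x)$, training pairs $(x_i, y_i^*)$, per-example loss derivative $L'$ with respect to the prediction, tangent kernel $K^{g}$) and is not quoted from the paper.

```latex
\begin{align}
K^{g}_{w}(x, x') &= \nabla_w y(x) \cdot \nabla_w y(x'), \\
\frac{dw}{dt} &= -\sum_{i=1}^{m} L'(y_i^*, y_i)\, \nabla_w y(x_i), \\
\frac{dy(x)}{dt} &= \nabla_w y(x) \cdot \frac{dw}{dt}
  = -\sum_{i=1}^{m} L'(y_i^*, y_i)\, K^{g}_{w(t)}(x, x_i), \\
y(x) &= y_0(x) - \sum_{i=1}^{m} \int_0^T L'\bigl(y_i^*, y_i(t)\bigr)\, K^{g}_{w(t)}(x, x_i)\, dt.
\end{align}
```

The last line expresses the trained model's prediction as the initial prediction plus kernel terms accumulated along the optimization path, one per training example. A second-order treatment would presumably add curvature terms to this expansion, but the exact form is not given in the summary.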

Discovering Multiscale Deep Formulas in Complex Systems via Neural-Guided Lambda Calculus

Hanqiao Yu, Shusen Yang, Xuebin Ren · 2026-06-05

Introduces Deflex, an end-to-end AI method that extracts multiscale formulas of potentially different forms (invariants and distributions) from complex systems using neural-guided lambda calculus, going beyond single-scale symbolic regression.

Sparsely Gated Tiny Linear Experts

Simon Schug · 2026-06-05

Demonstrates that shrinking each expert to a single neuron and selecting a tiny fraction of many available neurons improves compute efficiency and interpretability — the key is removing the nonlinearity entirely from each expert.
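
One way to picture the idea is a layer in which each expert is a single linear neuron (a read vector and a write vector) and a router keeps only the top-k experts per input. This is a rough sketch under stated assumptions: the routing rule, the absence of gate weights, and every name and number below are hypothetical, not the paper's architecture.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def tiny_linear_experts(x, read_weights, write_weights, router_weights, k=2):
    """Sum the outputs of the k highest-scoring single-neuron linear experts."""
    scores = [dot(r, x) for r in router_weights]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    output = [0.0] * len(write_weights[0])
    for i in chosen:
        activation = dot(read_weights[i], x)  # scalar; no nonlinearity applied
        for j, w in enumerate(write_weights[i]):
            output[j] += activation * w
    return output, sorted(chosen)


# Toy values (assumption): 2-dim input, 4 experts, 2 of them active.
x = [1.0, 2.0]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
read_weights = [[1.0, 1.0], [2.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
write_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]

output, active = tiny_linear_experts(x, read_weights, write_weights, router_weights)
print(output, active)  # [3.0, 2.0] [0, 1]
```

Because each selected expert contributes a rank-one linear map, the only nonlinearity in this sketch is the discrete top-k selection, and the list of active experts can be read off directly for a given input.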

A Comprehensive Anatomy of Human and DeepSeek-R1 LLM Mathematical Reasoning

Yuxiang Chen, Jun Wang · 2026-06-05

Exhaustively annotates 10,247 reasoning steps across all 30 AIME 2025 problems into five functional categories. Finds humans backtrace and reflect far more, while the model stacks inferences without revisiting earlier decisions.

Robotics (cs.RO)

Re-imagining ISO 26262 in the Age of Autonomous Vehicles

Chaitanya Shinde, Hadi Hajieghrary, Paul Schmitt · 2026-06-05

Decomposes the Controllability placeholder in the ISO 26262 functional safety standard into two auditable evidence dimensions, Transferability and Predictability, to adapt human-driven vehicle safety principles for autonomous vehicles.

Statistical ML (stat.ML)

Network Recovery from Cascade Data: A Debiased Jacobian-Based Machine Learning Approach

Lei Huang · 2026-06-05

CascadeNet recovers hidden influence networks behind dynamic cascades — product adoption, disease spread, financial distress — without requiring a specified diffusion model, using a debiased Jacobian-based ML framework.

This digest is generated automatically from arXiv submissions. Not affiliated with arXiv or Cornell University.