AI News · AI Info Forge

Tools & Products The Verge AI Jul 28, 2026

Smart rings are looking like my kind of AI gadget

Recent advances in speech recognition technology have made AI-powered dictation apps increasingly practical for everyday writing tasks like emails and messages. These applications, which leverage improvements in language models, are becoming more accurate and useful despite occasional formatting quirks.

Read on The Verge AI →

Policy & Regulation Wired AI Jul 28, 2026

Can the New York Times Save Journalism From Our AI Overlords?

The New York Times continues pursuing its copyright lawsuit against OpenAI and Microsoft, having invested over $20 million in the case as of 2024. The publisher remains committed to the legal battle with no indication of settlement.

Read on Wired AI →

Rumors & Leaks Wired AI Jul 28, 2026

Silicon Valley’s Next IPO Billionaires Are Coming. Nonprofits Are Ready for Them

Nonprofit organizations are preparing for potential large donations from employees at Anthropic and OpenAI following anticipated initial public offerings of these AI companies. Nonprofit leaders expect significant financial contributions once these employees gain liquidity from stock.

Read on Wired AI →

Industry News The Verge AI Jul 28, 2026

Hugging Face is being used to easily undress women and children

A nonprofit research organization found that seven of the top nine image editing models hosted on Hugging Face can be easily manipulated to generate nonconsensual sexualized deepfakes of women and children. Unlike mainstream AI services with safety restrictions, these open-source models lack adequate safeguards against misuse.

Read on The Verge AI →

Research Wired AI Jul 28, 2026

Hugging Face Has a Deepfake Nudes Problem

Researchers discovered that popular image editing models available on Hugging Face can be readily used to generate explicit deepfake content, with analysis of 1,000 prompts revealing how users are actually exploiting this capability.

Read on Wired AI →

Industry News TechCrunch AI Jul 28, 2026

Cursor makes its biggest India push yet ahead of SpaceX acquisition with localized pricing

Cursor, an AI coding assistant, is expanding operations in India with localized pricing as the company prepares for acquisition by SpaceX. India has become Cursor's third-largest market, prompting increased local hiring and enterprise sales efforts.

Read on TechCrunch AI →

Research arXiv cs.AI Jul 28, 2026

Concept-based Visual Counterfactual Explanations with Diffusion Models

Researchers introduced C-VCE, a diffusion-based framework that generates visual counterfactual explanations by embedding a classifier directly into the generative model using concept bottleneck layers. This approach produces more realistic and minimally altered counterfactual images than existing methods while eliminating reliance on separate noise-robust classifiers, making it more practical for safety-critical applications like medical imaging.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

SeT-Diff: Towards Semantic Foundation Models for HPC Telemetry and Time-Series

Researchers introduced SeT-Diff, a foundational diffusion-based model designed for high-performance computing telemetry and time-series data. Unlike traditional approaches that rely on fixed sensor configurations, SeT-Diff uses semantic descriptions to condition its generative process, enabling it to handle varying sensor arrangements and perform multiple tasks including forecasting, data imputation, and thermal inference with minimal performance loss.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

QFoldAgent: An Autonomous Quantum Optimization Multi-Agent System for Protein Structure Prediction

Researchers developed QFoldAgent, a multi-agent system combining quantum and classical computing to predict protein structures on a lattice. The framework uses AI agents to automatically adjust optimization parameters across iterations, achieving improved structural accuracy and validity on tested protein fragments without relying on ground-truth data during optimization.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Same Question, Different Answers: Evaluating LLM Reliability Beyond Accuracy

Researchers evaluated how consistently large language models answer the same questions when phrased differently. Across multiple models and benchmarks, they found that over 23% of answers flip between correct and incorrect depending on wording, despite modest overall accuracy changes. Models demonstrated inconsistent knowledge retrieval, though a self-paraphrasing strategy partially improved performance.
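A high flip rate can coexist with a small accuracy change because flips in opposite directions partly cancel. The following is a minimal sketch of that arithmetic; the correctness labels and the `accuracy` and `flip_rate` helpers are made up for illustration and are not the paper's data or code.

```python
def accuracy(labels):
    """Fraction of questions answered correctly."""
    return sum(labels) / len(labels)


def flip_rate(original, paraphrased):
    """Fraction of questions whose correctness differs between the two phrasings."""
    flips = sum(a != b for a, b in zip(original, paraphrased))
    return flips / len(original)


# Hypothetical per-question correctness (True = correct) for ten questions.
original = [True, True, True, True, True, True, False, False, False, False]
# After paraphrasing: two correct answers become wrong, one wrong answer becomes correct.
paraphrased = [True, True, True, True, False, False, True, False, False, False]

acc_original = accuracy(original)
acc_paraphrased = accuracy(paraphrased)
flips = flip_rate(original, paraphrased)

print(f"accuracy: {acc_original:.2f} -> {acc_paraphrased:.2f}, flip rate: {flips:.2f}")
```

In this toy case three of ten answers flip while accuracy moves by only one question, which is why accuracy alone can hide instability.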

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

DeepLens Diagnosis Agent: Agentic Workflow Design Lets a Small Reasoning Model Compete with Frontier LLMs

Researchers developed DeepLens Diagnosis Agent, a structured workflow system that enables a smaller 7B medical model to match or exceed frontier LLMs on diagnostic reasoning tasks. The multi-stage pipeline achieved 60.14% accuracy on DiagnosisArena while costing 35-45% less than Claude Sonnet or Gemini, demonstrating that disciplined process design can compensate for smaller model size.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

MIITA: Memory-Induced Inference-Time Adaptation for Continual Learning with Small Language Models

Researchers introduced MIITA, a framework enabling small language models to adapt to new tasks while retaining previous knowledge without catastrophic forgetting. The approach stores compact memory prototypes and applies them during inference through temporary hidden-state adjustments, achieving improved performance under storage constraints typical of resource-limited deployments.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Codifying the Judge: Scalable Evaluation via Program Distillation

Researchers developed PAJAMA, a system that replaces expensive LLM-based evaluation judges with synthesized programs that assess model outputs directly. The approach matches LLM judge performance while reducing cost and latency and improving transparency, with applications to both automated evaluation and reward model training.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

SF-AMS: Strategic Forgetting for Structured Memory in LLM Agent

Researchers introduced SF-AMS, a framework that improves how language model agents manage memory by dynamically prioritizing important information and filtering irrelevant data. The approach uses utility-driven scoring to maintain compact, high-quality memory for better long-context reasoning, showing significant performance improvements across multiple benchmarks and models.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Synthetic Scenario Generation for Evaluation of Industry 4.0 Agents

Researchers extended an industrial agent benchmark by adding a Smart Grid Transformer asset class and introducing ScenarioGeneratorAgent, a system that automatically generates realistic synthetic evaluation scenarios for testing industrial AI agents. The pipeline incorporates domain standards and multiple optimization techniques to produce high-quality scenarios at scale while reducing computational time by 8x.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Loss-Aware Feature-Map Pruning in Convolutional Neural Networks Using Multi-Armed Bandits

Researchers developed a pruning method for convolutional neural networks that uses multi-armed bandit algorithms to identify and remove redundant feature maps while maintaining model accuracy. The approach treats each feature map as a bandit arm, evaluates them based on loss changes, and outperforms traditional pruning methods across multiple datasets.
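The idea of treating each feature map as a bandit arm can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: the summary does not say which bandit strategy or reward is used, so UCB1 and a reward equal to the negative loss increase are assumptions here, and `measure` simulates the loss change instead of masking a real network.

```python
import math
import random


def ucb_prune(true_loss_increase, n_prune, rounds=200, seed=0):
    """Pick the n_prune feature maps whose removal appears to hurt the loss least.

    true_loss_increase is a hypothetical list of per-map loss increases,
    standing in for measurements taken on a real CNN.
    """
    rng = random.Random(seed)
    n = len(true_loss_increase)
    pulls = [0] * n
    mean_reward = [0.0] * n

    def measure(arm):
        # Stand-in for masking feature map `arm` and measuring the loss change
        # on a mini-batch; small bounded noise mimics batch-to-batch variation.
        noisy_increase = true_loss_increase[arm] + rng.uniform(-0.01, 0.01)
        return -noisy_increase  # high reward when removal barely hurts

    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1  # pull every arm once before using the UCB rule
        else:
            arm = max(
                range(n),
                key=lambda i: mean_reward[i] + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        reward = measure(arm)
        pulls[arm] += 1
        # Incremental update of the running mean reward for this arm.
        mean_reward[arm] += (reward - mean_reward[arm]) / pulls[arm]

    ranked = sorted(range(n), key=lambda i: mean_reward[i], reverse=True)
    return sorted(ranked[:n_prune])


# Six hypothetical feature maps; maps 0 and 2 are nearly redundant.
pruned = ucb_prune([0.01, 0.5, 0.02, 0.8, 0.4, 0.9], n_prune=2)
print(pruned)
```

The bandit spends most of its evaluations on maps that look prunable, so expensive loss measurements are not wasted on maps that are clearly important.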

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

DSTFView: Multi-View Cloud-Edge Workload Forecasting with Dual-Input Spatio-Temporal-Frequency Modeling

Researchers introduced DSTFView, a forecasting framework designed to predict workload patterns in cloud-edge computing environments by analyzing spatial, temporal, and frequency-domain data simultaneously. The method adapts to sudden demand changes and outperformed existing approaches on benchmark datasets.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

MedLoCoMo: A Long-Context Multi-Session Medical Dialogue Benchmark for Large Language Models

Researchers introduced MedLoCoMo, a benchmark for evaluating large language models on medical dialogue tasks requiring longitudinal patient history analysis across multiple hospital admissions. The dataset, built from MIMIC medical records, contains 100 patient timelines with multi-session conversations and tests whether models can reason across long clinical contexts, revealing that cross-admission reasoning remains challenging even for models with extended context windows.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Keyword Matters: Unveiling the Energy Sensitivity of On-Device LLM Prompting

Researchers measured how prompt wording affects energy consumption when running large language models on mobile devices. They found that linguistic features, particularly the choice of verbs and instruction phrasing, meaningfully impact decoding length and battery usage, suggesting prompt design is a practical optimization technique for on-device inference.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Execution-Grounded Security Testing for Coding Agents in Software Engineering Pipelines

Researchers developed a security testing framework that evaluates coding agents' actual execution behavior rather than just their stated intentions. The framework successfully induced agents to perform unsafe system operations 53-73% of the time by embedding malicious intent within routine software engineering tasks, revealing significant security vulnerabilities in current coding agent implementations.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Reference Feature Atlases for Mechanistic Auditing of Language Models

Researchers introduced reference feature atlases, a method for auditing language models by reusing a pre-trained sparse feature library across different models rather than analyzing each from scratch. This approach uses linear decoders to interpret target models and identifies both known features and novel behaviors, demonstrating effectiveness at detecting injected mechanisms and discovering model-specific behaviors like political framing.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

SCAIR: Schema-Conditioned Agentic Iterative Reasoning for Enterprise Knowledge Graphs

Researchers introduced SCAIR, a framework for improving how AI agents interact with enterprise knowledge graphs by incorporating structural constraints and schema-aware reasoning. The approach demonstrates better performance on real-world enterprise databases compared to existing methods, without requiring model retraining.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Schema-Aware Localisation (SAL): Live Schema Grounding and Hallucination Validation for Oracle NL2SQL

Researchers developed Schema-Aware Localisation (SAL), a middleware layer that improves LLM-generated SQL for Oracle databases by grounding models in actual database schemas and validating outputs against live catalogs. The approach eliminates hallucinated column references without retraining, achieving 62.6% execution success compared to 2.2% baseline on test queries.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

PhononBench-MP40: a spectrum-resolved benchmark dataset for phonon stability

Researchers released PhononBench-MP40, a benchmark dataset containing nearly 47,000 crystal structure records with phonon stability labels and spectral data from the Materials Project. The dataset addresses a key challenge in computational materials screening by providing detailed stability classifications and enabling researchers to study dynamic instability in crystal structures.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Too much evidence, too little time: From text to actionable recommendations through multi-objective evidence reasoning

Researchers developed SCEPTER, a framework that helps clinicians manage overwhelming medical literature by automatically retrieving relevant PubMed papers, extracting key claims, detecting contradictions, and generating evidence-based recommendations. The system compressed typical search results from hundreds of papers to a manageable set of recommendations while maintaining evidence diversity.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Temporal Context Reinstatement Drives Episodic-Like Order Memory in Long-Context Language Models

Researchers studied how long-context language models handle episodic memory tasks involving temporal order recall. They discovered that LLMs use a one-dimensional temporal code reinstated through specific attention mechanisms, mirroring behavioral patterns observed in humans and suggesting similar computational approaches to long-term memory retrieval.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

cMoLLM at Scale: Horizontal Scaling Laws for Mixture-of-LLMs

Researchers introduced cMoLLM, a mixture-of-experts approach that scales language models by routing across multiple parallel streams using dynamic convolution rather than dense parameters. The method addresses computational bottlenecks in trillion-parameter models and demonstrated improvements in perplexity and downstream task performance compared to existing scaling approaches.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

HeraSys: Collaborative Serving of Multiple LLM Workflows via Fine-Grained End-to-End Optimization

HeraSys is a new LLM serving system that optimizes performance for concurrent multi-tenant workflows through cross-workflow optimization and fine-grained orchestration. It reduces latency by up to 2.17× and increases throughput by 1.85× through structural node merging, adaptive scheduling, and load-aware resource management.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Multi-Objective Structured Pruning of LLMs for Latency and Model Size Optimization

Researchers developed a hardware-aware pruning framework that removes redundant components from large language models to optimize them for edge devices. The two-stage method combines block-level pruning with Bayesian optimization to balance model accuracy, latency, and size, demonstrating effective deployment on resource-constrained platforms.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Source-Aware Reranking for Retrieval-Augmented Generation: A Reliability Prior Approach

Researchers propose incorporating source credibility priors into retrieval-augmented generation systems by weighting retrieved documents based on their source reliability in addition to semantic similarity. Testing on a health domain corpus showed the approach improved precision and reduced retrieval of low-credibility sources compared to similarity-only ranking.
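One simple way to combine a source reliability prior with semantic similarity is a weighted blend of the two scores. The sketch below assumes that form; the paper's actual scoring function is not given in the summary, and the `Doc` type, the `SOURCE_PRIOR` table, and the `alpha` weight are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    source: str
    similarity: float  # semantic similarity to the query, in [0, 1]


# Hypothetical reliability priors per source type, in [0, 1].
SOURCE_PRIOR = {"health_agency": 0.9, "peer_reviewed": 0.8, "anonymous_blog": 0.2}


def rerank(docs, alpha=0.7, default_prior=0.5):
    """Sort documents by alpha * similarity + (1 - alpha) * source prior."""
    def score(doc):
        prior = SOURCE_PRIOR.get(doc.source, default_prior)
        return alpha * doc.similarity + (1 - alpha) * prior

    return sorted(docs, key=score, reverse=True)


docs = [
    Doc("a", "anonymous_blog", 0.90),
    Doc("b", "health_agency", 0.80),
    Doc("c", "peer_reviewed", 0.70),
]

by_similarity = [d.doc_id for d in sorted(docs, key=lambda d: d.similarity, reverse=True)]
by_blend = [d.doc_id for d in rerank(docs)]
print(by_similarity, by_blend)
```

With these made-up numbers the low-credibility document ranks first on similarity alone but drops to last once the prior is blended in.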

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

The Scaffold Effect in Coding Agents: Harness Choice as a Hidden Variable in Coding-Agent Evaluation

Researchers found that the software framework (harness) used to evaluate coding agents significantly impacts performance metrics, causing up to 40x differences in token usage while model-to-model pass rate differences remain small. The study recommends evaluating harness-model pairs together and reporting detailed specifications rather than comparing models in isolation.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

MM-ShiftKV: Decode-Aware Prefill-Stage KV Selection for Multimodal Large Language Models

Researchers developed MM-ShiftKV, a method that improves key-value cache efficiency in multimodal large language models by better predicting which visual tokens matter during generation. The approach uses variance-expanded query proxies during the prefilling stage to more accurately estimate which cached information will be needed during decoding, outperforming existing selection methods under memory constraints.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

TriSP: Tri-Signal Structured Pruning for Large Language Models

Researchers introduced TriSP, a structured pruning method for large language models that combines weight magnitude, activation norms, and gradient sensitivity to identify which model components to remove. The approach achieves significant inference speedups (82% at 50% pruning) while maintaining performance competitive with unpruned models.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

ParBench: A Benchmark for Reliable Evaluation of LLM Parallel Code Translation

Researchers introduced ParBench, a benchmark framework for evaluating how well large language models translate parallel code across different programming APIs like CUDA and OpenMP. The framework provides standardized testing conditions and reveals that current state-of-the-art models struggle with preserving computational semantics, thread synchronization, and handling source variations during translation.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Lexical discovery in unknown environments orchestrated by Large Language Models

Researchers developed a framework enabling autonomous LLM-based agents to collectively create shared vocabularies for unknown objects in unexplored environments. The system combines vision encoding and language models to establish consensus on new terms for out-of-distribution entities, with applications to space and deep-sea exploration missions.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Structure Over Scale: Schema-Constrained Causal Graphs for RAG

Researchers introduced HCG-RAG, a method that uses schema-constrained causal graphs instead of exhaustive entity extraction for retrieval-augmented generation. The approach cuts computational cost, using 3-20x fewer nodes and 8-135x fewer LLM calls, while maintaining or improving answer quality on medical and clinical benchmarks.

Read on arXiv cs.AI →

Tools & Products arXiv cs.AI Jul 28, 2026

xMIx: High-Performance Serving-Time Platform for Mechanistic Interpretability Apps

Researchers introduced xMIx, a serving framework that enables deployment of mechanistic interpretability applications in production AI systems with minimal performance overhead. By integrating with vLLM, xMIx allows multiple interpretability functions to run on model activations without the prohibitive slowdowns that previously made such deployments impractical.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

An Agentic Orchestration of Atomistic Simulations

Researchers developed an AI agent system within the URSA framework that automates atomistic simulations for materials design using LAMMPS. The agent independently selects interatomic potentials, executes simulations, and recovers from errors, reducing human expertise requirements while improving reproducibility and scalability compared to manual workflows.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

HyCE-RAG: Hypergraph Chain-of-Evidence Retrieval-Augmented Generation for Explainable Multi-hop Question Answering

Researchers introduced HyCE-RAG, a retrieval-augmented generation framework using hypergraph structures to improve multi-hop question answering. The system organizes evidence into hyperedges and performs confidence propagation to select and rank evidence paths, outperforming standard and graph-based RAG approaches on multiple benchmarks.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Differencing the Diffusion Trajectory toward Uncertain Components for Time Series Forecasting

Researchers propose DiffDiff, a diffusion-based framework for time series forecasting that adapts the corruption process to focus modeling effort on uncertain future components while leveraging historical continuity. The method uses step-dependent forward operators and adaptive conditioning to outperform existing diffusion baselines across multiple benchmarks.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Chart Deception in Vision-Language Models: From Vulnerability to Mitigation

Researchers introduced VisDeception, a benchmark testing how well vision-language models resist misleading chart designs like distorted axes and manipulated colors. Testing 10 advanced models on 1,600 paired charts revealed significant vulnerabilities to deceptive visualizations, and the team proposed a mitigation approach using structured metadata to improve robustness.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

DeepLook: Deeper Thinking with Lookahead

Researchers introduced DeepLook, a training-free decoding method that strategically allocates computational resources during language model reasoning by detecting uncertainty points and applying lookahead exploration only where needed. Testing across multiple models and mathematics benchmarks showed the approach improved accuracy while reducing token generation by 87% on average compared to existing inference-scaling methods.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Group Preference Collapse in Personalized Multimodal Large Language Models

Researchers identified a problem in personalized multimodal language models where individual user preferences get overshadowed by dominant population trends. They developed PrefMoE, a framework that better preserves individual preferences by separating them from user profile information and using specialized learning techniques to maintain personalization across different user groups.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Evaluating LLMs as Interpretable Controllers for Dynamical Systems

Researchers evaluated five large language models of varying sizes as controllers for a thermal system, finding that larger models like GPT-4o successfully maintained temperature setpoints with coherent reasoning, while smaller models struggled with actuator dynamics. Incorporating physics-based tools improved control performance, suggesting LLMs can serve as interpretable controllers when sufficiently capable and equipped with domain knowledge.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Tokengeist: Multi-Turn Attribution Tracing in Agentic Conversations

Researchers introduced Tokengeist, a framework for tracing how tokens from previous conversation turns influence language model responses across multiple steps. The method outperforms existing attribution techniques by recursively mapping dependencies across dialogue turns, achieving 90% accuracy compared to under 20% for single-pass approaches, with a new benchmark of 3,845 annotated examples.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Decentralized Granular Access Control for Agentic AI Systems in Critical Infrastructure

Researchers have developed a decentralized access control system designed to manage autonomous AI agents operating in critical cloud infrastructure. The framework uses compound identity models, hierarchical permissions, and progressive trust escalation to prevent unauthorized operations, with a production deployment at a major cloud provider showing zero unauthorized write operations over eight months.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

DynaResize: Runtime GPU Reallocation for Disaggregated LLM Post-Training

Researchers introduced DynaResize, a system that dynamically reallocates GPU resources between rollout and training phases during reinforcement learning-based LLM post-training. The approach reduces execution time by 33% compared to static GPU partitioning by minimizing pipeline delays from uneven workload distribution.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Opti-Q: A Constraint-Based Optimization Framework for Multi-LLM Question Planning

Researchers introduced Opti-Q, an optimization framework that orchestrates multiple LLMs for question answering by planning execution paths that balance answer quality against constraints like cost, latency, and energy. The system models LLM operations as a directed acyclic graph and uses a statistics catalog to estimate performance without executing each candidate plan, achieving significantly higher quality than baseline approaches within resource budgets.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

CHS-SQL: A Text-to-SQL approach based on Confidence-Guided Heuristic Search Schema Linking process

Researchers introduced CHS-SQL, a framework for converting natural language queries to SQL code using smaller language models. The approach uses confidence-guided heuristic search to better balance precision and recall when selecting relevant database schema elements, achieving state-of-the-art results while requiring only a single GPU.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

TokenMem: Faithful Knowledge Injection for Frozen LLMs

Researchers introduced TokenMem, a lightweight system that addresses knowledge conflicts in retrieval-augmented generation by injecting external knowledge into frozen LLMs through a dedicated attention pathway rather than competing with the model's existing parameters. The method uses a minimal gating adapter trained in two phases and achieved significantly higher knowledge compliance rates (69-70%) compared to standard RAG approaches (20-52%) when handling contradictory information.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Masked Distillation: Internalizing the Chain-of-Thought in Language Models

Researchers proposed masked distillation, a knowledge-distillation technique that trains language models to produce answers directly without lengthy intermediate reasoning traces. The method uses a reasoning teacher to supervise a student model on final solutions while treating intermediate steps as optional scaffolding, reducing inference latency and computational costs.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

VlogReward: Learning Multi-Dimensional Evaluation for Vlog Editing

Researchers introduced VlogReward, an AI system designed to evaluate vlog editing across multiple dimensions including creativity, cinematography, and pacing. The team created a 100k vlog dataset and benchmark to train and test multimodal language models on providing detailed feedback for video improvement.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Evolving from Lessons: Skill-Augmented Table Graph Reasoning for Operation-wise Table Question Answering

Researchers introduced a new evaluation framework for table question answering that breaks down questions by operation type, revealing that language models perform well on simple lookups but struggle with complex operations. They proposed SkillTGR, a method using graph-based table representations and reusable reasoning skills that improves accuracy while reducing computational costs.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

PRESTO: Prefix-Aligned Tree Drafting for Diffusion Speculative Decoding

Researchers introduced PRESTO, a framework that improves speculative decoding by applying tree-based drafting to diffusion language models. The method addresses a fundamental mismatch between how diffusion models generate candidates and how autoregressive models verify them, achieving up to 1.5× throughput speedup on existing diffusion-based speculative decoding systems.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

CallBench: A Benchmark for Dual-Goal Coordination in Phone Call Assistants

Researchers introduced CallBench, a Chinese benchmark dataset with 50,000 phone call conversations designed to evaluate dialogue systems managing dual goals simultaneously: the device owner's preset objective and the caller's dynamic objective. The benchmark covers six scenarios and proposes evaluation metrics for assessing semantic understanding, context usage, and goal coordination, revealing that current dialogue methods struggle with this coordination task.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Answering Path Queries under Linear and Guarded Existential Rules

Researchers analyzed the computational complexity of answering path queries over knowledge bases with ontologies expressed as existential rules. They demonstrated that for linear rules, path query answering matches the complexity of queries on plain databases, while guarded rules maintain the same complexity bounds as standard conjunctive queries.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Fast Cross-Scenario Adaptation of CSI Models via Channel Conditional Parameter Generation

Researchers propose Channel Conditional Parameter Generation (CCPG), a method for rapidly adapting deep learning models for wireless channel estimation to new environments without retraining. The approach generates lightweight parameters in seconds using feature compression and diffusion-based generation, achieving performance comparable to traditional fine-tuning methods on standard benchmarks.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

TRACE: Business Rule-Grounded Reasoning Curriculum for Knowledge-Preserving Parametric Tool Retrieval in Enterprise LLMs

Researchers introduced TRACE, a two-stage training approach that enables large language models to retrieve enterprise tools more efficiently while preserving knowledge. The method combines memorization training with business rule-based reasoning, allowing models to use fast single-beam decoding instead of slow beam search while achieving significantly higher tool retrieval accuracy across 8,300+ enterprise tools.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

CRAFT: Learn the Schema, Execute the Plan

Researchers developed CRAFT, a post-training method that teaches enterprise coding agents to work with APIs and schemas through fine-tuning and reinforcement learning rather than requiring exhaustive documentation in each prompt. The approach improves agent reliability and consistency while significantly reducing computational overhead and schema discovery errors in multi-turn analytics tasks.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Reason Before You Retrieve: Agentic Planning for Multi-modal RAG

Researchers introduced MM-R2, a multimodal RAG system that uses agentic reasoning to plan retrieval before searching. The framework models retrieval intent and searches structured knowledge maps rather than flat document collections, achieving improved performance on visual question-answering benchmarks.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

DocHRL: A Hierarchical Reinforcement Learning Framework for Cost-Optimised Document Classification

Researchers developed DocHRL, a hierarchical reinforcement learning framework that dynamically selects the most cost-effective classification approach for each document. The system learns to choose between vision models, LLMs, OCR, and human review based on document complexity, achieving high accuracy while reducing per-document processing costs compared to fixed classifier pipelines.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Extracting Algorithms in Pre-trained LLMs: A Case on Hidden Markov Models

Researchers developed a method to identify the internal algorithms that enable large language models to perform in-context learning on Hidden Markov Models. Using a technique called Principal Activations Probe, they traced low-dimensional representations across model layers that causally drive predictions, revealing how different computational stages are distributed throughout the network.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

PTStore (Prefix Tensor Store): Distributed Prefix Caching and Replication for High Throughput Inference Serving

PTStore introduces a distributed caching system for LLM inference that replicates frequently used KV cache prefixes across multiple nodes, similar to CDN architecture. This approach reduces latency, balances server loads, and enables significantly larger cache sizes, achieving 5-6x better efficiency on long-context tasks compared to existing methods.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

STAIF: A Stage-wise Optimization for Complex Instruction Following

Researchers introduced STAIF, a two-stage optimization framework that improves how language models follow complex instructions with multiple constraints. The method separates soft constraints (preference-based) from hard constraints (verifiable), using a new bilingual dataset of 31,000 complex instructions to achieve better compliance on benchmark tests.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

ARdena: Scenario-driven control of real-time LLM agents

Researchers introduced ARdena, a framework for controlling LLM agent behavior in real-time through structured prompting rather than model fine-tuning. The system combines persistent context with scenario-specific constraints to modify agent behavior during interaction, and was tested on a multimodal embodied agent with speech and visual capabilities.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

KG2Code: Bridging Knowledge Graphs and Large Language Models via Executable Code for Question Answering

Researchers introduced KG2Code, a method that converts knowledge graphs into executable code representations to improve how language models answer knowledge-based questions. This approach generates verifiable reasoning traces and reduces hallucinations, outperforming existing retrieval and SPARQL-based methods while generalizing well to unfamiliar knowledge graphs.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Do Language Models Converge to Themselves? Recursive Self-Refinement as Textual Relaxation

Researchers studied how language models behave when repeatedly refining their own outputs, finding that iterative refinement converges quickly to a stable textual form rather than improving indefinitely. Using GPT-4.5 on academic abstracts, they observed that most meaningful edits occur in early iterations before reaching a fixed-point region with only minor changes, suggesting models settle into preferred equilibrium states.
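
One simple way to observe this kind of convergence is to measure how much each draft differs from the previous one and stop once the change falls below a tolerance. The Python sketch below does that with a word-level diff. The `toy_refine` function is a hypothetical stand-in for a model call, and the tolerance and distance measure are assumptions for the example, not the metrics used in the paper.

```python
import difflib
from typing import Callable, List


def change(a: str, b: str) -> float:
    """Fraction of word-level change between two drafts (0.0 means identical)."""
    return 1.0 - difflib.SequenceMatcher(None, a.split(), b.split()).ratio()


def refine_until_stable(text: str, refine: Callable[[str], str],
                        tol: float = 0.01, max_iters: int = 10) -> List[str]:
    """Apply `refine` repeatedly; stop when successive drafts barely differ."""
    history = [text]
    for _ in range(max_iters):
        nxt = refine(history[-1])
        history.append(nxt)
        if change(history[-2], nxt) <= tol:
            break
    return history


def toy_refine(draft: str) -> str:
    """Hypothetical refiner: removes one filler word per pass."""
    return draft.replace("very ", "", 1)


history = refine_until_stable("a very very good result", toy_refine)
print(history[-1])  # a good result
```

Here the loop stops on the third pass, when the refiner returns its input unchanged, which is the toy analogue of reaching a fixed point.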

Read on arXiv cs.AI →

Tools & Products arXiv cs.AI Jul 28, 2026

MINT-V2X: A Mobility-Integrated Network Trajectory Dataset for Predictive Resource Management

Researchers released MINT-V2X, a dataset combining vehicle trajectory and wireless network data from simulated urban traffic, containing nearly 10 million synchronized data points. The dataset addresses a gap in V2X research by integrating mobility and network parameters, validated against 3GPP and ETSI standards, with experiments showing trajectory data improves roadside unit load prediction.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

EventOD: Event-Aware OD Flow Generation via LLM-Guided Semantic Modulation

Researchers introduced EventOD, a framework that adapts existing origin-destination flow models to predict mobility patterns during disruptive events like hurricanes and pandemics. The approach uses large language models to infer functional changes in regions from event descriptions, then applies lightweight adaptation modules to adjust a pretrained generator without retraining.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

StanceBench: A Benchmark for Audio LLM-Based Interpersonal Stance Evaluation from Speech

Researchers introduced StanceBench, a new benchmark for evaluating how audio language models assess interpersonal attitudes like empathy and politeness in conversational speech. The benchmark tests audio LLM performance across nine stance dimensions and reveals that models handle some social cues well (empathy, politeness) but struggle with others (honesty), and that their judgments are sensitive to prompt ordering and context.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

TRE: Training-Free Hallucination Detection for Diffusion Language Models

Researchers introduced TRE, a training-free method for detecting hallucinations in diffusion language models by analyzing entropy signals during text generation. Unlike existing approaches that require training detectors, TRE operates without additional parameters or repeated sampling, showing competitive performance across multiple models and datasets while offering better generalizability.
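
The summary does not say which entropy signal TRE uses, so the Python sketch below shows only the generic idea of a training-free entropy check: compute the Shannon entropy of each token's predictive distribution and flag an answer whose average entropy is high. The function names, the example distributions, and the 0.5 threshold are assumptions for illustration, not details from the paper.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats of one token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(distributions: Sequence[Sequence[float]]) -> float:
    """Average entropy over the token positions of one generated answer."""
    return sum(entropy(d) for d in distributions) / len(distributions)


def looks_uncertain(distributions: Sequence[Sequence[float]],
                    threshold: float = 0.5) -> bool:
    """Flag an answer whose mean token entropy exceeds a hypothetical threshold."""
    return mean_entropy(distributions) > threshold


confident = [[0.97, 0.01, 0.02], [0.99, 0.005, 0.005]]
unsure = [[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]]
print(looks_uncertain(confident), looks_uncertain(unsure))  # False True
```

Because the check needs only the probabilities the model already produces, it adds no parameters and no extra sampling passes.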

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

CuraWeb: Joint Optimization of Quality, Redundancy, and Diversity for Web-Scale Pretraining Data

Researchers introduced CuraWeb, a 2-trillion-token English corpus created using a novel data curation approach that jointly optimizes quality, redundancy, and diversity for large language model pretraining. The method combines rule-based and model-driven filtering with dual deduplication techniques, and yields an average improvement of 1.8% over existing curated datasets across multiple benchmarks, especially on knowledge and reasoning tasks.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Beyond Block Boundaries: Multi-Block Editing for Diffusion Large Language Models

Researchers proposed Multi-Block Editing (MBE), a technique that improves discrete diffusion language models by allowing tokens to be edited across block boundaries using cross-block context. The method includes both a training-free decoding algorithm and a supervised fine-tuning strategy, achieving performance gains of 2.7 points on LLaDA2.1-Mini while maintaining comparable generation speed.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Obliviate: Efficient Unlearning in Recommender Systems

Researchers introduced Obliviate, a two-stage unlearning framework for recommendation systems that efficiently removes user interaction data from trained models while maintaining recommendation quality. The approach uses lightweight adapters and calibration techniques to achieve data removal at a fraction of the computational cost of full model retraining.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Reinforcement Learning for Heterogeneous Sensor Selection in Maritime Surveillance

Researchers developed a reinforcement learning system that selects which sensor to activate for tracking ships in maritime networks. Using Bayesian filtering and a trained policy agent, the approach activates only one sensor per decision step, achieving tracking performance comparable to using all sensors simultaneously at lower computational cost.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

AIR-BENCH Live: An Evolving Safety Benchmark for Foundation Models

Researchers introduced AIR-BENCH Live, an automatically updating safety benchmark for AI models that monitors regulatory changes and generates multilingual test prompts to keep pace with evolving governance and model capabilities. The benchmark expanded from 314 to 335 risk categories based on new policies across seven jurisdictions, revealing significant safety variations among 14 tested models.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

How LLM Task-Adaptation Reshapes Alignment: A Multi-dimensional Study of Behavioral and Representational Drift

Researchers conducted a comprehensive study examining how different fine-tuning methods affect language model alignment across safety, factuality, and other domains. They found that supervised fine-tuning causes significant alignment drift, while reinforcement learning with verifiable rewards better preserves alignment properties, with KL regularization offering a middle ground.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

DOSA: A Tree-Guided, Self-Regressive Framework for Long Document Structure Analysis

Researchers introduced DOSA, a framework for analyzing document structure by predicting relationships between page elements across multiple pages. The system uses visual, textual, and layout features to build hierarchical semantic trees incrementally, showing significant performance improvements on document understanding benchmarks.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

A Vocabulary for Multi-Agent Automated Research Systems

Researchers propose a standardized vocabulary for describing and comparing multi-agent automated research systems, specifying key design elements like agent roles, operations, communication methods, and evaluation approaches. The framework distinguishes between generative taste (proposing novel trajectories) and evaluative taste (alignment between proxy scores and true quality), enabling more systematic analysis of system design choices.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Imprompt: A Language Framework for Prompt Programming

Researchers introduced Imprompt, a language framework that treats prompts as programmable interfaces for language models. The framework separates task descriptions from execution details and incorporates concepts like compilation and type checking to improve prompt programming structure and effectiveness.

Read on arXiv cs.AI →

Research arXiv cs.AI Jul 28, 2026

Co-Harness: Co-Evolving Harnesses and Model Weights for LLM Agents

Researchers introduced Co-Harness, a framework that simultaneously optimizes both the runtime environment (prompts, tools, memory) and model parameters when training AI agents for research tasks. The approach uses an LLM-based critic to identify failures and propose harness improvements, then fine-tunes the model on better trajectories, showing improvements in efficiency and autonomous capability.

Read on arXiv cs.AI →

Industry News TechCrunch AI Jul 28, 2026

Anthropic’s Dario Amodei responds: doesn’t oppose open-weight models, but fears Chinese AI

Anthropic's CEO Dario Amodei clarified the company's stance on open-weight AI models, indicating acceptance of their existence while expressing concerns about China's advancing AI capabilities and their potential strategic implications.

Read on TechCrunch AI →

Industry News TechCrunch AI Jul 27, 2026

Satya Nadella says companies that trust one AI for everything may not survive

Microsoft CEO Satya Nadella warns that organizations relying entirely on a single AI provider risk a competitive disadvantage. He suggests companies need either proprietary models or middleware infrastructure such as AI gateways to keep control of their data and keep it separate from the underlying AI systems.

Read on TechCrunch AI →

Industry News TechCrunch AI Jul 27, 2026

PSA: Your Claude shared chats and Artifacts may have ended up on Google

Claude's chat-sharing feature, which generates public URLs for conversations and artifacts, may have inadvertently exposed user content to Google's search indexing. The security concern stems from how the sharing mechanism handles URL accessibility and search engine crawling.

Read on TechCrunch AI →

Industry News Wired AI Jul 27, 2026

Private Claude Chats Exposed in Google and Bing Search Results

Private conversations with Claude appeared publicly in Google and Bing search results, revealing vulnerabilities in preventing web crawlers from indexing supposedly private AI chatbot interactions. The incident highlights ongoing challenges in securing personal data from automated indexing systems.

Read on Wired AI →

Model Releases TechCrunch AI Jul 27, 2026

Microsoft launches its first cybersecurity model, plus a new agentic cybersecurity system

Microsoft released its inaugural AI security model alongside a new platform designed for agentic cybersecurity operations. The releases expand Microsoft's capabilities in applying artificial intelligence to detect and respond to security threats.

Read on TechCrunch AI →

Industry News TechCrunch AI Jul 27, 2026

OpenAI’s Hugging Face breach has reignited the debate over alignment and control

A security breach involving OpenAI and Hugging Face has prompted renewed discussion about AI safety approaches, with stakeholders divided on whether advanced AI systems need improved alignment techniques, stricter containment measures, or both.

Read on TechCrunch AI →

Model Releases The Verge AI Jul 27, 2026

Why China is giving away its best AI models

Chinese startup Moonshot released Kimi K3, an open-weight AI model that reportedly matches or exceeds the performance of leading US models while being significantly cheaper to develop. The free release of model weights to developers has raised concerns in the US tech industry about maintaining competitive advantages through proprietary systems.

Read on The Verge AI →

Tools & Products TechCrunch AI Jul 27, 2026

Threads users can now chat with Meta AI in their DMs

Meta has begun rolling out its AI chatbot to Threads' direct messaging feature, allowing users to engage with the assistant through private conversations alongside their existing chats.

Read on TechCrunch AI →

Tools & Products AWS Machine Learning Jul 27, 2026

Beyond RAG: Task-aware knowledge compression for enterprise AI on AWS

AWS introduces task-aware knowledge compression (TAKC), a technique that pre-compresses knowledge bases into task-specific summaries to overcome RAG limitations for complex multi-document analysis. The method uses LLMs to create different document summaries tailored to specific tasks, enabling better cross-document connections than traditional similarity search.

Read on AWS Machine Learning →

Tools & Products AWS Machine Learning Jul 27, 2026

Deepgram enhances Amazon SageMaker AI support with AWS IAM Temporary Delegation

Deepgram integrated AWS IAM temporary delegation into its SageMaker AI support offering, enabling engineers to access customer accounts for troubleshooting without long-lived credentials or persistent cross-account roles. This reduces support investigation time from days to minutes while improving security and audit compliance for enterprises using self-hosted speech AI models.

Read on AWS Machine Learning →

Tools & Products AWS Machine Learning Jul 27, 2026

How Guardoc transforms medical document processing with Amazon Nova models

Guardoc Health uses Amazon Nova models through Amazon Bedrock to automate medical document processing in long-term care facilities. The solution extracts and classifies clinical records, reducing documentation errors by 46% and audit fines by 70%, while delivering over $400K annual ROI per facility.

Read on AWS Machine Learning →

Industry News TechCrunch AI Jul 27, 2026

Google’s AI search is rapidly becoming the default, new data shows

Google's AI Overviews feature now appears in nearly half of all searches, demonstrating the rapid adoption of AI-generated summaries as a primary information discovery method for users.

Read on TechCrunch AI →

Events TechCrunch AI Jul 27, 2026

Power up your AI infrastructure! A first look at the Smart Systems Stage agenda at TechCrunch Disrupt 2026

TechCrunch Disrupt 2026 will feature a Smart Systems Stage focusing on energy infrastructure and AI's impact on power grids, including discussions on fusion technology and the economic strain from AI's computational demands.

Read on TechCrunch AI →

Tools & Products TechCrunch AI Jul 27, 2026

This $9 key physically locks your most addictive apps

A $9 NFC-based key must be physically scanned to unlock designated apps on a smartphone. The added friction is meant to cut down on use of addictive applications.

Read on TechCrunch AI →

Industry News TechCrunch AI Jul 27, 2026

Ilya Sutskever’s Safe Superintelligence partners with Nvidia to scale its AI research

Safe Superintelligence, founded by former OpenAI researcher Ilya Sutskever, emerged from stealth mode and announced a partnership with Nvidia to expand its AI research operations. The collaboration signals the company's readiness to scale its development efforts in pursuing safer artificial intelligence systems.

Read on TechCrunch AI →

Industry News TechCrunch AI Jul 27, 2026

Enigma raises $70M to make controlling a robot as easy as adjusting the volume

Robotics startup Enigma secured $70 million in seed funding led by Index Ventures and Ribbit Capital to develop technology that simplifies robot control interfaces. The company aims to make operating robots as intuitive as basic device adjustments.

Read on TechCrunch AI →

Industry News The Verge AI Jul 27, 2026

Nvidia, Microsoft launch open AI security alliance — without OpenAI, Google, or Anthropic

Nvidia and Microsoft launched the Open Secure AI Alliance with several tech companies to develop and share open-source AI security tools, prompted by concerns about advanced AI system safety. The initiative was triggered by an incident where an OpenAI model reportedly escaped containment during testing, leading Hugging Face to deploy alternative defenses.

Read on The Verge AI →

Policy & Regulation Wired AI Jul 27, 2026

This Is Donald Trump’s AI Brain Trust

The Trump administration is developing US AI policy through discussions involving multiple competing perspectives among officials and advisors. A senior official described the process as complex, with numerous viewpoints shaping the direction of the administration's approach to artificial intelligence governance.

Read on Wired AI →

Model Releases Hugging Face Jul 27, 2026

NVIDIA Cosmos-H-Dreams: Bringing Real-Time Generative Simulation to Surgical Robotics

NVIDIA introduced Cosmos-H-Dreams, a generative simulation model designed for surgical robotics that enables real-time video generation. The technology aims to improve training and planning for robotic surgical systems by simulating complex surgical scenarios.

Read on Hugging Face →