Weekly Reads Archive

Week of July 6, 2026

Papers

WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory — Hanlin Wang, Hao Ouyang et al.
We present WorldDirector, a highly controllable video world model framework designed for persistent dynamic object memory and unrestricted viewpoin...
Alignment Is All You Need For X-to-4D Generation — Qiaowei Miao, Kehan Li et al.
Generative diffusion models excel at synthesizing high-quality images, videos, and 3D content under multimodal control. However, arbitrary user-def...
PointDiT: Pixel-Space Diffusion for Monocular Geometry Estimation — Haofei Xu, Rundi Wu et al.
State-of-the-art single-image 3D reconstruction methods often rely on complex hybrid architectures and loss functions, or compress geometry into la...
Distributed Attacks in Persistent-State AI Control — Josh Hills, Ida Caspary, Asa Cooper Stickland
As AI coding agents become more autonomous, they increasingly ship code iteratively, with the codebase persisting across sessions. This persistence...
LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning — Matteo Boglioni, Thibault Rousset et al.
LLMs memorize sensitive training data, including personally identifiable information (PII), creating a pressing need for reliable post hoc removal ...

Blog Posts

🤗 Kernels: Major Updates — Hugging Face Blog
Hugging Face and Cerebras bring Gemma 4 to real-time voice AI — Hugging Face Blog
ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration — Hugging Face Blog

Week of June 29, 2026

Papers

DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand — Dihong Huang, Zhenyu Wei et al.
Dexterous manipulation policies can solve individual skills, but composing them to perform multiple tasks with a single hand remains challenging. A...
PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception — Yana Wei, Hongbo Peng et al.
We introduce PerceptionRubrics, a rubric-based evaluation framework that addresses the gap between saturated benchmark scores and real-world brittl...
StructSplat: Generalizable 3D Gaussian Splatting from Uncalibrated Sparse Views — Jia-Chen Zhao, Beiqi Chen et al.
We present StructSplat, a feed-forward and generalizable 3D Gaussian reconstruction framework that operates directly on uncalibrated images without...
Surprises in Proper Positive-Only Learning — Shai Ben-David, Farnam Mansouri et al.
Binary classification from positive-only samples is a variant of PAC learning in which the learner receives i.i.d. samples from the positive region...
Which Nash Equilibrium? Solver-Dependent Selection on Zero-Sum Nash Polytopes — Luis Leal
Many two-player zero-sum games admit not a unique Nash equilibrium but a convex set of them: a polytope of profiles that all share the minimax valu...

Blog Posts

Mapping Europe’s AI Workforce Opportunity — OpenAI Blog
A new OpenAI report maps how AI could reshape jobs across the EU, highlighting which occupations may face automation, growth, or workflow changes.
HP Inc. launches Frontier strategic partnership with OpenAI — OpenAI Blog
HP Inc. scales its OpenAI Frontier partnership to deploy AI across customer experiences, software development, and enterprise operations.
Previewing GPT-5.6 Sol: a next-generation model — OpenAI Blog
OpenAI previews GPT-5.6 Sol, a next-generation model with stronger capabilities in coding, science, and cybersecurity, paired with its most advance...

Week of June 22, 2026

Papers

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising — Siang-Ling Zhang, Huai-Hsun Cheng et al.
Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough ch...
TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living — Arkaprava Sinha, Dominick Reilly et al.
Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches e...
How Transparent is DiffusionGemma? — Joshua Engels, Callum McDougall et al.
LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising...
UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning — Wenhao Chi, Arkaprava Sinha et al.
Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, a single...
Optimal Deterministic Multicalibration and Omniprediction — Georgy Noarov, Aaron Roth
A model is multicalibrated on a collection of group weights $G$ if it is calibrated -- i.e. unbiased even conditional on its prediction -- not just...

Blog Posts

Samsung Electronics brings ChatGPT and Codex to employees — OpenAI Blog
Samsung Electronics deploys ChatGPT Enterprise and Codex to employees worldwide, marking one of OpenAI’s largest enterprise AI rollouts.
MosaicLeaks: Can your research agent keep a secret? — Hugging Face Blog
New usage analytics and updated spend controls for enterprises — OpenAI Blog
OpenAI introduces new spend controls and usage analytics for ChatGPT Enterprise, helping organizations manage costs and scale AI with confidence.

Week of June 15, 2026

Papers

Gaze Heads: How VLMs Look at What They Describe — Rohit Gandikota, David Bau
How a vision-language model internally solves the task of describing an image is far from obvious. We find that the model develops a specific mecha...
OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains — Xinyue Cai, Chaoyou Fu et al.
Current automated pipelines for audio-visual Question Answering (QA) generally adopt a "video-caption-QA" paradigm. However, these methods typica...
RATS! Patches Talk Through Registers: Emergent Parts in Register Attention Transformers — Timing Yang, Predrag Neskovic et al.
When humans see a bird, they recognize far more than just "bird" -- they see a head, wings, and talons, a structured assembly of reusable parts tha...
RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space — Xichen Pan, Aashu Singh et al.
Large language models (LLMs) are widely used in text-to-image (T2I) systems, but they are typically limited to text encoding, while denoising is ha...
Instruct-Particulate: Scaling Feed-Forward 3D Object Articulation with Kinematic Control — Ruining Li, Yuxin Yao et al.
Reconstructing articulated 3D objects is important for animation, gaming, and robotic simulations. Recent neural networks can estimate the articula...

Blog Posts

Introducing the OpenAI Partner Network — OpenAI Blog
OpenAI launches the Partner Network, investing $150M to help global partners accelerate enterprise AI adoption, deployment, and transformation.
olmo-eval: An evaluation workbench for the model development loop — Hugging Face Blog
New OpenAI Academy courses for the next era of work — OpenAI Blog
OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work.

Week of June 8, 2026

Papers

How reliable are LLMs when it comes to playing dice? — Luca Avena, Gianmarco Bet, Bernardo Busoni
We investigate the probabilistic reasoning capabilities of large language models through a controlled benchmarking study on discrete probability pr...
UniSHARP: Universal Sharp Monocular View Synthesis — Meixi Song, Dizhe Zhang et al.
In this work, we focus on extending SHARP, the popular photorealistic view synthesis method, for universal monocular rendering across a continuum o...
Agentopia: Long-Term Life Simulation and Learning in Agent Societies — Xintao Wang, Sirui Zheng et al.
Humans learn from social life. Simulating this process with LLM-powered agents represents a promising research direction, raising a natural questio...
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism — Cong Chen, Guo Gan et al.
Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion ...
Streaming Video Generation with Streaming Force Control — Hanhui Wang, Yiming Xie et al.
We introduce StreamForce, a streaming video generation framework that enables physically grounded control through continuous force inputs. Unlike p...

Blog Posts

Amazing Digital Dentures (a failed project) — Hugging Face Blog
Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI — Hugging Face Blog
EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios — Hugging Face Blog

Week of June 1, 2026

Blog Posts

Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action — Hugging Face Blog
Boston Children’s uses AI to unlock new diagnoses — OpenAI Blog
Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare disease c...
How Braintrust turns customer requests into code with Codex — OpenAI Blog
How Braintrust engineers use Codex with GPT-5.5 to run experiments and code faster.

Week of May 25, 2026

Blog Posts

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models — Hugging Face Blog
Specialization Beats Scale: A Strategic Variable Most AI Procurement Decisions Overlook — Hugging Face Blog
OpenAI named a Leader in enterprise coding agents by Gartner — OpenAI Blog
OpenAI is named a leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-sc...

Week of May 18, 2026

Blog Posts

OpenAI and Malta partner to bring ChatGPT Plus to all citizens — OpenAI Blog
OpenAI and Malta partner to expand AI access, offering ChatGPT Plus and training to help citizens build practical AI skills and use AI responsibly.
A new personal finance experience in ChatGPT — OpenAI Blog
Preview a new personal finance experience in ChatGPT for Pro users in the U.S. Securely connect your financial accounts and get AI-powered insights...
How data science teams use Codex — OpenAI Blog
See how data science teams can use Codex to build root-cause briefs, impact readouts, KPI memos, scoped analyses, and dashboard specs from real wor...

Week of May 11, 2026

Papers

123D: Unifying Multi-Modal Autonomous Driving Data at Scale — Daniel Dauner, Valentin Charraut et al.
The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity rema...
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling — Tong Zheng, Haolin Liu et al.
Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during...
Normalizing Trajectory Models — Jiatao Gu, Tianrong Chen et al.
Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed ...
Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration — Shuhang Lin, Chuhao Zhou et al.
Knowledge Graph Question Answering (KGQA) has shown promise for grounded and interpretable reasoning, yet existing approaches often fail to provide...
Zero-Shot Imagined Speech Decoding via Imagined-to-Listened MEG Mapping — Maryam Maghsoudi, Shihab Shamma
Decoding imagined speech from non-invasive brain recordings is challenging because imagined datasets are scarce and difficult to align temporally a...

Blog Posts

OpenAI Campus Network: Student club interest form — OpenAI Blog
Join the OpenAI Campus Network—connect student clubs worldwide, access AI tools, host events, and build an AI-powered campus community.
How enterprises are scaling AI — OpenAI Blog
How enterprises scale AI: from early experiments to compounding impact through trust, governance, workflow design, and quality at scale.
MachinaCheck: Building a Multi-Agent CNC Manufacturability System on AMD MI300X — Hugging Face Blog

Week of May 4, 2026

Papers

Posterior Augmented Flow Matching — George Stoica, Sayak Paul et al.
Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for hi...
HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs — Jinpai Zhao, Nishant Panda et al.
We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned cl...
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models — Sailesh Panda, Pritam Kadasi et al.
Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they f...
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs — Siyuan Huang, Xiaoye Qu et al.
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Diluti...
Let ViT Speak: Generative Language-Image Pre-training — Yan Fang, Mengcheng Lan et al.
In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining ...

Blog Posts

Introducing Advanced Account Security — OpenAI Blog
Introducing Advanced Account Security: phishing-resistant login, stronger recovery, and enhanced protections to safeguard sensitive data and preven...
Where the goblins came from — OpenAI Blog
How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.
AI evals are becoming the new compute bottleneck — Hugging Face Blog

Week of April 27, 2026

Papers

Recursive Multi-Agent Systems — Xiyuan Yang, Jiaru Zou et al.
Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent stat...
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios — Jinxiang Meng, Shaoping Huang et al.
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing...
How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum — Chu-Cheng Lin, Eugene Ie
Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable ...
A paradox of AI fluency — Christopher Potts, Moritz Sudhof
How much does a user's skill with AI shape what AI actually delivers for them? This question is critical for users, AI product builders, and societ...
Teacher Forcing as Generalized Bayes: Optimization Geometry Mismatch in Switching Surrogates for Chaotic Dynamics — Andre Herz, Daniel Durstewitz, Georgia Koppe
Identity teacher forcing (ITF) enables stable training of deterministic recurrent surrogates for chaotic dynamical systems and has been highly effe...

Blog Posts

Granite 4.1 LLMs: How They’re Built — Hugging Face Blog
Building the compute infrastructure for the Intelligence Age — OpenAI Blog
OpenAI scales Stargate to build the compute infrastructure powering AGI, adding new data center capacity to meet growing AI demand.

Week of April 20, 2026

Papers

Repurposing 3D Generative Model for Autoregressive Layout Generation — Haoran Feng, Yifan Niu et al.
We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts ...
FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation — Dian Shao, Zhengzheng Xu et al.
UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous...
Phase transitions in Doi-Onsager, Noisy Transformer, and other multimodal models — Kyunghoo Mun, Matthew Rosenzweig
We study phase transitions for repulsive-attractive mean-field free energies on the circle. For a $\frac{1}{n+1}$-periodic interaction whose Fourie...
ASMR-Bench: Auditing for Sabotage in ML Research — Eric Gan, Aryan Bhatt et al.
As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading resul...
Enhancing Hazy Wildlife Imagery: AnimalHaze3k and IncepDehazeGan — Shivarth Rai, Tejeswar Pokuri
Atmospheric haze significantly degrades wildlife imagery, impeding computer vision applications critical for conservation, such as animal detection...

Blog Posts

Building a Fast Multilingual OCR Model with Synthetic Data — Hugging Face Blog
Codex for (almost) everything — OpenAI Blog
The updated Codex app for macOS and Windows adds computer use, in-app browsing, image generation, memory, and plugins to accelerate developer workf...
Introducing GPT-Rosalind for life sciences research — OpenAI Blog
OpenAI introduces GPT-Rosalind, a frontier reasoning model built to accelerate drug discovery, genomics analysis, protein reasoning, and scientific...

Week of April 13, 2026

Papers

Tango: Taming Visual Signals for Efficient Video Large Language Models — Shukang Yin, Sirui Zhao et al.
Token pruning has emerged as a mainstream approach for developing efficient Video Large Language Models (Video LLMs). This work revisits and advanc...
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism — Hadas Orgad, Boyi Wei et al.
Large language models (LLMs) undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle: jailbreaks routine...
ANTIC: Adaptive Neural Temporal In-situ Compressor — Sandeep S. Cranganore, Andrei Bodnar et al.
The persistent storage requirements for high-resolution, spatiotemporally evolving fields governed by large-scale and high-dimensional partial diff...
Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision — Soroosh Tayebi Arasteh, Mehdi Joodaki et al.
Evidence-grounded reasoning requires more than attaching retrieved text to a prediction: a model should make decisions that depend on whether the p...
EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks — Lulin Liu, Dayou Li et al.
Large foundation models have made significant advances in embodied intelligence, enabling synthesis and reasoning over egocentric input for househo...

Blog Posts

Research with ChatGPT — OpenAI Blog
Learn how to research with ChatGPT using search and deep research to find up-to-date information, analyze sources, and generate structured insights.
ChatGPT for finance teams — OpenAI Blog
Learn how finance teams use ChatGPT to streamline reporting, analyze data, improve forecasts, and communicate insights more clearly.
ChatGPT for research — OpenAI Blog
Learn how to use ChatGPT for research to gather sources, analyze information, and create structured, citation-backed insights.

Week of April 6, 2026

Papers

CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning — Ankan Deria, Komal Kumar et al.
Recent vision-language models (VLMs) typically rely on a single vision encoder trained with contrastive image-text objectives, such as CLIP-style p...
Enhancing Robustness of Federated Learning via Server Learning — Van Sy Mai, Kushal Chakrabarti et al.
This paper explores the use of server learning for enhancing the robustness of federated learning against malicious attacks even when clients' trai...
VOSR: A Vision-Only Generative Model for Image Super-Resolution — Rongyuan Wu, Lingchen Sun et al.
Most of the recent generative image super-resolution (SR) methods rely on adapting large text-to-image (T2I) diffusion models pretrained on web-sca...
HyperCT: Low-Rank Hypernet for Unified Chest CT Analysis — Fengbei Liu, Sunwoo Kwak et al.
Non-contrast chest CTs offer a rich opportunity for both conventional pulmonary and opportunistic extra-pulmonary screening. While Multi-Task Learn...
Power one sequential tests exist for weakly compact $\mathscr P$ against $\mathscr P^c$ — Ashwin Ram, Aaditya Ramdas
Suppose we observe data from a distribution $P$ and we wish to test the composite null hypothesis that $P\in\mathscr P$ against a composite alterna...

Blog Posts

OpenAI acquires TBPN — OpenAI Blog
OpenAI acquires TBPN to accelerate global conversations around AI and support independent media, expanding dialogue with builders, businesses, and ...
Codex now offers more flexible pricing for teams — OpenAI Blog
Codex now includes pay-as-you-go pricing for ChatGPT Business and Enterprise, providing teams a more flexible option to start and scale adoption.
Welcome Gemma 4: Frontier multimodal intelligence on device — Hugging Face Blog

Week of March 30, 2026

Papers

Detailed Geometry and Appearance from Opportunistic Motion — Ryosuke Hirai, Kohei Yamashita et al.
Reconstructing 3D geometry and appearance from a sparse set of fixed cameras is a foundational task with broad applications, yet it remains fundame...
Learning to Commit: Generating Organic Pull Requests via Online Repository Memory — Mo Li, L. H. Xu et al.
Large language model (LLM)-based coding agents achieve impressive results on controlled benchmarks yet routinely produce pull requests that real ma...
Weight Tying Biases Token Embeddings Towards the Output Space — Antonio Lopardo, Avyukth Harish et al.
Weight tying, i.e. sharing parameters between input and output embedding matrices, is common practice in language model design, yet its impact on t...
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation — Nicolas von Lützow, Barbara Rössle et al.
Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternati...
Ruka-v2: Tendon Driven Open-Source Dexterous Hand with Wrist and Abduction for Robot Learning — Xinqi Liu et al.
Lack of accessible and dexterous robot hardware has been a significant bottleneck to achieving human-level dexterity in robots. Last year, we relea...

Blog Posts

Helping disaster response teams turn AI into action across Asia — OpenAI Blog
AI for Disaster Response in Asia: OpenAI Workshop with Gates Foundation
STADLER reshapes knowledge work at a 230-year-old company — OpenAI Blog
Learn how STADLER uses ChatGPT to transform knowledge work, saving time and accelerating productivity across 650 employees.
Liberate your OpenClaw — Hugging Face Blog

Week of March 23, 2026

Papers

MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints — Yu Qi, Xinyi Xu et al.
Video generative models show emerging reasoning behaviors. It is essential to ensure that generated events remain causally consistent across frames...
From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering — Xinyi Shang, Yi Tang et al.
Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true edit signal: many pixels inside a mask ...
LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation — Jiazheng Xing, Fei Du et al.
Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained ...
Deterministic Mode Proposals: An Efficient Alternative to Generative Sampling for Ambiguous Segmentation — Sebastian Gerard, Josephine Sullivan
Many segmentation tasks, such as medical image segmentation or future state prediction, are inherently ambiguous, meaning that multiple predictions...
CoVR-R: Reason-Aware Composed Video Retrieval — Omkar Thawakar, Dmitry Demidov et al.
Composed Video Retrieval (CoVR) aims to find a target video given a reference video and a textual modification. Prior work assumes the modification...

Blog Posts

Build a Domain-Specific Embedding Model in Under a Day — Hugging Face Blog
What's New in Mellea 0.4.0 + Granite Libraries Release — Hugging Face Blog
How we monitor internal coding agents for misalignment — OpenAI Blog
How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents—analyzing real-world deployments to detect risks and st...

Week of March 16, 2026

Papers

PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization — Yangsong Zhang, Anujith Muraleedharan et al.
Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. B...
Representation Learning for Spatiotemporal Physical Systems — Helen Qu, Rudy Morel et al.
Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accura...
Visual-ERM: Reward Modeling for Visual Equivalence — Ziyu Liu, Shengyuan Ding et al.
Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured repres...
Out of Sight, Out of Mind? Evaluating State Evolution in Video World Models — Ziqi Ma, Mengzhan Liufu, Georgia Gkioxari
Evolutions in the world, such as water pouring or ice melting, happen regardless of being observed. Video world models generate "worlds" via 2D fra...
Neuron-Aware Data Selection In Instruction Tuning For Large Language Models — Xin Chen, Junchao Wu et al.
Instruction Tuning (IT) has been proven to be an effective approach to unlock the powerful capabilities of large language models (LLMs). Recent stu...

Blog Posts

Beyond Semantic Similarity: Introducing NVIDIA NeMo Retriever’s Generalizable Agentic Retrieval Pipeline — Hugging Face Blog
Rakuten fixes issues twice as fast with Codex — OpenAI Blog
Rakuten uses Codex, the coding agent from OpenAI, to ship software faster and safer, reducing MTTR 50%, automating CI/CD reviews, and delivering fu...
Designing AI agents to resist prompt injection — OpenAI Blog
How ChatGPT defends against prompt injection and social engineering by constraining risky actions and protecting sensitive data in agent workflows.

Week of March 9, 2026

Papers

Multimodal Large Language Models as Image Classifiers — Nikita Kisel, Illia Volkov et al.
Multimodal Large Language Models (MLLM) classification performance depends critically on evaluation protocol and ground truth quality. Studies comp...
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion — Lijiang Li, Zuwei Long et al.
While recent multimodal large language models (MLLMs) have made impressive strides, they predominantly employ a conventional autoregressive archite...
BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations — Thomas Monninger, Shaoyuan Xie et al.
The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic unde...
Fly360: Omnidirectional Obstacle Avoidance within Drone View — Xiangkai Zhang, Dizhe Zhang et al.
Obstacle avoidance in unmanned aerial vehicles (UAVs), as a fundamental capability, has gained increasing attention with the growing focus on spati...
SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation — Vishal Thengane, Zhaochong An et al.
Incremental Few-Shot (IFS) segmentation aims to learn new categories over time from only a few annotations. Although widely studied in 2D, it remai...

Blog Posts

How Descript enables multilingual video dubbing at scale — OpenAI Blog
Descript uses OpenAI models to scale multilingual video dubbing, optimizing translations for both meaning and timing so dubbed speech sounds natura...
Codex Security: now in research preview — OpenAI Blog
Codex Security is an AI application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher...
How Balyasny Asset Management built an AI research engine for investing — OpenAI Blog
See how Balyasny built an AI research system with GPT-5.4, rigorous model evaluation, and agent workflows to transform investment analysis at scale.

Week of March 2, 2026

Papers

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters — Snell et al.
Shows that adaptively allocating test-time compute can outperform 14x larger models.
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning — DeepSeek-AI
Demonstrates how RL-based training produces strong reasoning without supervised fine-tuning.
Utonia: Toward One Encoder for All Point Clouds — Yujia Zhang, Xiaoyang Wu et al.
We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we pre...
MIBURI: Towards Expressive Interactive Gesture Synthesis — M. Hamza Mughal, Rishabh Dabral et al.
Embodied Conversational Agents (ECAs) aim to emulate human face-to-face interaction through speech, gestures, and facial expressions. Current large...
CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance — Hanyang Wang, Yiyang Liu et al.
Classifier-Free Guidance (CFG) has emerged as a central approach for enhancing semantic alignment in flow-based diffusion models. In this paper, we...
How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference — Toru Lin, Shuying Deng et al.
Many essential manipulation tasks - such as food preparation, surgery, and craftsmanship - remain intractable for autonomous robots. These tasks ar...
ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation — Xialin He, Sirui Xu et al.
Achieving autonomous and versatile whole-body loco-manipulation remains a central barrier to making humanoids practically useful. Yet existing appr...

Blog Posts

A Visual Guide to Quantization — Maarten Grootendorst
Intuitive visual walkthrough of LLM quantization techniques from FP16 to GPTQ and GGUF.
Inference at the Edge: Practical Lessons from Deploying LLMs on Consumer GPUs — vLLM Blog
Real-world tips for efficient local LLM inference with vLLM on consumer hardware.
Understanding AI and learning outcomes — OpenAI Blog
OpenAI introduces the Learning Outcomes Measurement Suite to assess AI's impact on student learning across diverse educational environments over time.
PRX Part 3 — Training a Text-to-Image Model in 24h! — Hugging Face Blog
GPT-5.3 Instant System Card — OpenAI Blog