Week of April 13, 2026
Papers
- Tango: Taming Visual Signals for Efficient Video Large Language Models
Token pruning has emerged as a mainstream approach for developing efficient Video Large Language Models (Video LLMs). This work revisits and advanc...
- Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
Large language models (LLMs) undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle: jailbreaks routine...
- ANTIC: Adaptive Neural Temporal In-situ Compressor
The persistent storage requirements for high-resolution, spatiotemporally evolving fields governed by large-scale and high-dimensional partial diff...
- Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision
Evidence-grounded reasoning requires more than attaching retrieved text to a prediction: a model should make decisions that depend on whether the p...
- EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks
Large foundation models have made significant advances in embodied intelligence, enabling synthesis and reasoning over egocentric input for househo...
Blog Posts
- Research with ChatGPT
Learn how to research with ChatGPT using search and deep research to find up-to-date information, analyze sources, and generate structured insights.
- ChatGPT for finance teams
Learn how finance teams use ChatGPT to streamline reporting, analyze data, improve forecasts, and communicate insights more clearly.
- ChatGPT for research
Learn how to use ChatGPT for research to gather sources, analyze information, and create structured, citation-backed insights.
Week of April 6, 2026
Papers
- CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning
Recent vision-language models (VLMs) typically rely on a single vision encoder trained with contrastive image-text objectives, such as CLIP-style p...
- Enhancing Robustness of Federated Learning via Server Learning
This paper explores the use of server learning for enhancing the robustness of federated learning against malicious attacks even when clients' trai...
- VOSR: A Vision-Only Generative Model for Image Super-Resolution
Most of the recent generative image super-resolution (SR) methods rely on adapting large text-to-image (T2I) diffusion models pretrained on web-sca...
- HyperCT: Low-Rank Hypernet for Unified Chest CT Analysis
Non-contrast chest CTs offer a rich opportunity for both conventional pulmonary and opportunistic extra-pulmonary screening. While Multi-Task Learn...
- Power one sequential tests exist for weakly compact $\mathscr P$ against $\mathscr P^c$
Suppose we observe data from a distribution $P$ and we wish to test the composite null hypothesis that $P\in\mathscr P$ against a composite alterna...
Blog Posts
- OpenAI acquires TBPN
OpenAI acquires TBPN to accelerate global conversations around AI and support independent media, expanding dialogue with builders, businesses, and ...
- Codex now offers more flexible pricing for teams
Codex now includes pay-as-you-go pricing for ChatGPT Business and Enterprise, providing teams a more flexible option to start and scale adoption.
- Welcome Gemma 4: Frontier multimodal intelligence on device
Week of March 30, 2026
Papers
- Detailed Geometry and Appearance from Opportunistic Motion
Reconstructing 3D geometry and appearance from a sparse set of fixed cameras is a foundational task with broad applications, yet it remains fundame...
- Learning to Commit: Generating Organic Pull Requests via Online Repository Memory
Large language model (LLM)-based coding agents achieve impressive results on controlled benchmarks yet routinely produce pull requests that real ma...
- Weight Tying Biases Token Embeddings Towards the Output Space
Weight tying, i.e. sharing parameters between input and output embedding matrices, is common practice in language model design, yet its impact on t...
- GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation
Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternati...
- Ruka-v2: Tendon Driven Open-Source Dexterous Hand with Wrist and Abduction for Robot Learning
Lack of accessible and dexterous robot hardware has been a significant bottleneck to achieving human-level dexterity in robots. Last year, we relea...
Blog Posts
- Helping disaster response teams turn AI into action across Asia
AI for Disaster Response in Asia: OpenAI Workshop with Gates Foundation
- STADLER reshapes knowledge work at a 230-year-old company
Learn how STADLER uses ChatGPT to transform knowledge work, saving time and accelerating productivity across 650 employees.
- Liberate your OpenClaw
Week of March 23, 2026
Papers
- MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints
Video generative models show emerging reasoning behaviors. It is essential to ensure that generated events remain causally consistent across frames...
- From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true edit signal: many pixels inside a mask ...
- LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained ...
- Deterministic Mode Proposals: An Efficient Alternative to Generative Sampling for Ambiguous Segmentation
Many segmentation tasks, such as medical image segmentation or future state prediction, are inherently ambiguous, meaning that multiple predictions...
- CoVR-R: Reason-Aware Composed Video Retrieval
Composed Video Retrieval (CoVR) aims to find a target video given a reference video and a textual modification. Prior work assumes the modification...
Blog Posts
- Build a Domain-Specific Embedding Model in Under a Day
- What's New in Mellea 0.4.0 + Granite Libraries Release
- How we monitor internal coding agents for misalignment
How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents—analyzing real-world deployments to detect risks and st...
Week of March 16, 2026
Papers
- PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. B...
- Representation Learning for Spatiotemporal Physical Systems
Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accura...
- Visual-ERM: Reward Modeling for Visual Equivalence
Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured repres...
- Out of Sight, Out of Mind? Evaluating State Evolution in Video World Models
Evolutions in the world, such as water pouring or ice melting, happen regardless of being observed. Video world models generate "worlds" via 2D fra...
- Neuron-Aware Data Selection In Instruction Tuning For Large Language Models
Instruction Tuning (IT) has been proven to be an effective approach to unlock the powerful capabilities of large language models (LLMs). Recent stu...
Blog Posts
- Beyond Semantic Similarity: Introducing NVIDIA NeMo Retriever’s Generalizable Agentic Retrieval Pipeline
- Rakuten fixes issues twice as fast with Codex
Rakuten uses Codex, the coding agent from OpenAI, to ship software faster and safer, reducing MTTR 50%, automating CI/CD reviews, and delivering fu...
- Designing AI agents to resist prompt injection
How ChatGPT defends against prompt injection and social engineering by constraining risky actions and protecting sensitive data in agent workflows.
Week of March 9, 2026
Papers
- Multimodal Large Language Models as Image Classifiers
Multimodal Large Language Models (MLLM) classification performance depends critically on evaluation protocol and ground truth quality. Studies comp...
- Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
While recent multimodal large language models (MLLMs) have made impressive strides, they predominantly employ a conventional autoregressive archite...
- BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations
The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic unde...
- Fly360: Omnidirectional Obstacle Avoidance within Drone View
Obstacle avoidance in unmanned aerial vehicles (UAVs), as a fundamental capability, has gained increasing attention with the growing focus on spati...
- SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation
Incremental Few-Shot (IFS) segmentation aims to learn new categories over time from only a few annotations. Although widely studied in 2D, it remai...
Blog Posts
- How Descript enables multilingual video dubbing at scale
Descript uses OpenAI models to scale multilingual video dubbing, optimizing translations for both meaning and timing so dubbed speech sounds natura...
- Codex Security: now in research preview
Codex Security is an AI application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher...
- How Balyasny Asset Management built an AI research engine for investing
See how Balyasny built an AI research system with GPT-5.4, rigorous model evaluation, and agent workflows to transform investment analysis at scale.
Week of March 2, 2026
Papers
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Shows that allocating test-time compute adaptively per prompt can, in a FLOPs-matched comparison, outperform a 14x larger model.
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
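Demonstrates that large-scale RL without supervised fine-tuning as a preliminary step (DeepSeek-R1-Zero) produces strong reasoning, and that adding cold-start data before RL (DeepSeek-R1) further improves it.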
- Utonia: Toward One Encoder for All Point Clouds
We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we pre...
- MIBURI: Towards Expressive Interactive Gesture Synthesis
Embodied Conversational Agents (ECAs) aim to emulate human face-to-face interaction through speech, gestures, and facial expressions. Current large...
- CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance
Classifier-Free Guidance (CFG) has emerged as a central approach for enhancing semantic alignment in flow-based diffusion models. In this paper, we...
- How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference
Many essential manipulation tasks - such as food preparation, surgery, and craftsmanship - remain intractable for autonomous robots. These tasks ar...
- ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation
Achieving autonomous and versatile whole-body loco-manipulation remains a central barrier to making humanoids practically useful. Yet existing appr...
Blog Posts
- A Visual Guide to Quantization
Intuitive visual walkthrough of LLM quantization techniques from FP16 to GPTQ and GGUF.
- Inference at the Edge: Practical Lessons from Deploying LLMs on Consumer GPUs
Real-world tips for efficient local LLM inference with vLLM on consumer hardware.
- Understanding AI and learning outcomes
OpenAI introduces the Learning Outcomes Measurement Suite to assess AI's impact on student learning across diverse educational environments over time.
- PRX Part 3 — Training a Text-to-Image Model in 24h!
- GPT-5.3 Instant System Card