AI Deep Research

AI Research Agents & Methodology

How OpenAI, Anthropic, and Google built their 'deep research' agents, plus the independent reviews, prompt frameworks, and academic surveys that explain why multi-step, multi-agent web research actually works now.

11 articles · Updated 7/5/2026
Curated by @hawking520

About this vault

Deep research agents emerged in late 2024 and early 2025 as a new category of AI product: give the model a question, and it independently plans a research strategy, runs dozens to hundreds of searches, reads primary sources, and returns a cited report instead of a single answer. This collection covers the landscape starting from the builders themselves: OpenAI's original 'Introducing deep research' announcement (built on a browsing-optimized o3 variant), Anthropic's detailed engineering writeup on the orchestrator/subagent architecture behind Claude's Research feature, and Google's two Gemini Deep Research releases (the original Gemini Advanced rollout and the later Deep Research Max evolution with MCP support and native visualizations).

It also includes Simon Willison's independent commentary tracking when these tools crossed from 'promising but hallucinatory' to genuinely useful, TechCrunch's coverage of the competitive benchmark race between Google and OpenAI, two arXiv surveys formalizing Agentic RAG and Deep Research agents as research categories, a practitioner's detailed prompt-engineering template for getting reliable output from these tools, and Together AI's open-source implementation with its own benchmark evaluation. Together, these pieces trace both how deep research agents are built internally and how well they perform in practice.

11 articles

Google launched its deepest AI research agent yet — on the same day OpenAI dropped GPT-5.2

TechCrunch reports on Google's December 2025 release of a reimagined Gemini Deep Research agent built on Gemini 3 Pro, exposed through a new Interactions API so developers can embed the research capability in their own apps. The piece covers Google's new DeepSearchQA benchmark plus results on Humanity's Last Exam and BrowseComp: Google led overall, though ChatGPT 5 Pro edged it out on BrowseComp. It also notes that the announcement landed the same day OpenAI shipped GPT-5.2.

Try Deep Research and our new experimental model in Gemini, your AI assistant

Google's December 2024 announcement of Deep Research rolling out to Gemini Advanced subscribers. It explains that the agentic feature creates a multi-step research plan (which users can revise or approve), then iteratively browses and refines its analysis over a few minutes before producing a cited report exportable to Google Docs, powered by Gemini's reasoning and 1M-token context window.
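The plan-then-browse flow described above can be sketched as an approval gate followed by a bounded refinement loop. This is a minimal illustration under stated assumptions, not Google's implementation: `draft_plan`, the `review` and `browse` callbacks, and the "(refined)" query rewrite are all hypothetical stand-ins for the model's planner, the user's revise/approve step, and live web browsing.

```python
from typing import Callable


def draft_plan(question: str) -> list[str]:
    """Hypothetical planner: a fixed three-step template, not Gemini's real planner."""
    return [
        f"Find background on: {question}",
        f"Collect recent sources on: {question}",
        f"Compare findings on: {question}",
    ]


def run_with_approval(
    question: str,
    review: Callable[[list[str]], list[str]],  # user edits or approves the plan
    browse: Callable[[str], str],              # returns notes, or "" if nothing found
    max_rounds: int = 2,
) -> dict:
    plan = review(draft_plan(question))
    if not plan:
        raise ValueError("research plan was rejected")
    findings: dict[str, str] = {}
    pending = list(plan)
    for round_no in range(max_rounds):
        missing = []
        for step in pending:
            # After the first pass, retry unanswered steps with a rewritten query.
            query = step if round_no == 0 else f"{step} (refined)"
            notes = browse(query)
            if notes:
                findings[step] = notes
            else:
                missing.append(step)
        pending = missing
        if not pending:
            break
    return {"plan": plan, "findings": findings, "unanswered": pending}
```

Passing `review=lambda plan: plan[:2]` would model a user who drops the comparison step before any browsing starts.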

Open Deep Research

Together AI's April 2025 technical blog post presenting their open-source Deep Research implementation: a plan-search-self-reflect-write pipeline that assigns different LLMs to planner, summarizer, JSON-extractor, and report-writer roles, uses Tavily for web retrieval, and generates Mermaid charts, cover images, and podcast audio alongside the report. They benchmark it against LangChain's open_deep_research and Hugging Face's smolagents on FRAMES, SimpleQA, and HotpotQA using an LLM-as-judge, and include an ablation showing that multi-step search meaningfully outperforms single-step RAG.
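A compressed sketch of the plan, search, self-reflect, write loop with per-role models follows. It is an assumption-laden illustration, not Together AI's code: the model names in `ROLES`, the `llm` and `search` callables (the post uses Tavily for retrieval), the prompt strings, and the choice to route reflection through the planner and then parse follow-up queries with the JSON-extractor model are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical role-to-model mapping; the post assigns a different LLM to each role.
ROLES = {
    "planner": "planner-model",
    "summarizer": "summarizer-model",
    "json_extractor": "extractor-model",
    "writer": "writer-model",
}

LLM = Callable[[str, str], str]      # (model name, prompt) -> completion text
Search = Callable[[str], list[str]]  # query -> page texts (Tavily in the original post)


def deep_research(topic: str, llm: LLM, search: Search, max_steps: int = 3) -> str:
    # Plan: the planner proposes initial queries, one per line.
    queries = llm(ROLES["planner"], f"Plan search queries for: {topic}").splitlines()
    notes: list[str] = []
    for _ in range(max_steps):
        # Search + summarize: condense every retrieved page with the summarizer role.
        for query in queries:
            for page in search(query):
                notes.append(llm(ROLES["summarizer"], f"Summarize for '{topic}': {page}"))
        # Self-reflect: the planner names gaps; the extractor turns them into a JSON list.
        gaps = llm(ROLES["planner"], "List knowledge gaps given: " + " | ".join(notes))
        queries = json.loads(
            llm(ROLES["json_extractor"], f"Return a JSON list of queries for: {gaps}")
        )
        if not queries:  # no gaps left, so stop searching
            break
    # Write: the report-writer role synthesizes everything gathered.
    return llm(ROLES["writer"], f"Write a report on {topic} from: " + " | ".join(notes))
```

Setting `max_steps=1` reduces this to a single round of search before writing, roughly the single-step baseline the post's ablation compares against.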

Deep Research Agents: A Systematic Examination And Roadmap

An arXiv survey (submitted Jun 2025, revised Sep 2025) that formally examines 'Deep Research' (DR) agents as a distinct category of autonomous AI systems combining dynamic reasoning, long-horizon planning, multi-hop retrieval, iterative tool use, and structured report generation. It contrasts API-based retrieval with browser-based exploration, reviews tool-use frameworks including MCP integration, proposes a taxonomy of static vs. dynamic workflows and single- vs. multi-agent architectures, and critiques current benchmarks for restricted external-knowledge access, sequential-execution inefficiency, and metrics misaligned with DR agents' actual objectives.

My Ultimate DeepResearch Prompt Builder Template and How I Use It

A practitioner blog post (June 2025) sharing a detailed, fill-in-the-blank prompt template the author feeds into Gemini 2.5 Pro to generate deep-research prompts, then runs through a Deep Research tool. The template forces explicit definition of objective, scope/boundaries, key terms, step-by-step research sub-tasks, output format, an assigned analyst persona, source-quality requirements, and a mandatory AI self-review checklist before the report is delivered — aimed at turning vague research requests into structured, evidence-backed reports.
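The template's sections map naturally onto a small structured type that refuses to render until every blank is filled. This is a hypothetical sketch, not the author's template: the `ResearchBrief` class, its field names, and the section headings it emits are illustrative assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class ResearchBrief:
    objective: str
    scope: str                      # boundaries: what is in and out of bounds
    key_terms: dict[str, str]       # term -> working definition
    subtasks: list[str]             # ordered research steps
    output_format: str
    persona: str                    # analyst role the model should adopt
    source_requirements: list[str]
    review_checklist: list[str]     # self-review the model must complete before answering

    def render(self) -> str:
        # Refuse to build a prompt with any blank section.
        for fld in fields(self):
            if not getattr(self, fld.name):
                raise ValueError(f"missing section: {fld.name}")

        def bullets(items) -> str:
            return "\n".join(f"- {item}" for item in items)

        terms = bullets(f"{term}: {meaning}" for term, meaning in self.key_terms.items())
        steps = "\n".join(f"{i}. {step}" for i, step in enumerate(self.subtasks, 1))
        checks = "\n".join(f"- [ ] {item}" for item in self.review_checklist)
        return (
            f"Act as {self.persona}.\n\n"
            f"OBJECTIVE\n{self.objective}\n\n"
            f"SCOPE\n{self.scope}\n\n"
            f"KEY TERMS\n{terms}\n\n"
            f"RESEARCH STEPS\n{steps}\n\n"
            f"SOURCE REQUIREMENTS\n{bullets(self.source_requirements)}\n\n"
            f"OUTPUT FORMAT\n{self.output_format}\n\n"
            f"BEFORE DELIVERING, VERIFY\n{checks}"
        )
```

In the workflow the post describes, the filled-in template goes to Gemini 2.5 Pro, which drafts the prompt that is then run through a Deep Research tool.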

Introducing deep research

OpenAI's original February 2025 announcement of deep research, an agentic ChatGPT capability built on a version of o3 optimized for web browsing and data analysis. It describes how the agent independently searches, reads, and synthesizes hundreds of online sources into a fully cited report, typically in 5 to 30 minutes, and positions it as a step toward AGI capable of producing novel research.

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

An arXiv survey (submitted Jan 2025, revised through v4 in 2026) arguing that traditional RAG's static retrieve-then-generate workflow can't handle multi-step reasoning, and proposing Agentic RAG — RAG pipelines augmented with autonomous agents doing reflection, planning, tool use, and multi-agent collaboration. The paper introduces a taxonomy of Agentic RAG architectures by agent cardinality, control structure, autonomy, and knowledge representation, compares design trade-offs across existing frameworks, surveys applications in healthcare, finance, education, and enterprise document processing, and lists open challenges in evaluation, coordination, memory management, efficiency, and governance.

How we built our multi-agent research system

Anthropic's engineering writeup on building Claude's Research feature, an orchestrator-worker multi-agent system in which a LeadResearcher spawns parallel subagents that each search independently before a CitationAgent adds sourcing. It reports a 90.2% improvement over single-agent Opus 4 on internal evals, finds that token usage, the number of tool calls, and model choice account for most of the performance variance, and details prompt-engineering and production-reliability lessons (rainbow deployments, LLM-as-judge evals, human testing) learned moving from prototype to production.
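The orchestrator-worker pattern can be sketched with `asyncio`: a lead function decomposes the question, runs one subagent per subtopic concurrently, and hands the merged findings to a separate citation pass. This is a toy under explicit assumptions, not Anthropic's code: `decompose`, `search`, `Finding`, and the numbered-citation format are hypothetical, and in the real system the lead agent and subagents are LLM calls with their own prompts and tools.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Finding:
    subtopic: str
    claim: str
    source: str


# Hypothetical search tool: subtopic -> list of (source URL, extracted claim).
SearchFn = Callable[[str], Awaitable[list[tuple[str, str]]]]


async def subagent(subtopic: str, search: SearchFn) -> list[Finding]:
    # Each worker explores its own subtopic independently of the others.
    return [Finding(subtopic, claim, url) for url, claim in await search(subtopic)]


def cite(findings: list[Finding]) -> str:
    # Citation pass (the CitationAgent's role): number each unique source, tag claims.
    index = {url: i for i, url in enumerate(sorted({f.source for f in findings}), 1)}
    body = "\n".join(f"{f.claim} [{index[f.source]}]" for f in findings)
    refs = "\n".join(f"[{i}] {url}" for url, i in index.items())
    return f"{body}\n\n{refs}"


async def lead_researcher(
    question: str, decompose: Callable[[str], list[str]], search: SearchFn
) -> str:
    subtopics = decompose(question)
    # Fan out: all subagents run concurrently; gather preserves subtopic order.
    batches = await asyncio.gather(*(subagent(t, search) for t in subtopics))
    return cite([f for batch in batches for f in batch])
```

Keeping citation as a separate pass means subagents only have to return claims and URLs; attribution is settled once, over the merged findings.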

Deep Research Max: a step change for autonomous research agents

Google's 2026 announcement of two new Gemini Deep Research API agents built on Gemini 3.1 Pro: a low-latency 'Deep Research' for interactive use and 'Deep Research Max' for exhaustive, high-quality asynchronous reports. It details new MCP support for connecting proprietary data sources, native chart/infographic generation, collaborative research-plan review, and claims Max consults more sources and surfaces nuances the December release often missed.

AI assisted search-based research actually works now

Simon Willison's April 2025 verdict that AI-assisted search-based research has finally crossed from disappointing to genuinely useful, after two-plus years of skepticism. He compares Deep Research implementations from Google Gemini, OpenAI, and Perplexity against the newer search-enabled o3/o4-mini models (which run searches inline during chain-of-thought reasoning), argues that Google's and Anthropic's search integrations lag behind, and gives a concrete example of using o4-mini-high to port code to a new SDK via live search.

Anthropic: How we built our multi-agent research system

Simon Willison's link-blog commentary on Anthropic's multi-agent research system post, in which he says the piece cured his prior skepticism of multi-agent LLM systems. He pulls out the key technical points: the orchestrator/subagent architecture, the 90.2% eval improvement over single-agent Opus 4, multi-agent systems' roughly 15x token usage compared with ordinary chat interactions, the Memory mechanism for surviving context truncation, and the OODA research loop used by subagents. He calls it the most practical writeup he has seen on multi-agent system design.
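The Memory idea, persisting the plan outside the context window so that truncation cannot erase it, can be illustrated with a toy context holder. Everything here is an assumption for illustration: the `LeadContext` class, the whitespace word count standing in for a real tokenizer, and the tiny budget; neither post publishes the implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LeadContext:
    budget: int  # max tokens kept in the live context (hypothetical, tiny for demo)
    memory: dict[str, str] = field(default_factory=dict)  # survives truncation
    messages: list[str] = field(default_factory=list)     # live, truncatable context

    @staticmethod
    def tokens(text: str) -> int:
        return len(text.split())  # crude word-count proxy for a real tokenizer

    def save_plan(self, plan: str) -> None:
        self.memory["plan"] = plan

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Drop the oldest messages until the context fits; the plan is not stored here.
        while sum(map(self.tokens, self.messages)) > self.budget and len(self.messages) > 1:
            self.messages.pop(0)

    def prompt(self) -> str:
        # Re-inject the saved plan on every call so truncation never loses it.
        return "\n".join([f"PLAN: {self.memory.get('plan', '(none)')}", *self.messages])
```

With `budget=5`, adding "one two three" and then "four five six" evicts the first message, yet `prompt()` still opens with the saved plan.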

Frequently asked questions

What is the AI Deep Research vault?

AI Deep Research is a Burn 451 vault focused on AI research agents and methodology: how OpenAI, Anthropic, and Google built their 'deep research' agents, plus the independent reviews, prompt frameworks, and academic surveys that explain why multi-step, multi-agent web research actually works now.

How was the AI Deep Research vault curated?

The AI Deep Research vault was hand-curated by the Burn 451 editorial team from publicly available product announcements, engineering blog posts, news coverage, and arXiv papers. Each piece includes an AI-generated summary so readers can triage in seconds. The vault auto-syncs as new material on AI deep research is published.

How many articles are in the AI Deep Research vault?

The AI Deep Research vault currently contains 11 curated pieces organized by topic, not chronology. Each article has an AI summary and a direct link to the original source. Items are refreshed hourly through Burn 451's ISR pipeline, so new publications appear within a day.

How do I use this vault with Claude or Cursor?

Install the burn-mcp-server package from npm and connect it to Claude, Cursor, or any MCP-compatible AI tool. The vault becomes queryable as live context — your AI can search, summarize, and cite articles from AI Deep Research directly in conversation without manual copy-paste or re-uploading files.
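Claude Desktop and Cursor both read MCP server definitions from a JSON object keyed by `mcpServers`, so connecting the package amounts to adding one entry. The helper below is a hedged sketch: the `burn` entry name and the assumption that `burn-mcp-server` launches via `npx -y` with no extra arguments or credentials are not confirmed by this vault, so check the package's own README before relying on it.

```python
import copy
import json


def add_burn_server(config: dict) -> dict:
    """Return a copy of an MCP client config with a Burn 451 server entry added."""
    updated = copy.deepcopy(config)                 # leave the caller's config untouched
    servers = updated.setdefault("mcpServers", {})  # keep any servers already configured
    # Assumption: the npm package runs via npx with no arguments or API key.
    servers["burn"] = {"command": "npx", "args": ["-y", "burn-mcp-server"]}
    return updated


if __name__ == "__main__":
    print(json.dumps(add_burn_server({}), indent=2))
```

The resulting object would be saved as Claude Desktop's `claude_desktop_config.json` or Cursor's `mcp.json`, merging with whatever servers those files already list.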

What is Burn 451?

Burn 451 is a read-later app built around a 24-hour burn timer that forces daily triage. Articles you save must be read, vaulted, or released within 24 hours. The Vault layer — including this AI Deep Research collection — holds permanent curated reading lists for AI thought leaders, founders, and researchers.

Content attributed to original authors. Burn 451 curates publicly available writing as a reading index. For removal requests, contact @hawking520.