
OpenAI Assistants → Responses migration: the 2026-08-26 sunset (2026)

OpenAI's Assistants API sunsets August 26, 2026. There is no automated Threads → Conversations migration. Teams with tenant-per-assistant architectures face 2-6 weeks of engineering work. Here's the migration map.

OpenAI announced the deprecation of the Assistants API in 2025. The sunset date is August 26, 2026 — about three months from now. After that date, /v1/assistants, /v1/threads, and /v1/runs return errors. There is no extension and no automated migration tool.

The Responses API is the replacement. For teams who built lightly on Assistants — one shared configuration, no persisted threads — the migration is a day. For teams with tenant-per-assistant architectures, persisted thread history, and production file_search workflows, it's 2-6 weeks of engineering work. This page maps the gaps.

The conceptual shift

Assistants modeled the agent as a backend object you registered once and called many times. Responses inverts the model: each call is the unit, and configuration travels with the call.

Assistants concept → Responses equivalent

  • Assistant (persistent backend object) → None. Configuration is per-call.
  • Thread → Conversation (optional, stateful), or stateless with previous_response_id
  • Run → Individual response call
  • instructions field on the Assistant → First message in the conversation, or a system message per call
  • Built-in tools bound to the Assistant (file_search, code_interpreter) → Tools passed per-call
  • File attachments via vector store → Files passed per-call, or via the Conversations API's attachment surface
The implication: anywhere your code referenced an assistant_id, you now reference a configuration bundle that includes instructions, tools, files, and model choice. That bundle has to live somewhere — usually your own database.
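A minimal sketch of that configuration bundle in JavaScript. The field names (model, instructions, tools, input) follow the Responses API request shape, but the tenantConfig object and buildResponseRequest helper are illustrative names, not part of any SDK — verify the exact request fields against the current OpenAI SDK before relying on them.

```javascript
// A tenant's config row, as exported from its old Assistant object.
// In production this comes from your database, not a constant.
const tenantConfig = {
  model: "gpt-4.1",
  instructions: "You are the support agent for Acme Corp.",
  tools: [{ type: "file_search" }],
};

// Build the per-call request: configuration travels with every call,
// replacing what the assistant_id used to reference server-side.
function buildResponseRequest(config, userMessage) {
  return {
    model: config.model,
    instructions: config.instructions, // per-call, not persisted by OpenAI
    tools: config.tools,
    input: [{ role: "user", content: userMessage }],
  };
}

// The actual call would then be:
//   const response = await openai.responses.create(
//     buildResponseRequest(tenantConfig, "Where is my order?")
//   );
```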

The five migration patterns by Assistants usage

Pattern 1: single shared Assistant. One Assistant ID for all users, no per-user state. Migration: extract the Assistant's configuration into a constant, build the configuration into each Responses call, ship. Typically a 1-3 day job.

Pattern 2: shared Assistant + persisted threads. One Assistant ID, but each user has a thread you reference across sessions. Migration: extract the Assistant config (as above), then either (a) move thread history to the Conversations API (OpenAI manages state) or (b) move thread history to your own DB and pass previous_response_id per call. Choice (b) is usually cleaner long-term but requires building a message store. Typically a 1-2 week job.

Pattern 3: tenant-per-assistant. One Assistant per tenant, with that tenant's instructions, files, and tool configuration. This is the painful case. Migration: build a tenant-config table in your DB, export every Assistant's configuration into it, and rewrite the call path to load tenant config per-request and inject it into the Responses call. Vector store binding goes away — files now ship per-call, which means re-architecting whatever was uploading files to the per-tenant vector store. Typically a 3-6 week job.

Pattern 4: tenant-per-assistant + per-user threads. All of Pattern 3 plus persisted thread state per user. Migration: Pattern 3 plus Pattern 2's thread-handling decision. Typically 4-6 weeks.

Pattern 5: heavy file_search usage. Production workflows that depend on per-Assistant vector store binding. Migration: rebuild the file-handling pipeline to attach files per-call, and decide between OpenAI's hosted file_search and a self-hosted vector DB (Pinecone, Qdrant, pgvector). Often the right moment to leave file_search and own the retrieval layer; see /topic/rag-frameworks. Typically 3-5 weeks for the file pipeline alone, on top of whichever other pattern applies.
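Pattern 2's option (b) — stateless calls chained with previous_response_id — can be sketched as follows. The makeChat wrapper and callModel parameter are illustrative; callModel stands in for openai.responses.create, and in production the previousResponseId would be persisted per user in your DB rather than held in a closure.

```javascript
// Chain stateless Responses calls by linking each turn to the prior one
// via previous_response_id, instead of re-sending the whole history.
function makeChat(callModel) {
  let previousResponseId = null; // persist this per user in production

  return async function send(userMessage) {
    const response = await callModel({
      model: "gpt-4.1",
      input: [{ role: "user", content: userMessage }],
      // null on the first turn; the prior response's id afterwards.
      previous_response_id: previousResponseId,
    });
    previousResponseId = response.id;
    return response;
  };
}

// Usage with the real client would look like:
//   const send = makeChat((req) => openai.responses.create(req));
//   await send("hello");
//   await send("and a follow-up");
```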

The export script you have to write

Since there's no automated migration tool, every team writes some version of this script. The minimum shape:

// 1. List every Assistant in your account (the Node SDK auto-paginates).
for await (const a of openai.beta.assistants.list({ limit: 100 })) {
  // 2. Extract the Assistant's configuration into your own tenant-config table.
  const config = {
    id: a.id,
    name: a.name,
    instructions: a.instructions,
    model: a.model,
    tools: a.tools,
    tool_resources: a.tool_resources,
    metadata: a.metadata,
  };
  await db.tenantConfigs.upsert({
    where: { assistantId: a.id },
    create: { ...config, tenantId: a.metadata?.tenantId },
    update: config,
  });

  // 3. For each thread tied to this Assistant, extract the message history.
  const threadsForAssistant = await db.threads.findMany({
    where: { assistantId: a.id },
  });
  for (const t of threadsForAssistant) {
    const messages = await openai.beta.threads.messages.list(t.openaiThreadId);
    await db.conversations.create({
      tenantId: t.tenantId,
      userId: t.userId,
      messages: messages.data.map((m) => ({
        role: m.role,
        content: m.content[0]?.type === 'text' ? m.content[0].text.value : '',
        createdAt: m.created_at,
      })),
    });
  }
}

The complications that turn this into real work:

  • Files attached to threads. The vector store binding goes away. Files need to be re-attached per-call or moved to your own retrieval store.
  • Runs in progress. A thread with an active run at migration time is in an indeterminate state. You either wait for runs to complete or accept that some users get a single "session reset" experience.
  • Tool calls in message history. Assistants' message format embeds tool calls in content; Responses' format is structured differently. The transform has to map cleanly.
  • Custom metadata. Anything in Assistant.metadata or Thread.metadata is yours and has to move into your DB explicitly.
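The content transform behind the third bullet can be sketched as a small pure function. The part shapes (text, image_file) mirror the Assistants message format; flattenAssistantsMessage is an illustrative name, and the bracket placeholders for non-text parts are one policy choice among several — the point is to surface unknown part types rather than silently drop them.

```javascript
// Flatten an Assistants-style message (an array of typed content parts)
// into the plain role/content record a self-owned message store expects.
function flattenAssistantsMessage(message) {
  const text = (message.content || [])
    .map((part) => {
      if (part.type === "text") return part.text.value;
      if (part.type === "image_file") return `[image:${part.image_file.file_id}]`;
      // Surface anything unexpected so the export script can be audited.
      return `[unsupported:${part.type}]`;
    })
    .join("\n");

  return { role: message.role, content: text, createdAt: message.created_at };
}
```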

The instructions-field gap

Assistants let you set instructions once on the Assistant object and have every run inherit them. The Responses API's instructions parameter is per-call only — it is not carried forward when you chain calls with previous_response_id — so the instructions have to be supplied on every turn. Three patterns for handling it:

Inline as a system message per call. Pass the tenant's instructions as a system-role message on every request. Simple, but it costs tokens on every call. OpenAI's prompt caching kicks in automatically for prefixes ≥1024 tokens on most models, so keep the instructions block as a stable prefix. (The same architectural pattern applies to /topic/anthropic-prompt-caching, on a different provider.)

First message in a Conversation. When using the Conversations API for state management, the first user-or-system message becomes the de facto instructions. Subsequent turns reference it via the conversation ID without re-sending.

Developer message role. OpenAI added a developer role (effectively system-level) that survives across conversation turns. For instructions that should persist but stay subordinate to the actual system prompt, this is the right slot.

Most production migrations end up using the first message of a Conversation, because it gives the cleanest token-cost story (the instructions get cached after the first call) and matches the mental model teams are used to.
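The caching-sensitive detail in the inline-system-message option is that the instructions must be a byte-stable prefix of the input. A sketch, with buildCachedPrefixInput as an illustrative helper name (not an SDK function):

```javascript
// Build the input array so the instructions form a stable prefix:
// automatic prompt caching keys on the leading tokens, so anything
// that varies per request (history, the new message) goes after it.
function buildCachedPrefixInput(instructions, history, userMessage) {
  return [
    // Keep this block byte-identical across calls for cache hits.
    { role: "system", content: instructions },
    ...history,
    { role: "user", content: userMessage },
  ];
}

// Usage (assuming an `openai` client and a tenant's instructions string):
//   await openai.responses.create({
//     model: "gpt-4.1",
//     input: buildCachedPrefixInput(instructions, history, "next question"),
//   });
```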

What to keep, what to drop

The temptation during a forced migration is to rewrite everything. Resist it for components that aren't on the migration path.

  • Keep your client-side rendering. The Responses API surfaces messages in a similar shape to Assistants; your UI layer probably doesn't need to change much.
  • Keep your auth, rate limiting, and logging. These are upstream of the API call and survive the swap.
  • Drop the Assistants-specific helpers in your codebase (anything wrapping openai.beta.assistants.* calls). Replace them with direct openai.responses.create() calls or AI SDK equivalents.
  • Reconsider your retrieval architecture. The file_search → per-call-files transition is a forcing function. If you've been wanting to move to a real vector store (pgvector, Qdrant, Pinecone), this is the moment.
  • Reconsider your state-management layer. If you've been storing only OpenAI thread IDs and re-reading from their store, this is the moment to build your own message store. Long-term it's cheaper, faster, and gives you observability you can't get from the OpenAI side.

The cost-of-delay calculation

Three months until sunset means roughly 12 weeks of working time. A 4-week migration with 2 weeks of buffer means starting no later than six weeks before sunset — mid-July 2026. Teams that wait until August will be shipping under deadline pressure with no margin for the unknowns.
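The arithmetic is worth making concrete. A quick sketch (latestStart is an illustrative helper, not from any library): six weeks back from the August 26, 2026 sunset lands on July 15, 2026.

```javascript
// Latest sane start date: sunset minus (migration weeks + buffer weeks).
const SUNSET = new Date("2026-08-26");
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function latestStart(sunset, migrationWeeks, bufferWeeks) {
  return new Date(sunset.getTime() - (migrationWeeks + bufferWeeks) * WEEK_MS);
}

// A 4-week migration plus 2 weeks of buffer:
//   latestStart(SUNSET, 4, 2) → 2026-07-15
```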

The unknowns that bite late-starters:

  • The export script produces unexpected shapes (custom metadata patterns you forgot about)
  • Production thread volume is higher than the dev fixtures suggested
  • File migration takes longer than estimated because the vector store had more state than you remembered
  • Edge cases in tool-call message shape force frontend rework
Start now. The 2-6 week estimate assumes nothing goes sideways; the realistic schedule includes slack for the things that always go sideways.


Frequently asked

When does the OpenAI Assistants API actually sunset?
August 26, 2026 — roughly three months from today. OpenAI announced the deprecation in 2025 and confirmed the hard date. After the sunset, API calls to /v1/assistants, /v1/threads, /v1/runs, and the file_search built-in tool tied to assistants will return errors. The Responses API and the new Conversations API are the replacements. There is no extension and no soft-deprecation window — the date is the date.
What's the equivalent in the Responses API for each Assistants concept?
Threads become Conversations (stateful) or are dropped in favor of stateless responses (where you manage state yourself via previous_response_id). Runs become individual response calls. The 'instructions' field on assistants has no persistent equivalent — you pass instructions per call, as the first message in the conversation, or as a system message on each request. Built-in tools (file_search, code_interpreter) are still available but configured per-call rather than per-assistant. The biggest conceptual shift: Assistants modeled the agent as a persistent backend object; Responses models each call as the unit and pushes state management to your application.
Why is the tenant-per-assistant pattern the painful case?
Teams that created an Assistant per tenant (one Assistant ID per customer, with that tenant's instructions, files, and tool configuration) modeled the Assistant as a long-lived backend object. The Responses API has no equivalent. You can't pre-register a tenant's configuration and then run it later — the configuration has to be passed on each call. Teams in this position need to (a) extract every tenant's Assistant configuration into their own database, (b) rewrite the call path to inject that configuration per-request, and (c) handle the per-call file-attachment story since the Assistants vector store binding goes away. The OpenAI dev forum thread on this is the highest-ranking practitioner signal.
Is there an automated migration tool?
No. OpenAI has explicitly stated they will not provide an automated Threads → Conversations migration. The reason given is that thread state varied significantly across uses — runs in progress, partial tool calls, file attachments, custom metadata — and OpenAI judged the per-team correctness risk too high for a generic tool. Teams have to write their own export-from-Assistants / import-into-Conversations script. The shape of that script depends on what your threads actually contain.
What's the engineering estimate for the migration?
2-6 weeks, depending on architectural shape. The lower end is teams with a single shared Assistant configuration and no persisted thread history — they swap the API call and ship. The upper end is teams with tenant-per-assistant + persisted thread state + custom file_search vector stores + production traffic at scale. The middle of the range (3-4 weeks) is the typical case: one or two Assistant configs, some persisted thread history, some files. Teams should plan for the upper-end estimate and be pleasantly surprised if they ship early.
Should I migrate to Conversations or go stateless?
Depends on your state-management situation. Conversations API gives you OpenAI-side state — useful if you don't want to rebuild thread persistence in your own stack. Stateless + previous_response_id gives you full control — useful if you already persist messages and just want OpenAI to stay out of your state layer. Most teams who built on Assistants assumed OpenAI was their state layer, which makes Conversations the lower-friction migration target. Teams who used the AI SDK or their own message store should go stateless.
