For years, the "more instructions = better results" approach to prompting dominated. Long system prompts, all-caps constraints, exhaustive rule lists — if something wasn't working, you'd add more words. That approach is now explicitly called out by OpenAI as a problem for their newer models.

The guidance is split by model: GPT-5.5 has its own section, GPT-5.4 has another, and Codex (the agentic coding model) gets a detailed breakdown for software engineering workflows. There's also a short set of universal patterns that apply across all three. Let's go through each.

GPT-5.5: Less Is More

GPT-5.5 / GPT-5.5 Pro

GPT-5.5 is where the guidance is most surprising. The model is described as performing better with shorter, outcome-focused prompts — not the lengthy instruction stacks that became standard with GPT-4. If your system prompt is over 300 words, it's probably hurting more than it's helping on this model.

1. Define the outcome, not the process

Instead of describing every step a model should take, describe what a successful output looks like. GPT-5.5 is capable enough to infer the process — it doesn't need to be hand-held through each step.

Stop doing this: "First, read the user's message carefully. Then identify the main intent. Then consider all possible responses. Then write a response that is helpful, professional, and concise…"
Do this instead: "You are a customer support agent for [Company]. Resolve issues concisely. Escalate only when you cannot solve the problem directly."

2. Separate personality from task instructions

OpenAI specifically recommends separating personality/tone definitions from task-specific instructions. Mixing them creates confusion about which constraint takes priority.

// PERSONALITY — define once, applies globally
Tone: Direct, warm, no filler phrases ("Great question!")
Style: Short paragraphs. Bullet points for lists of 3+.
Collaboration: Ask one clarifying question if intent is ambiguous.

// TASK — separate block
Task: Help users debug their code. Provide working solutions.

3. Use decision rules instead of absolute constraints

The guidance explicitly says to avoid "ALWAYS" and "NEVER" except for true invariants. These create rigid behavior that breaks on edge cases. Instead, write decision rules that give the model judgment.

❌ Absolute rule: "NEVER discuss pricing"
Breaks when users ask innocent questions like "is this free?" — the model refuses unhelpfully.
✓ Decision rule: "Redirect pricing questions to sales"
The model answers in context ("I can't quote prices, but our sales team at sales@co.com can help") rather than refusing.

❌ Absolute rule: "ALWAYS ask before acting"
Creates endless confirmation loops even for trivial, reversible actions.
✓ Decision rule: "Confirm before irreversible actions"
The model acts autonomously for low-stakes tasks and confirms only when it matters.

4. Set retrieval budgets with explicit stopping conditions

When building RAG systems or agents that search for information, GPT-5.5 needs a clear stopping rule. Without one, it loops on searches trying to find marginally better results. The guidance recommends: "Search up to 3 times. If the result is sufficient after 2, stop."
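The same stopping rule can be enforced in the orchestration code around the model. As a sketch (with `run_search` and `good_enough` as hypothetical stand-ins for your retrieval call and sufficiency check):

```python
# Sketch of a retrieval budget with an explicit stopping condition.
# `run_search` and `good_enough` are hypothetical stand-ins for your
# retrieval call and your relevance/sufficiency check.

def retrieve_with_budget(query, run_search, good_enough, max_searches=3):
    """Search up to `max_searches` times; stop early once results suffice."""
    results = []
    for attempt in range(max_searches):
        results.extend(run_search(query, attempt))
        if good_enough(results):  # sufficiency check: stop, don't optimize
            break
    return results
```

The point mirrors the prompt-level rule: a hard budget plus an early-exit condition, so the loop never chases marginally better results.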

5. Mark assumptions in creative work

When generating creative content that blends facts with invented content, the guidance recommends explicitly instructing the model to mark assumptions. This prevents confident hallucinations dressed as research.

GPT-5.4: Output Contracts & Verification Loops

GPT-5.4 / GPT-5.4 mini / GPT-5.4 nano

GPT-5.4 guidance focuses heavily on structured outputs and verification. The model is designed to follow explicit output contracts and to check its own work — but only if you build that into the prompt.

The Output Contract Pattern

Instead of hoping the model formats its response correctly, define an explicit contract at the start of the system prompt:

// Output contract — paste at start of system prompt
Format: Respond with three sections: Summary (2 sentences), Analysis (3-5 bullets), Recommendation (1 sentence).
Length: Total response under 200 words.
Tone: Professional, no hedging ("it seems like", "perhaps").
Never: Add unsolicited caveats about AI limitations.

This pattern is different from telling the model what to do — it tells the model what the finished output looks like. GPT-5.4 follows these contracts far more reliably than format instructions buried in paragraphs.
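If you manage contracts for several apps, it can help to build them programmatically so the contract always lands at the top of the system prompt. A minimal sketch (field names are illustrative, not an OpenAI API):

```python
# Sketch: assemble the output-contract block in code so it always sits
# at the top of the system prompt. Field names here are illustrative.

def output_contract(format_spec, length, tone, never):
    lines = [
        "// Output contract",
        f"Format: {format_spec}",
        f"Length: {length}",
        f"Tone: {tone}",
        f"Never: {never}",
    ]
    return "\n".join(lines)

def build_system_prompt(contract, task_instructions):
    # Contract first, task instructions after: the ordering is the pattern.
    return contract + "\n\n" + task_instructions
```

Keeping the contract as a separate value also makes it easy to version and reuse across prompts.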

The Verification Loop

For high-stakes outputs, OpenAI recommends building a lightweight self-check into the prompt. The model checks four things before finalizing:

  1. Is the content factually grounded in the provided sources?
  2. Does it match the requested format?
  3. Is the length within specified limits?
  4. Does it avoid the forbidden patterns listed in the prompt?
Prompt addition: "Before responding, silently verify: (1) all claims are grounded in provided context, (2) output matches the format contract, (3) response is under [X] words. Only output the final answer."
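Checks 2-4 are structural, so you can also run them on the application side after generation. A sketch of such a verifier (the grounding check needs your retrieval context, so it is a caller-supplied predicate here):

```python
# Sketch of a post-generation verifier mirroring the four checks above.
# Grounding (check 1) depends on your retrieval context, so it's passed
# in as a predicate; the structural checks run locally.

def verify_output(text, required_sections, max_words, forbidden,
                  grounded=lambda t: True):
    checks = {
        "grounded": grounded(text),                                   # check 1
        "format": all(s in text for s in required_sections),          # check 2
        "length": len(text.split()) <= max_words,                     # check 3
        "no_forbidden": not any(p.lower() in text.lower()
                                for p in forbidden),                  # check 4
    }
    return all(checks.values()), checks
```

A failed check can trigger a single retry with the failing item named, which is usually cheaper than a longer prompt.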

The Follow-Through Default

One of the most practical additions for agentic apps: define a follow-through rule to prevent the model from stopping to ask permission at every step.

// Follow-through default for agentic apps
"If the user's intent is clear and the next step is reversible and low-risk, proceed without asking for confirmation. Only pause when an action is irreversible or outside the defined scope."

Mini vs Nano: What changes

| Model | Requires | Best for | Avoid |
| --- | --- | --- | --- |
| GPT-5.4 | Minimal explicit structure | Complex, multi-step tasks | Overly rigid format constraints |
| GPT-5.4 mini | More explicit structure | Bulk processing, classification | Ambiguous instructions |
| GPT-5.4 nano | Very explicit, narrow scope | Single-task, well-defined jobs | Multi-step reasoning chains |

Reasoning effort as a tuning knob

GPT-5.4's API exposes a reasoning_effort parameter. OpenAI's guidance: start at "none" or "low" for execution tasks; only increase to "medium" or above when the task requires genuine reasoning (multi-step analysis, math, complex debugging). Higher effort costs more and takes longer — don't default to maximum.
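One way to avoid defaulting to maximum is to map task categories to effort levels in code. A sketch (the task categories are illustrative, and the model name and effort values follow the article's description of the API rather than a verified SDK reference):

```python
# Sketch: choose reasoning_effort per task type instead of defaulting high.
# Task categories are illustrative; effort values follow the article.

EFFORT_BY_TASK = {
    "extraction": "low",       # execution tasks: keep effort minimal
    "classification": "low",
    "summarization": "low",
    "debugging": "medium",     # genuine multi-step reasoning
    "math": "high",
}

def pick_effort(task_type):
    # Default cheap; escalate deliberately, not by habit.
    return EFFORT_BY_TASK.get(task_type, "low")

# Hedged usage with an OpenAI-style client (model name from the article):
# client.chat.completions.create(
#     model="gpt-5.4",
#     reasoning_effort=pick_effort("math"),
#     messages=[...],
# )
```

This keeps the cost/latency decision explicit and auditable instead of buried in per-call arguments.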

GPT-5.3 Codex: Agentic Code Engineering

GPT-5.3 Codex

Codex gets its own section because agentic coding workflows are categorically different from chat. The guidance here is the most detailed — and the most actionable if you're building or using AI coding tools.

Tool hierarchy: always prefer specialized tools

Codex has a preferred order for operations. The guidance says to configure it with this hierarchy: specialized tools first (apply_patch, git, search tools), shell commands last. This prevents fragile string-manipulation workarounds when a proper tool exists.

Parallel tool calling by default

One of the biggest performance improvements: instruct Codex to batch independent operations rather than read files sequentially. Reading 5 files one by one is slow; reading all 5 in parallel is fast.

// Add to your Codex system prompt
"When exploring a codebase, read multiple relevant files in parallel rather than sequentially. Batch all independent operations."
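If you're building the tool layer yourself, the same batching idea applies on your side of the boundary. A minimal sketch of concurrent file reads using the standard library:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Sketch of the batching idea on the tool side: service a batched
# read request by fetching independent files concurrently.

def read_files_parallel(paths, max_workers=8):
    def read_one(p):
        return p, Path(p).read_text()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(read_one, paths))
```

For local files the win is modest, but for network-backed storage or remote repos the latency difference between sequential and batched reads is large.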

Autonomy bias: proceed, don't pause

Codex is designed for agentic work. The guidance explicitly recommends an autonomy bias: gather context, plan, implement, test, and refine without asking for additional prompts. Pausing to confirm every decision defeats the purpose of an autonomous coding agent.

Key prompt addition: "When the task is clear and the implementation path is reasonable, proceed without asking for confirmation. Deliver a working solution with documented assumptions rather than stopping for clarifications."

Git safety rules are non-negotiable

The guidance is unambiguous: never use destructive git commands (reset --hard, force push, branch -D) without explicit user approval. This should be hardcoded in every Codex system prompt.

// Git safety — mandatory in all Codex prompts
"Never run destructive git commands (git reset --hard, git push --force, git clean -f, git branch -D) without explicit user approval. When in doubt about a git operation, describe the intended action and ask first."
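Prompt rules can be ignored; a guard in the shell tool cannot. A sketch of a tool-side check (the destructive-pattern list mirrors the prompt above; the dispatch to the actual shell is left out):

```python
import shlex

# Sketch of a tool-side guard: refuse destructive git invocations unless
# the user explicitly approved. Pattern list mirrors the prompt rule.

DESTRUCTIVE = [
    ["reset", "--hard"],
    ["push", "--force"],
    ["push", "-f"],
    ["clean", "-f"],
    ["branch", "-D"],
]

def is_destructive_git(command: str) -> bool:
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "git":
        return False
    rest = tokens[1:]
    return any(all(flag in rest for flag in pattern) for pattern in DESTRUCTIVE)

def run_git(command: str, approved: bool = False):
    if is_destructive_git(command) and not approved:
        raise PermissionError(f"Destructive git command needs approval: {command}")
    # ... hand the command off to the shell tool here
```

Enforcing the rule in both places (prompt and tool) means a misbehaving model run still can't force-push.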

Frontend work: no generic layouts

A surprisingly specific instruction for frontend tasks: use intentional design. Avoid placeholder gradients, generic card layouts, and default color schemes. The model should make real visual choices — typography, spacing, color — rather than producing a "looks like every Bootstrap site" output. If you're using Codex for UI work, explicitly say: "Use an intentional, professional design. No generic layouts."

DRY enforcement at model level

Before adding any new function or helper, Codex should search for prior implementations in the codebase. The guidance recommends: "Search for existing implementations before writing new ones. Extract and reuse shared code rather than duplicating."
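The "search before writing" step can also be a tool the agent calls. A simplified sketch (only Python/JS-style `def`/`function` definitions are matched here; a real code-search tool would cover far more):

```python
import re
from pathlib import Path

# Sketch: before adding a helper, scan the repo for an existing definition.
# Matches only `def`/`function`-style declarations in .py/.js/.ts files —
# a simplification of what a real code-search tool would do.

def find_existing_defs(root, name):
    pattern = re.compile(rf"\b(def|function)\s+{re.escape(name)}\b")
    hits = []
    for path in Path(root).rglob("*.*"):
        if path.suffix in {".py", ".js", ".ts"}:
            for lineno, line in enumerate(path.read_text().splitlines(), 1):
                if pattern.search(line):
                    hits.append((str(path), lineno))
    return hits
```

If the search returns hits, the agent's next step is to read and reuse them, not to write a duplicate.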

Universal Patterns: Across All Models

Beyond model-specific guidance, OpenAI identifies patterns that improve results regardless of which model you're using:

Universal #1: Define success criteria first
Explicitly state what a good output looks like before describing the task. "A good response will be under 3 sentences, reference the user's specific situation, and propose one concrete next step."

Universal #2: Use modular constraint blocks
Structure prompts as separate blocks: personality, output contract, verification rules, tool rules. This makes prompts easier to test and update without breaking other parts.

Universal #3: Test one variable at a time
When a prompt isn't working, change one thing and test again. Changing multiple things simultaneously makes it impossible to know what actually helped.

Universal #4: Make completion visible
For multi-step tasks, include a completion checklist. The model tracks which items are Done, Blocked, or Cancelled and marks them before finishing.

What This Means for Your Existing Prompts

If you're using prompts that were written for GPT-4 or GPT-4o, most of them will still work — but you're leaving performance on the table. Here's where to start:

The single highest-leverage change: Add an explicit output contract to the top of your system prompt. Most models, including GPT-4o, respond better to defined output structure — and for GPT-5.4, it's the difference between reliable and unreliable formatting.

The Bigger Picture

The core insight running through all of OpenAI's guidance is a shift from prescriptive to outcome-oriented prompting. Older models needed to be walked through every step. GPT-5.x models are capable of figuring out the steps — what they need is a clear picture of the destination.

The irony is that better models require shorter, more confident prompts. Trust the model more, control it less, and define "done" very clearly. That's the pattern OpenAI is pointing toward with every generation.

For teams managing multiple prompts across different models and use cases, this also highlights why prompt versioning and organization matters. The right GPT-5.5 prompt looks different from the right GPT-5.4 prompt — which looks different from the right Codex prompt. If all your prompts are hardcoded, updating them as guidance evolves is painful. If they're organized and versioned, it's a 5-minute update.

Save, organize, and version your prompts

PromptChief is built for exactly this — store prompt variants per model, track what works, and reuse your best system prompts across every project. Free to start.

Open PromptChief Free →