Structured Output Prompting: Get JSON, Tables, and Formatted Data From Any AI
“Give me the data in JSON format” works about 60% of the time. The other 40%, you get malformed JSON, extra commentary, missing fields, or the model deciding to explain why JSON is a great format instead of actually producing it.
Structured output prompting is the set of techniques that push that 60% to 95%+. If you’re building anything that programmatically consumes AI output, this is essential.
Why Models Struggle With Structure
LLMs generate text token by token. They don't have a built-in concept of "valid JSON"; they're predicting the next most likely token based on patterns in their training data. This means:
- They might forget to close a bracket
- They might add a trailing comma (invalid JSON)
- They might wrap the JSON in markdown code blocks you didn’t ask for
- They might add explanatory text before or after the JSON
Understanding this helps you write prompts that work with the model’s tendencies, not against them.
The Reliable JSON Prompt Template
This template works consistently across GPT-4, Claude, Gemini, and most capable models:
Extract the following information from the text below.
Respond with ONLY a JSON object. No explanation, no markdown, no additional text.
Required fields:
- name (string): The person's full name
- email (string): Email address, or null if not found
- company (string): Company name, or null if not found
- role (string): Job title, or null if not found
Example output:
{"name": "Jane Smith", "email": "[email protected]", "company": "Acme Inc", "role": "CTO"}
Text to extract from:
"""
{input_text}
"""
Why each element matters:
- "Respond with ONLY a JSON object": explicit instruction to suppress commentary
- Field definitions with types: the model knows exactly what to produce
- Null handling: tells the model what to do when data is missing (instead of guessing)
- Example output: shows the exact format expected
- Delimited input: triple quotes separate the instruction from the data
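In code, the template is just string assembly around the input. A minimal sketch (the function name is illustrative):

```python
def build_extraction_prompt(input_text: str) -> str:
    """Assemble the reliable JSON prompt template around the input text."""
    return (
        "Extract the following information from the text below.\n"
        "Respond with ONLY a JSON object. No explanation, no markdown, "
        "no additional text.\n\n"
        "Required fields:\n"
        "- name (string): The person's full name\n"
        "- email (string): Email address, or null if not found\n"
        "- company (string): Company name, or null if not found\n"
        "- role (string): Job title, or null if not found\n\n"
        "Example output:\n"
        '{"name": "Jane Smith", "email": "[email protected]", '
        '"company": "Acme Inc", "role": "CTO"}\n\n'
        'Text to extract from:\n"""\n' + input_text + '\n"""'
    )
```

Keeping the template in one place also means every extraction call sends exactly the same instructions, which makes output drift easier to diagnose.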
Beyond JSON: Other Structured Formats
CSV Output
Convert the following data into CSV format.
Output ONLY the CSV data with a header row. No explanation.
Use commas as delimiters. Wrap fields containing commas in double quotes.
Header: Name, Email, Department, Start Date
Data:
{input}
Markdown Tables
Organize this information into a markdown table.
Output ONLY the table. No text before or after.
Columns: Feature | Free Plan | Pro Plan | Enterprise
Sort rows by feature name alphabetically.
Information:
{input}
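If downstream code consumes the table rather than a human, a small hand-rolled parser (no dependencies) can turn it back into records:

```python
def parse_markdown_table(table: str) -> list[dict]:
    """Convert a markdown table into a list of row dicts keyed by header."""
    lines = [ln.strip() for ln in table.strip().splitlines() if ln.strip()]
    # Split each pipe-delimited row into cells, dropping the edge pipes
    rows = [[cell.strip() for cell in ln.strip("|").split("|")] for ln in lines]
    header, body = rows[0], rows[1:]
    # Drop the |---|---| separator row (cells made only of -, :, and spaces)
    body = [r for r in body if not all(set(c) <= set("-: ") for c in r)]
    return [dict(zip(header, r)) for r in body]
```

This assumes well-formed pipes and no escaped `|` characters inside cells, which holds for the constrained output the prompt above requests.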
YAML
Convert this configuration into valid YAML.
Output ONLY the YAML. No code fences, no explanation.
Use 2-space indentation. Include comments for non-obvious values.
Configuration:
{input}
Validation Strategies
Never trust AI output blindly. Always validate.
Python JSON Validation
import json
import re

def parse_ai_json(response: str) -> dict | None:
    """Parse a model response into a dict, repairing common JSON issues."""
    # Strip markdown code fences if present
    cleaned = response.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[1]  # Drop the opening fence line
    if cleaned.endswith("```"):
        cleaned = cleaned[:-3]
    cleaned = cleaned.strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Try to fix common issues:
    # trailing comma before a closing brace or bracket
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    # single quotes instead of double (crude; can break apostrophes in values)
    cleaned = cleaned.replace("'", '"')
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None
def validate_schema(data: dict, required_fields: dict) -> list[str]:
    """Validate that output matches the expected schema."""
    errors = []
    for field, expected_type in required_fields.items():
        if field not in data:
            errors.append(f"Missing field: {field}")
        elif not isinstance(data[field], expected_type) and data[field] is not None:
            errors.append(f"Wrong type for {field}: expected {expected_type.__name__}")
    return errors
Retry With Error Feedback
When validation fails, feed the error back to the model:
Your previous response was not valid JSON. The error was:
{error_message}
Your response was:
{previous_response}
Please fix the JSON and respond with ONLY the corrected JSON object.
This self-correction loop fixes most issues in one retry.
Advanced Techniques
Schema Enforcement With Pydantic
For production systems, define your expected output as a Pydantic model:
from pydantic import BaseModel, Field

class ExtractedContact(BaseModel):
    name: str = Field(description="Full name")
    email: str | None = Field(default=None, description="Email address")
    company: str | None = Field(default=None, description="Company name")
    role: str | None = Field(default=None, description="Job title")

# Include the schema in your prompt
schema_str = ExtractedContact.model_json_schema()
prompt = f"Extract contact info. Output must conform to this JSON schema:\n{schema_str}"

# Later, parse and validate the response in one step:
# contact = ExtractedContact.model_validate_json(response_text)
Function Calling / Tool Use
Most modern APIs (OpenAI, Anthropic, Google) support function calling, which forces the model to output structured data matching a predefined schema. This is more reliable than prompt-based approaches.
# OpenAI function calling example
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": f"Extract contact info from: {text}"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "save_contact",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "company": {"type": "string"},
                },
                "required": ["name"],
            },
        },
    }],
)
Function calling gives you schema validation at the API level: the model is constrained to produce valid output.
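Even with function calling, the arguments arrive as a JSON string you still parse yourself. A sketch (in the OpenAI client, that string lives at `response.choices[0].message.tool_calls[0].function.arguments`; here it's simulated with a literal):

```python
import json

def extract_tool_arguments(arguments_json: str) -> dict:
    """Tool-call arguments are delivered as a JSON string; parse them to a dict."""
    return json.loads(arguments_json)

# Simulated arguments string, as a tool call would deliver it:
args = extract_tool_arguments('{"name": "Jane Smith", "company": "Acme Inc"}')
```

The API guarantees the string matches your schema's structure, so this parse rarely fails, but keeping it behind one function gives you a single place to add logging or fallbacks.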
Batch Processing
When extracting structured data from multiple items, process them one at a time rather than asking for an array of 50 objects. Models are more reliable with single-item extraction.
results = []
for item in items:
    result = extract_single(item)  # One API call per item
    if validate(result):
        results.append(result)
    else:
        results.append(retry_extract(item))
More API calls, but dramatically higher accuracy.
Common Failure Modes
1. The Chatty Model. “Sure! Here’s the JSON you requested:” followed by the actual JSON. Fix: “Respond with ONLY the JSON. Your entire response must be parseable as JSON.”
2. Markdown Wrapping. The model wraps output in ```json code fences. Fix: “Do not use code fences or markdown formatting.” Or just strip them in post-processing.
3. Hallucinated Fields. The model adds fields you didn’t ask for. Fix: “Include ONLY the fields listed above. Do not add any additional fields.”
4. Inconsistent Null Handling. Sometimes null, sometimes empty string, sometimes “N/A”. Fix: “Use null for missing values. Do not use empty strings, ‘N/A’, or ‘unknown’.”
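The null-handling failure mode is also cheap to defend against in post-processing. A sketch that normalizes common "missing" spellings to a real None (the marker set is ours; extend it as you see new variants):

```python
MISSING_MARKERS = {"", "n/a", "na", "none", "null", "unknown"}

def normalize_nulls(record: dict) -> dict:
    """Map empty strings and 'N/A'-style placeholder values to None."""
    out = {}
    for key, value in record.items():
        if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
            out[key] = None
        else:
            out[key] = value
    return out
```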
Key Takeaways
- Always include an example of the exact output format you want.
- Use “Respond with ONLY [format]” to suppress commentary.
- Define every field with its type and null behavior.
- Validate and retry: never trust raw AI output in production.
- For production systems, use function calling / tool use over prompt-based formatting.
Structured output is where prompt engineering meets software engineering. Get it right, and AI becomes a reliable data processing tool. Get it wrong, and you're writing regex to parse free-form text, which is exactly what you were trying to avoid.