A technical AEO audit is the foundation of any AI Engine Optimisation strategy. Before you optimise content or build a query bank, you need to ensure AI engines can actually access, parse, and understand your site. This checklist covers the 15 essential items, grouped into five categories: crawler access, structured data, content architecture, AI-specific files, and monitoring.
Many of these items take minutes to implement. Others require coordination with your engineering team. The checklist is ordered by impact and dependency — start at the top, and each subsequent item builds on the foundation laid by the previous ones.
Use this as a quarterly audit tool. AI engine capabilities evolve rapidly, and new crawlers and retrieval patterns emerge regularly. A site that passes this audit today should be re-audited every 90 days to ensure continued compliance. The goal is not a perfect score — it is a systematic process that keeps your brand visible to the AI engines that are reshaping how people discover products and services.
Category 1: Crawler Access
If AI crawlers cannot reach your content, nothing else in your AEO strategy matters. These three items ensure your site is accessible to the bots that feed AI engines like ChatGPT and Perplexity.
[01] robots.txt allows AI crawlers. Verify that your robots.txt does not block GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, Applebot-Extended, or Google-Extended. Many sites inadvertently block these user agents with broad disallow rules. Check each one explicitly — a blanket Disallow: / under User-agent: * will silently cut you off from every AI engine that lacks its own explicitly allowed group.
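This check can be automated offline with Python's standard-library robots.txt parser. A minimal sketch — the sample robots.txt body below is hypothetical, allowing only GPTBot and blocking everything else:

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Applebot-Extended", "Google-Extended",
]

def audit_robots(robots_body: str, path: str = "/") -> dict:
    """Return {agent: allowed?} for each AI crawler against a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_AGENTS}

# Hypothetical robots.txt: GPTBot has its own allow group,
# every other crawler falls through to the blanket disallow.
sample = """
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""
result = audit_robots(sample)
```

Run this against your live robots.txt body for each key path, not just `/` — a per-directory disallow can block AI crawlers from your blog while leaving the homepage open.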
[02] No IP-based blocking of AI ranges. Some WAFs and CDN configurations block requests from cloud IP ranges that AI crawlers use. Verify that your infrastructure does not silently block these requests at the network level, even when robots.txt allows them. Test with the actual user-agent strings from outside your network to confirm accessibility.
[03] Key pages return 200 status codes. Ensure your most important pages — homepage, product pages, about page, key blog posts — return HTTP 200 to AI crawlers. Pages behind authentication, paywalls, or JavaScript rendering gates are invisible to most AI crawlers. Content that only renders client-side will not be indexed. Use server-side rendering or static site generation for all content you want AI engines to access.
Category 2: AI-Specific Files
These files are the newest additions to the technical AEO toolkit. They provide AI engines with explicit, structured information about your brand that goes beyond what can be inferred from page content alone. Most competitors have not implemented them, which makes this a high-impact first-mover opportunity.
[04] llms.txt at domain root. Create a concise llms.txt file that summarises your brand, links to key pages, and provides the context AI models need to describe you accurately. Include both /llms.txt for a compact overview and /llms-full.txt for comprehensive detail. This file is the single most important AI-specific asset you can create — it tells every LLM-based system exactly who you are and what you do.
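The llms.txt format is still an informal community proposal, but the commonly seen shape is an H1 title, a blockquote summary, and H2 sections of annotated links. A minimal sketch — the brand name and URLs below are placeholders:

```markdown
# Acme Analytics

> Acme Analytics is a B2B product-usage analytics platform for SaaS teams.
> Founded 2019. Replaces spreadsheet-based usage reporting.

## Key pages

- [Product overview](https://example.com/product): What the platform does
- [Pricing](https://example.com/pricing): Plans and public pricing
- [Docs](https://example.com/docs): Technical documentation
```

Keep /llms.txt to this compact shape and move exhaustive detail into /llms-full.txt.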
[05] llm-profile.json in .well-known directory. Add a structured JSON file at /.well-known/llm-profile.json that provides machine-readable brand metadata. This file should include your entity type, description, key offerings, target markets, and links to your primary content. Unlike llms.txt, this file is designed for programmatic consumption by AI retrieval systems.
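There is no formal schema for llm-profile.json yet, so the field names below are illustrative rather than standardised — the point is machine-readable, unambiguous brand facts. Brand details and URLs are placeholders:

```json
{
  "name": "Acme Analytics",
  "entityType": "Organization",
  "description": "B2B product-usage analytics platform for SaaS teams.",
  "offerings": ["Usage analytics", "Session replay", "Funnel reports"],
  "targetMarkets": ["B2B SaaS", "E-commerce"],
  "links": {
    "homepage": "https://example.com/",
    "llms": "https://example.com/llms.txt",
    "docs": "https://example.com/docs"
  }
}
```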
[06] ai.txt in .well-known directory. The /.well-known/ai.txt file declares your AI interaction preferences: which AI systems can access your content, how your brand should be attributed, and where to find your llms.txt and llm-profile.json files. Think of it as a meta-policy that ties your AI files together and signals intent to AI engine operators.
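As with the previous two files, ai.txt has no formal specification; the directives below are an illustrative sketch of the kind of declarations the file can carry, with placeholder URLs:

```text
# /.well-known/ai.txt — directive names are illustrative, not a standard
Attribution: Acme Analytics (https://example.com)
AI-Access: allowed
LLMS-File: https://example.com/llms.txt
LLM-Profile: https://example.com/.well-known/llm-profile.json
Contact: ai-policy@example.com
```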
Category 3: Structured Data
Structured data helps AI engines extract facts accurately. While AI models can parse unstructured text, Schema.org markup provides explicit signals that reduce the chance of misrepresentation and increase citation confidence.
[07] Organization schema on homepage. Implement Schema.org Organization markup with your legal name, logo, founding date, social profiles, and contact information. This gives AI engines an authoritative entity definition for your brand — the anchor point from which all other brand mentions are validated.
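A minimal Organization JSON-LD sketch, placed in a `<script type="application/ld+json">` tag on the homepage — all brand details, dates, and URLs below are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Analytics Ltd",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "foundingDate": "2019-03-01",
  "sameAs": [
    "https://www.linkedin.com/company/acme-analytics",
    "https://x.com/acmeanalytics"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@example.com"
  }
}
```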
[08] Product/Service schema on offering pages. Each product or service page should have corresponding Schema.org markup — Product, SoftwareApplication, or Service types as appropriate. Include name, description, pricing (if public), and features. This is how AI models learn what you sell and to whom.
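For a SaaS product, SoftwareApplication is usually the closest fit. A sketch with placeholder name, description, and pricing:

```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme Analytics",
  "applicationCategory": "BusinessApplication",
  "operatingSystem": "Web",
  "description": "Product-usage analytics for B2B SaaS teams.",
  "offers": {
    "@type": "Offer",
    "price": "49.00",
    "priceCurrency": "USD"
  }
}
```

Only include the offers block when pricing is public; omitting it is better than publishing a stale price.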
[09] Article and FAQ schema on content pages. Blog posts should use Article schema with headline, author, datePublished, dateModified, and publisher. Q&A content should use FAQPage schema. These schemas help AI engines attribute content correctly, assess recency, and extract question-answer pairs for response synthesis.
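A minimal FAQPage sketch showing the question-answer pairing AI engines can extract — the question and answer text are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is AEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "AI Engine Optimisation is the practice of optimising a brand's digital presence for AI-generated answers."
    }
  }]
}
```

The Answer text should match the visible on-page answer; a mismatch between markup and rendered content undermines the attribution benefit.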
Category 4: Content Architecture
Content architecture determines how effectively AI models can navigate, parse, and extract information from your site. Poor architecture means even great content gets misunderstood or overlooked entirely.
[10] Clear heading hierarchy on every page. Each page should have a single H1 followed by a logical H2/H3 structure. AI models use heading hierarchy to understand content organisation and extract section-level answers. Broken or illogical heading structures reduce extraction accuracy and confuse retrieval systems about which section answers which query.
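Heading hierarchy is easy to lint automatically. A sketch using Python's standard-library HTML parser to flag pages with multiple H1s or skipped levels — the sample HTML strings are illustrative:

```python
from html.parser import HTMLParser

class HeadingAudit(HTMLParser):
    """Collect heading levels (1-6) in document order."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self.levels.append(int(tag[1]))

def heading_problems(html: str) -> list:
    audit = HeadingAudit()
    audit.feed(html)
    problems = []
    if audit.levels.count(1) != 1:
        problems.append("page should have exactly one <h1>")
    for prev, cur in zip(audit.levels, audit.levels[1:]):
        if cur > prev + 1:  # e.g. an <h2> followed directly by an <h4>
            problems.append(f"heading jumps from h{prev} to h{cur}")
    return problems

good = "<h1>Title</h1><h2>A</h2><h3>B</h3><h2>C</h2>"
bad = "<h1>Title</h1><h1>Duplicate</h1><h4>Jump</h4>"
```

Running `heading_problems` over every rendered page in CI catches the broken structures before crawlers see them.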
[11] Entity definitions in first paragraphs. The first paragraph of key pages should contain a clear, factual definition of the primary entity discussed. AI engines often extract the opening paragraph as a summary. Front-load your most important claims and ensure they are self-contained — the opening sentence should work as a standalone answer to "What is [your topic]?"
[12] Internal linking with descriptive anchor text. Internal links help AI models understand entity relationships across your site. Use descriptive anchor text that tells the model what the linked page is about. Avoid generic anchors like "click here" or "read more." Consistent brand naming across all pages is equally critical — if you call your product "Acme Pro" on one page and "AcmePro Suite" on another, the model may treat them as different entities entirely.
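Generic anchor text can be flagged the same way. A sketch that extracts each link's visible text and reports the generic ones — the generic-phrase list and sample HTML are illustrative:

```python
from html.parser import HTMLParser

GENERIC = {"click here", "read more", "learn more", "here"}

class AnchorAudit(HTMLParser):
    """Collect the visible text of each <a> element."""
    def __init__(self):
        super().__init__()
        self.in_link = False
        self.current = []
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.current = []

    def handle_data(self, data):
        if self.in_link:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
            self.anchors.append("".join(self.current).strip())

def generic_anchors(html: str) -> list:
    audit = AnchorAudit()
    audit.feed(html)
    return [a for a in audit.anchors if a.lower() in GENERIC]

page = ('<a href="/pricing">See Acme Pro pricing</a> '
        '<a href="/blog/post">Read more</a>')
```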
Category 5: Monitoring & Validation
Implementation without monitoring is guesswork. These final items ensure you can verify that your technical AEO work is producing results and catch regressions before they cost you visibility.
[13] AI crawler access log review. Check your server access logs for GPTBot, ClaudeBot, PerplexityBot, and other AI user agents at least monthly. Confirm they are crawling your key pages, receiving 200 responses, and accessing the content you intend. If a crawler that was previously active disappears from your logs, investigate immediately — your WAF, CDN, or hosting provider may have introduced new blocking rules.
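The monthly log review can be scripted. A sketch that counts (bot, status) pairs in combined-format access log lines — the sample lines are fabricated for illustration:

```python
import re
from collections import Counter

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "anthropic-ai"]

def count_ai_hits(log_lines):
    """Count (bot, HTTP status) pairs for AI crawler requests."""
    hits = Counter()
    for line in log_lines:
        status = re.search(r'" (\d{3}) ', line)  # status after the request quote
        if not status:
            continue
        for bot in AI_BOTS:
            if bot in line:
                hits[(bot, status.group(1))] += 1
    return hits

sample = [
    '1.2.3.4 - - [01/Jan/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "GPTBot/1.2"',
    '5.6.7.8 - - [01/Jan/2026:10:05:00 +0000] "GET /pricing HTTP/1.1" 403 512 "-" "PerplexityBot/1.0"',
]
hits = count_ai_hits(sample)
```

A non-200 count for a bot you intend to allow (like the 403 above) is exactly the WAF or CDN regression this item exists to catch.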
[14] Quarterly AI engine query testing. Every 90 days, run your target queries through ChatGPT, Perplexity, Claude, and Google AI Overviews. Document whether your brand is mentioned, whether the information is accurate, and whether citations link back to your site. This is the ground-truth test — if AI engines are not representing you correctly despite passing items 01-12, you have a content or authority gap to address.
[15] Structured data validation on every deploy. Add Schema.org validation to your CI/CD pipeline or post-deploy checks. Broken JSON-LD, missing required properties, or type mismatches silently degrade your AI visibility. Tools like Google's Rich Results Test and Schema.org's validator should be run against every page that carries structured data, after every deployment that touches those pages.
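A minimal post-deploy check can live alongside the external validators. The sketch below verifies that each JSON-LD block parses and carries a set of expected properties — the required-property lists here are illustrative (Schema.org itself mandates nothing; rich-result features have their own requirements):

```python
import json

# Illustrative per-type property expectations, not an official list.
REQUIRED = {
    "Organization": ["name", "url", "logo"],
    "FAQPage": ["mainEntity"],
}

def validate_jsonld(raw: str) -> list:
    """Return a list of problems; an empty list means the block passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON-LD: {exc}"]
    problems = []
    schema_type = data.get("@type")
    for prop in REQUIRED.get(schema_type, []):
        if prop not in data:
            problems.append(f"{schema_type} is missing '{prop}'")
    return problems

ok = ('{"@context": "https://schema.org", "@type": "Organization", '
     '"name": "Acme", "url": "https://example.com", '
     '"logo": "https://example.com/logo.png"}')
broken = '{"@type": "Organization", "name": "Acme"}'
```

Wire `validate_jsonld` into the deploy pipeline for every page carrying structured data, and keep the external validators as the authoritative check.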
Running the Audit: Practical Process
Run this audit by creating a spreadsheet with all 15 items. For each item, record pass/fail status, the specific finding, and the remediation action needed. Assign ownership and deadlines for each failed item.
Prioritise by category order. Items 01-03 (crawler access) are blockers — if AI crawlers cannot reach your content, nothing else matters. Items 04-06 (AI files) are high-impact, low-effort wins that most competitors have not implemented yet. Items 07-09 (structured data) and 10-12 (content architecture) are medium-effort improvements that compound over time. Items 13-15 (monitoring) ensure your work stays effective as AI engines evolve.
After remediation, validate by querying AI engines directly. Ask ChatGPT, Perplexity, and Claude about your brand and check whether their responses have improved in accuracy and completeness. The ultimate measure of a successful technical AEO audit is not a green checklist — it is accurate AI responses that cite your content and recommend your brand.
Schedule the next audit for 90 days out. AI engine capabilities evolve quarterly, and maintaining your technical AEO foundation requires ongoing attention. Treat this the way you treat your SEO technical health — as a living process, not a one-time project.
Related Posts
How to Create llms.txt: The robots.txt for AI
llms.txt is the file that tells AI engines what your brand is and how it should be described. A step-by-step guide to creating and implementing llms.txt for your site.
AI Crawlers and robots.txt: The Complete 2026 Guide
GPTBot, ClaudeBot, PerplexityBot — managing AI crawler access is critical for AEO. This guide covers every AI crawler, how to configure robots.txt, and the trade-offs of allowing vs blocking.
What is AEO? A Complete Guide to AI Engine Optimization
AI Engine Optimisation is the practice of optimising your brand's digital presence for AI-generated answers. This comprehensive guide covers what AEO is, how it differs from SEO, and why every brand needs an AEO strategy.