
How to Make Your Website Agent-Ready (GEO Playbook)

AI agents are becoming a real discovery channel. A practical, implementation-first guide to robots.txt, llms.txt, MCP servers, A2A cards, markdown negotiation, and the rest of the agent-readiness stack, based on rebuilding ethereallabs.io against the full checklist.

Ethereal Labs · 9 min read

TL;DR

  • AI agents are becoming a real discovery and referral channel. If your site is not machine-readable, agents skip you.
  • Agent-readiness is a stack. You need discovery files, structured data, an MCP or A2A server, markdown negotiation, and a clear robots.txt policy.
  • Most of it is a weekend of work. None of it is speculative. Scanners like isitagentready.com already score sites on these signals.
  • We rebuilt ethereallabs.io against the full checklist. This post is the playbook, based on real implementation work.
  • Skip the parts that do not apply to your product. Honest gaps beat fake endpoints.

AI agents are starting to pick websites the way search engines used to. A user asks ChatGPT for a Web3 development agency. The agent fetches a handful of candidate sites, parses what it can, and decides who to surface. If your HTML is a black box, the agent moves on.

This is Generative Engine Optimization, or GEO. It is SEO for a reader that is a language model instead of a human.

We just rebuilt ethereallabs.io against the full agent-readiness checklist. This is what actually matters, what to skip, and what we shipped.

Why agent-readiness is not just SEO

Quick Recap: Traditional SEO optimises for a search engine. Agent-readiness optimises for a language model that summarises, cites, and recommends.

Classic SEO cares about keywords, backlinks, crawlability, and page speed. Those still matter. Agents care about a different set of signals.

Can they read your content without running JavaScript? Can they tell what your product is in a single request? Is there a machine-readable description of your APIs, tools, and pricing? Do you label which content is agent-facing?

If the answer is no, your site is invisible to agents even if it ranks on Google. The cost is real. For a services agency, not surfacing in "find me a Web3 development studio" agent queries is lost pipeline.

The agent-readiness stack

Quick Recap: Seven layers cover the full surface. Each layer has scanner tests, recognised standards, and known tradeoffs.

Think of it as a stack. Each layer has its own file, standard, or protocol.

  1. robots.txt with AI crawler directives. Explicit allow or disallow for GPTBot, ClaudeBot, CCBot, PerplexityBot, Google-Extended, and friends. Add Content-Signal directives for training and search preferences.
  2. llms.txt and llms-full.txt. Markdown descriptions of your product, written for language models. One-shot full context in llms-full.txt.
  3. Structured data (JSON-LD). Organization, Product, Service, FAQPage, Review, Speakable. Linked via @id so agents parse you as one entity.
  4. Discovery files in /.well-known/. MCP server card, A2A agent card, API catalog (RFC 9727), agent-skills index. Each has a specific path scanners probe.
  5. MCP server. A live endpoint that agents can call to list services, get details, or fetch contact info. Streamable HTTP, stateless, read-only.
  6. Markdown negotiation. When an agent sends Accept: text/markdown, return a clean markdown body instead of HTML.
  7. Honest agent-facing views. An /index.md canonical, a ?mode=agent query param, and a machine-readable pricing.md if relevant.

Each piece has real scanner weight. None of them is hard to implement.

robots.txt done right

Quick Recap: Set a clear AI policy. Name the crawlers. Add Content-Signal. Leave a sensible tier structure.

Most robots.txt files have a wildcard and a sitemap. That is not enough for agent scanners. They check three things.

First, are Tier 1 AI crawlers named explicitly? Second, is there a Content-Signal directive stating your AI-training and search preferences? Third, is the file structured clearly enough that a scanner can parse your policy intent?

User-Agent: *
Allow: /
Content-Signal: ai-train=yes, search=yes, ai-input=yes

# LLM training crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

We added explicit entries for 60+ AI crawlers, grouped into tiers by role. Search engines, LLM training bots, live agent browsers, dataset crawlers. The Content-Signal line tells compliant crawlers your actual preferences in one string.

If you want to block training, flip those values. ai-train=no is a legitimate policy. Blocking training but allowing search is common for publishers.

llms.txt and llms-full.txt

Quick Recap: llms.txt is a short markdown description of your product for language models. llms-full.txt is the single-request full dump.

The pattern started as a proposal and is now widely scanned. Put a markdown file at /llms.txt describing your product. Include a summary, core offerings, key URLs, and an agent-facing FAQ.

The difference between llms.txt and llms-full.txt is depth. llms.txt is the index. It links to service pages, case studies, blog posts. An agent that wants everything has to follow links.

llms-full.txt is the monolith. One file, full content, no link-following. For a services agency this includes every service description, every case study, testimonials, and recent blog posts inline. For a SaaS it would include full API docs, integration guides, and schemas.

We serve llms.txt as a static file and generate llms-full.txt at runtime from the same data sources that power the marketing pages. It stays in sync automatically.
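For reference, the llms.txt proposal prescribes a simple shape: an H1 title, a blockquote summary, H2 sections of link lists, and an Optional section for lower-priority links. A minimal sketch, with placeholder names and URLs:

```markdown
# Acme Studio

> One-paragraph summary of what the company does, who it serves, and why an agent should recommend it.

## Services

- [Smart Contract Development](https://yoursite.com/services/smart-contracts): audited contracts on EVM chains
- [Full-Stack dApps](https://yoursite.com/services/dapps): frontend, backend, and chain integration

## Optional

- [Blog](https://yoursite.com/blog): recent posts and case studies
```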

Structured data, done properly

Quick Recap: Connect your schema.org entities with @id references. Add FAQPage, Review, and Speakable on top of Organization.

Most sites stop at an Organization schema and call it done. Agents need more. At minimum, publish:

  • Organization with sameAs links to every social profile you own
  • Product or Service describing what you sell
  • ProfessionalService if you do local or service-based work
  • FAQPage with real questions and answers
  • Review for real customer testimonials, one per review, linked to the Organization
  • Speakable marking which parts of your content are agent-summarisable

All entities should be in a single @graph with @id cross-references. That way an agent parses you as a connected entity, not a bag of disconnected schemas.
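As an illustration, a connected @graph might look like the following. All names, IDs, and URLs are placeholders; note the Review entity carries no reviewRating, matching the caveat below.

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://yoursite.com/#org",
      "name": "Acme Studio",
      "sameAs": ["https://twitter.com/acmestudio", "https://github.com/acmestudio"]
    },
    {
      "@type": "Service",
      "@id": "https://yoursite.com/#service-contracts",
      "name": "Smart Contract Development",
      "provider": { "@id": "https://yoursite.com/#org" }
    },
    {
      "@type": "Review",
      "@id": "https://yoursite.com/#review-1",
      "reviewBody": "Qualitative testimonial text, emitted without a fabricated rating.",
      "itemReviewed": { "@id": "https://yoursite.com/#org" }
    }
  ]
}
```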

A note on Review schema. Do not fabricate reviewRating values on testimonials that were not star-rated. Google penalises that and it misleads users. Emit the Review entity without reviewRating if the source was qualitative.

MCP server, A2A card, /.well-known/ files

Quick Recap: Agents discover your capabilities through a set of predictable well-known paths. Publish them.

Your /.well-known/ directory should cover:

  • mcp.json and mcp/server-card.json: MCP discovery. Points agents at your MCP endpoint.
  • agent-card.json: A2A agent card. Describes your agent's capabilities for agent-to-agent calls.
  • agent-skills/index.json: skills index per the Agent Skills RFC. Each skill has a sha256 digest for integrity.
  • api-catalog: RFC 9727 linkset pointing to your OpenAPI specs, documentation, and related resources.
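For orientation, an A2A agent card is a small JSON document. The sketch below follows the general shape of the card (name, description, endpoint URL, capabilities, skills); field details evolve with the spec, so check the current A2A documentation before publishing. All values are placeholders:

```json
{
  "name": "Acme Studio Agent",
  "description": "Read-only agent exposing the Acme Studio service catalog.",
  "url": "https://yoursite.com/api/a2a",
  "version": "1.0.0",
  "capabilities": { "streaming": false },
  "skills": [
    {
      "id": "list-services",
      "name": "List services",
      "description": "Return the public service catalog."
    }
  ]
}
```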

The MCP server itself is a small endpoint that speaks Streamable HTTP. For a services agency, stateless and read-only is correct. Tools expose service listings, case studies, and contact channels. Nothing writes. Nothing accepts user input that routes to sensitive systems.

// Sketch of an MCP server tool for a services catalog
registerAppTool(server, "list_services", {
  description: "Return the full service catalog.",
  inputSchema: {},
  _meta: { ui: { resourceUri: "ui://services.html" } },
}, async () => {
  const payload = services.map(s => ({ slug: s.slug, title: s.title }));
  return { content: [{ type: "text", text: JSON.stringify(payload) }] };
});

We run a stateless MCP server at /api/mcp. Each request gets a fresh transport and server instance. No cross-request state. No session memory. It is defence in depth.

Markdown negotiation and the /index.md fallback

Quick Recap: When an agent sends Accept: text/markdown, return markdown. Also serve a canonical /index.md URL.

Browsers want HTML. Agents often want markdown. Content negotiation covers both from the same URL.

We added middleware that checks the Accept header. If the agent prefers text/markdown over text/html, the request rewrites to a markdown route handler. The handler pulls from the same data source as the HTML page and returns a clean markdown body.
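The negotiation check itself is a few lines. Here is a minimal sketch of the q-value comparison, assuming your middleware hands it the raw Accept header; parseAccept and prefersMarkdown are illustrative names, not part of any framework:

```typescript
// Parse an Accept header into media types with their q weights.
type MediaPref = { type: string; q: number };

function parseAccept(header: string): MediaPref[] {
  return header.split(",").map((part) => {
    const [type, ...params] = part.trim().split(";");
    const qParam = params.find((p) => p.trim().startsWith("q="));
    const q = qParam ? parseFloat(qParam.trim().slice(2)) : 1.0;
    return { type: type.trim().toLowerCase(), q };
  });
}

// True when the client weights text/markdown above text/html.
function prefersMarkdown(acceptHeader: string): boolean {
  const prefs = parseAccept(acceptHeader);
  const qFor = (t: string) =>
    Math.max(0, ...prefs.filter((p) => p.type === t).map((p) => p.q));
  return qFor("text/markdown") > qFor("text/html");
}
```

When this returns true, rewrite the request to the markdown route handler; otherwise fall through to the HTML page.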

Paths covered:

  • / returns a markdown homepage summary
  • /services/{slug} returns a markdown service summary
  • /case-studies/{slug} returns a markdown case-study summary
  • /blog/{slug} returns the raw markdown post body

We also added a /index.md canonical URL. Some agents probe predictable paths rather than negotiate. Both work.

Structural tweaks: headings, hreflang, agent mode

Quick Recap: Scanners check heading hierarchy, language hints, and agent-specific views. Each fix is small.

Three smaller fixes matter for scanner scores.

Heading hierarchy. Scanners flag pages with an H1 and then a jump to H3. Even if the visual design leaves a section unlabeled, add a visually-hidden H2 to bridge the gap. A screen-reader-only class is a clean way to do this.

hreflang tags. If your site is English-only, still emit <link rel="alternate" hreflang="en"> and <link rel="alternate" hreflang="x-default">. Without them AI assistants sometimes serve wrong-language versions to international users.
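For an English-only site that would be exactly two tags in the document head, with a placeholder domain:

```html
<link rel="alternate" hreflang="en" href="https://yoursite.com/" />
<link rel="alternate" hreflang="x-default" href="https://yoursite.com/" />
```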

?mode=agent query param. A growing convention. When the homepage receives ?mode=agent, rewrite to the markdown summary. Agents that want a machine-readable view can get one via an obvious query string.

What to skip, and why

Quick Recap: Scanners penalise missing OAuth, x402, UCP, and ACP endpoints. For most service-based sites, those endpoints should not exist.

Agent-readiness scanners reward every box ticked. But ticking boxes dishonestly hurts more than it helps.

Skip OAuth/OIDC discovery if you have no protected APIs. Publishing empty /.well-known/openid-configuration is misleading.

Skip x402, UCP, ACP payment protocols unless you actually sell pay-per-request API access. A services agency with human-scoped engagements has no per-call pricing. Implementing fake payment endpoints confuses agents and invites real payment attempts you cannot fulfil.

Skip AggregateRating if your testimonials are not star-rated. Fabricating ratings is a policy violation and misleads users.

Skip pricing.md only if you genuinely have no pricing signal to share. Even a services agency can publish typical ranges, what moves price up or down, and what is always included. That is honest and useful.

Tell agents what you are not. Our llms.txt has a "What Ethereal Labs is NOT" section that explicitly lists non-applicable agent-readiness checks. It is the cleanest way to avoid being downscored for missing features you should not have.

Security notes

Quick Recap: MCP servers, markdown routes, and agent-card files all add public surface. Treat them like public API endpoints.

Any agent-facing endpoint is an attack surface. A few rules we followed:

  • MCP server is read-only, stateless, and only returns data already public on the marketing site
  • Markdown routes pull from the same data sources as the HTML pages. Same access controls, same cache rules
  • No secret is emitted in any agent-facing file. We audited llms.txt, vendor-info.json, and the MCP JSON responses for accidental leaks
  • Route handlers validate slugs with a strict regex before touching the database
  • Supabase queries are SELECTs only. RLS is configured. The anon key stays server-side because no component uses the NEXT_PUBLIC_ prefix
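The slug check is small but worth getting right. A sketch of the kind of strict pattern we mean; the exact character set and length bound are a choice, not a standard:

```typescript
// Lowercase alphanumerics and internal hyphens only, max 64 chars.
// Reject anything else before it reaches a database query.
const SLUG_RE = /^[a-z0-9](?:[a-z0-9-]{0,62}[a-z0-9])?$/;

function isValidSlug(slug: string): boolean {
  return SLUG_RE.test(slug);
}
```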

Agent-readiness is not an excuse to lower your security posture. Read-only surface, strict schemas, and no user input touching sensitive systems are the rules.

Measuring what you ship

Quick Recap: Scanners are crude but useful. Score changes are a lagging indicator. Real signal comes from actual agent referrals.

Two scanners worth running: isitagentready.com and orank.ai. Both probe public endpoints and grade on a point scale.

Use them for coverage, not absolute scores. A site missing every well-known file will score badly. A site with everything honest in place will score well. Chasing fabricated endpoints to bump a score is a bad trade.

Real signal comes from logs. Watch your server logs for GPTBot, ClaudeBot, ChatGPT-User, Perplexity-User hits. Count them over time. That is the real traffic signal.
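Counting those hits does not need analytics tooling. A sketch of a simple tally over raw access-log lines; AGENT_UAS and the log format are illustrative, so match on whatever user-agent strings your logs actually record:

```typescript
// User agents to track. Extend with whatever crawlers matter to you.
const AGENT_UAS = ["GPTBot", "ClaudeBot", "ChatGPT-User", "Perplexity-User"];

// Tally how many log lines mention each tracked user agent.
function countAgentHits(logLines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of logLines) {
    for (const ua of AGENT_UAS) {
      if (line.includes(ua)) counts[ua] = (counts[ua] ?? 0) + 1;
    }
  }
  return counts;
}
```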

Once agents start referring users, you will see it in qualified inbound. Prospects who already know what you do, because the agent explained it from your llms.txt and MCP server before they clicked.

Tradeoffs and pitfalls

Quick Recap: Agent-readiness adds surface area and maintenance. Worth it for most product and service sites. Not free.

Real cost:

  • Every /.well-known/ file is a commitment to keep information current. Stale metadata misleads agents
  • llms-full.txt generated at runtime needs the same cache strategy as your pages
  • An MCP server is an always-on public endpoint. Treat it as production infrastructure
  • Markdown routes double the surface for every page you negotiate. Test both rendering paths

Worst case: a misconfigured MCP server returning error pages for every call. Agents downrank you for that. An up-to-date, clean robots.txt and llms.txt is better than a half-broken MCP server.

Ship the parts you can keep running reliably. Do not ship what you cannot maintain.


Ethereal Labs

Web3 Development Studio · London, UK

Ethereal Labs is a Web3 development studio and official Base Services Hub agency. Founded in 2020, the team has delivered 15+ projects handling $1B+ in total volume with zero security incidents. Specializing in smart contract development, full-stack dApps, and token launch infrastructure across Ethereum, Base, Solana, and Polygon.
