Context
The Grounding Page Standard defines an architecture for entity-centric web pages optimized for AI retrieval. It was developed from practical experience with AI systems and structured data, not from a single academic study.
However, a growing body of research is beginning to investigate similar questions: How should web content be structured for AI systems? What role does structured data play in Retrieval-Augmented Generation? And do entity-centric page designs improve answer quality?
This page collects academic and independent research that supports similar architectural principles. Each entry includes a summary of the study, its key findings, and an assessment of how it relates to the Grounding Page Standard.
Paper
Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval
Summary
This paper investigates whether structured linked data can improve retrieval accuracy in RAG systems. The authors tested different document representations across four domains (editorial, legal, travel, e-commerce) using Vertex AI Vector Search 2.0 and Google's Agent Development Kit. Seven experimental conditions were evaluated, comparing plain HTML, HTML with JSON-LD markup, and enhanced entity pages.
Key Findings
The study found that adding JSON-LD markup to existing HTML pages produced only modest improvements in retrieval accuracy. The significant gains came from a different approach: restructuring the content itself into dedicated entity pages where facts, properties and relationships are directly visible and navigable in the page content.
These "enhanced entity pages" improved accuracy by 29.6% for standard RAG and by 29.8% for the full agentic pipeline, compared with plain HTML. The best-performing variant (Enhanced+) scored 4.85 out of 5.0 for accuracy and 4.55 out of 5.0 for completeness. The study's enhanced pages follow the principle of one page per entity.
Relevance for the Grounding Page Standard
This paper provides the closest architectural parallel to Grounding Pages found in academic literature so far. Several design decisions align directly:
- Visible content as primary source: The study's results indicate that AI retrieval systems extract information primarily from visible page content, not from metadata or markup alone. The Grounding Page Standard follows the same principle: the visible text is the authoritative source, with JSON-LD mirroring it as a secondary layer.
- One page per entity: The enhanced entity pages in the study follow the same structural unit as Grounding Pages: each page describes exactly one entity with its facts, properties and relationships.
- Structured navigation: The study's best-performing variant includes breadcrumbs and navigable relationships between entities. The Grounding Page Standard requires a similar structure through its entity ontology and cross-referencing rules.
- JSON-LD as complement, not substitute: The finding that JSON-LD alone produces only marginal improvements supports the Grounding Page Standard's position that structured data must mirror visible content, not replace it.
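The mirroring principle in the last point can be checked mechanically. The following is a minimal Python sketch, not part of the Grounding Page Standard or of the study: the function name `unmirrored_properties`, the sample page text, and the sample markup are all hypothetical, and it assumes that a case-insensitive substring match is an acceptable test of whether a value is visible on the page.

```python
def unmirrored_properties(visible_text, json_ld):
    """Return JSON-LD property names whose string values are absent from the visible text."""
    text = visible_text.casefold()
    missing = []
    for key, value in json_ld.items():
        # Skip JSON-LD keywords such as @context and @type.
        if key.startswith("@"):
            continue
        # Only flat string values are checked in this sketch.
        if isinstance(value, str) and value.casefold() not in text:
            missing.append(key)
    return missing


# Hypothetical entity page content and markup (assumptions for illustration).
page_text = "Example GmbH is a software company. It was founded in 2019 in Berlin."
markup = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example GmbH",
    "foundingDate": "2019",
    "slogan": "Software that delights",
}

print(unmirrored_properties(page_text, markup))  # prints ['slogan']
```

Here the `slogan` value appears only in the markup, so it is reported. A real check would also need to handle nested objects, lists, and values that are formatted differently in the visible text, such as dates.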
Paper
Generative Engine Optimization: How to Dominate AI Search
Summary
This paper examines how generative AI search engines (ChatGPT, Perplexity, Gemini) differ from traditional search in the way they source information. It introduces "Generative Engine Optimization" (GEO) as a framework for understanding visibility in AI-generated answers.
Key Findings
The study identifies a strong bias in AI search systems toward earned media (third-party, authoritative sources) over brand-owned content. It also reveals that AI engines differ significantly from each other in domain diversity, freshness, cross-language stability, and sensitivity to phrasing. This variation makes engine-specific optimization impractical and points toward a single, well-structured source of truth as a more sustainable strategy.
The research recommends engineering content for machine scannability and building structured technical foundations rather than relying on traditional marketing content.
Relevance for the Grounding Page Standard
The GEO research supports two foundational assumptions of the Grounding Page Standard:
- Structured content over marketing language: The study's recommendation to engineer content for machine scannability, rather than rely on traditional marketing content, aligns with the Grounding Page Standard's core rule: no adjectives, no marketing claims, one fact per sentence.
- One source of truth across engines: The significant variation across AI engines supports the decision to create a single authoritative page per entity rather than attempting platform-specific optimization. A Grounding Page serves as this centralized definition.
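As an illustration of the "no marketing claims" rule mentioned above, a page draft could be screened sentence by sentence. This is a rough Python sketch under stated assumptions: the term list `MARKETING_TERMS` and the function `flag_sentences` are hypothetical and are not defined by the Grounding Page Standard or by the GEO study. A fixed word list will miss many cases and is only meant to show the idea.

```python
import re

# Hypothetical term list; the source does not define one.
MARKETING_TERMS = {"leading", "best", "innovative", "world-class", "revolutionary"}


def flag_sentences(text):
    """Return (sentence, matched_terms) pairs for sentences containing listed marketing terms."""
    flagged = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = set(re.findall(r"[a-z][a-z-]*", sentence.lower()))
        matches = sorted(words & MARKETING_TERMS)
        if matches:
            flagged.append((sentence, matches))
    return flagged


# Hypothetical draft text for an entity page.
draft = (
    "Example GmbH was founded in 2019. "
    "Example GmbH is the leading provider of innovative widgets."
)

for sentence, terms in flag_sentences(draft):
    print(terms, "->", sentence)
```

With this draft, only the second sentence is flagged, for the terms "innovative" and "leading". The first sentence states a single fact and passes.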
Analysis
The Importance of About Pages for Digital Identity
Summary
This independent analysis examined more than 17,000 URLs across domains and entity types. It investigated which page types AI systems tend to reference when interpreting brands and entities.
Key Findings
The analysis found patterns suggesting that AI systems frequently draw on clear identity pages, particularly About pages, when constructing brand interpretations. Pages with a clear factual structure and consistent self-description appeared to serve as anchor points for entity understanding.
Relevance for the Grounding Page Standard
This observation supports the conceptual premise of Grounding Pages: that organizations benefit from maintaining a dedicated, clearly structured page that defines what an entity is. The Grounding Page Standard can be understood as a structured evolution of the classical About page, formalized for machine readability and optimized for AI retrieval.
How to Read This Page
This page is maintained as a living document. As new research on AI retrieval, entity representation, and structured data emerges, relevant studies will be added.
The selection criteria for inclusion are:
- The research addresses a question relevant to entity-centric web content and AI retrieval.
- The findings support, challenge, or add nuance to the architectural principles behind the Grounding Page Standard.
- The source is either peer-reviewed, published on a recognized preprint server, or based on a substantial independent dataset.
If you are aware of research that should be considered for this page, please get in touch.