AI visibility glossary

Plain-English guide to getting found by AI.

AI visibility has its own language. This glossary explains every term in plain English — what it means, why it matters, and what you can do about it.

Use it to understand why AI skips some sites and cites others — and to get your team speaking the same language when it comes to getting found.

Core Concepts: The Shift

Fundamental terms describing the transition from traditional search to AI synthesis and agent-driven discovery.

AIO (AI Optimization)#

Optimizing pages and site signals so AI systems can reliably retrieve, understand, and cite your content in generated answers.

GEO (Generative Engine Optimization)#

Strategies to earn visibility inside generative engines (chat assistants and answer engines) where output is synthesized, not ranked links.

AEO (Answer Engine Optimization)#

Optimization focused on inclusion, attribution, and accuracy inside systems that produce direct answers from multiple sources.

Answer Engine #

A system that reads and fuses information from several sources to produce a single response, often with citations.

Generative Search #

Search experiences where the primary interface is a generated summary/answer rather than a list of blue links.

Assistant Referral #

Traffic or leads originating from users who found you via an AI assistant recommendation.

Citation Rate #

The share of relevant assistant answers where your domain is cited as a source.

Mention Rate #

How often your brand/product is named in answers, regardless of whether a clickable citation is provided.
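
As a rough illustration of how citation rate and mention rate differ, the sketch below computes both from a small set of collected answers. The `Answer` record, the sample answers, and the domains are hypothetical; real tracking would need a way to collect answers and match brand names reliably.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """One assistant answer collected for a tracked prompt (hypothetical shape)."""
    text: str
    cited_domains: list[str] = field(default_factory=list)


def citation_rate(answers: list[Answer], domain: str) -> float:
    """Share of answers that list `domain` among their cited sources."""
    if not answers:
        return 0.0
    cited = sum(1 for a in answers if domain in a.cited_domains)
    return cited / len(answers)


def mention_rate(answers: list[Answer], brand: str) -> float:
    """Share of answers that name `brand` in the text, cited or not."""
    if not answers:
        return 0.0
    mentioned = sum(1 for a in answers if brand.lower() in a.text.lower())
    return mentioned / len(answers)


# Made-up sample data for illustration only.
sample = [
    Answer("Acme is a popular option.", ["acme.example"]),
    Answer("Many teams use Acme or Beta.", ["reviews.example"]),
    Answer("Beta is the usual choice.", []),
    Answer("Consider several vendors.", ["acme.example", "beta.example"]),
]

print(citation_rate(sample, "acme.example"))  # 0.5
print(mention_rate(sample, "Acme"))           # 0.5
```

In this sample the two rates happen to match, but they count different answers: the second answer mentions the brand without citing it, and the fourth cites it without naming it.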

Zero-Click Experience #

A journey where the user gets an answer without visiting a site; success is measured by being cited and trusted, not clicks.

Direct Answer #

A response that attempts to fully satisfy the query in one output, reducing the need for follow-up browsing.

Synthesis #

The process of combining information from multiple documents into a single coherent response.

Attribution #

Explicit linking or referencing of the sources used to support a generated answer.

Grounded Answer #

An answer constrained by retrieved sources so claims can be traced back to evidence.

Confidence Signal #

A cue that increases the likelihood a model will use/cite a page (clarity, provenance, structured data, consistent identity).

Freshness #

How up-to-date the content is relative to the query; critical for time-sensitive or rapidly changing topics.

Query Intent #

The underlying goal behind a query (learn, compare, buy, troubleshoot), used to decide which content format best satisfies it.

Conversational Query #

A natural-language question with context and follow-ups, typical of chat and assistant interfaces.

Agentic Search #

Workflows where an AI agent plans multiple steps (search, read, compare, act) to solve a task.

Multi-hop Question #

A question that requires combining multiple facts or steps of reasoning across sources.

Source Preference #

Biases in how a system selects sources (e.g., clarity, authority, format, accessibility, and structured signals).

Answer Surface #

Any interface where answers are consumed (chat, overviews, voice, browser sidebars), each with different citation and formatting constraints.

SEO Essentials

Classic search terminology that still matters—especially for crawlability, indexing, and trustworthy site architecture.

SEO (Search Engine Optimization)#

Practices that improve visibility in traditional search engines by aligning content, structure, and authority signals with ranking systems.

SERP (Search Engine Results Page)#

The results page shown by a search engine, including organic results, ads, and rich features.

Keyword #

A word or phrase representing a search demand; often used to plan content and measure search performance.

Search Intent #

Why a user searches (informational, navigational, commercial, transactional) and what content format best satisfies it.

On-page SEO #

Optimizations on the page itself: titles, headings, content clarity, internal links, and structured markup.

Title Tag #

The HTML title used for page identification in browsers and search previews; also a strong summarization cue.

Meta Description #

A short summary used in previews; not a direct ranking factor in many systems but influences click behavior and context cues.

Heading Hierarchy #

Use of H1–H6 to structure content; helps both humans and machines map sections and extract answers.

Internal Linking #

Links between your pages that help crawlers discover content and help models navigate topic relationships.

Backlink #

A link from another domain to yours; often used as a proxy for authority and popularity.

Anchor Text #

Clickable text of a link; provides topical context about the target page.

Crawl Budget #

The practical limit of how many pages a crawler will fetch from a site within a time window.

Indexing #

Storing and organizing fetched content so it can be retrieved for future queries.

Canonical URL #

A declared primary URL for a piece of content to reduce duplication and consolidate signals.

Duplicate Content #

Substantially similar content across multiple URLs; can dilute signals and confuse retrieval.

Rich Result #

Enhanced search presentation driven by structured data (e.g., product, FAQ, review).

Featured Snippet #

A prominent excerpt intended to answer a query directly; conceptually similar to answer-engine extraction.

CTR (Click-through Rate)#

Clicks divided by impressions; a key performance metric for classic search surfaces.

Impressions #

How often a page appears in results for a query, regardless of clicks.
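
The relationship between clicks, impressions, and CTR is simple arithmetic. A minimal sketch, with made-up numbers and a function name chosen for illustration:

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions; treated as 0.0 when there are no impressions."""
    if clicks < 0 or impressions < 0:
        raise ValueError("clicks and impressions must be non-negative")
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Example: 42 clicks from 1,200 impressions.
print(f"{click_through_rate(42, 1200):.1%}")  # 3.5%
```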

Query Expansion #

Search systems rewriting or broadening a query to retrieve more relevant documents.

Topical Authority #

Perceived expertise in a topic area, built through comprehensive, internally consistent coverage.

GEO, AIO, and AISEO Strategy

Terms used to plan content and site signals for assistant-led discovery, citations, and generated answers.

AISEO #

A broad umbrella term for optimizing for AI-influenced discovery across assistants, overviews, and chat-based search.

AIASEO (AI-Assisted SEO)#

Using AI tools to accelerate SEO work (research, drafts, clustering) while still optimizing for human and crawler requirements.

LLMO (Large Language Model Optimization)#

Optimizations tailored to how LLMs retrieve, summarize, and cite sources, including format, clarity, and entity definition.

Citation-first Content #

Content designed so key facts are easy to extract and cite (definitions, tables, concise claims with evidence).

Narrative Control #

Ensuring assistants describe your product the way you intend by providing canonical language and unambiguous definitions.

Claim-to-Evidence Mapping #

Pairing each important claim with supporting proof (data, references, policies) that a model can cite.

Comparison Page #

A page that contrasts alternatives using consistent criteria, useful for "best X" and "X vs. Y" queries.

Alternatives Page #

A page that frames competitor options and differentiators, often used in commercial investigation queries.

Use-case Landing Page #

A page centered on a specific job-to-be-done so assistants can match you to intent-driven prompts.

Definition-first Writing #

Stating what it is and who it's for early on, so that summaries built from limited context stay accurate.

Entity-led IA (Information Architecture)#

Structuring navigation and pages around core entities (product, features, industries) to improve retrieval and disambiguation.

Prompt Demand #

The real user questions asked in assistants, which may differ from traditional keyword phrasing.

Answer Format Fit #

Choosing the format that best matches common assistant outputs (lists, steps, tables, TL;DR, FAQs).

Citable Snippet #

A short, self-contained statement that remains correct when quoted out of context.

Brand SERP Hygiene #

Controlling what shows when the brand is searched so assistants pull consistent facts (profiles, docs, reviews).

Content Hub #

A cluster of interconnected pages that cover a topic comprehensively, signaling expertise and improving navigation for agents.

Information Gain #

New, unique value your page adds beyond common summaries; increases selection when many sources look similar.

First-party Proof #

Evidence originating from you (data, docs, changelogs) that supports claims and improves trust.

Content Refresh Cadence #

A schedule for updating key pages so assistants retrieve current facts (pricing, features, policies).

Citation Leakage #

When assistants cite third-party pages about you instead of your canonical pages due to better structure or trust signals.

Answer Ownership #

The practice of maintaining canonical pages that assistants consistently use as the source of truth for core facts.

LLM Retrieval, RAG, and Answer Systems

How AI systems fetch, rank, and compress information—useful for designing pages that survive retrieval and summarization.

RAG (Retrieval-Augmented Generation)#

A pattern where a model retrieves documents at query time and generates an answer grounded in those sources.

Retrieval #

Selecting candidate documents or passages likely to contain the answer.

Embedding #

A numeric representation of text used to measure semantic similarity for retrieval.
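
One common way to compare embeddings is cosine similarity. The sketch below uses tiny hand-written vectors purely for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (made up; not produced by any real model).
query = [0.9, 0.1, 0.0]
pricing_page = [0.8, 0.2, 0.1]
careers_page = [0.0, 0.1, 0.9]

# The pricing page points in nearly the same direction as the query.
print(cosine_similarity(query, pricing_page) > cosine_similarity(query, careers_page))  # True
```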

Vector Database #

A store optimized for similarity search over embeddings.

Chunking #

Splitting content into smaller passages for retrieval and context packing.

Chunk Size #

The length of each passage; too small loses context, too large wastes limited context space.

Overlap #

Repeated text between chunks to preserve continuity across boundaries.
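
Chunking, chunk size, and overlap fit together as in the sketch below. It splits on whitespace and counts words, which is an assumption made for simplicity; production systems usually count tokens and often split on sentence or heading boundaries.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary is less likely to lose its context.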

Passage Ranking #

Ordering retrieved chunks by relevance before sending them to a model.

Re-ranking #

A second-stage model that refines which passages are most useful after initial retrieval.

Hybrid Search #

Combining lexical retrieval (keyword) with semantic retrieval (embeddings) to improve recall and precision.
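
There are several ways to merge a keyword ranking with an embedding ranking. The sketch below uses reciprocal rank fusion (RRF), one widely used technique; the glossary does not prescribe a method, and the document IDs and the constant `k = 60` are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["a", "b", "c"]   # hypothetical keyword ranking
semantic = ["b", "d", "a"]  # hypothetical embedding ranking

print(reciprocal_rank_fusion([lexical, semantic]))  # ['b', 'a', 'd', 'c']
```

Documents that rank well in both lists ("a" and "b") rise above documents that appear in only one.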

BM25 #

A classic keyword-based ranking function often used as a baseline lexical retriever.
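
For readers who want to see the scoring idea, here is a compact sketch of Okapi BM25. It assumes whitespace tokenization, the commonly used defaults `k1 = 1.5` and `b = 0.75`, and one common IDF variant; search libraries differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for d in tokenized:
        doc_freq.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalized via the length ratio.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = ["pricing plans and pricing limits", "company history and team", "pricing"]
scores = bm25_scores("pricing", docs)
print(scores[1])  # 0.0, because the second document never mentions the query term
```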

Dense Retrieval #

Embedding-based retrieval that matches meaning rather than exact keywords.

Grounding #

Constraining generation to retrieved evidence to reduce unsupported claims.

Citation #

A reference to a source used to support a claim in a generated response.

Hallucination #

Generated content that is not supported by evidence; often triggered by ambiguity or missing sources.

Token Budget #

The limited amount of text the model can process at once; influences how much of your page is actually read.

Context Packing #

Selecting and formatting the best passages to fit within the model's context window.
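
A simple way to picture context packing is a greedy fill: take the highest-scoring passages that still fit the budget. The sketch below approximates token counts with word counts, which is an assumption; real systems use the model's tokenizer and may apply more careful selection.

```python
def pack_context(passages: list[tuple[str, float]], token_budget: int) -> list[str]:
    """Greedily keep the highest-scoring passages whose combined size fits the budget."""
    packed: list[str] = []
    used = 0
    for text, _score in sorted(passages, key=lambda p: p[1], reverse=True):
        cost = len(text.split())  # crude stand-in for a token count
        if used + cost <= token_budget:
            packed.append(text)
            used += cost
    return packed


# (passage, relevance score) pairs; the scores are made up.
candidates = [
    ("one two three", 0.9),
    ("four five six seven", 0.8),
    ("eight nine", 0.7),
]

print(pack_context(candidates, token_budget=5))  # ['one two three', 'eight nine']
```

The second passage is skipped because it would exceed the budget, which is one reason compact, self-contained passages tend to survive packing.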

Context Window #

The maximum amount of text, measured in tokens, that a model can attend to at once, covering both the input it receives and the response it generates.

Tool Use #

When a model calls external tools (search, browse, APIs) to gather information or take actions.

Agent #

A system that uses an LLM to plan steps, call tools, and iteratively refine outcomes.

Source Conflict Resolution #

How a system handles disagreeing sources; clear canonical pages reduce conflicts and improve accuracy.

Content Design for AI Readability

Writing and information architecture patterns that improve extractability, reduce ambiguity, and increase citation likelihood.

Readability #

How easily text can be parsed for meaning; shorter sentences and clear structure reduce misinterpretation.

Content Density #

The ratio of meaningful information to boilerplate; denser pages are easier to summarize within limited context.

Above-the-fold Clarity #

Whether the first screen communicates what you do, who it's for, and why it's credible.

Value Proposition #

A precise statement of the primary benefit and differentiation.

Ideal Customer Profile (ICP)#

The target customer definition; helps assistants match you to intent-driven prompts.

Job To Be Done (JTBD)#

The outcome a user hires a product/service to achieve; a strong framing for AI queries.

Disambiguation Copy #

Language that prevents confusion with similarly named entities or overlapping categories.

Canonical Facts Block #

A compact section containing key facts (pricing, features, geography, policies) in a machine-extractable form.

Specs Table #

A table of measurable attributes that models can cite and compare.

FAQ Block #

Question-and-answer formatting that maps directly to assistant behavior and reduces paraphrase errors.

TL;DR #

A short summary that preserves correctness when assistants compress your content.

Scannability #

Use of headings, bullets, and short paragraphs so both people and models can locate key statements quickly.

Definition Box #

A standard pattern where a term is defined in one or two sentences before deeper explanation.

Terminology Consistency #

Using the same names for the same things across pages (features, plans) to avoid conflicting summaries.

Pricing Clarity #

Pricing and packaging stated unambiguously (numbers, units, limits) to prevent incorrect assistant answers.

Limitations Disclosure #

Explicitly stating constraints and exclusions to prevent overclaiming in summaries.

Contact Path #

A clear, extractable way to reach you (email, form, phone) that models can surface confidently.

Outcome Proof #

Evidence of results (metrics, case studies) tied to specific claims.

Content Modularization #

Designing content in reusable blocks so retrieval can pick the right piece without needing the whole page.

Internal Context Links #

Links that provide definitions and supporting pages to keep retrieval grounded.

Change Log #

A page or section that records updates; helps assistants confirm what changed and when.

Entities, Structured Data, and the Semantic Web

Standards and modeling terms that reduce ambiguity and help machines understand who/what your site is about.

Structured Data #

Machine-readable markup that describes entities and relationships on a page.

Schema.org #

A shared vocabulary for structured data used by many search and parsing systems.

JSON-LD #

A common format for embedding structured data in web pages.

Entity #

A uniquely identifiable thing (brand, person, product, place) that can be referenced consistently.

Entity Disambiguation #

Making it clear which entity a term refers to when names overlap.

Knowledge Graph #

A network of entities and relationships used to support retrieval and reasoning.

sameAs #

A schema property linking an entity to authoritative profiles (e.g., social, knowledge bases) to confirm identity.

Organization Schema #

Structured data describing a company, including name, URL, logo, and identifiers.
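
A minimal sketch of Organization markup, generated as JSON-LD and wrapped in the script tag that pages typically embed. The company name, URLs, and profile links are placeholders; the property names (`name`, `url`, `logo`, `sameAs`) come from the Schema.org vocabulary.

```python
import json


def organization_jsonld(name: str, url: str, logo: str, same_as: list[str]) -> str:
    """Build a JSON-LD string describing an organization."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": same_as,  # profiles that confirm the entity's identity
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script element used to embed it in HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


markup = organization_jsonld(
    name="Example Co",                       # placeholder values throughout
    url="https://www.example.com",
    logo="https://www.example.com/logo.png",
    same_as=["https://www.linkedin.com/company/example-co"],
)
print(as_script_tag(markup))
```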

Person Schema #

Markup describing an individual (author, executive) with role and identity links.

Product Schema #

Markup describing a product with attributes like brand, offers, and identifiers.

Service Schema #

Markup describing a service offering, useful when the product is delivered as a service.

Offer #

A structured description of a purchasable option (price, currency, availability, terms).

AggregateRating #

Structured data summarizing review ratings for a product or organization.

Review Schema #

Markup describing individual reviews and their authorship.

FAQPage Schema #

Markup for FAQ content that clarifies Q/A pairs for machine extraction.
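
FAQPage markup pairs each question with an accepted answer. The sketch below builds it from a list of question and answer strings; the sample pair is invented for illustration.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


print(faq_jsonld([("Do you offer a free plan?", "Yes, for up to three users.")]))
```

Keeping the marked-up answer text identical to the visible answer on the page avoids giving machines and readers two different versions.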

HowTo Schema #

Markup for step-by-step procedures and required materials.

BreadcrumbList #

Structured navigation that clarifies page hierarchy and improves discovery.

WebSite Schema #

Site-level markup that can define search actions and canonical identity.

WebPage Schema #

Page-level markup describing the type and key properties of a page.

Article Schema #

Markup for editorial content, including author, publish date, and headline.

Dataset Schema #

Markup describing datasets; useful when your site publishes data intended for reuse and citation.

Crawling, Access, and Indexing

How bots fetch pages and how your site allows or blocks them—critical for both SEO and AI visibility.

Crawler #

An automated program that fetches pages for discovery and indexing.

User-Agent #

An identifier sent by a bot or browser; used in server rules and robots directives.

robots.txt #

A file that provides crawling directives; can allow or disallow specific user agents and paths.
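
To check how a set of directives treats a given bot, Python's standard `urllib.robotparser` can evaluate rules offline. In this sketch the robots.txt content, the `ExampleAIBot` user agent, and the URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one rule set for a named bot, one for everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /private/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from text instead of fetching over the network


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let `user_agent` fetch `url`."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("ExampleAIBot", "https://www.example.com/docs/"))           # True
print(is_allowed("ExampleAIBot", "https://www.example.com/private/report"))  # False
print(is_allowed("OtherBot", "https://www.example.com/admin/"))              # False
```

A bot with its own `User-agent` group follows that group only, so the named bot here is not bound by the `/admin/` rule in the wildcard group.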

Sitemap.xml #

A file listing important URLs and metadata to help crawlers discover and prioritize pages.
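
A sitemap is a small XML document. The sketch below generates one with the standard library, using the namespace defined by the sitemaps.org protocol; the URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Build sitemap XML from (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # YYYY-MM-DD
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/pricing", "2024-02-01"),
]))
```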

llms.txt #

A proposed convention for providing an LLM-friendly site summary and key links in a compact format.

Crawlability #

Whether bots can reach your pages through links and allowed paths.

Fetchability #

Whether bots can successfully download the page (status codes, auth, paywalls, blocks).

Renderability #

Whether the important content is present after rendering (especially for JS-heavy sites).

Server-Side Rendering (SSR)#

Rendering HTML on the server so bots and users receive content immediately.

Client-Side Rendering (CSR)#

Rendering in the browser; can hide content from bots that don't execute scripts fully.

Hydration #

The process of attaching JS interactivity to server-rendered HTML.

HTTP Status Code #

A response code (200, 301, 404, etc.) that signals success, redirects, or errors.

Redirect #

A response that forwards a request to another URL; excessive chains can harm fetchability.
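
Redirect chains are easy to audit once you have a map of which URL forwards where. The sketch below follows such a map, reports the full chain, and rejects loops; the paths and the `max_hops` limit of 5 are illustrative assumptions, not a standard.

```python
def resolve_redirects(start: str, redirects: dict[str, str], max_hops: int = 5) -> list[str]:
    """Follow redirects from `start` and return every URL visited, in order."""
    chain = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in chain:
            raise ValueError(f"redirect loop detected at {current}")
        chain.append(current)
        if len(chain) - 1 > max_hops:
            raise ValueError(f"more than {max_hops} redirect hops from {start}")
    return chain


# Hypothetical redirect map: /old forwards twice before reaching /new.
redirect_map = {"/old": "/older-new", "/older-new": "/new"}

print(resolve_redirects("/old", redirect_map))  # ['/old', '/older-new', '/new']
```

A chain like this one is usually better collapsed so that `/old` points straight at `/new`.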

404 Not Found #

A response indicating nothing was found at the URL; use it for missing pages, or for removed pages when the removal may not be permanent.

410 Gone #

A response indicating content was intentionally removed; can speed de-indexing.

Canonicalization #

Ensuring one primary URL represents a piece of content to avoid duplication and confusion.

Pagination #

Splitting lists across pages; requires clear linking for discovery and retrieval.

Faceted Navigation #

Filter-based navigation that can generate many URL combinations; needs careful control.

Parameter Handling #

Rules for URLs with query parameters to prevent duplicate or low-value pages.
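
One possible parameter-handling rule is to drop parameters that do not change the content and to sort the rest, so equivalent URLs collapse to one form. The ignore list below is an assumption for illustration; which parameters are safe to drop depends on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters assumed not to change page content (illustrative list).
IGNORED_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "sessionid",
}


def normalize_url(url: str) -> str:
    """Drop ignored parameters and the fragment, lowercase the host, sort what remains."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


print(normalize_url("https://Example.com/shoes?utm_source=news&size=9&color=red#top"))
# https://example.com/shoes?color=red&size=9
```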

Rate Limiting #

Restricting request frequency; misconfigured limits can block legitimate bots.

Bot Blocking #

Firewall or rules that deny bots; can inadvertently block AI crawlers and prevent visibility.

Trust, Authority, and Safety

Signals that increase the likelihood a system will cite you as a reliable source and avoid suppressing your content.

E-E-A-T #

Experience, Expertise, Authoritativeness, and Trustworthiness signals that support credibility assessments.

Authorship #

Clear identification of who wrote the content and why they're qualified.

Byline #

A visible author name connected to a profile with credentials and history.

Editorial Policy #

A public explanation of how content is created, reviewed, and updated.

About Page #

A canonical page explaining who you are, your mission, and your team, with proof of identity.

Contact Information #

Clear, consistent ways to reach the organization; supports legitimacy checks.

NAP Consistency #

Consistent Name, Address, Phone across the web; helps identity reconciliation.

Privacy Policy #

A page describing data practices; often used as a trust and compliance signal.

Terms of Service #

A page outlining usage terms; supports legitimacy and reduces uncertainty.

Content Provenance #

Information about where claims come from and how they were verified.

Primary Source #

Original evidence (docs, data, research) that can be cited over secondary summaries.

Citations (Sources)#

Links or references that support claims and make verification easy.

HTTPS #

Encrypted transport; a baseline trust and security requirement for many systems.

HSTS #

A security header enforcing HTTPS; reduces downgrade and interception risks.

security.txt #

A standard file to disclose security contact and policies; supports responsible disclosure.

DMARC #

An email authentication policy reducing spoofing; contributes to brand trust and deliverability.

Brand Consistency #

Consistent naming, logos, and descriptions across pages to reduce ambiguity.

Reputation Signal #

Indicators of reliability (reviews, coverage, certifications) that support confidence.

Review Management #

Practices to monitor and respond to reviews; impacts perceived trust.

Content Safety #

Ensuring content avoids policy-violating material that could trigger suppression.

YMYL (Your Money or Your Life)#

Topics like health/finance where higher trust standards apply and weak signals reduce citation likelihood.

Measurement, Testing, and Operations

How to quantify visibility in both classic search and answer engines, and how to run improvement cycles.

KPI #

A key performance indicator used to track progress toward an objective.

Baseline #

The starting measurement used to compare future changes and attribute improvement.

Benchmark #

A reference point (often competitors or industry) used to judge performance.

Competitive Set #

The group of alternatives you compare against for visibility and citations.

Audit Run #

A point-in-time evaluation of signals (content, trust, tech) across chosen URLs.

Score Normalization #

Converting different signals into a comparable scale so they can be combined into one score.
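
As one way to picture this, the sketch below rescales raw signals to a 0 to 100 range with min-max normalization and then combines them with a weighted average. The signal names, ranges, and weights are invented; any real scoring system may normalize and weight differently.

```python
def min_max_normalize(value: float, low: float, high: float) -> float:
    """Map `value` from the range [low, high] onto 0-100, clamping out-of-range inputs."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = max(low, min(high, value))
    return 100.0 * (clamped - low) / (high - low)


def combined_score(normalized: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of already-normalized signals."""
    total_weight = sum(weights[name] for name in normalized)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(normalized[name] * weights[name] for name in normalized) / total_weight


# Hypothetical signals: content weighted three times as heavily as trust.
signals = {"content": 80.0, "trust": 60.0}
weights = {"content": 3.0, "trust": 1.0}

print(min_max_normalize(5, 0, 10))        # 50.0
print(combined_score(signals, weights))   # 75.0
```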

Prioritization #

Ranking fixes by expected impact and effort so work focuses on what moves outcomes.

Impact Estimate #

A reasoned projection of how much a change could affect visibility or citations.

Experiment #

A structured change designed to test a hypothesis under controlled conditions.

A/B Test #

Comparing two variants to determine which performs better on a metric.
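
Deciding "which performs better" usually involves a significance check. The sketch below uses a two-proportion z-test, one standard choice for conversion-style metrics; the visitor and conversion counts are made up, and other tests may suit other metrics.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal distribution
    return z, p_value


# Variant A: 100 conversions from 1,000 visits; variant B: 150 from 1,000.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but it says nothing about whether the lift is large enough to matter.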

Holdout #

A group excluded from changes to measure the true effect of an intervention.

Event Tracking #

Collecting user interactions (clicks, form submits) to evaluate outcomes.

UTM Parameters #

URL parameters used to label campaigns and traffic sources for analytics.
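
A small sketch of tagging a link with the three most common UTM parameters while keeping any query string it already has. The URL and campaign values are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Append utm_source, utm_medium, and utm_campaign to a URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


print(add_utm("https://www.example.com/pricing?plan=pro", "newsletter", "email", "spring_launch"))
# https://www.example.com/pricing?plan=pro&utm_source=newsletter&utm_medium=email&utm_campaign=spring_launch
```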

Conversion Rate #

Conversions divided by visits; a business outcome metric.

Cohort Analysis #

Tracking groups over time to understand retention and behavior changes.

Monitoring #

Ongoing measurement to detect drift, regressions, or improvements.

Alerting #

Automated notifications when metrics or signals cross thresholds.

Regression #

A drop in performance after a change; requires diagnosis and rollback plans.

Change Tracking #

Recording what changed (content, markup, routing) so score changes can be explained.

Release Notes #

A public or internal log of updates; helps users and agents understand what's new.

Technical Performance, Accessibility, and Multimodal

Technical quality signals that affect fetchability, comprehension, and compatibility across devices and AI modalities.

Core Web Vitals #

A set of user experience metrics used to quantify loading speed, responsiveness, and visual stability.

LCP (Largest Contentful Paint)#

Measures perceived load speed by timing when the largest content element appears.

INP (Interaction to Next Paint)#

Measures responsiveness by timing how quickly the page responds to user interactions.

CLS (Cumulative Layout Shift)#

Measures visual stability by tracking unexpected layout movement.
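
The three metrics above are usually read against the thresholds in Google's Core Web Vitals guidance. The sketch below encodes those published cutoffs in a small rating function; the function name and sample values are illustrative.

```python
# (good up to, poor above) per metric, following Google's published guidance.
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout-shift score
}


def rate_vital(metric: str, value: float) -> str:
    """Classify a measurement as 'good', 'needs improvement', or 'poor'."""
    good_max, poor_above = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= poor_above:
        return "needs improvement"
    return "poor"


print(rate_vital("LCP", 2.1))  # good
print(rate_vital("INP", 350))  # needs improvement
print(rate_vital("CLS", 0.3))  # poor
```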

Page Speed #

How quickly a page becomes usable; impacts user satisfaction and some ranking systems.

Caching #

Storing resources to reduce repeat load time and server strain.

CDN (Content Delivery Network)#

A distributed network that serves assets closer to users and bots, improving speed and resilience.

Image Optimization #

Compressing and properly sizing images to reduce load while preserving quality.

Lazy Loading #

Deferring offscreen resources until needed; must be implemented carefully for bots and accessibility.

Accessibility #

Designing content usable by assistive technologies; often correlates with machine readability.

WCAG #

Web Content Accessibility Guidelines defining accessibility requirements and best practices.
