
Entities for SEO and AI Search

We spent years teaching algorithms to read keywords. We stuffed H1 tags, obsessed over density ratios, and prayed to the PageRank gods.

The new goal is different: becoming a recognized entity in the machine’s mental model of the world. When Perplexity, ChatGPT Search, or Google’s Gemini needs to answer “who are the leaders in CRM,” you either exist as a distinct entity in their understanding, or you don’t exist at all.

From Strings to Things

Search engines underwent a revolution around 2012 when Google launched the Knowledge Graph. The shift was philosophical – stop treating the web as a bag of words and start treating it as a database of things.

A “thing” in this context is an entity – a person, place, organization, product, concept, or event that exists in the real world.

“Apple” isn’t just five letters. It names multiple entities: a fruit, a tech company, a record label, and the Beatles’ corporation. Modern search systems must distinguish between these, and they do it through entity recognition.

Google’s Knowledge Graph contains billions of facts about entities and their relationships. When you search “who founded Tesla,” Google resolves the entity “Tesla, Inc.”, follows the relationship “founder”, and returns “Elon Musk” (plus the four other co-founders most people forget).
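The entity → relationship → value lookup can be sketched with a toy triple store. This is only an illustration of the idea, not how Google's graph is implemented; the `add_fact` and `query` helpers are hypothetical, and the founder names are filled in from public record rather than from this article.

```python
from collections import defaultdict

# Toy knowledge graph: (subject, predicate) -> list of objects.
# A real graph holds billions of facts; this one holds six.
graph = defaultdict(list)

def add_fact(subject, predicate, obj):
    """Store one (subject, predicate, object) triple."""
    graph[(subject, predicate)].append(obj)

def query(subject, predicate):
    """Return every object linked to the subject by the predicate."""
    return list(graph.get((subject, predicate), []))

for founder in ["Martin Eberhard", "Marc Tarpenning", "Elon Musk",
                "JB Straubel", "Ian Wright"]:
    add_fact("Tesla, Inc.", "founder", founder)
add_fact("Tesla, Inc.", "type", "Organization")

print(query("Tesla, Inc.", "founder"))
```

The point is that the query never touches the string “Tesla” on a web page. It starts from a resolved entity and walks a typed relationship.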

Schema.org markup is how you explicitly tell machines what your content is about. According to a case study published by Google, Rotten Tomatoes pages with structured data saw a 25% higher click-through rate (CTR) than pages without it.

A diagram showcasing some examples of entities connecting to a knowledge graph.

The AI Connection

AI search systems such as Perplexity, ChatGPT Search, and Bing Chat extend this approach.

They process tokens and context, and they use entity recognition to understand queries, retrieve relevant sources, and synthesize coherent answers. The difference is that they do it in real time across the live web, not just by querying a pre-built graph.

These systems need to know what they’re talking about. When someone asks “best AI coding tools,” the system must identify which entities qualify as “AI coding tools,” which sources authoritatively discuss those entities, and how to attribute claims correctly.

Why AI Search Is an Entity Retrieval Problem

Traditional search was a matching problem – find pages containing the query terms. AI search is a comprehension problem – understand what the user wants, find entities that satisfy that need, retrieve authoritative information about those entities, and synthesize an answer.

Query Understanding

When someone types “jaguar speed,” is that the animal, the car, or the football team?

Entity disambiguation resolves this. Context from previous queries, user location, and co-occurring terms tell the system which jaguar entity is relevant.
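Here is a minimal sketch of disambiguation by co-occurring terms. Real systems use learned models and far richer context; the candidate list, the term sets, and the `disambiguate` function are all made up for illustration, and the football team is assumed to be the Jacksonville Jaguars.

```python
# Hypothetical candidate entities, each with terms that tend to co-occur with it.
CANDIDATES = {
    "Jaguar (animal, Panthera onca)": {"cat", "habitat", "prey", "rainforest", "mph"},
    "Jaguar (car maker)": {"car", "engine", "horsepower", "sedan", "mph"},
    "Jacksonville Jaguars (NFL team)": {"nfl", "quarterback", "touchdown", "roster"},
}

def disambiguate(query, context_terms=()):
    """Pick the candidate whose terms overlap most with the query plus context.

    Returns None when nothing overlaps or when the top two candidates tie,
    i.e. when the available context is not enough to decide.
    """
    tokens = set(query.lower().split()) | {t.lower() for t in context_terms}
    scores = {name: len(tokens & terms) for name, terms in CANDIDATES.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    if top == 0 or top == runner_up:
        return None
    return best

print(disambiguate("jaguar speed"))                          # no context to go on
print(disambiguate("jaguar speed", ["rainforest", "prey"]))  # context from earlier queries
```

Without context the bare query stays ambiguous. Your content is part of that context, which is why the surrounding terms you publish matter.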

Source Selection

Once the system knows you’re asking about the animal, it needs sources that authoritatively discuss Panthera onca.

Sources with clear entity markers, such as explicit mentions of “jaguar (Panthera onca)” or “the jaguar, a large cat native to the Americas,” get prioritized. Vague sources that just say “jaguars are fast” without entity context get deprioritized.

Answer Attribution

AI systems cite sources. To cite you, they need to extract your entity as the source of specific claims.

If your content says “our research shows X” without identifying who “our” is, you won’t get cited. If it says “According to a study by MIT Media Lab’s Dr. Sarah Chen, X,” you’ve created an attributable entity relationship.

How to Establish Entities with Schema Markup

Organizations

Whether you run a Fortune 500 company or a three-person startup, being recognized as an organizational entity is critical for visibility.

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Flux Robotics",
  "foundingDate": "2019-06-12",
  "founder": [
    {"@type": "Person", "name": "Maria Santos"},
    {"@type": "Person", "name": "Jin Park"}
  ],
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Austin",
    "addressRegion": "TX",
    "addressCountry": "US"
  }
}

The SameAs Protocol

The biggest threat to entity recognition is ambiguity. There are hundreds of companies with “Summit” in the name. How does an AI know which one you are?

The sameAs property solves this. It links your entity to canonical records in authoritative databases:

"sameAs": [
  "https://www.wikidata.org/wiki/Q137796337",
  "https://www.crunchbase.com/organization/flux-robotics",
  "https://www.linkedin.com/company/flux-robotics"
]

Wikidata is the go-to option. It’s the structured-data sibling of Wikipedia and one of the sources that feed Google’s Knowledge Graph. A Wikidata entry gives you a Q-number, a persistent, globally unique entity identifier. Any AI system that uses Wikidata can then resolve your identity far more reliably.

Crunchbase works similarly for companies. LinkedIn provides employment and relationship verification. Twitter/X provides social proof and activity signals.

The more authoritative sources you link to via sameAs, the stronger your entity identity.
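If you generate markup from a CMS or build script, a small helper keeps the sameAs list clean. This is a sketch under assumptions: `organization_jsonld` and `to_script_tag` are hypothetical helper names, and the URLs are the illustrative ones from the example above.

```python
import json

def organization_jsonld(name, same_as, **extra):
    """Build a schema.org Organization dict with a de-duplicated sameAs list."""
    data = {"@context": "https://schema.org", "@type": "Organization", "name": name}
    data.update(extra)
    # dict.fromkeys drops duplicates while keeping order; keep only https URLs.
    data["sameAs"] = [u for u in dict.fromkeys(same_as) if u.startswith("https://")]
    return data

def to_script_tag(data):
    """Wrap the JSON-LD in the script tag that goes into the page's HTML."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")

flux = organization_jsonld(
    "Flux Robotics",
    same_as=[
        "https://www.wikidata.org/wiki/Q137796337",
        "https://www.crunchbase.com/organization/flux-robotics",
        "https://www.linkedin.com/company/flux-robotics",
        "https://www.linkedin.com/company/flux-robotics",  # accidental duplicate
    ],
    foundingDate="2019-06-12",
)
print(to_script_tag(flux))
```

Generating the block from one source of truth also helps with the consistency problem: the same name and the same links end up on every page.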

Write for Entity Extraction

Marketing copy optimizes for emotion. Entity-optimized copy optimizes for machine extraction. These goals conflict.

Consider this sentence from a typical About page: “We’re transforming how enterprises think about digital innovation through passionate commitment to excellence and customer-first values.”

To a human, this sounds impressive (if generic). To a named entity recognition (NER) system, it’s garbage. Zero extractable entities, zero concrete relationships, zero machine-comprehensible facts.

Now consider this: “Flux Robotics is a warehouse automation company. Flux Robotics was founded in 2019 by Maria Santos and Jin Park. The company is headquartered in Austin, Texas and manufactures the FluxBot autonomous forklift.”

This feels robotic, and that’s the point. You’ve created extractable triples:

  • (Flux Robotics) – (type) – (warehouse automation company)
  • (Flux Robotics) – (founded by) – (Maria Santos)
  • (Flux Robotics) – (founded by) – (Jin Park)
  • (Flux Robotics) – (founded in) – (2019)
  • (Flux Robotics) – (headquartered in) – (Austin, Texas)
  • (Flux Robotics) – (manufactures) – (FluxBot autonomous forklift)

Also, maintain consistency across all platforms. If one source says you were founded in 2019 and another says 2020, AI systems may drop the fact entirely.
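A consistency audit can be as simple as collecting the same facts from each profile and flagging any predicate with more than one value. The observations below are invented for illustration, and `find_conflicts` is a hypothetical helper, not part of any tool.

```python
from collections import defaultdict

# Hypothetical facts copied from different profiles:
# (source, subject, predicate, value)
observations = [
    ("website",    "Flux Robotics", "founded in",       "2019"),
    ("linkedin",   "Flux Robotics", "founded in",       "2019"),
    ("crunchbase", "Flux Robotics", "founded in",       "2020"),
    ("website",    "Flux Robotics", "headquartered in", "Austin, Texas"),
    ("linkedin",   "Flux Robotics", "headquartered in", "Austin, Texas"),
]

def find_conflicts(rows):
    """Return {(subject, predicate): {value: [sources]}} for disputed facts only."""
    values = defaultdict(lambda: defaultdict(list))
    for source, subject, predicate, value in rows:
        values[(subject, predicate)][value].append(source)
    return {key: dict(v) for key, v in values.items() if len(v) > 1}

conflicts = find_conflicts(observations)
for (subject, predicate), by_value in conflicts.items():
    print(subject, predicate, by_value)
```

Here the founding year is flagged because Crunchbase disagrees with the other two sources, while the headquarters fact passes.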

People

Personal brands, executives, authors, researchers, and influencers all need entity optimization.

{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Dr. Sarah Chen",
  "jobTitle": "Chief AI Researcher",
  "worksFor": {
    "@type": "Organization",
    "name": "MIT Media Lab"
  },
  "alumniOf": {
    "@type": "EducationalOrganization",
    "name": "Stanford University"
  },
  "sameAs": [
    "https://www.linkedin.com/in/sarahchen",
    "https://twitter.com/sarahchen",
    "https://scholar.google.com/citations?user=ABC123"
  ]
}

Personal Entity Attributes

For people, AI systems track:

  • Professional role – Current position and employer
  • Education – Degrees and institutions
  • Publications – Books, papers, articles authored
  • Achievements – Awards, patents, notable projects
  • Expertise areas – Fields of knowledge or practice

Consistency for Personal Brands

Pick one canonical name format and use it everywhere:

  • “Dr. Sarah Chen” (not “Sarah Chen, PhD” on one platform and “S. Chen” on another)
  • Include credentials consistently (always PhD, or never)
  • Use the same profile photo across platforms (visual entity recognition)

Places

Restaurants, hotels, stores, landmarks, cities – any physical location needs entity optimization.

{
  "@context": "https://schema.org",
  "@type": "Restaurant",
  "name": "Blue Sage Bistro",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "742 Elm Street",
    "addressLocality": "Portland",
    "addressRegion": "OR",
    "postalCode": "97201",
    "addressCountry": "US"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": "45.5152",
    "longitude": "-122.6784"
  },
  "servesCuisine": "New American",
  "priceRange": "$$"
}

Geographic Entity Disambiguation

The disambiguation problem is acute for places. There are 28 cities named “Paris” in the United States alone. Always include:

  • Country (for cities and regions)
  • State/Province (for cities)
  • City (for neighborhoods and landmarks)
  • Geographic coordinates (for precision)

“Our restaurant in Paris” is ambiguous. “Blue Sage Bistro in Portland, Oregon” is not.
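To see why coordinates settle the question, here is a small sketch that resolves “Portland” by great-circle distance. The haversine formula is standard; the `nearest` helper and the two-city candidate table are assumptions made for this example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Two same-named cities with approximate city-centre coordinates.
PORTLANDS = {
    "Portland, Oregon": (45.5152, -122.6784),
    "Portland, Maine": (43.6591, -70.2568),
}

def nearest(candidates, lat, lon):
    """Return the candidate place closest to the given coordinates."""
    return min(candidates, key=lambda name: haversine_km(lat, lon, *candidates[name]))

# The geo block in the markup above stores coordinates as strings, so convert first.
bistro = (float("45.5152"), float("-122.6784"))
print(nearest(PORTLANDS, *bistro))
print(round(haversine_km(*PORTLANDS["Portland, Oregon"], *PORTLANDS["Portland, Maine"])), "km apart")
```

A name alone leaves two candidates thousands of kilometres apart. A name plus coordinates leaves one.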

Google Business Profile Integration

For local entities, Google Business Profile is non-negotiable. This feeds directly into:

  • Google Maps entity data
  • Local search results
  • AI Overview local recommendations
  • Knowledge Panel information

Claim your profile, complete every field, maintain accurate hours, and respond to reviews.

Products

Physical products, software, services – anything you sell needs entity recognition.

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "FluxBot X200",
  "description": "Autonomous warehouse forklift with computer vision",
  "brand": {
    "@type": "Brand",
    "name": "Flux Robotics"
  },
  "manufacturer": {
    "@type": "Organization",
    "name": "Flux Robotics"
  },
  "offers": {
    "@type": "Offer",
    "price": "125000",
    "priceCurrency": "USD"
  }
}

Product Entity Relationships

Products gain recognition through relationships:

  • Manufacturer – (FluxBot X200) → (manufactured by) → (Flux Robotics)
  • Category – (FluxBot X200) → (is a) → (autonomous forklift)
  • Features – (FluxBot X200) → (has feature) → (computer vision navigation)
  • Compatibility – (FluxBot X200) → (compatible with) → (standard pallet sizes)

The more specific these relationships, the better AI systems understand when to recommend your product.
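The link between markup and relationships becomes clearer when you flatten the Product JSON-LD into triples. This is a simplified sketch, not a real JSON-LD processor: `jsonld_to_triples` is a hypothetical function that uses each node's name as its identifier and invents a placeholder such as `_:offers` for unnamed nodes.

```python
def jsonld_to_triples(node, subject=None):
    """Flatten a JSON-LD dict into (subject, predicate, object) triples."""
    subject = subject or node.get("name", "_:unnamed")
    triples = []
    for key, value in node.items():
        if key in ("@context", "name"):
            continue
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Nested node: link to it, then describe it with its own triples.
                obj = item.get("name", "_:" + key)
                triples.append((subject, key, obj))
                triples.extend(jsonld_to_triples(item, subject=obj))
            else:
                triples.append((subject, key, item))
    return triples

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "FluxBot X200",
    "description": "Autonomous warehouse forklift with computer vision",
    "brand": {"@type": "Brand", "name": "Flux Robotics"},
    "manufacturer": {"@type": "Organization", "name": "Flux Robotics"},
    "offers": {"@type": "Offer", "price": "125000", "priceCurrency": "USD"},
}

for triple in jsonld_to_triples(product):
    print(triple)
```

Every property you leave out of the markup is a relationship a machine has to guess from prose instead.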

Review Platforms as Entity Validators

Product reviews on authoritative platforms strengthen entity recognition:

  • G2 Crowd (for software)
  • Capterra (for SaaS)
  • Amazon (for physical products)
  • Consumer Reports (for consumer goods)
  • And more…

Each verified review is corroboration that your product entity exists and has specific attributes.

Creative Works

Books, articles, movies, music, research papers – all creative works are entities.

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "The Future of Autonomous Robotics in Logistics",
  "author": {
    "@type": "Person",
    "name": "Dr. Sarah Chen"
  },
  "publisher": {
    "@type": "Organization",
    "name": "MIT Technology Review"
  },
  "datePublished": "2024-03-15"
}

DOI for Research Papers

Digital Object Identifiers (DOIs) are permanent entity identifiers for academic papers. If you publish research, always:

  • Register a DOI through your publisher
  • Include it in all citations
  • Reference it in your author profiles

DOIs become canonical entity IDs that persist even if URL structures change.
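Because DOIs show up in several spellings (a bare DOI, a `doi:` prefix, an old `dx.doi.org` link), it helps to normalize them to one form before publishing. A minimal sketch follows; the regular expression is a common approximation rather than the official DOI grammar, `normalize_doi` is a hypothetical helper, and the DOI itself is invented.

```python
import re

# Approximate DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")

def normalize_doi(text):
    """Extract a DOI from a citation string and return its https://doi.org/ URL."""
    match = DOI_PATTERN.search(text)
    if not match:
        return None
    doi = match.group(0).rstrip(".,;)")  # trim trailing sentence punctuation
    # DOIs are case-insensitive, so lowercase them for consistent comparison.
    return "https://doi.org/" + doi.lower()

variants = [
    "doi:10.1234/Example.2024.001",
    "https://dx.doi.org/10.1234/example.2024.001.",
    "Chen, S. (2024). 10.1234/example.2024.001",
]
print({normalize_doi(v) for v in variants})
```

All three spellings collapse to one URL, which is the form to use in citations and author profiles.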

Book Entity Optimization

For authors, each book is a distinct entity:

{
  "@context": "https://schema.org",
  "@type": "Book",
  "name": "The Entity Economy",
  "author": {
    "@type": "Person",
    "name": "James Martinez"
  },
  "isbn": "978-1234567890",
  "publisher": {
    "@type": "Organization",
    "name": "Tech Press"
  },
  "datePublished": "2024"
}

The ISBN is your book’s entity ID. Use it everywhere you mention the book.
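An ISBN only works as an identifier if it is valid, and ISBN-13 carries a check digit you can verify before publishing markup. The sketch below applies the standard alternating 1-and-3 weighting. Note that the ISBN in the sample markup above is a placeholder and fails this check, so swap in your real one. `isbn13_is_valid` is a hypothetical helper name.

```python
def isbn13_is_valid(isbn):
    """Check the ISBN-13 check digit; hyphens and spaces are ignored."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    # Weights alternate 1, 3, 1, 3, ...; a valid ISBN sums to a multiple of 10.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

print(isbn13_is_valid("978-0-306-40615-7"))  # a well-known valid example
print(isbn13_is_valid("978-1234567890"))     # the placeholder used in the sample markup
```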

Entity Relationships

Entities gain meaning through connections to other entities. AI systems understand you by understanding your entity graph.

A diagram representing the entity graph relationships with an example.

Organizational Relationships

  • Technology stack – “Flux Robotics uses NVIDIA Jetson processors, ROS 2, and Velodyne lidar”
  • Partnerships – “Flux Robotics partnered with Amazon Robotics and DHL”
  • Customers – “Flux Robotics customers include Walmart and Target”
  • Investors – “Flux Robotics received funding from Sequoia Capital”
  • Competitors – “Flux Robotics competes with Boston Dynamics in warehouse automation”

Personal Relationships

  • Employment – “Dr. Chen previously worked at Google Brain and DeepMind”
  • Education – “Dr. Chen earned her PhD from Stanford under Geoffrey Hinton”
  • Collaborations – “Dr. Chen co-authored papers with Yann LeCun”
  • Affiliations – “Dr. Chen serves on the advisory board of OpenAI”

Product Relationships

  • Compatibility – “FluxBot X200 integrates with SAP and Oracle WMS”
  • Alternatives – “FluxBot X200 competes with Locus Robotics and Fetch Robotics”
  • Components – “FluxBot X200 uses Sick lidar sensors and Cognex cameras”

The more authoritative entities you’re connected to, the more authority you inherit.

Confidence Score

AI systems don’t believe claims just because you make them. They assign confidence scores based on corroboration across sources.

Low confidence is when only your website claims you’re “the industry leader.” High confidence is when your website, three industry reports, and five news articles all state it.

To build confidence, first you’ll need to maintain absolute consistency across every property.

Pick canonical facts and replicate them exactly, like “Founded June 12, 2019” (everywhere, same format), “Maria Santos and Jin Park” (always both names, same order), “Austin, Texas” (not “Austin, TX” in one place and “Austin, Texas” in another).

Inconsistency destroys confidence. If sources disagree about your founding date, AI systems may drop the fact entirely or flag it as disputed.

Second, not all sources are equal. AI systems weight sources by authority.

Ziff Davis conducted a study that analyzed datasets like Common Crawl and OpenWebText. The study found that “curated” training data disproportionately favors high-DA (Domain Authority) sites, which suggests a bias toward authority is built into the models before they even search the web.

A claim on your website (low authority) needs corroboration from high authority sources to achieve high confidence.
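No AI vendor publishes its scoring, so treat the following as a toy model of the idea rather than a description of any real system. It assumes each source has an authority weight between 0 and 1 (made-up numbers here), combines agreeing sources so that each one adds support, and discounts the result when sources disagree.

```python
import math

def confidence(claim, observations):
    """Toy confidence score for a claim.

    observations: list of (value, authority_weight), weights in (0, 1].
    Support grows as independent sources agree; disagreement scales it down.
    """
    agree = [w for value, w in observations if value == claim]
    disagree = [w for value, w in observations if value != claim]
    if not agree or sum(agree) == 0:
        return 0.0
    support = 1.0 - math.prod(1.0 - w for w in agree)
    agreement = sum(agree) / (sum(agree) + sum(disagree))
    return support * agreement

own_site_only = [("industry leader", 0.2)]
corroborated = ([("industry leader", 0.2)]
                + [("industry leader", 0.6)] * 3   # three industry reports
                + [("industry leader", 0.5)] * 5)  # five news articles
disputed = [("2019", 0.2), ("2019", 0.5), ("2020", 0.5)]

print(round(confidence("industry leader", own_site_only), 3))
print(round(confidence("industry leader", corroborated), 3))
print(round(confidence("2019", disputed), 3))
```

Even in this crude model, a claim only you make scores low, a corroborated one scores high, and a single dissenting source drags a fact down sharply.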

How to Measure Entity Recognition

The test is simple – do AI systems recognize and cite you?

Google Knowledge Panel

Search your entity name. Do you get a Knowledge Panel with correct information?

If yes, you’re recognized. If no, you have work to do.

AI Search Citations

Ask questions where your entity should appear:

  • “Best [your category] companies”
  • “Who are the experts in [your field]”
  • “Top [your product type] products”

Test across Perplexity, ChatGPT Search, Google AI Overviews, and Bing Chat.

Are you mentioned? How are you described?
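To track this over time, paste each system's answer into a small script and log whether you appear and in what context. The sketch assumes you collect the answers manually; `mention_report` is a hypothetical helper, and the sample answers are invented.

```python
import re

def mention_report(entity, aliases, answers):
    """Report, per AI system, whether the entity (or an alias) is mentioned."""
    names = [entity, *aliases]
    pattern = re.compile(
        "|".join(r"\b" + re.escape(n) + r"\b" for n in names), re.IGNORECASE
    )
    report = {}
    for system, text in answers.items():
        hits = [m.start() for m in pattern.finditer(text)]
        # Keep a little surrounding text so you can see how you are described.
        context = text[max(0, hits[0] - 40): hits[0] + 80] if hits else None
        report[system] = {"mentioned": bool(hits), "count": len(hits), "context": context}
    return report

# Invented answers to "Best warehouse automation companies".
answers = {
    "perplexity": "Leading vendors include Locus Robotics and Flux Robotics, "
                  "maker of the FluxBot X200 autonomous forklift.",
    "chatgpt": "Popular options include Locus Robotics and Fetch Robotics.",
}

report = mention_report("Flux Robotics", ["FluxBot"], answers)
for system, row in report.items():
    print(system, row["mentioned"], row["count"])
```

Run the same prompts on a schedule and keep the reports, so you can see whether changes to your entity signals actually move the results.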

Entity Recognition APIs

Run your content through a named entity recognition (NER) API or library.

Do they recognize your entity? What type do they classify you as? What confidence score?

If these systems don’t recognize you or misclassify you, your entity signals need strengthening.

Final Note

You can’t hack entity recognition. You can only do the work, which is clear identity declaration, ruthless consistency, authoritative relationships, and earned corroboration.

The good news – this is harder for competitors than keyword optimization ever was. It requires organizational discipline, long-term consistency, and genuine authority. It can’t be automated or gamed.

Your job is to make your entity’s place in reality unambiguous.


Discover more from SEO Automata by Preslav Atanasov
