SEO and the Information Gain Patent

SEO has always had a bit of a split personality. On one side, you’ve got the spreadsheets, ranking factors, backlinks, and audits. On the other, you’ve got this almost philosophical advice: “Add value. Be original. Say something new.”

For years, that second part sounded nice but vague. Then Google filed the “Contextual estimation of link information gain” patent, and suddenly that advice got a lot more concrete.

The patent describes a way to measure originality, and if you’re creating content for search, that changes the game.

What is the Information Gain Patent?

At its core, the “Contextual estimation of link information gain” patent describes a system that asks one deceptively simple question:

“If a user has already consumed some documents on a topic, how much new information will they gain from the next one?”

Information gain is novelty in context: not relevance, authority, or freshness. The system described in the patent tracks:

What documents a user has already seen (or heard, via Assistant).
What semantic information those documents contain.
How much overlap exists between previously consumed documents and new ones.

It then assigns an information gain score to each new candidate document.

The more new information a document adds, the higher its score.

Why Information Gain Matters for SEO

Classic SEO ranking models assume a mostly static world:

User searches a query
Google ranks documents
User clicks one
Session ends

The Information Gain model assumes something much more realistic:

Users consume multiple results
They refine queries
They return to SERPs
They ask follow-ups
They get bored when results repeat themselves

Google wants to optimize entire information journeys, not just first clicks.

All of this means that ranking #1 isn’t enough, being different from result #1 matters, and redundancy becomes a liability.

How Information Gain Is Calculated

The patent outlines a machine-learning approach that works roughly like this:

Documents are converted into semantic representations
- Vector embeddings
- Bag-of-words or learned representations (think Word2Vec-style models)
Previously viewed documents are grouped
- A “first set” of documents already consumed by the user
New candidate documents are evaluated
- A “second set” of documents not yet viewed
A model compares semantic overlap
- If document B mostly repeats document A – low information gain
- If document B introduces new entities, causes, steps, perspectives – higher information gain
Documents are re-ranked dynamically
- Rankings can change after a user clicks something
- Documents can be demoted, promoted, or even excluded if they add nothing new

In other words, Google is ranking what the user hasn’t learned yet. And that’s beautiful.
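
The steps above can be sketched in a few lines of Python. This is only a toy illustration, not the patent's method: the patent describes a trained machine-learning model, while this sketch swaps in plain bag-of-words vectors and cosine similarity. The function names (bag_of_words, information_gain, rerank) and the scoring formula are assumptions made for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count words (a crude semantic representation)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def information_gain(candidate: str, seen_docs: list[str]) -> float:
    """Toy score in [0, 1]: 1 minus similarity to everything already seen.

    Assumption: a stand-in heuristic, not the patent's learned model.
    """
    seen = Counter()
    for doc in seen_docs:  # the "first set": documents already consumed
        seen.update(bag_of_words(doc))
    similarity = cosine_similarity(bag_of_words(candidate), seen)
    return max(0.0, 1.0 - similarity)  # clamp tiny floating-point negatives


def rerank(candidates: list[str], seen_docs: list[str]) -> list[tuple[float, str]]:
    """Order the "second set" (unseen documents) by descending toy score."""
    scored = [(information_gain(doc, seen_docs), doc) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


seen = ["Restart the router and check the cable to fix slow wifi."]
candidates = [
    "To fix slow wifi restart the router and check the cable.",
    "Channel interference from neighbours can slow wifi; switch to the 5 GHz band.",
]
for score, doc in rerank(candidates, seen):
    print(f"{score:.2f}  {doc}")
```

The first candidate only reshuffles words the user has already read, so it scores close to zero. The second introduces new terms and moves to the top. Calling rerank again after each click, with the clicked document added to seen, mimics the dynamic re-ranking described above.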

A diagram depicting the Information gain patent flow.

Focus on Assistants

One of the most revealing parts of the patent is how heavily it focuses on automated assistants and text-to-speech.

Why? Because spoken answers are linear: you can’t skim audio. So Google is hyper-motivated to avoid repeating information, strip redundancy, and deliver maximum insight per second.

The patent explicitly describes:

Removing already-heard information from later answers.
Extracting only novel sections from new documents.
Shortening dialog sessions by prioritizing high information gain.
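
As a rough illustration of the second point, the sketch below keeps only the sentences of a new document that are mostly made of words the user has not heard yet. It is a simplified stand-in, not the patent's technique: the word-level comparison, the novel_sentences name, and the min_new_share threshold are all assumptions made for the example.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased set of words in a piece of text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def novel_sentences(
    new_doc: str, already_heard: list[str], min_new_share: float = 0.5
) -> list[str]:
    """Keep sentences where at least min_new_share of the words are unheard.

    Assumption: word overlap is a crude proxy for semantic overlap.
    """
    heard: set[str] = set()
    for text in already_heard:
        heard |= words(text)

    kept = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", new_doc.strip()):
        vocab = words(sentence)
        if not vocab:
            continue
        new_share = len(vocab - heard) / len(vocab)
        if new_share >= min_new_share:
            kept.append(sentence)
    return kept


heard = ["Restart the router to fix slow wifi."]
new_doc = (
    "Restart the router to fix slow wifi. "
    "Neighbouring networks can cause channel interference."
)
print(novel_sentences(new_doc, heard))
```

Only the second sentence survives, because the first repeats what was already read aloud. A production system would compare meaning rather than raw words, but the shape of the filter is the same.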

Here, information gain is operational rather than theoretical, and what works for Assistant tends to bleed into Search.

The SEO Takeaways

Let’s translate patent language into action.

1. “Comprehensive” Is No Longer Enough

If your article is just a cleaner version of the top 5 results, a remix of common steps, or the same headings, same flow, same examples – you might still rank, but you’re vulnerable.

The moment Google recognizes that your page adds no new information, your information gain score drops. Depth now means differentiation.

2. Novelty Can Be Structural, Not Just Factual

You don’t always need brand-new facts. Information gain can come from:

A new framework
A better mental model
A clearer causal explanation
A different sequencing of ideas
Combining concepts that are usually explained separately

If everyone answers “what”, answer “why” or “when this fails”.

3. Follow-Up Content Is a Ranking Opportunity

Because rankings can be recalculated after a click, think in sequences:

Intro article – overview
Follow-up article – edge cases
Advanced article – tradeoffs and limitations

If your content is positioned as “Here’s what you haven’t seen yet”, you align perfectly with information gain logic.

4. Redundancy Is Now an SEO Risk

Historically, repeating key phrases and steps felt “safe.”

In an information gain world, repetition without expansion has low marginal value.

Similar subheadings across articles create semantic overlap, and copycat content carries a real demotion risk.
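
One quick way to spot this kind of overlap in your own content is to compare the subheadings of two articles. The sketch below uses Jaccard similarity over normalized heading text. It is a hypothetical audit helper, not something described in the patent, and the heading_overlap name is an assumption made for the example.

```python
import re


def normalize(heading: str) -> str:
    """Lowercase a heading and strip punctuation so near-duplicates match."""
    return " ".join(re.findall(r"[a-z0-9]+", heading.lower()))


def heading_overlap(headings_a: list[str], headings_b: list[str]) -> float:
    """Jaccard similarity of two heading sets: shared / total distinct."""
    a = {normalize(h) for h in headings_a}
    b = {normalize(h) for h in headings_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


mine = ["What Is Information Gain?", "Why It Matters", "Final Thought"]
theirs = ["What is information gain", "How It Is Calculated", "Final thought"]
print(heading_overlap(mine, theirs))  # 2 shared out of 4 distinct -> 0.5
```

A high value suggests the two outlines cover the same ground in the same way, which is a cue to restructure or add a section the other article lacks.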

An infographic of the information gain SEO takeaways.

Quiet Shift Toward Meaning-Based Search

Zooming out, this patent fits into a much bigger shift toward semantic search, passage ranking, helpful content systems, conversational search, and multi-turn queries.

Google is increasingly asking: “Is this page useful right now, given what the user already knows?”

This change rewards original thinking, clear explanations, honest expertise, and content that genuinely helps users make progress.

Final Thought

The Information Gain patent quietly reframes SEO as something closer to teaching. A good teacher doesn’t repeat the same lesson. Instead, they build on what you already understand.

Information gain is all about moving the user forward, and that’s the kind of optimization worth doing.