A lot of the current AI search conversation starts from the wrong mental model.
It’s tempting to treat LLMs like a new kind of search engine. We ask what the ranking factors are. We ask how to reverse engineer them. We ask whether domain authority matters. We look for the AI equivalent of the old SEO playbook.
That was one of the reasons I was so excited to sit down with Britney Muller for this webinar. Britney has spent years close to this technology, including red-team work on BLOOM before ChatGPT changed the mainstream conversation. She has a rare ability to explain LLMs in a way that is technical enough to be accurate, but practical enough for marketers to act on.
And the main point she made was one the industry needs to hear: LLMs do not rank anything.
They do not work like search engines. They do not remember URLs in the way SEOs are used to thinking about them. They do not have a neat internal list of sources ordered by quality. When they answer, they are predicting continuations based on patterns in training data, sometimes combined with fresh information retrieved at runtime.
That changes the optimization question completely.
Key takeaways
- LLMs are not ranking engines. They are probabilistic continuation systems, so traditional ranking-factor thinking does not map cleanly onto AI search.
- What training data carries forward is not a clean memory of URLs but a pattern of mentions, concepts, and co-occurrence: which brands are talked about, where, and in what context.
- Grounding is a real AI term, but the SEO industry is stretching it in ways that can create confusion and weaken credibility with AI researchers.
- Google may be better positioned than the market narrative suggests because it has search infrastructure, spam-fighting experience, data, and integration all the way from the chip to the model to the user interface.
- AI search spam is already happening. Many examples called 'hacking ChatGPT' are really old-fashioned search result manipulation being pulled into LLM answers.
- To get useful value from AI tools, start with one small task you could hand to an intern. Do not start by trying to build the everything dashboard.
- Social, video, Reddit, and first-hand experience matter more in AI discovery because they provide the kind of human signal models cannot create from scratch.
Showing is better than explaining
I started by asking Britney how she explains AI search to bosses, clients, and teams who are still trying to get their heads around the shift.
Her answer was simple: show them.
Run the same prompt several times in ChatGPT and watch how the answer changes. Show them how Google AI Overviews pulls in different sources. Show them how live search results get folded into AI responses. Show them how random and unstable it can feel when compared with the old mental model of a ranked search results page.
That demonstration matters because many people still think LLMs are retrieving facts in the same way a search engine retrieves documents. They are not. Traditional search is information retrieval. A language model is doing something else: predicting likely continuations based on patterns it has learned, and on fresh information retrieved in real time.
When you see that live, the penny drops faster than if you describe it in abstract terms.
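If you want to make that demonstration repeatable, a few lines of code will do it. Here is a minimal sketch using the OpenAI Python client; the model name and prompt are illustrative placeholders, and it assumes an API key is set in your environment.

```python
# Minimal sketch: send the same prompt three times and print each answer.
# Assumes the official OpenAI Python client with OPENAI_API_KEY set in the
# environment; the model name and prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
prompt = "What are the three best project management tools?"

for run in range(3):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- Run {run + 1} ---")
    print(response.choices[0].message.content)
```

Run it and the three answers will usually differ, sometimes in the brands and sources they name. That instability is exactly what is worth showing.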
The biggest misconception: LLMs rank things
Britney was blunt about the biggest misconception she sees.
LLMs do not rank anything.
There is no internal ranking system in the way SEOs mean that phrase. There is no list of URLs being scored and ordered. There is no domain authority inside the model. During training, the model does not preserve a tidy record of where every piece of text came from.
That is a difficult idea for SEOs because our industry has spent decades thinking in terms of pages, links, domains, authority, and rankings. Those concepts still matter when live retrieval is involved, because the model may query a search engine in the background. But they do not explain what is inside the model itself.
What the model learns from training data is closer to co-occurrence. It learns that certain words, brands, entities, and ideas appear together in certain contexts. It learns patterns. It learns associations.
That is why Britney described this as a mentions game more than a backlinks game.
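To make co-occurrence concrete, here is a toy sketch of the underlying statistic. This is not how any model is trained; the corpus and terms are made up, and it only illustrates the kind of brand-and-concept association that training data can carry.

```python
# Toy illustration of co-occurrence: count how often a brand name appears
# in the same document as category terms. A gross simplification of what
# training absorbs, but it shows the shape of the association.
from collections import Counter

corpus = [
    "acme crm is a solid pick for small sales teams",
    "for pipeline tracking many reviewers recommend acme crm",
    "the weather was nice so we went hiking",
]
brand = "acme"
category_terms = {"crm", "sales", "pipeline"}

co_occurrence = Counter()
for doc in corpus:
    tokens = set(doc.split())
    if brand in tokens:
        for term in category_terms & tokens:
            co_occurrence[term] += 1

print(co_occurrence)  # e.g. Counter({'crm': 2, 'sales': 1, 'pipeline': 1})
```

The model's version of this is vastly more sophisticated, but the intuition holds: what travels is the association, not the URL.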
Mentions are not backlinks, but they matter
One of the most useful ideas from the conversation was that mentions are becoming more important in AI visibility.
In classic SEO, links carried a lot of the signal. They still matter in traditional search and therefore for fan-out queries. But when we are talking about what a model has absorbed from training data, links are not the core unit in the same way. The model is not storing URLs or remembering source authority the way a search engine does.
It is learning patterns of language.
If your brand is often mentioned alongside a category, a problem, a use case, or a competitor set, that association can matter. If your brand is absent from the conversations that define your market, that also matters.
This is not an excuse to chase low-quality mentions everywhere. It is a reason to think more broadly about visibility. Are the right people talking about you? Are you present in the right comparisons, communities, guides, reviews, videos, and discussions? Are you associated with the concepts you want to be associated with?
That feels familiar to SEO, PR, brand, and content teams, but the mechanism is different.
Why URLs are still frustrating
I admit: I like URLs.
I like links. I like being able to click through to the source. I like knowing where information came from. One of the most frustrating parts of using LLM tools today is that they often break that connection.
Everyone has had the experience of an AI tool confidently giving a URL that goes nowhere. The model may produce something that looks like a plausible source, but that does not mean it is real. If you then ask why it gave you that link, it may produce another confident answer that is not at all connected to its actual internal process.
Britney made the important point that LLMs do not have introspection in the way users assume. They do not know why they generated one output over another. They can produce a convincing explanation, but that explanation is generated just like any other response; it is not necessarily a real account of what happened.
That is one of the reasons retrieval matters so much. If an AI answer uses live search results or a factual database at runtime, then we have a better chance of connecting the answer back to a real source.
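As a hedged sketch of what that looks like, here is the basic retrieve-then-generate shape. The `retrieve_snippets` function below is a hypothetical stand-in for a real search API or vector index; the point is that the answer can be tied back to a concrete source.

```python
# Minimal sketch of retrieval-augmented answering. `retrieve_snippets` is a
# hypothetical stand-in for a real search API or index lookup; model name,
# URL, and snippet text are all illustrative.
from openai import OpenAI

client = OpenAI()

def retrieve_snippets(query: str) -> list[dict]:
    # Placeholder: a real system would call a search API or vector index.
    return [{"url": "https://example.com/pricing",
             "text": "Example Co's starter plan costs $29/month."}]

question = "How much does Example Co's starter plan cost?"
snippets = retrieve_snippets(question)
context = "\n".join(f"[{s['url']}] {s['text']}" for s in snippets)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Answer only from the provided sources and cite the URL."},
        {"role": "user", "content": f"Sources:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```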
But that brings us to another term that needs care.
What grounding actually means
We spent a good chunk of the conversation on 'grounding', because Britney has strong feelings about how the SEO industry is using that word.
In AI research, grounding has a specific history. It is about connecting a model to real-world information: sensors, factual databases, or some form of ground truth. Britney also pointed out that the term has roots in statistics and cartography, where ground truth meant checking a map against the actual world.
In the SEO world, 'grounding' has started to mean something much looser: connecting AI outputs to website data, search results, or background queries.
Britney's concern is that this stretches the term too far. Websites are not ground truth. They are written by people. They are biased, incomplete, promotional, wrong, outdated, or adversarial in all the ways we already know from the web.
That does not mean retrieval is useless. Far from it. It is vital. But calling a retrieved web page 'ground truth' can make the process sound cleaner than it is.
Her warning was partly about credibility. If SEOs use AI research terms in ways researchers don’t recognize, we risk making the industry sound less serious at the exact moment we need to be more credible.
The web is not the truth
This connects to an older search problem: the most common thing written online is not always the most true thing.
We have always known this. The internet is full of wrong information, copied information, affiliate content, old pages, marketing claims, jokes, and conspiracy theories. Search engines have spent decades trying to sort through it all.
Now AI systems are reading and summarizing that same mess.
That creates new failure modes. Britney mentioned examples like people claiming they had 'hacked ChatGPT' by getting it to repeat odd claims from obscure articles. In many cases, what happened was not really about the core LLM. It was a familiar SEO pattern: find a low-competition query space, publish something that dominates that space, and watch the retrieval layer pick it up.
The AI answer then makes the result feel more authoritative than an ordinary search result might have felt.
That is a serious problem. It is one thing for a weird page to rank in a corner of the search results. It is another for a generated answer to present the same claim as if it were settled fact.
AI search spam is already here
A lot of red-team work has focused on dangerous outputs: can the model help someone build a weapon, commit fraud, or produce harmful instructions?
That work matters. But Britney and I agreed that there has been less public discussion of another problem: adversarial untruths. Can someone get the system to repeat something false? Can they plant information in the web ecosystem in ways that LLMs will then retrieve and summarize?
The answer is clearly yes, at least in some cases.
That is why Google's experience matters. Of all the frontier AI players, Google has spent the longest fighting web spam, dealing with adversarial incentives, and trying to stop bad information from being rewarded. That does not mean Google will get everything right. AI Overviews have already produced plenty of strange and wrong answers.
But if you ask who has the deepest scar tissue from this kind of battle, Google has a strong claim.
Why Google may be stronger than people think
A popular market narrative is that Google is back on its heels because OpenAI got to the chat interface first.
There is truth in that. Google has clearly been pushed into a new competitive position. We are seeing UI changes, AI Overviews, AI Mode, and a much more visible race to show that they are still at the front of the market.
But I am not as bearish on Google as some people are.
They have a search engine attached directly to their AI efforts. They have years of spam-fighting experience. They have vast user behavior data. They have infrastructure. They have vertical integration down to the chip layer. That is a rare combination.
Britney pushed back in a useful way. She pointed out that Google is not always transparent about accountability, publisher impact, or how they are managing AI answer quality. She also noted that their messaging can feel like classic PR: lots of announcements, lots of speed, lots of signals designed to show momentum.
Both things can be true. Google can be under real pressure and still have real advantages.
LLMs are not human, even when they sound human
Another thread we pulled on was whether it is useful to talk about models as if they 'want', 'think', or 'understand' things.
I admitted that I am often comfortable using that kind of shorthand. Britney was more cautious. Her view is that those comparisons can understate what human intelligence really is.
That was one of the best parts of the conversation.
Britney made the point that the most advanced neural network we know of is still the human brain. A one-year-old can do things a language model cannot. Animals can navigate the real world without language. Human intelligence includes experience, embodiment, memory, goals, and consciousness. LLMs may sound human, but they do not have those things.
Her framing of LLMs as continuation engines is useful here. Before ChatGPT, when Britney was red-teaming BLOOM, the model was not framed as a Q&A machine. You gave it text and it continued the text. One of the engineers pushed back hard when she asked it a question, because that was not how language models were supposed to work.
ChatGPT changed the interface and the expectations. OpenAI did an incredible job turning continuation into something that feels like conversation. But the underlying nature of the system still matters.
It is fluent. It is useful. It can be superhuman at some tasks. It also does not understand the world like a person does.
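You can still see the raw continuation behavior today by running a base model with no chat tuning. A minimal sketch with the Hugging Face transformers library and the small GPT-2 base model:

```python
# Minimal sketch of a raw continuation engine. GPT-2 is a base model with no
# chat tuning, so it simply continues whatever text it is given.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

print(generator("The capital of France is",
                max_new_tokens=15)[0]["generated_text"])

# A question is just more text to continue, not necessarily a thing to answer:
print(generator("What is the capital of France?",
                max_new_tokens=15)[0]["generated_text"])
```

Ask it a question and it is as likely to continue the question as to answer it, which is exactly the pre-ChatGPT behavior Britney describes.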
Start small with AI or you will waste weeks
I asked Britney what she tells people who feel overwhelmed by the volume of AI tools.
Her answer was one of the most practical parts of the whole session: start with problems, not tools.
Do not begin by asking which tool you should use. Begin by asking what small task you need help with. Better yet, choose a task so small you could write it out and hand it to an intern in another room.
That is the scale to start with.
Britney said one of the most common failure modes she sees in Orange Labs is that people start too big. They want the everything dashboard. They want one button that tells them what to optimize, what to write, what to fix, and what to report. That is where frustration sets in.
The better route is tiny. Pick one repetitive task. Break it into steps. Solve the first step. Then add the next feature.
If the word 'and' appears in the task description, Britney says it may already be too big.
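For illustration, here is what an intern-sized task can look like in code: one input, one instruction, one output, and no 'and' in the brief. The model name and prompt wording are placeholders, not recommendations.

```python
# One intern-sized task: shorten a single meta description to under 155
# characters. One input, one instruction, one output. The model name and
# prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
description = ("Our handcrafted oak dining tables are made to order in "
               "Yorkshire using sustainably sourced timber and traditional "
               "joinery techniques, with free UK delivery on every order.")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": ("Rewrite this meta description in under 155 characters, "
                    f"keeping the key selling points:\n\n{description}"),
    }],
)
print(response.choices[0].message.content)
```

Once that one step works reliably, you have something to build on. Not before.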
What comes next for organic discovery
Near the end, I asked Britney to break out the crystal ball.
Her concern is that the search interface keeps moving toward AI Mode-style results, where traditional organic listings are less visible by default. Maybe users get an answer first, and the organic results are hidden behind a click or an expansion.
We are already seeing the direction of travel. The interface is becoming more conversational, more answer-led, and less like the classic list of blue links.
That does not mean traditional SEO disappears. If AI systems keep retrieving live search results, traditional search visibility still feeds the answer layer. But the way users encounter information is changing.
That means SEOs need to think beyond ranking a page in a list. We need to think about where our brand, products, people, data, opinions, and experiences show up across the wider web.
The social and video visibility play
Britney made a strong case that social and video are becoming more important for AI visibility.
The reason is simple: AI cannot create real first-hand experience. It can summarize what people have said. It can imitate language. It can combine patterns. But it cannot have a real product experience, a personal opinion, or a hot take rooted in lived experience.
That is part of why Reddit has become so important to Google. It is full of human opinions, messy discussions, and first-hand accounts. The same applies to social video.
Britney pointed out that smart marketers are already thinking about how video gets transcribed and fed into systems. They are putting target terms and key context early in the video because they know the transcript may matter later.
That is a different way of thinking about content. It is about the transcript, the mention, the clip, the discussion, and the way all of that becomes part of the web's language around your brand.
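If you want to pressure-test the transcript idea on your own videos, a few lines of script will do it. The transcript below is made up; in practice it might come from an auto-captioning export.

```python
# Quick check on the transcript idea: do target terms appear early?
# The transcript string is a made-up example.
transcript = ("in this video we compare acme crm against three alternatives "
              "for small sales teams before diving into pricing details")
target_terms = ["acme crm", "sales teams"]
early_window = " ".join(transcript.split()[:50])  # roughly the opening lines

for term in target_terms:
    status = "early" if term in early_window else "not early"
    print(f"{term}: {status}")
```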
What I want you to take away
If there was one theme running through the whole conversation, it was this: stop forcing AI search into the old SEO mental model.
Some old skills still matter. Technical SEO matters. Content matters. Search visibility matters. Brand matters. Digital PR matters. Testing matters.
But the mechanism is different enough that we need better language and better questions.
LLMs do not rank pages. They predict continuations. When they need fresh information, they retrieve it. When they retrieve it, search still matters. When they rely on training data, mentions and co-occurrence matter. When they generate confident nonsense, source quality and adversarial thinking matter.
So what are we optimizing for?
We are optimizing for being present in the right contexts. We are optimizing for being accurately described across the web. We are optimizing for content that can be retrieved, understood, and trusted. We are optimizing for real human experience that AI systems cannot invent. And we are optimizing for a future where measurement is messier, but not impossible.
Put Search in Control Mode with SearchPilot
This conversation ended with a reminder that the future of organic discovery is not going to be solved by one dashboard or one theory about how LLMs behave. The systems are changing too quickly, and the answers vary too much by site, page type, query class, and market.
That is why testing matters.
SearchPilot helps teams make SEO and GEO testable, so they can move from guessing to knowing. We run controlled experiments across category pages, product detail pages, navigation, templates, and content, then measure what actually changes.
For ecommerce and marketplace teams, that means testing the surfaces that matter most: PLPs, PDPs, product grids, Merchant Center feeds, internal linking, content blocks, and AI discovery traffic. The goal is not to predict the future perfectly. It is to build a system that can learn as the future arrives.
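To show the shape of that idea, and only the shape, here is a toy control-versus-variant comparison. This is not SearchPilot's methodology, which has to handle seasonality, trends, and pre-test matching; the session numbers are invented.

```python
# Toy illustration of the split-testing idea: compare organic sessions for a
# variant group of pages against a control group. Numbers are made up; real
# SEO tests need time-series modeling, not a bare t-test.
from statistics import mean
from scipy import stats

control_sessions = [120, 98, 134, 110, 105, 99, 127, 115]    # per page
variant_sessions = [138, 121, 150, 129, 117, 142, 133, 126]  # per page

t_stat, p_value = stats.ttest_ind(variant_sessions, control_sessions)
uplift = mean(variant_sessions) / mean(control_sessions) - 1
print(f"Observed uplift: {uplift:.1%}, p-value: {p_value:.3f}")
```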
Stop trying to predict the future. Experiment to discover it. If you want tailored test ideas for your top PLPs and PDPs, schedule a demo and we will share a starter list and a clear path from validation to velocity to control.