Back in 2016, I surveyed a few thousand people to see how well they could determine which of two pages would rank better for a search query.
The rules were simple: participants could take as long as needed and use any tools to assess the pages, but they couldn’t check the actual search results.
The punchline was that humans weren’t good at this task. The headline on my write-up was “most SEOs are no better than a coin flip at predicting which page will rank better”.
I built the prediction model with TensorFlow, the machine learning (ML) framework, at the time. Looking back, it was basic compared with today's advanced artificial intelligence (AI) models. Despite that, it beat humans (and coins!) by getting about two-thirds of the questions right.
You can watch me discuss this in full via the video below:
We hear more about AI than ML these days, so we decided to re-run the process to see the current state of the art. We had to re-run everything because the search results and ranking algorithms have changed significantly in the past decade, so we started with a new set of search queries and results. You can still take the quiz here.
The first punchline was that humans haven’t improved:
As before, more experience predicts greater success:
I’ve overlaid a linear regression on that chart, but the data seems to follow an S-curve. This makes sense because there is probably a steeper early learning curve and limits to the value of more experience, but we lack enough data to be sure.
Whatever the value of experience, the value of AI was clear. Gemini handily beat the humans (and, although it's not a fair comparison, beat the old TensorFlow model), scoring 75%:
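To see why a 75% score is convincingly better than chance, we can compute the probability of a coin-flip guesser doing at least that well. The quiz size below is a hypothetical assumption for illustration (the post doesn't state how many question pairs the model answered); only the 75% figure comes from our results.

```python
from math import comb

def binom_p_value(n, k, p=0.5):
    """One-sided P(X >= k) for X ~ Binomial(n, p):
    the chance a random guesser gets k or more of n pairs right."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical quiz of 100 question pairs; 75 correct matches Gemini's 75%.
n, correct = 100, 75
p_value = binom_p_value(n, correct)
print(f"P(coin flipper scores >= {correct}/{n}) = {p_value:.2e}")
```

Even at this modest sample size, the probability of a coin flipper matching the score is vanishingly small, which is why the gap over the roughly 50% human average matters.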
This is just the beginning of AI's capabilities. These results are from last year; the models are improving and will likely continue to do so in the near future. If we wanted to see how good computers can get at this task, we'd experiment with fine-tuning a dedicated model. Fine-tuning refines a model to perform better on a specific task by training it on examples of that task. If we showed the model a few hundred query and ranking pairs, we'd expect it to get better at the quiz.
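As a sketch of what that fine-tuning data might look like, here is one way to turn ranking pairs into prompt/completion training records in JSONL, the format most fine-tuning pipelines accept. The example queries, page summaries, field names, and file name are all assumptions for illustration, not our actual dataset or any specific provider's schema.

```python
import json

# Hypothetical examples: each pairs a query with two page summaries and
# records which page actually ranked higher (the training label).
examples = [
    {
        "query": "best running shoes 2024",
        "page_a": "In-depth buyer's guide with lab testing of 12 shoes",
        "page_b": "Thin affiliate page listing shoes with no testing",
        "winner": "a",
    },
    {
        "query": "how to fix a leaking tap",
        "page_a": "Short forum reply with no images",
        "page_b": "Step-by-step tutorial with photos for each step",
        "winner": "b",
    },
]

def to_finetune_record(ex):
    """Convert one ranking pair into a prompt/completion training record."""
    prompt = (
        f"Query: {ex['query']}\n"
        f"Page A: {ex['page_a']}\n"
        f"Page B: {ex['page_b']}\n"
        "Which page ranks higher, a or b?"
    )
    return {"prompt": prompt, "completion": ex["winner"]}

# Write one JSON object per line (JSONL).
with open("ranking_pairs.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(to_finetune_record(ex)) + "\n")
```

A few hundred records in this shape is all the earlier paragraph envisages; the work is in gathering honest labels from real search results, not in the formatting.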
What does this mean for humans?
I thought it was appropriate to ask Claude, Anthropic’s large language model (LLM):
“It means we must adapt and evolve. SEO professionals aren't becoming obsolete, but their roles will transform. Instead of manual ranking predictions, they'll focus on strategic insights, creative content development, and leveraging AI as a powerful analysis tool. The most successful practitioners will be those who understand how to effectively collaborate with AI, using these advanced models to augment their expertise rather than replace their strategic thinking and nuanced understanding of digital landscapes.”
I don’t have a lot to add to that. Obviously, that last part about powerful analysis tools fuels a lot of my thinking as I guide SearchPilot, especially in positioning our work to be valuable in an age of AI-powered discovery. AI might have beaten humans at SEO this time, but neither AI nor SEO stays still for long. It’s up to us to stay ahead of the game — one where the rules and players are constantly changing.
Want to receive our latest testing insights? Sign up to our case study mailing list and please feel free to get in touch to learn more about SearchPilot.