Things The Google Leak Taught Me As a Content Marketer

There’s been big news in the SEO world recently. In May 2024, a massive trove of internal API documentation from Google’s Search division came to light. Historically, Google has kept its cards close to its chest regarding the search algorithm, but this leak, which documents 14,014 attributes across 2,596 modules from Google Search’s internal engineering documentation, gives us rare insight.

The leak has been confirmed as legitimate and includes information from as recently as March 2024. There’s no mention of how each of these elements is weighted in the algorithm, and some elements may be experimental or no longer in use. While it isn’t a complete blueprint of the Google ranking algorithm, and there’s a lot of deduction involved, there’s still much to be gleaned from it.

In this article, I’ll discuss some of the leak’s biggest revelations and some of the insights I’ve gained as a content marketer, including:

  • Increasing time on page
  • Distinguishing your website as a business site
  • Putting the most important content first
  • Regularly updating pages and articles

Google Lies – DA and Click Data Are Ranking Factors

The first conclusion most people have drawn from this leak is one we’ve all suspected for a while: Google representatives do more than just omit information regarding ranking factors so SEOs can’t hack the system. They actively lie.

Google representatives have repeatedly stated that there’s no such thing as ‘domain authority’. (Domain Authority, or DA, is also the name of a third-party metric developed by Moz to estimate how likely a site is to rank.) For years, Google has insisted it has no metric that measures a site’s overall authority. The leaked documentation says otherwise: it includes an attribute called siteAuthority.

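Google has also repeatedly denied that clicks have anything to do with rankings, whether that’s the number, origin, or quality of the clicks. This claim has been met with disbelief for years, and lo and behold, the documentation describes click signals (for example, in a system called NavBoost) with attributes such as goodClicks, badClicks, and lastLongestClicks.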

Google doesn’t just refer to clicks; it distinguishes between good clicks, long clicks, and bad clicks. We’ll get into what this means for you and your content strategy later. For now, it goes to show that just because Google denies something up and down doesn’t mean it isn’t true. Rand Fishkin and Mike King, along with other prominent SEO experts, have been speculating about clicks and DA as ranking factors for years while Google essentially painted them as conspiracy theorists.

My takeaway: ‘Good clicks’ likely refers to users who spend a decent amount of time on your website after clicking on it rather than bouncing. This reiterates the need to make your pages as engaging as possible to keep people on them for longer. Consider incorporating video or interactive content, starting with your most high-value pages.

Germaine’s takeaway: While Google has actively denied and rejected much of this, I think they have historically been concerned that these particular elements are too easy to game, so they’ve played them down. It’s a fairly open secret that DA and click data both matter; link building, for one, wouldn’t exist as a practice otherwise. It’s nice to see some confirmation of the popular theory.

‘Small Personal Websites’ Are Flagged

The leak shows that Google has a flag for ‘small personal sites’. While the leak doesn’t indicate whether this is meant to rank small personal websites higher or lower, I’ve concluded that it’s lower, and so have many others. Ever since Google’s March 2023 Core Algorithm Update, there’s been an outcry from small businesses whose websites have tanked in search rankings. We’ve seen this ourselves on some websites we own for SEO experimentation.

My takeaway: If you own a website for a small business, signal that to Google as clearly as possible to distinguish yourself from a small personal site. Ensure your Google Business Profile (formerly Google My Business) is up to date and that your website lists all your contact info and your address. This will help Google recognise you as a business.

Germaine’s takeaway: I can quite easily see the other side of the argument, especially with the rise of social media and other channels that allow people to disseminate information quickly. The argument is that small websites have less to lose and less authority, and a well-funded group could very quickly put together tens, if not hundreds, of small websites around a topic. Google is ultimately trying to give us the most accurate answer, and they don’t want to take any risks doing this, so they are attempting to ignore ‘small websites’.

Pages Are Only Allocated a Certain Number of Crawling Tokens

Google counts the number of tokens (roughly, words) on a page and the ratio of total words in the body to the number of unique tokens. The leak also shows that there’s a maximum number of tokens Google considers per page, so text beyond that point may be ignored.
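If you want to sanity-check your own pages against these two ideas, here is a minimal Python sketch. It is not Google’s method: it assumes a naive lowercase word tokenizer and a hypothetical cutoff of 1,000 tokens, because the leak doesn’t tell us Google’s real tokenizer or limit. The function names are made up for illustration.

```python
import re

# HYPOTHETICAL cutoff: the leak reportedly caps the tokens considered per page,
# but the real number isn't public. 1,000 is an illustrative placeholder only.
ASSUMED_TOKEN_LIMIT = 1000


def tokenize(text: str) -> list[str]:
    """Naive lowercase word tokenizer (an assumption; Google's tokenizer is unknown)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def words_to_unique_ratio(text: str) -> float:
    """Total words divided by unique words; higher values mean more repetition."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(tokens) / len(set(tokens))


def appears_before_cutoff(text: str, phrase: str, limit: int = ASSUMED_TOKEN_LIMIT) -> bool:
    """Return True if `phrase` occurs within the first `limit` tokens of `text`."""
    window = tokenize(text)[:limit]
    target = tokenize(phrase)
    if not target:
        return False
    n = len(target)
    # Slide over the window, looking for the phrase as a contiguous run of tokens.
    return any(window[i:i + n] == target for i in range(len(window) - n + 1))
```

For example, pasting an article’s body text into `appears_before_cutoff(body, "running shoes")` tells you whether your target phrase shows up early, under the assumed limit. The useful signal is the habit it encourages (answer the query early), not the exact cutoff.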

My takeaway: Frontload your content with the most important information and keywords. Don’t wait until the end of your article or page to answer the search query. The leak suggests Google may not consider text that far down the page.

Germaine’s takeaway: This is an interesting one because nobody, not even Google, has unlimited resources. We tend to forget it, but crawling and analysing the whole internet takes a lot of effort (and money), so a limit makes sense. A cap on tokens does cut against the ‘skyscraper’ approach, and against long-form content altogether, which is interesting. But as Juliette says, frontloading might be the way to keep writing long-form content without burying the key points.

Content Freshness Really Matters

The leak indicates that content that isn’t frequently updated has the lowest storage priority for Google. Google’s index is stratified into tiers where the most important, regularly updated, and accessed content is stored in flash memory. Less important content is stored on solid-state drives, and irregularly updated content is stored on standard hard drives. 

The leak also shows that Google places significant value on dates, with three separate attributes it uses to identify them: the date in the byline (bylineDate), the date in the URL or title (syntacticDate), and dates in the on-page content (semanticDate).

My takeaway: Regularly review your existing content. Update it with new information, revise outdated sections, and ensure that all details are current. Lastly, make sure your website includes both the ‘published’ and ‘last updated’ date in the byline of articles!

Germaine’s takeaway: Like the previous finding, this has a lot to do with resources. I can see a not-too-distant future where websites manipulate dates to game the system, and Google will likely then fine-tune its approach. But this is a cat-and-mouse game to an extent, and it will likely never stop.

Final Thoughts

The Google Search leak confirms much of what SEO experts have long suspected: site authority is real, clicks affect rankings, and regularly updating content is important. I, and many other people in the industry, already assumed these things to be true.

There are a few insights I found particularly interesting that will impact my content strategy going forward: 

  • I’m going to dedicate more time to updating all content and ensuring information is always as current as possible.
  • I’m going to put more effort into frontloading my content so that crawlers don’t miss important information, context or keywords.
  • To increase engagement and Time on Page, I’m going to prioritise more video content.

As always, the most important thing is to continue regularly creating helpful, high-quality content that serves users’ needs and answers your target search query as well as possible.

If you want to do a deep dive on the Google algorithm leak yourself, there are two articles you have to read: ‘Secrets from the Algorithm: Google Search’s Internal Engineering Documentation Has Leaked’ by Mike King and ‘An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me’ by Rand Fishkin. Fishkin originally published the leak, and he and King have been working together to draw tangible insights from it. Both articles are fairly dense and aren’t written for the average layperson, but they give a great overview of the leak.

ABOUT THE AUTHOR
  • Juliette Owen-Jones