SEOs can gain surprising insights into the Google Search algorithm with a peek at Yandex’s code leak.
Google’s Search algorithm is rather like the ‘Everlasting Gobstopper’ from Charlie and the Chocolate Factory: The exact recipe remains a closely guarded secret — and it’s a secret many would love to get their filthy hands on. Yandex’s code leak last month has reverberated throughout both the tech and digital marketing worlds. SEOs may be able to gain valuable insights into the workings of top modern search engines — yes, even Google’s — by taking a peek at Yandex’s codebase leak.
So-called “Fragments” of Yandex’s codebase became available online a couple of weeks ago when proprietary source code leaked onto BreachForums. The Yandex Git Sources contain files dated from February 2022 through to July 2022. And it seems unlikely to have been a hack. Software engineer Arseniy Shestakov claims to have verified the authenticity of the code with a number of Yandex employees. The dates of the repository have fed speculation that a dissatisfied former employee let the information out.
Like Yandex’s, Google’s Search algorithm is not public information. It is famously complex and is believed to draw on more than 200 ranking factors. While Google has provided general guidelines on which factors it considers when ranking search results, the exact details are not in the public domain. SEO experts can only really discover how the algorithm functions through experimentation. Even within Google, there may be no single individual who knows the full details of each and every ranking factor.
Back in 2014, the German Justice Minister, Heiko Maas, went as far as to challenge Google to reveal the workings of its algorithm, raising concerns over its “exceptional” dominance of the European market: “When a search engine has such an impact on economic development,” Maas told the Financial Times, “this is an issue we have to address.”
Google’s response was to continue processing searches, keep calm, and carry on. Unsurprisingly. Google has clearly stated in the past that it won’t reveal its algorithm for two primary reasons:
- The algorithm is a business secret, and revealing that information would give an edge to the competition, undermining the company’s business model.
- Secrecy protects the integrity of its search results, ensuring that high-quality, relevant content is presented to users. This means Google can stay ahead of spammers and scammers who might exploit the algorithm for their own gain.
Google handles about 76% of web searches in the US, while it commands a staggering 90% share across much of Europe. With one notable exception: Russia.
The largest technology company in Russia, Yandex was the leading search engine in the country from October to December 2022, with over 62% of searches (and rising). Yandex also holds the largest market share of any search engine originating in Europe and is the fifth most popular search engine worldwide after Google, Bing, Yahoo!, and Baidu.
The Russian search engine was founded in Moscow in 1997 with the name “Yet Another Indexer” or “Yandex” for short. Initially, Yandex focused on providing search results for Russian-language websites, with a strong emphasis on local content. Yandex developed very much as a counterpoint to Google in the West, having been locked in a battle for market share for many years, with each company making efforts to appeal to Russian users through tailored services and features.
While Yandex and Google are not the same, their algorithms are similar in principle: both use a complex set of rules and calculations to determine the order in which web pages appear on search results pages. Strikingly, Yandex appears to employ a number of ex-Google staff. The exact number of engineers who have worked at both companies is unknown, but a quick glance at LinkedIn reveals several hundred individuals who have been on staff at both.
If you’d like to get a better sense of Yandex and its similarity to Google, Arseniy Shestakov has drawn attention to the services covered by the files in the Yandex data leak:
- Yandex Search Engine and Indexing Bot
- Yandex Maps – Similar to Google Maps and Street View
- Yandex Alice – A virtual assistant like Google Assistant (or Siri/Alexa)
- Yandex Taxi – An Uber-like taxi service
- Yandex Direct – An ads service like Google Ads (formerly AdWords)
- Yandex Mail – The equivalent of Gmail
- Yandex Disk – A file storage service like Google Drive
- Yandex Market
- Yandex Travel
- Yandex 360 – Like Google Workspace for services on your own domain
- Yandex Cloud
- Yandex Pay – Payment processing like Stripe
- Yandex Metrika – Similar to Google Analytics
Fundamentally, the leak suggests that Yandex uses PageRank, along with several other text algorithms comparable to Google’s, in a similar manner. PageRank is the algorithm Google uses to gauge the importance of web pages, assigning each page a score based on the number and quality of the other pages that link to it. The more high-quality links a page has pointing to it, the higher its PageRank score. Pages with higher PageRank scores are considered more important and are more likely to appear at the top of search engine results pages. Yandex seems to use a modified version of PageRank in its own search engine.
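To make the idea concrete, here is a minimal sketch of the textbook PageRank calculation, assuming a tiny made-up link graph. It is not Yandex’s modified version or Google’s production code; the function name, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of scores.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its score (scores sum to 1).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: "c" is linked to by every other page.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, page "c" ends up with the highest score because every other page links to it, while "d", which nothing links to, keeps only the baseline share.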
Yandex features an equivalent to Google’s RankBrain machine learning algorithm, dubbed MatrixNet. It is also known that Yandex takes advantage of many technologies that originated at Google and have played a vital role in advancing search, including TensorFlow (an open source library for building and training machine learning models), BERT (a language model for natural language processing), MapReduce (a programming model for processing data in parallel across clusters of computers), and, to a lesser extent, Protocol Buffers (a data serialisation format).
However, Yandex has its own algorithms that prioritise different factors than Google, such as keyword density, site age, and links from .ru domains. Yandex places a greater emphasis on the geographic location and language of the user, as well as the quality and relevance of social media signals, while Google places more weight on the relevance and authority of the content, as well as the quality of the backlinks pointing to the page. Yandex is also simply more geared towards Cyrillic-based languages.
In terms of search behaviour, Yandex users tend to search differently than Google users, with more long-tail queries and a focus on local search. Yandex also offers different SERP features, such as its “Zen” content recommendation platform, which can affect how users interact with search results. As detailed by Alex Buraks via Twitter, Yandex’s engine favours pages that:
- Are not too dated
- Have a lot of organic traffic (unique visitors) and less search-driven traffic
- Have fewer numbers and slashes in their URL
- Have optimised code
- Are hosted on reliable servers
- Happen to be Wikipedia pages or are linked from Wikipedia
- Sit on, or are linked from, higher-level pages on a domain
- Have keywords in their URL (up to three)
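A few of the URL-related points above are easy to check for your own pages. The sketch below is a hypothetical helper, not anything taken from the leaked code: the function name, the way keywords are matched and the cap of three are assumptions made purely for illustration. It counts path segments (a proxy for slashes), digits and target keywords in a URL.

```python
import re
from urllib.parse import urlparse


def url_signals(url, keywords, max_keywords=3):
    """Return simple, illustrative measurements of a URL's path."""
    path = urlparse(url).path

    # Non-empty path segments: a rough proxy for how many slashes the URL has.
    segments = [segment for segment in path.split("/") if segment]

    # Count every digit character that appears in the path.
    digit_count = sum(ch.isdigit() for ch in path)

    # Break the path into lowercase word tokens, e.g. "yandex-seo-guide".
    tokens = set(re.split(r"[^a-z0-9]+", path.lower()))

    # Keep the target keywords that appear as whole tokens, capped at max_keywords.
    wanted = {keyword.lower() for keyword in keywords}
    found = sorted(keyword for keyword in wanted if keyword in tokens)

    return {
        "depth": len(segments),
        "digits": digit_count,
        "keywords_found": found[:max_keywords],
    }


# Hypothetical URLs for comparison.
print(url_signals("https://example.com/yandex-seo-guide", ["seo", "yandex", "leak"]))
print(url_signals("https://example.com/blog/2023/01/15/post-48213", ["seo", "yandex"]))
```

The first URL is shallow, digit-free and contains two of the target keywords; the second is five segments deep and full of numbers, which is the pattern the list above suggests avoiding.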
Russian SEO experts certainly utilise equivalent white hat SEO methods for both search engines. If people in SEO knew the Google algorithm, they could potentially manipulate search results to their advantage by using tactics that would be in line with the specific ranking factors used by the algorithm. For example, they could focus their efforts on optimising their websites for the factors that Google gives the most weight to, such as high-quality content and authoritative backlinks.
However, it’s important to note that Google actively discourages and penalises any attempts to manipulate search rankings through unethical or “black hat” SEO tactics. Knowing the algorithm alone does not guarantee high search rankings, as Google’s algorithm is constantly evolving and takes into account many different factors.
Since introducing its Nepot filter in 2005, Yandex has employed algorithms similar to Google’s to combat link manipulation. Based on the backlink ranking factors and their descriptions in the leak, it’s recommended to follow these best practices when building links for Yandex SEO:
- Build links naturally and at different frequencies
- Use branded anchor texts and commercial keywords
- Avoid buying links from websites with mixed topics
Looking at the various features and factors of the Yandex leak can provide valuable insights and ideas for testing and improving Google rankings. It can also offer additional data points for SEO crawling, link analysis, and ranking tools.
Having said all this, the advice for good SEO professionals is the same as it has ever been: focus on creating high-quality, relevant content that is valuable to their target audience, build strong backlinks from authoritative sources, and use white hat SEO techniques that are in line with the search engine’s guidelines. With that, SEO pros can improve their chances of ranking highly in search results and driving more traffic to their website over the long term.