Massive Yandex Code Leak Reveals Russian Search Engine Ranking Factors

Nearly 45GB of source code files, allegedly stolen by a former employee, have exposed the underpinnings of many of the apps and services of Russian tech giant Yandex. The leak also revealed key ranking factors for the Yandex search engine, details that are almost never disclosed publicly.

The “Yandex git sources” were published as a torrent file on January 25 and show files allegedly taken in July 2022 and dating back to February 2022. Software engineer Arseniy Shestakov claims that he checked with current and former Yandex employees that some of the archives “probably contain up-to-date source code for the company’s services.” Yandex told security blog BleepingComputer that “Yandex was not hacked” and that the leak came from a former employee. Yandex stated that it “does not see a threat to user data or platform performance.”

Notably, the files date back to February 2022, when Russia launched a full-scale invasion of Ukraine. A former Yandex executive told BleepingComputer that the leak was “political” and noted that the former employee had not tried to sell the code to Yandex’s competitors. The anti-spam code was not leaked either.

While it’s unclear whether the disclosure of Yandex’s source code has security or structural implications, the leak of 1,922 ranking factors in Yandex’s search algorithm has certainly made a lot of noise. SEO consultant Martin MacDonald described the leak on Twitter as “probably the most interesting thing to happen in SEO in years” (as noted by Search Engine Land). In a thread detailing some of the most notable factors, researcher Alex Buraks suggests that “there is a lot of useful information for Google SEO as well.”

Yandex, the fourth-largest search engine, reportedly employs several former Google employees. Yandex tracks many of Google’s ranking factors, which can be identified in its code, and competes aggressively with Google. The Russian division of Google recently filed for bankruptcy after losing its bank accounts and payment services. Buraks notes that the first factor on Yandex’s list of ranking factors is “PAGE_RANK”, which appears to be related to the underlying algorithm created by the co-founders of Google.

As Buraks detailed (in two threads), the Yandex engine favors pages that:

  • Are not too old
  • Have a lot of organic traffic (unique visitors) and less search traffic
  • Have fewer numbers and slashes in their URL
  • Have optimized code, rather than “hard pessimization” with “PR = 0”
  • Are hosted on secure servers
  • Are Wikipedia pages or are linked from Wikipedia
  • Are hosted on or linked from higher-level pages on the domain
  • Have keywords in their URL (up to three)
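
To make the URL-related items above concrete, here is a minimal Python sketch that counts the digits, slashes, and keyword matches in a URL path. It is only an illustration: the function name, the returned field names, and the example URL are hypothetical and are not taken from the leaked Yandex code, whose actual computation is not described here.

```python
from urllib.parse import urlparse


def url_features(url, keywords):
    """Count simple URL traits (hypothetical names, not Yandex's factors)."""
    path = urlparse(url).path
    lowered = path.lower()
    return {
        # How many numeric characters appear in the path.
        "digit_count": sum(ch.isdigit() for ch in path),
        # How many slashes appear in the path (a rough proxy for depth).
        "slash_count": path.count("/"),
        # How many of the given keywords occur somewhere in the path.
        "keyword_hits": sum(1 for kw in keywords if kw.lower() in lowered),
    }


if __name__ == "__main__":
    print(url_features("https://example.com/blog/2022/01/seo-tips",
                       ["seo", "tips", "yandex"]))
```

A URL with many digits and slashes and no keyword matches would score worse on these three counts than a short, descriptive one. How Yandex weighs such signals is not something this sketch attempts to model.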

You can search and browse all of the factors in a search tool compiled by Rob Ousbey. You may notice that almost 1,000 ranking factors carry the “TG_DEPRECATED” tag, and more than 200 are listed as “TG_UNUSED”. Since the code dates from February 2022 and was obtained in July 2022, Yandex search has certainly changed since then. But the leak provides a rare glimpse into how search rankings are compiled on a site that serves one of the world’s largest countries.

Yandex’s search engine code previously went missing in 2015, when a former employee tried to sell it on the black market for $28,000 to fund his own startup. The surprisingly low figure for the core code of Yandex’s main product suggested that he was unaware of its real value. The employee was given a two-year suspended prison sentence, and the code was never made public.
