Google has updated its help document on Googlebot to state that Googlebot will crawl only the first 15MB of a page and then stop. So if you want Google to rank your page properly, make sure the content you want crawled and indexed falls within the first 15MB.
This left some in the SEO community wondering if Googlebot would totally ignore text that fell below images at the cutoff in HTML files.
“It’s specific to the HTML file itself like it’s written,” John Mueller, Google Search Advocate, clarified via Twitter.
“Embedded resources/content pulled in with IMG tags is not a part of the HTML file.”
What This Means For SEO
Important content must be included near the top of web pages to ensure Googlebot weighs it. That means code must be structured so the SEO-relevant information falls within the first 15MB of an HTML or supported text-based file.
It also means images and videos should be compressed and, whenever possible, not encoded directly into the HTML.
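To get a rough sense of how much of a page's HTML is taken up by media encoded directly into it, a short script can total the bytes in base64 data URIs. This is a minimal sketch, not an official tool: the function name `inline_data_uri_bytes` and the regular expression are illustrative assumptions, and the pattern only catches the common `data:<type>/<subtype>;base64,...` form.

```python
import re

# Assumption: matches only the common "data:<type>/<subtype>;base64,<payload>" form.
# Data URIs with extra parameters (e.g. a charset) or non-base64 payloads are not counted.
DATA_URI_RE = re.compile(
    rb"data:[a-zA-Z0-9.+-]+/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+"
)


def inline_data_uri_bytes(html_bytes: bytes) -> int:
    """Return the total number of bytes occupied by base64 data URIs in the HTML."""
    return sum(len(match.group(0)) for match in DATA_URI_RE.finditer(html_bytes))


if __name__ == "__main__":
    sample = b'<img src="data:image/png;base64,AAAA"><img src="cat.png">'
    # Only the first image is embedded; the second is a separate resource.
    print(inline_data_uri_bytes(sample), "of", len(sample), "bytes are inline data URIs")
```

An image referenced by URL, like `cat.png` in the sample, adds only the length of its tag to the HTML file, while an embedded image adds its whole encoded payload.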
SEO best practices recommend keeping HTML pages to 100 KB or less, so this limit will not affect many sites. Page size can be checked with various tools, including Google PageSpeed Insights. In theory, it may sound disturbing that you could have content on a page that doesn’t get used for indexing. In practice, however, 15MB is a very large amount of HTML.
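For a quick check without a dedicated tool, the size of an HTML file can be compared against the cutoff directly. The sketch below rests on two assumptions: that "15MB" means 15 × 1024 × 1024 bytes (the exact figure is not spelled out here), and that the size measured is the uncompressed HTML. The names `html_size_report` and `byte_offset_of` are hypothetical helpers, not part of any Google tooling.

```python
# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes of uncompressed HTML.
GOOGLEBOT_LIMIT_BYTES = 15 * 1024 * 1024


def html_size_report(html_bytes: bytes, limit: int = GOOGLEBOT_LIMIT_BYTES) -> dict:
    """Summarize how the HTML size compares with the crawl cutoff."""
    size = len(html_bytes)
    return {
        "size_bytes": size,
        "limit_bytes": limit,
        "fraction_of_limit": size / limit,
        "exceeds_limit": size > limit,
    }


def byte_offset_of(html_bytes: bytes, marker: str) -> int:
    """Return the byte offset where `marker` first appears, or -1 if absent."""
    return html_bytes.find(marker.encode("utf-8"))


if __name__ == "__main__":
    page = b"<html><body><h1>Title</h1><p>Key content</p></body></html>"
    report = html_size_report(page)
    print(report["size_bytes"], "bytes; exceeds limit:", report["exceeds_limit"])
    print("Key content starts at byte", byte_offset_of(page, "Key content"))
```

Passing in the raw bytes of a saved page shows how far a typical page sits below the cutoff, and the offset helper shows whether a given passage falls inside it.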
As Google states, resources such as images and videos are fetched individually. Based on Google’s wording, it sounds like this 15MB cutoff applies to HTML only. It would be challenging to go over that limit with HTML unless you publish entire books’ worth of text on a single page.
Should you have pages exceeding 15MB of HTML, you likely have underlying problems that need to be fixed.
Google’s John Mueller clarified in a tweet: “This is not a change, it’s just not previously been officially documented…” This article has been updated to reflect that.