Robots.txt pitfalls: what I learned the hard way

This content originally appeared on DEV Community and was authored by Prahlad Yeri

This applies to sites indexed on Google that hope to gain organic traffic. As an indie blogger and SEO enthusiast, I foolishly updated my robots.txt file to keep crawlers out of certain unwanted parts of my site, which led to subtle repercussions that I couldn't have foreseen.

A few days ago, while reading about SEO, I came across the concept of a "crawl budget." Apparently, Google allocates a specific crawl budget to your indexed site, and the more useless content it has to crawl, index, and store on its servers, the more it affects your site, resulting in delays for new content indexing, favicon updates, and robots.txt crawling.

Being a minimalist and utilitarian, I decided to block crawling of the /uploads/ directory on my site since it mostly contained images used in my articles. I thought blocking this "useless content" would free up more crawl budget for my primary content, i.e., articles. So, I added this directory to my site's robots.txt:

# Group 1
User-agent: *
Disallow: /public/
Disallow: /drafts/
Disallow: /theme/
Disallow: /page*
Disallow: /uploads/

Sitemap: https://prahladyeri.github.io/sitemap.xml

Search engines don't react to robots.txt changes instantly; in my case, there was a gap of roughly 5-7 days between updating the file and seeing the effect of crawlers processing it. After about a week, I noticed that my site's favicon had disappeared from SERPs on mobile browsers! In its place was a blank, generic icon. That's when I realized that my favicons also resided in the /uploads/ directory. I had recently optimized the favicon format by switching from WEBP to PNG, and with /uploads/ blocked, Google was unable to crawl and index the new favicon at all!
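In hindsight, a small check before deploying could have caught this. Below is a minimal sketch using Python's standard urllib.robotparser module. The favicon path /uploads/favicon.png and the CRITICAL_PATHS list are hypothetical stand-ins (substitute whatever files your site truly depends on), and this parser doesn't replicate Google's wildcard matching, so treat it as a rough sanity check rather than a faithful Googlebot simulation.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /public/
Disallow: /drafts/
Disallow: /theme/
Disallow: /page*
Disallow: /uploads/
"""

# Hypothetical paths that crawlers must always be able to reach.
CRITICAL_PATHS = ["/uploads/favicon.png", "/index.html"]


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the subset of paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    for path in blocked_paths(ROBOTS_TXT, CRITICAL_PATHS):
        print(f"blocked by robots.txt: {path}")
```

Running a check like this against the edited file before publishing would flag the favicon as blocked while the /uploads/ rule is present.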

Once I realized this mistake, I removed the /uploads/ rule from the robots.txt and requested a recrawl. But who knows how long it will take for Google's systems to sync this change and show the site's favicon in SERPs again! Two lessons learned:

  1. The robots.txt file is highly sensitive; avoid modifying it if possible.
  2. Applying SEO is like steering an extremely large ship. You pull a lever now, and the ship only moves after several days!


Prahlad Yeri | Sciencx (2024-10-27T05:08:15+00:00) Robots.txt pitfalls: what I learned the hard way. Retrieved from https://www.scien.cx/2024/10/27/robots-txt-pitfalls-what-i-learned-the-hard-way/
