A robots.txt file is a plain-text file at the root of your website that gives web crawlers and other web robots instructions about which pages they can or cannot crawl.
It is used to control how search engine crawlers move through your site, so that your server is not overwhelmed with requests and crawlers spend their time on the pages that matter. Keep in mind that robots.txt controls crawling, not indexing. If you want to keep a specific page off of Google Search, you should use a noindex directive or protect the page with a password (a noindex page must stay crawlable, or crawlers will never see the directive). But if you want to keep crawlers away from many pages or entire sections of your site, robots.txt works well.
It’s important that you fully understand the power of robots.txt, because an improperly written file can severely damage your site’s SEO. On the flip side, a well-written file has plenty of benefits: it can improve website performance by keeping crawlers out of parts of your site they don’t need to visit, which reduces load on your servers; it can keep well-behaved crawlers away from private or internal areas (though it is not a security control, as noted below); and it can improve the search indexing process by guiding crawlers to your most relevant pages.
Components of Robots.txt
The most important lines of a robots.txt file can be broken down into four buckets:
User-agent: This specifies which web crawler or user agent the rules apply to. A wildcard character (*) signifies that the rules apply to all crawlers. An example of calling out specific user agents like Google-Extended and GPTBot can be found in Narcity’s robots.txt.
Disallow: This directive tells crawlers which pages or directories they are not allowed to crawl. It is commonly used to keep crawlers out of internal or low-value areas, which also conserves crawl budget because crawlers don’t waste time on those pages. Note that Disallow does not guarantee a page stays out of search results: Google can still index a blocked URL if other pages link to it, so use noindex for pages that must not appear. Oftentimes you’ll block entire directories of files; for example, anything matching /core/* is blocked in our robots.txt.
Allow: There may be instances when you want to make exceptions to a Disallow rule. This is when you use the Allow directive: the pages or directories it lists can be crawled despite a broader Disallow rule. For example, Raw Story’s robots.txt allows /r/kappa/api/ to be crawled because it contains a custom-built sitemap, despite otherwise disallowing the /r/ directory.
Sitemap: This directive provides the location of your XML sitemap file, which lists the URLs on your website that you want indexed. A good crawler can discover those URLs on its own, but a sitemap speeds up the process. If your website has multiple sitemaps, list each one on its own Sitemap line; Panorama’s robots.txt shows an example of listing multiple sitemaps. Each entry should be a full URL, and before you include a sitemap in robots.txt, check that it loads properly and actually contains URL entries.
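To see how the four directives work together, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The file contents, the `www.example.com` host, and the `check` helper are hypothetical; the paths are borrowed from the examples above, but this is not any site’s actual robots.txt. One caveat: Python’s parser applies the first matching rule in a group, while Google applies the most specific (longest) matching rule, so the Allow exception is listed before the broader Disallow to make both interpretations agree. The stdlib parser also does not treat `*` and `$` inside paths as wildcards, so plain path prefixes are used here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com (not any real site's file).
ROBOTS_TXT = """\
# Keep two AI-related user agents out of the whole site
User-agent: GPTBot
User-agent: Google-Extended
Disallow: /

# Rules for every other crawler
User-agent: *
Allow: /r/kappa/api/
Disallow: /r/
Disallow: /core/

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/news-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def check(agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` on www.example.com under these rules."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


if __name__ == "__main__":
    print(check("GPTBot", "/news/story"))                  # False: its group disallows /
    print(check("Googlebot", "/news/story"))               # True: no rule matches
    print(check("Googlebot", "/r/some-story"))             # False: Disallow: /r/
    print(check("Googlebot", "/r/kappa/api/sitemap.xml"))  # True: the Allow exception
    print(check("Googlebot", "/core/app.js"))              # False: Disallow: /core/
    print(parser.site_maps())                              # both Sitemap URLs
```

Because GPTBot and Google-Extended share one group, a single Disallow: / applies to both, while Googlebot has no group of its own and falls through to the * rules.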
With the four components above, you can configure your robots.txt in a way that makes it clear which pages you want crawlers to visit and which pages you want robots to stay away from. You can keep crawlers out of internal resources or non-public pages and stop duplicate content from confusing them. In the process, you are also optimizing your crawl budget.
One important note: While robots.txt provides a set of instructions, it doesn’t enforce them. Search engine crawlers and site health crawlers like Semrush are among the good bots that follow the rules, but spam bots are likely to ignore them. The file itself is also publicly readable, so listing a path in it can point bad actors straight to that path. For that reason, be especially careful with any sensitive information that you are exposing on your website, and protect it with authentication rather than robots.txt.
Common Issues
Search Engine Journal has a great list of the most common issues with robots.txt files that you should definitely give a read. Some of these include:
noindex: If you have this in your robots.txt, your file may be very outdated, as Google began ignoring noindex rules in robots.txt as of 2019. It's best to remove noindex references.
crawl-delay: This is supported by Bing but not Google, and Google also retired the crawl-rate limiter from Search Console (announced in late 2023). So the directive has limited usefulness in your robots.txt unless you specifically want to slow down crawlers like Bing’s.
missing sitemap: At least one sitemap should be in your robots.txt file.
incorrect use of wildcards: The asterisk (*) matches any sequence of valid characters (including none), and the dollar sign ($) marks the end of a URL, which is useful for matching a file-type extension. Use these carefully so you don’t block entire parts of your site accidentally.
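The issues above are easy to check mechanically. The following is a rough sketch in Python, not a full robots.txt validator: the hypothetical `lint_robots` helper flags noindex and crawl-delay lines and a missing Sitemap line, and the hypothetical `pattern_matches` helper translates a path pattern into a regular expression following the wildcard rules Google documents (`*` matches any sequence of characters, a trailing `$` anchors the end, and matching starts at the beginning of the path). The matcher tests a single pattern only; it does not resolve conflicts between Allow and Disallow rules.

```python
import re


def lint_robots(text: str) -> list[str]:
    """Flag a few common robots.txt problems (a sketch, not a full validator)."""
    warnings = []
    has_sitemap = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field == "noindex":
            warnings.append(f"line {number}: noindex is ignored by Google (since 2019)")
        elif field == "crawl-delay":
            warnings.append(f"line {number}: crawl-delay is ignored by Google")
        elif field == "sitemap":
            has_sitemap = True
    if not has_sitemap:
        warnings.append("no Sitemap line found")
    return warnings


def pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches `path` (path plus query)."""
    anchored = pattern.endswith("$")  # a trailing $ means "end of URL"
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with ".*" where each * was.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, mirroring prefix matching of rule paths.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /core/\nNoindex: /drafts/\nCrawl-delay: 10\n"
    for warning in lint_robots(sample):
        print(warning)
    print(pattern_matches("/*.pdf$", "/files/report.pdf"))             # True
    print(pattern_matches("/*.pdf$", "/files/report.pdf?download=1"))  # False
    print(pattern_matches("/*?", "/search?q=robots"))                  # True
    print(pattern_matches("/*", "/any/page"))  # True: as a Disallow, blocks the whole site
```

The last two lines show why wildcards need care: /*? quietly blocks every URL with a query string, and /* on its own blocks everything.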
Update Your Robots.txt
RebelMouse users can easily make changes to their robots.txt by launching the Layout & Design Tool from the Posts Dashboard menu. Navigate to Global Settings and you’ll find a line for robots.txt. After clicking it, you can make updates right there.
Validate Your Robots.txt Setup
Google Search Console has added the ability to check that your robots.txt is set up properly. To do this, navigate to Settings at the bottom of the left-side navigation menu. Under Crawling, you should see robots.txt: “Valid.” For more detail, open the robots.txt report (right side of the screen), which shows the last time the file was checked, its file path, the fetch status (fetched successfully, or not fetched for reasons such as not found), and the size of the file. Any issues will be noted. If you need to request a recrawl, you can do so on this page.
This is what you should see in Google Search Console for a valid robots.txt file.
If the robots.txt is not valid, you will see an error message and you can troubleshoot from there.
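Before relying on the Search Console report, you can run a quick local pre-check of the same basics it shows: where the file lives, whether it fetches successfully, and how big it is. This is a minimal sketch in Python; the helper names are hypothetical, and the limits come from Google’s published robots.txt rules (the file sits at the root of the host, should be UTF-8 encoded, and content beyond 500 KiB is ignored). It does not replace Search Console, which reports what Google itself fetched.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Google ignores robots.txt content beyond 500 KiB.
MAX_BYTES = 500 * 1024


def robots_url_for(page_url: str) -> str:
    """robots.txt lives at the root of a page's scheme + host (+ port)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(url: str, timeout: float = 10.0) -> tuple[int, bytes]:
    """Fetch robots.txt and return (HTTP status, body); needs network access.

    Network failures such as DNS errors still raise urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, b""  # e.g. 404 Not Found


def check_robots_response(status: int, body: bytes) -> list[str]:
    """Return the problems found in a fetched robots.txt (empty list if none)."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if len(body) > MAX_BYTES:
        problems.append(f"file is {len(body)} bytes; content past {MAX_BYTES} bytes is ignored")
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems
```

For a live site you would chain the helpers, for example `check_robots_response(*fetch_robots(robots_url_for("https://www.example.com/any/article")))`, and expect an empty list when the basics are in order.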
Request a Review
If you’d like one of our strategists to take a look at your robots.txt and make suggestions for optimizing it, simply get in touch and we can set that up with you.