The Ultimate Guide to Optimizing Your Robots.txt

A robots.txt file is a critical file at the root of your website that gives web crawlers and other bots a set of instructions on which pages they can and cannot access.

It is used to help control the behavior of search engine crawlers, so that your website is not overwhelmed with requests and certain pages are not crawled. If you want to keep a specific page off of Google Search, you should use a noindex directive or protect the page with a password. But if you want to keep crawlers away from lots of pages, robots.txt works well.

It’s important that you fully understand the power of robots.txt, because it can severely damage your site’s SEO if it is written improperly. On the flip side, it has plenty of benefits. It can improve website performance by blocking crawlers from parts of your website they shouldn’t access, which reduces traffic to your servers. It can help your website’s security by steering well-behaved crawlers away from your most sensitive areas. And it can improve the search indexing process by guiding crawlers to your most relevant pages.

Components of Robots.txt

The most important lines of a robots.txt file can be broken down into four buckets:

User-agent: This specifies which web crawler or user agent the rules apply to. A wildcard character (*) signifies that the rules apply to all crawlers. An example of calling out specific user agents like Google-Extended and GPTBot can be found in Narcity’s robots.txt.
Disallow: This directive tells crawlers which pages or directories they are not allowed to crawl. One use of disallow is to keep crawlers away from particularly sensitive information. Google says it is a best practice to block pages you don’t want indexed with disallow, and this can also save crawl budget by preventing crawlers from wasting time on such pages. Oftentimes you’ll block entire directories of files. For example, anything matching /core/* is blocked in our robots.txt.
Allow: There may be instances when you want to make exceptions to a disallow rule. This is when you use the allow directive. The pages or directories it names are fine to crawl despite a larger disallow rule. For example, Raw Story’s robots.txt allows /r/kappa/api/ to be crawled because it contains a custom-built sitemap, despite otherwise disallowing the folder /r/.
Sitemap: This directive provides the location of your XML sitemap file, which lists all of the URLs on your website that you want to be indexed. A good crawler will find these URLs on its own, but a sitemap speeds up the process. In some cases, websites have multiple sitemaps, and this is where they belong. An example of listing multiple sitemaps can be found in Panorama’s robots.txt. Before you include a sitemap in robots.txt, check that it loads properly and actually contains URLs.

With the four components above, you can configure your robots.txt in a way that makes it clear which pages you want crawlers to index and which pages you want robots to stay away from. You can hide internal resources or non-public pages and block any duplicate content from confusing crawlers. Through the process, you are also optimizing your crawl budget.
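To make the four components concrete, here is a minimal sketch in Python that parses a hypothetical robots.txt with the standard library's urllib.robotparser module. The example.com domain, the paths, and the sitemap URLs are assumptions for illustration, loosely modeled on the patterns mentioned above; they are not the contents of any real site's file. One caveat: Python's parser applies the first rule that matches, whereas Google applies the most specific matching rule, so the Allow line is placed before the broader Disallow here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; every path and URL below is illustrative only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /r/kappa/api/
Disallow: /r/
Disallow: /core/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` under the rules above."""
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/articles/hello"),           # no rule matches: allowed
        ("Googlebot", "/r/private"),                # blocked by Disallow: /r/
        ("Googlebot", "/r/kappa/api/sitemap.xml"),  # exception via Allow
        ("Googlebot", "/core/app.js"),              # blocked by Disallow: /core/
        ("GPTBot", "/articles/hello"),              # this agent is blocked everywhere
    ]:
        print(agent, path, allowed(agent, path))
    print(parser.site_maps())  # the Sitemap lines, in file order
```

The /core/ rule is written as a plain prefix because the standard-library parser does not expand * wildcards; crawlers such as Googlebot do.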

One important note: While robots.txt provides a set of instructions, it doesn’t enforce them. Search engine crawlers and site health crawlers like Semrush are among the good bots that follow the rules, but spam bots are likely to ignore them. For that reason, be especially careful with any sensitive information that you are exposing on your website.

Common Issues

Search Engine Journal has a great list of the most common issues with robots.txt files that you should definitely give a read. Some of these include:

noindex: If you have this in your robots.txt, your file may be very outdated, as Google began ignoring noindex rules in robots.txt as of 2019. It's best to remove noindex references.
crawl-delay: This is supported by Bing but not Google, and crawl settings were removed entirely from Google Search Console at the end of 2023. So it is of limited use in your robots.txt.
missing sitemap: At least one sitemap should be in your robots.txt file.
incorrect use of wildcards: The asterisk (*) matches any sequence of characters, and the dollar sign ($) marks the end of a URL, which is useful for matching something like a filetype extension. Use these carefully so you don't block entire parts of your site accidentally.
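To see how the two wildcards behave, here is a small Python sketch that translates a robots.txt path pattern into a regular expression. The helper names pattern_to_regex and matches are hypothetical, and the function is a simplified model of wildcard matching for experimenting with patterns, not the exact logic any particular crawler runs.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression.

    '*' matches any sequence of characters; a trailing '$' pins the
    pattern to the end of the URL. Without '$', the pattern only has
    to match the beginning of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if the robots.txt pattern applies to the given path."""
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(matches("/core/*", "/core/js/app.js"))                 # True
    print(matches("/*.pdf$", "/files/report.pdf"))               # True
    print(matches("/*.pdf$", "/files/report.pdf?download=1"))    # False: '$' requires the URL to end in .pdf
    print(matches("/*category", "/shop/subcategory/shoes"))      # True: broader than probably intended
```

The last case shows how a loose pattern can block more than intended: a rule meant for /category pages also catches any path that merely contains the word.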

Update Your Robots.txt

RebelMouse users can easily make changes to their robots.txt by launching the Layout & Design Tool from the Posts Dashboard menu. Navigate to Global Settings and you’ll find a line for robots.txt. After clicking it, you can make updates right there.

Validate Your Robots.txt Setup

Google Search Console has added the ability to check that your robots.txt is set up properly. To do this, simply navigate to Settings at the bottom of the left-side navigation menu. Under crawling, you should see robots.txt: “Valid.” To gain more insights, you can open up the robots.txt report (right side of the screen), which tells you the last time it was checked, the file path, the fetch status (fetched successfully or not fetched for reasons such as not found), and the size of the file. Any issues will be noted. If you need to request a recrawl, you can do so on this page.

Image: This is what you should see in Google Search Console for a valid robots.txt file.

If the robots.txt is not valid, you will see an error message and you can troubleshoot from there.
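Before or alongside the Search Console check, you can also scan the file locally for the common issues described earlier. The following Python sketch is one way to do that; the function name lint_robots_txt and the warning messages are hypothetical, and the checks cover only the noindex, crawl-delay, and missing-sitemap cases, so it is a rough pre-check rather than a substitute for Google's own report.

```python
def lint_robots_txt(text: str) -> list[str]:
    """Return warnings for a few common robots.txt problems."""
    warnings = []
    has_sitemap = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field == "noindex":
            warnings.append(f"line {number}: noindex is ignored by Google; remove it")
        elif field == "crawl-delay":
            warnings.append(f"line {number}: crawl-delay is not supported by Google")
        elif field == "sitemap":
            has_sitemap = True
    if not has_sitemap:
        warnings.append("no Sitemap line found; list at least one sitemap")
    return warnings


# Hypothetical file contents used only to exercise the checks.
SAMPLE = """\
User-agent: *
Disallow: /core/
Noindex: /drafts/
Crawl-delay: 10
"""

if __name__ == "__main__":
    for warning in lint_robots_txt(SAMPLE):
        print(warning)
```

For the sample above, the function reports the noindex line, the crawl-delay line, and the missing sitemap.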

Request a Review

If you’d like one of our strategists to take a look at your robots.txt and make suggestions for optimizing it, simply get in touch and we can set that up with you.
