Setting up a proper XML sitemap, a key component of technical SEO, is crucial for any website. It provides a roadmap to all of the content you want to be accessible, and it shows search crawlers which pages are the most important so they’re not missed.
It’s a good practice to set up multiple sitemaps, one for videos, one for news, and so on. You’ll list all of the sitemaps in your robots.txt file so they’re easy to find, like on Panorama.
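As a rough illustration of how sitemaps listed in robots.txt can be discovered, the sketch below uses Python’s standard `urllib.robotparser` module to read the `Sitemap:` lines from a robots.txt body. The domain and the robots.txt content are placeholders, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the domain and rules are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap_news.xml
Sitemap: https://www.example.com/sitemap_video.xml
"""


def list_sitemaps(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    for sitemap_url in list_sitemaps(ROBOTS_TXT):
        print(sitemap_url)
```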
A properly set up sitemap excludes posts that you don’t want indexed, like drafts, unlisted articles, or private content. When we set up your sitemaps at RebelMouse, we make sure all of this is done correctly so that there are no errors.
There are many aspects of XML sitemaps that are worth exploring. Let’s get started.
Google Search Central defines a sitemap as “a file where you provide information about the pages, videos, and other files on your site, and the relationships between them.” Quite simply, sitemaps tell search engine crawlers: These are the pages and files on our website that are most important to us. They ensure that your most relevant pages are discovered and indexed, potentially improving your site’s ranking in search engine results pages.
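To make that definition concrete, here is a minimal sketch of what such a file contains, built with Python’s standard library. It follows the public sitemaps.org schema (a `urlset` of `url` entries, each with a `loc` and an optional `lastmod`). The function name and the example URLs are assumptions for illustration and do not reflect how RebelMouse generates sitemaps.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> bytes:
    """Build a UTF-8 encoded sitemap from (url, last_modified) pairs."""
    # Serialize the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder URLs and dates.
    xml_bytes = build_sitemap([
        ("https://www.example.com/", "2024-01-15"),
        ("https://www.example.com/articles/first-post", "2024-01-10"),
    ])
    print(xml_bytes.decode("utf-8"))
```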
A sitemap helps crawlers discover new content and index it quickly, which is beneficial to websites that frequently publish new content or make edits to old content. Sitemaps are not limited to one type of content. They can incorporate photos, videos, and news articles, which helps search engines understand the context and relevance of your content and how it fits together.
One of the best use cases for sitemaps is that they help you identify crawling issues. For this reason, they should be reviewed regularly and any problems discovered should be addressed as soon as possible. By fixing outstanding issues, you ensure that your site is properly crawled and indexed.
Sitemaps are especially important for sites that fit any one of the following criteria:
Your site is very large. Without a good sitemap, search engine crawlers are more likely to miss some of your new or recently updated pages.
You have a large archive of content pages that are isolated or not linked well to each other. You can list them in a sitemap to ensure they’re not overlooked.
Your site is new and has few external links pointing at it. Web crawlers like Googlebot crawl the web by following links from one page to another. Without a sitemap, Google might not discover your pages due to a lack of inbound links.
Your site uses rich media content, appears in Google News, or uses other sitemap-compatible annotations. Google can take additional information from sitemaps into account, where appropriate.
How We Set Up Sitemaps at RebelMouse
Your sitemaps are set up for you out of the box. Currently we support the following sitemaps on RebelMouse:
/sitemap.xml (for published posts)
/sitemap_news.xml (for posts published within the last two days)
/sitemap_video.xml (for published posts with video content in the lead media)
/sitemap_pages.xml (for layouts)
/sitemap_sections.xml (for public sections)
/sitemap_custom_pages.xml (for intersection)
/sitemap_section_content/section.xml (for a section — example)
/sitemap_section_content/parent_section/child_section.xml (for a child section — example)
/sitemap_tags.xml (for tags)
Regular sitemaps, news sitemaps, and video sitemaps do not include all posts. The following posts are excluded from sitemaps:
Articles with an “Unpublished” status
Removed posts
Drafts
Community posts
Private posts
A post is private if all sections assigned to it are set up as private
Direct link outs
These are posts with source URLs (i.e., original URLs) that link directly to an external site
This is designated in the Advanced tab of Entry Editor where you’ll find “Source URL” and the ability to link out directly to the original source
Posts specifically excluded from search results
“Exclude from search engines” is an option in Entry Editor in our SEO tab
Suspicious posts
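The exclusion rules above can be read as a single filter. The following is a minimal sketch of that logic, not the platform’s actual implementation: the `Post` class and all of its field names are hypothetical, and treating a post with no sections as non-private is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical post record; field names are illustrative only."""
    url: str
    status: str = "published"  # e.g. "published", "unpublished", "draft", "removed"
    is_community: bool = False
    section_is_private: list[bool] = field(default_factory=list)
    links_out_directly: bool = False   # source URL links straight to an external site
    exclude_from_search: bool = False  # "Exclude from search engines" is checked
    is_suspicious: bool = False


def is_private(post: Post) -> bool:
    # A post is private if all of its assigned sections are private.
    # Assumption: a post with no sections is not treated as private.
    return bool(post.section_is_private) and all(post.section_is_private)


def include_in_sitemap(post: Post) -> bool:
    """Return True only if none of the exclusion rules apply."""
    return (
        post.status == "published"
        and not post.is_community
        and not is_private(post)
        and not post.links_out_directly
        and not post.exclude_from_search
        and not post.is_suspicious
    )
```

For example, a published post assigned to one public and one private section would still be included, while a draft or a direct link out would not.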
How to Submit a Sitemap to Google
Submitting a sitemap to Google is extremely easy to do in Google Search Console. Simply follow these steps:
Log in to Google Search Console for your website
Navigate to Sitemaps on the left side under “Indexing”
Add a new sitemap by providing the URL at the top and select “Submit”
See if each sitemap is successfully submitted in the “Status” column
In some cases, you may want to remove a sitemap. Go to your sitemaps and select the sitemap that you’d like to remove. Click the three dots in the top right of the screen for that sitemap and select “Remove sitemap.”
Sitemap Best Practices
Here are some best practices to keep in mind as you set up your sitemaps:
XML format: Use the XML file type, which is the preferred format for sitemaps and how we set up yours out of the box, since it’s specifically designed for search engines to understand the makeup of your site.
Make sure your sitemap is UTF-8 encoded: This ensures every character, regardless of language or special symbols, is properly read by crawlers. Yours is set up like this out of the box.
Create different sitemaps for different types of content (e.g., video, image, etc.).
Properly size your sitemaps so that each one stays within both limits: no larger than 50 MB (uncompressed) and no more than 50,000 URLs. This is covered for you out of the box on RebelMouse.
Set up a sitemap index if you have more than one sitemap. We do this for you on RebelMouse.
Update your sitemaps regularly. Keep your sitemaps dynamic as you make changes to your website. This helps search engines find new content and crawl properly. We do this for you out of the box on RebelMouse.
Troubleshoot sitemap issues. Google Search Console indicates when there are issues. Make sure to address those to fully optimize your sitemaps. For more on this, check out the Troubleshooting Sitemap Issues section below.
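The size limit and the sitemap index fit together: once a URL list exceeds 50,000 entries, it is split across several sitemap files, and an index file points at each of them. Below is a minimal sketch of that idea using the sitemaps.org `sitemapindex` format. The function names and the numbered file names in the usage example are hypothetical, and the sketch only enforces the URL count, not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into chunks that each fit in one sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls: list[str]) -> bytes:
    """Build a UTF-8 encoded sitemap index that lists each sitemap file."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder URLs; the numbered file names are an assumption.
    urls = [f"https://www.example.com/post-{i}" for i in range(120_001)]
    chunks = chunk_urls(urls)
    files = [f"https://www.example.com/sitemap_{n}.xml" for n in range(1, len(chunks) + 1)]
    print(len(chunks), "sitemap files")
    print(build_sitemap_index(files).decode("utf-8"))
```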
Here are some more tips for setting up sitemaps properly:
Include all important pages: Make sure that every page you want indexed is included in your sitemaps, like main content, category pages, etc.
Include only 200 status code URLs: These are web pages or resources that can be successfully accessed. The “200 OK” code indicates that the request was successful and the correct resource is returned by the server to the user.
Include only canonical URLs: You want original content to be indexed. If something is pointing somewhere else, exclude it.
Don’t include URLs disallowed by your robots.txt: Listing a URL that crawlers are told not to fetch sends conflicting signals and wastes space in your sitemaps.
Don’t include URLs with a “noindex” tag: Again, this sends conflicting signals and wastes space.
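The last four tips can be combined into one eligibility check. The sketch below assumes you already have, for each page, its HTTP status code, its declared canonical URL, and whether it carries a noindex directive. The `Page` class and its fields are hypothetical, and the robots.txt rules are evaluated with Python’s standard `urllib.robotparser`.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass
class Page:
    """Hypothetical crawl record for a single URL."""
    url: str
    status_code: int
    canonical_url: str
    noindex: bool


def is_sitemap_eligible(page: Page, robots: RobotFileParser,
                        user_agent: str = "Googlebot") -> bool:
    """Apply the tips above: 200 status, canonical, crawlable, indexable."""
    return (
        page.status_code == 200
        and page.canonical_url == page.url
        and not page.noindex
        and robots.can_fetch(user_agent, page.url)
    )


if __name__ == "__main__":
    robots = RobotFileParser()
    # Placeholder rules; in practice these come from your own robots.txt.
    robots.parse(["User-agent: *", "Disallow: /admin/"])
    page = Page("https://www.example.com/story", 200,
                "https://www.example.com/story", False)
    print(is_sitemap_eligible(page, robots))
```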
Google News Sitemaps
RebelMouse supports Google News sitemaps, which align with Google’s specifications for how articles should be treated in a sitemap.
Google News sitemaps identify key criteria for each article, like genre, access tag, title, publication, and date. They allow Google to quickly determine which pages on your site are news articles so that the content can appear in Google News. According to Google, it’s best practice for Google News sitemaps to only show a site’s posts from the previous 48 hours. Adhering to Google’s guidelines has led to great benefits for many sites powered by RebelMouse.
Sitemaps built for Google News often include valuable metadata, like when the page was last updated, how often the page has changed, and the importance of the page relative to other URLs on your site. They can also include images.
There are two main reasons to build a sitemap specifically for Google News. It will help Google discover news articles from your site faster, and make sure all relevant news articles are crawled and indexed.
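As an illustration of the 48-hour rule, the sketch below builds a news sitemap that keeps only recently published articles. It uses the `news:` namespace from Google’s published news sitemap schema with the publication name, language, publication date, and title for each article. The function name, the article dictionaries, and the publication details are assumptions, not RebelMouse’s implementation.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles: list[dict], publication_name: str,
                       language: str, now: datetime) -> bytes:
    """Build a news sitemap with articles published in the last 48 hours."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    cutoff = now - timedelta(hours=48)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        if article["published"] < cutoff:
            continue  # older than 48 hours, so leave it out
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["url"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = (
            article["published"].isoformat()
        )
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Placeholder articles.
    articles = [
        {"url": "https://www.example.com/fresh", "title": "Fresh story",
         "published": now - timedelta(hours=5)},
        {"url": "https://www.example.com/stale", "title": "Stale story",
         "published": now - timedelta(hours=72)},
    ]
    print(build_news_sitemap(articles, "Example Times", "en", now).decode("utf-8"))
```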
Google News sitemaps for websites powered by RebelMouse typically follow this URL structure: /sitemap_news.xml
Video Sitemaps
RebelMouse supports and encourages video sitemaps for every web property we power. We support video sitemaps for JW Player and YouTube out of the box.
According to Google, creating a video sitemap is a great way to help Google find and understand the video content on your site, especially videos that were recently added or might be difficult to find through the usual crawling process.
The default video sitemap for sites powered by RebelMouse typically appears at: /sitemap_video.xml
Image Sitemaps
Sitemaps created by RebelMouse include images by default, and they have the appropriate markup so that they’re aligned with Google’s image SEO best practices.
You might consider creating a separate image sitemap if images are important to your website and you want to make sure they are indexed properly and not missed. Unlike regular sitemaps, you can list URLs from other domains under <image:loc> in image sitemaps, which allows you to use content delivery networks to host images.
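A minimal sketch of an image sitemap follows, using the `image:` namespace from Google’s image sitemap schema. Each page URL gets one `image:image` entry per image, and the `image:loc` values may point at a different domain such as a CDN. The function name, the page URL, and the CDN host are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages: dict[str, list[str]]) -> bytes:
    """Build an image sitemap from a mapping of page URL to image URLs."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            # The image may live on another domain, such as a CDN.
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder page and CDN URLs.
    xml_bytes = build_image_sitemap({
        "https://www.example.com/gallery": [
            "https://cdn.example.net/img/one.jpg",
            "https://cdn.example.net/img/two.jpg",
        ],
    })
    print(xml_bytes.decode("utf-8"))
```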
Troubleshooting Sitemap Issues in Google Search Console
When there are problems with your sitemap, you can find those issues listed in Google Search Console. Navigate to “Indexing,” then “Pages,” and then toggle to “All Submitted Pages” at the top of the screen. You’ll see a list of issues that explain why pages aren’t being indexed.
Here are the issues you’ll see pop up, along with how to handle each, according to Google:
Soft 404: The page displayed a “not found” message but did not return a 404 status code. Soft 404s are problematic because they return a 200 (success) status code for users, but still lead to an error page. It’s better to delete a page, redirect to a working page, or optimize the page by adding content. Use the URL Inspection Tool to check the latest status of a URL.
Blocked due to unauthorized request (401): The page was blocked because the user is “unauthorized.” Remove authorization requirements or otherwise verify Googlebot to allow it.
Blocked due to access forbidden (403): The user agent was not granted access. If you still want Google to crawl the page, verify Googlebot as described above.
Not found (404): The page was not found when requested, delivering a 404 code. Check why the page that is returning a 404 is in your sitemap and remove it from the sitemap to resolve this issue.
URL blocked due to another 4xx issue: This means there’s another error, not one of the issues listed above. Use Google’s URL Inspection Tool to debug it.
Crawled — currently not indexed: The URL has been crawled already but not indexed. It may be indexed later. No need to submit this URL for crawling.
Discovered — currently not indexed: Google came across the page but didn’t crawl it yet. It will crawl it later or you can submit the URL in Google Search Console to request a crawl.
Alternate page with proper canonical tag: This is an alternate form of another page (typically a mobile version of a desktop canonical, or a desktop version of a mobile canonical). Check why it is in the sitemap and remove it, since sitemaps should only include canonical URLs.
Duplicate without user-selected canonical: The page is a duplicate but it doesn’t indicate the preferred canonical page. Google has chosen the other page as the canonical, so this one will not appear in Google Search. Google’s URL Inspection Tool allows you to see which page Google considers canonical.
If Google chose the wrong URL as the canonical, you can address this by specifying a canonical (rel="canonical") for the page.
If you don’t believe it should be a duplicate, make meaningful changes so that the content is not considered a duplicate.
Duplicate, Google chose different canonical than user: The page is marked canonical, but Google thinks another URL is the better canonical and indexed that instead. Inspect the URL to see what Google chose as the canonical.
Page with redirect: This is a non-canonical URL that redirects to another page, so this URL will not be indexed. It should be removed from the sitemap. (Note: A canonical URL with a redirect can be indexed.)
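Several of the issues above can be caught before Google reports them by checking each sitemap URL yourself. The sketch below maps a few observable facts about a URL to the matching Search Console label. It is a simplified assumption of how such a pre-check might work: the function and its parameters are hypothetical, it covers only the status code, redirect, and canonical cases, and it cannot detect soft 404s or the issues that depend on Google’s own canonical choice.

```python
from typing import Optional


def sitemap_issue(url: str, status_code: int, canonical_url: Optional[str] = None,
                  redirects_to: Optional[str] = None) -> Optional[str]:
    """Return the likely Search Console issue label, or None if none applies."""
    if redirects_to is not None and redirects_to != url:
        return "Page with redirect"
    if status_code == 401:
        return "Blocked due to unauthorized request (401)"
    if status_code == 403:
        return "Blocked due to access forbidden (403)"
    if status_code == 404:
        return "Not found (404)"
    if 400 <= status_code < 500:
        return "URL blocked due to another 4xx issue"
    if status_code == 200 and canonical_url and canonical_url != url:
        return "Alternate page with proper canonical tag"
    return None


if __name__ == "__main__":
    # Placeholder URLs and observations.
    print(sitemap_issue("https://www.example.com/gone", 404))
    print(sitemap_issue("https://www.example.com/amp/story", 200,
                        canonical_url="https://www.example.com/story"))
```

Any URL for which this returns a label is a candidate for removal from the sitemap or for a fix on the page itself.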
Still have questions? Get in touch and we’ll be happy to have one of our strategists assist you with creating and maintaining your sitemaps.