What is Robots.txt in WordPress and How to Use it

Have you ever come across the robots.txt file and wondered what its purpose was?

In this article, we are going to explain what a robots.txt file is and how to use it.

What is Robots.txt?

Robots.txt is a plain text file that tells web crawlers how to crawl a website. It can contain instructions about which pages or directories bots are not allowed to crawl.

While most bots follow these rules, malicious bots may ignore the robots.txt file completely.

How to use the Robots.txt file

WordPress does not write a physical robots.txt file during installation. Instead, it serves a virtual one on request when no physical file exists in the site root. You can check your website's robots.txt file by visiting www.website.com/robots.txt. Replace website.com with your domain name.
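As a rough sketch, the robots.txt address can be worked out from any page URL on the site, because the file always sits at the root of the host. The helper name `robots_txt_url` and the example URL are placeholders for illustration, not part of WordPress.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # website.com is a placeholder domain.
    print(robots_txt_url("https://www.website.com/blog/hello-world/?p=1"))
```

Because the host name is kept as-is, a URL on a subdomain resolves to that subdomain's own robots.txt.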

Editing the Robots.txt file

You can access your robots.txt file using an FTP client. Open the root folder of your website on the server, locate the robots.txt file, and open it. If there is no robots.txt file in that folder, create a text file named robots.txt and add the code below to it.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://website.com/wp-sitemap.xml

On the Sitemap line, replace website.com with your domain.
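If you manage several sites, the same starter rules can be produced with a small script instead of being typed by hand. This is only a sketch: the function name `build_robots_txt` is made up for this example, and it assumes the site is served over https and uses the wp-sitemap.xml path shown above.

```python
def build_robots_txt(domain: str) -> str:
    """Return the starter rules from this article with the sitemap on `domain`."""
    lines = [
        "User-agent: *",
        "Disallow: /wp-admin/",
        "Allow: /wp-admin/admin-ajax.php",
        # Assumes https and the wp-sitemap.xml location used in this article.
        f"Sitemap: https://{domain}/wp-sitemap.xml",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # website.com is a placeholder; save the output as robots.txt in the site root.
    print(build_robots_txt("website.com"), end="")
```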

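Disallowing a section of your website from being crawled

To stop web crawlers from crawling a section of your website, add the line below to the file. Note that this blocks crawling rather than indexing: a disallowed URL can still show up in search results if other pages link to it.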

Disallow: /restricted-area/

Replace restricted-area with your folder name.

Sometimes you may want bots to reach a single page inside a disallowed section. You can let them crawl it by adding an Allow rule for that page, as in the code below.

User-agent: *
Disallow: /restricted-area/
Allow: /restricted-area/allow-access.php

Replace the folder and page names with your own. Bots will then be able to crawl only that page inside the restricted section.
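Why does the Allow line win here? Under the Robots Exclusion Protocol (RFC 9309), when several rules match a path, the most specific one (the longest matching path) applies, and Allow wins a tie. Googlebot documents the same behavior. The sketch below illustrates that idea for a single user-agent group. The function name `is_allowed` and the rule format are invented for this example, and wildcards (`*` and `$`) are deliberately left out.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` may be crawled under one user-agent group.

    `rules` holds (directive, path_prefix) pairs such as
    ("disallow", "/restricted-area/"). Wildcards are not handled.
    """
    best_length = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule path matches nothing
        is_allow = directive.lower() == "allow"
        # The longest matching prefix wins; on a tie, Allow wins.
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed


RULES = [
    ("disallow", "/restricted-area/"),
    ("allow", "/restricted-area/allow-access.php"),
]

if __name__ == "__main__":
    for path in (
        "/restricted-area/allow-access.php",
        "/restricted-area/secret.php",
        "/blog/",
    ):
        print(path, is_allowed(RULES, path))
```

Not every bot evaluates rules this way, so it is worth testing the file against the crawlers you care about.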

Disallowing a specific bot from crawling your website

Every bot identifies itself with a user-agent name. You can block a specific bot from crawling your website by addressing it by that name. To do so, add the code below to your robots.txt file.

User-agent: *
Disallow: /restricted-area/

User-agent: Bingbot
Disallow: /

Here Bingbot is the Bing search engine's crawler, which we want to block. You can replace Bingbot with the user-agent name of any bot that you don’t want crawling your website.
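One way to sanity-check per-bot rules before uploading them is Python's standard `urllib.robotparser` module, which can parse the text directly without any network access. The sketch below feeds it the rules from this section. The helper name `make_parser` and the website.com URLs are placeholders, and the result only shows how this particular parser reads the file, not how every crawler will behave.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /restricted-area/

User-agent: Bingbot
Disallow: /
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    # website.com is a placeholder domain.
    checks = [
        ("Bingbot", "https://website.com/"),
        ("Googlebot", "https://website.com/blog/"),
        ("Googlebot", "https://website.com/restricted-area/page.html"),
    ]
    for agent, url in checks:
        print(agent, url, parser.can_fetch(agent, url))
```

With these rules, Bingbot is refused everywhere, while other bots are refused only inside /restricted-area/.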

Things to keep in mind while using the Robots.txt file

  • Each subdomain should have its own robots.txt file.
  • You cannot force a bot to follow the robots.txt file. There are many malicious bots that will ignore the rules set in the file and still crawl the restricted pages.

Final Thoughts

Search engine crawlers only crawl a limited number of pages on a site in a given period, often called the crawl budget. This is why it is important to prevent unnecessary page crawls and let bots spend that budget on your important pages. You can do so by setting up the robots.txt file properly. You can also disallow specific unwanted bots, which helps reduce unnecessary load on your server as long as those bots honor the file.

