Have you ever come across the robots.txt file and wondered what its purpose was?
In this article, we are going to explain what a robots.txt file is and how to use it.
What is Robots.txt?
Robots.txt is a plain text file that tells web crawlers how to crawl your website. The file may contain instructions on which pages or directories bots aren't allowed to crawl.
While most bots follow these rules, there are malicious bots that ignore the robots.txt file completely.
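To see how a rule-following crawler behaves, here is a minimal sketch using Python's standard-library `urllib.robotparser`; the rules and URLs are hypothetical examples, not from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a site might publish in its robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved crawler checks the rules before fetching each URL.
print(parser.can_fetch("MyBot", "https://website.com/private/data.html"))  # False
print(parser.can_fetch("MyBot", "https://website.com/index.html"))         # True
```

A polite crawler runs a check like `can_fetch` for every URL it plans to request; a malicious bot simply skips this step, which is why robots.txt is a convention rather than an enforcement mechanism.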
How to use the Robots.txt file
During installation, WordPress automatically creates a robots.txt file for your website. You can check your website's robots.txt file by visiting www.website.com/robots.txt. Replace website.com with your domain name.
Editing the Robots.txt file
You can access your robots.txt file using an FTP client. Go to the root folder of your web server, locate the robots.txt file, and open it. If for some reason there is no robots.txt file, you can simply create a text file, name it robots.txt, and add the code below to it.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://website.com/wp-sitemap.xml
On the Sitemap line, replace website.com with your domain.
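If you prefer to generate the file rather than type it out, the default rules above can be written to a local robots.txt with a short script before uploading it to your web root; a minimal sketch (the domain is a placeholder to replace with your own):

```python
# Write the default WordPress-style rules to a local robots.txt,
# ready to upload to the web root via FTP.
# Replace website.com with your own domain before using.
rules = """User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://website.com/wp-sitemap.xml
"""

with open("robots.txt", "w", encoding="utf-8") as f:
    f.write(rules)
```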
Disallowing a section of your website from being indexed
To disallow web crawlers from indexing a section of your website, add the lines below to your robots.txt file.
User-agent: *
Disallow: /restricted-area/
Replace restricted-area with your folder name.
Sometimes you may want to allow a page inside a disallowed section to remain indexable. You can let the bots index it by adding the lines below.
User-agent: *
Disallow: /restricted-area/
Allow: /restricted-area/allow-access.php
Here, replace the folder and page names with yours. This will allow the bot to index only that one page inside the restricted section.
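You can verify an Allow exception like this with Python's standard-library `urllib.robotparser` before deploying the file. One caveat worth knowing: Python's parser applies rules in the order they appear (first match wins), so the Allow line is listed first in this sketch, whereas major search engines match the most specific rule regardless of order. The paths here are the placeholder examples from above.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules: block the folder, but allow one page inside it.
# Allow is listed first because Python's parser uses first-match order.
rules = [
    "User-agent: *",
    "Allow: /restricted-area/allow-access.php",
    "Disallow: /restricted-area/",
]

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://website.com/restricted-area/secret.html"))       # False
print(parser.can_fetch("*", "https://website.com/restricted-area/allow-access.php"))  # True
```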
Disallowing a specific bot from indexing your website
Every bot has a user-agent name. You can restrict specific bots from indexing or crawling your website through their user-agent name. You can do so by adding the lines below to your robots.txt file.
User-agent: *
Disallow: /restricted-area/

User-agent: Bingbot
Disallow: /
Here, Bingbot is the Bing search engine's crawler, which we want to restrict. You can replace Bingbot with the user-agent name of any bot you don't want indexing your website.
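Per-bot rules like these can also be checked with Python's `urllib.robotparser`: each `User-agent` group applies only to matching bots, and everyone else falls back to the `*` group. The rules below mirror the example above, with placeholder URLs.

```python
from urllib.robotparser import RobotFileParser

# Everyone is kept out of /restricted-area/; Bingbot is blocked entirely.
rules = [
    "User-agent: *",
    "Disallow: /restricted-area/",
    "",
    "User-agent: Bingbot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Bingbot", "https://website.com/index.html"))   # False
print(parser.can_fetch("OtherBot", "https://website.com/index.html"))  # True
```

Note the blank line between the two groups: each `User-agent` block is a separate record in the file.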
Things to keep in mind while using the Robots.txt file
- Each subdomain should have its own robots.txt file.
- You cannot force a bot to follow the robots.txt file. There are many malicious bots that will ignore the rules set in the file and still crawl the restricted pages.
Search engine crawlers have a limited crawl budget, i.e., a set number of pages they will crawl on your site in a given period. This is why it is important to prevent unnecessary page crawls and let the bots focus on your important pages. You can do so by setting up the robots.txt file properly. You can also disallow specific spam bots, which helps prevent unnecessary load on your servers.