robots.txt is an ASCII-encoded text file stored in the root directory of a website. It tells search engine spiders, or robots, which content may be indexed and which may not. In effect, this text file acts as a protective shield for your website, hiding selected pages from search engines. Seen another way, it also keeps duplicate content on your website from being indexed.
What is the robots.txt file used for?
The robots.txt file is written for the bots or spiders of the search engines.
It indicates the site structure, the location of sitemap.xml, and the pages and directories that should not be crawled.
It is not mandatory to have one; it is only necessary if you want to restrict which content of your site is crawled by search engine robots.
It specifies files or directories that should not be crawled.
Why use a robots.txt file?
The main reasons to use a robots.txt file on your site are:
1 – Improve the ranking of the site by clarifying for the bots which directories can be indexed.
2 – Prevent the consumption of unnecessary bandwidth by restricting bots from crawling unimportant files and directories.
3 – Prevent personal or private files from being indexed. These could be documents, images, pictures, or other files that we do not want to appear accidentally in the search results.
4 – Prevent duplicate content from being indexed (mainly in WordPress).
Points to consider when creating a robots.txt file:
- All paths in the rules are case-sensitive; getting the case wrong could expose your hidden pages to the web.
- Each line in the file represents one directive; blank lines are ignored.
- Everything after a “#” character is treated as a comment and ignored.
- A bot that matches a specific User-agent block follows only that block and ignores the rules in the wildcard “*” block.
- A link to the sitemap file can be included to help search engine spiders crawl the entire site.
- Minimize the use of the Allow directive, because different search engines interpret Allow differently in different positions.
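The User-agent precedence described above can be checked with Python's standard-library urllib.robotparser, which applies a specific User-agent block to matching bots and falls back to the "*" block for everyone else. The rules and URLs below are hypothetical examples:

```python
from urllib import robotparser

# Hypothetical rules: a specific Googlebot block plus a wildcard block.
rules = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Googlebot matches its own block, so only /drafts/ is off limits to it.
print(rp.can_fetch("Googlebot", "http://example.com/private/page.html"))  # True
print(rp.can_fetch("Googlebot", "http://example.com/drafts/page.html"))   # False
# Any other bot falls back to the "*" block.
print(rp.can_fetch("SomeBot", "http://example.com/private/page.html"))    # False
```

Note that urllib.robotparser follows the original robots.txt convention of plain path prefixes; it does not understand the wildcard extensions shown later in this article.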
How to optimize the robots.txt settings for WordPress
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /*/trackback
Try to block trackback crawling; if search engines crawl trackbacks, the pages look like duplicate content to them.
Disallow: /feed
Disallow: /*/feed
Disallow: /comments/feed
Disallow: /?s=*
Disallow: /*/?s=*
Disallow: /?r=*
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
This keeps your image files out of the index and saves some bandwidth.
Disallow: /?p=*
This hides the short links from search engines.
Disallow: /*/comment-page-*
Disallow: /*?replytocom*
Disallow: /a/date/
Disallow: /a/author/
Disallow: /a/category/
Disallow: /?p=*&preview=true
Disallow: /?page_id=*&preview=true
Disallow: /wp-login.php
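The wildcard “*” and end-anchor “$” used in the rules above are extensions supported by major crawlers such as Googlebot, not part of the original robots.txt standard. A minimal sketch of how such a pattern is matched against a URL path (the function name and logic are illustrative, not Google's actual implementation):

```python
import re

def googlebot_style_match(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern the way major crawlers do:
    '*' matches any run of characters, '$' anchors the end of the URL.
    Illustrative sketch only."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    # Patterns are prefix matches unless terminated by "$".
    return re.match(regex, path) is not None

# "Disallow: /*.jpg$" blocks any URL path ending in .jpg
print(googlebot_style_match("/*.jpg$", "/wp-content/uploads/photo.jpg"))  # True
print(googlebot_style_match("/*.jpg$", "/wp-content/uploads/photo.png"))  # False
# "Disallow: /?s=*" blocks search-result URLs
print(googlebot_style_match("/?s=*", "/?s=hello"))                        # True
# Plain rules like "/wp-admin/" are simple prefix matches
print(googlebot_style_match("/wp-admin/", "/wp-admin/options.php"))       # True
```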
Add the website sitemap to the robots.txt file to help search engines capture all site content; you can of course list multiple sitemap links. Note that the Sitemap directive uses an uppercase S, and the link should be the absolute URL of the sitemap, for example:
Sitemap: https://www.example.com/sitemap.xml
Examples of robots.txt for WordPress
Example 1
User-agent: *
Disallow: /wp-content/
In this case, we indicate that the wp-content folder should not be crawled.
Example 2
User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /private.html
In this example, the Google bot cannot crawl the contents of the folder “wp-admin” or the page “private.html”.
Google uses multiple robots:
➔ Googlebot, crawls web pages and content.
➔ Googlebot-Image, crawls images and photos for Google Images.
➔ Googlebot-Mobile, crawls content for mobile devices.
➔ Mediapartners-Google is the AdSense service robot.
Example 3
User-agent: Googlebot-Image
Disallow: /photos/
This prevents the bot from crawling the personal photographs in the folder “photos”.
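You can verify this behavior with Python's standard-library urllib.robotparser, which applies a User-agent block only to bots whose name matches it; the URLs below are hypothetical:

```python
from urllib import robotparser

# The rules from Example 3: block only the image crawler.
rules = """\
User-agent: Googlebot-Image
Disallow: /photos/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The image bot is blocked from /photos/ ...
print(rp.can_fetch("Googlebot-Image", "http://example.com/photos/me.jpg"))  # False
# ... but the regular web crawler, which matches no block, is not.
print(rp.can_fetch("Googlebot", "http://example.com/photos/me.jpg"))        # True
```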
Example 4
User-agent: *
Disallow: /*.doc$
This code tells bots to skip all .doc files while crawling.
Tips for creating a robots.txt
1 – If you are using the Google AdSense service, you must allow the AdSense robot a full crawl with the following code:
User-agent: Mediapartners-Google
Disallow:
2 – If you are using a customized design on your site, you should not block access to the directory containing the CSS files.
3 – Always keep in mind that search engine robots are case sensitive.
4 – robots.txt should also specify the location of the sitemap.xml file.
Create a perfect robots.txt file for WordPress
WordPress recommends creating a robots.txt file with the following structure:
User-agent: * Disallow: / cgi-bin / Disallow: / wp-admin / Disallow: / wp-includes / Disallow: / wp-content/plugins / Disallow: / wp-content/cache / Disallow: / wp-content / themes / Disallow: / trackback / Disallow: / feed / Disallow: / comments / Disallow: / category / * / * Disallow: * / trackback / Disallow: * / feed / Disallow: * / comments / Disallow: / * Allow : / wp-content/uploads /
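As a sanity check, the plain-prefix rules from this file can be tested with Python's urllib.robotparser. The wildcard lines are omitted here because urllib.robotparser matches paths literally and does not understand “*” or “$”; the URLs are hypothetical:

```python
from urllib import robotparser

# A subset of the recommended WordPress rules (plain prefixes only).
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Allow: /wp-content/uploads/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Admin pages are blocked for every bot.
print(rp.can_fetch("*", "http://example.com/wp-admin/options.php"))      # False
# Uploaded media stays crawlable thanks to the Allow line.
print(rp.can_fetch("*", "http://example.com/wp-content/uploads/a.jpg"))  # True
# Ordinary post URLs match no rule and remain crawlable.
print(rp.can_fetch("*", "http://example.com/2024/05/hello-world/"))      # True
```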