robots.txt is an ASCII-encoded text file stored in the root directory of a website. It tells search engine spiders, or robots, which content is allowed to be indexed and which is not. Essentially, this text file acts as a protective shield for your website, hiding pages from search engines. Looked at another way, it also keeps duplicate content on your website from being indexed by search engines.

Setting up robots.txt for SEO

What is the robots.txt file used for?

The robots.txt file is created for the bots or spiders of the search engines.
It indicates the site structure, the location of the sitemap.xml, and the pages and directories that should not be crawled.
It is not mandatory to have one; it is only necessary if you want to restrict search engine robots from crawling parts of your site.
It specifies files or directories that should not be crawled.
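As a sketch, a minimal robots.txt covering these uses might look like the following (the domain example.com and the paths are placeholders for illustration, not a recommendation for any specific site):

User-agent: *
Disallow: /private/
Disallow: /tmp/
Sitemap: http://example.com/sitemap.xml

Here the wildcard User-agent applies the two Disallow rules to every bot, and the Sitemap line points crawlers at the sitemap's absolute URL.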

Why use robots.txt file?

The main reasons why we need to use a robots.txt on our site are:
1 – Improve the ranking of the site by making clear to the bots which directories can be indexed.
2 – Prevent unnecessary bandwidth consumption by restricting bots from crawling unimportant files and directories.
3 – Prevent personal or private files from being indexed. These could be documents, images, pictures, or other files that we do not want to accidentally appear in search results.
4 – Prevent duplicate content from being indexed (a common issue with WordPress).

Points to consider when creating a robots.txt file

  • All rules are case sensitive; getting the case wrong could expose your hidden pages to the web.
  • Each row in the file represents one command; blank lines are ignored.
  • Everything after a “#” character is treated as a comment and ignored.
  • A bot that matches a specific User-agent block follows only that block; the wildcard “*” rules no longer apply to it.
  • A link to the sitemap file can be included to help search engine spiders crawl the entire site.
  • Minimize the use of the Allow directive, because different search engines interpret Allow differently.
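The points above can be illustrated with a short sketch (the bot name and paths here are purely illustrative):

# Every bot stays out of /drafts/ . . .
User-agent: *
Disallow: /drafts/

# . . . except Googlebot, which matches its own block and only avoids /old-news/
User-agent: Googlebot
Disallow: /old-news/

Because Googlebot matches the more specific User-agent block, it ignores the wildcard rules entirely and is only kept out of /old-news/.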

How to optimize the robots.txt settings for WordPress

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /*/trackback

Try to block trackback crawling; if search engines crawl trackbacks, the pages look like duplicate content to them.

Disallow: /feed
Disallow: /*/feed
Disallow: /comments/feed
Disallow: /?s=*
Disallow: /*/?s=*
Disallow: /?r=*
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$

This will protect your picture files and save you some bandwidth.

Disallow: /?p=*

This hides the short links from search engines.

Disallow: /*/comment-page-*
Disallow: /*?replytocom*
Disallow: /a/date/
Disallow: /a/author/
Disallow: /a/category/
Disallow: /?p=*&preview=true
Disallow: /?page_id=*&preview=true
Disallow: /wp-login.php
Sitemap: http://ssiddique.info/sitemap.txt

Add your website sitemap to the robots.txt file to help search engines capture all of your site's content; you can, of course, list multiple sitemap links. Note that Sitemap is written with an uppercase S, and the link should be the absolute address of the sitemap.
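For instance, multiple Sitemap lines can be listed one per line, each with an absolute URL (example.com is a placeholder domain):

Sitemap: http://example.com/sitemap.xml
Sitemap: http://example.com/sitemap-images.xml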

Examples of robots.txt for WordPress

Example 1
User-agent: * 
Disallow: / wp-content /

In this case, we indicate that the wp-content folder should not be crawled.

Example 2
User-agent: Googlebot 
Disallow: / wp-admin / 
Disallow: /private.html

In this example, the Google bot cannot crawl the contents of the folder “wp-admin” or the page “private.html”.

Google uses multiple robots:
➔ Googlebot, crawls web pages and content.
➔ Googlebot-Image, crawls images and photos for Google Images.
➔ Googlebot-Mobile, crawls content for mobile devices.
➔ Mediapartners-Google is the AdSense service robot.

Example 3
User-agent: Googlebot-Image 
Disallow: / photos /

This prevents the bot from crawling personal photographs in the folder “photos”.

Example 4
User-agent: * 
Disallow: /*.doc$

This code tells bots to exclude all .doc files while crawling.

Tips to create a robots.txt

1 – If you are using the Google AdSense service, you must allow the AdSense robot to crawl fully, using the following code:
User-agent: Mediapartners-Google
Disallow:
2 – If you are using customized design on your site, you should not block access to the directory containing the CSS files.
3 – Always keep in mind that search engine robots are case sensitive.
4 – robots.txt should also specify the location of the sitemap.xml file.

Create the perfect robots.txt file for WordPress

WordPress recommends creating a robots.txt file with the following structure:

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category/*/*
Disallow: */trackback/
Disallow: */feed/
Disallow: */comments/
Disallow: /*
Allow: /wp-content/uploads/