Many people have heard of a file called robots.txt that stops Googlebot from crawling pages, but not many know how to use it, so I thought I would write a short post on it.
First, create a plain text file in the root of your web server and name it "robots.txt". Everything else I describe in this post is the contents of that file.
If you want certain directories on the site not to be crawled, you can specify something like:
User-agent: *
Disallow: /private
Disallow: /impdata
That would prevent those directories from being crawled, since Googlebot, like other well-behaved crawlers, honors this standard. The "*" means the rules apply to every robot.
If you are concerned about one particular search robot and want to restrict only that one, you can have something like:
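If you want to check rules like these before uploading the file, here is a minimal sketch using Python's standard urllib.robotparser module. The host example.com, the bot name "anybot", the helper is_allowed, and the page paths are made up for illustration. Note that this shows how Python's parser reads the rules, which may differ in small ways from how a real crawler reads them.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the post, held in a string instead of a file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private
Disallow: /impdata
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules.

    Hypothetical helper for this post, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and these paths are placeholders.
    for path in ("/private/notes.html", "/impdata/report.csv", "/index.html"):
        url = "http://example.com" + path
        print(path, "allowed" if is_allowed(ROBOTS_TXT, "anybot", url) else "blocked")
```

The two disallowed directories should come back as blocked and the home page as allowed.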
User-agent: googlebot
Disallow: /private
or
User-agent: lycra
Disallow: /
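The per-robot rules can be checked the same way. This is only a sketch: it assumes both blocks above are placed in one file, separated by a blank line, and the host example.com, the name "otherbot", and the helper build_parser are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Both records from the post combined into one file (an assumption;
# the post shows them as alternatives).
PER_BOT_RULES = """\
User-agent: googlebot
Disallow: /private

User-agent: lycra
Disallow: /
"""


def build_parser(robots_txt):
    """Hypothetical helper: parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(PER_BOT_RULES)
    checks = [
        ("googlebot", "http://example.com/private/a.html"),
        ("googlebot", "http://example.com/index.html"),
        ("lycra", "http://example.com/index.html"),
        # A robot with no matching record is not restricted at all.
        ("otherbot", "http://example.com/private/a.html"),
    ]
    for agent, url in checks:
        print(agent, url, rules.can_fetch(agent, url))
```

Since there is no "User-agent: *" record here, a robot that is not named in the file is free to crawl everything, including /private.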
One Comment
OK, neat. If I disallow an already crawled and indexed directory, would it be removed from the index, or would it just never be crawled in the future?
Aug 25th, 2006