# robots.txt for http://www.netrot.net/
#
# a list of URL directories & files for wandering robots to ignore.
# $Id: robots.txt,v 1.3 2004/07/10 14:24:08 chang Exp chang $
#
# url that this used to be documented by is at
# Ref: (gone though)
#
# AGGGGGHH! records are separated by new lines (they are not ignored)
# shucks, darn, and other niceties
#
# Email comments to webmaster@netrot.net
#
#####
# User-agent contains entries of robots that you want to match this against.
User-agent: *
#
# Disallow specifies partial or full URL paths
#
# ignore local icon stores
#
Disallow: /icons/
#
# ignore test directory
Disallow: /test/
#
# ignore potentially naughty CGI stuff
#Disallow: /cgi-bin/finger
#Disallow: /cgi-bin/info2www
#
Disallow: /~chang/baby/
Disallow: /~chang/video/
Disallow: /~chang/kat/
Disallow: /~chang/virus/
Disallow: /~chang/z/

#####
# additional entries...
#
User-agent: testbot
Disallow: /

#####
# obnoxious crawler that doesn't follow/read my robots.txt file properly
User-agent: http://www.almaden.ibm.com/cs/crawler
Disallow: /

#####
# reputedly obnoxious robot from Korea
User-agent: NaverBot
Disallow: /

# another obnoxious bot
User-agent: aipbot
Disallow: /

# another obnoxious bot
User-agent: OmniExplorer_Bot
Disallow: /

#####
# obnoxious pings, hits every few hours
#
User-agent: sitecheck.internetseer.com
Disallow: /

User-agent: ConveraCrawler
Disallow: /users/chang/baby/

#####
# bad karma
#
User-agent: TurnitinBot
Disallow: /

User-agent: BecomeBot
Disallow: /