#
# This is the robots.txt file. It should be in the root (/)
# of your web site.
#
# This file is used to tell search engine crawlers what
# directories they should ignore when indexing.
#
# Lines beginning with # are comments, and are not processed
# by crawlers.
#
# The format is a series of records. Each record begins with
# a User-agent: statement, and is followed by at least one
# Disallow: statement. The preset records below demonstrate
# this format.
#
# Records are separated by blank lines.
#
# For further information, visit
# http://info.webcrawler.com/mak/projects/robots/norobots.html
#
# Leave these records intact:

User-agent: stress-agent
Disallow: /

User-agent: vspider
Disallow: /

# Edit this record as appropriate for your site. Add any
# other directories you want crawlers to ignore.

User-agent: *
Disallow: /_
Disallow: /asianmonth/
Disallow: /dyantable/
Disallow: /error/
Disallow: /forum/
Disallow: /gpower/catalog/
Disallow: /gpower/adultswhocare2/
Disallow: /gpower/girlarea/gamespuz/cootie/
Disallow: /gpower/girlarea/gpguests/simpson/
Disallow: /gpower/girlarea/images/
Disallow: /hisp98/
Disallow: /hp2010/
Disallow: /images/
Disallow: /intlorders/
Disallow: /kidsarea/
Disallow: /labs/
Disallow: /mediastudy/
Disallow: /mlk/
Disallow: /mulitcul/
Disallow: /nacoa/
Disallow: /posters/
Disallow: /presteleconf/
Disallow: /pubs/catalog/
Disallow: /pubs/alcruns/
Disallow: /pubs/prevpipe/
Disallow: /pubs/primer/
Disallow: /pubs/parguide/
Disallow: /pubs/strafact/
Disallow: /pubs/drugfree/
Disallow: /reality/
Disallow: /reality/images/
Disallow: /recovery99/
Disallow: /recovery00/
Disallow: /recovery2000/
Disallow: /res-brf/
Disallow: /topten/
Disallow: /workplce/
Disallow: /wpkit/
Disallow: /CSAT/
Disallow: /pw4npn/
Disallow: /NSAMI/
Disallow: /search/
Disallow: /dbases/Domain_Counter.aspx
Disallow: /research/Domain_Counter.aspx
Disallow: /govpubs/phd633
Disallow: /govpubs/phd633i
Disallow: /features/lgbt

User-agent: EmailSiphon
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: msnbot # MSN
Crawl-delay: 10
Visit-time: 0600-0845 # and then only between 1 am and 3:45 am EST

User-agent: Slurp # Yahoo
Crawl-delay: 10
Visit-time: 0600-0845 # and then only between 1 am and 3:45 am EST

User-agent: Googlebot # Google
Crawl-delay: 10
Visit-time: 0600-0845 # and then only between 1 am and 3:45 am EST

User-agent: VoilaBot # French search engine
# VoilaBot: not 100% sure this is a good bot. Monitor to see whether it obeys robots.txt.
Crawl-delay: 10
Visit-time: 0600-0845 # and then only between 1 am and 3:45 am EST

User-agent: Teoma # AskJeeves
Crawl-delay: 10
Visit-time: 0600-0845 # and then only between 1 am and 3:45 am EST

User-agent: baiduspider # Baidu
Crawl-delay: 10
Visit-time: 0600-0845 # and then only between 1 am and 3:45 am EST

User-agent: Gigabot
# Gigabot: not 100% sure this is a good bot. Monitor to see whether it obeys robots.txt.
Crawl-delay: 10
Visit-time: 0600-0845 # and then only between 1 am and 3:45 am EST

User-agent: Larbin
# Larbin: not 100% sure this is a good bot. Monitor to see whether it obeys robots.txt.
Crawl-delay: 10
Visit-time: 0600-0845 # and then only between 1 am and 3:45 am EST

User-agent: ConveraCrawler
# Convera's RetrievalWare Internet Spider: not 100% sure this is a good bot.
# Monitor to see whether it obeys robots.txt.
Crawl-delay: 10
Visit-time: 0600-0845 # and then only between 1 am and 3:45 am EST