How to build a Helpful & Working robots.txt
Published by Jolteon
02-15-2007

Your Forum & robots.txt


Your forum relies on bandwidth and reliable hosting.
But what happens when unruly spiders get loose on your forums? Bandwidth seeps away to the crawlers, and so can some of your privacy, because spiders often crawl everything they can reach: your forum, images, control panel URLs, and other things you may not want crawled.
Obviously we all want search engines to crawl our site to some extent, so that we get listed and bring in guests.
However, most spiders will also crawl things like /admincp, /images & /usercp, which we may not want. robots.txt is a good way to ask spiders to stay out of those areas. Keep in mind that it is a request, not a lock: well-behaved spiders obey it, badly behaved ones can ignore it, and anyone can read the file. Creating one is easy, and I will give you a quick rundown now. What I show in this article works with most forum software; change the file names to match yours.


Notice: Many things in this article will need adjusting to your software and directory structure, so check as you go. I am working with vBulletin, but if you adjust the file names it should work with any major forum software. I will also write as if my forum lived in the subdirectory /forums/; if yours does not, change it to yours. If your forum is in the root, leave the /forums part out altogether.


To start, we need to create a new file. Open a text editor, create a blank document, and save it as “robots.txt” (without the quotation marks). Then open the document again, and let's begin!


First, we shall start with the control panels. Many people do this as an extra precaution. So, let's type this to keep spiders out of /forums/admincp/ & /forums/modcp/, as well as the UserCP:
Code:
User-agent: *  
Disallow: /forums/admincp/
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
The * means any spider that follows the robots.txt protocol. Also note that usercp is a .php file, not a folder. Extra security note: robots.txt does not protect anything by itself, and it lists these paths for anyone who reads it. As well as the normal password protection on the AdminCP & ModCP, you may wish to add an extra password. By password-protecting the directory with .htaccess, or through your cPanel, even someone who gains access to an Admin/Mod account can't get into the panels without the extra password.
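A quick way to sanity-check rules like these before uploading is to feed them to a parser. The sketch below uses Python's standard urllib.robotparser module; the helper name is_allowed and the sample URLs (index.php inside admincp, showthread.php) are assumptions for illustration only, and real spiders may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The three rules from the tutorial so far.
RULES = """\
User-agent: *
Disallow: /forums/admincp/
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
"""


def is_allowed(robots_text, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to exercise the rules above.
    for path in ("/forums/admincp/index.php",
                 "/forums/usercp.php",
                 "/forums/showthread.php"):
        print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

The thread page stays crawlable while both control panel paths are reported as blocked, which is the behaviour the rules are meant to produce.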


Next, we shall take care of bandwidth, which on a very large board may also speed the forum up slightly.
First, let's stop spiders from running searches. Some spiders are capable of using your board's search function, as a quick trip to “Who's Online” may reveal every once in a while. The following tips definitely work on vBulletin and, with the file names changed, will almost certainly work on other software.


Let's take our robots.txt file from before. It should look similar to this if you added the lines above.
Code:
User-agent: *  
Disallow: /forums/admincp/ 
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
After the Disallow: /forums/usercp.php line, type this
Code:
Disallow: /forums/search.php
That will stop spiders from using the search function. On large boards this can speed things up, for example by stopping spiders from pulling up every post by a high-posting user.


Next up, privacy. After an incident on one of my own boards, my members became worried that I allowed spiders to crawl member profiles, so I stopped it. (Who's Online can also reveal spiders crawling profiles.) This makes your members feel more secure, knowing that their profile, which often contains age, location and sometimes other details, isn't slapped about the Internet. To do this, let's add this directly under the last entry:
Code:
Disallow: /forums/memberlist.php
Disallow: /forums/member.php
Blocking memberlist.php will prevent crawling of the Member List, and blocking member.php will prevent crawling of member profiles. (Definitely works for vBulletin; change the file names in your robots.txt for other software.)
We now have this in our robots.txt.
Code:
User-agent: *
Disallow: /forums/admincp/
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
Disallow: /forums/search.php
Disallow: /forums/memberlist.php
Disallow: /forums/member.php
Next, let's keep the spiders out of any test forum you may have. Let's say my test board is at /testboard, so add that:
Code:
Disallow: /testboard
Clearly not everyone will have a test board, and if they do, it might not be at that location, so remember to edit it to match your setup.
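One detail worth knowing here: a Disallow path is matched as a prefix, so /testboard without a trailing slash also covers anything else whose path starts with those characters. A minimal sketch using Python's standard urllib.robotparser shows the difference; the helper name blocked and the /testboard-archive/ path are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule_path, url_path):
    """Return True if a lone 'Disallow: rule_path' rule blocks url_path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return not parser.can_fetch("*", url_path)


if __name__ == "__main__":
    # Without the trailing slash, the rule is a plain prefix match.
    print(blocked("/testboard", "/testboard/index.php"))
    print(blocked("/testboard", "/testboard-archive/"))
    # With the trailing slash, only the directory itself is covered.
    print(blocked("/testboard/", "/testboard-archive/"))
```

If you only want the one directory blocked, writing the rule as /testboard/ is the narrower choice.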
Our final addition to robots.txt in this article deals with a few more bandwidth issues.
Add all the major image directories, such as this:
Code:
Disallow: /forums/images/
You might also want to add any attachment directories you may have. After doing that, read on.
Next, review the file for any errors. It should look like this:
Code:
User-agent: *
Disallow: /forums/admincp/ 
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
Disallow: /forums/search.php
Disallow: /forums/memberlist.php
Disallow: /forums/member.php
Disallow: /testboard
Disallow: /forums/images/
Plus an attachment folder if you added one. Before reading on, add anything else you want blocked.


REMEMBER: Change “forums” if you don't use a forums subdirectory, or if you use a different name like /board. If your forum is in the root, just put Disallow: /filename.php or Disallow: /foldername/
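If you would rather not edit every line by hand when the forum directory changes, the file can be generated. This is only a sketch: the function name build_robots and its parameters are my own invention, and the path list simply repeats the vBulletin names used in this article.

```python
def build_robots(forum_dir, forum_paths, extra_paths=()):
    """Build robots.txt text for a forum living in `forum_dir`.

    forum_dir   -- e.g. "forums" or "board"; use "" if the forum is in the root
    forum_paths -- files/folders inside the forum directory to disallow
    extra_paths -- paths outside the forum directory, relative to the site root
    """
    cleaned = forum_dir.strip("/")
    prefix = "/" + cleaned if cleaned else ""
    lines = ["User-agent: *"]
    for path in forum_paths:
        lines.append(f"Disallow: {prefix}/{path.lstrip('/')}")
    for path in extra_paths:
        lines.append(f"Disallow: /{path.lstrip('/')}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The vBulletin file names from this article; adjust for other software.
    vb_paths = ["admincp/", "modcp/", "usercp.php", "search.php",
                "memberlist.php", "member.php", "images/"]
    print(build_robots("forums", vb_paths, extra_paths=["testboard"]))
```

Calling it with "board" instead of "forums", or with an empty string for a forum in the root, rewrites every line consistently.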


Finally, let's upload our file to its proper location. Save the file as robots.txt if you haven't already, then open your FTP client and connect. Go to your web root (/public_html/ in a typical hosting layout) and upload the file there.
Lastly, let's check it's in place. Open the file in your browser; if everything went correctly it will be at http://yourdomain.com/robots.txt
If it's there, you're done! If not, check that you uploaded it correctly.
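The same check can be scripted. The sketch below works out where spiders will look for the file (always /robots.txt at the root of the host, whatever page you start from) and fetches it with Python's standard library. The function names robots_url and fetch_robots are assumptions of mine, and yourdomain.com is the placeholder from the text, not a real target.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site_url):
    """Return the robots.txt URL for the host that serves `site_url`."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site_url, timeout=10):
    """Fetch robots.txt and return (HTTP status, body text).

    urlopen raises urllib.error.HTTPError if the file is missing (404),
    which is the signal that the upload went to the wrong place.
    """
    with urlopen(robots_url(site_url), timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

For example, robots_url("http://yourdomain.com/forums/index.php") gives http://yourdomain.com/robots.txt; passing your real forum address to fetch_robots should return status 200 and the rules you uploaded.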


Other Options for robots.txt:
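Spiders only ever request robots.txt from the root of a host (http://yourdomain.com/robots.txt), so a copy uploaded to /forums/ alone will not be read. The exception is a forum that runs on its own subdomain, such as forums.yourdomain.com, with the forum directory as that subdomain's root. In that case take the /forums part (or whatever your forum directory is) out of every line, so you just have /filename.php & /foldername/, and upload the file to the forum directory.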


Another option is to block individual bots. Do this by adding a block like this:
Code:
User-agent: spider-name-here
Disallow: /any-folder-or-file-you-want-here
Generally, though, you can just leave it as described earlier, with a * for the user agent (where * = all bots).
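It helps to see how a per-bot block and the * block interact. Under the robots.txt rules, a spider that finds a block naming it follows that block only and skips the * block, so any * rules you still want it to obey have to be repeated there. The sketch below shows this with Python's standard urllib.robotparser; ExampleBot and SomeOtherBot are made-up spider names, and logo.gif is a made-up file.

```python
from urllib.robotparser import RobotFileParser

# One block for a named spider, one for everybody else.
ROBOTS = """\
User-agent: ExampleBot
Disallow: /forums/images/

User-agent: *
Disallow: /forums/admincp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("ExampleBot", "SomeOtherBot"):
        for path in ("/forums/images/logo.gif", "/forums/admincp/index.php"):
            verdict = "allowed" if allowed(agent, path) else "blocked"
            print(f"{agent:13} {path:28} {verdict}")
```

Here ExampleBot is kept out of the images but is no longer told to avoid /forums/admincp/, because that rule lives only in the * block.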


Thanks for reading. If you spot any errors, or need help with other software, let me know and I will try to help!
Captain Kirk
By Christophaa on 02-15-2007, 07:42 PM
Great tutorial, CK!

Chris
By Jolteon on 02-15-2007, 08:46 PM
Thanks
By Lord Howard on 05-05-2007, 09:23 PM
Thanks for that
By SticKer on 05-27-2007, 08:46 AM
this is a very useful and important tip for webmasters.
By Rizzo on 07-03-2007, 11:29 AM
Thank you

Can't wait to see if it works
By DeathByPixels on 09-18-2007, 04:02 PM
Thanks for the tutorial!

And how should the file be chmod'ed?
By bdude on 09-19-2007, 02:35 AM
Normal chmod - just so you can go to www.yourdomain.com/robots.txt
By Rizzo on 09-21-2007, 03:51 AM
He he, it's helpful not helpfull.
By Jolteon on 09-21-2007, 05:40 PM
What do you mean by that?