Your Forum & robots.txt
Your forum relies on bandwidth and reliable hosting.
But what happens when unruly spiders get loose on your forums? Your bandwidth can slowly seep away to the crawling spiders, and your security and privacy can seep away with it. This is because spiders often crawl everything on your site: your forum, images, control panels, and many other things you may not want them to.
Obviously we all want search engines to scan our site to some extent, so that we get listed and bring in guests.
However, most spiders will also crawl things like /admincp, /images & /usercp, which we may not want them to. A robots.txt file is a good way to tell spiders not to scan things you don't want them to, and it is quite easy to create one. I'll give you a quick rundown of everything now. What I show you in this article works with most forum software; by changing the file names, you can adapt it to yours.
Notice: Many things in this article will need adjusting to your software and directory structure, so check them. I am working with vBulletin, but if you adjust the file names, it should work with any major forum software. I will also assume my forum resides in the subdirectory /forums/; if yours does not, change it to yours. If your forum is in the root, leave the /forums part out entirely.
To start, we need to create a new file. Open a text editor, create a blank document, and save it as “robots.txt” (without the quote marks). Then open the document again, and let's begin!
First, we'll start with security and the control panels. Many people use robots.txt as an extra security precaution. So, type the following to block spiders from /forums/admincp/ and /forums/modcp/, as well as the UserCP:
Code:
User-agent: *
Disallow: /forums/admincp/
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
The * means any spider that follows the robots.txt protocol. Also note that usercp is a .php file, not a folder. Bear in mind that robots.txt is publicly readable and only well-behaved spiders obey it, so it keeps these panels out of search engines rather than locking them.
Extra Security Note: As well as the password protection on the AdminCP & ModCP, you may wish to consider adding an extra password. If you password-protect the directories through .htaccess or your cPanel, then even if someone does gain access to an Admin/Mod account, they can't get into the panels without the extra password.
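If you have Python 3 handy, you can test rules like these before uploading anything. The sketch below feeds the lines above to the standard library's urllib.robotparser and asks whether a spider may fetch a few URLs. The domain, the showthread.php URL and the spider name ExampleBot are placeholders, and Python's parser only implements the basic prefix-matching rules, not every search engine's extensions.

```python
from urllib.robotparser import RobotFileParser

# The control-panel rules from above (adjust /forums/ to your own directory).
RULES = """\
User-agent: *
Disallow: /forums/admincp/
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
"""

def allowed(url, agent="ExampleBot", rules=RULES):
    """Return True if a spider that obeys robots.txt may fetch url.
    ExampleBot is a made-up spider name; it falls under the * group."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    # Placeholder URLs on a hypothetical domain.
    for url in (
        "http://yourdomain.com/forums/admincp/index.php",
        "http://yourdomain.com/forums/usercp.php",
        "http://yourdomain.com/forums/showthread.php?t=1",
    ):
        print(("allowed " if allowed(url) else "blocked ") + url)
```

The two control-panel URLs should come back blocked, while the thread URL stays crawlable.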
Next, we'll take care of bandwidth and, on very large boards, possibly speed up the forum slightly.
First, let's block spiders from searching. Some spiders are capable of using your board's search function, as a quick trip to “Who's Online” may reveal every once in a while. The following tips definitely work on vBulletin and, with the file names changed, will almost certainly work on other software.
Let's take our robots.txt file from before. If you added the entries above, it should look similar to this:
Code:
User-agent: *
Disallow: /forums/admincp/
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
Below the Disallow: /forums/usercp.php line, add this:
Code:
Disallow: /forums/search.php
That blocks spiders from using the search function. On large boards this can speed things up by stopping spiders from, for example, pulling up every post by a high-posting user.
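Because a Disallow value is matched against the start of the path plus any query string, this one line also covers search URLs with parameters. A quick check with Python's urllib.robotparser; the query strings and the spider name ExampleBot are illustrative only, and your software's real parameter names may differ.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /forums/search.php",
])

# Illustrative URLs; real parameter names depend on your forum software.
for path in (
    "/forums/search.php",
    "/forums/search.php?do=getnew",
    "/forums/search.php?searchid=42",
    "/forums/forumdisplay.php?f=2",
):
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
    print(f"{verdict:8}{path}")
```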
Next up: privacy. After an incident on one of my own boards, my members became worried that I allowed spiders to scan members' profiles, so I stopped it. (Who's Online can also reveal spiders crawling profiles.) This makes your members feel more secure, knowing their profiles, which often contain their age, location and sometimes other details, aren't slapped about the Internet. To do this, add these lines directly under the last entry:
Code:
Disallow: /forums/memberlist.php
Disallow: /forums/member.php
Blocking memberlist.php prevents crawling of the Member List, and blocking member.php prevents crawling of members' profiles. (This definitely works for vBulletin; for other software, change the file names in your robots.txt.)
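Why two lines? Under the original robots.txt rules, a Disallow value blocks any path that begins with it, so /forums/member.php catches profile URLs such as member.php?u=1 but not memberlist.php. The tiny matcher below is a simplified sketch of that prefix rule only (no wildcards, no Allow lines), written just to illustrate the point; is_blocked is a hypothetical helper and the u= parameter is an example.

```python
from typing import Sequence

def is_blocked(path: str, disallows: Sequence[str]) -> bool:
    """Simplified robots.txt check: a path is blocked if it starts with any
    Disallow value. An empty Disallow value blocks nothing."""
    return any(rule and path.startswith(rule) for rule in disallows)

ONLY_PROFILES = ["/forums/member.php"]
BOTH = ["/forums/memberlist.php", "/forums/member.php"]

print(is_blocked("/forums/member.php?u=1", ONLY_PROFILES))  # True
print(is_blocked("/forums/memberlist.php", ONLY_PROFILES))  # False
print(is_blocked("/forums/memberlist.php", BOTH))           # True
```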
We now have this in our robots.txt:
Code:
User-agent: *
Disallow: /forums/admincp/
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
Disallow: /forums/search.php
Disallow: /forums/memberlist.php
Disallow: /forums/member.php
Next, let's prevent spiders from accessing any test forum you may have. Say my test board is at /testboard; add that:
Code:
Disallow: /testboard
Clearly not everyone will have a test board, and if you do, it might not be at that location, so remember to edit the line to match your setup.
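Note that /testboard has no trailing slash, so it blocks every path starting with those characters: the /testboard/ directory, but also a file like /testboard.php or a folder called /testboard2. Writing /testboard/ limits the rule to the directory. A quick comparison with Python's urllib.robotparser; the extra paths, the blocked_paths helper and the spider name ExampleBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def blocked_paths(rule, paths):
    """Return the paths a spider would skip under a single Disallow rule."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return [p for p in paths if not parser.can_fetch("ExampleBot", p)]

PATHS = ["/testboard/index.php", "/testboard.php", "/testboard2/index.php"]

print(blocked_paths("/testboard", PATHS))   # all three paths
print(blocked_paths("/testboard/", PATHS))  # only the directory's contents
```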
The final addition to our robots.txt that I'll show you in this article deals with bandwidth.
Add all the major image directories, such as this:
Code:
Disallow: /forums/images/
You might also want to add any attachment directories you may have. After doing that, read on.
Next, review the file for any errors. It should look like this:
Code:
User-agent: *
Disallow: /forums/admincp/
Disallow: /forums/modcp/
Disallow: /forums/usercp.php
Disallow: /forums/search.php
Disallow: /forums/memberlist.php
Disallow: /forums/member.php
Disallow: /testboard
Disallow: /forums/images/
Plus an attachment folder if you added one. Before reading on, add anything else you want to block.
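Typos are easy to miss here, and many spiders skip a line they don't recognise, so a misspelt "Dissallow:" may simply be ignored. The rough checker below is a sketch, not a full validator: it flags unknown field names and Disallow/Allow paths that don't start with a slash. The lint function is hypothetical, and the set of accepted fields is an assumption based on the common directives (User-agent, Disallow, Allow, Sitemap, Crawl-delay).

```python
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint(text):
    """Return a list of (line_number, message) problems found in robots.txt text."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field {field!r}"))
        elif field in ("disallow", "allow") and value and not value.startswith(("/", "*")):
            problems.append((number, f"path {value!r} should start with '/'"))
    return problems

SAMPLE = """\
User-agent: *
Disallow: /forums/admincp/
Dissallow: /forums/search.php
Disallow: forums/member.php
"""

for number, message in lint(SAMPLE):
    print(f"line {number}: {message}")
```

Run against the sample, it reports the misspelt field on line 3 and the missing leading slash on line 4.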
REMEMBER: Change “forums” if you don't use a forums subdirectory, or if you use a different name such as /board. If your forum is in the root, just use Disallow: /filename.php or Disallow: /foldername/
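If you run more than one board, or move yours, it can be handier to generate the file than to edit every line by hand. A small sketch: build_robots is a hypothetical helper, its forum_dir argument stands for your forum's directory ("/forums", "/board", or "" for a forum in the root), and the path list is just the vBulletin-style set used in this article.

```python
VBULLETIN_PATHS = (
    "admincp/", "modcp/", "usercp.php", "search.php",
    "memberlist.php", "member.php", "images/",
)

def build_robots(forum_dir="/forums", paths=VBULLETIN_PATHS, extra=()):
    """Build robots.txt text, prefixing each path with the forum directory.
    forum_dir="" produces root-level rules such as /usercp.php."""
    prefix = forum_dir.rstrip("/") + "/"
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}{path}" for path in paths]
    lines += [f"Disallow: {path}" for path in extra]  # site-wide extras, e.g. a test board
    return "\n".join(lines) + "\n"

print(build_robots("/forums", extra=["/testboard"]), end="")
```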
Finally, let's upload the file to its proper location. Save it as robots.txt if you haven't already, then open your FTP client and connect to your server. Go to your web root (/public_html/ in a typical hosting layout) and upload the file there.
Lastly, let's check it's in place. Open your domain's robots.txt in a browser; if you've done everything correctly, it will be at
http://yourdomain.com/robots.txt
If it's there, you're done! If not, check that you uploaded it to the right folder.
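You can also do this check from a script. The sketch below works out where spiders look for the file (the root of the host, whichever page you start from) and downloads it with Python's standard urllib. The robots_url and fetch_robots helpers are hypothetical, yourdomain.com is a placeholder, and the download needs network access.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

def robots_url(page_url):
    """Spiders request robots.txt from the root of the host, not from subdirectories."""
    return urljoin(page_url, "/robots.txt")

def fetch_robots(page_url, timeout=10):
    """Download the live robots.txt and return its text (raises on HTTP errors such as 404)."""
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(robots_url("http://yourdomain.com/forums/index.php"))
    # Replace the placeholder domain before uncommenting:
    # print(fetch_robots("http://yourdomain.com/"))
```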
Other Options for robots.txt:
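robots.txt only works from the root of your domain: spiders always request http://yourdomain.com/robots.txt, and a copy uploaded to /forums/ is never read. The exception is a forum on its own subdomain (such as forums.yourdomain.com). That subdomain has its own root, so there you can take out the /forums directory (or whatever your forum directory is), use plain /filenames.php & /foldernames, and upload the file to the root of the subdomain.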
Another option is to block individual bots. Do this by adding a group like this:
Code:
User-agent: Spider's name here
Disallow: /any-folder-or-file-you-want-here
Generally, though, you can just leave it as described earlier, with * (all bots) as the user agent.
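One thing to watch with per-bot groups: a spider that finds a group naming it follows only that group and ignores the * group, so any rules you still want applied must be repeated there. Python's urllib.robotparser behaves the same way, as this sketch shows; BadBot, ExampleBot and SomeOtherBot are made-up names.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: BadBot
Disallow: /

User-agent: ExampleBot
Disallow: /forums/images/

User-agent: *
Disallow: /forums/admincp/
Disallow: /forums/images/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for agent in ("BadBot", "ExampleBot", "SomeOtherBot"):
    for path in ("/forums/index.php", "/forums/admincp/", "/forums/images/logo.gif"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:13}{verdict:8}{path}")
```

BadBot is blocked everywhere, SomeOtherBot falls back to the * rules, and ExampleBot may crawl /forums/admincp/ because its own group does not repeat that line.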
Thanks for reading! If you spot any errors, or need help adapting this to other software, let me know and I'll try to help!
Captain Kirk