I’ve had a Roku for about two years now and just finally got around to hooking it up. There are a lot more channels than I was expecting, and some pretty good ones at that; one I found and have been watching recently is the DerbyCon channel. DerbyCon is a hacker convention in Louisville, Kentucky, and the channel has videos of all the speakers from 2011 and 2012. John Strand is the speaker I’m parroting here: in his talk he described “hacking back” by interfering with web crawlers and directory-scanning bots, which I thought was neat.

The first trick is admittedly old hat, but still quite effective: infinitely recursive directories. This can be implemented by leaving a user account open on a honeypot and symlinking directories to each other (junctioning, for the Windows folks). In Windows this is done with mklink /D (ln -s in Linux), like so:

mkdir C:\goaway\
cd C:\goaway\
mklink /D dir1 C:\goaway\
mklink /D dir2 C:\goaway\

Now whenever a bot tries to recursively search C:\goaway\ with something like dir /S, it gets stuck in a loop that looks something like: goaway -> dir1 -> goaway -> dir2 -> goaway -> dir1 -> goaway …
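For the Linux side, the same loop can be built with symlinks. Here’s a minimal Python sketch (the goaway/dir1/dir2 names just mirror the Windows example above, and the depth cap is only there so the demo walker doesn’t actually run forever):

```python
import os
import tempfile

# Build a honeypot directory whose subdirectories loop back to itself:
# goaway/dir1 and goaway/dir2 are both symlinks to goaway.
root = os.path.join(tempfile.mkdtemp(), "goaway")
os.mkdir(root)
os.symlink(root, os.path.join(root, "dir1"))
os.symlink(root, os.path.join(root, "dir2"))

def naive_walk(path, depth=0, max_depth=5):
    # A naive recursive search (the equivalent of dir /S) follows the
    # symlinks and never reaches the bottom; we cap the depth to show it.
    if depth > max_depth:
        return
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        print("  " * depth + entry)
        if os.path.isdir(full):
            naive_walk(full, depth + 1, max_depth)

naive_walk(root)
```

A smarter walker (e.g. os.walk with its default followlinks=False, or one that tracks visited inodes) escapes the trap, which is exactly the point: it only catches careless bots.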

This is quite effective for file systems, but what if we are working with a web application?

Well, Strand also mentioned a couple of tools that can mess with web crawlers, as well: Spidertrap and Weblabyrinth.

Spidertrap and Weblabyrinth are written in Python and PHP, respectively. They are neat tools that can do all sorts of fun stuff to bots. Note that you should definitely have a robots.txt file in place if you use them, so that the well-behaved crawlers (Google, Microsoft, etc.) go away and don’t get trapped. They work in a couple of ways: one is generating random URLs on each request, so that bots get stuck chasing links to nowhere; they can also mess with the HTTP headers and return randomness there too, such as swapping the response status code from 200 to 403.
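The tools themselves are worth grabbing, but the core trick fits in a few lines. Here’s a minimal Python sketch of the idea (this is not Spidertrap’s actual code; the path length, link count, and port are my own choices): every page served is just a pile of links to randomly generated paths, with the occasional bogus status code thrown in.

```python
import random
import string
from http.server import BaseHTTPRequestHandler, HTTPServer

def random_path():
    # A plausible-looking but meaningless URL path.
    return "/" + "".join(random.choices(string.ascii_lowercase, k=8))

def trap_page(n_links=10):
    # Every page is nothing but links to more random pages, so a crawler
    # that ignores robots.txt chases them forever.
    links = "".join(
        '<a href="{0}">{0}</a><br>'.format(random_path()) for _ in range(n_links)
    )
    return "<html><body>{}</body></html>".format(links)

class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Occasionally lie in the status line too, as Strand describes.
        status = random.choice([200, 200, 200, 403])
        body = trap_page().encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To actually run it:
# HTTPServer(("127.0.0.1", 8080), TrapHandler).serve_forever()
```

Pair it with a robots.txt that disallows the trap’s URL prefix, and only crawlers that ignore the rules will ever wander in.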

Cool, huh? Now imagine if every website did something like this. If every site fed junk to crawlers that didn’t respect robots.txt, we could seriously degrade the signal-to-noise ratio of a hacker’s scanning results.