nvv February 19, 2015 at 14:34

[Homebrew] do-it-yourself web honeypot

Tutorial

In the article “What and why ‘bots of the dark side of the Force’ look for on websites”, we examined typical examples from the logs of various sites. However, a variation on the theme of intelligence “radio games” is much more interesting. What it is and how to set it up is described below.

First, the main assumptions. If you do not agree with them, it is better not to waste your time reading further.

So, the main points:

you are interested in information security or web administration, or you study in a related specialty;
you have a little desire, time and resources that you can spend to feel like a researcher;
you do not expect to become a super guru overnight, but by developing the particular solutions proposed in the article you can study some questions with interest.

A honeypot, in short, is a kind of trap with which a researcher collects material. Information about the varieties and existing solutions, including open-source ones, is easy to find online, so we will not dwell on them.

Let's get to the point:

take hosting;
take a domain;
route all incoming requests to our script;
analyze incoming requests and, in addition to collecting statistics, join the game.

We take hosting

First we need to decide where our honeypot will live. To lower the entry threshold we will choose shared hosting: it is quick and fairly cheap, and it removes the system administration work (installation, tuning, protection and updates). The IP ranges of hosting companies' web servers are well known, so these servers never suffer from a lack of attention from bots.

Those who wish can go straight to a VPS / VDS; the main thing is not to get stuck at the server setup stage.

We take a domain

A new domain, as a rule, immediately attracts the attention of bots, although “old” domains work well too. If you use an existing domain (site), side effects are possible because of redirection errors or excessive load.

By a rough estimate, at the beginning of 2015 a new domain plus several months of hosting cost about 1000 rubles.

We route all incoming requests to our script

There are many ways to do this, depending on the web server in use and on how much control you have over its configuration. The simplest option, proposed here, is suitable for a new domain. It does not get in the way of the main task and lets you move on quickly to what is, in our opinion, the most interesting part.

simplest .htaccess option

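# Enable the rewrite engine (requires mod_rewrite)
RewriteEngine On
# Send every request, whatever its path, to index.php
RewriteRule .* index.php [L]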

All requests are routed to index.php.

For finer tuning of the redirects, especially on an existing site, do not be lazy: consult the documentation or articles such as “How mod_rewrite actually works. A manual for those who continue.”

We analyze incoming requests, accumulate statistics

In the script that receives the requests, we implement the following functionality:

logging of selected data from $_SERVER to accumulate statistics;
the ability to search for patterns in the data coming from $_SERVER;
an efficient mechanism for attaching handlers to particular patterns (on efficiency, see “Non-standard optimization of projects in PHP”);
(for the future) a simplified, lightweight server-side session mechanism.
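
As an illustration only, the first three items might be sketched like this. The sketch is in Python, with a plain dictionary standing in for PHP's $_SERVER; the list of logged fields, the example patterns and the handler names are all assumptions made for the example, not part of the original design.

```python
import json
import re
import time

# Fields copied from the request environment; this selection is an assumption.
LOGGED_FIELDS = ("REMOTE_ADDR", "REQUEST_METHOD", "REQUEST_URI",
                 "HTTP_USER_AGENT", "HTTP_REFERER")


def log_record(server, now=None):
    """Build one JSON log line from a $_SERVER-like dictionary."""
    record = {"time": int(time.time() if now is None else now)}
    for field in LOGGED_FIELDS:
        record[field] = server.get(field, "")
    return json.dumps(record, sort_keys=True)


# Hypothetical handlers: each returns (status code, response body).
def fake_login_page(server):
    return 200, "<form method='post'><input name='log'><input name='pwd'></form>"


def fake_not_found(server):
    return 404, "Not Found"


# (pattern, handler) pairs, checked in order; the patterns are examples only.
ROUTES = [
    (re.compile(r"/wp-login\.php", re.I), fake_login_page),
    (re.compile(r"\.(sql|bak|zip)$", re.I), fake_not_found),
]


def dispatch(server):
    """Return the response of the first handler whose pattern matches the URI."""
    uri = server.get("REQUEST_URI", "")
    for pattern, handler in ROUTES:
        if pattern.search(uri):
            return handler(server)
    return 200, ""  # unknown request: an empty page by default
```

Appending each `log_record(...)` line to a file gives statistics that are easy to analyze later, and a new bot is handled by adding one more pair to `ROUTES`.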

Join the game

Finally, we come to the main point. What will the game be like?

After analyzing the statistics, you select the bot that you want to explore. You can try to identify the bot by various criteria (IP ranges, scan time, User-Agent, specific URL requests, etc.).
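
One possible way to pick a candidate from the accumulated log is sketched below in Python. It assumes each log entry is a dictionary with $_SERVER-style keys (REMOTE_ADDR, HTTP_USER_AGENT, REQUEST_URI); the function names are hypothetical.

```python
from collections import Counter


def top_clients(records, limit=3):
    """Count requests per (IP, User-Agent) pair and return the busiest ones."""
    counts = Counter((r.get("REMOTE_ADDR", ""), r.get("HTTP_USER_AGENT", ""))
                     for r in records)
    return counts.most_common(limit)


def uris_for(records, ip):
    """List the URIs one client requested, in order, to reveal its scan pattern."""
    return [r.get("REQUEST_URI", "") for r in records
            if r.get("REMOTE_ADDR") == ip]
```

The sequence returned by `uris_for` is often the most telling criterion: a bot tends to walk the same list of URLs on every site it visits.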

After that, you masquerade as what the bot expects and, by giving it the information and files it is looking for, fully document its behavior from the scanning stage through to attempts to use exploits, make non-standard calls, download specific files, and so on.

For example, if the bot expects a certain CSS file, provide it; if it then tries to access a specific file, look up information about that file online and serve it; if it passes parameters, try to fake the response; and so on. This is where the lightweight session implementation comes in handy.

Of course, between the first request and a complete chain of responses there may be several iterations involving guesswork and manual searching for information. But this is a battle of minds (you versus the developer of the bot's algorithm), real chess!
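
A lightweight session of this kind might look like the Python sketch below. Everything in it is an assumption for illustration: the client is identified by a hash of its IP address and User-Agent (a bot may not keep cookies), the scripted chain in `SCRIPT` is made up, and the in-memory dictionary would be a file or database on real hosting.

```python
import hashlib

# A scripted chain for one bot: (expected URI, body to serve). Hypothetical values.
SCRIPT = [
    ("/style.css", "body { margin: 0 }"),
    ("/admin/config.php", "<?php // placeholder config"),
]

# Client key -> index of the next expected step.
SESSIONS = {}


def client_key(server):
    """Derive a session key from the IP address and User-Agent, without cookies."""
    raw = server.get("REMOTE_ADDR", "") + "|" + server.get("HTTP_USER_AGENT", "")
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def respond(server):
    """Serve the next scripted answer if the client follows the expected chain."""
    key = client_key(server)
    step = SESSIONS.get(key, 0)
    if step < len(SCRIPT) and server.get("REQUEST_URI") == SCRIPT[step][0]:
        SESSIONS[key] = step + 1
        return SCRIPT[step][1]
    return None  # off-script request: a cue to research the next answer by hand
```

Each off-script request that returns `None` marks the point where the chain has to be extended by one more step.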

little hint

To make it harder for bots to detect that they are being analyzed, it is advisable (within reasonable limits) to add an element of randomness to the output. For example, if your algorithm does not yet know the “right answer” for the bot, or the request has not been seen before, return with a probability of XX% either a message that simulates a server error or an empty file; if the bot tries SQL injection, return a plausible DBMS or PHP error message, and so on.
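
A minimal Python sketch of such a fallback follows. The default probability of 0.3 is only a placeholder for the unspecified XX%, and the error texts and the crude SQL injection markers are likewise illustrative assumptions.

```python
import random

# Plausible-looking fake responses; the exact texts are illustrative assumptions.
FAKE_SERVER_ERROR = (500, "Internal Server Error")
FAKE_SQL_ERROR = (200, "You have an error in your SQL syntax; check the manual "
                       "that corresponds to your MySQL server version")
EMPTY_FILE = (200, "")

# Very rough signs of an SQL injection attempt (assumed, far from complete).
SQLI_MARKERS = ("'", "union select", " or 1=1")


def fallback_response(query, error_probability=0.3):
    """Pick a response for a request the honeypot has no scripted answer for."""
    lowered = query.lower()
    if any(marker in lowered for marker in SQLI_MARKERS):
        return FAKE_SQL_ERROR          # looks like SQL injection
    if random.random() < error_probability:
        return FAKE_SERVER_ERROR       # simulate a server error some of the time
    return EMPTY_FILE                  # otherwise serve an empty file
```

Because the choice is random, two identical probes from the same bot may receive different answers, which is the point of the hint.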

Instead of a conclusion

Go for it! And may your work be for the good.

Anticipating requests to publish the finished code right away, here is why this has not been done:

so as not to hamper the flight of fancy;
so that students of the relevant specialties / departments do not simply copy and paste (greetings from KBiMMU at TVSU);
so as not to make life easier for bot operators, who would immediately filter out novice researchers still testing the code, had it been given in the article.
