Not an ordinary Python XMPP bot: tunneling
Not so long ago, an article about ICQ in Python was published here, which prompted me to develop the topic, though in a slightly different direction. A few years ago I had trouble with my home Internet: I had access only to the local network, and the only means of communication with the outside world were ICQ and a local Jabber server; there was simply no other way out. That is how the idea of tunneling HTTP traffic in XMPP was born.
Scheme
The scheme is based on three main components:
- bot server: receives messages with HTTP requests, executes them, encodes the result and sends it back to the client
- bot client: sends information about HTTP requests to the server, waits for the result, processes it and returns the response ready for further use
- http-proxy: a proxy server that handles HTTP requests through the bot client
The components are arranged as follows: the bot server runs on a remote machine with Internet access, while the bot client and the proxy run on localhost; client applications are configured to use our proxy, for example:

$ http_proxy="localhost:3128" wget http://example.com

A simple XML-based protocol is used for communication between the bot client and the bot server: the request names the URL to download, and the answer comes back in several parts, chunks. Here chunk is the sequence number of a chunk, count is the total number of chunks the response was split into, and encoded_data is a base64-encoded piece of the response.
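Roughly, the messages look like this (the code below only relies on a url element in the request and on the chunk number, total count and encoded data in the answer, so the exact tag and attribute names here are illustrative). A request to download the example.com index page:

<request>
  <url>http://example.com</url>
</request>

and each answer message carries one chunk:

<chunk number="1" count="3">encoded_data</chunk>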
For greater clarity, here is the same scheme as a diagram:
local
+---------------------------------------------------------+
| http-client (browser, wget) -> http-proxy -> bot-client |
+---------------------------------------------------------+
                            /\
                            ||
                            \/
remote
+---------------------------------------------------------+
|                        bot-server                        |
+---------------------------------------------------------+
Implementation
General information
xmpppy is used for working with XMPP. No fancy features are needed: we just have to process incoming messages and send replies. XML is parsed and generated with the standard library module xml.dom.minidom.
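As an illustration of how little minidom machinery this takes, here is a minimal sketch of building a request document and reading the URL back out of it; build_request() and get_text() are helper names made up for this example, and the request element follows the protocol sketch above (only the url element is actually required by the server-side code).

import xml.dom.minidom

def build_request(url):
    # Assemble <request><url>...</url></request> with minidom.
    doc = xml.dom.minidom.Document()
    request = doc.createElement("request")
    node = doc.createElement("url")
    node.appendChild(doc.createTextNode(url))
    request.appendChild(node)
    doc.appendChild(request)
    return doc.toxml()

def get_text(nodes):
    # Concatenate the text content of a list of child nodes.
    return ''.join(n.data for n in nodes if n.nodeType == n.TEXT_NODE)

doc = xml.dom.minidom.parseString(build_request("http://example.com"))
url = get_text(doc.getElementsByTagName("url")[0].childNodes)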
Bot server
The server's task is to receive download requests and hand them over to the Fetcher, which itself figures out what needs to be downloaded and returns the result; the server then forwards this result to the client.
In a simplified scheme, server-side message processing looks like this:
import xmpp
from Fetcher import Fetcher

fetcher = None

def message_callback(con, msg):
    # Called for every incoming message: run the command and send the chunks back.
    global fetcher
    if msg.getBody():
        try:
            ret = fetcher.process_command(msg.getBody())
        except:
            ret = ["failed to process command"]
        for i in ret:
            reply = xmpp.Message(msg.getFrom(), i)
            reply.setType('chat')
            con.send(reply)

if __name__ == "__main__":
    jid = xmpp.JID("my@server.jid")
    user = jid.getNode()
    server = jid.getDomain()
    password = "secret"

    conn = xmpp.Client(server, debug=[])
    conres = conn.connect()
    authres = conn.auth(user, password, resource="foo")
    conn.RegisterHandler('message', message_callback)
    conn.sendInitPresence()

    fetcher = Fetcher()
    while True:
        conn.Process(1)
I intentionally stripped out the error handling and hardcoded the values so that the code stays compact and easy to read. So what is going on here? We connect to the Jabber server and hook up the message handler:
conn.RegisterHandler('message', message_callback)
Thus, for each new incoming message our message_callback(con, msg) function is called, its arguments being the connection handle and the message itself. The function in turn calls the command handler from the Fetcher class, which does all the dirty work and returns the list of chunks to be sent to the client. That is all; this is where the server ends.
Fetcher
The Fetcher class implements the actual logic of executing and encoding HTTP requests. I will not quote the whole class here (it can be found in the archive attached to the article); I will only describe the main part:
def process_command(self, command):
    # Parse the XML request, download the URL and return a list of XML chunks.
    doc = xml.dom.minidom.parseString(command)
    url = self._gettext(doc.getElementsByTagName("url")[0].childNodes)
    try:
        f = urllib2.urlopen(url)
    except Exception, err:
        return ["%s" % str(err)]
    lines = base64.b64encode(f.read())
    ret = []
    chunk_size = 1024
    x = 0
    n = 1
    chunk_count = (len(lines) + chunk_size - 1) / chunk_size
    while x < len(lines):
        ret.append(self._prepare_chunk(n, chunk_count, lines[x:x + chunk_size]))
        x += chunk_size
        n += 1
    return ret
The process_command() function, as you probably recall, is called by our bot server. It parses the XML request, determines which URL it needs to fetch, and does so with urllib2. The downloaded data is encoded in base64 so that special characters cause no unexpected problems, and is split into equal parts so as not to run into a message length limit. Each chunk is then wrapped in XML and sent out.
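The _gettext() and _prepare_chunk() helpers are not shown above. A minimal sketch of what they might look like, assuming the answer format described in the Scheme section (the attribute names are illustrative, not taken from the archive):

def _gettext(self, nodes):
    # Collect the text content of an element's child nodes.
    return ''.join(n.data for n in nodes if n.nodeType == n.TEXT_NODE)

def _prepare_chunk(self, number, count, data):
    # Wrap one base64-encoded piece into an XML message for the client.
    doc = xml.dom.minidom.Document()
    chunk = doc.createElement("chunk")
    chunk.setAttribute("number", str(number))
    chunk.setAttribute("count", str(count))
    chunk.appendChild(doc.createTextNode(data))
    doc.appendChild(chunk)
    return doc.toxml()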
Client
The client, essentially, is just one callback that glues the chunks together and decodes them from base64:
def message_callback(con, msg):
    # Accumulate chunks; once the last one arrives, decode the whole response.
    global fetcher, output, result
    if msg.getBody():
        message = msg.getBody()
        chunks, count, data = fetcher.parse_answer(message)
        output.append(data)
        if chunks == count:
            result = base64.b64decode(''.join(output))
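For completeness, here is a sketch of how the client might drive this callback: the fetch_file() signature matches the call made by the proxy below, but its body and the build_request() helper (from the sketch in the General information section) are assumptions rather than the code from the archive.

import base64
import xmpp
from Fetcher import Fetcher

fetcher = Fetcher()
output = []
result = None

def fetch_file(server_jid, client_jid, password, url):
    # Connect as the bot client, send the request and pump messages until all chunks arrive.
    global output, result
    output = []
    result = None
    jid = xmpp.JID(client_jid)
    conn = xmpp.Client(jid.getDomain(), debug=[])
    conn.connect()
    conn.auth(jid.getNode(), password, resource="proxy")
    conn.RegisterHandler('message', message_callback)
    conn.sendInitPresence()
    conn.send(xmpp.Message(server_jid, build_request(url), typ='chat'))
    while result is None:
        conn.Process(1)
    return result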
Proxy
For the tunnel to be usable transparently, an HTTP proxy is implemented. The proxy server binds to port 3128/tcp and waits for requests. Received requests are handed to the bot server for processing; the result is decoded and sent back to the client. From the point of view of client applications, our proxy is no different from an "ordinary" one.
The TCP server is built on SocketServer.StreamRequestHandler from the standard library.
class RequestHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        # Read the browser's request, fetch the URL through XMPP and relay the reply.
        data = self.request.recv(1024)
        method, url, headers = parse_http_request(data)
        if url is not None:
            response = fetch_file(server_jid, client_jid, password, url)
            self.wfile.write(response)
        self.request.close()
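The handler still has to be attached to a listening server. A minimal sketch of binding it to 3128/tcp, assuming the JIDs and password are plain module-level variables (the values and the use of a simple single-threaded TCPServer are assumptions for this example):

import SocketServer

if __name__ == "__main__":
    server_jid = "bot@server.jid"       # where the bot server is logged in (illustrative)
    client_jid = "client@server.jid"    # the bot client's own account (illustrative)
    password = "secret"
    proxy = SocketServer.TCPServer(("127.0.0.1", 3128), RequestHandler)
    proxy.serve_forever()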
The parse_http_request() function parses an HTTP request, pulling the URL, the headers and the HTTP version out of it; fetch_file() requests the URL through the bot client.
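A rough sketch of what parse_http_request() has to do, written for this article rather than taken from the archive (it returns the method, URL and headers that handle() expects):

def parse_http_request(data):
    # Split "GET http://example.com/ HTTP/1.0" plus the header lines that follow it.
    try:
        lines = data.split("\r\n")
        method, url, version = lines[0].split(" ", 2)
        headers = {}
        for line in lines[1:]:
            if not line:
                break
            name, _, value = line.partition(": ")
            headers[name] = value
        return method, url, headers
    except ValueError:
        return None, None, None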
Conclusion
The full source code is available here as a shar archive (save the file and execute it as a shell script). Of course, this is more a prototype than a full-fledged application, but the prototype works, and at least small files can be downloaded without problems. That should be enough for the main purpose of this article: to demonstrate a "non-interactive" use of an IM bot.
There is plenty that could be improved in the project, from adding authentication and proper support for other request types to work on performance. It would be very interesting to see what throughput such an architecture can deliver; perhaps I will look into that soon.