Not an ordinary Python XMPP bot: tunneling
Not so long ago, an article about ICQ in Python was published here, which prompted me to develop the topic, though in a slightly different direction. A few years ago I had trouble with my home Internet: I had access only to the local network, and the only means of communication with the outside world were ICQ and a local Jabber server; there was simply no other way out. That is how the idea of tunneling HTTP traffic in XMPP was born.
Scheme
The scheme is based on three main components:
- bot server: receives messages with HTTP requests, executes them, encodes the result and sends it back to the client
- bot client: sends information about HTTP requests to the server, waits for the result, processes it and returns the response ready for further use
- http-proxy: a proxy server that handles HTTP requests through the bot client
The components are arranged as follows: the bot server runs on a remote machine with Internet access, while the bot client and the proxy run on localhost; client applications are configured to use our proxy, for example:

$ http_proxy="localhost:3128" wget http://example.com

A simple XML-based protocol is used for communication between the bot client and the bot server: the request names the URL to download, and the answer comes back in several parts, chunks. Here chunk is the sequence number of a chunk, count is the total number of chunks the response was split into, and encoded_data is a base64-encoded piece of the response.
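Roughly, the messages look like this (the code below only relies on a url element in the request and on the chunk number, total count and encoded data in the answer, so the exact tag and attribute names here are illustrative). A request to download the example.com index page:

<request>
  <url>http://example.com</url>
</request>

and each answer message carries one chunk:

<chunk number="1" count="3">encoded_data</chunk>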
For greater clarity, here is the same scheme as a diagram:
local
+---------------------------------------------------------+
| http-client (browser, wget) -> http-proxy -> bot-client |
+---------------------------------------------------------+
                            /\
                            ||
                            \/
remote
+---------------------------------------------------------+
|                        bot-server                        |
+---------------------------------------------------------+
Implementation
General information
xmpppy is used for working with XMPP. No fancy features are needed: we just have to process incoming messages and send replies. XML is parsed and generated with the standard library module xml.dom.minidom.
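As an illustration of how little minidom machinery this takes, here is a minimal sketch of building a request document and reading the URL back out of it; build_request() and get_text() are helper names made up for this example, and the request element follows the protocol sketch above (only the url element is actually required by the server-side code).

import xml.dom.minidom

def build_request(url):
    # Assemble <request><url>...</url></request> with minidom.
    doc = xml.dom.minidom.Document()
    request = doc.createElement("request")
    node = doc.createElement("url")
    node.appendChild(doc.createTextNode(url))
    request.appendChild(node)
    doc.appendChild(request)
    return doc.toxml()

def get_text(nodes):
    # Concatenate the text content of a list of child nodes.
    return ''.join(n.data for n in nodes if n.nodeType == n.TEXT_NODE)

doc = xml.dom.minidom.parseString(build_request("http://example.com"))
url = get_text(doc.getElementsByTagName("url")[0].childNodes)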
Bot server
The server's task is to receive download requests and hand them over to the Fetcher, which itself figures out what needs to be downloaded and returns the result; the server then forwards this result to the client.
In a simplified scheme, server-side message processing looks like this:
import xmpp
from Fetcher import Fetcher

fetcher = None

def message_callback(con, msg):
    # Called for every incoming message: run the command and send the chunks back.
    global fetcher
    if msg.getBody():
        try:
            ret = fetcher.process_command(msg.getBody())
        except:
            ret = ["failed to process command"]
        for i in ret:
            reply = xmpp.Message(msg.getFrom(), i)
            reply.setType('chat')
            con.send(reply)

if __name__ == "__main__":
    jid = xmpp.JID("my@server.jid")
    user = jid.getNode()
    server = jid.getDomain()
    password = "secret"

    conn = xmpp.Client(server, debug=[])
    conres = conn.connect()
    authres = conn.auth(user, password, resource="foo")
    conn.RegisterHandler('message', message_callback)
    conn.sendInitPresence()

    fetcher = Fetcher()
    while True:
        conn.Process(1)
I intentionally stripped out the error handling and hardcoded the values so that the code stays compact and easy to read. So what is going on here? We connect to the Jabber server and hook up the message handler:
conn.RegisterHandler('message', message_callback)
Thus, for each new incoming message our message_callback(con, msg) function is called, its arguments being the connection handle and the message itself. The function in turn calls the command handler from the Fetcher class, which does all the dirty work and returns the list of chunks to be sent to the client. That is all; this is where the server ends.
Fetcher
The Fetcher class implements the actual logic of executing and encoding HTTP requests. I will not quote the whole class here (it can be found in the archive attached to the article); I will only describe the main part:
def process_command(self, command):
    # Parse the XML request, download the URL and return a list of XML chunks.
    doc = xml.dom.minidom.parseString(command)
    url = self._gettext(doc.getElementsByTagName("url")[0].childNodes)
    try:
        f = urllib2.urlopen(url)
    except Exception, err:
        return ["%s" % str(err)]
    lines = base64.b64encode(f.read())
    ret = []
    chunk_size = 1024
    x = 0
    n = 1
    chunk_count = (len(lines) + chunk_size - 1) / chunk_size
    while x < len(lines):
        ret.append(self._prepare_chunk(n, chunk_count, lines[x:x + chunk_size]))
        x += chunk_size
        n += 1
    return ret
The process_command() function, as you probably recall, is called by our bot server. It parses the XML request, determines which URL it needs to fetch, and does so with urllib2. The downloaded data is encoded in base64 so that special characters cause no unexpected problems, and is split into equal parts so as not to run into a message length limit. Each chunk is then wrapped in XML and sent out.
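The _gettext() and _prepare_chunk() helpers are not shown above. A minimal sketch of what they might look like, assuming the answer format described in the Scheme section (the attribute names are illustrative, not taken from the archive):

def _gettext(self, nodes):
    # Collect the text content of an element's child nodes.
    return ''.join(n.data for n in nodes if n.nodeType == n.TEXT_NODE)

def _prepare_chunk(self, number, count, data):
    # Wrap one base64-encoded piece into an XML message for the client.
    doc = xml.dom.minidom.Document()
    chunk = doc.createElement("chunk")
    chunk.setAttribute("number", str(number))
    chunk.setAttribute("count", str(count))
    chunk.appendChild(doc.createTextNode(data))
    doc.appendChild(chunk)
    return doc.toxml()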
Client
The client, essentially, is just one callback that glues the chunks together and decodes them from base64:
def message_callback(con, msg):
    # Accumulate chunks; once the last one arrives, decode the whole response.
    global fetcher, output, result
    if msg.getBody():
        message = msg.getBody()
        chunks, count, data = fetcher.parse_answer(message)
        output.append(data)
        if chunks == count:
            result = base64.b64decode(''.join(output))
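For completeness, here is a sketch of how the client might drive this callback: the fetch_file() signature matches the call made by the proxy below, but its body and the build_request() helper (from the sketch in the General information section) are assumptions rather than the code from the archive.

import base64
import xmpp
from Fetcher import Fetcher

fetcher = Fetcher()
output = []
result = None

def fetch_file(server_jid, client_jid, password, url):
    # Connect as the bot client, send the request and pump messages until all chunks arrive.
    global output, result
    output = []
    result = None
    jid = xmpp.JID(client_jid)
    conn = xmpp.Client(jid.getDomain(), debug=[])
    conn.connect()
    conn.auth(jid.getNode(), password, resource="proxy")
    conn.RegisterHandler('message', message_callback)
    conn.sendInitPresence()
    conn.send(xmpp.Message(server_jid, build_request(url), typ='chat'))
    while result is None:
        conn.Process(1)
    return result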
Proxy
For the tunnel to be usable transparently, an HTTP proxy is implemented. The proxy server binds to port 3128/tcp and waits for requests. Received requests are handed to the bot server for processing; the result is decoded and sent back to the client. From the point of view of client applications, our proxy is no different from an "ordinary" one.
The TCP server is built on SocketServer.StreamRequestHandler from the standard library.
class RequestHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        # Read the browser's request, fetch the URL through XMPP and relay the reply.
        data = self.request.recv(1024)
        method, url, headers = parse_http_request(data)
        if url is not None:
            response = fetch_file(server_jid, client_jid, password, url)
            self.wfile.write(response)
        self.request.close()
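The handler still has to be attached to a listening server. A minimal sketch of binding it to 3128/tcp, assuming the JIDs and password are plain module-level variables (the values and the use of a simple single-threaded TCPServer are assumptions for this example):

import SocketServer

if __name__ == "__main__":
    server_jid = "bot@server.jid"       # where the bot server is logged in (illustrative)
    client_jid = "client@server.jid"    # the bot client's own account (illustrative)
    password = "secret"
    proxy = SocketServer.TCPServer(("127.0.0.1", 3128), RequestHandler)
    proxy.serve_forever()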
The parse_http_request() function parses an HTTP request, pulling the URL, the headers and the HTTP version out of it; fetch_file() requests the URL through the bot client.
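A rough sketch of what parse_http_request() has to do, written for this article rather than taken from the archive (it returns the method, URL and headers that handle() expects):

def parse_http_request(data):
    # Split "GET http://example.com/ HTTP/1.0" plus the header lines that follow it.
    try:
        lines = data.split("\r\n")
        method, url, version = lines[0].split(" ", 2)
        headers = {}
        for line in lines[1:]:
            if not line:
                break
            name, _, value = line.partition(": ")
            headers[name] = value
        return method, url, headers
    except ValueError:
        return None, None, None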
Conclusion
The full source code is available here as a shar archive (save the file and execute it as a shell script). Of course, this is more a prototype than a full-fledged application, but the prototype works, and at least small files can be downloaded without problems. That should be enough for the main purpose of this article: to demonstrate a "non-interactive" use of an IM bot.
There is plenty that could be improved in the project, from adding authentication and proper support for other request types to work on performance. It would be very interesting to see what throughput such an architecture can deliver; perhaps I will look into that soon.