Deepwalker December 13, 2009 at 12:53 pm

Twisted in action - memcache in python

Preamble

In connection with the weekend, I spent a little time implementing the Memcache server using the Twisted python framework. As a result, I got a performance that is twice as slow, which I do not consider very critical, as well as the ability to implement a couple of extensions of the original protocol. Optimizations are also possible that will further improve performance.
The protocol has not been fully implemented - there are still moments to work on, but the standard set / get are fully functional and ready to use.

Facilities

To store the cache, we use the dict base class. As you can imagine, the python dict implementation is fast, this basic type is used in python so actively that it was not left without detailed optimization. Thus, we automatically have a structure for storing the cache in memory. It remains to implement the memcache protocol to provide dict access to other programs.

To implement the server, we use Twisted. There are many variations of non-blocking IO for python today, but Twisted is already a classic, and has enough tools in its arsenal to easily solve such problems.

Network protocol implementation

How are protocols implemented? First of all, of course, you need to find a description of the protocol. I found it here - code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt

After reading the protocol, it becomes clear that we will receive one or two lines from the client, and we can safely divide the first line into elements by spaces. The second line is used in commands that transmit data to the server - set, add, replace, etc. If you want to delve into the article in more detail, then I will send you to read the description yourself, there was no purpose to put his translation here.

Armed with this knowledge, we look at what Twisted can offer us to solve this problem, and immediately find LineOnlyReceiver - a protocol from the Twisted base delivery that works only with protocols exchanging strings, that is, what is needed. Twisted already has an implementation of the memcache protocol, but only its client part, and we are making a server.

1 class MemcacheProtocol (LineOnlyReceiver):
2 "" "
3 Implements the protocol basis - receiving messages from the client
4 and returning the result.
5" ""
6
7 def lineReceived (self, line):
8 debug (repr (line))
9 if not 'parameters' in self.instruction:
10 parameters = line.split (''
11 debug ("Got new command" + parameters [0])
12 self.instruction ['parameters'] = parameters
13
14 # If no data is expected, then execution
15 if parameters [0] in Cache.oneline_commands:
16 self. process ()
17 else:
18 # Received data for a two-line command, for execution
19 debug ("Got data" + line)
20 self.instruction ['data'] = line
21 self.process ()
22
23 def process (self) :
24 # Cache.call returns a
25 for line in Cache.call (self.instruction) generator:
26 # And we send everything that it generates in separate lines
27 debug ("Send line" + line)
28 self.sendLine (line)
29 # Ready for further instructions, napalnik!
30 self.instruction = {}
31
32 def connectionMade (self):
33 debug ("Connected!")
34 self.instruction = {}

As you can see from the code, Cache is used for the actual work. This is a singleton, essentially just a class whose methods are wrapped by the @classmethod decorator. The call to Cache.call should return a generator that will return strings, which, in turn, our protocol implementation, will return to the client.

Parse the request from the client

The first line is the command and parameters, separated by spaces, so we use the string method split, and the output is a list. Next, it must be disassembled into components before the team starts working with the data. I use the class because I like the prospect of accessing parameters by pointing them through a period. The code below already requires reading the description of the protocol, and for a lazy pair of guidance lines:

                                                                                                                        
Data Recording Commands:                                                                                                       
 [noreply] \ r \ n                                                                 
cas  [noreply] \ r \ n                                                               
Receiving data:
get * \ r \ n   
gets * \ r \ n  
delete \ r \ n 
Well and the like.

Implementation of parsing:

1 class Instruction (object):
2 def __init __ (self, i):
3 p = i ['parameters']
4 self.cmd = p.pop (0)
5
6 # Check noreply
7 if p [-1 ] == 'noreply':
8 self.reply = False
9 # Throw it
10 p.pop (-1)
11 else:
12 self.reply = True
13
14 if self.cmd in Cache.storage_commands:
15 # If CAS then there is another parameter (i.e. a special case)
16 if self.cmd == "cas":
17 self.unique = p.pop (-1)
18
19 # Now all parameters are unique, but we want to expand the protocol,
20 # because everything is not as simple as dict (zip ())
21 self.bytes = p.pop (-1)
22 self.exptime = p.pop (- 1)
23 self.flags = p.pop (-1)
24 self.keys = p
25 self.data = i.get ('data', None)
26
27 # get and gets
28 elif self.cmd in ["get "," gets "," getn "," delete "]:
29 self.keys = p
30
31 def __str __ (self):
32 return str (self .__ dict__)

Implementing cache storage and working with it

I immediately expanded the protocol, namely, it is possible to work with embedded data. The cache has been redesigned as a tree, and all operations that specify one key according to the standard can indicate a list of keys separated by spaces. However, it is easy to get rid of this, but then the meaning of the work will be completely unclear.

The Entry class is implemented as a storage unit, which contains a dictionary (childs of type dict) with child instances of Entry. Moreover - the top point in the hierarchy is also an instance of the Entry class.

Here I will give a fragment of the Cache singleton:

1 class Cache (object):
2 # consts
3 storage_commands = ["set", "add", "replace", "append", "prepend", "cas"]
4 oneline_commands = ["get", "gets", "getn", "delete", "incr", "decr", "stats"]
5
6 # cache storage
7 data = Entry (0,0,0)
8
9 # cache operations
10 @classmethod
11 def call (cls, instruction):
12 i = Instruction (instruction)
13 debug (i)
14 command = getattr (cls, i.cmd)
15 return command (i)
16
17 @classmethod
18 def set (cls, i):
19 "set, support for nested keys"
20 parent = cls.data.get_child (i.keys [: - 1])
21 if parent:
22 parent.set_child (i.keys [-1],Entry (i.data, i.flags, i.exptime))
23 yield "STORED"
24 else:
25 yield "NOT_STORED" We

look at the Entry code and everything else here - github.com/Deepwalker/tx-cache/blob/master/mck.py

Amendments

As was rightly noted in the comments, using singleton to store Cache is somewhat unjustified, so github has a revised version that fixes this annoying flaw. Thank you, drJonnie !

Excuse

What do I expect from this article? I expect that smart people, of whom there are many, will look at my code and poke their noses at the shortcomings of the lack of academic education. Perhaps this article will be useful to someone, with possible shortcomings, the path to the implementation of network protocols in your programs is described here. Someone might use this code for their memcache extensions.

Tags: