Impressions of a visit to EuroPython 2014

    One of the distinguishing features of the Python world is the conferences dedicated to the language, the so-called PyCons. Not long ago I managed to attend one of them - EuroPython 2014. EuroPython is one of the largest annual European Python conferences; it was held in Florence for the previous three years, and in 2014 came to Berlin for the first time. While the memories are still fresh, I decided to write a short report on what happened and how it went.

    Instead of an introduction


    Let me say up front: there will be only impressions and short summaries here, with no detailed retelling of the talks, since all of them can be watched on YouTube if you are so inclined - the organizers not only made no commercial secret of the talk videos, they even live-streamed all of them (incidentally, videos from previous years' conferences can also be found freely on the same YouTube).

    One more thing. Not all the talks were directly about Python. Often a talk reviewed some useful technology and only touched in passing on how that technology can be used in the Python world. So if, while reading a paragraph of this opus, you catch yourself thinking "so where is the Python here? o_O" - I advise you to go straight to the video; it will all be there.

    To begin with, almost every day of the conference followed the same schedule: keynotes in the morning, then talks of 20-45 minutes each (with a lunch break and coffee breaks), and Lightning Talks in the evening. It is worth explaining in more detail what Keynotes and Lightning Talks are.

    Keynotes are talks that are not very technical and come with a fair dose of philosophy. In my opinion there is little practical value in them, so I will skip them in my narrative.

    As for Lightning Talks , these are long sessions of about 1.5 hours during which anyone could step up and speak. Each talk was given about 10 minutes. Among these mini-talks there was a lot of noise (people advertising their products, plugging all kinds of events like PyCon Brazil, sharing general philosophical musings, etc.). So in my story I will try to cover only the talks that seemed most useful and interesting to me.

    Day One (Python vs Haskell)


    Since the first day was taken up by the conference opening, there were few talks and little that was more or less useful. The main talk of the day: what Python can learn from Haskell . In fact the talk was not only about Haskell but also a little about Erlang, but that is beside the point. Its main idea was that static code analyzers never catch errors like 1 + "1", and that Python's dynamic, strict, implicit typing is to blame, which leads to refactoring problems, etc. The proposed solution is to use annotations (hello, Python 3) together with the experimental Python interpreter mypy , which lets you specify the types of function arguments at the language-syntax level. That is, you can write like this:
    def fib(n: int) -> None:
        a, b = 0, 1
        while a < n:
            print(a)
            a, b = b, a+b
    

    and this will be interpreted correctly. The thing is certainly quite interesting, but again, it only works for Python 3 code. I tried looking for mypy in the standard Debian repositories and did not find it, and I was too lazy to build it by hand. It could become a good tool if it were a bit more widespread, had IDE-level support, etc. (the speaker, by the way, actively called for contributions to the project). Claims were also made that mutability is evil, and that Python has weak support for algebraic data types. All of this, in my opinion, is very, very debatable. Nevertheless, I recommend watching the video of the talk, at least to get an idea of what is happening in other languages (and, of course, to be armed with arguments for holy wars along the lines of "which language is better").
    Report video 'What can python learn from Haskell?'

    I also remember one talk from the Lightning Talks: a guy (from Russia, by the way) promoting his library called Architect , whose main selling point is the ability to automatically partition database tables through your ORM (Django, SQLAlchemy and Pony models are supported). Supported databases: MySQL and PostgreSQL. For people who work with those databases, this lib might occasionally come in handy.
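    To give a feel for it, here is a minimal sketch of what using Architect looks like with a Django model; the decorator arguments follow the library's documented API as I recall it, so treat the exact names as an assumption and check the docs. The model and field names are made up.

    import architect
    from django.db import models

    # Partition this table by month on the 'created' column (argument
    # names per the Architect docs as I remember them - verify!)
    @architect.install('partition', type='range', subtype='date',
                       constraint='month', column='created')
    class AccessLog(models.Model):
        created = models.DateTimeField()
        message = models.TextField()

    # The partitions themselves are then created from the console:
    #   architect partition --module myapp.models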

    Day Two (nix, Kafka, Storm, Marconi, Logstash)


    A rather interesting talk was given about the nix package manager . There is in fact a whole distribution built on this package manager, called NixOS . Its usefulness, to be honest, seems somewhat doubtful to me, but the nix package manager itself can be very useful in some cases (especially since it does not prevent you from using the system's main package manager, i.e. yum or apt, alongside it). The main feature of this package manager is that all the operations it performs are non-destructive. Roughly speaking, when a new package is installed, the previous version is not overwritten; instead a new user environment is created with a new set of symlinks. This allows you to:
    • 1. roll back at any point in time to some previous state of the user environment
    • 2. use several versions of a package at the same time (for example, several versions of ffmpeg, or several versions of Python) - all without crutches like virtualization, Docker, etc.
    • 3. update without any risk of breaking the system, because an update does not remove the old package: the new package is placed in a separate environment, and at the end of the installation the symlinks are switched over
    On the minus side: if you keep every version of every package with all its dependencies, it naturally eats more space on the HDD, and as a bonus we get some package redundancy. In my opinion these are shortcomings one can live with. The talk also briefly described how to build your own packages for nix, Python packages in particular. In general, if you suffer from the dependency hell problem, nix lets you solve it quite elegantly.
    Video report 'Rethinking packaging, development and deployment'

    On the same day there was a talk about stream processing of large amounts of data using Kafka and Storm . The only useful things I took away from it: Storm is great for processing continuous streaming data (as opposed to static data, unlike Hadoop), and Kafka far outstrips RabbitMQ in maximum message throughput (100k+ messages/sec per node versus 20k/sec for RabbitMQ), but loses to it in the flexibility of message-routing topologies between consumers. In the talk the two technologies were considered together, with Kafka acting as the transport that delivers messages to Storm .
    Video of the report 'Designing NRT (NearRealTime) stream processing systems'
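    For reference, a minimal Kafka round trip from Python looks something like this; the talk did not prescribe a client, so the kafka-python library used below is my own choice, and the broker address and topic name are made up.

    from kafka import KafkaProducer, KafkaConsumer

    # Fire-and-forget publish to a topic (broker address is hypothetical)
    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    producer.send('events', b'user signed up')
    producer.flush()

    # Blocking consumer loop over the same topic
    consumer = KafkaConsumer('events', bootstrap_servers='localhost:9092')
    for record in consumer:
        print(record.value)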

    There was a good introductory talk about Marconi , a messaging system within the OpenStack framework (for those unaware, OpenStack is written entirely in Python). Marconi is used as the glue between the components of an OpenStack cloud, and also as a standalone notification service. It is essentially a direct analogue of Amazon's SNS and SQS. It provides a RESTful API and can use MongoDB, Redis or SQLAlchemy as the message store (although SQLAlchemy was not recommended for production for performance reasons); there is no AMQP support yet, but they plan to add it in the future.
    'Marconi - OpenStack Queuing and Notification Service' report video
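    Posting a message to a Marconi queue over that RESTful API looks roughly like the sketch below. This is a hedged reconstruction: the endpoint path, headers and payload shape follow the v1 API as I remember it from the talk, so double-check them against the OpenStack docs.

    import json
    import uuid
    import requests

    MARCONI = 'http://localhost:8888'        # assumed service endpoint
    headers = {
        'Client-ID': str(uuid.uuid4()),      # v1 expects a client UUID
        'Content-Type': 'application/json',
    }

    # Enqueue one message with a 5-minute TTL into the 'demo' queue
    resp = requests.post(
        MARCONI + '/v1/queues/demo/messages',
        headers=headers,
        data=json.dumps([{'ttl': 300, 'body': {'event': 'backup_done'}}]),
    )
    print(resp.status_code)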

    There was also a talk about Logstash / Elasticsearch / Kibana , a set of mega-useful utilities for collecting, filtering, storing, aggregating and displaying logs. The usefulness of logstash, by the way, was mentioned several times in various talks by different people. Personally, I did not hear anything particularly new in this talk. One of the ideas discussed was how to use logstash to trace all the log lines produced by a single request, and more generally to gather, from every component of a distributed system, all the logs linked by a common attribute. An interesting logging library called Logbook was also mentioned during the talk. Judging by the description, it is a worthy alternative to Python's standard logging library.
    Video report 'Log everything with logstash and elasticsearch'
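    A quick taste of Logbook, to show why it caught my eye: handlers are bound as context managers, which makes per-request log routing explicit. A minimal sketch (the logger name and message are made up):

    from logbook import Logger, StderrHandler

    log = Logger('webdav')

    # Route everything logged inside this block to stderr at INFO level
    with StderrHandler(level='INFO').applicationbound():
        log.info('request {} started', 'abc123')  # str.format-style args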

    Day Three (Sphinx, gevent, DevOps risk mitigation)


    The third day began with writing multilingual Sphinx documentation . This talk was very useful to me personally, because the project I am currently working on has the task of maintaining the API documentation in two languages - English and Russian - and I would like to make the process as simple and transparent as possible. In fact, everything is quite straightforward. There is the wonderful GNU utility gettext , actively used for internationalizing all sorts of open-source projects (I assume everyone knows gettext without explanation), and there is the wonderful sphinx-intl package. From the Sphinx .rst documentation, a couple of simple commands produce *.po files, which are then translated in a dedicated gettext editor, and from which Sphinx builds the documentation for whichever language you select. The talk also mentioned the SaaS service Transifex , which makes translators' lives easier. As I understand it, the general principle of the service is this: simple console utilities let you upload and download translation files to the service, and the service gives translators a convenient web interface for translating the texts. The console utilities, as far as I understand, work on a git push/pull principle. The service is not free. For anyone interested (anyone who has faced the internationalization problem), watching the video of the talk is optional - skimming the slides is enough to understand everything.
    Report video 'Writing multi-language documentation using Sphinx'
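    The conf.py side of this workflow fits in two settings; these are the standard Sphinx i18n options, and the commands in the comments show the sphinx-intl round trip (the Russian locale is just my example):

    # conf.py: standard Sphinx internationalization settings
    locale_dirs = ['locale/']   # where the *.po / *.mo translations live
    gettext_compact = False     # one .po file per source document

    # The typical round trip then looks like:
    #   make gettext                                  # extract strings to *.pot
    #   sphinx-intl update -p _build/gettext -l ru    # create/refresh ru/*.po
    #   make -e SPHINXOPTS="-D language='ru'" html    # build the Russian docs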

    Among the interesting talks that day: one about gevent (I considered it important to attend, because the WebDAV service of the project I am working on is built a little more than entirely on gevent). In fact, nothing fundamentally new was said: it started with an introduction to implementing asynchrony in Python and ended with gevent itself. If you do not know what gevent is, this talk will probably be interesting; for those already familiar with the technology it will be old news. Interesting things I did pick up: 1. a web microframework built entirely on gevent, with PostgreSQL support; 2. an AMQP library , also built entirely on gevent.
    Video report 'Gevent: asynchronous I / O made easy'
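    For those who have not met gevent: the canonical pattern is to monkey-patch the standard library and then run blocking-looking code in cheap greenlets. A minimal sketch (the URLs are arbitrary; written for Python 2, as gevent was back then):

    from gevent import monkey
    monkey.patch_all()          # make socket, time.sleep, etc. cooperative

    import gevent
    import urllib2

    def fetch(url):
        # Looks blocking, but yields to other greenlets while waiting on I/O
        return url, len(urllib2.urlopen(url).read())

    jobs = [gevent.spawn(fetch, url) for url in
            ['http://python.org', 'http://example.com']]
    gevent.joinall(jobs, timeout=10)
    print([job.value for job in jobs])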

    There was also a very entertaining talk, “DevOps Risk Mitigation: Test Driven Infrastructure” , about testing infrastructure as part of the deploy process. There is no magic to it: an RPM is built and rolled out to test machines somewhere, and then we automatically log into those machines over rsh and test everything we can, from the HTTP proxy to the log-collection system. The speaker, a very colorful old-school admin, apparently does not recognize all these puppets/chefs/salts, but he firmly holds the idea that to maintain product quality, tests must cover more than just the code. In my opinion the idea is sound, and it really is something to strive for - perhaps not by exactly the means described in the talk, but nonetheless. A must-see for all DevOps people.
    Video report 'DevOps Risk Mitigation: Test Driven Infrastructure'
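    This is not the speaker's actual tooling, but the idea can be illustrated in a few lines: after a deploy, hit the freshly provisioned machines and assert that the services actually respond. The hostnames and health endpoint below are made up.

    import unittest
    import requests

    HOSTS = ['test-web-01.example.com', 'test-web-02.example.com']  # hypothetical

    class SmokeTest(unittest.TestCase):
        def test_http_proxy_answers(self):
            # The same pattern extends to log collectors, queues, etc.
            for host in HOSTS:
                resp = requests.get('http://%s/healthz' % host, timeout=5)
                self.assertEqual(resp.status_code, 200)

    if __name__ == '__main__':
        unittest.main()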

    Day Four (source protection, SOA at Disqus, abstract debugger architecture, dh-virtualenv)


    The day began with a remarkably useful talk, “Multiplatform binary packaging and distribution of your client apps” . I think many programmers who write commercial applications have thought at least once in their life about the problem: “our code can be copied and read!” In other words, the task is to ship the product in encrypted form, or as binaries from which extracting and modifying the source code is rather problematic. Dropbox, by the way, whose desktop client is written in Python, solves this problem in a rather painful way: they ship in the installer their own patched build of the Python interpreter, which can read encrypted *.pyc files. The solution proposed in the talk (a sketch of the first step follows the list):
    • 1. cythonize the sources - translate them into *.c
    • 2. compile the result into native extensions
    • 3. assemble an exe with PyInstaller
    • 4. pack it all into setup.exe / dmg / rpm / deb
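    Step 1 in code form: a setup.py that compiles plain Python modules into C extensions via Cython. This is a generic sketch, not the speaker's exact code; the module names are made up.

    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        name='myapp',
        # Translate the listed .py modules to C and build them as extensions
        ext_modules=cythonize(['myapp/core.py', 'myapp/secrets.py']),
    )

    # python setup.py build_ext --inplace  ->  .so / .pyd instead of .py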

    For more details I recommend watching the video of the talk and the slides . Naturally, each of the four stages I listed is analyzed there in more detail, with code samples showing how and what to do. And of course, it is worth mentioning that this kind of obfuscation does not save you from reverse engineering: a person can simply import the obfuscated package and walk through the names of its methods and variables. On that topic, by the way, I recommend reading this article (it is mentioned in the talk).
    Video report 'Multiplatform binary packaging and distribution of your client apps'

    Next came a very good talk from one of the Disqus developers. It was about the benefits of an SOA architecture, using the Disqus service as an example. Disqus is built a little more than entirely on Django; more precisely, it is split into a bunch of small microservices (REST APIs, workers, crons, etc.), each of which is built on Django. The speaker, by the way, clearly explained why Django and not something else: a large community, a pile of ready-made solutions, plus it is much easier to find specialists. As for the technology stack, the main components Disqus uses are uwsgi, Django, Celery, PostgreSQL as the database and Redis for the cache. To share common code between their microservices, as I understand it, they build separate Python packages. The pros of the SOA approach:
    • 1. independent scalability
    • 2. simple deploys
    • 3. simplicity of working with the code

    the cons:
    • 1. if any one API changes (for example, the API of an external service), you must remember to bring the other services up to date with the changed API
    • 2. as mentioned just above, it is harder to share common code between services

    Report video 'How Disqus is using Django as the basis of our Service Oriented Architecture'

    Python Debugger Uncovered was a very cool talk from a PyCharm developer. I advise all backend developers to watch it for general erudition: how an abstract debugger in a vacuum is arranged. There is no deep magic: all debuggers are built on the same principle, using the native facilities of the Python language itself. For reference, by the way: the PyCharm and PyDev debuggers have been merged.
    'Python Debugger Uncovered' Report Video
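    That native facility is essentially sys.settrace: the interpreter calls a registered function on every call, line, return and exception, and pure-Python debuggers (pdb included) are built on top of it. A toy sketch of my own, not code from the talk:

    import sys

    def tracer(frame, event, arg):
        # The interpreter reports 'call', 'line', 'return' and 'exception'
        if event == 'line':
            print('executing %s:%d' % (frame.f_code.co_filename,
                                       frame.f_lineno))
        return tracer              # keep tracing within this scope

    def demo():
        x = 1
        x += 1
        return x

    sys.settrace(tracer)           # attach the "debugger"
    demo()
    sys.settrace(None)             # detach it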

    That day there was also a very worthwhile talk about the dh-virtualenv tool from Spotify. Spotify uses Debian as the base for its production OS, and the goal of this utility is to combine deploying a project as a deb with an encapsulated virtualenv. The general idea: on the one hand, Debian is hellishly stable, and Debian packages are convenient because they let you declare all the non-Python dependencies (like libxml); on the other hand, virtualenv is convenient because it isolates the Python dependencies inside itself, and all those dependencies can be the freshest packages, taken straight from PyPI. The dh-virtualenv tool lets you combine the one with the other and, roughly speaking, automatically build debs from the current deployed virtualenv. It is installed, by the way, through plain apt-get. Inside the project, alongside setup.py and requirements.txt, a debian directory is created describing the metadata and dependencies of the deb package (rules, control, etc.), and the console command dpkg-buildpackage -us -uc builds the package. virtualenv does not need to be installed on the target qa/prod machine, because the utility downloads and packages it automatically while building the package.
    Video report 'Packaging in packaging: dh-virtualenv'

    The Lightning Talks of that day stuck in my memory thanks to one very interesting talk about why getattr() should not be abused.
    Code example:
    import random

    class A(object):
        def get_prop(self):
            # getattr with a default silently swallows any AttributeError
            # raised while the attribute is being computed
            return getattr(self, 'prop', None)

    class B(A):
        @property
        def prop(self):
            # 'chioce' is a deliberate typo: accessing random.chioce
            # raises AttributeError inside the property
            return random.chioce(['test prop1', 'test prop2', 'test prop3'])

    print(B().get_prop())
    
    This code will always output None, because the AttributeError raised inside the property (random.chioce does not exist - the function name is misspelled) is swallowed by getattr's default value, masking the real bug.

    Day Five (working with memory, DB API, making Go out of Python)


    The talk “Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask” was very interesting to me personally, as a person quite far from C/C++ who is used to thinking about more mundane matters. Some things I already knew, others I refreshed in my memory once again. I will not dwell on the details; I will just say that it was especially interesting to hear about the existing tools with real practical uses ( objgraph , the guppy memory profiler, etc.), and about the fact that CPython can be built against different libraries implementing the low-level malloc(), and what profit those replacements bring. In general, I personally recommend everyone watch this talk. On the same day there was another cool talk on a similar topic, “Fun with cPython memory allocator” . Unfortunately I did not go to it, but judging by my colleagues' reviews it is well worth it. Many have probably run into the problem where you create a huge list of strings in Python, then delete it, and the memory usage does not go down. That talk is exactly about this problem: how it happens, why, and how to deal with it.
    Report video 'Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask'

    Report video 'Fun with cPython memory allocator'
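    Of the tools mentioned, objgraph is the one I would reach for first; a minimal sketch of a typical leak hunt (the leaky class is, of course, made up):

    import objgraph

    class Leaky(object):
        pass

    cache = [Leaky() for _ in range(1000)]   # simulate a leak

    # What types are filling the heap?
    objgraph.show_most_common_types(limit=5)

    # Who is keeping one of these objects alive? Renders a graph to a file
    # (requires graphviz to be installed).
    objgraph.show_backrefs(cache[0], max_depth=3, filename='backrefs.png')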

    Next came a rather mixed talk, “Advanced Database Programming with Python” . For those who have worked little with databases in their practice, I recommend listening to it. You will learn things such as transaction isolation levels and how they differ from one another, the specifics of working with databases from Python (per PEP 249 , autocommit is effectively off, and you must not forget to call commit manually), and some basic query-optimization techniques. The talk is ambiguous because the author dwells on many very rare optimizations - for example, generating the ID of an inserted record in Python instead of relying on the database's auto_increment/sequences. That is all well and good, except that experience shows that after hearing such talks some programmers start optimizing anything and everything prematurely, and in 99% of cases that leads to quite disastrous consequences.
    Video report 'Advanced Database Programming with Python'
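    The PEP 249 point is worth a two-line illustration: DB-API connections open a transaction implicitly, and nothing persists until you commit. A sketch with the stdlib sqlite3 driver:

    import sqlite3

    conn = sqlite3.connect('example.db')
    cur = conn.cursor()
    cur.execute('CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY)')
    cur.execute('INSERT INTO t (id) VALUES (1)')
    conn.commit()   # forget this and the INSERT is rolled back on close
    conn.close()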

    And the last was a talk from Benoit Chesneau, the creator of the gunicorn web server. It surveyed the 100500 existing options for implementing multitasking in Python, plus the new, 100501st option: the offset library, which brings Go-style coroutines (goroutines) into Python. After the talk I dug a little into the internals of this library: apparently its foundation is a lower-level coroutine implementation based on the fibers library, while offset itself provides the higher-level wrappers. Roughly speaking, it lets you write Python programs similar to how they would look in Go. In the talk the author gives examples of how similar the code for some abstract task looks when written in Go and when written in Python with offset . In general, anyone who finds the existing functionality of threads, tornado/twisted, asyncio, gevent and the multiprocessing module insufficient may find this library very interesting. There is not much point in just listening to the talk; better to go straight to the code on GitHub and try it.
    Report video 'Concurrent programming with Python and my little experiment'

    The final Lightning Talks of that day stuck in my memory thanks to a talk about HSTS . A very useful thing, I must say, which few people know about. In essence, it is an HTTP response header that tells the browser to always force an HTTPS connection for the given hostname. I.e. from then on, if the user types some-url.com into the browser , the browser itself will automatically substitute https. It is useful both for security reasons and for reducing the number of HTTP-to-HTTPS redirects returned by the server.
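    Setting the header from a Python app takes one line; here is a minimal Flask sketch (the talk only described the header itself - the framework choice and the one-year max-age are mine):

    from flask import Flask

    app = Flask(__name__)

    @app.after_request
    def add_hsts(response):
        # Tell browsers to use HTTPS for this host for the next year
        response.headers['Strict-Transport-Security'] = \
            'max-age=31536000; includeSubDomains'
        return response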

    Conversations on the sidelines


    The conference had a huge number of booths from companies related to Python in one way or another (Google, Amazon, DjangoCMS, JetBrains, Atlassian, etc.). You could walk up to any of them and chat about whatever questions interested you. We talked quite a lot with the folks from Google (though that was not at the conference itself but at the Google after-party). The interesting part: they use Python mainly in internal products - well, apart from YouTube. They also told us in confidence that Google's developers do not much like BigTable, and that Google's labs are preparing to release a revolutionary new database (code name Spanner) that allows distributed transactions across a cluster while retaining all the advantages of NoSQL. According to rumors it may even be open source (about which, of course, there are big, big doubts).

    We also talked with the DjangoCMS representatives (nothing interesting there: a plain, unpretentious CMS on Django that you can install on your own server or use as a SaaS solution) and with representatives of Amazon. To the latter I put a question raised at the HighLoad conference in 2012, about the throughput of instances being quite different and disproportionate to the instance type (see the presentation , slide 25), but the answer I received was "well, that is the specifics of virtualization, we cannot say why - contact support." By the way - many will find this interesting - the Amazon folks handed out questionnaires with Python-themed questions. I no longer remember whether prizes were raffled off or whether they were headhunting this way. The questions are quite peculiar, from the category of "the very questions that never come up in practice, but that they love to ask at interviews at large companies":

    1. Which is called first when creating an object:
    a. __create__
    b. __new__
    c. __init__
    d. __del__

    2. What is printed by the last statement in:
    def foo(x, l=[]):
        l+=2*x
        print l
    foo('a')
    foo('bc')
    
    a. ['a', 'b', 'c']
    b. ['a', 'bc']
    c. ['a', 'a', 'b', 'c', 'b', 'c']
    d. ['a', 'a', 'bc', 'bc']

    3. What does the last statement print?
    class A(str):
        pass
    a=A('a')
    d={'a':42}
    d[a]=42
    print type(d.keys()[0])
    
    a. str
    b. A
    c. dict
    d. int

    4. What will these 2 statements return on Python version 2?
    5 * 10 is 50
    100 * 100 is 10000
    
    a. True, True
    b. True, False
    c. False, True
    d. False, False

    Overall, my impressions of the conference are very positive. The organizers' main bet was on networking on the sidelines. In fact, this is the first conference in my memory where the talks were posted to YouTube immediately. And although I would rate the level of most talks as average, quite a few interesting things were said that can one way or another be applied in real projects.
