s_egorov June 17, 2019 at 17:17

PyDaCon meetup at Mail.ru Group: June 22

On June 22, Mail.ru Group will host a joint meetup with the organizers of the PyCon Russia conference and the PyData Moscow meetup. Two tracks await you: a Python track, with talks selected from the general list of PyCon Russia submissions, and a PyData track from the PyData Moscow meetup. The program includes a keynote, technical talks, a quiz, and plenty of useful networking.

Keynote: “How to use JupyterHub at 100%, using the DataGym ML school and Lamoda as examples”
Petr Ermakov, Senior Data Scientist at Lamoda and Data Coach at DataGym

More than two years ago I gave a talk about using Jupyter at 100%. But what if you are not alone? How do 20 students studying ML, or an R&D team of 15, get along on one machine? Expect ready-made recipes, recommendations, and the pitfalls we ran into.

Python track:
“SQL bottlenecks: finding and fixing them when scaling”
Mikhail Novikov, lead developer, Fasttrack (fstrk.io)

You are starting a new project. You install a web framework and an ORM, write models, and make database queries. Everything is going well. Then 100,000 users arrive, and the project crashes under load. What do you do?

We were in exactly this situation six months ago. I’ll tell you how we got out of it and show our approaches to finding bottlenecks, along with the services that help. And I’ll explain why a vanilla ORM is evil.

“Comparison of aiopg & asyncpg technologies”
Alexey Firsov, lead developer of aio-libs / aiopg

Let's look at how two completely different technologies, aiopg and asyncpg, are built and how they work. Importantly, we will not compare speed.

PyData Moscow meetup track:
“Pipeline Design in NLP Project”
Vitaliy Radchenko, Data Scientist, YouScan

In this talk, we will focus on world best practices (AllenNLP) and our own experience. We will explain how to structure your pipeline and cover the specifics of each component: how to format incoming data, how to build dataset iterators, what the vocabulary should look like, how to prepare data, and so on. We will give examples from real problems and show how this structure helps with reproducibility and makes the pipeline easier to reuse.

“Stacking and blending: a review of popular Python libraries”
Dmitry Buslov

In this talk, we will cover the most popular libraries for building ensembles. We will start with simple ensembling in scikit-learn, then manually assemble the simplest stacking in a couple of lines of code, and then look at the most popular libraries: Vecstack, Heamy, Pystacknet, Mlxtend, Mlens.

“PyMC3: Bayesian Statistical Modeling in Python”
Maxim Kochurov, PyMC Dev / Samsung AI / Skoltech

Bayesian statistics has recently come to be discussed mostly in the context of deep learning. Unfortunately, this hides its main advantage over standard machine learning approaches. Unlike black-box models, the Bayesian approach is a form of white-box modeling. Being white-box is both good and bad: the analyst must fully understand the nature of the problem, and only then can the Bayesian approach be used to its full potential. It lets us take into account not only what “the data tells us”, but also what “common sense tells us”. The talk will discuss why and when all this is necessary, and how to conduct and interpret such an analysis in Python.

“'Kiss-kis, inhale me through kes', or what rap fans talk about: Python for topic modeling of VKontakte comments”
Dmitry Sergeev, Aalto University / DataGym

We will show how to collect 10 million comments using the VKontakte and YouTube APIs, look at what listeners of different music genres talk about, and answer such important questions as:

Can topic modeling help with clustering genres?
Is there something in common between listeners of chanson and jazz?
How can the similarity between Kirkorov and Antokha MC be measured?

Follow and subscribe to the events of the PyData.Moscow community

Doors open and registration: 11:00. Talks begin: 12:00.
Address: Leningradsky Prospekt 39, bldg. 79.

On the registration form, indicate which track you plan to attend: Python or PyData. Registering for one track does not prevent you from visiting the other.
