Teremok-Style Programming

Original author: Ted Dziuba
  • Translation
I tried to keep the translation as accurate as possible. The only thing I changed is the name of the company used as the example, and in its line of business and the way it operates in retail it is similar to the one in the original.

Every pancake on the Teremok menu is just a different combination of about eight ingredients. With such a simple periodic table of elements, the company earned $1.9 billion last year (no, not Teremok, but Taco Bell).
The more I program and design systems, the more I realize that in many cases you can get the functionality you need simply by recombining the basic set of tools that Unix gives us. After all, functionality is an asset, and code is a liability. This is the opposite of the absurd DevOps trend (admins as developers), in which system administrators start writing unit tests and other things to help developers. Teremok-style programming is about developers knowing enough about administration (and Unix in general) that they don't reinvent the wheel and instead arrive at simple, scalable solutions.

Here is a concrete example: suppose you need to download millions of web pages and save them to disk for later processing. How would you do it? The trendy kids say you need to write a distributed crawler in Clojure, run it on EC2, and coordinate it with SQS or 0MQ.

The Teremok answer: xargs and wget. In the rare case that you saturate your network link, add split and rsync. A "distributed crawler" is really just about 10 lines of shell script.
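As a rough illustration of what those ten lines might contain, here is a sketch. It assumes a file of URLs with one URL per line, GNU xargs and GNU wget on the machine, and a hypothetical function name `crawl`; the output directory name `crawl_dir` is borrowed from the processing command further down. None of this is the author's actual script.

```shell
#!/usr/bin/env bash
# Sketch of a "distributed crawler" built from xargs and wget.
# Assumptions (hypothetical): URLs are listed one per line in a file,
# GNU xargs and GNU wget are installed, and `crawl` is an illustrative name.

crawl() {
  local url_file="$1"    # input: one URL per line
  local out_dir="$2"     # output: downloaded pages land here, e.g. crawl_dir
  local jobs="${3:-16}"  # number of wget processes to run at once

  mkdir -p "$out_dir"
  # -r     : GNU extension, do not run wget at all if the URL list is empty
  # -n 50  : give each wget invocation a batch of up to 50 URLs
  # -P     : run up to $jobs wget processes in parallel
  # --force-directories recreates host/path directories under the prefix,
  # so pages from different sites do not overwrite each other.
  xargs -r -n 50 -P "$jobs" \
    wget --quiet --force-directories --directory-prefix="$out_dir" \
         --timeout=30 --tries=2 < "$url_file"
}

# Optional scale-out across machines (host names are hypothetical):
#   split -n l/4 urls.txt part_          # 4 chunks, lines kept whole (GNU split)
#   rsync -a part_aa worker1:crawl/      # copy one chunk to each worker
#   ...then run `crawl part_aa crawl_dir` on that worker.
```

Usage: `crawl urls.txt crawl_dir 32`. Note that wget exits non-zero if any URL in its batch fails, so xargs will report a non-zero status (123 in GNU xargs) even when most pages were fetched.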

Moving on: once you have these millions (or even tens of millions) of pages, how do you process them? Surely you need Hadoop MapReduce; after all, isn't that how Google processes the web?

Bah, to hell with that nonsense:

find crawl_dir/ -type f -print0 | xargs -n1 -0 -P32 ./process

32 parallel processes and zero extra code to maintain. Requirement satisfied.
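The post leaves `./process` undefined, since what "processing" means depends on the job. Purely as an illustration, here is a hypothetical `process` script that prints each page's path and its `<title>`, tab-separated. It relies on the fact that `xargs -n1 -0` passes exactly one file path per invocation; the function name `process_page` and the title-extraction task are assumptions, not part of the original.

```shell
#!/bin/sh
# Hypothetical body for ./process in:
#   find crawl_dir/ -type f -print0 | xargs -n1 -0 -P32 ./process
# xargs -n1 hands this script one file path as $1 per invocation.

process_page() {
  file="$1"
  # Extract the first lowercase <title>...</title> that sits on a single line.
  # Titles split across lines, or written as <TITLE>, yield an empty title here.
  title=$(sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' "$file" | head -n 1)
  printf '%s\t%s\n' "$file" "$title"
}

# When run as ./process, handle the single path supplied by xargs.
if [ "$#" -gt 0 ]; then
  process_page "$1"
fi
```

Saved as `process` and made executable with `chmod +x process`, it plugs into the pipeline unchanged; redirect the pipeline's output to a file to collect the results.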

Every time you write code or bring in a third-party service, you introduce a possible point of failure into your system. I trust xargs far more than I trust Hadoop. In fact, I trust xargs more than I trust myself to write a multi-threaded processor. I trust syslog to handle asynchronous messages far more than I trust a message queue service.

Teremok-style programming is one of the steps toward Unix Zen. It is a path I have only just started down, but it is already paying dividends. To really commit to it, you have to throw out a lot of assumptions about how systems should be designed: I built most of a SOAP server using static files and Apache mod_rewrite. I could have done the whole thing Teremok-style if I had only found the nerve to learn sed properly, but I got scared and wrote some of it in Python.

If you don't want to think about it from a Zen perspective, think like a capitalist: you write code to put food on the table. You can minimize risk by using well-proven tools, or you can step into unknown territory. It probably won't get you invited to speak at conferences, but it will get the job done, and it will help keep your pager quiet at night.
