Shorts: short and funny, or how we prepared the site for the Habraeffect
Namba 0: Introduction
One day, thoughtfully leafing through bashorg, I suddenly realized that most quotes are quite long, and very often one is simply too lazy to read them. In our age of constant rush, that takes too much time. Thus a new format was born: Shorts (from the English word “short”). A short is a brief joke of one or two sentences, strictly limited to 255 characters (so that nobody gets carried away).
Just at that moment I wanted to learn programming and was looking for a simple task. I built the site quite quickly, in a couple of evenings, told my colleagues at work and a couple of acquaintances on IRC about it, and was about to forget about it when I suddenly discovered that in two days the site had been visited by 500 people, almost a third of whom had subscribed to the RSS feed. It became clear that people liked the concept. Having polished the site's appearance a bit, I decided to show it to the Habr community: Shorts, short and funny, please make them welcome.
PS Given that this is not quite a startup (the project is not yet commercial, and “startup” is too big a word for my little creation), the Habr conscience I acquired over my years on the site did not allow me to write a non-technical article. Therefore, under the cut you will find an interesting story about how we prepared Shorts for the Habraeffect.
Namba 1: Resources
Any web project, and especially a startup, sooner or later sees a sharp spike in traffic, and the causes vary: an article published on a large resource (for example, on Habr), a press release, an advertising campaign, an unexpected mention in the news, etc. Often (as in the last example) the surge in traffic occurs SUDDENLY ™.
This leads to an important conclusion. It is possible, of course, to host a resource with 100k unique visitors per day on an entry-level VPS for 200 rubles. This will be a great source of pride among geek friends, but it will most likely bring the site down at the most crucial moment. In general, it is good when a production web system runs at no more than 10% of its capacity in normal mode. That lets it hold up during a traffic surge. Based on all this, it is better to put even such a “lightweight” site as Shorts on a server with a decent power reserve.
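As a rough illustration of the 10% rule, the short Python sketch below computes how large a traffic spike a server can absorb given its normal utilization. The function name `spike_multiplier` and the sample figures are made up for the example, and it assumes load grows linearly with traffic, which real systems only approximate.

```python
def spike_multiplier(normal_utilization: float) -> float:
    """How many times traffic can grow before capacity is exhausted.

    Assumes load scales linearly with traffic (a simplification).
    """
    if not 0 < normal_utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / normal_utilization


if __name__ == "__main__":
    # Illustrative utilization levels, not measurements from the site.
    for utilization in (0.10, 0.50, 0.90):
        factor = spike_multiplier(utilization)
        print(f"{utilization:.0%} busy -> survives a {factor:.1f}x spike")
```

A server that idles at 10% can take roughly a tenfold surge, while one already at 90% has almost no reserve.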
Namba 2: FrontEnd and Backend
One of the first commandments of web optimization is to separate static and dynamic content. We will use the standard solution: Nginx as the frontend, with Apache listening on 127.0.0.1. Nginx serves static content itself, and if it sees that a request is for dynamic content, it passes the request “inside” to Apache:
server {
    listen 10.0.121.124:80;
    server_name shortiki.com www.shortiki.com;

    # Unify the domain
    if ($host = 'www.shortiki.com' ) {
        rewrite ^/(.*)$ http://shortiki.com/$1 permanent;
    }

    access_log /var/log/vhosts/nginx-shortiki.com-access.log main;

    # Nginx serves static files; let browsers come back for them once a month. We don't need logs of image requests.
    location ~ ^.+\.(html|jpg|jpeg|gif|png|ico|css|js)$ {
        root /usr/home/vhosts/shortiki;
        expires 30d;
        access_log off;
    }

    location / {
        proxy_pass http://127.0.0.1:8081;
        # Some clever caching could be set up here in our case, but that is a topic for a separate article
    }
}
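To make the routing rule concrete, here is a minimal Python sketch (not part of the site) that mimics how the two location blocks split requests. The function name `route` and the returned labels are hypothetical; the regular expression is copied from the config above.

```python
import re

# Same pattern as the static-files location in the nginx config.
STATIC_RE = re.compile(r"^.+\.(html|jpg|jpeg|gif|png|ico|css|js)$")


def route(uri: str) -> str:
    """Return which server would handle the given request path."""
    if STATIC_RE.match(uri):
        return "nginx-static"    # served from disk by nginx
    return "apache-backend"      # proxied to 127.0.0.1:8081


if __name__ == "__main__":
    for path in ("/", "/index.php", "/img/logo.png", "/style.css"):
        print(path, "->", route(path))
```

Note that `html` is in the static list, so any page that should be generated dynamically must not end in `.html`.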
Namba 3: MPM
Today, Apache can run with one of two main MPMs (Multi-Processing Modules, which determine how the web server works with threads, child processes, etc.): prefork and worker.
Prefork works as follows: many processes, one thread per process, and each process handles requests. Prefork is stable, but it consumes more memory and is slower than worker.
Worker differs in that it uses many processes with many threads each, and requests are handled by threads. Worker is much faster than its competitor and uses less memory, but requests are less isolated from one another, which can create problems on sites with sessions, registrations, and other important data. Given the popularity of prefork, we will walk through the optimization using it as the example, although in my case it would be more logical to use worker for Shorts.
Namba 4: Accept Filters
Accept filters (in this case) are a kernel module that buffers incoming connections and hands a connection to the web server only once a complete, valid HTTP request has been received.
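The idea can be sketched in a few lines of Python. This is only a simplified model, not how the kernel module is implemented: it assumes a request is "complete" once the blank line that ends the HTTP headers has arrived, and the function names are hypothetical.

```python
HEADER_END = b"\r\n\r\n"  # blank line that terminates HTTP request headers


def ready_for_server(buffered: bytes) -> bool:
    """True once the buffered bytes contain a complete header block."""
    return HEADER_END in buffered


def chunks_until_handoff(chunks):
    """Count how many chunks are buffered before the connection is handed
    to the web server; return None if the request never completes."""
    buffered = b""
    for count, chunk in enumerate(chunks, start=1):
        buffered += chunk
        if ready_for_server(buffered):
            return count
    return None


if __name__ == "__main__":
    slow_client = [b"GET / HTTP/1.1\r\n", b"Host: shortiki.com\r\n", b"\r\n"]
    print(chunks_until_handoff(slow_client))
```

The benefit is that a web server process is not tied up while a slow client is still sending its request.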
In my case, FreeBSD is used, therefore, we load the module:
# kldload accf_http
We make the module load at each system boot:
echo 'accf_http_load="YES"' >> /boot/loader.conf
Configure the servers, for Apache:
AcceptFilter http httpready
And nginx:
listen 10.0.121.124:80 default sndbuf=16k rcvbuf=8k accept_filter=httpready;
And restart the web servers:
/usr/local/etc/rc.d/nginx reload
/usr/local/etc/rc.d/apache22 restart
Namba 5: Tuning Apache
Tuning Apache traditionally begins with the most important recommendation: the first thing to do is to disable unnecessary modules. Once the unnecessary modules are thrown out, we start tweaking the basic settings:
MaxClients is the maximum number of simultaneous connections that the server can handle. If MaxClients = 300, then with 301 simultaneous requests the last request will be queued and wait until one of the processes is free to serve it. The main resource limiting MaxClients is RAM: 300 Apache child processes must fit in memory at the same time. MaxClients is usually calculated from the amount of free memory:
MaxClients = FreeMemory / ApacheProcessSize
The size of an Apache child process can be seen in the RSS column of the output of the top or ps command.
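As a worked example of that formula, the Python sketch below does the division and rounds down. The helper name `max_clients` and the memory figures are illustrative assumptions, not measurements from the Shorts server.

```python
def max_clients(free_memory_mb: float, process_rss_mb: float) -> int:
    """Estimate MaxClients as free memory divided by the size of one
    Apache child process, rounded down so everything fits in RAM."""
    if process_rss_mb <= 0:
        raise ValueError("process size must be positive")
    return int(free_memory_mb // process_rss_mb)


if __name__ == "__main__":
    # Hypothetical numbers: 3000 MB free, 20 MB per Apache child.
    print(max_clients(3000, 20))
```

With 3000 MB free and 20 MB per child this gives 150; in practice it is wise to leave some memory for the rest of the system.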
Disable AllowOverride:
AllowOverride none
The point is that if you leave it enabled, Apache has to query the file system on every request to check whether a .htaccess file exists.
Turn off ExtendedStatus (adds 1 or 2 system calls for each request):
ExtendedStatus Off
Add FollowSymLinks for the web directory; otherwise Apache will check on every request whether the file or any directory on the way to it is a symlink:
Options FollowSymLinks
Reduce the timeout:
Timeout 10
Add compression of some data types (saves a lot of traffic):
AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml application/x-javascript
MinSpareServers and MaxSpareServers specify how many idle child processes to keep around. For example, if four processes are currently busy with requests and MinSpareServers is 2, Apache will start another 2 processes that sit idle waiting for requests. The point is that creating a new process is a relatively expensive operation, so tuning here comes down to avoiding a situation where the server constantly creates and kills processes.
MinSpareServers 2
MaxSpareServers 8
With these settings, Apache will always keep 2 processes waiting for requests, but if more than 8 processes are idle, it will start killing them to free up resources. Before the Habraeffect you can set more:
MinSpareServers 8
MaxSpareServers 32
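The spare-server rule can be modelled as a small Python function. This is a simplified sketch with a hypothetical name, `spare_adjustment`; it only shows the target Apache converges to and ignores how the real server paces process creation over time.

```python
def spare_adjustment(idle: int, min_spare: int, max_spare: int) -> int:
    """Return how many processes to start (positive) or kill (negative)
    so that the idle count lands between min_spare and max_spare."""
    if idle < min_spare:
        return min_spare - idle   # too few idle workers: spawn more
    if idle > max_spare:
        return max_spare - idle   # too many idle workers: kill the excess
    return 0                      # within bounds: leave the pool alone


if __name__ == "__main__":
    for idle in (0, 5, 12):
        print(idle, "idle ->", spare_adjustment(idle, 2, 8))
```

The wider the gap between the two limits, the less often the pool has to be resized.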
StartServers is the number of child processes that Apache creates at startup. If StartServers is less than MinSpareServers, Apache will spawn more processes until it reaches MinSpareServers. This setting depends on the initial load on the site and on the available resources; in our case:
StartServers 8
MaxRequestsPerChild specifies how many requests a child process handles before it is killed and a new one is started in its place. This is done to mitigate the effects of memory leaks. If your code is well written and you have no such problems, you can safely raise the value, but setting it above 10000 is not recommended: restarting a process once every 10,000 requests costs almost nothing, and it is good prevention.
MaxRequestsPerChild 3000
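To see why recycling bounds the damage from a leak, the Python sketch below computes the most memory a single child can leak before it is restarted. The leak size per request is a made-up figure and the helper name `leak_bound_mb` is hypothetical.

```python
def leak_bound_mb(leak_bytes_per_request: int, max_requests_per_child: int) -> float:
    """Upper bound, in MB, on memory leaked by one child process
    before it is recycled after max_requests_per_child requests."""
    return leak_bytes_per_request * max_requests_per_child / (1024 * 1024)


if __name__ == "__main__":
    # Assume, for illustration only, that each request leaks 1 KB.
    for limit in (3000, 10000):
        print(f"{limit} requests -> at most {leak_bound_mb(1024, limit):.2f} MB leaked")
```

Multiply the result by MaxClients to get the worst case for the whole server.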
KeepAlive is a setting that allows several requests to be handled within one TCP connection instead of opening a new connection for each request. It is important to keep KeepAliveTimeout low, because with a large number of requests many processes will otherwise spend their time waiting, and Apache will need to start more processes.
KeepAlive On
KeepAliveTimeout 5
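A back-of-the-envelope estimate shows how the timeout drives the process count. The Python sketch below applies Little's law (average number in the system = arrival rate × time held) under a worst-case assumption that every client leaves its connection idle for the full timeout; the function name and the traffic figures are illustrative.

```python
def processes_held(new_connections_per_sec: float,
                   service_time_sec: float,
                   keepalive_timeout_sec: float) -> float:
    """Worst-case average number of busy processes.

    Each connection is assumed to occupy a process for its service time
    plus the whole keep-alive timeout (a pessimistic simplification).
    """
    holding_time = service_time_sec + keepalive_timeout_sec
    return new_connections_per_sec * holding_time


if __name__ == "__main__":
    # Hypothetical load: 20 new connections per second, 0.1 s of work each.
    for timeout in (5, 15):
        held = processes_held(20, 0.1, timeout)
        print(f"KeepAliveTimeout {timeout}: about {held:.0f} processes")
```

Under these assumptions a 5-second timeout ties up about 102 processes, while a 15-second timeout ties up about 302.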
Namba 6: PHP Tuning
In our specific case, PHP tuning comes down to installing the PHP memcache module and the memcached daemon itself, since the other parameters will not noticeably affect performance. On FreeBSD this is simple:
cd /usr/ports/databases/pecl-memcache
make install clean
cd /usr/ports/databases/memcached
make install clean
Namba 7: Add Memcache
Almost everyone who comes to the site enters through the main page, so optimizing it matters most. On the main page we have a single SQL query that fetches the 20 most recent shorts from the database:
SELECT sid, sdate, stext, srating FROM quotes ORDER BY id ASC LIMIT $shortik_first, $shortiks_main
It makes no sense to hit the database every time the page is opened, so with a flick of the wrist the query turns into:
// Connect to memcached
$mem = new Memcache();
$mem->connect('localhost', 11211);
// Check whether the entry we need is already in memcache (get() returns false on a miss)
$quotesonpage = $mem->get('quotes');
if ($quotesonpage === false) {
    // If not, fetch it from MySQL...
    $connect = @mysql_connect($server, $user, $pass) or die('Could not connect: ' . mysql_error());
    @mysql_select_db("ShoDB");
    $query = "SELECT sid, sdate, stext, srating FROM quotes ORDER BY id ASC LIMIT $shortik_first, $shortiks_main";
    @mysql_set_charset('utf8', $connect);
    $get_smain = mysql_query($query) or die('Cannot execute query: ' . mysql_error());
    $quotesonpage = array();
    while ($shortik = mysql_fetch_assoc($get_smain)) {
        $quotesonpage[] = $shortik;
    }
    $quotesonpage = array_reverse($quotesonpage);
    // ...and store it in memcache with a lifetime of half an hour (1800 seconds).
    $mem->set('quotes', $quotesonpage, MEMCACHE_COMPRESSED, 1800);
}
Thus, no matter how many users there are, the main page's load on MySQL is one query per half hour. Of course, fresh shorts will appear on the site with a delay of up to half an hour, but that is not a big deal.
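The same cache-aside pattern can be sketched in Python to show that the database is touched only once per lifetime window. This is an illustration, not the site's code: `TTLCache` is an in-memory stand-in for memcached, and `get_main_page` and `load_from_db` are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for memcached with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._store[key]   # entry has expired
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def get_main_page(cache, load_from_db, ttl=1800):
    """Return the main-page quotes, querying the database only on a miss."""
    quotes = cache.get("quotes")
    if quotes is None:
        quotes = load_from_db()
        cache.set("quotes", quotes, ttl)
    return quotes


if __name__ == "__main__":
    cache = TTLCache()
    print(get_main_page(cache, lambda: ["short one", "short two"]))
```

Passing the clock in as a parameter makes the expiry easy to check without waiting half an hour.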
Namba 8: Welcome :)
Welcome to Shorts! I hope the site will be received as warmly as it was by the narrow circle of those who came in its early days.
I remind you that the initial goal of the project was to tighten up programming skills, so do not shoot the programmer's