
vBulletin Observations, or Attempts to Cache Dynamic Content
I look after several VPSs. What runs on them is, generally speaking, not my area of responsibility, so whatever is spinning there keeps spinning, moderately slow and moderately working. Then it turned out that one of them hosted a forum, and the forum began to slow down. So I wanted to figure out why ...
Initial setup
- Forum running vBulletin 3.8.x
- Hosted on the forum.domain.com subdomain
- nginx 1.1.13, PHP 5.3.x (FPM)
- Nothing besides the forum runs on this server (this is important).
- MySQL on a separate server, reached over TCP/IP.
Background
The forum lived quietly and bothered no one, with xm top showing a load of around 30-40 percent. Then hour "X" came and the load jumped to a flat plateau of 90 percent with higher peaks, which is no fun at all. The suspicion of a DDoS was not confirmed: according to the logs, the workload was the usual one. So, before blindly adding resources, I decided to understand what was happening and try to cache everything that could be cached.
Investigation. Part one - what the visitor wants
Since I was not familiar with the ideology and quirks of this software, I started by analyzing the logs and the traffic between visitors and the server. First of all, I was surprised to find that attachments to forum posts are served exclusively by the attachment.php script. The files themselves may be stored in the database or on the local disk, but they are delivered only through the script, and in no other way. That means 8-10 extra invocations of the PHP interpreter for every thread with 8-10 photos, and that is for every visitor. Since this forum does not require registration to view attachments, they can be cached, say, for a couple of days. Something like this:
location = /attachment.php {
    expires max;
    limit_req zone=lim_req_1s_zone burst=5;

    fastcgi_pass forum__php_cluster;
    include /etc/nginx/fastcgi_params;
    include /etc/nginx/fastcgi_params_php-fpm;

    # Cache the script's output, ignoring the backend's caching headers and cookies
    fastcgi_cache forum_att__cache;
    fastcgi_ignore_headers Cache-Control Expires Set-Cookie;
    fastcgi_hide_header Set-Cookie;
    fastcgi_hide_header Pragma;
    fastcgi_cache_key "$request_method:$http_if_modified_since:$http_if_none_match:$host:$request_uri:";
    # Serve a stale copy while an entry is being refreshed or when the backend fails
    fastcgi_cache_use_stale updating error timeout invalid_header http_500;
    # Let only one request at a time populate a given cache entry
    fastcgi_cache_lock on;
    fastcgi_cache_lock_timeout 2m;
    fastcgi_cache_valid 2d;
}
and somewhere in the http section we declare forum_att__cache:
fastcgi_cache_path /var/cache/nginx/att levels=1:2 keys_zone=forum_att__cache:4m max_size=2g inactive=2d;
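One way to check that attachments really are served from the cache is to expose nginx's $upstream_cache_status variable in a response header. This is an addition for illustration, not part of the config that was deployed; the header name X-Cache-Status is an arbitrary choice.

```nginx
# Inside "location = /attachment.php", next to the fastcgi_cache directives.
# X-Cache-Status is a made-up header name used only for this sketch.
add_header X-Cache-Status $upstream_cache_status;
```

Requesting the same attachment twice without conditional headers should then typically show MISS on the first response and HIT on the second. Note that the cache key above includes the If-Modified-Since and If-None-Match request headers, so conditional requests are stored under separate entries.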
The second "revelation" for me was that the forum has an archive, and not only does it exist, it receives almost half of all requests. The way its pages look means their content can be cached as well:
location /archive/ {
    expires 10d;
    limit_req zone=lim_req_1s_zone burst=2;

    # Stylesheets under /archive/ are served as static files
    location ~ \.css$ {
        expires max;
    }

    # Everything else under /archive/ goes to archive/index.php
    fastcgi_pass forum__php_cluster;
    fastcgi_index index.php;
    include /etc/nginx/fastcgi_params;
    include /etc/nginx/fastcgi_params_php-fpm;
    fastcgi_param SCRIPT_FILENAME $document_root/archive/index.php;
    fastcgi_param SCRIPT_NAME $fastcgi_script_name;

    fastcgi_cache forum_arc__cache;
    fastcgi_hide_header Set-Cookie;
    fastcgi_ignore_headers Cache-Control Expires Set-Cookie;
    fastcgi_cache_key "$request_method:$http_if_modified_since:$http_if_none_match:$host:$request_uri:";
    fastcgi_cache_use_stale updating error timeout invalid_header http_500;
    fastcgi_cache_valid 2d;
}
and in the http section:
fastcgi_cache_path /var/cache/nginx/arc levels=1:2 keys_zone=forum_arc__cache:4m max_size=2g inactive=2d;
While we are at it, let's add some insurance against DDoS attacks:
limit_req_zone "$psUID" zone=lim_req_1s_zone:2m rate=1r/s;
I will explain how the "$psUID" key is built further below.
Investigation. Part two - authorization in vBulletin
From the forum's point of view, a visitor is either a registered user or a guest. A rather different picture emerges if we watch the scenario "arrived, browsed, logged in, browsed, logged out, browsed" in terms of cookies appearing and disappearing in the browser. So we clear the cookies for the domain and its subdomains, open HttpFox, and watch what happens:
HTTP/1.1 200 OK
Set-Cookie: PHPSESSID=cdme9rrptft67tbo97p4t1cua5; expires=Wed, 22-Feb-2012 15:04:12 GMT; path=/; domain=.domain.com
Set-Cookie: bblastvisit=1329059052; expires=Mon, 11-Feb-2013 15:04:12 GMT; path=/; domain=.domain.com
Set-Cookie: bblastactivity=0; expires=Mon, 11-Feb-2013 15:04:12 GMT; path=/; domain=.domain.com
Set-Cookie: uid=XCuiGU831OyC8VLqAx/QAg==; expires=Thu, 31-Dec-37 23:55:55 GMT; domain=.domain.com; path=/
With uid and PHPSESSID everything is clear: these are the doing of nginx (uid) and of the PHP interpreter with the session.auto_start option enabled (PHPSESSID). The rest are the forum's activity trackers. The main vBulletin session cookie, however, has not appeared yet. Looking ahead, I will say that vBulletin does not use the standard PHP session (more precisely, it ALMOST does not use it) but maintains its own, whose identifier is stored in the bbsessionhash cookie. So the visitor has arrived, but there is no session: this is an anonymous visitor without a session. Moreover, links on the forum can take two forms (this applies to all links on a page at once, not a mix of the two):
forum.domain.com/forumdisplay.php?s=12b66e447be52ebc84ab16d3f39626fb&f=69
forum.domain.com/forumdisplay.php?f=69
If you follow a link of the first type, the next response from the forum will carry the session cookie; with a link of the second type it will not. If the session cookie did not arrive with the second response, you can wander around the forum sessionless and restless until you either hit a link of the first type (I could not find a pattern in when they appear) or decide to log in. After a successful login the session cookie arrives in any case. If the guest already had an anonymous session before logging in, that session is replaced. It looks like this:
HTTP/1.1 200 OK
Set-Cookie: bbsessionhash=85745bc6110db5221e159087bf037f24; path=/; domain=.domain.com; HttpOnly
After login the session is "stable", and the leapfrogging between link types stops. The logout procedure is not very original: all forum cookies are deleted (even ones that were never set) and a cookie for a new ("anonymous") session is written:
HTTP/1.1 200 OK
Set-Cookie: bbsessionhash=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bblastvisit=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bblastactivity=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bbthread_lastview=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bbreferrerid=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bbuserid=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bbpassword=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bbthreadedmode=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bbstyleid=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bblanguageid=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.domain.com
Set-Cookie: bbsessionhash=3d0bdc5dbe8dabae361deebe8f6048d2; path=/; domain=.domain.com; HttpOnly
That is, we end up with an anonymous visitor (a guest) who is guaranteed to have a session.
As a result, from the point of view of both the forum software and the HTTP headers, we have three types of users: a guest without a session, a guest with a session, and a logged-in visitor. Moreover, at the nginx level it is extremely difficult to tell the second from the third.
Now that we understand which cookies exist and how they travel between the visitor and the server, we can approach caching dynamic content. As you know, caching of FastCGI backend responses in nginx is built into the ngx_http_fastcgi_module module. To use it, you declare the cache zone globally in the http section and set the key in the desired location. For conditionally static content (images, archives) the cache key can be the URI with minor additions, but for caching dynamic pages the user also has to be taken into account. It would seem that a rule like
fastcgi_cache_key "$request_method:$http_if_modified_since:$http_if_none_match:$host:$request_uri:$cookie_bbsessionhash:";
could satisfy both guests and logged-in users. In practice, however, visitors began to receive content from someone else's cache, and caching of "true" dynamic pages had to be disabled. I hope this verdict is not final.
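A more conservative approach that is often used for forum software is to cache dynamic pages only for visitors who have no session at all, and to bypass the cache for everyone else. The fragment below is an untested sketch under that assumption, not something that was run on this forum. The zone name forum_dyn__cache, its path and sizes, the location pattern, and the one-minute lifetime are all hypothetical.

```nginx
# http section: a hypothetical zone for short-lived guest pages
fastcgi_cache_path /var/cache/nginx/dyn levels=1:2 keys_zone=forum_dyn__cache:4m max_size=1g inactive=10m;

# server section: hypothetical location for the forum's PHP scripts
location ~ \.php$ {
    fastcgi_pass forum__php_cluster;
    include /etc/nginx/fastcgi_params;
    include /etc/nginx/fastcgi_params_php-fpm;

    fastcgi_cache forum_dyn__cache;
    fastcgi_cache_key "$request_method:$host:$request_uri:";
    fastcgi_cache_valid 200 1m;

    # If the session cookie or the "s" URL argument is non-empty,
    # the request goes straight to PHP and the response is not stored.
    fastcgi_cache_bypass $cookie_bbsessionhash $arg_s;
    fastcgi_no_cache $cookie_bbsessionhash $arg_s;
}
```

Set-Cookie is deliberately not listed in fastcgi_ignore_headers here, so nginx will by default refuse to store any response that sets a cookie. How much would actually end up in the cache therefore depends on how often vBulletin sets cookies for sessionless guests, which is not measured here. A page generated for a guest may also embed links of the first type with a session hash in them, and caching such a page would hand the same hash to several visitors, so this needs checking before use.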
However, what we learned about the cookies is not useless. Based on it, we can build a key for limiting the request rate that depends not only on the visitor's IP address but also on their status.
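# Default: treat the visitor as anonymous
set $psUID "anon";
set $psUCL "anon";
# A vBulletin session cookie is present: key on the session hash
if ($cookie_bbsessionhash) {
    set $psUID "$cookie_bbsessionhash";
    set $psUCL "user";
}
# No session: key on the client IP address
if ($psUCL = "anon") {
    set $psUID "anon:$remote_addr";
}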
We place this fragment in the server section of the nginx config, before all the location blocks. As a result we get a unique key for each visitor who has a session, and an IP-based key for visitors without one (search crawlers, for example).
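As an alternative to the set/if fragment (and not together with it), the same key could probably be built with a map block in the http section. This is a sketch, not the config that was deployed. It drops the "anon:" prefix, because before nginx 1.11.0 a map value can be a single variable but not a mix of text and variables; an IP address and a session hash are not expected to collide anyway.

```nginx
# http section. Alternative sketch, not the config used above.
map $cookie_bbsessionhash $psUID {
    default $cookie_bbsessionhash;  # session present: key on its hash
    ""      $remote_addr;           # no session cookie: key on the client IP
}

limit_req_zone $psUID zone=lim_req_1s_zone:2m rate=1r/s;
```

The map is evaluated lazily, only when $psUID is actually used, and it avoids if blocks at the server level.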
Results
As a result of these efforts, the total load on the virtual machine dropped from a flat plateau at 90 percent to a sawtooth around 40 percent with bursts up to 80 percent.