Debugging RewriteRule rules, or a little about the intimate life of mod_rewrite

    RewriteEngine has always been a pretty stressful topic for me. Only recently did I discover that everything had somehow calmed down and become more or less clear. Since I am a completely ordinary person, I am sure I am not the only one who has been bitten by web server configuration errors, so I hasten to share my experience.

    The result is a cross between a guide to the mod_rewrite module and a kind of reference for configuring a web server through a .htaccess file. Along the way, I focus on the particularly complex or non-obvious points.

    It is assumed that the reader uses URL rewriting in their work, knows in general terms what RewriteEngine is, and has already spent several hours setting it up. This article is not really for beginners, but of course not for super-pros either.


    Initial data for experiments


    • All experiments are performed on the local host.
    • The Lampp server is installed.
    • Apache version: 2.4.9 (build for Unix).
    • The /opt/lampp/htdocs/bbb/_engine folder contains an experimental site on the engine.bbb.ru domain. This folder is the site root (DocumentRoot).
    • The root folder of the site contains only one page, ind.php.
    • The site has one subfolder, /opt/lampp/htdocs/bbb/_engine/local.
    • It contains one script file, ind1.php.


    Configure Virtual Hosts


    To make work and debugging easier, and to spare your nerves where possible, it is worth configuring the virtual hosts for convenience. Let us look at the simplest settings that will greatly simplify our lives.

    Configuring the logs

    Few people have only one domain on their local server. There are usually many. It is a good idea to separate the logs by domain and by day so that they do not grow too large. This is done in the <VirtualHost> section of the server settings.

    For the logs of our domain, add the following two lines.

      ErrorLog "|/opt/lampp/bin/rotatelogs /opt/lampp/logs/engine-bbb-error.%Y.%m.%d.log  86400"
      LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
    

    The first line sets the name of the error log for the virtual host and starts a new log file every 86400 seconds (once a day). rotatelogs is a program that normally ships with the Apache web server, so I hope you have it installed too.

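    The second line defines a log format named combined. Note that LogFormat describes the lines of the access log, not of the error log (the error log format is controlled by the separate ErrorLogFormat directive); the name combined is what the CustomLog directive below refers to. See the Apache server documentation for details. Everything is pretty clear there. For this article, it is enough to keep in mind that the format is customizable.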

    For the access log, I add just one line. The default line format usually suits me.

      CustomLog "|/opt/lampp/bin/rotatelogs /opt/lampp/logs/engine-bbb-access.%Y.%m.%d.log  86400" combined
    

    In both cases, pay attention to the paths to the web server programs and to the logs. Use the paths that actually exist on your machine.
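
    To see which file name the rotatelogs template produces, here is a minimal Python sketch. It assumes that rotatelogs derives the name from the start of the current rotation period and formats it in UTC (which, as far as I know, is the behavior when the -l option is not given). The helper name rotated_log_name is mine and is not part of Apache.

```python
from datetime import datetime, timezone

def rotated_log_name(pattern: str, unix_time: int, interval: int = 86400) -> str:
    """Expand a rotatelogs-style strftime pattern for the period containing unix_time."""
    # Assumption: the name is derived from the start of the current rotation period.
    period_start = unix_time - unix_time % interval
    return datetime.fromtimestamp(period_start, tz=timezone.utc).strftime(pattern)

# A request logged on 8 August 2015 at 15:41:38 UTC.
moment = int(datetime(2015, 8, 8, 15, 41, 38, tzinfo=timezone.utc).timestamp())
print(rotated_log_name("/opt/lampp/logs/engine-bbb-error.%Y.%m.%d.log", moment))
# prints /opt/lampp/logs/engine-bbb-error.2015.08.08.log
```

    With an interval of 86400 seconds, every request of the same UTC day ends up in the same file.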

    The Most General Information on How Everything Works


    Suppose we want to enable URL rewriting in some folder of our domain. To do this, we put the line

    RewriteEngine on
    

    in the .htaccess file that controls this folder. Below this line we have other directives and several rewrite rules.

    Suppose the server receives some URL as input. RewriteEngine starts checking this URL against the rules, from top to bottom, in order. If the input URL DOES NOT MATCH any rule, it "passes through". For example, suppose we have an index.php file in the root folder. If the input URI "/index.php" matches no rule, we see the output of this script in the browser.

    If we have the following rule

    RewriteRule ^index\.php$ / [L]
    

    then this rule obviously fires for the URI "index.php". The URI is rewritten to "/", and the new URI "/" is fed back to the server input. The whole process of applying the rules then starts over. Only when the URI "/" matches none of our rules do we see what we wanted. If it does match, it is rewritten again and everything repeats.

    How the [L] flag works


    This flag is probably the source of many misunderstandings. If a rule with this flag fires, the input URI is not checked against the rules that follow it. That is all. In other words, if our URI "index.php" matched the rule, then thanks to the [L] flag all further checks are skipped, the web server immediately rewrites "index.php" -> "/" and receives the URI "/" at its input ([INTERNAL REDIRECT]), and everything repeats from the beginning, from the first rule. If the flag is absent, the rewrite still happens and checking continues with the next rule, but against the already changed URI, namely "/".

    Understanding this process helps avoid many redirect loops.

    But wait: does the above mean that we save time and the page opens faster if we do not use the [L] flag? With [L], we hit the flag and must go through all the rules again without exception. Without [L], do we simply apply the rule that fired, run through the remaining rules to the end, and stop there?

    I checked. It does not work that way. Without the [L] flag, the module, as expected, rewrites the URI with the rule that fired, goes through all the remaining rules to the end, then performs an [INTERNAL REDIRECT] and still runs all the rules against the new URI again. This confirms what was written above. This behavior seems to have no exceptions.

    Conclusion: whenever a RewriteRule fires in .htaccess, an [INTERNAL REDIRECT] occurs and all the rules are applied again. This second pass starts either immediately after the rule with the [L] flag is applied or, if we work without [L] flags, after the end of the rule list is reached. The URL "passes through" (the log calls it "pass through") only if no rule has been applied. So the [L] flag really can reduce URI processing time and should be used wherever possible.
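
    The pass structure described above can be played with in a toy Python model. This is only a sketch of the behavior observed in this article, not Apache's implementation. It assumes that .htaccess sits in DocumentRoot, that substitutions are literal strings (no backreferences), and that the limit of 10 passes is an arbitrary choice. The function run_rules and the sample rules are hypothetical.

```python
import re

def run_rules(rules, uri, max_passes=10):
    """Toy model of .htaccess rule processing.

    rules: list of (pattern, substitution, has_l_flag) tuples.
    uri:   the string the first rule sees (per-dir prefix already stripped).
    Returns (final_uri, passes, pattern_checks).
    """
    checks = 0
    for passes in range(1, max_passes + 1):
        current, rewritten = uri, False
        for pattern, substitution, has_l_flag in rules:
            checks += 1
            if re.search(pattern, current):
                current, rewritten = substitution, True
                if has_l_flag:
                    break  # [L]: skip the remaining rules of this pass
        if not rewritten:
            return uri, passes, checks  # "pass through"
        # [INTERNAL REDIRECT]: the next pass starts again from the first rule.
        # Assumption: .htaccess is in DocumentRoot, so dropping the leading
        # slash stands in for "strip per-dir prefix".
        uri = current.lstrip("/")
    raise RuntimeError("too many internal redirects")

with_l = [(r"^$", "/ind.php", True), (r"^old$", "/new.php", True), (r"^foo$", "/bar.php", True)]
without_l = [(p, s, False) for p, s, _ in with_l]

print(run_rules(with_l, ""))     # ('ind.php', 2, 4)
print(run_rules(without_l, ""))  # ('ind.php', 2, 6)
```

    Both variants need two passes, but with [L] the first pass stops after one pattern check instead of three. A pair of rules that rewrite into each other (for example "^a$" to "/b" and "^b$" to "/a") never reaches "pass through" in this model and ends with the RuntimeError, which is the redirect loop mentioned above.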

    What is a RewriteBase?


    This directive, in my opinion, holds the record for incomprehensibility! I would give it a prize for that! So I have two stories about this beast, a short one and a long one. The short story is for those who do not want to bother with this directive. The long one is for those who are interested.

    Short story

    If you are doing relatively simple URL rewriting with .htaccess files, I recommend that you always do the following.

    • Do not use this directive at all.
    • In all rewrite rules, always start the target URL with a slash (indicating that the URI is relative to the site root).


    Long story

    When rewriting, the following happens.

    We have:
    • DocumentRoot: /opt/lampp/htdocs/bbb/_engine
    • The .htaccess file is located there, in DocumentRoot

    We request the URL engine.bbb.ru/ind.php
    • The rewriting engine maps the requested URL to its path in the file system, namely /opt/lampp/htdocs/bbb/_engine/ind.php
    • It removes the prefix /opt/lampp/htdocs/bbb/_engine/ from it (this is the path to the folder where .htaccess is located)
    • It applies the rewrite rules to the string “ind.php”

    If the /opt/lampp/htdocs/bbb/_engine/local folder has no .htaccess file, or has one in which RewriteEngine is not enabled
    • We request the URL engine.bbb.ru/local/ind1.php
    • The rewriting engine maps the requested URL to its path in the file system, namely /opt/lampp/htdocs/bbb/_engine/local/ind1.php
    • It removes the prefix /opt/lampp/htdocs/bbb/_engine/ from it
    • It applies the rewrite rules to the string “local/ind1.php”


    If the /opt/lampp/htdocs/bbb/_engine/local folder has a .htaccess file and RewriteEngine is enabled in it
    • We request the URL engine.bbb.ru/local/ind1.php
    • The rewriting engine maps the requested URL to its path in the file system, namely /opt/lampp/htdocs/bbb/_engine/local/ind1.php
    • It removes the prefix /opt/lampp/htdocs/bbb/_engine/local/ from it (this is the path to the folder where the .htaccess file of the /local directory is located)
    • It applies the rewrite rules to the string “ind1.php”

    Attention! This algorithm is always executed. It expresses the meaning of the term "per-dir", that is, the "per-directory" approach built into the Apache server. The value of the RewriteBase directive does not affect this algorithm.
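
    The prefix stripping from the three cases above fits in a few lines of Python. This is a sketch of the described behavior only. The function name strip_per_dir_prefix is my own, and leaving a path outside the directory unchanged is a simplifying assumption.

```python
def strip_per_dir_prefix(file_path: str, htaccess_dir: str) -> str:
    """Return the string that RewriteRule patterns are matched against."""
    prefix = htaccess_dir.rstrip("/") + "/"
    if file_path.startswith(prefix):
        return file_path[len(prefix):]
    return file_path  # not under this directory: left unchanged in this sketch

ROOT = "/opt/lampp/htdocs/bbb/_engine"

print(strip_per_dir_prefix(ROOT + "/ind.php", ROOT))                     # ind.php
print(strip_per_dir_prefix(ROOT + "/local/ind1.php", ROOT))              # local/ind1.php
print(strip_per_dir_prefix(ROOT + "/local/ind1.php", ROOT + "/local"))   # ind1.php
```

    Note that the result never starts with a slash, which is why patterns in .htaccess are written as ^ind\.php$ and not as ^/ind\.php$.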

    What does the RewriteBase directive affect?

    Remember well that the RewriteBase directive takes a URL path, not a file system path! You cannot specify "local/" there: that is an error! Only "/local" is allowed.

    Suppose that in our /opt/lampp/htdocs/bbb/_engine/local/.htaccess we specified

    RewriteBase /local
    

    We request the URL engine.bbb.ru/local/

    Then the rule

    RewriteRule ^$ ind1.php
    

    works, and the request goes to the URI /local/ind1.php.

    The rule

    RewriteRule ^$ /ind1.php
    

    also works, but the request goes to the URI /ind1.php. File not found! We have no such URI (relative to the site root)!

    Conclusion 1: The URL that we specify in RewriteBase is added as a prefix to the target URI if the target is relative, that is, has no leading slash.

    Conclusion 2: If we never use relative target uri in the rules, then we do not need the RewriteBase directive either!

    Conclusion 3: If we use “RewriteBase /”, then when the rule is triggered

    RewriteRule ^$ ind1.php
    

    an attempt is made to go to the URI /ind1.php: "/" is simply used as the prefix.
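
    The three conclusions can be condensed into one small Python function. It is a sketch of the observed behavior for per-directory rules, not Apache code. The name resolve_target is hypothetical, and substitutions that are full URLs (http://...) are deliberately not modeled.

```python
def resolve_target(substitution: str, rewrite_base: str) -> str:
    """Target URI of the internal redirect for a per-directory rule."""
    if substitution.startswith("/"):
        return substitution  # absolute target: RewriteBase is not used
    # Relative target: RewriteBase is added as a prefix.
    return rewrite_base.rstrip("/") + "/" + substitution

print(resolve_target("ind1.php", "/local"))   # /local/ind1.php
print(resolve_target("/ind1.php", "/local"))  # /ind1.php
print(resolve_target("ind1.php", "/"))        # /ind1.php
```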

    Another experiment (a hooligan one)
    We have the following rules in the root .htaccess:

    RewriteEngine on
    RewriteRule ^$ ind.php
    

    We request engine.bbb.ru

    If RewriteBase is a URL, then let us set

    RewriteBase http://bbb.ru
    

    No. It does not work. The error is "RewriteBase: argument is not a valid URL". Strange, right? But we do not give up! Let us change RewriteBase!

    RewriteBase //bbb.ru
    

    This time there is no error! What happens to the paths? A lot of interesting things!
    The server honestly obtains the path /opt/lampp/htdocs/bbb/_engine/, removes the prefix /opt/lampp/htdocs/bbb/_engine/ from it and works with an empty string ('').
    It comes across the rule and replaces the empty string with 'ind.php'.
    It honestly adds the prefix "//bbb.ru" and goes on to the next pass. This second pass is equivalent to requesting engine.bbb.ru//bbb.ru/ind.php, which is not at all what we wanted (the original idea was to jump to another site). In short, the idea did not work out. As a result, we get a 404 error, which is logical. By the way, "//" was replaced by "/" during the rewriting. The trace of this example is given further below.


    How did I get all this breathtaking information about the intimate life of the Apache server? Or finally about debugging


    Really! How did I see the errors that the URL rewriting engine makes? After all, this is exactly what debugging is! There is a very useful directive that I inserted into the virtual host section for the engine.bbb.ru domain. Namely

      LogLevel warn rewrite:trace4
    

    After adding it, I restarted Apache. From that moment, trace lines from the rewrite module began to appear in the domain's error log, namely in the file /opt/lampp/logs/engine-bbb-error.2015.08.08.log. There are a lot of lines. Why trace4? Could trace3 be used instead? It could. But then it would be impossible to debug RewriteCond: there would be no detailed information about what is being compared with which pattern, and information about some other events (not so much important as interesting) would disappear.

    What is warn? Literally, our LogLevel entry means that the log level is warn for all modules and trace4 for the rewrite module only.

    What do we get as a result of enabling debugging?

    We get a trace, that is, a very, very detailed log. There really are a lot of trace lines. If I get into a real mess with the rules, nothing works after a while of torment, and I decide to enable tracing, then in my .htaccess I disable all the rules that do not relate to the URL being tested by putting the comment sign "#" in front of them. After that, I reload the page that does not work and look for the relevant lines in the log.

    Here is a rewriting trace under the following conditions:

    Request:
    http://engine.bbb.ru/

    Rules:

    RewriteEngine on
    RewriteRule ^$ /ind.php [L]
    


    [Sat Aug 08 15:41:38.664920 2015] [rewrite:trace3] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd7890/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] strip per-dir prefix: /opt/lampp/htdocs/bbb/_engine/ -> 
    [Sat Aug 08 15:41:38.664955 2015] [rewrite:trace3] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd7890/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] applying pattern '^$' to uri ''
    [Sat Aug 08 15:41:38.664960 2015] [rewrite:trace2] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd7890/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] rewrite '' -> '/ind.php'
    [Sat Aug 08 15:41:38.664966 2015] [rewrite:trace1] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd7890/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] internal redirect with /ind.php [INTERNAL REDIRECT]
    [Sat Aug 08 15:41:38.665040 2015] [rewrite:trace3] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dde8b8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] strip per-dir prefix: /opt/lampp/htdocs/bbb/_engine/ind.php -> ind.php
    [Sat Aug 08 15:41:38.665044 2015] [rewrite:trace3] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dde8b8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] applying pattern '^$' to uri 'ind.php'
    [Sat Aug 08 15:41:38.665046 2015] [rewrite:trace1] [pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dde8b8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] pass through /opt/lampp/htdocs/bbb/_engine/ind.php
    


    Trace of the hooligan example from the long story
    [Sat Aug 08 15:09:37.475389 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] strip per-dir prefix: /opt/lampp/htdocs/bbb/_engine/ -> 
    [Sat Aug 08 15:09:37.475406 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] applying pattern '^$' to uri ''
    [Sat Aug 08 15:09:37.475411 2015] [rewrite:trace2] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] rewrite '' -> 'ind.php'
    [Sat Aug 08 15:09:37.475414 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] add per-dir prefix: ind.php -> /opt/lampp/htdocs/bbb/_engine/ind.php
    [Sat Aug 08 15:09:37.475418 2015] [rewrite:trace2] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] trying to replace prefix /opt/lampp/htdocs/bbb/_engine/ with //bbb.ru
    [Sat Aug 08 15:09:37.475420 2015] [rewrite:trace4] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] add subst prefix: ind.php -> //bbb.ru/ind.php
    [Sat Aug 08 15:09:37.475422 2015] [rewrite:trace1] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#de2740/initial] [perdir /opt/lampp/htdocs/bbb/_engine/] internal redirect with //bbb.ru/ind.php [INTERNAL REDIRECT]
    [Sat Aug 08 15:09:37.475469 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd8dc8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] add path info postfix: /opt/lampp/htdocs/bbb/_engine/bbb.ru -> /opt/lampp/htdocs/bbb/_engine/bbb.ru/ind.php
    [Sat Aug 08 15:09:37.475473 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd8dc8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] strip per-dir prefix: /opt/lampp/htdocs/bbb/_engine/bbb.ru/ind.php -> bbb.ru/ind.php
    [Sat Aug 08 15:09:37.475476 2015] [rewrite:trace3] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd8dc8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] applying pattern '^$' to uri 'bbb.ru/ind.php'
    [Sat Aug 08 15:09:37.475478 2015] [rewrite:trace1] [pid 21775] mod_rewrite.c(475): [client 127.0.0.1:45327] 127.0.0.1 - - [engine.bbb.ru/sid#85a910][rid#dd8dc8/initial/redir#1] [perdir /opt/lampp/htdocs/bbb/_engine/] pass through /opt/lampp/htdocs/bbb/_engine/bbb.ru
    



    Note that each line is marked with its level (rewrite:traceN). So if you find a line in the log that you especially need and want to see only lines of that kind, you can change the trace level, restart Apache and repeat the request. Personally, I do not think this makes the task much easier. In my opinion, it is much easier to first copy the lines into a separate file, selecting them by the time of the request (to the minute), and then pick out the needed lines by stripping the unnecessary information (search and replace). At first I even thought of writing a PHP tool for viewing logs of this kind. But then the need disappeared by itself (more on this below).
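
    The "strip the unnecessary information" step is easy to automate. Below is a minimal Python sketch that keeps only the trace level, the request id and the message of each line. The regular expression is based solely on the trace lines shown in this article, so it is an assumption that other Apache builds format them the same way. The names TRACE_LINE and summarize are mine.

```python
import re

TRACE_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[rewrite:trace(?P<level>\d)\] .*?"
    r"\[rid#(?P<rid>[^\]]+)\] (?:\[perdir (?P<perdir>[^\]]+)\] )?(?P<message>.*)$"
)

def summarize(lines, max_level=4):
    """Return 'traceN rid message' strings for rewrite trace lines up to max_level."""
    result = []
    for line in lines:
        match = TRACE_LINE.match(line.rstrip("\n"))
        if match and int(match["level"]) <= max_level:
            message = match["message"].strip()
            result.append(f"trace{match['level']} {match['rid']} {message}")
    return result

HEAD = ("[pid 21776] mod_rewrite.c(475): [client 127.0.0.1:45382] 127.0.0.1 - - "
        "[engine.bbb.ru/sid#85a910]")
PERDIR = "[perdir /opt/lampp/htdocs/bbb/_engine/]"
sample = [
    f"[Sat Aug 08 15:41:38.664955 2015] [rewrite:trace3] {HEAD}[rid#dd7890/initial] "
    f"{PERDIR} applying pattern '^$' to uri ''",
    f"[Sat Aug 08 15:41:38.664960 2015] [rewrite:trace2] {HEAD}[rid#dd7890/initial] "
    f"{PERDIR} rewrite '' -> '/ind.php'",
    f"[Sat Aug 08 15:41:38.665046 2015] [rewrite:trace1] {HEAD}[rid#dde8b8/initial/redir#1] "
    f"{PERDIR} pass through /opt/lampp/htdocs/bbb/_engine/ind.php",
]

for row in summarize(sample, max_level=2):
    print(row)
```

    With max_level=2 only the rewrite and pass-through events remain, and the rid column shows which lines belong to the initial request and which to the internal redirect (redir#1).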

    Debugging applies only to the virtual host for which it is enabled

    If the engine.bbb.ru domain uses external CSS styles that are served from the bbb.ru domain, and that is where the problem lies, then you need to enable debugging not in the engine.bbb.ru virtual host but in the bbb.ru virtual host. All requests to the bbb.ru domain must then be looked up in the error logs (not the access logs!) of the bbb.ru domain. In this case, the requests for the traced objects will not appear in the access logs at all!

    And can you do without such a stressful RewriteEngine altogether?


    You can switch to a single script for the whole site and do all the rewriting in it. This is easier to do in PHP, and debugging is much easier. In addition to the obvious advantages for site security, we get rewriting without the hassle. To switch to such a scheme, our .htaccess should look something like this:

    RewriteEngine on
    # redirect rule from "www" to "without www"
    RewriteCond %{HTTP_HOST} ^www\.our-site\.ru$
    RewriteRule ^(.*)$ http://our-site.ru/$1 [R=301,L]
    # only 4 specific files bypass the rule.
    RewriteCond %{REQUEST_URI} !favicon\.ico$
    RewriteCond %{REQUEST_URI} !robots\.txt$
    RewriteCond %{REQUEST_URI} !sitemap\.xml$
    RewriteCond %{REQUEST_URI} !^/dispatch\.php$
    RewriteRule ^.*$ /dispatch.php [L]
    


    And in dispatch.php itself, I strongly advise you not to forget to forbid direct calls to dispatch.php.
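
    The four RewriteCond lines above can be checked outside Apache with a short Python sketch. It assumes that Python's re module treats these simple patterns the same way as Apache's regex engine and that the request did not hit the www redirect. The function goes_to_dispatch and the sample URIs are hypothetical.

```python
import re

# The four negated RewriteCond patterns from the .htaccess above.
EXCLUDED = [r"favicon\.ico$", r"robots\.txt$", r"sitemap\.xml$", r"^/dispatch\.php$"]

def goes_to_dispatch(request_uri: str) -> bool:
    """True if 'RewriteRule ^.*$ /dispatch.php [L]' would fire for this REQUEST_URI."""
    # Each '!pattern' condition holds when the pattern does NOT match.
    # Conditions are ANDed, so all four must hold.
    return not any(re.search(pattern, request_uri) for pattern in EXCLUDED)

for uri in ["/", "/catalog/item/5", "/robots.txt", "/img/favicon.ico", "/dispatch.php"]:
    print(uri, goes_to_dispatch(uri))
```

    Two things are worth noticing. The first three patterns are anchored only at the end, so /img/favicon.ico bypasses the rule as well. And /dispatch.php itself bypasses the rule, which is what stops the second pass after the internal redirect, but it also means that a direct request for /dispatch.php reaches the script unchanged, hence the advice above.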



    If you decide to adopt this approach, I recommend giving the dispatch.php script some other name. I used this name only for clarity.

    By the way, this approach is being adopted quite actively. For that we should thank the spread of human-readable URLs (URLs that people can understand, although personally I find them very hard to understand). It is already in use in practically all modern engines.

    Contents of the <VirtualHost> section for engine.bbb.ru that I used
    
      ServerAdmin webmaster@serv1.ru
      DocumentRoot "/opt/lampp/htdocs/bbb/_engine"
      ServerName "engine.bbb.ru"
      ServerAlias "www.engine.bbb.ru"  
      ScriptAlias /cgi/ "/opt/lampp/cgi-bin/"
      ScriptAlias /cgi-bin/ "/opt/lampp/cgi-bin/"
      ErrorLog "|/opt/lampp/bin/rotatelogs /opt/lampp/logs/engine-bbb-error.%Y.%m.%d.log  86400"
      LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
      CustomLog "|/opt/lampp/bin/rotatelogs /opt/lampp/logs/engine-bbb-access.%Y.%m.%d.log  86400" combined
    #  Uncomment the following line to enable debugging.
    #  LogLevel warn rewrite:trace4   
    

