PHP class for convenient and safe working with MySQL
After writing an article about protection against injections, I set about writing a class that implements the ideas contained in it.
More precisely, since the key functionality has already been used within the framework of the working framework, I began to separate it into an independent class. I take this opportunity to thank the participants of PHPClub for their help in fixing several critical errors and useful comments. Below I will try to describe the main features, but first a small
In a nutshell, the class is built around a set of helper functions that allow you to perform most database operations on one line, while providing (unlike standard APIs) full protection against SQL injection, implemented using an extended set of placeholders that protect any data type, which may get a request.
The class is based on three basic principles:
I will dwell a little more on each of the points.
provided by the very two rules that I formulated in the article:
Unfortunately, standard libraries do not provide full protection against injection, protecting only strings and numbers with prepared statements.
Therefore, to make the defense complete, I had to abandon the obviously limited concept of prepared statements in favor of a broader concept - placeholders. Moreover, placeholders are typed (this thing is known to all of us by the printf () family of functions:% d is a placeholder that tells the parser how to process the substituted value, in this case, as an integer). The innovation turned out to be so successful that it simultaneously solved many problems and greatly simplified the code. I’ll write more about typed placeholders below.
Support for whitelisting filtering is provided by two functions, somewhat far-fetched, but nevertheless necessary.
Here, typed placeholders also helped me a lot, which made it possible to make function calls single-line, transferring immediately both the request and the data for it. Plus a set of helpers resembling those in PEAR :: DB - functions that immediately return the result of the desired type. All helpers are organized in the same way: one required parameter is passed to the function - a request with placeholders, and as many optional parameters as necessary, the number and order of which must coincide with the number and order of placeholders in the request. The functions of the Ind family use one more required parameter - the name of the field by which the returned array is indexed.
Based on my experience, I came to the following set of return values (and, as a result, helpers):
As a result, most of the database accesses come down to one or two lowercase constructions (instead of 5-10 with the traditional approach):
There are only necessary and significant elements in this code, but there is nothing superfluous and repetitive. All giblets are neatly hidden inside the class: the getAll () helper allows you to immediately get the desired result without writing cycles in the application code, and typed placeholders allow you to safely add dynamic elements of any type to the request without writing the bind_param manually. Extra DRY code! In cases of using placeholders? A and? U, the difference in the amount of code becomes even greater:
stand on three pillars:
As a result, all complex queries are collected in the old fashioned way - for example, in a loop - but at the same time, subject to all safety rules!
I’ll give a small example (more complicated examples can be found in the documentation at the link at the bottom of the article):
A fairly common case is when we need to add a condition to the query if there is a variable
It is important to note a few points.
Firstly, since we are not connected by the native API, no one forbids us to parse not the entire request, but only part of it. This turns out to be super-convenient for requests that are collected according to some logic: we parse only part of the request, and then it is inserted into the main request through a special “idle” placeholder to avoid repeated parsing (and observe the rule “any elements are substituted only through placeholder ").
But, unfortunately, this is the weak point of the whole class. Unlike all other placeholders (which, even if used incorrectly, will never lead to an injection), the incorrect use of the placeholder? P can lead to it.
However, protection from a fool would greatly complicate the class, but at the same time, it would not protect against stupidly inserting a variable into the query string. Therefore, I decided to leave it as it is. But if you know a way to solve this problem without too much over-engineering, I would be grateful for the ideas.
However, in the end we got a powerful and lightweight query generator that more than justifies this small flaw.
Powerful because we are not limited by the syntax of the build builder, “SQL written in PHP” - we write pure SQL.
Easy because the entire query compilation API consists of half a dozen placeholders and the parse () function.
Here is my favorite example - insert using Mysql functions
On the one hand, we save the SQL syntax, on the other, we make it safe, and on the third, we completely reduce the amount of code.
First, answer the question, why placeholders in general?
This, in general, is already a common place, but, nevertheless, I repeat - any dynamic data should get into the request only through placeholders for the following reasons:
Developing this concept further, we come to the conclusion that paisholders must be typed . But why?
Here I would like to stop briefly and trace the history of the development of programmer's thought in the field of protection against injections.
At first there was chaos - no protection at all, we shove everything as it is.
Then it didn’t get much better, with the paradigm “we will detach everything that came to the script from the user” and the climax in the form of magic quotes.
Further, the best minds came to the conclusion that it is right to speak not about screening, but about formatting. Since formatting is not always reduced to one iskeeping. So the quote () method appeared in PDO, which did the complete formatting of the string - not only escaped special characters in it, but also enclosed it in quotation marks, not hoping for a programmer. As a result, even if the programmer used this function out of place (for example, for a number), the injection still did not pass (and in the case of naked escaping through mysql_real_escape_string, it easily passes if we put a number in the query without quoting it ) Being used to format the identifier, this function led to an error at the development stage, which prompted the author of the code that he was a little wrong.
Unfortunately, the authors of PDO stopped at this, because the idea that formatting in the request only strings is still firmly held in the minds of developers. But in fact, the request has a lot more elements of the most various types. And each needs its own type of formatting! That is, the only quote () method doesn’t suit us at all - you need a lot of different quotes. And not as an exception, “here you have quoteName (),” but as one of the main concepts: each type has its own format. Well, since there are many types of formatting - the type must be specified somehow. And a typed placeholder is best for this.
In addition, a typed placeholder is VERY convenient!
Firstly, because it becomes unnecessary to use a special operator to bind a value to a placeholder (but it’s still possible to specify the type of value to be transmitted!)
Secondly, since we invented a typed placeholder, we can add a huge amount of these placeholders to solve a lot of routine tasks for compiling SQL queries.
First of all, we will create a placeholder for identifiers - we are desperately lacking it in real life, and not imagined by the authors of the standard APIs. As soon as the developer is faced with the need to dynamically add a field name to the request, each one begins to pervert in his own way, some to the forest, some to firewood. Here, everything is unified with the rest of the query elements, and adding an identifier becomes no more difficult than adding a row. But at the same time, the identifier is formatted not as a string, but in accordance with its own rules - it is enclosed in inverted commas, and inside these quotes are escaped by doubling.
Further more. The next headache of any developer who has ever tried to use the standard prepared statements in real life is the IN () operator. Voila, we have a placeholder for this operation too! Array substitution becomes no more complicated than any other elements, plus it is unified with them - no separate functions, only the letter in the placeholder changes.
In exactly the same way we do a placeholder for SET. I can not resist and demonstrate how simple the code becomes for such a confused query as INSERT ... ON DUPLICATE:
The class currently supports 6 types of placeholders
Which is quite enough for my tasks, but this set can always be expanded with any other placeholders, for example, for fractional numbers. I don’t see the point of making a separate placeholder for NULL - you can always enter it directly into the request.
I decided not to automatically translate PHP null to SQL null. Perhaps this will complicate the code a little (in those rare cases when it is needed), but it will reduce its ambiguity.
By the way, as many could notice, this class is much like the DbSimple library of Dmitry Koterov. But I have fundamental differences with some of the ideas embedded in it.
Firstly, I am opposed to any kind of magic, when the same function can return a different result depending on the type of data transferred. This, perhaps, makes writing a little easier, but at the same time it is monstrously difficult to maintain and debug the code. Therefore, in my class, all magic is minimized, and all operations and data types are always explicitly written.
Secondly, in DbSimple a little, in my opinion, an overcomplicated syntax. On the one hand, braces are a great idea. On the other hand, why, if we have all the power of PHP at our disposal? Therefore, I decided to go the other way and introduced the “internal” - obviously limited — logic to the “external”, limited only by the PHP syntax. The main thing is that any dynamic elements get into the request only through placeholders, and the rest depends only on the developer's imagination (and the parse () function).
The class code is available at Github, github.com/colshrapnel/safemysql/blob/master/safemysql.class.php
Cheat sheet with basic commands and examples: phpfaq.ru/misc/safemysql_cheatsheet_en.pdf
A good idea of the possibilities available on the documentation page examples (unfortunately, not yet complete), phpfaq.ru/safemysql
There are answers to frequently asked questions such as "why do not you use native prepared statements?» And so forth.
But less, I will be glad to answer any questions in the comments, as well as improve on your comments both the class itself and this article.
More precisely, since the key functionality has already been used within the framework of the working framework, I began to separate it into an independent class. I take this opportunity to thank the participants of PHPClub for their help in fixing several critical errors and useful comments. Below I will try to describe the main features, but first a small
disclaimer
There are several ways to work with SQL - you can use the q-builder, you can use ORM, you can work with pure SQL. I chose the last option, because it is closer to me. I do not think the first two are bad at all. It's just that I personally have always been closely within their framework. But by no means do I claim that my option is better. This is just another option. Which can be used, among other things, when writing an ORM. In any case, I believe that having a safe way to work with pure SQL cannot do any harm. But at the same time, it may help the last remaining adherents of using mysql_ * in the application code to finally abandon this vicious practice.
In a nutshell, the class is built around a set of helper functions that allow you to perform most database operations on one line, while providing (unlike standard APIs) full protection against SQL injection, implemented using an extended set of placeholders that protect any data type, which may get a request.
The class is based on three basic principles:
- 100% protection against SQL injection
- At the same time, the protection is very convenient to use, making the code shorter and not longer
- Versatility, portability and ease of development
I will dwell a little more on each of the points.
Security
provided by the very two rules that I formulated in the article:
- Any - without exception! - dynamic elements get into the request only through placeholders.
- Everything that cannot be substituted through placeholders is first run through the white list.
Unfortunately, standard libraries do not provide full protection against injection, protecting only strings and numbers with prepared statements.
Therefore, to make the defense complete, I had to abandon the obviously limited concept of prepared statements in favor of a broader concept - placeholders. Moreover, placeholders are typed (this thing is known to all of us by the printf () family of functions:% d is a placeholder that tells the parser how to process the substituted value, in this case, as an integer). The innovation turned out to be so successful that it simultaneously solved many problems and greatly simplified the code. I’ll write more about typed placeholders below.
Support for whitelisting filtering is provided by two functions, somewhat far-fetched, but nevertheless necessary.
Convenience and brevity of application code
Here, typed placeholders also helped me a lot, which made it possible to make function calls single-line, transferring immediately both the request and the data for it. Plus a set of helpers resembling those in PEAR :: DB - functions that immediately return the result of the desired type. All helpers are organized in the same way: one required parameter is passed to the function - a request with placeholders, and as many optional parameters as necessary, the number and order of which must coincide with the number and order of placeholders in the request. The functions of the Ind family use one more required parameter - the name of the field by which the returned array is indexed.
Based on my experience, I came to the following set of return values (and, as a result, helpers):
- query () - returns mysqli resource. It can be used traditionally with fetch (), etc.
- getOne () - returns the scalar, the first element of the first line of the result
- getRow () - one-dimensional array, the first row of the result
- getCol () - one-dimensional array of scalars - table column
- getAll () - two-dimensional array indexed by numbers in order
- getInd () - two-dimensional array indexed by the values of the field specified by the first parameter
- getIndCol () - an array of scalars indexed by the field from the first parameter. Indispensable for compiling dictionaries of the form key => value
As a result, most of the database accesses come down to one or two lowercase constructions (instead of 5-10 with the traditional approach):
$data = $db->getAll("SELECT * FROM ?n WHERE mod=?s LIMIT ?i",$table,$mod,$limit);
There are only necessary and significant elements in this code, but there is nothing superfluous and repetitive. All giblets are neatly hidden inside the class: the getAll () helper allows you to immediately get the desired result without writing cycles in the application code, and typed placeholders allow you to safely add dynamic elements of any type to the request without writing the bind_param manually. Extra DRY code! In cases of using placeholders? A and? U, the difference in the amount of code becomes even greater:
$data = $db->getAll("SELECT * FROM table WHERE category IN (?a)",$ids);
Versatility and ease of development
stand on three pillars:
- A very small API - half a dozen placeholders and as many helpers.
- We are working with good old SQL, which does not need to be re-taught.
- At first glance, the inconspicuous but incredibly useful function parse (), which was originally intended only for debugging, but eventually grew to a key element in the preparation of complex queries.
As a result, all complex queries are collected in the old fashioned way - for example, in a loop - but at the same time, subject to all safety rules!
I’ll give a small example (more complicated examples can be found in the documentation at the link at the bottom of the article):
A fairly common case is when we need to add a condition to the query if there is a variable
$sqlpart = '';
if (!empty($var)) {
$sqlpart = $db->parse(" AND field = ?s", $var);
}
$data = $db->getAll("SELECT * FROM table WHERE a=?i ?p", $id, $sqlpart);
It is important to note a few points.
Firstly, since we are not connected by the native API, no one forbids us to parse not the entire request, but only part of it. This turns out to be super-convenient for requests that are collected according to some logic: we parse only part of the request, and then it is inserted into the main request through a special “idle” placeholder to avoid repeated parsing (and observe the rule “any elements are substituted only through placeholder ").
But, unfortunately, this is the weak point of the whole class. Unlike all other placeholders (which, even if used incorrectly, will never lead to an injection), the incorrect use of the placeholder? P can lead to it.
However, protection from a fool would greatly complicate the class, but at the same time, it would not protect against stupidly inserting a variable into the query string. Therefore, I decided to leave it as it is. But if you know a way to solve this problem without too much over-engineering, I would be grateful for the ideas.
However, in the end we got a powerful and lightweight query generator that more than justifies this small flaw.
Powerful because we are not limited by the syntax of the build builder, “SQL written in PHP” - we write pure SQL.
Easy because the entire query compilation API consists of half a dozen placeholders and the parse () function.
Here is my favorite example - insert using Mysql functions
$data = array('field'=>$value,'field2'=>$value);
$sql = "INSERT INTO table SET ts=unix_timestamp(), ip=inet_aton(?s),?u";
$db->query($sql, $ip, $data);
On the one hand, we save the SQL syntax, on the other, we make it safe, and on the third, we completely reduce the amount of code.
More about typed placeholders
First, answer the question, why placeholders in general?
This, in general, is already a common place, but, nevertheless, I repeat - any dynamic data should get into the request only through placeholders for the following reasons:
- the most important thing is safety. By adding a variable through the placeholder, we can be sure that it will be formatted correctly.
- formatting locality. This is no less important point. Firstly, the data is formatted immediately before entering the request, and does not affect the original variable, which can then be used elsewhere. Secondly, the data is formatted exactly where it is needed, and not before the start of the script, as with magic quotes, and not in ten possible places of the code by several developers, each of which can rely on the other.
Developing this concept further, we come to the conclusion that paisholders must be typed . But why?
Here I would like to stop briefly and trace the history of the development of programmer's thought in the field of protection against injections.
At first there was chaos - no protection at all, we shove everything as it is.
Then it didn’t get much better, with the paradigm “we will detach everything that came to the script from the user” and the climax in the form of magic quotes.
Further, the best minds came to the conclusion that it is right to speak not about screening, but about formatting. Since formatting is not always reduced to one iskeeping. So the quote () method appeared in PDO, which did the complete formatting of the string - not only escaped special characters in it, but also enclosed it in quotation marks, not hoping for a programmer. As a result, even if the programmer used this function out of place (for example, for a number), the injection still did not pass (and in the case of naked escaping through mysql_real_escape_string, it easily passes if we put a number in the query without quoting it ) Being used to format the identifier, this function led to an error at the development stage, which prompted the author of the code that he was a little wrong.
Unfortunately, the authors of PDO stopped at this, because the idea that formatting in the request only strings is still firmly held in the minds of developers. But in fact, the request has a lot more elements of the most various types. And each needs its own type of formatting! That is, the only quote () method doesn’t suit us at all - you need a lot of different quotes. And not as an exception, “here you have quoteName (),” but as one of the main concepts: each type has its own format. Well, since there are many types of formatting - the type must be specified somehow. And a typed placeholder is best for this.
In addition, a typed placeholder is VERY convenient!
Firstly, because it becomes unnecessary to use a special operator to bind a value to a placeholder (but it’s still possible to specify the type of value to be transmitted!)
Secondly, since we invented a typed placeholder, we can add a huge amount of these placeholders to solve a lot of routine tasks for compiling SQL queries.
First of all, we will create a placeholder for identifiers - we are desperately lacking it in real life, and not imagined by the authors of the standard APIs. As soon as the developer is faced with the need to dynamically add a field name to the request, each one begins to pervert in his own way, some to the forest, some to firewood. Here, everything is unified with the rest of the query elements, and adding an identifier becomes no more difficult than adding a row. But at the same time, the identifier is formatted not as a string, but in accordance with its own rules - it is enclosed in inverted commas, and inside these quotes are escaped by doubling.
Further more. The next headache of any developer who has ever tried to use the standard prepared statements in real life is the IN () operator. Voila, we have a placeholder for this operation too! Array substitution becomes no more complicated than any other elements, plus it is unified with them - no separate functions, only the letter in the placeholder changes.
In exactly the same way we do a placeholder for SET. I can not resist and demonstrate how simple the code becomes for such a confused query as INSERT ... ON DUPLICATE:
$data = array('offers_in' => $in, 'offers_out' => $out);
$sql = "INSERT INTO stats SET pid=?i,dt=CURDATE(),?u ON DUPLICATE KEY UPDATE ?u";
$db->query($sql,$pid,$data,$data);
The class currently supports 6 types of placeholders
- ? s ("string") - strings (as well as DATE, FLOAT and DECIMAL).
- ? i ("integer") are integers.
- ? n ("name") - field and table names
- ? p ("parsed") - to insert already processed parts of the request
- ? a ("array") - a set of values for IN (a string of the form 'a', 'b', 'c')
- ? u ("update") - a set of values for SET (a string of the form `field` = 'value',` field` = 'value')
Which is quite enough for my tasks, but this set can always be expanded with any other placeholders, for example, for fractional numbers. I don’t see the point of making a separate placeholder for NULL - you can always enter it directly into the request.
I decided not to automatically translate PHP null to SQL null. Perhaps this will complicate the code a little (in those rare cases when it is needed), but it will reduce its ambiguity.
By the way, as many could notice, this class is much like the DbSimple library of Dmitry Koterov. But I have fundamental differences with some of the ideas embedded in it.
Firstly, I am opposed to any kind of magic, when the same function can return a different result depending on the type of data transferred. This, perhaps, makes writing a little easier, but at the same time it is monstrously difficult to maintain and debug the code. Therefore, in my class, all magic is minimized, and all operations and data types are always explicitly written.
Secondly, in DbSimple a little, in my opinion, an overcomplicated syntax. On the one hand, braces are a great idea. On the other hand, why, if we have all the power of PHP at our disposal? Therefore, I decided to go the other way and introduced the “internal” - obviously limited — logic to the “external”, limited only by the PHP syntax. The main thing is that any dynamic elements get into the request only through placeholders, and the rest depends only on the developer's imagination (and the parse () function).
The class code is available at Github, github.com/colshrapnel/safemysql/blob/master/safemysql.class.php
Cheat sheet with basic commands and examples: phpfaq.ru/misc/safemysql_cheatsheet_en.pdf
A good idea of the possibilities available on the documentation page examples (unfortunately, not yet complete), phpfaq.ru/safemysql
There are answers to frequently asked questions such as "why do not you use native prepared statements?» And so forth.
But less, I will be glad to answer any questions in the comments, as well as improve on your comments both the class itself and this article.