xenon January 20, 2015 at 20:45

evalidate: securely handling custom expressions

Why it is needed

Filtering of one kind or another is everywhere. For example, the netfilter (iptables) firewall has its own syntax for describing packets. The Apache .htaccess file has its own language for deciding who gets access to a directory and who does not. A DBMS has its own very powerful language (SQL WHERE ...) for filtering records. Mail programs (Thunderbird, Gmail) have their own interface for describing filters that sort messages into folders.

And everywhere it is a reinvented bicycle.

In an accounting program, it may be convenient to let the user choose who gets a raise (all women, plus men aged 25 to 32, or up to 50 if the man is named Vasya), and to define the size of the raise for each matching employee in the user's own terms (+2000 rubles, +20% of the previous salary, +1000 rubles for each year of service).

For an online store (or its admin panel): find all laptops with 4 to 8 GB of memory and more than 3 in stock, excluding Acer, unless the Acer costs less than 30,000 rubles.

Of course, you can build your own complex system of filters and criteria and make a web interface for them, but wouldn't it be easier to do everything in a couple of lines?

src="(RAM>=4 and RAM<=8 and stock>3 and not brand=='Acer') or (brand=='Acer' and price<30000)"
success, result = evalidate.safeeval(src,notebook)

Tempting, but risky

The obvious way to add arbitrary logic to a program is eval(). It is the simplest and most flexible solution, but it has one big pitfall: security. What if the user's expression calls os.system('rm -rf /')?
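To make the danger concrete, here is a minimal sketch that uses only the standard library. The string user_expr is a hypothetical stand-in for user input and is deliberately harmless (it only asks for the process id), but the same route would reach os.system just as easily.

```python
import os

# Hypothetical stand-in for text typed by a user; deliberately harmless.
user_expr = "__import__('os').getpid()"

# eval() runs the string with full access to the interpreter.
result = eval(user_expr)

# Emptying __builtins__ blocks this direct route (the name __import__ is no
# longer found), but the articles linked below describe ways around it.
try:
    eval(user_expr, {"__builtins__": {}})
    blocked = False
except NameError:
    blocked = True
```

So a plain eval() on user text is out of the question, and stripping builtins should be treated as a speed bump at best, not as a sandbox.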

Examples of how to break Python through eval():
stackoverflow.com/questions/13066594/is-there-a-way-to-secure-strings-for-pythons-eval
nedbatchelder.com/blog/201206/eval_really_is_dangerous.html (translation on Habr: habrahabr.ru/post/221937)
tav.espians.com/a-challenge-to-break-python-security.html

The right way

Advice on this topic often recommends the “right way”: use Python itself to parse the code from text into an AST, and then walk that tree yourself, separating the wheat from the chaff. But how? And here the main problem of bicycle building enters the arena: by the time you find a suitable bicycle, or at least a good blueprint, it is easier to invent the bicycle yourself.
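The idea of parsing and then walking the tree can be sketched roughly as follows. This is not evalidate's actual code: the names ALLOWED_NODES, check_expression and safe_filter are made up for the illustration, the whitelist is only an example, and Python 3.8+ is assumed (literals appear as ast.Constant).

```python
import ast

# Illustrative whitelist (an assumption, not evalidate's real list):
# boolean logic, comparisons, names and literals only.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Name, ast.Load, ast.Constant,
)


def check_expression(src):
    """Parse src and return its AST; raise ValueError on anything not whitelisted."""
    try:
        tree = ast.parse(src, mode="eval")
    except SyntaxError as exc:
        raise ValueError("cannot parse: {}".format(exc))
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("operation not allowed: " + type(node).__name__)
    return tree


def safe_filter(src, record):
    """Evaluate a checked expression; record is a dict supplying the variables."""
    tree = check_expression(src)
    code = compile(tree, "<user expression>", "eval")
    return eval(code, {"__builtins__": {}}, record)
```

With this whitelist, a function call such as __import__('os').system('ls') is rejected before anything is executed, because the Call node is not in the list.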

Evalidate

Meet evalidate, my little bicycle for this purpose. Some may find it useful as is (I tried to make it flexible enough), and for everyone else the source code can serve as an example of how this problem can be solved (or, of course, of how not to write code).

Install it with pip:

pip install evalidate

A simple example:

We put a text search field on a bookstore website and let users search for books by any available criteria in any combination. (The value is passed in the src variable; here it is hardcoded so that the example does not need a whole web application, but it is safe to take it from the user's request.) Instead of separate buttons for “books that are out of stock”, “cheap books”, “expensive books”, or “books by authors who died before World War II and lived in Australia or any African country, of which we have more than 10 copies, and which cost less than $1 per 100 pages”, there is just one text field.

import evalidate
depot = [
    {
        'book': 'Sirens of Titan',
        'price': 12,
        'stock': 4
    },
    {
        'book': 'Gone Girl',
        'price': 9.8,
        'stock': 0
    },
    {
        'book': 'Choke',
        'price': 14,
        'stock': 2
    },
    {
        'book': 'Pulp',
        'price': 7.45,
        'stock': 4
    }
]
#src='stock==0' # books out of stock
src='stock>0 and price>8' # expensive books that are in stock
for book in depot:
    success, result = evalidate.safeeval(src,book)
    if success:
        if result:
            print(book)
    else:
        print("ERR: {}".format(result))

Here src holds the “user” code, which may well be bad. The example has two variants of “good” code: the first shows books that are out of stock, the second shows expensive books that are available. If you try to slip in bad code (code that simply does not parse, code that refers to variables missing from the context, or code that uses disallowed operations such as Call, a function call), success will be False and the program will report an error, but it will not crash and will not execute the bad code.

Alternatively, you can call evalidate.evalidate() to get the AST generated by ast.parse (or an exception if the code does not parse or contains disallowed operations), and then compile and execute it yourself with eval():

node = evalidate.evalidate(src)
code = compile(node,'','eval')
result = eval(code,{},data) # data: dict of variables visible to the expression
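One practical benefit of this two-step form is that the expression can be compiled once and then evaluated against many records. The sketch below shows only that pattern and uses the standard library alone: plain ast.parse stands in for evalidate.evalidate(), so it performs NO safety checks, and the records list is a shortened copy of the depot above.

```python
import ast

records = [
    {"book": "Sirens of Titan", "price": 12, "stock": 4},
    {"book": "Gone Girl", "price": 9.8, "stock": 0},
    {"book": "Pulp", "price": 7.45, "stock": 4},
]

src = "stock>0 and price>8"

# Stand-in for evalidate.evalidate(src): parsing only, no validation.
node = ast.parse(src, mode="eval")

# Compile the tree once...
code = compile(node, "<filter>", "eval")

# ...then evaluate it per record; each dict supplies the variable names.
matches = [r["book"] for r in records if eval(code, {}, r)]
```

In real use the ast.parse line would be replaced by the validating call, and everything after it stays the same.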

Or look at the module's code (it is simple) and build your own bicycle :-)

An appeal to the community

By default, evalidate allows its own set of “safe” (?) Python operations. They are safe only in my personal opinion, which means that in 15 minutes I could not think of a way to do something terrible using only these operations. But maybe you can? Or maybe there are more operations worth adding to the list, ones that would make the default configuration more flexible (allowing a richer expression language) without creating vulnerabilities? Any ideas?
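One way to reason about a candidate operation is to look at which AST node types an expression actually needs. The helper below, node_types, is a hypothetical name for this illustration (it is not part of evalidate); it lists the node types of an ordinary filter and of a classic introspection trick.

```python
import ast


def node_types(src):
    """Return the sorted names of all AST node types used by an expression."""
    tree = ast.parse(src, mode="eval")
    return sorted({type(node).__name__ for node in ast.walk(tree)})


# An ordinary filter from the bookstore example.
benign = node_types("stock>0 and price>8")

# A well-known route to arbitrary classes via attribute access and calls.
hostile = node_types("().__class__.__bases__[0].__subclasses__()")
```

The ordinary filter gets by without Attribute or Call, while the hostile expression needs both, which is a hint about which node types deserve the most suspicion before they are added to any default list.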
