bsdemon January 23, 2009 at 02:30

We use Python to process HTML forms.

When I first started using Django, the part I enjoyed most after the ORM was the django.forms package. Django is now in the past for me: I use the Werkzeug + SQLAlchemy + Jinja2 stack, and sometimes even experiment with non-relational data stores in place of SQLAlchemy. But I never found a replacement for django.forms, so I decided to quickly sketch out something of my own.

I arrived at the following formulation. The input is data represented as a dict whose keys are strings and whose values are either strings or other dictionaries of the same structure. For instance:

data = {
    "key1": "value1",
    "key2": {
        "key3": "value3"
    }
}
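The shape described above can be checked recursively. This is only a sketch in Python 3 syntax (an assumption; the listings in this article are Python 2), and is_form_data is a hypothetical helper that is not part of the library:

```python
def is_form_data(data):
    """Return True if data is a dict with string keys whose values are
    strings or nested dicts of the same shape (hypothetical helper)."""
    if not isinstance(data, dict):
        return False
    for key, value in data.items():
        if not isinstance(key, str):
            return False
        if isinstance(value, str):
            continue
        # Anything that is not a string must itself be a well-formed dict.
        if not is_form_data(value):
            return False
    return True


data = {
    "key1": "value1",
    "key2": {
        "key3": "value3",
    },
}
print(is_form_data(data))          # True
print(is_form_data({"key1": 42}))  # False
```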

Next, we have some assumptions about this data: a set of rules that we will call a schema. We need a way to walk through all the fields of the data dictionary, check that their values are correct, and convert them to the required types. Simple!

This leads to some fairly obvious requirements for the implementation:

* A simple way to describe schemas: it should be clear and convenient, that is, declarative.
* Code reuse: describing the same schema ten times is tedious.
* Schemas for nested data structures: these may be needed too.

Basic principles of implementation

A data validation error will be represented by the following exception:

class SchemaValidationError(TypeError):
    def __init__(self, error):
        # error is either a message string or a dict of per-field errors
        self.error = error

Data validation is essentially a check of data types, so I find it appropriate to inherit from the standard TypeError exception.
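A short sketch of how such an exception behaves, in Python 3 syntax (an assumption; the listings in this article are Python 2). The call to the parent constructor is an addition that is not in the listing above; it only makes the payload visible in tracebacks.

```python
class SchemaValidationError(TypeError):
    def __init__(self, error):
        super().__init__(error)  # addition: not part of the article's listing
        self.error = error


# The payload can be a plain message or a dict of per-field messages.
for payload in ("empty field", {"name": "empty field"}):
    try:
        raise SchemaValidationError(payload)
    except TypeError as exc:  # caught as TypeError thanks to the inheritance
        print(type(exc).__name__, exc.error)
```

One consequence of this design choice is that any handler written for TypeError will also catch validation errors.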

The schema will be defined as a class whose attributes are objects describing the fields. Since we want to describe nested structures, those attributes can be both string field objects and other schemas. Here is what we get as a first step:

class SchemaElement(object):
    u"""
    Abstract class for a schema element.
    """
    def validate(self, data):
        raise NotImplementedError()

class Field(SchemaElement):
    u"""
    The Field class describes a string field.
    """
    def validate(self, data):
        raise SchemaValidationError("not valid value")

class Schema(SchemaElement):
    u"""
    The Schema class describes a validation schema.
    """
    def validate(self, data):
        # Data validation code goes here
        return data

Since a schema element can be either a field or another schema, I derived Field and Schema from a common SchemaElement class. This is the Composite design pattern, and it is well suited to describing hierarchical data types.

SchemaElement also defines an abstract interface for validation: the validate method. By following this interface we no longer need to distinguish between Field and Schema objects when validating; to us they are one and the same.

Subclasses of Field will be used to describe the fields of a schema, that is, to process string values. To implement validation for a specific field, you just need to override the validate method so that it returns the validated and converted data or raises a SchemaValidationError on error. The default implementation always raises an exception.
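As an illustration of overriding validate, here is a minimal sketch of a field that both checks and converts its value. It is written in Python 3 syntax (an assumption; the article's listings are Python 2), and IntegerField is a hypothetical example rather than a field from the library.

```python
class SchemaValidationError(TypeError):
    def __init__(self, error):
        super().__init__(error)
        self.error = error


class Field:
    """Base string field; the default implementation always fails."""
    def validate(self, data):
        raise SchemaValidationError("not valid value")


class IntegerField(Field):
    """Hypothetical field that converts a string to an int."""
    def validate(self, data):
        try:
            return int(data)
        except (TypeError, ValueError):
            # TypeError covers a missing value (None), ValueError a bad string.
            raise SchemaValidationError("not an integer")


field = IntegerField()
print(field.validate("42"))  # 42
try:
    field.validate("abc")
except SchemaValidationError as e:
    print(e.error)  # not an integer
```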

The Schema class will be used to describe a structure consisting of fields and other schemas. The code for its validate method will be presented a bit later.

Declarative description of schemas

As I have already said, the most convenient approach seems to be defining schemas as classes whose attributes are Field and Schema objects. This is called a declarative description. To implement it, we need a metaclass for the Schema container class:

class SchemaMeta(type):
    def __new__(mcs, name, bases, attrs):
        if not name == "Schema":
            fields = {}
            # Collect inherited fields, rightmost base first
            for base in reversed(bases):
                if issubclass(base, Schema) and not base is Schema:
                    fields.update(base.__fields__)
            # Add the fields declared on the class itself
            for field_name, field in attrs.items():
                if isinstance(field, SchemaElement):
                    fields[field_name] = field
            attrs["__fields__"] = fields
        cls = type.__new__(mcs, name, bases, attrs)
        return cls

    def __contains__(cls, value):
        return value in cls.__fields__

    def __iter__(cls):
        return iter(cls.__fields__.items())

The main reason I use this metaclass is to group all the fields of a schema together in the __fields__ attribute. This is convenient when processing fields or introspecting the structure, because __fields__ contains nothing extraneous, unlike __dict__, which we would otherwise have to filter on every pass.

If we create the class named Schema, the metaclass does not process it at all. For any other class that inherits from Schema, it first collects the fields of the superclasses into __fields__, going from right to left, and then adds the fields of the current class.
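The effect of the right-to-left pass can be seen with two base schemas that declare a field of the same name. The sketch below uses Python 3 syntax, where the metaclass is passed as a keyword in the class statement (an assumption; the article targets Python 2). The label attribute and the Left, Right and Both classes are hypothetical and exist only to tell the fields apart.

```python
class SchemaElement:
    pass


class Field(SchemaElement):
    def __init__(self, label=""):
        self.label = label  # hypothetical attribute, for illustration only


class SchemaMeta(type):
    def __new__(mcs, name, bases, attrs):
        if name != "Schema":
            fields = {}
            # Rightmost base first, so bases further left override it.
            for base in reversed(bases):
                if issubclass(base, Schema) and base is not Schema:
                    fields.update(base.__fields__)
            for field_name, field in attrs.items():
                if isinstance(field, SchemaElement):
                    fields[field_name] = field
            attrs["__fields__"] = fields
        return type.__new__(mcs, name, bases, attrs)


class Schema(SchemaElement, metaclass=SchemaMeta):
    pass


class Left(Schema):
    name = Field("left")
    only_left = Field("left")


class Right(Schema):
    name = Field("right")


class Both(Left, Right):
    extra = Field("both")


print(sorted(Both.__fields__))        # ['extra', 'name', 'only_left']
print(Both.__fields__["name"].label)  # left
```

Because the leftmost base is applied last, the result agrees with ordinary attribute lookup, where Both.name also resolves to the field from Left.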

I also added the __contains__ method, which checks whether the schema contains a field with the given name, and the __iter__ method, which makes the schema class iterable. Since these methods are defined on the metaclass, they act as methods of the class itself, much like methods wrapped in the classmethod decorator, except that they are not available on instances of the class.
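One caveat is worth a small demonstration: a method defined on a metaclass is found when the operation is applied to the class, but not when it is applied to an instance, whereas a classmethod works on both. The sketch uses Python 3 syntax (an assumption), and all class names in it are hypothetical.

```python
class Meta(type):
    def __contains__(cls, value):
        return value in cls.names


class WithMeta(metaclass=Meta):
    names = ("a", "b")


class WithClassmethod:
    names = ("a", "b")

    @classmethod
    def has(cls, value):
        return value in cls.names


print("a" in WithMeta)             # True: handled by Meta.__contains__
print(WithClassmethod.has("a"))    # True
print(WithClassmethod().has("a"))  # True: classmethods work on instances too

try:
    "a" in WithMeta()              # instances do not see metaclass methods
except TypeError:
    print("TypeError")
```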

Now it remains to add the __metaclass__ attribute to the Schema class:

class Schema(SchemaElement):
    ...
    __metaclass__ = SchemaMeta
    ...

We can already define schemas as follows:

>>> class MySchema(Schema):
...     my_field = Field()
...
>>> class AnotherSchema(MySchema):
...     another_field = Field()
...
>>> "my_field" in MySchema
True
>>> "another_field" in AnotherSchema
True
>>> "my_field" in AnotherSchema
True

Schema inheritance works: the my_field attribute also appears in AnotherSchema. To create a schema for validating hierarchical data structures, you just need to add another schema as an attribute:

>>> class CompositeSchema(Schema):
...     sub_schema = MySchema()
...     my_field = Field()
...
>>> "my_field" in CompositeSchema
True
>>> "sub_schema" in CompositeSchema
True
>>> # sub_schema is an instance, so the field is looked up on its class
>>> "my_field" in type(CompositeSchema.sub_schema)
True

Validation of data

Validation is performed by the validate method. Subclasses of Field must override it themselves; the implementation for the Schema class is given here:

class Schema(SchemaElement):
    ...
    def validate(self, data):
        errors = {}
        for field_name, field in self.__fields__.items():
            try:
                data[field_name] = field.validate(data.get(field_name, None))
            except SchemaValidationError, error:
                errors[field_name] = error.error
        if errors:
            raise SchemaValidationError(errors)
        return data
    ...

First, for each schema field, the validate method is called with the desired parameter from the data dictionary. If there is an error, it is caught and stored in the errors dictionary. After we have walked through all the fields, the errors dictionary is checked, and if it is not empty, a SchemaValidationError exception is thrown with this dictionary as a parameter. This allows us to collect all errors, starting from the lowest level in the hierarchy.
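To see how errors from a nested schema bubble up, here is a self-contained sketch that puts the pieces together. It is written in Python 3 syntax (an assumption; the article's listings are Python 2), and AddressSchema, UserSchema and their field names are hypothetical.

```python
class SchemaValidationError(TypeError):
    def __init__(self, error):
        super().__init__(error)
        self.error = error


class SchemaElement:
    def validate(self, data):
        raise NotImplementedError()


class Field(SchemaElement):
    def validate(self, data):
        raise SchemaValidationError("not valid value")


class SchemaMeta(type):
    def __new__(mcs, name, bases, attrs):
        if name != "Schema":
            fields = {}
            for base in reversed(bases):
                if issubclass(base, Schema) and base is not Schema:
                    fields.update(base.__fields__)
            for field_name, field in attrs.items():
                if isinstance(field, SchemaElement):
                    fields[field_name] = field
            attrs["__fields__"] = fields
        return type.__new__(mcs, name, bases, attrs)


class Schema(SchemaElement, metaclass=SchemaMeta):
    def validate(self, data):
        errors = {}
        for field_name, field in self.__fields__.items():
            try:
                data[field_name] = field.validate(data.get(field_name))
            except SchemaValidationError as error:
                # A nested schema contributes its own dict of errors here.
                errors[field_name] = error.error
        if errors:
            raise SchemaValidationError(errors)
        return data


class NotEmptyField(Field):
    def validate(self, data):
        if not data:
            raise SchemaValidationError("empty field")
        return data


class AddressSchema(Schema):  # hypothetical nested schema
    city = NotEmptyField()
    street = NotEmptyField()


class UserSchema(Schema):  # hypothetical top-level schema
    name = NotEmptyField()
    address = AddressSchema()


errors = {}
try:
    UserSchema().validate(
        {"name": "", "address": {"city": "Moscow", "street": ""}}
    )
except SchemaValidationError as e:
    errors = e.error
print(errors)
# {'name': 'empty field', 'address': {'street': 'empty field'}}
```

The error dictionary mirrors the structure of the input, so each message can be matched to the field that produced it.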

Now we can define a basic field and schema and try data validation in action:

class NotEmptyField(Field):
    u"""
    A class describing a field that cannot be empty.
    """
    def validate(self, data):
        print "Field validation"
        if not data:
            raise SchemaValidationError("empty field")
        return data  # without this the validated value would become None

class CustomSchema(Schema):
    not_empty_field = NotEmptyField()

    def validate(self, data):
        print "Validation of schema fields"
        data = super(CustomSchema, self).validate(data)
        print "Schema Level Validation Code"
        return data

When overriding validate in a schema, we must call the validate method of the superclass. We must also either return the data or raise a SchemaValidationError. Let's check our schema in action:

>>> schema = CustomSchema()
>>> errors = {}
>>> try:
...     data = schema.validate({"not_empty_field": "some value"})
... except SchemaValidationError, e:
...     errors = e.error
...
Validation of schema fields
Field validation
Schema Level Validation Code
>>> errors
{}

Now let's try to provide invalid data for validation:

>>> try:
...     data = schema.validate({"not_empty_field": ""})
... except SchemaValidationError, e:
...     errors = e.error
...
Validation of schema fields
Field validation
>>> errors
{'not_empty_field': 'empty field'}

As expected, data validation failed.
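The schema-level hook shown in CustomSchema is a natural place for checks that span several fields. Below is a minimal sketch in Python 3 syntax (an assumption). The Schema class here is a deliberately simplified stand-in that skips field-by-field validation, and PasswordSchema with its password and confirm keys is a hypothetical example.

```python
class SchemaValidationError(TypeError):
    def __init__(self, error):
        super().__init__(error)
        self.error = error


class Schema:
    """Simplified stand-in: field-by-field validation is omitted."""
    def validate(self, data):
        return data


class PasswordSchema(Schema):
    def validate(self, data):
        # Field-level validation always runs first.
        data = super().validate(data)
        # Schema-level check that involves two fields at once.
        if data.get("password") != data.get("confirm"):
            raise SchemaValidationError({"confirm": "passwords do not match"})
        return data


schema = PasswordSchema()
print(schema.validate({"password": "s3cret", "confirm": "s3cret"}))
try:
    schema.validate({"password": "s3cret", "confirm": "typo"})
except SchemaValidationError as e:
    print(e.error)  # {'confirm': 'passwords do not match'}
```

Reporting the problem as a dict keyed by field name keeps schema-level errors in the same format as the errors collected from individual fields.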

Conclusion

So, we have a small but already useful library for data validation. Of course, it still needs to be filled out with the fields you require (subclasses of Field). Incidentally, it turned out quite compact: no more than 130 lines. If you would like the source code, write to me.
