Description and validation of tree data structures. JSON-Schema

Many services and applications (especially web services) accept tree-structured data, for example data received through JSON-RPC, JSON-REST, or PHP GET/POST. Naturally, the structure of that data has to be validated. There are many ways to do this, from piling up ifs in controllers to classes that implement validation driven by various configurations. Most often the solution is a recursive validator that works with data schemas described by a specific standard. One such standard is JSON-Schema; let's take a closer look.
JSON-Schema is a standard for describing data structures in JSON format, developed on the basis of XML Schema. It is published as a draft; the description below corresponds to version 03. Schemas described by this standard have the MIME type "application/schema+json". The standard is convenient for validating and documenting data structures made up of numbers, strings, arrays and key-value structures (which, depending on the programming language, may be called an object, a dictionary, a hash table, an associative array or a map; the name "object" is used from here on). Full and partial implementations currently exist for various platforms and languages, in particular JavaScript, PHP, Ruby, Python and Java.
Schema
A schema is a JSON object that describes data in JSON format. All properties of this object are optional; each one specifies a particular validation rule (hereinafter simply a rule). First of all, a schema can restrict the data type (the type or disallow rule, whose value can be either a string or an array):
- string (string)
- number (number, including all real numbers)
- integer (integer, is a subset of number)
- boolean (true or false)
- object (an object, in some languages it is called an associative array, hash, hash table, map or dictionary)
- array (array)
- null ("no data" or "unknown", only null is possible)
- any (any type including null)
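As a rough illustration of how a validator might map these type names onto JavaScript values, here is a minimal sketch. The function name matchesType is hypothetical and does not come from any library. The sketch assumes that type is a string or an array of strings, and it accepts unknown type names rather than rejecting them.

```javascript
// Hypothetical helper: does `value` match a type name, or any name in an array of names?
function matchesType(value, type) {
    if (Array.isArray(type)) {
        // An array of types: the value must match at least one entry.
        return type.some(function (t) { return matchesType(value, t); });
    }
    switch (type) {
    case 'string':  return typeof value === 'string';
    case 'number':  return typeof value === 'number' && isFinite(value);
    case 'integer': return typeof value === 'number' && isFinite(value) && Math.floor(value) === value;
    case 'boolean': return typeof value === 'boolean';
    case 'array':   return Array.isArray(value);
    case 'null':    return value === null;
    case 'object':  return typeof value === 'object' && value !== null && !Array.isArray(value);
    case 'any':     return true;
    default:        return true; // assumption of this sketch: unknown type names accept anything
    }
}

console.log(matchesType(5, 'integer'));              // true
console.log(matchesType(5.5, 'integer'));            // false
console.log(matchesType(null, ['string', 'null']));  // true
console.log(matchesType([], 'object'));              // false
```

Note that JavaScript's own typeof reports both arrays and null as "object", which is why the object case has to exclude them explicitly.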
Further, depending on the type of the data being checked, additional rules apply. If the data is a number, minimum, maximum and divisibleBy can be applied to it. If it is an array, the rules minItems, maxItems, uniqueItems and items come into force. If it is a string, pattern, minLength and maxLength apply. If it is an object, properties, patternProperties and additionalProperties are considered.
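The way rules are selected by the type of the data can be sketched roughly as follows. The function violatedRules is a hypothetical name. It covers only some of the number, string and array rules named above, and it is not how any particular validator is implemented.

```javascript
// Hypothetical sketch: returns the names of the rules that `value` violates.
// Rules that do not apply to the value's type are simply ignored.
function violatedRules(value, schema) {
    var failed = [];
    if (typeof value === 'number') {
        if (schema.minimum !== undefined && value < schema.minimum) failed.push('minimum');
        if (schema.maximum !== undefined && value > schema.maximum) failed.push('maximum');
        // The remainder check is only reliable for integers; fractions need more care.
        if (schema.divisibleBy !== undefined && value % schema.divisibleBy !== 0) failed.push('divisibleBy');
    } else if (typeof value === 'string') {
        if (schema.minLength !== undefined && value.length < schema.minLength) failed.push('minLength');
        if (schema.maxLength !== undefined && value.length > schema.maxLength) failed.push('maxLength');
        if (schema.pattern !== undefined && !new RegExp(schema.pattern).test(value)) failed.push('pattern');
    } else if (Array.isArray(value)) {
        if (schema.minItems !== undefined && value.length < schema.minItems) failed.push('minItems');
        if (schema.maxItems !== undefined && value.length > schema.maxItems) failed.push('maxItems');
    }
    return failed;
}

console.log(violatedRules(-4, { minimum: 0, divisibleBy: 3 }).join(', '));    // minimum, divisibleBy
console.log(violatedRules('ab', { minLength: 3, minimum: 10 }).join(', '));   // minLength
```

In the second call the minimum rule is present in the schema but has no effect, because the value is a string.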
In addition to type-specific rules, there are general rules, such as required and format, as well as descriptive rules, such as id, title, description and $schema. The specification defines several microformats: date-time (ISO 8601), date, time, utc-millisec, regex, color (W3C.CR-CSS21-20070719), style (W3C.CR-CSS21-20070719), phone, uri, email, ip-address (IPv4), ipv6 and host-name. These can be checked in addition if the current implementation defines and supports them. These and other rules are described in more detail in the specification.
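Because format checks are optional, an implementation can be pictured as a table of checkers in which missing entries are simply skipped. The sketch below is an assumption about how that could look. The names formatCheckers and checkFormat are hypothetical, and the two checkers test only the shape of the value (the date check, for instance, does not reject February 30).

```javascript
// Hypothetical registry: format name -> checker. Only two formats are covered here.
var formatCheckers = {
    'ip-address': function (value) {
        if (typeof value !== 'string') return false;
        var parts = value.split('.');
        return parts.length === 4 && parts.every(function (part) {
            return /^\d{1,3}$/.test(part) && Number(part) <= 255;
        });
    },
    'date': function (value) {
        // Shape only: YYYY-MM-DD, without checking that the day exists.
        return typeof value === 'string' && /^\d{4}-\d{2}-\d{2}$/.test(value);
    }
};

function checkFormat(value, format) {
    if (!formatCheckers.hasOwnProperty(format)) {
        return true; // unsupported format: nothing is checked, so nothing fails
    }
    return formatCheckers[format](value);
}

console.log(checkFormat('192.168.0.1', 'ip-address')); // true
console.log(checkFormat('256.1.1.1', 'ip-address'));   // false
console.log(checkFormat('not a phone', 'phone'));      // true (format not supported here)
```

The last call shows the practical consequence: a schema that relies on format may validate less than it appears to, depending on the validator.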
Since a schema is itself a JSON object, it can also be checked against a corresponding schema. The schema that the current schema conforms to is written in the $schema attribute. From it you can determine which version of the draft was used to write the schema. For draft 03 that is http://json-schema.org/draft-03/schema#.
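A minimal sketch of reading the draft version from $schema is shown below. The function draftVersion is a hypothetical name, and the sketch assumes that the URI follows the same pattern as the draft-03 URI used in this article. Any other value is reported as unknown.

```javascript
// Hypothetical helper: extracts the draft number from a "$schema" URI,
// or returns null when the attribute is missing or has another shape.
function draftVersion(schema) {
    var uri = typeof schema.$schema === 'string' ? schema.$schema : '';
    var match = /^http:\/\/json-schema\.org\/draft-(\d+)\/schema#?$/.exec(uri);
    return match ? match[1] : null;
}

console.log(draftVersion({ "$schema": "http://json-schema.org/draft-03/schema#" })); // 03
console.log(draftVersion({ "type": "string" }));                                     // null
```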
One of the most powerful and attractive features of JSON-Schema is the ability to reference other schemas from a schema, and to inherit from (extend) a schema using JSON-Ref links. This is done with id, extends and $ref. When extending a schema you cannot redefine its rules, only add to them: when the validator runs, all rules from both the parent and the child schemas must be applied to the data being checked. Examples follow below.
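The "supplement, never redefine" behaviour can be sketched as collecting the child schema together with everything it extends and requiring the data to pass all of them. Everything in this sketch is hypothetical: the in-memory schemaRegistry, the urn:base# schema, and the helper names. Only length rules are checked, to keep it short, and an unknown $ref is not handled.

```javascript
// Hypothetical in-memory registry of schemas, keyed by their "id".
var schemaRegistry = {
    'urn:base#': { id: 'urn:base#', type: 'string', minLength: 3 }
};

// Follows "$ref" to a registered schema; any other schema is returned as-is.
function resolveRef(schema) {
    return schema.$ref !== undefined ? schemaRegistry[schema.$ref] : schema;
}

// Returns the schema itself followed by every schema it extends, directly or indirectly.
function schemaChain(schema) {
    var current = resolveRef(schema);
    var chain = [current];
    // "extends" may hold one schema or an array of schemas.
    var parents = current.extends === undefined ? [] : [].concat(current.extends);
    parents.forEach(function (parent) {
        chain = chain.concat(schemaChain(parent));
    });
    return chain;
}

// Checks only the length rules of a single schema.
function lengthOk(value, schema) {
    return (schema.minLength === undefined || value.length >= schema.minLength) &&
           (schema.maxLength === undefined || value.length <= schema.maxLength);
}

// The value is valid only if it passes the child AND every parent.
function validateWithParents(value, schema) {
    return schemaChain(schema).every(function (s) { return lengthOk(value, s); });
}

var childSchema = { extends: { $ref: 'urn:base#' }, maxLength: 5 };
console.log(validateWithParents('abcd', childSchema));   // true
console.log(validateWithParents('ab', childSchema));     // false (parent's minLength)
console.log(validateWithParents('abcdef', childSchema)); // false (child's maxLength)
```

Because every schema in the chain is applied, a child can only narrow what the parent accepts; it has no way to relax a parent rule.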
Examples
Suppose there is information about products. Each product has a name: a string of 3 to 50 characters with no whitespace at either end. Define a schema for the product name:
{
    "$schema": "http://json-schema.org/draft-03/schema#", // id of the schema for this schema
    "id": "urn:product_name#",
    "type": "string",
    "pattern": "^\\S.*\\S$",
    "minLength": 3,
    "maxLength": 50
}
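To see what the three string rules of this schema accept, they can be applied by hand in plain JavaScript. This is only a sketch of the rules themselves, not of a validator; nameRules and isValidName are hypothetical names.

```javascript
// The string rules of the product name schema, copied by hand.
var nameRules = { pattern: '^\\S.*\\S$', minLength: 3, maxLength: 50 };

// Hypothetical helper: true when `name` satisfies all three rules.
function isValidName(name) {
    return typeof name === 'string' &&
        name.length >= nameRules.minLength &&
        name.length <= nameRules.maxLength &&
        new RegExp(nameRules.pattern).test(name);
}

console.log(isValidName('iPhone 4'));   // true
console.log(isValidName(' iPhone 4'));  // false (leading space breaks the pattern)
console.log(isValidName('TV'));         // false (matches the pattern, but is shorter than 3)
```

The last case shows why both rules are needed: the pattern alone only guarantees two non-space characters at the ends.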
Now this schema can be used to describe or validate any string as a product name. Next, a product has a non-negative price, a type ('phone' or 'notebook'), and Wi-Fi support for the n and g standards. Define the schema for the product:
{
    "$schema": "http://json-schema.org/draft-03/schema#",
    "id": "urn:product#",
    "type": "object",
    "additionalProperties": false,
    "properties": {
        "name": {
            "extends": {"$ref": "urn:product_name#"},
            "required": true
        },
        "price": {
            "type": "integer",
            "minimum": 0,
            "required": true
        },
        "type": {
            "type": "string",
            "enum": ["phone", "notebook"],
            "required": true
        },
        "wi_fi": {
            "type": "array",
            "items": {
                "type": "string",
                "enum": ["n", "g"]
            },
            "uniqueItems": true
        }
    }
}
This schema references the previous schema and extends it with the required rule. The rule cannot be placed in the previous schema itself, because the name may be optional elsewhere, and every rule of an extended schema is always applied.
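The object-level part of this schema (required properties plus additionalProperties set to false) can be sketched on its own. The code below is an illustration under assumptions: productShape is a reduced, hand-written copy of the schema that keeps only the parts the sketch reads, objectErrors is a hypothetical helper, and the per-property rules (type, enum, minimum and so on) are not checked at all.

```javascript
// A reduced copy of the product schema: only what this sketch looks at.
var productShape = {
    additionalProperties: false,
    properties: {
        name:  { required: true },
        price: { required: true },
        type:  { required: true },
        wi_fi: {}
    }
};

// Hypothetical helper: reports missing required properties and unexpected ones.
function objectErrors(data, schema) {
    var errors = [];
    Object.keys(schema.properties).forEach(function (key) {
        if (schema.properties[key].required === true && !data.hasOwnProperty(key)) {
            errors.push('missing required property: ' + key);
        }
    });
    if (schema.additionalProperties === false) {
        Object.keys(data).forEach(function (key) {
            if (!schema.properties.hasOwnProperty(key)) {
                errors.push('unexpected property: ' + key);
            }
        });
    }
    return errors;
}

console.log(objectErrors({ name: 'Phone X', price: 10, type: 'phone' }, productShape).length); // 0
console.log(objectErrors({ name: 'Phone X', color: 'red' }, productShape).join('; '));
// missing required property: price; missing required property: type; unexpected property: color
```

The wi_fi property is declared but not required, so leaving it out is not an error, while color is rejected because it is not declared at all.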
Performance
The performance of a JSON-Schema validator naturally depends on its implementation and on how completely it supports the rules. Let's run a test on Node.js with JSV, the most "complete" validator (it can be installed with "npm install JSV"). First we generate a thousand different products, some with invalid properties, then run them through the validator, and finally print the number of errors of each type.
Test source code
var jsv = require('JSV').JSV.createEnvironment();

// Register both schemas in the JSV environment and time how long that takes.
console.time('load schemas');
jsv.createSchema(
    {
        "$schema": "http://json-schema.org/draft-03/schema#",
        "id": "urn:product_name#",
        "type": "string",
        "pattern": "^\\S.*\\S$",
        "minLength": 3,
        "maxLength": 50
    }
);
jsv.createSchema(
    {
        "$schema": "http://json-schema.org/draft-03/schema#",
        "id": "urn:product#",
        "type": "object",
        "additionalProperties": false,
        "properties": {
            "name": {
                "extends": {"$ref": "urn:product_name#"},
                "required": true
            },
            "price": {
                "type": "integer",
                "minimum": 0,
                "required": true
            },
            "type": {
                "type": "string",
                "enum": ["phone", "notebook"],
                "required": true
            },
            "wi_fi": {
                "type": "array",
                "items": {
                    "type": "string",
                    "enum": ["n", "g"]
                },
                "uniqueItems": true
            }
        }
    }
);
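console.timeEnd('load schemas');

// Generate 1000 products; each property is randomly made invalid or left out.
console.time('prepare data');
var i, j;
var product;
var products = [];
var names = [];
for (i = 0; i < 1000; i++) {
    product = {
        name: 'product ' + i
    };
    // About 5% of the names are padded past the 50-character limit.
    if (Math.random() < 0.05) {
        while (product.name.length < 60) {
            product.name += 'long';
        }
    }
    names.push(product.name);
    // About 5% of the products have no price; prices range from -2 to 197.
    if (Math.random() < 0.95) {
        product.price = Math.floor(Math.random() * 200 - 2);
    }
    // About 5% have no type; a third of the rest get the invalid type 'something'.
    if (Math.random() < 0.95) {
        product.type = ['notebook', 'phone', 'something'][Math.floor(Math.random() * 3)];
    }
    // Half of the products list up to three Wi-Fi standards;
    // 'a' is not allowed by the schema, and duplicates are possible.
    if (Math.random() < 0.5) {
        product.wi_fi = [];
        for (j = 0; j < 3; j++) {
            if (Math.random() < 0.5) {
                product.wi_fi.push(['g', 'n', 'a'][Math.floor(Math.random() * 3)]);
            }
        }
    }
    products.push(product);
}
console.timeEnd('prepare data');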
var errors;
var results = {};
var schema;
var message;

// Validate the names alone and count how often each error message occurs.
schema = jsv.findSchema('urn:product_name#');
console.time('names validation');
for (i = 0; i < names.length; i++) {
    errors = schema.validate(names[i]).errors;
    for (j = 0; j < errors.length; j++) {
        message = errors[j].message;
        if (!results.hasOwnProperty(message)) {
            results[message] = 0;
        }
        results[message]++;
    }
}
console.timeEnd('names validation');
console.dir(results);

// Validate the whole products and count the error messages in the same way.
results = {};
schema = jsv.findSchema('urn:product#');
console.time('products validation');
for (i = 0; i < products.length; i++) {
    errors = schema.validate(products[i]).errors;
    for (j = 0; j < errors.length; j++) {
        message = errors[j].message;
        if (!results.hasOwnProperty(message)) {
            results[message] = 0;
        }
        results[message]++;
    }
}
console.timeEnd('products validation');
console.dir(results);
The results for 1000 checks are quite satisfactory (although some libraries claim speeds an order of magnitude higher).
On my laptop (MBA, OSX, 1.86 GHz Core2Duo):
names validation: 180ms
products validation: 743ms
That is roughly 0.18 ms per name and 0.74 ms per product.
Conclusion
JSON-Schema is a fairly convenient tool for documenting data structures and for configuring automatic validators of external data in applications. It looks simpler and more readable than XML Schema and takes up less text. It is independent of the programming language and can be used in many areas: validating POST form data and JSON REST API requests, checking packets exchanged over sockets, validating documents in document-oriented databases, and so on. The main advantage of JSON-Schema is standardization and, as a result, easier maintenance and better software integration.