Leo_Gan February 10, 2016 at 07:29

EDI standard. Technical review

The EDI (Electronic Data Interchange) standard is part of older, established systems. Yet we constantly see EDI presented as a modern standard. Is it? Should we consider EDI as the underlying technology for new projects?
Let's look at EDI from a purely technical point of view, setting everything else aside.

Data Format in EDI

EDI uses a delimited text format. It works well for flat data structures such as tables, but it is poorly suited to hierarchical data. Nested objects are better serialized with tagged formats such as XML and JSON.
Strangely, no description language (document definition language) was ever created for EDI. Many years have passed since EDI appeared and a great deal of effort has gone into it, yet a description language still does not exist. A description language makes it possible to automate data processing: generation, validation, transformation, serialization, and deserialization. For comparison, to validate XML data we take a schema (XML Schema, XSD), and the parser automatically checks the data against it.
You can do without a schema, but then the document should at least carry its own markup. XML and JSON documents can be deserialized without a schema because the data itself contains the tags (names) of the data elements. EDI has tags only for segments and none for elements; elements are identified by their position within the segment. A universal EDI parser can therefore only parse a document into primitive collections, because the document contains neither names nor types for its data elements.
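To make the "primitive collections" point concrete, here is a minimal sketch of what a schema-less EDI parser can produce. The function name `parse_edi` and the sample text are my own illustration (the AK1 values are invented), and the separators are assumed to be the common '*' and '~'.

```python
def parse_edi(text, element_sep="*", segment_sep="~"):
    """Split raw EDI text into a list of segments, each a list of strings."""
    segments = []
    for raw in text.split(segment_sep):
        raw = raw.strip("\r\n")  # tolerate line breaks between segments
        if raw:                  # skip the empty piece after the last terminator
            segments.append(raw.split(element_sep))
    return segments


# Invented sample: only the segment tag (first item) is a name.
sample = "ST*997*1136~AK1*PO*24712~SE*3*1136~"
for tag, *elements in parse_edi(sample):
    print(tag, elements)
```

Everything after the tag is just a list of strings. What the second element of AK1 means, or whether it is a number or a date, can only be learned from the written specification.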

Let's look at the details.

The EDI standard consists of two main parts:

Envelope format (a mix of messaging standards)
Document specifications (a mix of industry (domain) standards)

Envelope format

EDI defines envelopes for sets of documents, for groups of documents, and for the documents/transactions themselves (Interchange, Group, and Transaction/Document). The envelopes are delimited by the ISA/IEA, GS/GE, and ST/SE segment pairs, respectively.
Note: for illustration I use EDI X12, the version of the standard common in North America. The other version, EDIFACT, is common in Europe and does not differ fundamentally from X12.
Here is an example of the opening segments of all three envelopes: ISA, GS, and ST. The example is taken from here:

ISA*00*          *00*          *ZZ*RECEIVERID     *12*SENDERID       *100325*1113*U*00403*000011436*0*T*>~
GS*FA*RECEIVERID*SENDERID*20100325*1113*24712*X*004030~
ST*997*1136~

What do we see in the first segment?
The last three characters of the ISA segment, "*>~", are the separator characters: '~' separates segments, '*' separates elements within a segment, and '>' separates subelements within an element. By changing these characters we essentially change the format of the envelopes and documents. In XML and JSON the delimiters are fixed by the standard and cannot be changed. Mutable separators are a vestige of the era before Unicode existed, but even then making them mutable was not a good idea. Separators are very important characters. If any character can serve as a separator, this complicates not only the logic of splitting envelopes into their constituent parts but also, greatly, the logic of parsing the text inside the elements themselves.
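A parser therefore has to discover the separators before it can split anything. The sketch below shows one way to do it, assuming an X12 interchange that starts with an ISA segment of 16 elements. The function name `read_delimiters` is hypothetical, and the second sample with '|', '^' and a line break as separators is invented.

```python
def read_delimiters(interchange):
    """Return (element, subelement, segment) separators declared by the ISA header."""
    if not interchange.startswith("ISA"):
        raise ValueError("interchange must start with an ISA segment")
    element_sep = interchange[3]  # the character right after the "ISA" tag
    # ISA has 16 elements; the last one holds the subelement separator
    # and is immediately followed by the segment terminator.
    pieces = interchange.split(element_sep, 16)
    if len(pieces) < 17 or len(pieces[16]) < 2:
        raise ValueError("ISA segment is truncated")
    return element_sep, pieces[16][0], pieces[16][1]


# The ISA header from the example above, rebuilt field by field.
fields = ["ISA", "00", " " * 10, "00", " " * 10, "ZZ", "RECEIVERID".ljust(15),
          "12", "SENDERID".ljust(15), "100325", "1113", "U", "00403",
          "000011436", "0", "T", ">"]
header = "*".join(fields) + "~"
custom = header.replace("*", "|").replace(">", "^").replace("~", "\n")

print(read_delimiters(header))
print(read_delimiters(custom))
```

Both inputs describe the same envelope, yet no fixed tokenizer can read them both. The rest of the parser has to be parameterized by whatever this function returns.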
The ISA segment also contains elements that define the time and date formats. They let us use custom date and time formats within documents. This made sense in the seventies, when a few bytes had to be saved when encoding dates and times. Do we still need these elements now that the Year 2000 problem is behind us and specialized, very detailed standards for representing time have been created?
The ISA segment contains elements that identify the sender and the recipient. In essence this is addressing (routing) information; that is, the envelope standard is combined with an addressing standard. Using EDI, we must specify the sender and the recipient inside our data. The ISA segment also contains authorization elements. The idea of placing authorization information inside the messages themselves was once quite progressive, but now it looks naive at best and dangerous at worst. We now understand that authorization information is much, much more complicated than a pair of values. The same can be said of addressing information. Yet the EDI standard encourages us to use these elements.
Another element is the acknowledgment request. That is, the creator of the document sets the acknowledgment strategy directly in the document. Is this a good idea? The same document can be used in different scenarios. In some of them acknowledgments are used at the application level; in others, other protocols provide reliability. A reliability policy should not be defined inside the data itself, because reliability is a rather complicated topic in data transmission and depends on many participants in the communication.
Inside the envelope segments we see control numbers. They are needed in scenarios where we receive a set of documents, part of the set is lost or corrupted along the way, and we try to recover as much data as possible. This scenario has not been relevant for a long time, since such reliability problems are usually solved at the lower levels of the communication protocols. We do not build transport reliability into the application level, right?
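For illustration, this is roughly the check that control numbers make possible. In X12 each trailer repeats the control number of its header (ISA13 in IEA02, GS06 in GE02, ST02 in SE02). The function `check_envelopes` is a hypothetical sketch that works on already split segments, and the trimmed sample interchange is invented around the example values above.

```python
# trailer tag -> (header tag, control number position in header, position in trailer)
PAIRS = {"IEA": ("ISA", 13, 2), "GE": ("GS", 6, 2), "SE": ("ST", 2, 2)}


def check_envelopes(segments):
    """Return a list of mismatches between envelope headers and trailers."""
    open_headers = {}
    problems = []
    for seg in segments:
        tag = seg[0]
        if tag in ("ISA", "GS", "ST"):
            open_headers[tag] = seg
        elif tag in PAIRS:
            head_tag, head_pos, trail_pos = PAIRS[tag]
            head = open_headers.pop(head_tag, None)
            if head is None:
                problems.append(f"{tag} without {head_tag}")
            elif head[head_pos].strip() != seg[trail_pos].strip():
                problems.append(f"{head_tag}/{tag} control numbers differ: "
                                f"{head[head_pos]} vs {seg[trail_pos]}")
    problems.extend(f"{tag} is never closed" for tag in open_headers)
    return problems


good = [
    ["ISA"] + [""] * 12 + ["000011436", "0", "T", ">"],
    ["GS", "FA", "RECEIVERID", "SENDERID", "20100325", "1113", "24712", "X", "004030"],
    ["ST", "997", "1136"],
    ["SE", "2", "1136"],
    ["GE", "1", "24712"],
    ["IEA", "1", "000011436"],
]
bad = [seg[:] for seg in good]
bad[3] = ["SE", "2", "9999"]  # corrupted trailer

print(check_envelopes(good))
print(check_envelopes(bad))
```

A transport protocol with checksums and retransmission makes this kind of application-level bookkeeping largely redundant, which is the point made above.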
Another element of the ISA segment is the EDI version (Standard Identifier). This resembles the versioning support familiar from serialization standards.
The GS segment contains an element that identifies the type of document, for example an order or an invoice. There is nothing particularly wrong with this, although it is simpler to state the document type inside the document itself.
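The envelope elements discussed above can only be picked out by position. Here is a small sketch, assuming the usual X12 positions (ISA06/ISA08 for the interchange sender and receiver, ISA12 for the version, ISA13 for the control number, ISA14 for the acknowledgment request, GS01 for the functional identifier). The dictionary keys and the helper `name_elements` are names I made up for the example.

```python
ISA_NAMES = {6: "sender_id", 8: "receiver_id", 12: "version",
             13: "control_number", 14: "ack_requested"}
GS_NAMES = {1: "functional_id", 6: "group_control_number", 8: "version"}


def name_elements(segment, names):
    """Attach names to positional elements; padding spaces are dropped."""
    return {name: segment[pos].strip() for pos, name in names.items()}


# The ISA and GS segments from the example above, already split on '*'.
isa = ["ISA", "00", "", "00", "", "ZZ", "RECEIVERID     ", "12", "SENDERID       ",
       "100325", "1113", "U", "00403", "000011436", "0", "T", ">"]
gs = ["GS", "FA", "RECEIVERID", "SENDERID", "20100325", "1113", "24712", "X", "004030"]

print(name_elements(isa, ISA_NAMES))
print(name_elements(gs, GS_NAMES))
```

Note that the names live in the program, not in the document, so the sender and receiver identifiers are just strings that anyone can type into the envelope.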

As you can see, almost all the elements in the envelope segments are either useless or, worse, dangerous if used as the standard intends.
Please do not try to use data from the envelope segments for authentication or addressing.
EDI was created at a time when placing this information in the envelopes was the only option. Now we transfer documents over the Internet and use a wide range of standards and protocols for packaging, addressing, authentication, authorization, reliability, encoding, serialization, segmentation, and so on. Protocol-specific information is added and removed along the data path, and it is independent of the data itself.

Is EDI a data format standard or a protocol?

EDI tries to be a protocol, which is why we see these elements for addressing, authorization, and acknowledgment requests. I do not know how this information could be mapped onto the layers of the OSI model.
Still, most of the EDI standard is devoted to data formats.

Document Formats

Inside the envelopes we find the documents themselves. But we will not find a standard for a universal, generalized document. The standard defines numerous formats for various types of documents: orders, invoices, inventory descriptions, and so on. Here you will find a small part of the huge list of standardized documents.
EDI follows a famous myth: “Somewhere there is an ideal format that describes every scenario in the world. We will definitely find it. We just need to add new scenarios and fine-tune the old ones.”
As a result, EDI's standard documents (specifications) are overly complex.
Take one example: we need a purchase order for a small local bookstore. We find a suitable standard specification, EDI 850, Purchase Order. At first glance it looks too detailed. We are not going to buy food, coal, grain, liquids, hazardous materials, or medications. We do not need international addresses. We will not use express delivery services. The EDI specification describes all of these options, so it contains far too many fields that we will never use. It is too complicated for our simple document.
There are many industry (domain) standards that serve as a kind of knowledge repository, but they are not used as data transfer standards. (See this article describing the issue of industry standards.)

Loops inside documents

The structure of individual documents is quite simple. A document consists of a series of segments that carry the document data.
But it turns out that segments can be combined into groups or repeating groups, the so-called loops. The catch is that these loops are not marked in the document at all. We can learn that a loop exists only from the specification of that particular document. Segments of the same type (with the same tag) can appear either on their own or inside loops. Creating a parser that recognizes loops (which, I repeat, are not marked in the document in any way) is a rather non-trivial task.
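The sketch below shows why loop recognition needs knowledge from outside the document. It assumes a drastically simplified purchase order in which a PO1 segment starts a loop that may also contain PID and REF segments. The `loops` table stands in for the written specification, and the function `group_loops` and all the sample values are invented for the example. Real specifications also have nested loops and ordering rules, which this sketch ignores.

```python
def group_loops(segments, loops):
    """Nest segments into loops using an external loop definition.

    `loops` maps a trigger tag to the set of tags that may follow it
    inside the same loop iteration.
    """
    result = []
    current = None   # the loop iteration being filled, if any
    members = set()
    for seg in segments:
        tag = seg[0]
        if tag in loops:                      # trigger starts a new iteration
            current = {"loop": tag, "segments": [seg]}
            members = loops[tag]
            result.append(current)
        elif current is not None and tag in members:
            current["segments"].append(seg)   # belongs to the open iteration
        else:                                 # segment outside any loop
            current = None
            result.append(seg)
    return result


order = [
    ["BEG", "00", "NE", "PO-1001", "", "20160210"],
    ["REF", "DP", "038"],                      # REF at the document level
    ["PO1", "1", "2", "EA", "12.50", "", "BP", "BOOK-1"],
    ["PID", "F", "", "", "", "Paperback novel"],
    ["REF", "ZZ", "shelf 4"],                  # same tag, now inside the PO1 loop
    ["PO1", "2", "1", "EA", "30.00", "", "BP", "BOOK-2"],
    ["CTT", "2"],
]
grouped = group_loops(order, {"PO1": {"PID", "REF"}})
for item in grouped:
    print(item)
```

The two REF segments look identical in the text. Only the table passed as `loops` tells the parser that the second one belongs to the first order line.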
XML and JSON have no such problem: hierarchical objects and collections of objects at any nesting level are easily expressed with opening and closing tags, whether named or unnamed.
EDI tried to sit on two chairs at once. On the one hand, its document format resembles CSV and is convenient for tabular data. On the other hand, it tried to describe hierarchical objects, and that attempt turned out very unconvincing. Of course, this is easy to see now that we have JSON in front of us. But recall that EDI was designed not for transmitting tabular data but for transferring documents, whose structure is hierarchical.

A non-technical look at EDI

For a complete picture, I will also list some of the non-technical features of EDI:

The EDI standard is not free. This looks rather odd compared with other standards.
The EDI specifications are overly detailed. They are so complex that companies need to hire professionals familiar with a particular specification. These experts communicate in special EDI terms, almost an EDI language that has nothing to do with business. Look at the EDI agreements between companies: they are full of specific requirements dictated by the EDI standard but far removed from business requirements.
The EDI standard is not stable. A special committee issues modifications to the standard every six months, and each version introduces new refinements. The development of the standard does not follow user requests; it simply follows the schedule. Presumably this is driven not by real demands on the standard but by the committee's need to show the results of its work.
EDI was created to save bits and make documents as compact as possible. This requirement still exists, but it hardly matters for document transmission. Every child now owns a phone that transfers gigabytes of video. The era of mainframes and teletypes is over. And it is rather strange to read reports that seriously discuss the resource savings from moving from paperwork to EDI.
To save memory, EDI uses codes to represent data wherever possible. As a result, documents look encrypted, and the parties face the additional problem of exchanging code tables.
The EDI standard was created to transmit batches of documents because communications and computers were expensive and slow. Much has changed since then: communications and computers have become fast and cheap. Data is now transmitted in small messages or streams, and these small messages are the foundation of distributed systems. Batches of documents are still in use, not because the equipment is slow but because business processes require them.
There is no standard EDI description language. This means that we cannot create a universal parser for EDI documents. Parsers must contain descriptions of thousands of existing EDI specifications in great detail. (For example, Microsoft ships about 7,000 XML schemas for EDI documents with BizTalk Server.) Existing EDI parsers are expensive. To work with EDI documents we will most likely have to convert them to XML and use XML Schema with an XML parser for validation, transformation, serialization, deserialization, and generation. This is what BizTalk Server does.
Because there is no standard EDI description language, documents are described by ... multi-page instructions. Developers of EDI parsers interpret these instructions differently, and as a result different EDI parsers are incompatible.
The EDI standard was created at a time when developing programs, protocols, and data formats was extremely expensive and slow. Creating a standard for a universal document format was justified then. Now data formats are generated on the fly, and our programs, as a rule, do not use universal standards but create different formats for specific cases. EDI specifications include as many details as possible to satisfy all users, whereas modern programs specify only the data that is actually needed. The number of elements in an EDI specification that are unnecessary in your particular case will always be very large.
EDI mixes two kinds of standards: communication standards and standards for formatting business data. Current practice is exactly the opposite: standards should be independent of each other (orthogonal), so that they can be combined freely.
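The conversion to XML mentioned in the list above can be sketched in a few lines. This is only an illustration of the idea, not how BizTalk Server does it: the function `segments_to_xml`, the root tag, and the positional element names (ST01, ST02, ...) are my assumptions. Validating the result against an XML Schema would need a schema-aware library, which is not shown.

```python
import xml.etree.ElementTree as ET


def segments_to_xml(segments, root_tag="Transaction"):
    """Turn split EDI segments into an XML tree with positional element names."""
    root = ET.Element(root_tag)
    for tag, *elements in segments:
        seg_node = ET.SubElement(root, tag)
        for position, value in enumerate(elements, start=1):
            if value:  # empty positions are simply omitted
                child = ET.SubElement(seg_node, f"{tag}{position:02d}")
                child.text = value
    return root


tree = segments_to_xml([["ST", "997", "1136"], ["SE", "2", "1136"]])
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
```

Once the data is in this form, the usual XML tooling applies, but only because the element names were supplied by the converter.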

As you can see, from a technical perspective the EDI standard is outdated in almost every respect. There are hardly any rational technical reasons to use it now. Despite this, EDI is still widely used.
In the next part we will try to find the reasons for this. Most likely they will not be technical.
