Rule number N + 0: don't be afraid to dig deeper

Background

Hello! My name is Zhenya and I am a programmer. Nothing special. For now.

Today I would like to share a story with a good ending, one that convinced me that even if you do not consider yourself an above-average programmer and mostly solve trivial tasks, programming as a process can still be very exciting!

I faced one of the most typical tasks: writing data out to some kind of text document, in a format that opens on any average desktop machine. I was sure the solution would be simple. But this week, fate decided to teach me a lesson ...

History

It all started with choosing a format and a library to make life easier. I chose docx and the OpenTBS library. It seemed to have everything I needed: the right file format and support for templates. But three days of work showed that if it can handle nested arrays at all, the way to do it is so far from obvious that you probably have to join some kind of sect to understand it. I decided to follow the path of least resistance: I shared my thoughts about this library with my monitor and started looking for another one.

Next up was a simple tool called odtPHP, whose site does not seem to be working at the moment. As you may have guessed, the format had to change along with the library: it was now odt. If you really want to, you can even rejoice - an open format and all that.

And here the fun began. The document is generated from a template. If you make the template in LibreOffice (it was the first thing at my fingertips), Word will only open the result after asking about recovery. And if you create the template in Word? Then it opens without recovery. But after minimal edits to the template, odtPHP threw an error saying that a variable was not found in the template. Frantic yelling and poking a finger at the variable name in the template did not help. Strange. The variable appeared to be written in the template, yet odtPHP could not find it with its regular expressions.

Suspicions began to creep in that Word might encode a space, a dash or some other character differently in the underlying source. Since I knew that odt, like docx, is simply zipped XML, I decided to dig into this and, to be sure, created the same odt template in Google Drive as well. After unzipping the odt files from the different "creators", I got the following picture:

image
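
The unpacking step can also be done from PHP itself. What follows is only a sketch, assuming PHP's zip extension (`ZipArchive`) is installed; the function name `readOdtContent` is made up for illustration and is not part of odtPHP.

```php
<?php
// Hypothetical helper (not part of odtPHP): list the entries of an .odt
// archive and return the raw text of content.xml.
// Assumes the zip extension (ZipArchive) is installed.
function readOdtContent(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open archive: $path");
    }

    // Collect entry names so archives from different editors can be compared.
    $entries = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $entries[] = $zip->getNameIndex($i);
    }

    // getFromName() returns false when the entry does not exist.
    $content = $zip->getFromName('content.xml');
    $zip->close();

    if ($content === false) {
        throw new RuntimeException("content.xml not found in $path");
    }

    return ['entries' => $entries, 'content' => $content];
}
```

Calling `readOdtContent('template.odt')` (a placeholder file name) on the templates saved by Word, LibreOffice and Google Drive, and comparing the `entries` lists, gives the same kind of overview as unzipping by hand.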


It became obvious that a format is a format, but each program can have its own view of it. As you might guess, all the contents of the document are stored in content.xml. I open it. I look for my variables. And lo! I'm on the right track! This is how my variable looks in Word (in the template created in Word):

image


And this is how the same fragment looks in content.xml:

image


And the regular expression was as simple as possible:

$reg = '@\[!--\sBEGIN\s' . $string . '\s--\](.*)\[!--.+END\s' . $string . '\s--\]@smU';

What the token in that position actually is, I never managed to find out. You might think it is a space, but why was it inserted right here, while in other places there is just an ordinary space? Unclear. Predicting where Word will place such magical spaces was not possible. So it turned out that Word writes a space as a plain space in some places and as this other token in others, and that breaks the search for the variable with such a minimal pattern.
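
A minimal sketch of one way around this, under loud assumptions: the text above does not pin down what the stray token was, so the code treats two common candidates as ordinary spaces before matching. One is the ODF `<text:s/>` element, which the format uses to encode space characters; the other is the non-breaking space U+00A0. The helper names `normalizeOdtSpaces` and `findBlock` are hypothetical and not part of odtPHP. The pattern is the one quoted above, with `preg_quote()` added.

```php
<?php
// Hypothetical helper: turn space-like tokens in content.xml into plain spaces.
function normalizeOdtSpaces(string $xml): string
{
    // Assumption: <text:s/> (optionally with text:c="N") stands for N spaces.
    $xml = preg_replace_callback(
        '@<text:s(?:\s+text:c="(\d+)")?\s*/>@',
        function (array $m): string {
            // Group 1 is absent when there is no text:c attribute.
            return str_repeat(' ', (int) ($m[1] ?? 1));
        },
        $xml
    );

    // Assumption: a non-breaking space (U+00A0) should count as a space too.
    return str_replace("\u{00A0}", ' ', $xml);
}

// Same pattern as quoted above; preg_quote() is an addition so that a
// block name containing regex metacharacters is matched literally.
function findBlock(string $xml, string $name): ?string
{
    $quoted = preg_quote($name, '@');
    $reg = '@\[!--\sBEGIN\s' . $quoted . '\s--\](.*)\[!--.+END\s' . $quoted . '\s--\]@smU';

    return preg_match($reg, $xml, $m) === 1 ? $m[1] : null;
}

$clean = '[!-- BEGIN row --]cell[!-- END row --]';
$odd   = '[!--<text:s/>BEGIN row --]cell[!-- END row --]';

var_dump(findBlock($clean, 'row'));                    // string(4) "cell"
var_dump(findBlock($odd, 'row'));                      // NULL
var_dump(findBlock(normalizeOdtSpaces($odd), 'row'));  // string(4) "cell"
```

Normalising the input first keeps the original pattern untouched; loosening the `\s` parts of the pattern itself would be another option, but it would have to anticipate every token an editor might emit.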

So the trivial task turned into a rather fascinating journey through the structure of the odt format!

Moral

No matter what level of knowledge you have, and no matter how low your opinion of your own skills may fall, do not be afraid to dig deeper! Unless, of course, you are an excavator operator and there is a "Digging is prohibited" sign nearby.

PS It would be very interesting to read stories like this, in the Background-History-Moral format, from cool developers: stories that could then be retold for educational purposes.
