alexios March 15, 2012 at 10:57

Honest generation of DOCX files in PHP. Part 2

Hello, dear Habr community!
We continue the story about generating DOCX using PHP.

What awaits us today:

We will learn how to embed images in a document;
Get to grips with English Metric Units;
Lay the groundwork for generating Excel files in the future.

If you are new to the topic, I recommend reading the first part. Everyone else, please read on below the cut.

Again

But first things first. Since the previous article was published, it has gathered a fair number of comments, both emotional and to the point, and the PHPDocx project on GitHub has been forked several times. All this suggests that the topic is relevant. Some developers, however, missed the very essence of my approach, which is to use inheritance: the generator class must be a descendant of ZipArchive. And if you really do not want to use inheritance, install PHP 5.4 and use traits! Either way is incomparably better than constantly working through a single property:

$this->archive->open( … );
$this->archive->addFile( … );
$this->archive->close( .. );

Why generate DOCX in PHP at all? Some developers do not see the point. My goal was to make it possible to save a web page in Word format. Personally, I use my class to save Yandex.Metrica reports as DOCX. User seriyPS asked why I split the text into lines. I did so on the assumption that the text is a field from a database and that each line break starts a new paragraph. For clarity, we will not do that this time: split the text into paragraphs yourself.
In addition, our generator should have the most convenient API possible. I think I managed that: the API consists of just three methods: the constructor, assign, and create.
Well, enough talk. Let's get started.

What's new

First, I significantly reworked the code from that article and turned it into a full-fledged open-source library. Links are at the end. Now, point by point:

1. The OfficeDocument and WordDocument classes

As we already established, the root of the archive holds the files needed by any MS Office document, while the word/ folder holds the files specific to an MS Word document. The solution suggests itself: make a common class for MS Office documents and a descendant class specifically for Word documents.
Here is the structure right away:

// Common base class for generators of MS Office documents
// (outline only; method bodies are omitted)
class OfficeDocument extends ZipArchive{
  public function __construct( $filename, $template_path = '/template/' );
  protected function add_rels( $filename, $rels, $path = '' );
  protected function pparse( $replace, $content );
}
// Class for creating MS Word documents
class WordDocument extends OfficeDocument{
  public function __construct( $filename, $template_path = '/template/' );
  // Note: this is an API method
  public function assign( $content = '', $return = false );
  public function create();
}

Why did I do it this way? It lays the groundwork for the future, when we will generate MS Excel files with an XlsxDocument class.
Let's look inside.

2. Dynamic linking

Inside the docx file there are two relationship files: _rels/.rels and word/_rels/document.xml.rels. They attach files to the document. A file that is not described in these structures is simply dead weight inside the docx archive, which, incidentally, lets you hide information inside a docx. In the constructors we will build arrays of the internal relationships between the XML documents. Here, for example, are the relationships for an MS Office document:

      // Describe the relationships for an MS Office document
      $this->rels = array_merge( $this->rels, array(
        'rId3' => array(
          'http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties',
          'docProps/app.xml' ),
        'rId2' => array(
          'http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties',
          'docProps/core.xml' ),
      ) );

Each included file is identified by an "rIdN" key. The app.xml and core.xml files are static. We simply pack them into the archive with the add_rels method, which at the same time generates the XML file that describes the relationships:

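    // Generate the relationships file
    protected function add_rels( $filename, $rels, $path = '' ){
      // XML header (standard OPC relationships markup)
      $xmlstring = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' . "\n"
        . '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">';
      // Add the documents listed in the relationships
      foreach( $rels as $rId => $params ){
        // If a file path is given, use it; otherwise take the file from the template repository
        $pathfile = empty( $params[2] ) ? $this->path . $path . $params[1] : $params[2];
        // Add the document to the archive
        if( $this->addFile( $pathfile ,  $path . $params[1] ) === false )
          die('Failed to add to the archive: ' . $path . $params[1] );
        // Record it in the relationships
        $xmlstring .= '<Relationship Id="' . $rId . '" Type="' . $params[0] . '" Target="' . $params[1] . '"/>';
      }
      $xmlstring .= '</Relationships>';
      // Add the relationships file to the archive
      $this->addFromString( $path . $filename, $xmlstring );
    }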

Note that add_rels is defined in OfficeDocument and used by both classes, OfficeDocument and WordDocument, because a docx file contains two relationship files that describe the dependencies. This is where the OOP approach I proposed pays off, and where the approach proposed by VolCh definitely would not work.
As a result, we get a typical _rels file:
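A minimal standalone sketch of how the two relationships described earlier could be rendered into such a file. The helper name build_rels_xml is hypothetical and not part of PHPDocx; the markup follows the standard package relationships format, and the exact output of the library may differ:

```php
<?php
// Hypothetical helper (not part of PHPDocx): renders an array of the form
// rId => array( type URI, target ) into package relationships markup.
function build_rels_xml( array $rels ){
  $xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' . "\n"
       . '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' . "\n";
  foreach( $rels as $rId => $params ){
    // Escape attribute values so they cannot break the XML
    $xml .= sprintf(
      '  <Relationship Id="%s" Type="%s" Target="%s"/>' . "\n",
      htmlspecialchars( $rId, ENT_QUOTES ),
      htmlspecialchars( $params[0], ENT_QUOTES ),
      htmlspecialchars( $params[1], ENT_QUOTES )
    );
  }
  return $xml . '</Relationships>' . "\n";
}

// The same two relationships as in the constructor fragment
$rels = array(
  'rId3' => array(
    'http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties',
    'docProps/app.xml' ),
  'rId2' => array(
    'http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties',
    'docProps/core.xml' ),
);
echo build_rels_xml( $rels );
```

Each array entry becomes one Relationship element whose Id is the array key.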

The word/document.xml file is generated and linked dynamically. Hopefully dynamic linking is clear now. On to inserting images.

Learning to embed images

First, here is the XML fragment, worked out by trial and error, that has to be inserted into document.xml for an image to appear in the Word document:

In it we need to replace {RID} with the relationship identifier of the linked image and fill in {WIDTH} and {HEIGHT}.
Image insertion, like text insertion, is handled by a single API method, assign:

    public function assign( $content = '', $return = false ){
      // Check whether $content is a file. If so, insert it as an image
      if( is_file( $content ) ){
        // Take the image template
        $block = file_get_contents( $this->path . 'image.xml' );
        list( $width, $height ) = getimagesize( $content );
        $rid = "rId" . count( $this->word_rels ) . 'i';
        $this->word_rels[$rid] = array(
          "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image",
          "media/" . $content,
          // Specify the file path directly
          $content
        );
        $xml = $this->pparse( array(
          '{WIDTH}' => $width * $this->px_emu,
          '{HEIGHT}' => $height * $this->px_emu,
          '{RID}' => $rid,
        ), $block );
      }
      else{
        // Take the paragraph template
        $block = file_get_contents( $this->path . 'p.xml' );
        $xml = $this->pparse( array(
          '{TEXT}' => $content,
        ), $block );
      }
      // If we were asked to return the XML, return it
      if( $return )
        return $xml;
      else
        $this->content .= $xml;
    }
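The body of pparse is not shown here. A minimal sketch, assuming it performs plain placeholder substitution; the name pparse_sketch, the XML escaping of values, and the one-line paragraph template are assumptions for illustration, not the library's actual code:

```php
<?php
// Hypothetical stand-in for pparse(): replaces every placeholder key
// found in $content with its value from $replace.
function pparse_sketch( array $replace, $content ){
  // Escape values so that characters such as & or < cannot break the XML
  $escaped = array_map( function( $value ){
    return htmlspecialchars( (string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8' );
  }, $replace );
  // strtr() substitutes all keys in a single pass
  return strtr( $content, $escaped );
}

// Assumed minimal paragraph template; the real p.xml may look different
$block = '<w:p><w:r><w:t>{TEXT}</w:t></w:r></w:p>';
echo pparse_sketch( array( '{TEXT}' => 'Tom & Jerry' ), $block );
// prints <w:p><w:r><w:t>Tom &amp; Jerry</w:t></w:r></w:p>
```

Using strtr with an array avoids re-scanning text that has already been substituted, so a value that happens to contain another placeholder is left alone.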

Attentive readers of the code will notice that the method uses an unusual unit system. It is called English Metric Units (EMU). You can read about it on the English Wikipedia. In short, pixels are converted to EMU by multiplying by a constant factor. Wikipedia gives 12,700, but that is the number of EMU per point (1/72 inch), not per pixel. Experimentally I found the factor to be 8625: with it, the picture was displayed pixel for pixel.
And of course, we register the image file itself in the relationships structure:
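The factor between pixels and EMU depends on the resolution assumed for the image. A minimal sketch, assuming the standard definition of 914,400 EMU per inch; the helper px_to_emu and its $dpi parameter are hypothetical and not part of PHPDocx:

```php
<?php
// 1 inch = 914400 EMU by definition
define( 'EMU_PER_INCH', 914400 );

// Hypothetical helper: converts a length in pixels to EMU
// for an assumed resolution in dots per inch.
function px_to_emu( $px, $dpi = 96 ){
  return (int) round( $px * EMU_PER_INCH / $dpi );
}

echo px_to_emu( 1, 72 ), "\n";   // 12700
echo px_to_emu( 1, 96 ), "\n";   // 9525
echo px_to_emu( 200, 96 ), "\n"; // 1905000
```

By the same arithmetic, the experimentally found factor of 8625 corresponds to a resolution of roughly 106 DPI.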

        $rid = "rId" . count( $this->word_rels ) . 'i';
        $this->word_rels[$rid] = array(
          "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image",
          "media/" . $content,
          // Specify the file path directly
          $content
        );

As a result

As a result, we have a complete library. Now we can use it like this:


// Include the class
include 'PHPDocx_0.9.2.php';
// Create the document and write it to a file. The destructor closes it
$w = new WordDocument( "Example.docx" );
// Using the assign method
/******************************
/
/ $w->assign( 'text' );
/ $w->assign( 'image.png' );
/ $xml = $w->assign( 'image.png', true );
/ $w->assign( $w->assign( 'image.png', true ) );
/
/******************************/
$w->assign('image.jpg');
$w->assign('Whoever recognizes this woman is a true connoisseur of female beauty.');
$w->create();

That's basically it.
Plans: table generation.
Links:
PHPDocx on GitHub.
PHPDocx project page.
Download the source.
