OpenOffice COM Automation: Reading Clipboard Content

Part One (hopefully not the last)


For a long time OpenOffice remained a thing-in-itself for me. I knew that it could be automated perfectly well from Python and Basic, but I could not find a suitable tool for PHP. Then, quite by chance, I discovered an interesting OpenOffice feature: access to the contents of the Windows clipboard. At the time I badly missed the ability to write simple CLI scripts in PHP that process the text in the clipboard. So I decided to figure out thoroughly how to drive OpenOffice from PHP under Windows.


Here is the solution


<?php
// PHP OpenOffice: working with COM objects
// The service manager is the entry point for creating all other UNO components.
$oo = new COM("com.sun.star.ServiceManager");
$clipboard = $oo->CreateInstance(
 "com.sun.star.datatransfer.clipboard.SystemClipboard");
$converter = $oo->CreateInstance("com.sun.star.script.Converter");
$contents = $clipboard->getContents();
$flavors = $contents->getTransferDataFlavors();
$result = false; // stays false if the clipboard holds no text
foreach ($flavors as $mm)
  {
  $mime = $mm->MimeType;
  // echo "$mime\r\n"; // DEBUG
  if ($mime=="text/plain;charset=utf-16")
    {
    $data = $contents->getTransferData($mm);
    // "com.sun.star.uno.TypeClass.STRING" ==> 12
    $result = $converter->convertToSimpleType($data, 12);
    break;
    }
  }
echo $result;


How it all works


First, the service manager component "com.sun.star.ServiceManager" is created. It is needed to obtain the clipboard and converter components, because the clipboard component "com.sun.star.datatransfer.clipboard.SystemClipboard" cannot be created directly. The manager is responsible for dispatching calls to UNO interface functions. As a result, in response to CreateInstance() requests to this higher "authority", we get full-fledged instances of the COM components we need.
The contents of the clipboard are retrieved with the getContents() method. These contents are arranged rather cleverly: the same data is offered in several different formats ("flavors", one for every taste). The complete set of flavors is returned by the getTransferDataFlavors() method. The result is a composite object whose elements can be iterated over in a foreach (.. as ..) loop.

Each element is no less cunning in itself. Its MimeType property gives the type of the content, returned as an ordinary string. We are interested only in "text/plain;charset=utf-16".
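
A minimal sketch of the flavor lookup described above, pulled out into a function so that it can be tried without OpenOffice. The name findFlavor() is hypothetical and not part of any OpenOffice API; the only thing it assumes is that each flavor exposes a MimeType string property, as in the script above. The case-insensitive comparison is a precaution of mine, not something UNO is known to require.

```php
<?php
/**
 * Hypothetical helper: return the first flavor whose MimeType matches
 * $wanted, or null. $flavors may be a PHP array or anything else that
 * foreach can walk, such as the result of getTransferDataFlavors().
 */
function findFlavor($flavors, string $wanted = 'text/plain;charset=utf-16'): ?object
{
    foreach ($flavors as $flavor) {
        // Compare case-insensitively and ignore stray whitespace.
        if (strcasecmp(trim((string) $flavor->MimeType), $wanted) === 0) {
            return $flavor;
        }
    }
    return null; // no matching flavor, e.g. the clipboard holds only an image
}
```

In the script above this would presumably be called as findFlavor($contents->getTransferDataFlavors()), with a check for null before the result is passed on.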

To get the transferable clipboard data itself, you need the getTransferData() method.

And here the first bummer awaits us:


Unlike MimeType, which is a plain text value, the result of this method is returned not as a string (which could then simply be re-encoded with iconv() into the desired encoding) but as a variant type, which is not so easy to make friends with in PHP.

Most likely this is because, besides text, the clipboard can contain pictures, music and other multimedia, and it does not always make sense to return that as a string.

Conversion


This problem is solved by a special converter component, "com.sun.star.script.Converter", which is also created by the manager.

The converter has a method, convertToSimpleType(), that converts variant values into simple types. It must be fed the variant itself together with the "magic" constant 12 ("com.sun.star.uno.TypeClass.STRING"), which corresponds to ordinary strings.
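
To keep the 12 from staying magic, the numeric values can be given names. The class below is a hypothetical helper, not something OpenOffice provides; the numbers are the values of the UNO enum com.sun.star.uno.TypeClass as I understand them, and only STRING is confirmed by the script above.

```php
<?php
// Hypothetical helper: named values of the UNO enum com.sun.star.uno.TypeClass.
// Only a few members that map to simple PHP types are listed here.
final class UnoTypeClass
{
    const BOOLEAN = 2;
    const LONG    = 6;
    const DOUBLE  = 11;
    const STRING  = 12; // the value passed to convertToSimpleType() above
}
```

The conversion call would then read $converter->convertToSimpleType($data, UnoTypeClass::STRING).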

But here is the second bummer:


The result is a string encoded in Windows-1251, which can distort or lose any original (Unicode) characters that do not fit into the Procrustean bed of the Windows code page.
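
If the rest of a CLI script works in UTF-8, the returned string can at least be re-encoded. The sketch below assumes the result really is Windows-1251, as observed above; the function name is hypothetical. Note that it cannot bring back characters already lost on the way out of OpenOffice. The PHP COM constructor also accepts a code page as its third argument (CP_UTF8 among others); whether that avoids the loss with OpenOffice is untested here.

```php
<?php
// Hypothetical helper: re-encode the ANSI string returned by the converter
// into UTF-8. The source encoding is an assumption taken from the text above.
function clipboardTextToUtf8(string $text, string $from = 'Windows-1251'): string
{
    $utf8 = iconv($from, 'UTF-8', $text);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot convert clipboard text from $from");
    }
    return $utf8;
}

// Untested alternative: ask the COM layer itself for UTF-8 strings.
// $oo = new COM("com.sun.star.ServiceManager", null, CP_UTF8);
```

In the script above, the last line would then become echo clipboardTextToUtf8($result); guarded by a check that $result is not false.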

Disclaimer


In my opinion the solution turned out very elegant, but I expect the opposite reaction from the real programming gurus: something like "look, a new generation of schoolkid PHP code monkeys has arrived on Habr, sitting at the Windows command line and writing hello-worlds for office COM automation just to read some text from the clipboard".

Strictly speaking, this topic belongs on the Q&A blog, and its content has been artificially inflated to the size of a "full" article.

Unfortunately, the Habr sandbox interface offers no way to specify the preferred blog for a post.
Nor is there a way, from a read-only account, to write directly to any Habr user and ask them to post a question on the Q&A blog.

Here are the questions themselves:


1. Are there other ways, similar to the OpenOffice one, to access the contents of the Windows clipboard through COM automation without installing additional PHP extensions, for example via MS Office or even Internet Explorer? The article would be more interesting if it offered the reader a choice of several ways to solve the problem, and the ability to reach the clipboard automatically by whichever means is available, depending on what additional software is installed on the system. That is, to provide some kind of cross-platform (or, if you like, "cross-office") support.

2. Well, we have more or less learned to read the clipboard, but how do we now write something to it? I must warn right away that the solution for writing will not look as elegant and transparent. At least, I never managed to hack one together myself. And, of course, a full-fledged article must describe the reverse operation. Although, I repeat, what I needed first of all was to read the text contents of the clipboard, and the Windows-1251 encoding fully satisfied my appetite.

3. If everything is clear with the text contents of the clipboard, what about graphics? I would very much like to get the graphic contents of the clipboard as, for example, a GD2 object and, moreover, to be able to draw directly in the clipboard, that is, to synchronize the clipboard contents with the state of the GD2 object. I remember how, back in the days of Windows 98, a friend of mine made a lasting impression on me by pasting into MS Paint, from the clipboard, a FILM copied from Media Player during playback. I was simply shocked to see a moving image against the background of an open drawing. At that time I still had a poor understanding of how Windows works, and I took it for real magic.

PS


The article, of course, would be more useful if it described universal read-write access to any type of content, so that, for example, it would be possible to export content of a specific type to files of the corresponding format.

I really hope to find answers to all these questions in the comments, or at least to see the questions themselves kindly reposted to the Q&A blog by interested readers, in case this post never makes it out of the Habr sandbox.

I confess that I have not given links to authoritative sources of inspiration; I hope this omission will be made up for in the comments.
