Using the InternetTools FPC library in Delphi

    In fact, the article is somewhat broader: it describes a method that lets you transparently use many other libraries (and not only from the Free Pascal world). InternetTools was chosen because of one remarkable feature: surprisingly, there is no Delphi counterpart with the same broad capabilities and ease of use.

    This library is designed to extract information (parsing) from web documents (XML and HTML). It lets you specify the required data with high-level query languages such as XPath and XQuery and, as an alternative, gives direct access to the elements of the tree built from the document.

    Brief introduction to InternetTools


    The rest of the material is illustrated with a fairly simple task: obtaining those items of the bulleted and numbered lists of this article that contain links. According to the documentation, a very small piece of code is enough for this (it is based on the penultimate example there, with minor changes):

    uses
      xquery;
    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';
    var
      ListValue: IXQValue;
    begin
      for ListValue in xqvalue(ArticleURL).retrieve.map(ListXPath) do
        Writeln(ListValue.toString);
    end.

    However, for now this compact, object-oriented code can only be written in Free Pascal. We need to be able to use everything the library provides in a Delphi application as well, preferably in a similar style and with the same conveniences. It is also important that InternetTools is thread-safe (it can be called from many threads at the same time), so our solution must preserve this property.

    Ways of implementation


    Viewed from as far back as possible, there are several ways to use something written in another programming language. They fall into 3 large groups:

    1. Placing the library in a separate process whose executable is built, in this case, with FPC. This approach can in turn be split into two categories by whether network communication is possible.
    2. Encapsulating the library in a DLL (hereinafter sometimes called a "dynamic library"), which by definition works within a single process. Although COM objects can be placed in a DLL, the article considers a simpler and less laborious method that nevertheless gives the same comfort when calling the library functionality.
    3. Porting. As in the previous cases, whether this approach (rewriting the code in another language) is appropriate depends on the balance of its pros and cons, but for InternetTools the disadvantages of porting clearly outweigh. Because of the considerable amount of library code, serious work would be required (even given the similarity of the two languages), and as the original library evolves, the task of transferring patches and new features to Delphi would come up again and again.

    DLL


    To let the reader feel the difference, two variants are presented below that differ in ease of use.

    "Classic" implementation


    Let us first try to use InternetTools in the procedural style dictated by the very nature of a dynamic library, which can export only functions and procedures. We will make communication with the DLL look like WinAPI: a handle to some resource is requested first, then the useful work is performed, and finally the handle is destroyed. This variant should not be taken as a role model in every respect. It is chosen only for demonstration and for later comparison with the second variant, as a kind of poor relative.

    The files of the proposed solution and what they belong to look like this (arrows show dependencies):

    [Figure: composition of the files of the "classic" implementation]


    InternetTools.Types module


    Since the two languages, Delphi and Free Pascal, are very similar, it is quite reasonable to factor out a shared module containing the types used in the DLL's export list. That way their definitions do not have to be duplicated in the InternetToolsUsage application, which contains the prototypes of the functions from the dynamic library:

    unit InternetTools.Types;
    interface
    type
      TXQHandle = Integer;
    implementation
    end.

    In this implementation only one modest type is defined, but later the module will “mature” and its usefulness will become unquestionable.

    InternetTools Dynamic Library


    The set of procedures and functions in the DLL is chosen to be minimal but sufficient for the task stated above:

    library InternetTools;
    uses
      InternetTools.Types;
    function OpenDocument(const URL: WideString): TXQHandle; stdcall;
    begin
      ...
    end;
    procedure CloseHandle(const Handle: TXQHandle); stdcall;
    begin
      ...
    end;
    function Map(const Handle: TXQHandle; const XQuery: WideString): TXQHandle; stdcall;
    begin
      ...
    end;
    function Count(const Handle: TXQHandle): Integer; stdcall;
    begin
      ...
    end;
    function ValueByIndex(const Handle: TXQHandle; const Index: Integer): WideString; stdcall;
    begin
      ...
    end;
    exports
      OpenDocument,
      CloseHandle,
      Map,
      Count,
      ValueByIndex;
    begin
    end.

    Because the current implementation is only a demonstration, the full code is not given; what matters much more is how this simple API will be used later. One must simply not forget the thread-safety requirement here: it will take some effort, but nothing complicated.
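
    One possible way to meet the thread-safety requirement in this variant is to keep the objects behind the handles in a table protected by a lock. The following is only a minimal sketch for FPC under that assumption: THandleTable and its methods are hypothetical names that appear neither in InternetTools nor in the article's code, and TInterfacedObject merely stands in for the real values the DLL would store.

    program HandleTableSketch;
    {$MODE Delphi}
    uses
      SysUtils, SyncObjs;
    type
      // Hypothetical helper: maps integer handles to interface references
      THandleTable = class
      private
        FLock: TCriticalSection;
        FItems: array of IInterface; // handle = index + 1, nil marks a free slot
        procedure Check(Handle: Integer);
      public
        constructor Create;
        destructor Destroy; override;
        function Add(Item: IInterface): Integer;
        function Get(Handle: Integer): IInterface;
        procedure Remove(Handle: Integer);
      end;
    constructor THandleTable.Create;
    begin
      inherited Create;
      FLock := TCriticalSection.Create;
    end;
    destructor THandleTable.Destroy;
    begin
      FLock.Free;
      inherited Destroy;
    end;
    procedure THandleTable.Check(Handle: Integer);
    begin
      // Called only while the lock is held
      if (Handle < 1) or (Handle > Length(FItems)) or (FItems[Handle - 1] = nil) then
        raise Exception.CreateFmt('Invalid handle %d', [Handle]);
    end;
    function THandleTable.Add(Item: IInterface): Integer;
    var
      I: Integer;
    begin
      FLock.Acquire;
      try
        // Reuse the first free slot, if there is one
        for I := 0 to High(FItems) do
          if FItems[I] = nil then
          begin
            FItems[I] := Item;
            Exit(I + 1);
          end;
        // Otherwise grow the table by one slot
        SetLength(FItems, Length(FItems) + 1);
        FItems[High(FItems)] := Item;
        Result := Length(FItems);
      finally
        FLock.Release;
      end;
    end;
    function THandleTable.Get(Handle: Integer): IInterface;
    begin
      FLock.Acquire;
      try
        Check(Handle);
        Result := FItems[Handle - 1];
      finally
        FLock.Release;
      end;
    end;
    procedure THandleTable.Remove(Handle: Integer);
    begin
      FLock.Acquire;
      try
        Check(Handle);
        FItems[Handle - 1] := nil; // drops the reference held by the table
      finally
        FLock.Release;
      end;
    end;
    var
      Table: THandleTable;
      H1, H2: Integer;
    begin
      Table := THandleTable.Create;
      try
        H1 := Table.Add(TInterfacedObject.Create);
        H2 := Table.Add(TInterfacedObject.Create);
        Writeln(H1, ' ', H2);                            // prints "1 2"
        Table.Remove(H1);
        Writeln(Table.Add(TInterfacedObject.Create));    // prints "1": the freed slot is reused
      finally
        Table.Free;
      end;
    end.

    With such a table, OpenDocument and Map would store their result and return the handle, while CloseHandle would call Remove. Note that the lock protects only the table itself, not the objects stored in it.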

    InternetToolsUsage application


    Thanks to the preparations above, the list example can now be rewritten in Delphi:

    program InternetToolsUsage;
    ...
    uses
      InternetTools.Types;
    const
      DLLName = 'InternetTools.dll';
    function OpenDocument(const URL: WideString): TXQHandle; stdcall; external DLLName;
    procedure CloseHandle(const Handle: TXQHandle); stdcall; external DLLName;
    function Map(const Handle: TXQHandle; const XQuery: WideString): TXQHandle; stdcall; external DLLName;
    function Count(const Handle: TXQHandle): Integer; stdcall; external DLLName;
    function ValueByIndex(const Handle: TXQHandle; const Index: Integer): WideString; stdcall; external DLLName;
    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';
    var
      RootHandle, ListHandle: TXQHandle;
      I: Integer;
    begin
      RootHandle := OpenDocument(ArticleURL);
      try
        ListHandle := Map(RootHandle, ListXPath);
        try
          for I := 0 to Count(ListHandle) - 1 do
            Writeln( ValueByIndex(ListHandle, I) );
        finally
          CloseHandle(ListHandle);
        end;
      finally
        CloseHandle(RootHandle);
      end;
      ReadLn;
    end.

    Setting aside the prototypes of the functions and procedures from the dynamic library, the code cannot be called catastrophically heavier than the Free Pascal version. But what if we complicate the task a little: filter out some of the items and print the addresses of the links in the remaining ones:

    uses
      xquery;
    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';
      HrefXPath = './a/@href';
    var
      ListValue, HrefValue: IXQValue;
    begin
      for ListValue in xqvalue(ArticleURL).retrieve.map(ListXPath) do
        if {condition for processing the list item} then
          for HrefValue in ListValue.map(HrefXPath) do
            Writeln(HrefValue.toString);
    end.

    This can be done with the current DLL API, but the result is already very verbose, which not only greatly reduces the readability of the code but also (no less importantly) moves it further away from the Free Pascal version above:

    program InternetToolsUsage;
    ...
    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';
      HrefXPath = './a/@href';
    var
      RootHandle, ListHandle, HrefHandle: TXQHandle;
      I, J: Integer;
    begin
      RootHandle := OpenDocument(ArticleURL);
      try
        ListHandle := Map(RootHandle, ListXPath);
        try
          for I := 0 to Count(ListHandle) - 1 do
            if {condition for processing the list item} then
            begin
              HrefHandle := Map(ListHandle, HrefXPath);
              try
                for J := 0 to Count(HrefHandle) - 1 do
                  Writeln( ValueByIndex(HrefHandle, J) );
              finally
                CloseHandle(HrefHandle);
              end;
            end;
        finally
          CloseHandle(ListHandle);
        end;
      finally
        CloseHandle(RootHandle);
      end;
      ReadLn;
    end.

    Obviously, in real, more complex cases the amount of code will only grow rapidly, so we move on to a solution that is free from such problems.

    Interface implementation


    The procedural style of working with the library, as just shown, is possible but has significant drawbacks. Because a DLL as such supports the use of interfaces (as parameter and return types), work with InternetTools can be organized in the same convenient manner as in Free Pascal. The composition of the files should be changed slightly in this case, so that the declaration and the implementation of the interfaces go into separate modules:

    [Figure: composition of the files of the interface implementation]

    As before, we will go through each of the files in turn.

    InternetTools.Types module


    Declares the interfaces to be implemented in a DLL:

    unit InternetTools.Types;
    {$IFDEF FPC}{$MODE Delphi}{$ENDIF}
    interface
    type
      IXQValue = interface;
      IXQValueEnumerator = interface
      ['{781B23DC-E8E8-4490-97EE-2332B3736466}']
        function MoveNext: Boolean; safecall;
        function GetCurrent: IXQValue; safecall;
        property Current: IXQValue read GetCurrent;
      end;
      IXQValue = interface
      ['{DCE33144-A75F-4C53-8D25-6D9BD78B91E4}']
        function GetEnumerator: IXQValueEnumerator; safecall;
        function OpenURL(const URL: WideString): IXQValue; safecall;
        function Map(const XQuery: WideString): IXQValue; safecall;
        function ToString: WideString; safecall;
      end;
    implementation
    end.

    The conditional compilation directives are needed because the module is used unchanged in both the Delphi and the FPC project.

    The IXQValueEnumerator interface is not strictly necessary, but without it loops of the form "for ... in ...", as in the example, cannot be used. The second interface is the main one and is a wrapper analogous to IXQValue from InternetTools (it deliberately has the same name, to make it easier to relate future Delphi code to the library's Free Pascal documentation). In terms of design patterns, the interfaces declared in the module are adapters, albeit with one peculiarity: their implementation lives in the dynamic library.

    Why the safecall calling convention has to be specified for all methods is well described in a dedicated article. The requirement to use WideString instead of “native” strings will not be justified here either, because the topic of exchanging dynamic data structures with a DLL is beyond the scope of the article.
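
    For readers who have not met the convention before, the sketch below shows roughly what a safecall method corresponds to at the binary level: the function result becomes a trailing out parameter, and the real return value is an HResult through which exceptions are reported instead of crossing the DLL boundary. Both interface names are hypothetical and exist only for this comparison.

    program SafecallSketch;
    {$IFDEF FPC}{$MODE Delphi}{$ENDIF}
    type
      // Declared the way InternetTools.Types does it
      IExampleSafecall = interface
        function ToString: WideString; safecall;
      end;
      // Approximately the same method as the compiler lays it out:
      // the result moves to an out parameter, errors travel in the HResult
      IExampleStdcall = interface
        function ToString(out Value: WideString): HResult; stdcall;
      end;
    begin
    end.

    This is why no exception object has to be shared between the FPC and Delphi runtimes: each side only sees an error code and, optionally, an error description.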

    InternetTools.Realization Module


    This module is first in both importance and size: as its name reflects, it contains the implementation of the interfaces from the previous module. A single class, TXQValue, is responsible for both of them. Its methods are so simple that almost all of them consist of a single line of code (which is to be expected, because all the necessary functionality is already in the library and only has to be called here):

    unit InternetTools.Realization;
    {$MODE Delphi}
    interface
    uses
      xquery,
      InternetTools.Types;
    type
      IOriginalXQValue = xquery.IXQValue;
      TXQValue = class(TInterfacedObject, IXQValue, IXQValueEnumerator)
      private
        FOriginalXQValue: IOriginalXQValue;
        FEnumerator: TXQValueEnumerator;
        function MoveNext: Boolean; safecall;
        function GetCurrent: IXQValue; safecall;
        function GetEnumerator: IXQValueEnumerator; safecall;
        function OpenURL(const URL: WideString): IXQValue; safecall;
        function Map(const XQuery: WideString): IXQValue; safecall;
        function ToString: WideString; safecall; reintroduce;
      public
        constructor Create(const OriginalXQValue: IOriginalXQValue); overload;
        function SafeCallException(ExceptObject: TObject; ExceptAddr: CodePointer): HResult; override;
      end;
    implementation
    uses
      sysutils, comobj,
      w32internetaccess;
    function TXQValue.MoveNext: Boolean;
    begin
      Result := FEnumerator.MoveNext;
    end;
    function TXQValue.GetCurrent: IXQValue;
    begin
      Result := TXQValue.Create(FEnumerator.Current);
    end;
    function TXQValue.GetEnumerator: IXQValueEnumerator;
    begin
      FEnumerator := FOriginalXQValue.GetEnumerator;
      Result := Self;
    end;
    function TXQValue.OpenURL(const URL: WideString): IXQValue;
    begin
      FOriginalXQValue := xqvalue(URL).retrieve;
      Result := Self;
    end;
    function TXQValue.Map(const XQuery: WideString): IXQValue;
    begin
      Result := TXQValue.Create( FOriginalXQValue.map(XQuery) );
    end;
    function TXQValue.ToString: WideString;
    begin
      Result := FOriginalXQValue.toJoinedString(LineEnding);
    end;
    constructor TXQValue.Create(const OriginalXQValue: IOriginalXQValue);
    begin
      FOriginalXQValue := OriginalXQValue;
    end;
    function TXQValue.SafeCallException(ExceptObject: TObject; ExceptAddr: CodePointer): HResult;
    begin
      Result := HandleSafeCallException(ExceptObject, ExceptAddr, GUID_NULL, ExceptObject.ClassName, '');
    end;
    end.

    The SafeCallException method deserves a closer look. Overriding it is, by and large, not vital (TXQValue will work without it), but the code given here passes the text of exceptions raised in safecall methods to the Delphi side (details can be found in the article already cited above).

    In addition, this solution is thread-safe, provided that an IXQValue obtained, for example, through OpenURL is not passed between threads. This is because the interface implementation only forwards calls to InternetTools, which is already thread-safe.
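
    The condition above can be illustrated from the Delphi side with a sketch in which every thread requests its own IXQValue from the DLL and passes only a plain string back. It relies on the GetXQValue function exported by the DLL as described in the article; TTitleThread and its Title property are hypothetical names introduced for illustration.

    program ThreadedUsage;
    {$APPTYPE CONSOLE}
    uses
      System.Classes, System.Win.ComObj,
      InternetTools.Types;
    const
      DLLName = 'InternetTools.dll';
    function GetXQValue: IXQValue; stdcall; external DLLName;
    type
      // Hypothetical worker: loads a document and remembers its title
      TTitleThread = class(TThread)
      private
        FURL: WideString;
        FTitle: WideString;
      protected
        procedure Execute; override;
      public
        constructor Create(const URL: WideString);
        property Title: WideString read FTitle;
      end;
    constructor TTitleThread.Create(const URL: WideString);
    begin
      FURL := URL;
      inherited Create(False); // the thread starts after construction completes
    end;
    procedure TTitleThread.Execute;
    begin
      // The IXQValue is obtained and used inside this thread only;
      // just the resulting string leaves it
      FTitle := GetXQValue.OpenURL(FURL).Map('//head/title').ToString;
    end;
    const
      ArticleURL = 'https://habr.com/post/415617';
    var
      A, B: TTitleThread;
    begin
      A := TTitleThread.Create(ArticleURL);
      B := TTitleThread.Create(ArticleURL);
      try
        A.WaitFor;
        B.WaitFor;
        // Fields are read only after both threads have finished
        Writeln(A.Title);
        Writeln(B.Title);
      finally
        A.Free;
        B.Free;
      end;
    end.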

    InternetTools Dynamic Library


    Because of the work done in the modules above, the DLL only needs to export a single function (compare with the variant where the procedural style was used):

    library InternetTools;
    uses
      InternetTools.Types, InternetTools.Realization;
    function GetXQValue: IXQValue; stdcall;
    begin
      Result := TXQValue.Create;
    end;
    exports
      GetXQValue;
    begin
      SetMultiByteConversionCodePage(CP_UTF8);
    end.

    The call to the SetMultiByteConversionCodePage procedure is needed for correct work with Unicode strings.

    InternetToolsUsage application


    If we now write the Delphi solution of the original example on top of the proposed interfaces, it will hardly differ from the Free Pascal one, which means the task set at the very beginning of the article can be considered accomplished:

    program InternetToolsUsage;
    ...
    uses
      System.Win.ComObj,
      InternetTools.Types;
    const
      DLLName = 'InternetTools.dll';
    function GetXQValue: IXQValue; stdcall; external DLLName;
    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';
    var
      ListValue: IXQValue;
    begin
      for ListValue in GetXQValue.OpenURL(ArticleURL).Map(ListXPath) do
        Writeln(ListValue.ToString);
      ReadLn;
    end.

    The System.Win.ComObj module is included for a reason: without it, the text of every safecall exception turns into a faceless "Exception in safecall method", while with it the original message generated in the DLL is preserved.
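
    A sketch of how such an error might be observed on the Delphi side follows. It assumes that a deliberately malformed query (the BrokenXPath constant, introduced here purely for illustration) makes the library raise an exception inside Map; the exact class and message that arrive depend on the library and are therefore not shown.

    program SafecallErrors;
    {$APPTYPE CONSOLE}
    uses
      System.SysUtils, System.Win.ComObj,
      InternetTools.Types;
    const
      DLLName = 'InternetTools.dll';
    function GetXQValue: IXQValue; stdcall; external DLLName;
    const
      ArticleURL = 'https://habr.com/post/415617';
      BrokenXPath = '//li['; // illustrative: intentionally invalid query
    begin
      try
        Writeln(GetXQValue.OpenURL(ArticleURL).Map(BrokenXPath).ToString);
      except
        // With System.Win.ComObj in the uses clause, E.Message is expected
        // to carry the text produced in the DLL
        on E: Exception do
          Writeln(E.ClassName, ': ', E.Message);
      end;
      ReadLn;
    end.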

    The slightly more complicated example likewise differs only minimally in Delphi:

    ...
    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';
      HrefXPath = './a/@href';
    var
      ListValue, HrefValue: IXQValue;
    begin
      for ListValue in GetXQValue.OpenURL(ArticleURL).Map(ListXPath) do
        if {condition for processing the list item} then
          for HrefValue in ListValue.Map(HrefXPath) do
            Writeln(HrefValue.ToString);
      ReadLn;
    end.

    Remaining library functionality


    If you look at the full capabilities of the IXQValue interface of InternetTools, you will see that the corresponding interface in InternetTools.Types defines only 2 methods (Map and ToString) of the whole rich set. Adding the remaining ones that the reader needs in a particular case is done in exactly the same simple way: the required methods are declared in InternetTools.Types and then implemented in the InternetTools.Realization module (most often with a single line of code).

    If you want to use somewhat different functionality, for example cookie management, the sequence of steps is very similar:

    1. A new interface is declared in InternetTools.Types:

      ...
      ICookies = interface
      ['{21D0CC9A-204D-44D2-AF00-98E9E04412CD}']
        procedure Add(const URL, Name, Value: WideString); safecall;
        procedure Clear; safecall;
      end;
      ...
    2. Then it is implemented in the InternetTools.Realization module:

      ...
      type
        TCookies = class(TInterfacedObject, ICookies)
        private
          procedure Add(const URL, Name, Value: WideString); safecall;
          procedure Clear; safecall;
        public
          function SafeCallException(ExceptObject: TObject; ExceptAddr: CodePointer): HResult; override;
        end;
      ...
      implementation
      uses
        ...,
        internetaccess;
      ...
      procedure TCookies.Add(const URL, Name, Value: WideString);
      begin
        defaultInternet.cookies.setCookie( decodeURL(URL).host, decodeURL(URL).path, Name, Value, [] );
      end;
      procedure TCookies.Clear;
      begin
        defaultInternet.cookies.clear;
      end;
      ...
    3. After that, a new exported function that returns this interface is added to the DLL:

      ...
      function GetCookies: ICookies; stdcall;
      begin
        Result := TCookies.Create;
      end;
      exports
        ...,
        GetCookies;
      ...
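
    The three steps above could then be used from the Delphi application roughly as follows. This is a sketch only: the cookie name and value are placeholders, and it assumes that a document loaded afterwards through OpenURL in the same thread uses the same defaultInternet object and therefore sends the cookie.

    program CookiesUsage;
    {$APPTYPE CONSOLE}
    uses
      System.Win.ComObj,
      InternetTools.Types;
    const
      DLLName = 'InternetTools.dll';
    function GetXQValue: IXQValue; stdcall; external DLLName;
    function GetCookies: ICookies; stdcall; external DLLName;
    const
      ArticleURL = 'https://habr.com/post/415617';
    begin
      // Placeholder cookie; host and path are taken from the URL by TCookies.Add
      GetCookies.Add(ArticleURL, 'session', 'example-value');
      Writeln(GetXQValue.OpenURL(ArticleURL).Map('//head/title').ToString);
      // Remove the cookies again once they are no longer needed
      GetCookies.Clear;
      ReadLn;
    end.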

    Resource Release


    Although the InternetTools library is based on interfaces, which imply automatic lifetime management, there is one non-obvious nuance that appears to lead to memory leaks. If you run the following console application (created in Delphi; nothing changes in the FPC case), the memory consumed by the process grows every time you press Enter:

    ...
    const
      ArticleURL = 'https://habr.com/post/415617';
      TitleXPath = '//head/title';
    var
      I: Integer;
    begin
      for I := 1 to 100 do
      begin
        Writeln( GetXQValue.OpenURL(ArticleURL).Map(TitleXPath).ToString );
        Readln;
      end;
    end.

    There are no mistakes in the use of interfaces here. The problem is that InternetTools does not release the internal resources it allocates when analyzing a document (in the OpenURL method); this has to be done explicitly once work with the document is finished. For this purpose the library's xquery module provides the freeThreadVars procedure, and it is logical to make it callable from the Delphi application by extending the DLL's export list:

    ...
    procedure FreeResources; stdcall;
    begin
      freeThreadVars;
    end;
    exports
      ...,
      FreeResources;
    ...

    Once it is called, the resource leak stops:

    for I := 1 to 100 do
    begin
      Writeln( GetXQValue.OpenURL(ArticleURL).Map(TitleXPath).ToString );
      FreeResources;
      Readln;
    end;

    It is important to understand the following: calling FreeResources makes all previously obtained interfaces meaningless, and any attempt to use them is unacceptable.
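
    One way to make it harder to touch a stale interface by accident is to confine all work with the library to a separate routine, so that no interface variable, explicit or compiler-generated, is still in scope when FreeResources is called. The sketch below assumes the FreeResources export shown above; PrintTitle is a hypothetical helper.

    program ReleaseSketch;
    {$APPTYPE CONSOLE}
    uses
      System.Win.ComObj,
      InternetTools.Types;
    const
      DLLName = 'InternetTools.dll';
    function GetXQValue: IXQValue; stdcall; external DLLName;
    procedure FreeResources; stdcall; external DLLName;
    const
      ArticleURL = 'https://habr.com/post/415617';
      TitleXPath = '//head/title';
    // Hypothetical helper: every interface it obtains lives only inside it
    procedure PrintTitle;
    begin
      Writeln( GetXQValue.OpenURL(ArticleURL).Map(TitleXPath).ToString );
    end; // hidden interface temporaries are released on exit from the routine
    var
      I: Integer;
    begin
      for I := 1 to 100 do
      begin
        PrintTitle;
        // No IXQValue references remain in scope at this point
        FreeResources;
        Readln;
      end;
    end.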
