Kazakhstan: How I helped file tax reporting form 100. Continued with form 300

    "That's no moon. It's a space station."
    - Obi-Wan Kenobi

    Greetings, community!

    Previous article: getting started with form 200

    Here is the continuation...

    The next problem to solve for my customer was VAT returns. Interestingly, the taxpayer's online account could export only small form 300 returns, as XML. All the other forms could be exported only through the SONO program, and those came out as archives.

    But it was not as simple as it seemed at first.

    And, most interestingly, the programmers at the company that maintains the online tax filing service had "encrypted" these very forms...

    Part one. The painful search for a way to read large forms.
    This is the SONO program. Accountants in Kazakhstan use it to file tax returns online.

    Where did I start?

    I exported all form 300 returns from SONO.

    Then I tried to open the archives. No luck.

    The archiver reported an error: "file is damaged". After spending 5 hours studying the basics of tar, I realized I understood nothing...

    Digging through open sources and plain googling did not help either. Then I came across a short thread on an accounting forum where admins openly mocked the algorithm used to "encrypt" large forms.

    It turned out that, rather than bother with any real cryptographic protection, the authors of the "encryption" had simply stripped the first two characters, "BZ", from the beginning of the file. "BZ" is the start of the bzip2 signature, so without it no archiver recognizes the file.
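    The whole "decryption" therefore fits in a few lines. A bzip2 stream normally begins with "BZh" followed by a block-size digit from 1 to 9; with "BZ" removed, the file starts with something like "h9" instead. Below is a minimal Node.js sketch of the repair, assuming the rest of the file is intact. The name restoreBz2Header is hypothetical and not part of the author's code.

```javascript
// Hypothetical helper (not from the article): put back the "BZ" that the
// SONO export strips from the start of a bzip2 file.
function restoreBz2Header(buf) {
    // Already a valid header: "BZh" + block size '1'..'9'
    if (buf.length >= 4 && buf.toString("latin1", 0, 3) === "BZh" &&
        buf[3] >= 0x31 && buf[3] <= 0x39) {
        return buf;
    }
    // "Encrypted" file: starts with "h" + block size digit instead of "BZh"
    if (buf.length >= 2 && buf[0] === 0x68 && buf[1] >= 0x31 && buf[1] <= 0x39) {
        return Buffer.concat([Buffer.from("BZ", "latin1"), buf]);
    }
    throw new Error("Not a bzip2 stream with a stripped header");
}
```

    Usage would be along the lines of fs.writeFileSync(out, restoreBz2Header(fs.readFileSync(input))). Checking for the "h" plus digit prefix before patching keeps the helper from corrupting a file that was already valid or is not bzip2 at all.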

    After I put this "mega key" back at the beginning of the file,

    I could open the archive. Inside it was another archive, damaged in exactly the same way.

    I repeated the trick on the inner archive and finally got access to the data.

    Figuring out how to read the archive alone took 6 hours of my life. "Abildet", as my uncle says.

    Here is a ready-made PHP snippet that "decrypts" large forms:

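    if (($_POST["action"] ?? "") === "getBz2") {
        $name = $_FILES["bz2"]["tmp_name"];
        $homepage = file_get_contents($name);
        // Small forms come as plain .xml; anything else is a SONO archive
        if (strripos($_FILES["bz2"]["name"], ".xml") === false) {
            // Restore the "BZ" stripped from the outer tar.bz2
            file_put_contents($name, "BZ" . $homepage);
            $baseDir = "/tmp/21";
            if (!is_dir($baseDir) && !mkdir($baseDir, 0777, true)) {
                exit("Cannot create $baseDir");
            }
            // Clear leftovers from a previous upload
            array_map("unlink", array_filter(glob("$baseDir/*"), "is_file"));
            // Unpack the outer archive (arguments escaped for the shell)
            exec("tar -jxf " . escapeshellarg($name) . " -C " . escapeshellarg($baseDir));
            // Remove XML files from the outer archive so the glob below finds only the form
            array_map("unlink", glob("$baseDir/*.xml"));
            // The inner .bz2 is "encrypted" the same way
            $files = glob("$baseDir/*.bz2");
            file_put_contents($files[0], "BZ" . file_get_contents($files[0]));
            exec("bunzip2 " . escapeshellarg($files[0]));
            // bunzip2 replaces name.xml.bz2 with name.xml
            $files = glob("$baseDir/*.xml");
            $homepage = file_get_contents($files[0]);
        }
        echo $homepage;
    }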

    Of course, all of this could easily be improved; this is only the part of the PHP code responsible for the "crack".

    Part two. Reading the XML structure and collecting the data required by the terms of reference

    A quick visual inspection of the resulting XML files showed that forms originally exported as XML have fno formatVersion = 1, while the mega-encrypted ones have fno formatVersion = 2.
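    If the import code needs to know where a file came from, checking that attribute is enough. A minimal sketch, assuming formatVersion is a quoted attribute on the fno element; a regex is used here instead of a full XML parser for brevity, and both function names are hypothetical.

```javascript
// Hypothetical helper: read the formatVersion attribute of the <fno> element.
// Assumes the attribute value is quoted, e.g. <fno formatVersion="2" ...>.
function getFormatVersion(xml) {
    const m = /<fno\b[^>]*\bformatVersion\s*=\s*["'](\d+)["']/.exec(xml);
    return m ? Number(m[1]) : null;
}

// 1 = plain XML exported from the taxpayer's online account,
// 2 = XML extracted from a SONO "encrypted" archive
function describeSource(xml) {
    switch (getFormatVersion(xml)) {
        case 1: return "plain XML export";
        case 2: return "SONO archive";
        default: return "unknown";
    }
}
```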


    Otherwise, the data structure was identical in both versions.

    // Returns the text of a <field> in form 300.00 of the XML loaded into `frame`,
    // or an empty string if the field is missing
    function getTitle(a) {
        try {
            return frame.contentWindow.document.querySelector("form[name='form_300_00'] field[name='" + a + "']").innerHTML;
        } catch (ex) {
            return "";
        }
    }

    var fno = {};
    fno["dt"] = {};
    fno["dt"]["dt_main"] = getTitle("dt_main");
    fno["dt"]["dt_regular"] = getTitle("dt_regular");
    fno["dt"]["dt_additional"] = getTitle("dt_additional");
    fno["dt"]["dt_notice"] = getTitle("dt_notice");
    fno["dt"]["dt_final"] = getTitle("dt_final");
    fno["dt"]["notice_date"] = getTitle("notice_date");
    fno["dt"]["notice_number"] = getTitle("notice_number");
    // Invoice ("faktura") lists for p7 and p8; getFaktures() is not shown in the article
    fno["p7"] = getFaktures(7);
    fno["p8"] = getFaktures(8);
    fno["period_year"] = getTitle("period_year");
    fno["period_quarter"] = getTitle("period_quarter");
    fno["submit_date"] = getTitle("submit_date");
    fno["field_300_00_001_A"] = getTitle("field_300_00_001_A");
    fno["field_300_00_001_B"] = getTitle("field_300_00_001_B");
    fno["field_300_00_013_A"] = getTitle("field_300_00_013_A");
    fno["field_300_00_013_B"] = getTitle("field_300_00_013_B");
    fno["field_300_00_015"] = getTitle("field_300_00_015");
    fno["field_300_00_021"] = getTitle("field_300_00_021");
    fno["field_300_00_023"] = getTitle("field_300_00_023");
    fno["iin"] = getTitle("iin");
    fno["rnn"] = getTitle("rnn");
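    Since every assignment in this listing follows the same pattern, the field names could be kept in arrays and the object built in a loop. A sketch under that assumption: DT_FIELDS, TOP_FIELDS and buildFno are hypothetical names, and the lookup functions are passed in so the same code works with getTitle in the browser or with a plain object in tests.

```javascript
// Hypothetical refactoring: field names from the listing, grouped so that
// adding a field is a one-line change.
const DT_FIELDS = ["dt_main", "dt_regular", "dt_additional", "dt_notice",
                   "dt_final", "notice_date", "notice_number"];
const TOP_FIELDS = ["period_year", "period_quarter", "submit_date",
                    "field_300_00_001_A", "field_300_00_001_B",
                    "field_300_00_013_A", "field_300_00_013_B",
                    "field_300_00_015", "field_300_00_021", "field_300_00_023",
                    "iin", "rnn"];

// lookup(name) returns a field's text (e.g. getTitle);
// invoices(n) returns the invoice list for p7/p8 (e.g. getFaktures).
function buildFno(lookup, invoices) {
    const fno = { dt: {} };
    for (const name of DT_FIELDS) fno.dt[name] = lookup(name);
    fno.p7 = invoices(7);
    fno.p8 = invoices(8);
    for (const name of TOP_FIELDS) fno[name] = lookup(name);
    return fno;
}
```

    In the page itself this would be called as var fno = buildFno(getTitle, getFaktures); and produce the same keys, in the same order, as the listing above.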

    All in all, I worked out the logical structure of the document in 20 minutes. Had it not been for the mega "crack", the whole project would have stalled.


    List of form 300 returns:

    Uploading forms:

    Viewing forms:

    P.S. The customer was thrilled.
    P.P.S. For those interested: from an engineering standpoint, this is not "sexy".
    P.P.P.S. If anyone wants to try it hands-on, I will put everything on GitHub soon. The system is not finished yet, so I don't want to show it just yet...
