[Javawatch Live] The story of one pull request. `os.version` in SubstrateVM

    A year has passed since the last time this trick worked: publishing a YouTube video instead of a post. "Shameful talk about singletons" got 7k views on YouTube and twice as many on Habr itself in the text version. For an article written in a completely frazzled state about the oldest chestnut imaginable, that counts as something of a success.

    I spent all of last night editing a new episode. This time the topic is much fresher: the story of contributing to an experimental technology, SubstrateVM. And the level of craziness has risen to a new height.



    Really looking forward to your comments! A reminder: if you really want to improve something in this post, it's best to submit your changes on GitHub. I'd like to say "like and subscribe to the new channel", but all its episodes will end up in your Java hub anyway, won't they?

    A technical note: there is one splice in the video near the end. I was recording uncompressed video, and my M.2 SSD, a mere five hundred gigabytes, filled up quickly. No other hard drive could keep up with the data rate. So I had to stop for half an hour and, exhausted, hunt down an extra fifty gigs to record the last few minutes. I got them by deleting files collected by Google Chrome. I posted my opinion of the recording software on FB while recording; there is a lot of pain there.

    Another technical curiosity: for some reason YouTube has blocked live streaming for me. Meanwhile the account has not a single strike or penalty on it. Let's hope this is just a glitch and everything comes back after 90 days.

    This article contains quotes from code owned by Oracle. You cannot use this code yourself (unless you read the original license and it allows that on some terms, for example the GPL). This is not a joke. Anyway, you have been warned.

    Prologue (the tale itself is yet to come)


    Many have already heard plenty of stories that "the new Java will be written in Java" and wonder how that could be. There is a policy document, Project Metropolis, and the corresponding letter from John Rose, but everything there is rather vague.

    It sounds like some kind of creepy, bloody magic. But in the thing you can try right now, not only is there no magic, everything is as blunt as the back of a shovel knocking your teeth out. Of course, there are nuances, but those will come much later.

    I'll show this with one instructive story that happened over the summer. Just like the school essay "How I spent my summer".

    First, a small remark. The project Oracle Labs is currently developing for ahead-of-time compilation is GraalVM. The component that actually does the cool stuff and turns Java code into an executable file is SubstrateVM, or SVM for short. Don't confuse it with the same abbreviation used by data scientists (support vector machine). It is SVM, as the key part, that we'll talk about from here on.

    Problem statement


    So, "how I spent the summer". I was on vacation, hitting F5 on the Graal GitHub page, and stumbled upon this issue:



    A person wants os.version to return the correct value.

    Well, I wanted to fix a bug, didn't I? Said and done.

    Let's check whether our patient is telling the truth.

    public class Main {
        public static void main(String[] args) {
            System.out.println(System.getProperty("os.version"));
        }
    }
    

    First, here is what regular Java prints: 4.15.0-32-generic. Yes, this is a fresh Ubuntu LTS, Bionic.

    Now let's try to do the same on the SVM:

    $ ls
    Main.java
    $ javac -cp . Main.java
    $ ls
    Main.class  Main.java
    $ native-image Main
    Build on Server(pid: 18438, port: 35415)
       classlist:     151.77 ms
           (cap):   1,662.32 ms
           setup:   1,880.78 ms
    error: Basic header file missing (<zlib.h>). Make sure libc and zlib headers are available on your system.
    Error: Processing image build request failed
    

    Well, yes. That's because I set up a brand-new virtual machine specifically for a "clean" test.

    $ sudo apt-get install zlib1g-dev libc6 libc6-dev
    $ native-image Main
    Build on Server(pid: 18438, port: 35415)
       classlist:     135.17 ms
           (cap):     877.34 ms
           setup:   1,253.49 ms
      (typeflow):   4,103.97 ms
       (objects):   1,441.97 ms
      (features):      41.74 ms
        analysis:   5,690.63 ms
        universe:     252.43 ms
         (parse):   1,024.49 ms
        (inline):     819.27 ms
       (compile):   4,243.15 ms
         compile:   6,356.02 ms
           image:     632.29 ms
           write:     236.99 ms
         [total]:  14,591.30 ms
    

    The absolute build times may look terrifying. But first, that is by design: some seriously heavy optimizations are applied here. And second, this is a puny virtual machine, so what do you expect.

    And finally, the moment of truth:

    $ ./main
    null

    It seems the reporter wasn't lying: it really doesn't work.

    The first approach: stealing properties from the host


    Then I ran a global search for os.version and found that all these properties live in the class SystemPropertiesSupport.

    I won't write out the full path to the file, because SVM has a built-in ability to generate proper projects for IntelliJ IDEA and Eclipse. This is very cool and nothing like the torment that OpenJDK puts you through. Let the IDE open the classes for us. So:

    public abstract class SystemPropertiesSupport {
        private static final String[] HOSTED_PROPERTIES = {
                        "java.version",
                        ImageInfo.PROPERTY_IMAGE_KIND_KEY,
                        "line.separator", "path.separator", "file.separator",
                        "os.arch", "os.name",
                        "file.encoding", "sun.jnu.encoding",
        };
        //...
    }
    

    Then, without engaging my brain at all, I simply went and added one more property to this set:

    "os.arch", "os.name", "os.version"

    I rebuild, run, and get the coveted line 4.15.0-32-generic. Hooray!

    But here's the problem: now it prints 4.15.0-32-generic on every machine where this code runs. Even on an old Ubuntu, where uname -a reports an older kernel version.

    It becomes clear that these values are baked into the resulting executable at build time.
    And indeed, one should read the comments carefully:

    /** System properties that are taken from the VM hosting the image generator. */
    private static final String[] HOSTED_PROPERTIES
    

    A different method is needed.

    Takeaways


    • If you want a system property from "regular Java" to appear in SVM, it is very easy to do. Register the property in the right place, and that's it.
    • You can work in an IDE that supports Java and Python at the same time, for example IntelliJ IDEA Ultimate with the Python plugin, or the equivalent in Eclipse.

    Second approach


    If you dig around in SystemPropertiesSupport, you'll find something much more sensible:

    /** System properties that are lazily computed at run time on first access. */
    private final Map<String, Supplier<String>> lazyRuntimeValues;
    

    Among other things, using these properties doesn't hold up the build of the executable. Clearly, if we cram too much into HOSTED_PROPERTIES, everything will slow down.

    Lazy properties are registered in the obvious way, with a reference to the method that returns the value:

    lazyRuntimeValues.put("user.name", this::userNameValue);
    lazyRuntimeValues.put("user.home", this::userHomeValue);
    lazyRuntimeValues.put("user.dir", this::userDirValue);
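
The difference between the two kinds of properties can be sketched on a regular JVM. This is only an illustration under assumptions: PropertySketch, registerHosted and registerLazy are hypothetical names, not SVM classes or methods. Hosted values are fixed when they are registered (think: image build time), while lazy ones run their supplier on first access and cache the result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical illustration of hosted vs. lazy properties; not the SVM class. */
class PropertySketch {
    /** Values captured once, at registration (think: image build time). */
    private final Map<String, String> hostedValues = new HashMap<>();
    /** Suppliers that run only when a property is first requested. */
    private final Map<String, Supplier<String>> lazyRuntimeValues = new HashMap<>();
    /** Cache of lazily computed results. */
    private final Map<String, String> computedValues = new HashMap<>();

    void registerHosted(String key, String valueAtBuildTime) {
        hostedValues.put(key, valueAtBuildTime);
    }

    void registerLazy(String key, Supplier<String> supplier) {
        lazyRuntimeValues.put(key, supplier);
    }

    String getProperty(String key) {
        String hosted = hostedValues.get(key);
        if (hosted != null) {
            return hosted;
        }
        String cached = computedValues.get(key);
        if (cached != null) {
            return cached;
        }
        Supplier<String> supplier = lazyRuntimeValues.get(key);
        if (supplier == null) {
            return null; // unknown property, like the null printed by ./main
        }
        String value = supplier.get();
        if (value != null) {
            computedValues.put(key, value);
        }
        return value;
    }

    public static void main(String[] args) {
        PropertySketch props = new PropertySketch();
        // Frozen: this string never changes, whatever machine the program runs on.
        props.registerHosted("os.name", "Linux");
        // Lazy: evaluated on the machine that actually runs the program.
        props.registerLazy("os.version", () -> System.getProperty("os.version"));
        System.out.println(props.getProperty("os.name"));
        System.out.println(props.getProperty("os.version"));
    }
}
```

With this split, a value that must reflect the machine the program runs on belongs in the lazy map, which is exactly why os.version does not fit into HOSTED_PROPERTIES.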
    

    Moreover, all these method references point to abstract methods, and the same this::userDirValue is implemented separately for each supported platform. In this case, those are PosixSystemPropertiesSupport and WindowsSystemPropertiesSupport.

    If you peek at the Windows implementation out of curiosity, you'll see something sad:

    @Override
    protected String userDirValue() {
        return "C:\\Users\\somebody";
    }
    

    As you can see, Windows is not yet supported :-) However, the real issue is that executable generation for Windows is not finished yet, so supporting these methods would indeed be completely pointless for now.

    That is, you need to register the following method:

    lazyRuntimeValues.put("os.version", this::osVersionValue);
    

    And then implement it in the two or three available platform classes.
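
The shape of such a platform split can be sketched roughly as follows. Everything here is an assumption made for illustration: the class names only echo the article, and picking the implementation from an OS name string is a shortcut to keep the example runnable on a regular JVM (the article does not show how SVM wires up its platform classes).

```java
import java.util.Locale;

/** Hypothetical base class; the names echo the article but this is not SVM code. */
abstract class OsVersionSupport {
    private String cachedOsVersion;

    /** Platform-specific lookup, implemented once per supported OS. */
    protected abstract String osVersionValue();

    /** Computes the value on first access and reuses it afterwards. */
    final String osVersion() {
        if (cachedOsVersion == null) {
            cachedOsVersion = osVersionValue();
        }
        return cachedOsVersion;
    }

    /** Assumption: the platform is picked from an OS name such as the "os.name" value. */
    static OsVersionSupport forOsName(String osName) {
        String name = osName.toLowerCase(Locale.ROOT);
        // startsWith rather than contains: "darwin" contains "win" too.
        if (name.startsWith("windows")) {
            return new WindowsOsVersionSupport();
        }
        return new PosixOsVersionSupport();
    }

    public static void main(String[] args) {
        OsVersionSupport support = forOsName(System.getProperty("os.name"));
        System.out.println(support.getClass().getSimpleName() + ": " + support.osVersion());
    }
}

class PosixOsVersionSupport extends OsVersionSupport {
    @Override
    protected String osVersionValue() {
        // Placeholder: a real implementation would ask the OS itself (for example via uname).
        return System.getProperty("os.version");
    }
}

class WindowsOsVersionSupport extends OsVersionSupport {
    @Override
    protected String osVersionValue() {
        // Placeholder in the spirit of the stubbed Windows methods shown above.
        return "unknown";
    }
}
```

The base class owns the caching and the registration, so each platform class only has to answer one question: what is the version here?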

    But what to write there?

    Takeaways


    • If you want to add a new property that is computed at run time, it's a matter of writing one method. The result may depend on the current operating system; the switching mechanism is already in place and needs no extra care.

    A bit of archaeology


    The first thing that comes to mind is to peek at the implementation in OpenJDK and shamelessly copy-paste it. A bit of archaeology and looting never hurt a brave explorer!

    Open any Java project in IDEA, write System.getProperty("os.version") there, and ctrl+click through to the implementation of getProperty(). It turns out all of this simply lives in Properties.

    It would seem enough to copy the place where these Properties are filled in and, laughing defiantly, vanish into the void. Unfortunately, we run into a problem:

    private static native Properties initProperties(Properties props);
    

    Noooooooooooooo.



    And it had all started so well.

    Was there a boy?


    As we know, using C++ is bad. Is C++ used in SVM?

    And how! There is even a special package for it: src/com.oracle.svm.native.

    And in this package, horror of horrors, there is a file getEnviron.c with something like this:

    extern char **environ;
    char **getEnviron() {
      return environ;
    }
    

    Time to get dirty with C++


    Now we dive a little deeper and open the full OpenJDK sources.

    If you don't have them yet, you can browse them on the web or download them. Be warned: they are downloaded from here, still via Mercurial, and it still takes about half an hour.

    The file we need is at src/java.base/share/native/libjava/System.c.

    Notice that this is a path to the file and not just a name? That's right, you can shove your shiny new fashionable IDEA, bought for $200 a year, you know where. You can try CLion, but to avoid irreversible mental damage it's better to just take Visual Studio Code. It already highlights something, but still doesn't understand what it sees (so it doesn't underline everything in red).

    A short retelling of System.c:

    java_props_t *sprops = GetJavaProperties(env);
    PUTPROP(props, "os.version", sprops->os_version);

    These, in turn, are filled in by src/java.base/unix/native/libjava/java_props_md.c.
    Each platform has its own such file; they are switched via #define.

    And here it begins. There are many platforms. Necro stuff like AIX can be ignored, because GraalVM doesn't officially support it (as far as I know, GNU/Linux, macOS and Windows are planned first). On GNU/Linux there is <sys/utsname.h>, which has a ready-made function for getting the name and version of the operating system.

    But macOS has a creepy piece of shit.

    • It reports the name "Mac OS X" (although it has long been macOS);
    • It depends on the macOS version. Before 10.9, the SDK had no operatingSystemVersion function, and SystemVersion.plist had to be read by hand;
    • For that reading, it uses ObjC extensions, something like this:

    // Fallback if running on pre-10.9 Mac OS
    if (osVersionCStr == NULL) {
        NSDictionary *version = [NSDictionary dictionaryWithContentsOfFile :
                                 @"/System/Library/CoreServices/SystemVersion.plist"];
        if (version != NULL) {
            NSString *nsVerStr = [version objectForKey : @"ProductVersion"];
            if (nsVerStr != NULL) {
                osVersionCStr = strdup([nsVerStr UTF8String]);
            }
        }
    }
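
For comparison, the essence of this fallback (find the ProductVersion key and take the string that follows it) could be expressed in plain Java. This is a rough sketch under assumptions: it presumes the plist is stored as XML text, uses a regular expression instead of a real plist parser, and PlistVersionSketch is a made-up name. It is not what OpenJDK or SVM actually do.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts ProductVersion from the XML text of a plist. */
class PlistVersionSketch {
    /** Matches the ProductVersion key followed by its string value. */
    private static final Pattern PRODUCT_VERSION = Pattern.compile(
            "<key>ProductVersion</key>\\s*<string>([^<]*)</string>");

    static Optional<String> productVersion(String plistXml) {
        Matcher matcher = PRODUCT_VERSION.matcher(plistXml);
        if (matcher.find()) {
            return Optional.of(matcher.group(1).trim());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed sample content; a real SystemVersion.plist has more keys.
        String sample = "<dict>\n"
                + "  <key>ProductName</key>\n  <string>Mac OS X</string>\n"
                + "  <key>ProductVersion</key>\n  <string>10.13.6</string>\n"
                + "</dict>";
        System.out.println(productVersion(sample).orElse("Unknown"));
    }
}
```

Even this toy version shows where the risk lies: every assumption about the file format is one more place where the result can silently diverge from what the real JDK reports.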
    

    If the initial idea was to rewrite this by hand in good style, it quickly shattered against reality. What if I mess something up in this tangle of spaghetti, it breaks for someone, and I get hanged in the central square? No thanks. Copy-paste it is.

    Takeaways


    • An IDE is not required;
    • Any contact with C++ is painful, unpleasant, and incomprehensible at first glance.

    Is copy-paste the norm?


    This was an important question that determined the amount of further torment. I really didn't want to rewrite everything by hand, but going to court for license violations would be even worse. So I went to GitHub and asked Codrut Stancu about it directly. Here is what he said:

    "Reusing OpenJDK code, copy-paste for example, is perfectly fine from a licensing standpoint. However, you need a very good reason for it. If a feature can be implemented by reusing the JDK code without copying, for example by patching it with a substitution, that would be much better."

    That sounds like official permission to copy-paste!

    And we were getting along so well...


    I started porting this piece of code, but ran into my own laziness. To test on different macOS versions, you need to find at least one machine with the ancient 10.8 Mountain Lion. I have two Apple devices of my own and one belonging to a friend, plus you can deploy it in some VMware trial.

    But I was lazy. And that laziness saved me.

    I went to the chat and asked Chris Seaton which toolchain is the right one for the build: which OS versions, which C++ compiler, and so on are supported.

    In response I got a surprised silence from the chat and a reply from Chris that he didn't understand the point of the question.

    It took a long time before Chris understood what I wanted to do, and then he asked me not to do that.
    That really goes against the idea of SVM. SVM is pure Java; it does not use the code that ships with OpenJDK. You can read the C++ code from OpenJDK, but dragging it in is the last thing we want.

    The example of the math libraries did not convince him. At the very least, they are written in C, and including C++ would mean bringing an entirely new language into the code base. And that is a big no-no.

    What to do? Write in System Java.

    And if calling into the C/C++ platform SDK cannot be avoided, it must be a single system call wrapped in a C API. The data is pulled into Java, and then the business logic is written strictly in Java, even if the platform SDK has convenient ready-made ways to do it on the C++ side.

    I sighed and began to study the source code in order to figure out how this can be done differently.

    Takeaways


    • Discuss all unclear details with the people in the chat. They answer if the questions aren't completely idiotic. Although this example shows that Chris is willing to discuss idiotic questions too, even when it doesn't save his own time;
    • C++ is not present in the project at all. There is no reason to believe anyone will let you sneak it in on the sly;
    • Instead, you need to write in System Java, using C as a last resort (for example, when calling the platform SDK).

    The violinist is not needed


    The violinist is not needed, dear. He only eats up extra fuel.

    Here I felt a certain sadness, because look. If Windows had <sys/utsname.h> and we could simply trust its answer, that would be quick and easy.

    But if it's not there, what do you do?

    • Call cmd builtins or Windows utilities? They print text in Russian, which has to be parsed. That is rock bottom, and it may not match what the real OpenJDK returns in this spot.
    • Take it from the Registry? There are nuances here too: for example, between Windows 7 and 10 the way the version numbers are stored in the Registry changed, and on Windows 10 you either glue the major and minor components together by hand or simply answer that this is Windows 10, with a single number. Which of these is more correct (that is, won't make users furious) is unclear.

    Fortunately, my mental anguish was interrupted by a pull request from Paul Woegerer, who fixed it all.

    Interestingly, first everything got fixed in master (os.version stopped returning null in the test), and only then did I notice the pull request. The catch is that this commit isn't marked as a pull request on GitHub: it is a plain commit with PullRequest: graal/1885 in the message. The thing is, the folks at Oracle Labs don't use GitHub internally; they need it only to interact with external committers. Those of us not fortunate enough to work at Oracle Labs have to subscribe to notifications about new commits to the repository and read them all.

    But now you can relax and see how to implement this feature correctly.

    Let's see what this beast is, System Java.

    As I said earlier, everything is as simple as the back of a shovel when it's knocking your teeth out. And just as painful. Let's look at a quote from the pull request:

    @Override
    protected String osVersionValue() {
        if (osVersionValue != null) {
            return osVersionValue;
        }
        /* On OSX Java returns the ProductVersion instead of kernel release info. */
        CoreFoundation.CFDictionaryRef dict = CoreFoundation._CFCopyServerVersionDictionary();
        if (dict.isNull()) {
            dict = CoreFoundation._CFCopySystemVersionDictionary();
        }
        if (dict.isNull()) {
            return osVersionValue = "Unknown";
        }
        CoreFoundation.CFStringRef dictKeyRef = DarwinCoreFoundationUtils.toCFStringRef("MacOSXProductVersion");
        CoreFoundation.CFStringRef dictValue = CoreFoundation.CFDictionaryGetValue(dict, dictKeyRef);
        CoreFoundation.CFRelease(dictKeyRef);
        if (dictValue.isNull()) {
            dictKeyRef = DarwinCoreFoundationUtils.toCFStringRef("ProductVersion");
            dictValue = CoreFoundation.CFDictionaryGetValue(dict, dictKeyRef);
            CoreFoundation.CFRelease(dictKeyRef);
        }
        if (dictValue.isNull()) {
            return osVersionValue = "Unknown";
        }
        osVersionValue = DarwinCoreFoundationUtils.fromCFStringRef(dictValue);
        CoreFoundation.CFRelease(dictValue);
        return osVersionValue;
    }
    

    In other words, we write in Java word for word what we would have written in C.

    Take a look at how DarwinExecutableName is written:

    @Override
    public Object apply(Object[] args) {
        /* Find out how long the executable path is. */
        final CIntPointer sizePointer = StackValue.get(CIntPointer.class);
        sizePointer.write(0);
        if (DarwinDyld._NSGetExecutablePath(WordFactory.nullPointer(), sizePointer) != -1) {
            VMError.shouldNotReachHere("DarwinExecutableName.getExecutableName: Executable path length is 0?");
        }
        /* Allocate a correctly-sized buffer and ask again. */
        final byte[] byteBuffer = new byte[sizePointer.read()];
        try (PinnedObject pinnedBuffer = PinnedObject.create(byteBuffer)) {
            final CCharPointer bufferPointer = pinnedBuffer.addressOfArrayElement(0);
            if (DarwinDyld._NSGetExecutablePath(bufferPointer, sizePointer) == -1) {
                /* Failure to find executable path. */
                return null;
            }
            final String executableString = CTypeConversion.toJavaString(bufferPointer);
            final String result = realpath(executableString);
            return result;
        }
    }
    

    All these CIntPointer, CCharPointer, PinnedObject things, what on earth are they.

    To my taste, this is inconvenient and ugly. You have to work manually with pointers that look like Java classes. You have to call the appropriate release at the right time so that memory doesn't leak.
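
One way to soften the "don't forget to release" problem in plain Java is the same try-with-resources idiom that the PinnedObject code above already relies on. The sketch below is purely illustrative: ScopedRef is a hypothetical stand-in for a native reference, and nothing here is an SVM or CoreFoundation API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a native reference that must be released exactly once. */
final class ScopedRef implements AutoCloseable {
    /** Number of references created but not yet released. */
    static final AtomicInteger LIVE = new AtomicInteger();
    private boolean released;

    ScopedRef() {
        LIVE.incrementAndGet();
    }

    @Override
    public void close() {
        // Idempotent: a second close() must not release twice.
        if (!released) {
            released = true;
            LIVE.decrementAndGet();
        }
    }

    /** Early returns are safe: close() runs on every exit path of the try block. */
    static String lookupVersion(boolean valueMissing) {
        try (ScopedRef key = new ScopedRef()) {
            if (valueMissing) {
                return "Unknown";
            }
            return "10.13.6"; // made-up value for the demo
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupVersion(true));
        System.out.println(lookupVersion(false));
        System.out.println("live references: " + LIVE.get());
    }
}
```

The point of the design is that the release no longer depends on remembering a matching call before each return; the compiler generates it for every exit path.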

    But if these seem like unjustified measures, take another look at the GC implementation in .NET and be horrified at what C++ leads to if you don't stop in time. Remember, it is one huge CPP file of more than a megabyte. There are some descriptions of how it works, but they are clearly insufficient for an external contributor. The code above, ugly as it looks, is quite understandable and can be checked with static analysis tools for Java.

    As for the substance of the commit, I have questions about it. And at the very least there is no Windows support. When code generation for Windows appears, I'll try to take on this task.

    Takeaways


    • You have to write in System Java. Praise it, call it sweet bread. There are no other options anyway;
    • Subscribe to notifications from the repository on GitHub and read the commits, otherwise an important PR will fly by;
    • If possible, ask the people responsible for the area about any big feature. Many things are already implemented but not yet known to the general public. There's a chance of reinventing the wheel, and a much worse one than the guys from Oracle Labs have made;
    • When you take on a feature, be sure to tell the responsible person on GitHub. If they don't respond, write an email; the addresses of all team members are easy to google.

    Epilogue


    This battle is over, but the war is far from it.

    Soldier, stay alert for new articles on Habr and join our ranks!

    I want to remind you that Oleg Shelayev, the only official GraalVM evangelist from Oracle, is coming to the next Joker conference. Not just "the only Russian-speaking one", but "the only one, period". The title of the talk ("Compile Java ahead-of-time with GraalVM") hints that SubstrateVM will be involved. By the way, Oleg was recently issued a service weapon: an account on Habr, shelajev-oleg. There are no posts there yet, but you can already mention this username. You can talk with Oleg and with Oleg in our small chat on Telegram:



    @graalvm_ru. Unlike issues on GitHub, here you can communicate in any form, and no one will be banned (but that's not certain).

    I also remind you that every week, together with the "Debriefing" podcast, we release a "Java digest". For example, here is the latest digest. From time to time there is GraalVM news in it too (in fact, the only reason I don't turn the whole episode into a GraalVM news release is respect for the audience :-)

    Thank you for reading this, and see you again!
