Debugging a bug that does not play

Original author: Alexandre Moureaux
  • Transfer
On October 10, 2018, our team released a new version of the application on React Native. We are pleased and proud of it.

But what a horror: after a few hours, the number of crashes under Android suddenly increases.


10,000 crashes under Android

Our Sentry crash monitoring tool is going crazy.

In all cases, we see a type error JSApplicationIllegalArgumentException Error while updating property 'left' in shadow node of type: RCTView".

In React Native, this usually happens if you set a property with the wrong type. But why didn't the error manifest itself during testing? With us, each developer thoroughly tests new releases on several devices.

Also, the errors seem quite random, as if they fall on any combination of properties and the type of shadow node. For example, here are the first three:

  • Error while updating property 'paddingTop' in shadow node of type: RCTView
  • Error while updating property 'height' in shadow node of type: RCTImageView
  • Error while updating property 'fill' of a view managed by: RNSVGPath

It seems that the error occurs on any device and in any version of Android, judging by the report Sentry.


Most crashes under Android 8.0.0 are falling, but this is consistent with our user base.

Let's replay!


So, the first step before fixing a bug is to reproduce it, right? Fortunately, thanks to the Sentry logs, we can find out what users are doing before a crash occurs.

Ta-a-ak, let's see ...



Hmm, in the overwhelming majority of cases, users simply open the application and - boom, it crashes.

Well, try again. Install the application on six Android devices, open it and exit several times. No failure! Moreover, it is impossible to reproduce it locally in dev-mode.

Well, that seems pointless. Failures are still quite random and occur in 10% of cases. It seems that you have a chance of 1 out of 10, that the application will fall at startup.

Stack trace analysis


To reproduce this failure, let's try to understand where it comes from ...


As mentioned earlier, we have several different bugs. And they all have similar, but slightly different traces.

Well, take the first one:

java.lang.ArrayIndexOutOfBoundsException: length=10; index=-1
    at android.support.v4.util.Pools$SimplePool.release(Pools.java:116)
    at com.facebook.react.bridge.DynamicFromMap.recycle(DynamicFromMap.java:40)
    at com.facebook.react.uimanager.LayoutShadowNode.setHeight(LayoutShadowNode.java:168)
    at java.lang.reflect.Method.invoke(Method.java)
    ...
java.lang.reflect.InvocationTargetException: null
    at java.lang.reflect.Method.invoke(Method.java)
    ...
com.facebook.react.bridge.JSApplicationIllegalArgumentException: Error while updating property 'height' in shadow node of type: RNSVGSvgView
    at com.facebook.react.uimanager.ViewManagersPropertyCache$PropSetter.updateShadowNodeProp(ViewManagersPropertyCache.java:113)
    ...

So the problem is android/support/v4/util/Pools.java.

Hmm, we are very deep in the library of support for Android, it is unlikely that there can be any benefit.

Find another way


Another way to find the root cause of the error is to check for new changes made to the latest release. Especially those that affect native Android code. Two hypotheses arise:

  • We updated Native Navigation , where native snippets for Android are used for each screen.
  • We updated react-native-svg . There were a few exceptions related to SVG components, but this is hardly the case.

At the moment we can not reproduce the error, so the best strategy:

  1. Roll back one of the two libraries. Roll it out for 10% of users, which is trivially done in the Play Store. Check with several users if the crash persists. Thus, we will confirm or disprove the hypothesis.


    But how to choose a library for rollback? Of course, you can throw a coin, but is this the best option?


    Getting to the bottom


    Let's analyze the previous trace more carefully. Perhaps this will help determine the library.

    /**
     * Simple (non-synchronized) pool of objects.
     *
     * @param  The pooled type.
     */publicstaticclassSimplePoolimplementsPool{
        privatefinal Object[] mPool;
        privateint mPoolSize;
        ...
        @Overridepublicbooleanrelease(T instance){
            if (isInPool(instance)) {
                thrownew IllegalStateException("Already in the pool!");
            }
            if (mPoolSize < mPool.length) {
                mPool[mPoolSize] = instance;
                mPoolSize++;
                returntrue;
            }
            returnfalse;
        }

    There was a failure. Error java.lang.ArrayIndexOutOfBoundsException: length=10; index=-1means that mPool- an array of size 10, but mPoolSize=-1.

    Okay, how come mPoolSize=-1? In addition to the method recycleabove, the only place to change mPoolSizeis the acquireclass method SimplePool:

    public T acquire(){
        if (mPoolSize > 0) {
            finalint lastPooledIndex = mPoolSize - 1;
            T instance = (T) mPool[lastPooledIndex];
            mPool[lastPooledIndex] = null;
            mPoolSize--;
            return instance;
        }
        returnnull;
    }

    Therefore, the only way to get a negative value mPoolSizeis to reduce it when mPoolSize=0. But how is this possible with the condition mPoolSize > 0?

    Put a breakpoint in Android Studio and take a look at what happens when you start the application. I mean, here is the condition if, this code should work fine!

    Finally, a revelation!



    See the DynamicFromMapstatic link for SimplePool.

    privatestaticfinal Pools.SimplePool<DynamicFromMap> sPool = new Pools.SimplePool<>(10);

    After several dozen clicks of the Play button with carefully placed breakpoints, we see that the mqt_native_modules streams access functions SimplePool.acquireand SimplePool.releaseuse React Native to control the style properties of the React component (below the widthcomponent property )



    But they are also accessed by the main thread main !



    Above, we see that they are used to update properties fillin the main thread, usually for a component react-native-svg! Indeed, the library react-native-svgbegan to be used DynamicFromMaponly from the seventh version to improve the performance of native svg animations.

    And-and-and ... the function can be called from two threads, but DynamicFromMapdoes not useSimplePoolthread safe way. “Thread safe,” you say?

    Thread safety, some theory


    In single-threaded JavaScript, developers usually do not need to deal with thread safety.

    On the other hand, Java supports the concept of parallel or multi-threaded programs. Multiple threads can run within the same program and can potentially access the overall data structure, which sometimes leads to unexpected results.

    Take a simple example: the image below shows that streams A and B are parallel:

    • read integer;
    • increase its value;
    • return it.


    Stream B can potentially access the data value before Stream A updates it. We expected two separate steps to give a finite value 19. Instead we can get 18. This situation, where the final state of the data depends on the relative order of the flow operations, is called a race state. The problem is that this state does not necessarily occur all the time. Perhaps in the above case, stream B has another job before increasing the value, which gives sufficient time for stream A to update the value. This explains the chance and inability to reproduce the failure.

    The data structure is considered thread-safe if operations can be performed simultaneously by multiple threads without the risk of a race condition.

    When one thread reads for a particular data element, another thread should not have the right to modify or delete this element (this is called atomicity). In the previous example, if the update cycles were atomic, the race conditions could have been avoided. Flow B will wait for thread A to complete the operation, and then start itself.

    In our case, this can happen:



    Because it DynamicFromMapcontains a static link to SimplePool, several calls DynamicFromMapcome from different threads, simultaneously calling the acquirec method SimplePool.

    In the illustration above, flow A calls the method, evaluating the condition as true , but it has not yet managed to reduce the value mPoolSize(which is used in conjunction with flow B), while flow B also calls this method and also evaluates the condition as true . Subsequently, each call will decrease the value mPoolSize, and as a result the “impossible” value is obtained.

    Correction


    Studying the options for correction, we found a pull-request for react-native , which has not yet joined the branch - and it provides thread safety in this case.



    Then we rolled out the corrected version of React Native for users. Crash finally fixed, hurray!


    So, thanks to the help of Jenik Duplessi (contributor to the core of React Native) and Michael Sand (maintener react-native-svg), the patch is included in the next minor version of React Native 0.57 .

    Fixing this bug required some effort, but it was a great opportunity to dig deeper into react-native and react-native-svg. A good debugger and several well-located breakpoints are important. I hope you learned something from this story too!

Also popular now: