Solving the retain cycle problem in ObjC blocks

    A lot has been written about blocks in ObjC and how to use them correctly, including on Habr. The question of how to handle self inside blocks so as to avoid retain cycles comes up regularly in interviews. With frameworks such as ReactiveCocoa, the number of blocks in the code grows significantly, and so does the chance of making a mistake and leaking objects. What follows is an attempt to solve this problem once and for all: metaprogramming for C99 with extensions and blocks, plus hipster macros starting with @.

    Let's look at the problem and at how its solutions evolved.
    self.block = ^{ 
        [self m1];
        [self m2];
    };
    

    This code obviously contains a problem. Unless self.block is set to nil, the object can never be deallocated, because the block holds a strong reference to self. When CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF is enabled, the compiler will even issue a warning.

    Improvement 1:

    __weak __typeof(self)weakSelf = self;
    self.block = ^{ 
        [weakSelf m1];
        [weakSelf m2]; 
    };
    

    The retain cycle is gone, but another problem arises. By the time the block is called, the object behind weakSelf either still exists or is already gone. If it is gone, weakSelf == nil and neither m1 nor m2 is called, so everything seems fine. However, the object may still exist when m1 is called and be gone by the time m2 is called. In that case m1 runs and m2 does not, which may be unexpected and incorrect. This can happen because of a race condition in a multithreaded application, or because m1 itself drops a reference to the object (for example, removes it from some collection). If CLANG_WARN_OBJC_REPEATED_USE_OF_WEAK and CLANG_WARN_OBJC_RECEIVER_WEAK are enabled, the compiler generates a warning for this case.

    Improvement 2:

    __weak typeof(self)weakSelf = self;
    self.block = ^{
        __strong typeof(self)strongSelf = weakSelf;
        [strongSelf m1];
        [strongSelf m2];
    };
    

    The consistency of method calls inside the block is now guaranteed. But a new problem shows up:
    __weak typeof(self)weakSelf = self;
    self.block = ^{
        __strong typeof(self)strongSelf = weakSelf;
        [strongSelf m1];
        [strongSelf m2];
        NSAssert(foo == bar, @"Cool assert!");
    };
    

    Macros such as NSAssert and RACObserve use self implicitly, so the retain cycle returns.

    Improvement 3:

    __weak typeof(self)weakSelf = self;
    self.block = ^{
        __strong typeof(self)self = weakSelf;
        [self m1];
        [self m2];
        NSAssert(foo == bar, @"Cool assert!");
    };
    

    Now the problem with macros that use self is solved, but when GCC_WARN_SHADOW is enabled, the compiler generates a warning.

    Improvement 4:

    The libextobjc library provides the @weakify and @strongify macros, which remove the compiler warning and simplify the code a bit.
    @weakify(self); // a new __weak local variable now refers to self
    self.block = ^{
        @strongify(self); // self is now a new __strong local variable
        [self m1];
        [self m2];
        NSAssert(foo == bar, @"Cool assert!");
    };
    

    This is an almost optimal solution, but it still has drawbacks: you must remember to put @weakify and @strongify in the right places, and using self after @weakify is safe but may make the compiler issue a warning.
    There is also still a chance of accidentally capturing self in the block by a strong reference:
    @weakify(self); // a new __weak local variable now refers to self
    self.block = ^{
        @strongify(self); // self is now a new __strong local variable
        [self m];
        NSLog(@"Ivar value from object: %@", _ivar); // a strong reference to self is captured implicitly to access _ivar
        NSAssert(foo == bar, @"Cool assert!");
    };
    

    To avoid this, either access the value only through the property (self.ivar) or explicitly go through the redefined self:
    @weakify(self); // a new __weak local variable now refers to self
    self.block = ^{
        @strongify(self); // self is now a new __strong local variable
        [self m];
        NSLog(@"Ivar value from object: %@", self->_ivar); // explicitly use our redefined self to access _ivar
        NSAssert(foo == bar, @"Cool assert!");
    };
    

    Keep in mind that self can be nil here, and explicitly dereferencing it with self->_ivar will then crash.

    Given all these problems, the idea came up to write a macro that modifies not self but the block itself, so that:
    • self outside the block does not change, the way it does with @weakify
    • inside the block, self is still called self, to avoid surprises with NSAssert and other macros
    • until the block is called, the object behind self is held by a weak reference, and while the block runs, by a strong one
    • if possible, the macro helps find blocks that implicitly capture self through _ivar
    • all type checks work exactly as they do without the macro
    • code changes required to use the macro are minimal
    • runtime overhead is minimal

    The macro should work somewhat like a decorator in Python: take a block as input and wrap it in a new block with compatible parameters and return value. For example, consider this block:
    self.block = ^(NSObject *obj) {
        NSLog(@"%@ %@", [self description], obj);
        return 0;
    };
    

    Let's start by modifying the block so that self is captured by a weak reference, by analogy with the code from “Improvement 1”. For that we need a new scope in which the local weak reference is declared. An anonymous block that is invoked immediately after creation works as such a scope:
    self.block = ^{
        __weak typeof(self) weakSelf = self;
        return ^(NSObject *obj) {
            NSLog(@"%@ %@", [weakSelf description], obj);
            return 0;
        };
    }();
    

    The compiler infers the return type of the outer anonymous block automatically, so everything stays type-safe.

    Now, at the moment of the call, self inside the body of the inner block has to become a strong reference. To achieve that, the block has to be split into two parts: the type declaration ^(NSObject *obj) and the body itself in {...}. We turn the body into a block without parameters and invoke it from another block, which is created from the type declaration and turns self into a strong reference:
    self.block = ^{
        __weak typeof(self) weakSelf = self;
        return ^(NSObject *obj) {
            __strong typeof(self)self = weakSelf;
            return ^ (void) {
                NSLog(@"%@, %@", [self description], obj);
                return 0;
            }();
        };
    }();
    

    The main trick is to replace the original block with an equivalent one that implicitly captures weakSelf instead of self and turns it into a strong self at the moment of the call.
    return ^(NSObject *obj) {
        __strong typeof(self)self = weakSelf;
        return ^ (void) {
            NSLog(@"%@, %@", [self description], obj);
            return 0;
        }();
    };
    

    is essentially the same as
    ^(NSObject *obj) {
        NSLog(@"%@ %@", [self description], obj);
        return 0;
    };
    

    So three blocks are created instead of one. Since the outermost block is invoked immediately after creation, we can get rid of it with the statement expressions extension, in which a compound statement evaluates to a value:
    self.block = ({
        __weak typeof(self) weakSelf = self;
        ^(NSObject *obj) {
            __strong typeof(self)self = weakSelf;
            return ^ (void) {
                NSLog(@"%@, %@", [self description], obj);
                return 0;
            }();
        };
    });
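
    The statement expression itself can be tried out in isolation. The following is a minimal sketch in plain C (GNU extension, accepted by both gcc and clang), not the ObjC code from the article; the function name scaled_sum and the variable tmp are invented for the illustration. It shows that ({ ... }) opens a private scope and evaluates to its last expression, so a helper variable declared inside does not disturb a variable of the same name outside.

```c
/* Plain C model of the trick: a statement expression ({ ... }) opens a
   private scope and evaluates to its last expression. */
static int scaled_sum(int a, int b) {
    int tmp = 100;            /* outer variable, untouched by the scope below */
    int result = ({
        int tmp = a + b;      /* shadows the outer tmp inside ({ ... }) only */
        tmp * 2;              /* last expression: the value of ({ ... }) */
    });
    return result + tmp;      /* the outer tmp is still 100 here */
}
```

    In the article's code the same mechanism keeps weakSelf local to the expression while the block literal becomes its value.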
    

    It remains to wrap all the boilerplate in a macro so that the trick is convenient to use. Keeping only the common code, we get:
    ({
        __weak typeof(self) weakSelf = self;
        /* BLOCK TYPE */ {
            __strong typeof(self)self = weakSelf;
            return ^ (void) {
            /* BLOCK BODY */
            } ();
        };
    })
    

    The first idea was to make a macro with two parameters, for the type and body, which would be called like this:
    self.block = weakself(^(NSObject *obj), {
        NSLog(@"%@ %@", [self description], obj);
        return 0;
    });
    

    Unfortunately, during preprocessing a macro invocation is expanded onto a single line, so you cannot put a breakpoint on an arbitrary line in the body of the block. Therefore, I had to do it like this:
    self.block = weakself(^(NSObject *obj)) {
        NSLog(@"%@ %@", [self description], obj);
        return 0;
    } weakselfend ;
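
    The effect of splitting the macro into an opening and a closing part can be seen in a small plain C sketch. The names scoped_begin, scoped_end and body_line_offset are hypothetical and exist only for this illustration. Because the body sits between the two macros as ordinary source text, each of its lines keeps its own line number, which is what lets the debugger place breakpoints inside it.

```c
/* Hypothetical macro pair: the opening part starts a statement expression
   and declares a local, the closing part ends the expression. */
#define scoped_begin(SEED) ({ int seed = (SEED);
#define scoped_end })

static int body_line_offset(void) {
    int opening_line = 0;
    int body_line = scoped_begin(__LINE__)  /* seed = line of the opening macro */
        opening_line = seed;                /* first body line */
        __LINE__;                           /* second body line, value of the expression */
    scoped_end;
    return body_line - opening_line;        /* distance between the two source lines */
}
```

    Here the function should return 2, since the last body expression is written two lines below the opening macro.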
    

    This variant is equivalent to @weakify/@strongify from Improvement 4. The macro code:
    #define weakself(ARGS) \
    ({  __weak typeof(self) _private_weakSelf = self; \
        ARGS { \
            __strong typeof(_private_weakSelf) self __attribute__((unused)) = _private_weakSelf; \
            return ^ (void) {
    #define weakselfend } (); }; })
    

    One of the goals of the macro was protection against implicitly capturing self through ivar access. Unfortunately, I never figured out how to do this at compile time. The only option is an assert or log in the debug build at the moment the block is created (it is enough to create the block for the check to work; calling it is not necessary). Here it is worth recalling how memory management works for blocks and the objects they capture. There are 3 types of blocks:
    • NSGlobalBlock: blocks created at the top level of a source file. From the point of view of memory management they are essentially the same as functions; they do not capture variables from the enclosing scope and are of no interest to us.
    • NSStackBlock: the initial type of all other blocks. They are created on the stack and do not increase the reference counts of the objects they capture, since the lifetime of such a block is less than or equal to the lifetime of the variables in its lexical scope.
    • NSMallocBlock: an NSStackBlock that has been moved to the heap, either by an explicit call to copy / Block_copy or implicitly by the compiler. One case where the compiler inserts Block_copy implicitly is returning a block as the result of a function or another block. When an NSStackBlock turns into an NSMallocBlock, the reference counts of the objects the block has captured are increased.

    Thus, to check whether a block captures a strong reference to self, you compare the reference count of self before the block is moved to the heap and after. If the count has increased, the block has captured self by a strong reference. This check cannot be 100% reliable, since the reference count of self can change from other threads while the block is being moved to the heap; in a normal program, however, this is unlikely, and the check is good enough for Debug builds.
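
    The before/after comparison can be modelled in plain C. This is only a toy model under stated assumptions, not the ObjC runtime: Object, Closure, closure_copy_to_heap and captures_strongly are all invented for the sketch, and closure_copy_to_heap merely imitates the way Block_copy retains strongly captured objects.

```c
#include <stddef.h>

/* Toy model: a manually counted object. */
typedef struct { long retain_count; } Object;

/* Toy closure with one strong and one weak capture slot. */
typedef struct {
    Object *strong_capture;   /* retained when the closure moves to the heap */
    Object *weak_capture;     /* never retained */
} Closure;

/* Imitates Block_copy: only strong captures are retained. */
static void closure_copy_to_heap(Closure *c) {
    if (c->strong_capture != NULL) {
        c->strong_capture->retain_count += 1;
    }
}

/* Returns 1 if moving the closure to the heap raised the owner's count. */
static int captures_strongly(Closure *c, Object *owner) {
    long before = owner->retain_count;
    closure_copy_to_heap(c);
    long after = owner->retain_count;
    return after > before;
}
```

    A closure that holds the owner only in weak_capture leaves the count unchanged, while one that holds it in strong_capture is reported, which is the signal the debug check looks for.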

    Previously, you could get an object's reference count with the retainCount method. It is no longer available under ARC, but CFGetRetainCount still works through toll-free bridging. All that remains is to insert calls to this function with self as the argument in the right places and compare the results.
    self.block = ({
        __weak typeof(self) weakSelf = self;
        // The first reading of the reference count of self has to be taken here
        ^(NSObject *obj) {
            __strong typeof(self)self = weakSelf;
            return ^ (void) {
                NSLog(@"%@, %@", [self description], obj);
                return 0;
            }();
        };
    })  // and the second one here, to compare. But variables from the statement expression are not accessible here
    

    The problem is that the value of a statement expression is its last expression, much like an anonymous block that is invoked immediately after its declaration. Since the last expression here is the block literal, the compiler moves the block to the heap so that it stays valid. So the first CFGetRetainCount result for self can be stored in a local variable inside the statement expression, but the second call to CFGetRetainCount has to happen after the last expression of the statement expression. In C++ we could create an object on the stack and do everything we need in its destructor, since the destructor would run after the last expression of the statement expression. Fortunately, clang supports a gcc extension that attaches a cleanup function (an analogue of a destructor) to any stack variable; it is called the moment the variable leaves scope. The @onExit macro from libextobjc is built on this extension.
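
    The ordering this relies on can be checked with a small plain C sketch (GNU cleanup attribute, supported by gcc and clang). All names here (trace, note, guard_cleanup, produce_value, run_with_guard) are made up for the illustration. The sketch records when the value of the statement expression is computed ('V'), when the cleanup handler runs ('C'), and when the code after the expression runs ('A').

```c
#include <string.h>

static char trace[8];             /* records the order of events */

static void note(char event) {
    size_t len = strlen(trace);
    if (len + 1 < sizeof trace) {
        trace[len] = event;
        trace[len + 1] = '\0';
    }
}

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void guard_cleanup(int *guard) {
    (void)guard;
    note('C');
}

static int produce_value(void) {
    note('V');
    return 42;
}

static int run_with_guard(void) {
    trace[0] = '\0';
    int value = ({
        int guard __attribute__((cleanup(guard_cleanup), unused)) = 0;
        produce_value();          /* last expression: value of ({ ... }) */
    });
    note('A');                    /* runs after the whole expression */
    return value;
}
```

    After a call to run_with_guard the trace should read VCA: the handler fires after the last expression has been evaluated but before the surrounding statement continues, which is exactly the window needed for the second retain count reading.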

    To implement reference counter verification, an additional structure is required:
    struct RefCountCheckerData {
        CFTypeRef weakSelf;
        NSInteger refCountBefore;
    };
    

    And a function to register as the cleanup handler:
    static inline void vbr_CheckRefCountForWeakSelf(struct RefCountCheckerData *data) {
        const NSInteger refCountAfter = CFGetRetainCount(data->weakSelf);
        const NSInteger countOfSelfRefInBlock = refCountAfter - data->refCountBefore;
        if (countOfSelfRefInBlock > 0) {
            raise(SIGPIPE);
        }
    }
    

    We create the structure on the stack, attach the cleanup function, and initialize it with the pointer to self and the current number of references to it. The cleanup function is called when the variable _private_refCountCheckerData leaves scope, and at that point our block is already on the heap.
    self.block = ({
        __weak typeof(self) weakSelf = self;
        __attribute__((cleanup(vbr_CheckRefCountForWeakSelf), unused))
            struct RefCountCheckerData _private_refCountCheckerData = {
                .weakSelf = (__bridge CFTypeRef)self,
                .refCountBefore = CFGetRetainCount((__bridge CFTypeRef)self),
            };
        ^(NSObject *obj) {
            __strong typeof(self)self = weakSelf;
            return ^ (void) {
                NSLog(@"%@, %@", [self description], obj);
                return 0;
            }();
        };
    });
    

    With this version of the macro, the debugger stops when a block accesses an ivar without going through self, for example: self.block = ^{ NSLog(@"%d", _ivarInteger); };

    Before presenting the final version of the macro, it needs a modern hipster look. In ObjC it is fashionable to make macros start with @, like the language keywords, for example @strongify and @onExit. But the preprocessor does not allow @ as part of a macro name. To get around this, extobjc puts autoreleasepool {} or try {} @catch (...) {} at the beginning of the macro, so the @ sticks to either autoreleasepool or try. After the macro is expanded, a useless empty @autoreleasepool or @try/@catch appears in the code, but nobody minds much. However, this approach does not work for the weakself macro, because the result of weakself is an expression, and an expression cannot start with @autoreleasepool {} or @try {} @catch (...) {}.
    self.block = @weakself(^(NSObject *obj)) {
        NSLog(@"%@ %@", [self description], obj);
        return 0;
    } @weakselfend ;
    

    When it comes to complex expressions in C, the ternary operator comes to mind first. It remains to figure out how to apply it. The first thing that occurred to me was to write something like this: self.block = @1 ? /* block code here */ : nil;

    To do this, just add 1 ? to the beginning of weakself and : nil to the end of weakselfend. But self.block = 1 ? /* block code here */ : nil; is a perfectly valid expression too, so both @weakself and weakself would work.

    The variant self.block = @[] ? /* block code here */ : nil; does not allow weakself to be used without @. However, the disassembly showed that the optimizer does not eliminate the creation of the empty array, which is extra runtime overhead.

    Finally, the idea came up to use string literal concatenation in ObjC.
    const char *s0 = "ABC" "DEF"; // this is the valid C string "ABCDEF"
    NSString *s1 = @"ABC" @"DEF"; // this is the valid ObjC string @"ABCDEF"
    NSString *s2 = @"ABC" "DEF"; // this is also the valid ObjC string @"ABCDEF"
    NSString *s3 = "ABC" @"DEF"; // and this is a compile error
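
    The two C-level ingredients of this idea can be tried without ObjC. The sketch below is plain C, so it cannot show the @ prefix itself; it only demonstrates that adjacent string literals are merged into one, and that a string literal used as a ternary condition is a non-null pointer and therefore always selects the first branch. The names joined, wrapped_begin, wrapped_end and wrapped_value are hypothetical.

```c
/* Adjacent string literals are merged by the compiler into one literal. */
static const char *joined(void) {
    return "ABC" "DEF";
}

/* Hypothetical macro pair: the opening part starts with a literal that is
   merged with a second, empty literal and then used as an always-true
   ternary condition; the closing part supplies the unused else branch. */
#define wrapped_begin "wrapped_begin opens a ternary" "" ? (
#define wrapped_end ) : 0

static int wrapped_value(int x) {
    return wrapped_begin x + 1 wrapped_end;
}
```

    Some compilers may warn about a condition that is always true when extra warnings are enabled; the expression is still valid.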
    

    So, the final version of the macro:
    #define weakself(ARGS) \
    "weakself should be called as @weakself" @"" ? \
    ({  __weak typeof(self) _private_weakSelf = self; \
        ARGS { \
            __strong typeof(_private_weakSelf) self __attribute__((unused)) = _private_weakSelf; \
            return ^ (void) {
    #define weakselfnotnil(ARGS) \
    "weakself should be called as @weakself" @"" ? \
    ({  __weak typeof(self) _private_weakSelf = self; \
        ARGS { \
            __strong typeof(_private_weakSelf) self __attribute__((unused)) = _private_weakSelf; \
            return ^ (void) { if (self)
    #define weakselfend \
        try {} @finally {} } (); }; \
    }) : nil
    

    @weakselfnotnil differs in that if self has already been deallocated by the time the block is called, the body of the block is not executed. It is only suitable for blocks without a return value; otherwise it is unclear what to return when self is already gone. It exists mainly for safe ivar access through explicit dereferencing of self:
    self.block = @weakselfnotnil(^) {
        NSLog(@"%d", self->_ivar);
    } @weakselfend;
    


    Performance


    There is probably no need to worry much about performance here; there should not be much overhead. The trick that adds @ to the beginning of the macro is completely eliminated by the optimizer. The overhead of calling an additional block is more interesting. To check it, consider two cases, one using the macros from libextobjc and one using our weakself:
    - (void)m1 {
        @weakify(self);
        self.block = ^(NSObject * obj) {
            @strongify(self);
            NSLog(@"%@", [self description]);
            return 0;
        };
    }
    - (void)m2 {
        self.block = @weakself(^(NSObject * obj)) {
            NSLog(@"%@", [self description]);
            return 0;
        } @weakselfend;
    }
    

    We build with -O3, open the binary in Hopper, and look at the pseudocode for both cases:
    function -[ViewController m1] {
        asm{ vst1.64    {d8, d9, d10, d11}, [r4:128]! };
        asm{ vst1.64    {d12, d13, d14, d15}, [r4:128] };
        r1 = *_NSConcreteStackBlock;
        *((sp - 0x40 & !0xf) - 0x50) = r1;
        var_4 = 0xc2000000;
        var_24 = ((sp - 0x40 & !0xf) - 0x50) + 0x14;
        asm{ stm.w      r5, {r1, r2, r3} };
        r5 = [r0 retain];
        objc_initWeak(var_24, r5);
        [r5 release];
        r0 = *__objc_personality_v0;
        r1 = *0xac24;
        var_52 = r0;
        var_56 = GCC_except_table0;
        var_60 = &var_12;
        var_68 = (sp - 0x40 & !0xf) - 0x50;
        var_64 = (r1 | 0x1) + 0xabc4;
        var_32 = 0x1;
        [r5 setBlock1:(sp - 0x40 & !0xf) - 0x50];
        objc_destroyWeak(var_24);
        r0 = _Unwind_SjLj_Unregister(&var_28);
        asm{ vld1.64    {d8, d9, d10, d11}, [r4:128]! };
        asm{ vld1.64    {d12, d13, d14, d15}, [r4:128] };
        Pop();
        Pop();
        Pop();
        return r0;
    }
    function ___20-[ViewController m1]_block_invoke {
        r4 = objc_loadWeakRetained(r0 + 0x14);
        r0 = [r4 description];
        r5 = [r0 retain];
        NSLog(@"%@", r5);
        [r5 release];
        [r4 release];
        return 0x0;
    }
    function -[ViewController m2] {
        r4 = r0;
        r0 = *_NSConcreteStackBlock;
        *(sp - 0x18) = r0;
        var_4 = 0xc2000000;
        asm{ stm.w      r3, {r0, r1, r2} };
        objc_initWeak((sp - 0x18) + 0x14, r4);
        r5 = objc_retainBlock(sp - 0x18);
        objc_destroyWeak((sp - 0x18) + 0x14);
        [r4 setBlock1:r5];
        r0 = [r5 release];
        return r0;
    }
    function ___20-[ViewController m2]_block_invoke {
        r4 = objc_loadWeakRetained(r0 + 0x14);
        r0 = [r4 description];
        r5 = [r0 retain];
        NSLog(@"%@", r5);
        [r5 release];
        [r4 release];
        return 0x0;
    }
    


    It turns out that weakself is more efficient than @weakify/@strongify: the additional inner block is completely inlined, and _block_invoke looks the same in both cases. Meanwhile, the way extobjc “eats” the @ at the beginning of the macro adds useless runtime exception handling code, as the call to _Unwind_SjLj_Unregister shows.
    With -Os, things are not as good: the block is not inlined, and two _block_invoke functions are generated instead of one:
    function ___20-[ViewController m2]_block_invoke {
        r0 = objc_loadWeakRetained(r0 + 0x14);
        r1 = *_NSConcreteStackBlock;
        *(sp - 0x18) = r1;
        var_4 = 0xc2000000;
        asm{ stm.w      r4, {r1, r2, r3} };
        var_20 = r0;
        r4 = [r0 retain];
        r5 = ___20-[ViewController m2]_block_invoke_2(sp - 0x18);
        [var_20 release];
        [r4 release];
        r0 = r5;
        return r0;
    }
    function ___20-[ViewController m2]_block_invoke_2 {
        r0 = *(r0 + 0x14);
        r0 = [r0 description];
        r4 = [r0 retain];
        NSLog(@"%@", r4);
        [r4 release];
        return 0x0;
    }
    


    Unfortunately, clang does not yet allow adding the always_inline attribute to a block.
    The full source code and autocomplete snippets for Xcode are here.
