Peculiarities of creating NSString

    NSLog(123456789) != 123456789

    This article is intended for Objective-C beginners and describes one way to shoot yourself in the foot. We will try to create two different NSString objects with the same text, examine how different compilers react to this, and find out under what conditions NSLog(@"%@", @"123456789") will not output "123456789" at all.

    NSString Objects and Pointers


    What do you think the following code will output?
    #import <Foundation/Foundation.h>

    int main() {
      @autoreleasepool {
        NSString *a = @"123456789";
        NSString *b = a;  // copies the pointer, not the object
        NSLog(@"%p %p", a, b);
      }
      return 0;
    }
    

    Naturally, the pointers will be equal (“objects are assigned by reference”), so NSLog() will print two identical memory addresses. No magic:

    2015-01-30 14:39:27.662 1-nsstring[13574] 0x602ea0 0x602ea0

    From here on, object addresses are given only as examples; when you reproduce the experiments, the actual values will of course differ.

    Let's try to get two different NSString objects with the same text. With other standard classes, such as NSArray, we could write this:
    #import <Foundation/Foundation.h>

    int main() {
      @autoreleasepool {
        NSArray *a = @[@"123456789"];  // first array object
        NSArray *b = @[@"123456789"];  // second, separately created array
        NSLog(@"%p %p", a, b);
      }
      return 0;
    }
    

    Since the two NSArray objects were initialized separately, they were placed in different areas of memory, and two different addresses will be displayed in the console:

    2015-01-30 14:40:45.799 2-nsarray[13634] 0xa9e1b8 0xaa34e8

    However, applying the same approach to NSString will not have the desired effect:
    #import <Foundation/Foundation.h>

    int main() {
      @autoreleasepool {
        NSString *a = @"123456789";
        NSString *b = @"123456789";  // same text, written as a separate literal
        NSLog(@"%p %p", a, b);
      }
      return 0;
    }
    

    2015-01-30 14:41:41.898 3-nsstring[13678] 0x602ea0 0x602ea0

    As you can see, despite separate initialization, both pointers still refer to the same memory area.

    Using stringWithString


    Digging around a bit in NSString, we find the stringWithString: method, which "returns a string created by copying the characters from another given string". That is just what we need! Let's try the following code:
    #import <Foundation/Foundation.h>

    int main() {
      @autoreleasepool {
        NSString *a = @"123456789";
        NSString *b = [NSString stringWithString:@"123456789"];  // copy of a literal
        NSString *c = [NSString stringWithString:b];             // copy of the copy
        NSLog(@"%p %p %p", a, b, c);
      }
      return 0;
    }
    

    It turns out that the output of this program depends on the compiler version used. clang on Ubuntu (LLVM 3.4) really does create three different objects at different memory locations. Compiling the same code in Xcode with clang for Mac (LLVM 3.5), however, produces only one object and three pointers to it:

    2015-01-30 17:59:02.206 4-nsstring[670:21855] 0x100001048 0x100001048 0x100001048

    Exposing the Magic


    The oddities above are explained by the compiler's attempts to optimize string resources. When it encounters string objects with the same content in the source code, it creates them only once, saving on storage and comparison costs. This optimization is also performed at link time: even if strings with the same text are in different modules, they will most likely be created only once.

    Since the NSString type is immutable (NSMutableString is used for mutable strings), this optimization is safe, as long as we manipulate strings only through the methods of the NSString class.

    The compiler, however, is not omnipotent. One of the easiest ways to confuse it and really create two different NSString objects with the same text is as follows:
    #import <Foundation/Foundation.h>

    int main() {
      @autoreleasepool {
        NSString *a = @"123456789";
        NSString *b = [NSString stringWithFormat:@"%@", a];  // built at runtime
        NSLog(@"%p %p", a, b);
      }
      return 0;
    }
    

    GCC


    GCC performs a similar optimization of string constants when compiling C code. For example,
    #include <stdio.h>

    int main(void) {
      char *a = "123456789";  /* two identical literals */
      char *b = "123456789";
      printf("%p %p\n", (void *)a, (void *)b);
      return 0;
    }
    

    will output 0x4005f4 0x4005f4.

    However, there is a significant difference from what clang does with NSString constants: GCC places such string constants in a read-only segment, so attempts to change them at runtime (for example, a[0] = '0') lead to a segmentation fault. To place the strings on the stack, where they can be changed, you need to replace char *a with char a[]; in that case, however, GCC will not apply the optimization. The following code creates two different strings:
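    #include <stdio.h>

    int main(void) {
      char a[] = "123456789";  /* separate writable copy on the stack */
      char b[] = "123456789";  /* another separate copy */
      printf("%p %p\n", (void *)a, (void *)b);
      return 0;
    }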
    

    0x7fff17ed0020 0x7fff17ed0030

    Shooting Yourself in the Foot


    So, we know that when the compiler encounters identical string objects in the source code, it optimizes them and creates the NSString only once. It also places that object in writable memory, where it can be changed by manipulating the pointer by hand. (In plain C, as discussed above, this is not possible.)

    Can you guess what the following code prints?
    #import <Foundation/Foundation.h>

    void bad() {
      NSString *a = @"123456789";
      char *aa = (__bridge void *)(a);  // raw view of the object's memory
      aa[8] = 92;                       // overwrite one byte inside the object
    }

    int main() {
      @autoreleasepool {
        bad();
        NSLog(@"%@", @"123456789");
      }
      return 0;
    }
    

    Depending on the compiler, the result may differ: my Xcode on a Mac prints the garbage characters “㈱㐳㘵㠷9䀥”, while clang on Ubuntu displays a fragment of service information, “red: pars”. In any case, it is not the expected “123456789”. Experiments with other values of aa[8], as well as with aa[16], are left to the reader.

    The worst part is that the bad() function from the last example may be hidden behind a header, for example in a third-party library whose author, for reasons of their own, modified what seemed to be a private NSString. The smart compiler will still find the matching string constants and collapse them into one pointer, after which corrupting the variable inside bad() turns the string in the context of main() into gibberish.
