Features of creating NSString

NSString Objects and Pointers
What do you think the following code will output?
#import <Foundation/Foundation.h>

int main() {
    @autoreleasepool {
        NSString *a = @"123456789";
        NSString *b = a;
        NSLog(@"%p %p", a, b);
    }
    return 0;
}
Naturally, the pointers will be equal (“objects are assigned by reference”), so NSLog() will print two identical memory addresses. No magic here:
2015-01-30 14:39:27.662 1-nsstring[13574] 0x602ea0 0x602ea0
Here and below, the object addresses are given only as examples; when you try to reproduce the results, the actual values will of course differ.
Now let's try to obtain two different NSString objects with the same text. With other standard classes, such as NSArray, we could write this:
#import <Foundation/Foundation.h>

int main() {
    @autoreleasepool {
        NSArray *a = @[@"123456789"];
        NSArray *b = @[@"123456789"];
        NSLog(@"%p %p", a, b);
    }
    return 0;
}
Since we initialized the two NSArray objects separately, they were placed in different regions of memory, and two different addresses are displayed in the console:
2015-01-30 14:40:45.799 2-nsarray[13634] 0xa9e1b8 0xaa34e8
However, applying the same approach to NSString does not produce the desired effect:
#import <Foundation/Foundation.h>

int main() {
    @autoreleasepool {
        NSString *a = @"123456789";
        NSString *b = @"123456789";
        NSLog(@"%p %p", a, b);
    }
    return 0;
}
2015-01-30 14:41:41.898 3-nsstring[13678] 0x602ea0 0x602ea0
As you can see, despite separate initialization, both pointers still refer to the same memory area.
Using stringWithString
Digging a bit into NSString, we find the stringWithString: method, which “returns a string created by copying the characters from another given string”. This looks like exactly what we need! Let's try the following code:
#import <Foundation/Foundation.h>

int main() {
    @autoreleasepool {
        NSString *a = @"123456789";
        NSString *b = [NSString stringWithString:@"123456789"];
        NSString *c = [NSString stringWithString:b];
        NSLog(@"%p %p %p", a, b, c);
    }
    return 0;
}
It turns out that the output of this program depends on the version of the compiler used. clang on Ubuntu with LLVM 3.4 really does create three different objects located at different memory addresses. Compiling the same code in Xcode with clang for Mac on LLVM 3.5, however, produces only one object and three pointers to it:
2015-01-30 17:59:02.206 4-nsstring[670:21855] 0x100001048 0x100001048 0x100001048
Exposing the Magic
The oddities above are explained by the compiler's attempts to optimize string resources. When it encounters string objects with the same content in the source code, it creates them only once to save on storage and comparison costs. This optimization is also performed at the linking stage: even if strings with the same text are in different modules, they will most likely be created only once.
Since the NSString type is immutable (NSMutableString is used for mutable strings), this optimization is safe, as long as we manipulate strings only through the methods of the NSString class.
The compiler, however, is not omnipotent. One of the easiest ways to confuse it and really create two different NSString objects with the same text is as follows:
#import <Foundation/Foundation.h>

int main() {
    @autoreleasepool {
        NSString *a = @"123456789";
        NSString *b = [NSString stringWithFormat:@"%@", a];
        NSLog(@"%p %p", a, b);
    }
    return 0;
}
Gcc
When compiling C code, gcc performs a similar optimization of string constants. For example,
#include <stdio.h>

int main(void) {
    char *a = "123456789";
    char *b = "123456789";
    printf("%p %p\n", (void *)a, (void *)b);
    return 0;
}
will output 0x4005f4 0x4005f4.
However, there is a significant difference from what we saw with clang: gcc places such string constants in a read-only segment, so attempts to change them at run time (for example, a[0] = '0') lead to a segmentation fault. To place the strings on the stack, where they can be modified, replace char *a with char a[]; in that case, however, gcc does not apply the optimization. The following code creates two different strings:
#include <stdio.h>

int main(void) {
    char a[] = "123456789";
    char b[] = "123456789";
    printf("%p %p\n", (void *)a, (void *)b);
    return 0;
}
0x7fff17ed0020 0x7fff17ed0030
Shooting Yourself in the Foot
So, we know that when the compiler encounters identical string objects in the source code, it optimizes them and creates the NSString only once. At the same time, it places the object in writable memory, where it can be changed by manipulating the pointer manually. (In plain C, as discussed above, this is not possible.)
Guess what the following code prints?
#import <Foundation/Foundation.h>

void bad() {
    NSString *a = @"123456789";
    char *aa = (__bridge void *)(a);
    aa[8] = 92;
}

int main() {
    @autoreleasepool {
        bad();
        NSLog(@"%@", @"123456789");
    }
    return 0;
}
Depending on the compiler, the result may differ: my Xcode on the Mac prints the gibberish “рак 㐳 㘵 㠷 9 䀥”, while clang on Ubuntu prints a fragment of internal service data, “red: pars”. In any case, this is not the expected “123456789”. Experimenting with other values of aa[8], as well as with aa[16], is left to the reader.
The worst part is that the bad() function from the last example may be hidden behind a header, for example in a third-party library whose author, for reasons of their own, modified what seemed to be a private NSString. The smart compiler will still find the matching string constants and collapse them into a single pointer, after which corrupting the variable inside bad() turns the string in main() into gibberish.