The illusion of immutability and trust as the basis of team development

    I am mostly a C++ programmer. It just happened that way. The vast majority of the commercial code I have written in my career is C++. I don't really like having my experience so heavily biased towards one language, and I try not to miss an opportunity to write something in another one. My current employer suddenly provided such an opportunity: I undertook to write a not entirely trivial utility in Java. The implementation language was chosen for historical reasons, and I didn't mind. Java it is; the less familiar it is to me, the better.

    Among other things, I had a fairly simple task: build a certain set of logically related data once and hand it over to a consumer. There may be several consumers, and according to the encapsulation principle, the code that hands over the data (the producer) has no idea what is inside a consumer or what it might do with the data. But the producer needs every consumer to receive the same data. I did not want to make copies and hand those out. This means that consumers must somehow be deprived of the ability to change the data passed to them.

    It was then that my inexperience in Java made itself felt. I missed some language features compared to C++. Yes, there is the keyword final, but final Object is like Object* const in C++, not const Object*. That is, you can still add elements to a final List, for example. C++ is another matter: put const everywhere, as Meyers taught us, and that's it! No one will change anything. Right? Well, not quite. I thought about this a bit at my leisure instead of writing that utility, and here is what I came up with.

    C++


    Let me restate the task:

    1. Create a data set once.
    2. Do not copy anything unnecessarily.
    3. Prevent the consumer from changing this data.
    4. Keep the code minimal, i.e. do not create a bunch of methods and interfaces for every data set that is needed, by and large, in just a couple of places.

    No complicating conditions such as multithreading, exception safety, etc. Consider the simplest case. Here is how I would do it in the language most familiar to me:

    foo.hpp
    #pragma once
    #include <list>
    #include <ostream>
    #include <string>
    struct Foo
    {
        const int intValue;
        const std::string strValue;
        const std::list<int> listValue;
        Foo(int intValue_,
            const std::string& strValue_,
            const std::list<int>& listValue_)
            : intValue(intValue_)
            , strValue(strValue_)
            , listValue(listValue_)
        {}
    };
    // inline, because the operator is defined in a header
    inline std::ostream& operator<<(std::ostream& out, const Foo& foo)
    {
        out << "INT: " << foo.intValue << "\n";
        out << "STRING: " << foo.strValue << "\n";
        out << "LIST: [";
        for (auto it = foo.listValue.cbegin(); it != foo.listValue.cend(); ++it)
        {
            out << (it == foo.listValue.cbegin() ? "" : ", ") << *it;
        }
        out << "]\n";
        return out;
    }


    api.hpp
    #pragma once
    #include "foo.hpp"
    class Api
    {
    public:
        const Foo& getFoo() const
        {
            return currentFoo;
        }
    private:
        const Foo currentFoo = Foo{42, "Fish", {0, 1, 2, 3}};
    };

    main.cpp
    #include "api.hpp"
    #include "foo.hpp"
    #include <iostream>
    namespace
    {
        void goodConsumer(const Foo& foo)
        {
            // do nothing wrong with foo
        }
    }
    int main()
    {
        {
            const auto& api = Api();
            goodConsumer(api.getFoo());
            std::cout << "*** After good consumer ***\n";
            std::cout << api.getFoo() << std::endl;
        }
    }


    Obviously, everything is fine here, the data is unchanged.

    Output
    *** After good consumer ***
    INT: 42
    STRING: Fish
    LIST: [0, 1, 2, 3]

    And if someone tries to change something?


    main.cpp
    void stupidConsumer(const Foo& foo)
    {
        foo.listValue.push_back(100);
    }


    The code simply does not compile.

    Error
    src/main.cpp: In function ‘void {anonymous}::stupidConsumer(const Foo&)’:
    src/main.cpp:16:36: error: passing ‘const std::__cxx11::list<int>’ as ‘this’ argument discards qualifiers [-fpermissive]
             foo.listValue.push_back(100);


    What could go wrong?


    This is C++, a language with a rich arsenal for shooting yourself in the foot! For instance:

    main.cpp
    void evilConsumer(const Foo& foo)
    {
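        // Writing through a cast-away const to an object declared const is undefined behavior
        const_cast<int&>(foo.intValue) = 7;
        const_cast<std::string&>(foo.strValue) = "James Bond";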
    }


    And that is all it takes:
    *** After evil consumer ***
    INT: 7
    STRING: James Bond
    LIST: [0, 1, 2, 3]


    Note that using reinterpret_cast instead of const_cast here results in a compilation error. A C-style cast, however, will pull off the same trick.

    Yes, such code can lead to undefined behavior [C++17 10.1.7.1/4]. It also looks suspicious in general, which is good: it is easier to catch during a review.

    What is bad is that the malicious code can hide arbitrarily deep inside the consumer and still work:

    main.cpp
    void evilSubConsumer(const std::string& value)
    {
        const_cast<std::string&>(value) = "Loki";
    }
    void goodSubConsumer(const std::string& value)
    {
        evilSubConsumer(value);
    }
    void evilCautiousConsumer(const Foo& foo)
    {
        const auto& strValue = foo.strValue;
        goodSubConsumer(strValue);
    }


    Output
    *** After evil but cautious consumer ***
    INT: 42
    STRING: Loki
    LIST: [0, 1, 2, 3]


    Advantages and disadvantages of C++ in this context


    What is good:
    • read-only access to anything is easy to declare
    • an accidental violation of this restriction is caught at compile time, because const and non-const objects can have different interfaces
    • a deliberate violation can be caught in a code review

    What is wrong:
    • the ban on modification can be deliberately circumvented
    • this takes a single line, i.e. it is easy to miss in a code review
    • and it can lead to undefined behavior
    • a class definition can become bloated by the need to implement different interfaces for const and non-const objects


    Java


    In Java, as I understand it, the approach is slightly different. Primitive types declared final are constant in the same sense as in C++. Strings in Java are immutable by design, so final String is exactly what we need here.

    Collections can be placed in unmodifiable wrappers using the static methods of the java.util.Collections class: unmodifiableList, unmodifiableMap, etc. That is, the interface of constant and non-constant objects is the same, but constant objects throw an exception on an attempt to change them.

    As for user-defined types, you have to write immutable wrappers yourself. All in all, here is my version for Java.

    Foo.java
    package foo;
    import java.util.Collections;
    import java.util.List;
    public final class Foo {
        public final int intValue;
        public final String strValue;
        public final List<Integer> listValue;
        public Foo(final int intValue,
                   final String strValue,
                   final List<Integer> listValue) {
            this.intValue = intValue;
            this.strValue = strValue;
            this.listValue = Collections.unmodifiableList(listValue);
        }
        @Override
        public String toString() {
            final StringBuilder sb = new StringBuilder();
            sb.append("INT: ").append(intValue).append("\n")
              .append("STRING: ").append(strValue).append("\n")
              .append("LIST: ").append(listValue.toString());
            return sb.toString();
        }
    }


    Api.java
    package api;
    import foo.Foo;
    import java.util.Arrays;
    public final class Api {
        private final Foo foo = new Foo(42, "Fish", Arrays.asList(0, 1, 2, 3));
        public final Foo getFoo() {
            return foo;
        }
    }


    Main.java
    import api.Api;
    import foo.Foo;
    public final class Main {
        private static void goodConsumer(final Foo foo) {
            // do nothing wrong with foo
        }
        public static void main(String[] args) throws Exception {
            {
                final Api api = new Api();
                goodConsumer(api.getFoo());
                System.out.println("*** After good consumer ***");
                System.out.println(api.getFoo());
                System.out.println();
            }
        }
    }


    Output
    *** After good consumer ***
    INT: 42
    STRING: Fish
    LIST: [0, 1, 2, 3]


    Failed Change Attempt


    If you just try to change something, for example:

    Main.java
    private static void stupidConsumer(final Foo foo) {
        foo.listValue.add(100);
    }


    This code will compile, but an exception will be thrown at runtime:

    An exception
    Exception in thread "main" java.lang.UnsupportedOperationException
    	at java.base/java.util.Collections$UnmodifiableCollection.add(Collections.java:1056)
    	at Main.stupidConsumer(Main.java:15)
    	at Main.main(Main.java:70)


    Successful attempt


    And what if we play dirty? There is no way to strip the final qualifier from a type. But Java has a much more powerful thing: reflection.

    Main.java
    import java.lang.reflect.Field;
    private static void evilConsumer(final Foo foo) throws Exception {
        final Field intField = Foo.class.getDeclaredField("intValue");
        intField.setAccessible(true);
        intField.set(foo, 7);
        final Field strField = Foo.class.getDeclaredField("strValue");
        strField.setAccessible(true);
        strField.set(foo, "James Bond");
    }


    And the immutability is gone
    *** After evil consumer ***
    INT: 7
    STRING: James Bond
    LIST: [0, 1, 2, 3]


    Such code looks even more suspicious than const_cast in C++, so it is even easier to catch in a review. It can also lead to unpredictable effects (so does Java have UB too?). And it can likewise hide arbitrarily deep.

    These unpredictable effects may arise because, when a final object is changed through reflection, the value returned by its hashCode() method may stay the same. Different objects with the same hash are not a problem, but identical objects with different hashes are bad.

    Why such a hack is especially dangerous for strings in Java (example): strings here can be stored in a pool, and the same pooled value can be referenced by strings that are otherwise unrelated and merely have equal contents. Change one and you change them all.

    But! The JVM can be run with various security settings. Even the default Security Manager, once activated, suppresses all of the reflection tricks above:

    An exception
    $ java -Djava.security.manager -jar bin/main.jar
    Exception in thread "main" java.security.AccessControlException: access denied ("java.lang.reflect.ReflectPermission" "suppressAccessChecks")
    	at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
    	at java.base/java.security.AccessController.checkPermission(AccessController.java:895)
    	at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:335)
    	at java.base/java.lang.reflect.AccessibleObject.checkPermission(AccessibleObject.java:85)
    	at java.base/java.lang.reflect.Field.setAccessible(Field.java:169)
    	at Main.evilConsumer(Main.java:20)
    	at Main.main(Main.java:71)


    Advantages and disadvantages of Java in this context


    What is good:
    • there is the final keyword, which restricts data modification to some extent
    • there are library methods for making collections unmodifiable
    • a deliberate violation of immutability is easily caught in a code review
    • there are JVM security settings

    What is wrong:
    • an attempt to change an immutable object shows up only at runtime
    • to make an object of some class immutable, you have to write the appropriate wrapper yourself
    • without appropriate security settings, any immutable data can be changed
    • this action can have unpredictable consequences (although maybe that is good: almost no one will do it)


    Python


    Well, after that I was simply carried away on waves of curiosity. How are such tasks solved in, say, Python? Are they solved at all? After all, Python has no constness in principle; there are not even any such keywords.

    foo.py
    class Foo():
        def __init__(self, int_value, str_value, list_value):
            self.int_value = int_value
            self.str_value = str_value
            self.list_value = list_value
        def __str__(self):
            return 'INT: ' + str(self.int_value) + '\n' + \
                   'STRING: ' + self.str_value + '\n' + \
                   'LIST: ' + str(self.list_value)


    api.py
    from foo import Foo
    class Api():
        def __init__(self):
            self.__foo = Foo(42, 'Fish', [0, 1, 2, 3])
        def get_foo(self):
            return self.__foo


    main.py
    from api import Api
    def good_consumer(foo):
        pass
    def evil_consumer(foo):
        foo.int_value = 7
        foo.str_value = 'James Bond'
    def main():
        api = Api()
        good_consumer(api.get_foo())
        print("*** After good consumer ***")
        print(api.get_foo())
        print()
        api = Api()
        evil_consumer(api.get_foo())
        print("*** After evil consumer ***")
        print(api.get_foo())
        print()
    if __name__ == '__main__':
        main()


    Output
    *** After good consumer ***
    INT: 42
    STRING: Fish
    LIST: [0, 1, 2, 3]
    *** After evil consumer ***
    INT: 7
    STRING: James Bond
    LIST: [0, 1, 2, 3]


    That is, no tricks are needed: just go ahead and change the fields of any object.

    Gentleman's agreement


    The following convention is accepted in Python:
    • user-defined fields and methods whose names begin with a single underscore are protected (protected in C++ and Java)
    • user-defined fields and methods whose names begin with two underscores are private (private)

    The language even mangles the names of "private" fields. It is a very naive decoration, nothing compared to name mangling in C++, but it is enough to neutralize (though not catch) unintentional (or naive) errors.

    The code
    class Foo():
        def __init__(self, int_value):
            self.__int_value = int_value
        def int_value(self):
            return self.__int_value
    def evil_consumer(foo):
        foo.__int_value = 7


    Output
    *** After evil consumer ***
    INT: 42


    And to make the mistake on purpose, you just add a few characters.

    The code
    def evil_consumer(foo):
        foo._Foo__int_value = 7


    Output
    *** After evil consumer ***
    INT: 7
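
    The two snippets above behave differently because of how mangling is applied: inside the class body, self.__int_value is rewritten to self._Foo__int_value, while the same spelling outside the class is left alone and simply creates a new, unrelated attribute. A minimal sketch that makes both attributes visible (the name naive_consumer is illustrative and not taken from the original code):

```python
class Foo:
    def __init__(self, int_value):
        # Inside the class body the name is mangled to _Foo__int_value.
        self.__int_value = int_value

    def int_value(self):
        return self.__int_value


def naive_consumer(foo):
    # Outside the class there is no mangling: this creates a brand-new
    # attribute literally named "__int_value" and leaves the real one alone.
    foo.__int_value = 7


def evil_consumer(foo):
    # Spelling out the mangled name reaches the "private" field.
    foo._Foo__int_value = 7


foo = Foo(42)
naive_consumer(foo)
after_naive = (foo.int_value(), sorted(vars(foo)))

evil_consumer(foo)
after_evil = foo.int_value()

print(after_naive)  # the object now carries two attributes
print(after_evil)
```

    So the naive assignment is not rejected; it just lands on a different attribute, which is why the error is neutralized rather than caught.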


    Another option


    I liked the solution proposed by Oz N Tiram. It is a simple decorator that raises an exception on an attempt to change a read-only field. This goes slightly beyond the agreed scope (“do not create a bunch of methods and interfaces”), but, I repeat, I liked it.

    foo.py
    from read_only_properties import read_only_properties
    @read_only_properties('int_value', 'str_value', 'list_value')
    class Foo():
        def __init__(self, int_value, str_value, list_value):
            self.int_value = int_value
            self.str_value = str_value
            self.list_value = list_value
        def __str__(self):
            return 'INT: ' + str(self.int_value) + '\n' + \
                   'STRING: ' + self.str_value + '\n' + \
                   'LIST: ' + str(self.list_value)


    main.py
    def evil_consumer(foo):
        foo.int_value = 7
        foo.str_value = 'James Bond'


    Output
    Traceback (most recent call last):
      File "src/main.py", line 35, in <module>
        main()
      File "src/main.py", line 28, in main
        evil_consumer(api.get_foo())
      File "src/main.py", line 9, in evil_consumer
        foo.int_value = 7
      File "/home/Tmp/python/src/read_only_properties.py", line 15, in __setattr__
        raise AttributeError("Can't touch {}".format(name))
    AttributeError: Can't touch int_value
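
    For readers wondering how such a decorator might work, here is a minimal sketch. It is not Oz N Tiram's actual code, only an assumption about the general approach: the decorator swaps in a __setattr__ that allows the first assignment of a protected name (so that __init__ still works) and rejects any later one. The error message mirrors the traceback above; the helper names decorate and guarded_setattr are illustrative.

```python
def read_only_properties(*names):
    """Class decorator: the listed attributes can be set once, then never again."""
    def decorate(cls):
        original_setattr = cls.__setattr__

        def guarded_setattr(self, name, value):
            # Allow the first assignment (made in __init__), reject later ones.
            if name in names and name in self.__dict__:
                raise AttributeError("Can't touch {}".format(name))
            original_setattr(self, name, value)

        cls.__setattr__ = guarded_setattr
        return cls
    return decorate


@read_only_properties('int_value', 'str_value')
class Foo:
    def __init__(self, int_value, str_value):
        self.int_value = int_value
        self.str_value = str_value


foo = Foo(42, 'Fish')
try:
    foo.int_value = 7
    error = None
except AttributeError as exc:
    error = str(exc)

print(foo.int_value, error)
```

    Because the check lives in __setattr__, it only guards ordinary attribute assignment on the instance.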


    This is no panacea either. But at least the code that gets around it looks suspicious.

    main.py
    def evil_consumer(foo):
        foo.__dict__['int_value'] = 7
        foo.__dict__['str_value'] = 'James Bond'


    Output
    *** After evil consumer ***
    INT: 7
    STRING: James Bond
    LIST: [0, 1, 2, 3]
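
    Two further caveats apply to any guard of this kind. The sketch below illustrates them using the standard library's dataclasses module as a stand-in for the decorator above; this is an assumption made for the example, since the original code does not use it. A frozen dataclass rejects plain assignment in much the same way, object.__setattr__ gets around it just as __dict__ does, and a read-only attribute holding a list would still expose a mutable list, which is why a tuple is stored here instead.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Foo:
    int_value: int
    str_value: str
    list_value: tuple  # a tuple, so consumers cannot append to it


foo = Foo(42, 'Fish', (0, 1, 2, 3))

# Plain assignment is rejected at runtime.
try:
    foo.int_value = 7
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# The tuple has no append(), so the "list" cannot be grown in place either.
has_append = hasattr(foo.list_value, 'append')

# A determined consumer can still bypass the guard.
object.__setattr__(foo, 'str_value', 'James Bond')

print(assignment_blocked, has_append, foo.str_value)
```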


    The advantages and disadvantages of Python in this context


    Does Python look really bad here? No, it is simply a different language philosophy. It is usually expressed by the phrase “We are all consenting adults here”. That is, it is assumed that no one will deliberately deviate from the accepted norms. The concept is debatable, but it has a right to exist.

    What is good:
    • it is openly declared that programmers, not the compiler or the interpreter, are responsible for respecting access rights
    • there is a generally accepted naming convention for protected and private fields and methods
    • some access violations are easily caught in a code review

    What is wrong:
    • access to the fields of a class cannot be restricted at the language level
    • everything rests solely on the goodwill and honesty of the developers
    • errors show up only at runtime


    Go


    Another language that I poke at from time to time (mostly by reading articles), although I have not yet written a single line of commercial code in it. There is a const keyword here, but only basic values known at compile time, such as strings and numbers (i.e. constexpr in C++ terms), can be constants. Struct fields cannot. That is, if the fields are declared public, it turns out just like in Python: anyone can change anything. Not interesting. I won't even give example code.

    All right, let the fields be private, and let their values be obtained through public methods. Can you still make a mess in Go? Of course: there is reflection here too.

    foo.go
    package foo
    import "fmt"
    type Foo struct {
        intValue int
        strValue string
        listValue []int
    }
    func (foo *Foo) IntValue() int {
        return foo.intValue
    }
    func (foo *Foo) StrValue() string {
        return foo.strValue
    }
    func (foo *Foo) ListValue() []int {
        return foo.listValue
    }
    func (foo *Foo) String() string {
        result := fmt.Sprintf("INT: %d\nSTRING: %s\nLIST: [", foo.intValue, foo.strValue)
        for i, num := range foo.listValue {
            if i > 0 {
                result += ", "
            }
            result += fmt.Sprintf("%d", num)
        }
        result += "]"
        return result
    }
    func New(i int, s string, l []int) Foo {
        return Foo{intValue: i, strValue: s, listValue: l}
    }


    api.go
    package api
    import "foo"
    type Api struct {
        foo foo.Foo
    }
    func (api *Api) GetFoo() *foo.Foo {
        return &api.foo
    }
    func New() Api {
        api := Api{}
        api.foo = foo.New(42, "Fish", []int{0, 1, 2, 3})
        return api
    }


    main.go
    package main
    import (
        "api"
        "foo"
        "fmt"
        "reflect"
        "unsafe"
    )
    func goodConsumer(foo *foo.Foo) {
        // do nothing wrong with foo
    }
    func evilConsumer(foo *foo.Foo) {
        reflectValue := reflect.Indirect(reflect.ValueOf(foo))
        member := reflectValue.FieldByName("intValue")
        intPointer := unsafe.Pointer(member.UnsafeAddr())
        realIntPointer := (*int)(intPointer)
        *realIntPointer = 7
        member = reflectValue.FieldByName("strValue")
        strPointer := unsafe.Pointer(member.UnsafeAddr())
        realStrPointer := (*string)(strPointer)
        *realStrPointer = "James Bond"
    }
    func main() {
        apiInstance := api.New()
        goodConsumer(apiInstance.GetFoo())
        fmt.Println("*** After good consumer ***")
        fmt.Println(apiInstance.GetFoo().String())
        fmt.Println()
        apiInstance = api.New()
        evilConsumer(apiInstance.GetFoo())
        fmt.Println("*** After evil consumer ***")
        fmt.Println(apiInstance.GetFoo().String())
    }


    Output
    *** After good consumer ***
    INT: 42
    STRING: Fish
    LIST: [0, 1, 2, 3]
    *** After evil consumer ***
    INT: 7
    STRING: James Bond
    LIST: [0, 1, 2, 3]


    By the way, strings in Go are immutable, as in Java. Slices and maps are mutable, and unlike Java, the core language offers no way to make them immutable. Only code generation (correct me if I'm wrong). That is, even if you do everything right and use no dirty tricks, merely returning a slice from a method means that this slice can always be changed.

    The gopher community clearly lacks immutable types, but there certainly won't be any in Go 1.x.

    Advantages and disadvantages of Go in this context


    In my inexperienced view, when it comes to forbidding changes to struct fields, Go sits somewhere between Java and Python, closer to the latter. At the same time, Go does not have (at least I did not find one, although I looked) Python's “consenting adults” principle. What it does have: within a package everything has access to everything, constants are only a vestige, and immutable collections are conspicuous by their absence. That is, if a developer can read some data, then with high probability he can also write to it. Which, as in Python, shifts most of the responsibility from the compiler to the human.

    What is good:
    • all access errors are detected at compile time
    • reflection-based dirty tricks are clearly visible in a review

    What is wrong:
    • there is simply no concept of a “read-only data set”
    • access to struct fields cannot be restricted within a package
    • to protect fields from changes outside the package, you have to write getters
    • all reference collections are mutable
    • even private fields can be changed with reflection


    Erlang


    This one is outside the competition. After all, Erlang is a language with a paradigm very different from the four above. I once studied it with great interest; I really enjoyed forcing myself to think in a functional style. Unfortunately, I never found a practical use for those skills.

    So, in this language a variable can be assigned a value only once. And when a function is called, all arguments are passed by value, i.e. a copy of them is made (but there is tail-call optimization).

    foo.erl
    -module(foo).
    -export([new/3, print/1]).
    new(IntValue, StrValue, ListValue) ->
        {foo, IntValue, StrValue, ListValue}.
    print(Foo) ->
        case Foo of
            {foo, IntValue, StrValue, ListValue} -> 
                io:format("INT: ~w~nSTRING: ~s~nLIST: ~w~n",
                          [IntValue, StrValue, ListValue]);
            _ -> 
                throw({error, "Not a foo term"})
        end.


    api.erl
    -module(api).
    -export([new/0, get_foo/1]).
    new() ->
        {api, foo:new(42, "Fish", [0, 1, 2, 3])}.
    get_foo(Api) ->
        case Api of
            {api, Foo} -> Foo;
            _ -> throw({error, "Not an api term"})
        end.


    main.erl
    -module(main).
    -export([start/0]).
    start() ->
        ApiForGoodConsumer = api:new(),
        good_consumer(api:get_foo(ApiForGoodConsumer)),
        io:format("*** After good consumer ***~n"),
        foo:print(api:get_foo(ApiForGoodConsumer)),
        io:format("~n"),
        ApiForEvilConsumer = api:new(),
        evil_consumer(api:get_foo(ApiForEvilConsumer)),
        io:format("*** After evil consumer ***~n"),
        foo:print(api:get_foo(ApiForEvilConsumer)),
        init:stop().
    good_consumer(_) ->
        done.
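    evil_consumer(Foo) ->
        % setelement/3 returns a new tuple and never modifies Foo itself;
        % element 1 is the foo tag, so the values sit at positions 2 and 3.
        _ = setelement(2, Foo, 7),
        _ = setelement(3, Foo, "James Bond").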


    Output
    *** After good consumer ***
    INT: 42
    STRING: Fish
    LIST: [0,1,2,3]
    *** After evil consumer ***
    INT: 42
    STRING: Fish
    LIST: [0,1,2,3]


    Of course, in other languages you can also make copies at every turn and thus protect yourself from data corruption. But there is a language (and surely not just one) where it simply cannot be done any other way!

    Advantages and disadvantages of Erlang in this context


    What is good:
    • data cannot be altered at all

    What is wrong:
    • copying, copying everywhere


    Instead of a conclusion


    So what is the upshot? Well, apart from the fact that I blew the dust off a couple of books I read long ago, stretched my fingers, wrote a useless program in 5 different languages, and stroked my ego?

    First, I stopped thinking that C++ is the most reliable language when it comes to protection against an active fool, for all its flexibility and rich syntax. I am now inclined to think that Java provides more protection in this respect. Not a very original conclusion, but I find it very useful for myself.

    Second, I suddenly formulated for myself the idea that programming languages can be roughly divided into those that try to restrict access to certain data at the level of syntax and semantics, and those that do not even try, shifting these concerns onto the users. Accordingly, the entry threshold, the best practices, and the requirements for team members (both technical and personal) ought to differ somehow depending on the chosen language. I would love to read something on this subject.

    Third: no matter how hard a language tries to protect data from being written to, the user can almost always do it if he really wants to (“almost” because of Erlang). And if you confine yourself to mainstream languages, it is always easy. So it turns out that all these const and final are no more than recommendations, instructions for the correct use of interfaces. Not all languages have them, but I still prefer to have such tools in my arsenal.

    And fourth, the most important thing: since no (mainstream) language can prevent a developer from doing nasty things, the only thing that holds this developer back is his own decency. So it turns out that when I put const in my code, I am not forbidding my colleagues (and my future self) anything; I am leaving instructions, believing that they (and I) will follow them. That is, I trust my colleagues.

    No, I have long known that modern software development is team work in 99.99% of cases. But I was lucky: all my colleagues were “adult, responsible” people. For me it has always somehow been taken for granted that all team members abide by the established rules. My path to the realization that we constantly trust and respect each other was long, but damn calm and safe.

    PS


    If anyone is interested in the code examples used, you can get them here.
