Some subtleties of GetHashCode

    While reading “Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries”, I came across this sentence:

    “Ensure that GetHashCode returns exactly the same value regardless of any changes that are made to the object”.

    Hmm, I thought, what are they talking about? The standard implementation generated by ReSharper came to mind, and I realized that the value it produces does not stay constant over the object's lifetime if the object changes.

    I decided to sketch an example to gauge the scale of the problem. Suppose we have a class that represents a person, uniquely identified by their SNILS number (the Russian personal insurance account number):

    public class Employee
    {
        public string FirstName { get; set; }
        public string SecondName { get; set; }
        public string Snils { get; set; }

        protected bool Equals(Employee other)
        {
            return string.Equals(Snils, other.Snils);
        }

        public override bool Equals(object obj)
        {
            if (ReferenceEquals(null, obj)) return false;
            if (ReferenceEquals(this, obj)) return true;
            if (obj.GetType() != this.GetType()) return false;
            return Equals((Employee) obj);
        }

        public override int GetHashCode()
        {
            return (Snils != null ? Snils.GetHashCode() : 0);
        }
    }
    

    The overridden methods were generated by ReSharper. At first glance, everything is fine: the fields used in the equality check are also used to compute the hash, so equal objects have equal hash codes. Everything looks wonderful.
    Now let's add some business logic:

    var employees = new HashSet<Employee>();
    var employee = new Employee()
                               {
                                   FirstName = "Sergei",
                                   SecondName = "Popov",
                                   Snils = "123456"
                               };
    employees.Add(employee);
    Console.WriteLine(employees.Contains(employee));
    

    And we see the message “True”.
    But what if at some point I decide to change my SNILS?

    var employees = new HashSet<Employee>();
    var employee = new Employee()
                               {
                                   FirstName = "Sergei",
                                   SecondName = "Popov",
                                   Snils = "123456"
                               };
    employees.Add(employee);
    // then I decided to change my SNILS
    employee.Snils = "654321";
    Console.WriteLine(employees.Contains(employee));
    

    And we see the message “False”.

    What happened?
    Internally, a HashSet consists of a number of buckets. The bucket for an object is selected based on the value returned by GetHashCode at the moment the object is added. As soon as we changed the SNILS number, the value returned by GetHashCode changed too. On lookup, HashSet used the new hash code to pick a bucket to search, and our object is not there (unless, with very small probability, the old and new hash codes happen to collide). HashSet will not look in the other buckets, because equal objects must have equal GetHashCode values. That's all: the object will not be found.
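
    To make the effect visible, here is a minimal sketch of the same situation. The names MutableKeyEmployee and StaleBucketDemo are made up for illustration, and the class is a trimmed-down stand-in for Employee above. After the key is changed, the set still holds the object (Count is 1 and a linear scan finds it), yet Contains is expected to return False, because the lookup is driven by the new hash code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Employee: equality and hash both depend on the mutable Snils.
public sealed class MutableKeyEmployee
{
    public string Snils { get; set; }

    public override bool Equals(object obj) =>
        obj is MutableKeyEmployee other && string.Equals(Snils, other.Snils);

    public override int GetHashCode() => Snils != null ? Snils.GetHashCode() : 0;
}

public static class StaleBucketDemo
{
    // Reports what the set looks like after the key is mutated in place.
    public static (int Count, bool Contains, bool FoundByScan) Run()
    {
        var employee = new MutableKeyEmployee { Snils = "123456" };
        var employees = new HashSet<MutableKeyEmployee> { employee };

        // The set filed the object under the hash of "123456"; it is not told about this change.
        employee.Snils = "654321";

        return (
            employees.Count,                                    // the object is still stored
            employees.Contains(employee),                       // hash-based lookup with the new hash code
            employees.Any(e => ReferenceEquals(e, employee)));  // linear scan ignores hash codes
    }
}
```

    Calling StaleBucketDemo.Run() and printing the tuple shows an object that is in the set but cannot be reached through the hash-based lookup (barring a collision between the two hash codes).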

    How did it even work?
    If you do not override Equals and GetHashCode, the default GetHashCode returns a value that stays the same throughout the object's lifetime, regardless of any changes you make to its fields. But if you do override these methods, you need to either use only immutable fields in the hash computation, or never change the fields that the computation uses, or come up with your own workaround (as an option, you can use the approach taken by the standard implementation in the Object class).
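
    The first option can be sketched as follows. This is only an illustration under the assumption that a SNILS never changes once a person is registered; ImmutableKeyEmployee and ImmutableKeyDemo are hypothetical names, not part of the original example. The key is a get-only property assigned in the constructor, so the hash code cannot drift, while the other properties stay freely editable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of Employee whose identity field cannot change after construction.
public sealed class ImmutableKeyEmployee : IEquatable<ImmutableKeyEmployee>
{
    public ImmutableKeyEmployee(string snils)
    {
        Snils = snils ?? throw new ArgumentNullException(nameof(snils));
    }

    public string FirstName { get; set; }   // mutable, but not part of equality or hash
    public string SecondName { get; set; }  // mutable, but not part of equality or hash
    public string Snils { get; }            // get-only: the only input to Equals and GetHashCode

    public bool Equals(ImmutableKeyEmployee other) =>
        other != null && string.Equals(Snils, other.Snils);

    public override bool Equals(object obj) => Equals(obj as ImmutableKeyEmployee);

    public override int GetHashCode() => Snils.GetHashCode();
}

public static class ImmutableKeyDemo
{
    // Returns whether the set still finds the object after a non-key field changes.
    public static bool Run()
    {
        var employee = new ImmutableKeyEmployee("123456")
        {
            FirstName = "Sergei",
            SecondName = "Popov"
        };
        var employees = new HashSet<ImmutableKeyEmployee> { employee };

        employee.SecondName = "Ivanov"; // illustrative change; the hash code is unaffected

        return employees.Contains(employee);
    }
}
```

    Here ImmutableKeyDemo.Run() returns true, because nothing that feeds GetHashCode can be modified after the object is added to the set.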

    Hence the moral:
    The hash code must stay constant throughout the object's lifetime; if in your case it can change, you must clearly understand what you are doing.
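
    If the key really has to change, one possible way of "understanding what you are doing" is to take the object out of the set before the change and put it back afterwards. The sketch below assumes that every mutation goes through a helper; Worker, RekeyDemo and ChangeSnils are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class with a mutable key, shaped like Employee above.
public sealed class Worker
{
    public string Snils { get; set; }

    public override bool Equals(object obj) =>
        obj is Worker other && string.Equals(Snils, other.Snils);

    public override int GetHashCode() => Snils != null ? Snils.GetHashCode() : 0;
}

public static class RekeyDemo
{
    // Hypothetical helper: remove under the old hash code, mutate, then re-add under the new one.
    public static void ChangeSnils(HashSet<Worker> set, Worker worker, string newSnils)
    {
        bool wasPresent = set.Remove(worker); // must run while the old hash code is still valid
        worker.Snils = newSnils;
        if (wasPresent)
        {
            set.Add(worker); // stored again, now under the new hash code
        }
    }

    // Returns whether the set finds the object after the key change.
    public static bool Run()
    {
        var worker = new Worker { Snils = "123456" };
        var workers = new HashSet<Worker> { worker };

        ChangeSnils(workers, worker, "654321");

        return workers.Contains(worker);
    }
}
```

    Note the trade-off: if the new SNILS equals that of another object already in the set, Add returns false and the object silently leaves the set, so real code would need to decide how to handle that case.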

    PS. I understand that what is described here is far from rocket science. Everything written here is obvious and follows from Microsoft's requirements for these methods. There is a good description of them by Lippert. Nevertheless, I walked straight into this trap and could not believe that the HashSet would return False. I hope you will not step into it now.