Do you read Scaladoc for “obvious” collection methods? Or why laziness is not always good.

If you do not know the difference

someMap.map{ case (k, v) => k -> foo(v)}

and

someMap.mapValues(foo)

besides syntax or doubt / do not know what bad consequences this difference can lead to, and here identity, this article is for you.

Otherwise - participate in the survey, located at the end of the article.

Start simple

Let's try to take an example, located before the kata and see what happens:

val someMap = Map("key1" -> "value1", "key2" -> "value2")
deffoo(value: String): String = value + "_changed"val resultMap = someMap.map{case (k,v) => k -> foo(v)}
val resultMapValues = someMap.mapValues(foo)
println(s"resultMap:       $resultMap")
println(s"resultMapValues: $resultMapValues")
println(s"equality: ${resultMap == resultMapValues}")

This code is expected to display

resultMap:       Map(key1 -> value1_changed, key2 -> value2_changed)
resultMapValues: Map(key1 -> value1_changed, key2 -> value2_changed)
equality: true

And at about this level, the understanding of the method is postponed mapValuesat the early stages of learning Scala: well, yes, there is such a method, it is convenient to change the values in Map, when the keys do not change. And the truth is, what else is there to think about? By the name of the method, everything is obvious, the behavior is clear.

Examples are more complicated

Let's modify our example a bit (I will explicitly write types so that you don’t think there’s some kind of fly-over with implications):

caseclassValueHolder(value: String)val someMap: Map[String, String] = Map("key1" -> "value1", "key2" -> "value2")
deffoo(value: String): ValueHolder = ValueHolder(value)
val resultMap: Map[String, ValueHolder] = someMap.map{case (k,v) => k -> foo(v)}
val resultMapValues: Map[String, ValueHolder] = someMap.mapValues(foo)
println(s"resultMap:       $resultMap")
println(s"resultMapValues: $resultMapValues")
println(s"equality: ${resultMap == resultMapValues}")

And such a code after launch will give

resultMap:       Map(key1 -> ValueHolder(value1), key2 -> ValueHolder(value2))
resultMapValues: Map(key1 -> ValueHolder(value1), key2 -> ValueHolder(value2))
equality: true

It is quite logical and obvious. "Dude, it's time to get to the essence of the article!" - the reader will say. Let's make the creation of our class dependent on external conditions and add a couple of simple tests for idiocy:

caseclassValueHolder(value: String, seed: Int)deffoo(value: String): ValueHolder = ValueHolder(value, Random.nextInt())
...
println(s"simple assert for resultMap:       ${resultMap.head == resultMap.head}")
println(s"simple assert for resultMapValues: ${resultMapValues.head == resultMapValues.head}")

At the output we get:

resultMap:       Map(key1 -> ValueHolder(value1,1189482436), key2 -> ValueHolder(value2,-702760039))
resultMapValues: Map(key1 -> ValueHolder(value1,-1354493526), key2 -> ValueHolder(value2,-379389312))
equality: false
simple assert for resultMap:       true
simple assert for resultMapValues: false

It is logical that the results are no longer equal, randomly. But wait, why did the second assert give false? Did the values resultMapValueschange, we didn’t do anything with them? Let's check if everything is inside there as well:

println(s"resultMapValues: $resultMapValues")
println(s"resultMapValues: $resultMapValues")

And at the output we get:

resultMapValues: Map(key1 -> ValueHolder(value1,1771067356), key2 -> ValueHolder(value2,2034115276))
resultMapValues: Map(key1 -> ValueHolder(value1,-625731649), key2 -> ValueHolder(value2,-1815306407))

Why did this happen?

Why printlnchange the value Map?
It is time to get into the documentation for the method mapValues:

/** Transforms this map by applying a function to every retrieved value.
 *  @param  f   the function used to transform values of this map.
 *  @return a map view which maps every key of this map
 *          to `f(this(key))`. The resulting map wraps the original map without copying any elements.
 */

The first line tells us what we thought - changes Map, applying the function passed in the arguments to each value. But if you read it very carefully and to the end, it turns out that it is not the return Map, but the "map view". And this is not a normal representation ( View), which you can get using the method viewand which has a method forcefor explicit calculation. And a special class (the code from Scala version 2.12.7, but for 2.11 there is almost the same):

protectedclassMappedValues[W](f: V => W) extendsAbstractMap[K, W] withDefaultMap[K, W] {
  overridedefforeach[U](g: ((K, W)) => U): Unit = for ((k, v) <- self) g((k, f(v)))
  defiterator= for ((k, v) <- self.iterator) yield (k, f(v))
  overridedefsize= self.size
  overridedefcontains(key: K) = self.contains(key)
  defget(key: K) = self.get(key).map(f)
}

If you read this code, you will see that nothing is cached, and each time the values are accessed, they will be re-calculated. What we see in our example.

If you are working with pure functions and everything is immunity with you, then you will not notice any difference. Well, maybe productivity will go down. But, unfortunately, not everything in our world is clean and perfect, and using these methods, you can step on a rake (which happened in one of our projects, otherwise there would not be this article).

Of course, we are not the first to encounter this. Already in 2011, a major bug was opened about this (and at the time of this writing, it is marked as open). It also mentions a method filterKeysthat has exactly the same problems, because it is implemented on the same principle.

In addition, since 2015, there is a ticket to add inspection to IntelliJ Idea.

What to do?

The simplest solution is to stupidly not use these methods, since the name of their behavior, in my opinion, is very unclear.

A slightly better option - call map(identity).
identityif someone does not know, this is a function from the standard library that simply returns its input argument. In this case, the main work is done by a method mapthat obviously creates a normal one Map. Check just in case:

val resultMapValues: Map[String, ValueHolder] = someMap.mapValues(foo).map(identity)
println(s"resultMapValues: $resultMapValues")
println(s"simple assert for resultMapValues: ${resultMapValues.head == resultMapValues.head}")
println(s"resultMapValues: $resultMapValues")
println(s"resultMapValues: $resultMapValues")

At the output we get

resultMapValues: Map(key1 -> ValueHolder(value1,333546604), key2 -> ValueHolder(value2,228749608))
simple assert for resultMapValues: true
resultMapValues: Map(key1 -> ValueHolder(value1,333546604), key2 -> ValueHolder(value2,228749608))
resultMapValues: Map(key1 -> ValueHolder(value1,333546604), key2 -> ValueHolder(value2,228749608))

All is well :)

If you still want to leave laziness, it is better to change the code so that it is obvious. You can make an implicit class with a wrapper method for, mapValuesand filterKeysthat gives a new, understandable name for them. Either use explicitly .viewand work with pairs iterator.

In addition, it is worthwhile to add the inspection to the development environment / rule in the static analyzer / somewhere else, which warns about the use of these methods. Because it is better to spend some time on it now than stepping on a rake and raking the consequences for a long time afterwards.

How else can you step on a rake and how have we stepped on them

In addition to the case of dependence on external conditions, which we observed in the examples above, there are other options.

For example, the changeable value (note, here, on a superficial glance, everything is "immutable"):

val someMap1 = Map("key1" -> newAtomicInteger(0), "key2" -> newAtomicInteger(0))
val someMap2 = Map("key1" -> newAtomicInteger(0), "key2" -> newAtomicInteger(0))
defincrement(value: AtomicInteger): Int = value.incrementAndGet()
val resultMap: Map[String, Int] = someMap1.map { case (k, v) => k -> increment(v) }
val resultMapValues: Map[String, Int] = someMap2.mapValues(increment)
println(s"resultMap (1):       $resultMap")
println(s"resultMapValues (1): $resultMapValues")
println(s"resultMap (2):       $resultMap")
println(s"resultMapValues (2): $resultMapValues")

This code will produce the following result:

resultMap (1):       Map(key1 -> 1, key2 -> 1)
resultMapValues (1): Map(key1 -> 1, key2 -> 1)
resultMap (2):       Map(key1 -> 1, key2 -> 1)
resultMapValues (2): Map(key1 -> 2, key2 -> 2)

When you re-apply to someMap2get a fun result.

The problems that can arise from rash use mapValuesand filterKeysyou can add a decrease in performance, an increase in memory consumption and / or an increase in the load on the GC, but this is more dependent on specific cases and may not be so critical.

In the piggy bank of a similar rake, you should also add a method toSeqfor an iterator, which returns lazy Stream.

We rake on by mapValueschance. It was used in the method that, using reflection, created a set of handlers from the config: the keys were the handler identifiers, and the value was their settings, which were then converted into the handlers themselves (the instance class was created). Since the handlers consisted only of pure functions, everything worked without problems, even the performance did not noticeably affect (after the attack on the rake, measurements were made).

But once I had to make a semaphore in one of the handlers in order for only one handler to perform a heavy function, the result of which is cached and used by other handlers. And then on the test environment, problems started - valid code, which worked locally normally, began to crash due to problems with the semaphore. Of course, the first thought with the inoperability of new functionality - is that the problems associated with it itself. Were digging around for a long time, until we came to the conclusion "some kind of game, why are different handler instances used?" and it was only found out from the gridtrack that it turned out to be a mess mapValues.

If you are working with Apache Spark, then you can stumble upon a similar problem when you suddenly discover that for some elementary piece of code .mapValuesyou can catch with

java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$2

https://stackoverflow.com/questions/32900862/map-can-not-be-serializable-in-scala
https://issues.scala-lang.org/browse/SI-7005
But it map(identity)solves the problem, but to dig deeper is usually no motivation / time.

Conclusion

Errors can lurk in the most unexpected places - even in those methods that seem 100% obvious. Specifically, this problem, in my opinion, is associated with a bad method name and an insufficiently strict type of return.

Of course, it is important to study the documentation on all the methods used by the standard library, but it is not always obvious and, frankly, there is not always enough motivation to read about “obvious things”.

Lazy calculations themselves are a cool joke, and the article does not in any way encourage them to be abandoned. However, when laziness is not obvious - this can lead to problems.

For the sake of justice, the problem with it mapValuesalready appeared on Habré in translation, but personally I have that article very poorly deposited in my head, because There were many already known / basic things and it was not completely clear what the use of these functions could be dangerous for:

The filterKeys method wraps the source table without copying any elements. There is nothing bad about it, however you hardly expect filterKeys to behave like this.

That is, there is a remark about unexpected behavior, and what can be done on the rake, if you step on it a little, it seems to be unlikely.

→ All the code from the article is in this gist

Only registered users can participate in the survey. Sign in , please.

Did you know about this behavior mapValues and filterKeys?

Is this behavior obvious if you read scaladoc (the way you read the documentation normally, and not with an overstated level of attention to detail)?

Tags: