Subtleties of Scala: learning CanBuildFrom



In the Scala standard library, collection methods (map, flatMap, scan, and others) accept an instance of type CanBuildFrom as an implicit parameter. In this article we will analyze in detail why this trait is needed, how it works, and how it can be useful to the developer.



How it works


The main purpose of CanBuildFrom is to provide the compiler with a result type for map, flatMap, and similar methods. This can be seen, for example, in the definition of map in the TraversableLike trait:


def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That

The method returns an object of type That, which appears in the signature only as a type parameter of CanBuildFrom. The compiler selects a suitable CanBuildFrom instance based on the type of the original collection, Repr, and the result type of the user-defined function, B. The selection is made from the set of values declared in the Predef object and in the companion objects of the collections (the rules for choosing implicit values deserve a separate article and are described in detail in the language specification).
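To make the role of the implicit parameter visible, here is a minimal sketch, assuming Scala 2.12 or earlier (where CanBuildFrom is part of the standard library). It asks the compiler for the instance it would otherwise pass silently and then hands it to map explicitly. The names cbf, viaExplicit and viaImplicit are chosen for illustration only.

```scala
import scala.collection.generic.CanBuildFrom

// Ask the compiler for the instance it would select for List[Int] => List[String].
// It is found in the List companion object.
val cbf = implicitly[CanBuildFrom[List[Int], String, List[String]]]

// Passing the instance explicitly: That is fixed to List[String] by cbf.
val viaExplicit: List[String] = List(1, 2, 3).map(_.toString)(cbf)

// The ordinary call, where the compiler fills in the same parameter itself.
val viaImplicit: List[String] = List(1, 2, 3).map(_.toString)
```

Both calls should build the same list, since the second one only hides the argument that the first one spells out.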


In fact, when using CanBuildFrom, the same result type inference occurs as in the case of the simplest parameterized method:


scala> def f[T](x: List[T]): T = x.head
f: [T](x: List[T])T
scala> f(List(3))
res0: Int = 3
scala> f(List(3.14))
res1: Double = 3.14
scala> f(List("Pi"))
res2: String = Pi

That is, for the call


List(1, 2, 3).map(_ * 2)

the compiler will select an instance of CanBuildFrom from the GenTraversableFactory class, which is declared as follows:


class GenericCanBuildFrom[A] extends CanBuildFrom[CC[_], A, CC[A]]

and will return a collection of the same type, but with the elements produced by the user-defined function: CC[A]. In other cases, the compiler can choose a more suitable result type, for example, for strings:


scala> "abc".map(_.toUpper) // Predef.StringCanBuildFrom
res3: String = ABC
scala> "abc".map(_ + "*") // Predef.fallbackStringCanBuildFrom[String]
res4: scala.collection.immutable.IndexedSeq[String] = Vector(a*, b*, c*)
scala> "abc".map(_.toInt) // Predef.fallbackStringCanBuildFrom[Int]
res5: scala.collection.immutable.IndexedSeq[Int] = Vector(97, 98, 99)

In the first case, StringCanBuildFrom is selected, and the result is a String:


implicit val StringCanBuildFrom: CanBuildFrom[String, Char, String]

In the second and third cases, the fallbackStringCanBuildFrom method is selected, and the result is an IndexedSeq:


implicit def fallbackStringCanBuildFrom[T]: CanBuildFrom[String, T, immutable.IndexedSeq[T]]

Using breakOut


Consider the Map class. It is easy to convert a collection of this type to an Iterable: return a single value, rather than a pair, from the mapping function:


scala> Map(1 -> "a", 2 -> "b", 3 -> "c").map(_._2)
res6: scala.collection.immutable.Iterable[String] = List(a, b, c)
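For contrast, here is a small sketch (assuming Scala 2.12 or earlier) of the two outcomes side by side: when the function returns a pair, the instance from the Map companion applies and the result stays a Map; when it returns a single value, a more general instance is chosen. The value names are illustrative.

```scala
val source = Map(1 -> "a", 2 -> "b", 3 -> "c")

// The function returns pairs, so the result is again a Map.
val swapped = source.map { case (k, v) => v -> k }

// The function returns single values, so the result is only an Iterable.
val values = source.map(_._2)

// These annotations compile only if the static types are as described.
val asMap: Map[String, Int] = swapped
val asIterable: Iterable[String] = values
```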

But to get a Map from a list of pairs, you need to call the toMap method:


scala> List('a', 'b', 'c').map(x => x.toInt -> x)
res7: List[(Int, Char)] = List((97,a), (98,b), (99,c))
scala> List('a', 'b', 'c').map(x => x.toInt -> x).toMap
res8: scala.collection.immutable.Map[Int,Char] = Map(97 -> a, 98 -> b, 99 -> c)

Or use the breakOut method instead of the implicit parameter:


scala> import collection.breakOut
import collection.breakOut
scala> List('a', 'b', 'c').map(x => x.toInt -> x)(breakOut)
res9: scala.collection.immutable.IndexedSeq[(Int, Char)] = Vector((97,a), (98,b), (99,c))

The method, as the name implies, allows you to "break out" of the boundaries of the source collection's type and gives the compiler more freedom in choosing the CanBuildFrom instance:


def breakOut[From, T, To](implicit b: CanBuildFrom[Nothing, T, To]): CanBuildFrom[From, T, To]

The signature shows that breakOut does not fix any of its three type parameters, which means that it can be used in place of any CanBuildFrom instance. breakOut itself implicitly accepts an object of type CanBuildFrom, but with the From parameter replaced by Nothing, which allows the compiler to use any available CanBuildFrom instance (this works because the From parameter is declared contravariant and Nothing is a subtype of every type).


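In other words, breakOut provides an additional “layer” that allows the compiler to choose from all available CanBuildFrom implementations, and not just those that are valid for the type of the source collection. In the List('a', 'b', 'c') example above no result type is expected, so the choice falls on fallbackStringCanBuildFrom from Predef and the result is an IndexedSeq. If the expected type is Map[Int, Char] (for example, through a type annotation on the receiving value), the same call uses CanBuildFrom from the Map companion, despite the fact that we originally worked with a List. Another example is getting a string from a list of characters: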


scala> List('a', 'b', 'c').map(_.toUpper)
res10: List[Char] = List(A, B, C)
scala> List('a', 'b', 'c').map(_.toUpper)(breakOut)
res11: String = ABC

The CanBuildFrom[String, Char, String] instance is declared in Predef and therefore takes precedence over the declarations in the companion objects of the collections.
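When the call site states an expected type, breakOut lets the compiler pick the instance that builds exactly that type. The following is a minimal sketch, assuming Scala 2.12 or earlier (breakOut is not available in later collection libraries). The names chars, byCode and codes are illustrative.

```scala
import scala.collection.breakOut

val chars = List('a', 'b', 'c')

// The annotation asks for a Map, so the instance from the Map companion is used
// and the Map is built directly, with no intermediate List of pairs.
val byCode: Map[Int, Char] = chars.map(x => x.toInt -> x)(breakOut)

// The same idea with a Set as the target type.
val codes: Set[Int] = chars.map(_.toInt)(breakOut)
```

Compared with map followed by toMap, this form avoids building the intermediate collection, at the cost of a less obvious call site.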


Example: a list of Futures


As a small example of using CanBuildFrom, we will write an implementation that automatically collects a list of Futures into a single object, as Future.sequence does:


List[Future[T]] -> Future[List[T]]

To get started, take a look inside CanBuildFrom. The trait declares two abstract apply methods, each of which returns a builder for the new collection; the builder is then filled with the results of the user-defined function:


def apply(): Builder[Elem, To]
def apply(from: From): Builder[Elem, To]
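How a collection method might use these two methods can be sketched with a hand-written map. This is an illustration under the assumption of Scala 2.12 or earlier, not the library's actual implementation; mapVia is a hypothetical helper name.

```scala
import scala.collection.generic.CanBuildFrom
import scala.language.higherKinds

// Hypothetical helper: a simplified map over any Traversable.
def mapVia[A, B, CC[X] <: Traversable[X]](xs: CC[A])(f: A => B)(
    implicit cbf: CanBuildFrom[CC[A], B, CC[B]]): CC[B] = {
  val builder = cbf(xs)              // apply(from): a builder suited to the source
  xs.foreach(a => builder += f(a))   // feed it the results of the function
  builder.result()                   // obtain the finished collection
}

val doubled = mapVia(List(1, 2, 3))(_ * 2)
val lengths = mapVia(Vector("a", "bb"))(_.length)
```

The helper never names a concrete collection class: the type of the result is decided entirely by the instance the compiler passes in.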

Therefore, to provide your own implementation of CanBuildFrom, you need to prepare a Builder that implements the methods for adding an element, clearing the buffer, and obtaining the result:


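import scala.collection.mutable.{Builder, ListBuffer}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // required by Future.sequence

class FutureBuilder[A] extends Builder[Future[A], Future[Iterable[A]]] {
  // Futures accumulated so far
  private val buff = ListBuffer[Future[A]]()
  def +=(elem: Future[A]): this.type = { buff += elem; this }
  def clear(): Unit = buff.clear()
  // Combine the accumulated futures into a single future
  def result(): Future[Iterable[A]] = Future.sequence(buff.toSeq)
}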

The CanBuildFrom implementation itself is trivial:


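import scala.collection.generic.CanBuildFrom

class FutureCanBuildFrom[A] extends CanBuildFrom[Any, Future[A], Future[Iterable[A]]] {
  def apply(): FutureBuilder[A] = new FutureBuilder[A]
  // The source collection is ignored: the builder always starts empty
  def apply(from: Any): FutureBuilder[A] = apply()
}
implicit def futureCanBuildFrom[A]: FutureCanBuildFrom[A] = new FutureCanBuildFrom[A]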

Let's check:


scala> Range(0, 10).map(x => Future(x * x))
res12: scala.concurrent.Future[Iterable[Int]] = scala.concurrent.impl.Promise$DefaultPromise@360e2cfb

Everything works! Thanks to the futureCanBuildFrom method, we got a Future[Iterable[Int]] directly, i.e. the conversion of the intermediate collection was done automatically.


Warning: this is just an example of using CanBuildFrom. I am not saying that such a solution should be used in your production code or that it is better than the usual wrapping in Future.sequence. Do not copy the code into your project without first analyzing the consequences!
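For comparison, the usual wrapping in Future.sequence might look like the sketch below. The five-second timeout and the value names are arbitrary choices for the illustration, and blocking with Await is done here only to inspect the result.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// An ordinary map: the result is a plain list of futures.
val futures: List[Future[Int]] = List.range(0, 10).map(x => Future(x * x))

// The conversion to a single future is explicit and visible at the call site.
val combined: Future[List[Int]] = Future.sequence(futures)

// Block only to look at the values in this example.
val squares: List[Int] = Await.result(combined, 5.seconds)
```

The explicit call costs one extra line but leaves map with its ordinary meaning everywhere else in the scope.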


Conclusion


Using CanBuildFrom is closely related to implicit parameters, so a clear understanding of how implicit values are chosen will save you time while debugging. It is worth looking into the language specification or the Scala FAQ. The compiler can also help by showing which implicit values were selected if you build the program with the -Xprint:typer flag.


CanBuildFrom is a very specific tool, and you most likely will not have to work closely with it unless you are developing new data structures. Nevertheless, understanding how it works is useful and gives a better picture of the internal structure of the standard library.


That's all, thanks and success in learning Scala!


Corrections and additions to the article, as always, are welcome.

