The disadvantages of pure functional programming
From the author: the translation of the article "Functional programming is unpopular because it is strange" sparked a heated discussion. Several comments quite rightly noted that, when discussing the shortcomings of functional programming, it would be better to rely on modern, mature functional languages (the examples in the original article were built on C++ templates), and that Haskell, for example, has been widely used in industry for the last five years. In this regard, I would like to draw attention to two very substantive articles from another blog (by the author of the book F# for Scientists): (i) "Disadvantages of pure functional programming" and (ii) "Why is Haskell used so little in industry?". Below I present a translation of the first of them.
1. Pure functional languages have no efficient unordered dictionary or set
Since the 90s, the use of dictionaries in software development has skyrocketed. Dictionaries are now a standard collection that every developer expects to find in their standard library.
Purely functional or persistent data structures, such as those found in Okasaki's well-known monograph, are a great tool. In particular, the persistence they provide means that you can access older versions of a collection without worrying about how the collection is being modified. In many cases (especially in tasks such as logic programming or writing compilers) this can make solutions shorter and simpler, partly because it makes tracking changes trivial. But you pay a big price for persistence in terms of performance: purely functional dictionaries are usually about 10 times slower than well-implemented hash tables, and I have seen cases where the difference reached 40 times. For some applications this is prohibitively slow.
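To make the comparison concrete, here is a minimal Haskell sketch of the two styles involved, assuming the containers and hashtables packages: a persistent Data.Map, where every insert produces a new version, versus a mutable hash table updated in place. Actual numbers would require a proper benchmark (e.g. with criterion) against a well-implemented imperative hash table.

    import qualified Data.Map.Strict as M   -- persistent map from the containers package
    import qualified Data.HashTable.IO as H -- mutable hash table from the hashtables package
    import Control.Monad (forM_)

    -- Fill a persistent map: every insert returns a new version of the map.
    fillPersistent :: Int -> M.Map Int Int
    fillPersistent n = foldl (\m k -> M.insert k (2 * k) m) M.empty [1 .. n]

    -- Fill a mutable hash table in IO: every insert updates the table in place.
    fillMutable :: Int -> IO (H.BasicHashTable Int Int)
    fillMutable n = do
      t <- H.new
      forM_ [1 .. n] $ \k -> H.insert t k (2 * k)
      return t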
Moreover, most functional languages (OCaml, Haskell, Scala) are unable to express a fast mutable generic hash table, because they lack the killer combination of reified generics, value types and a fast write barrier for the garbage collector.
ATTENTION! Beware of people who claim that Haskell's purely functional dictionaries are fast by comparing them to Haskell's own mutable hash tables. The correct conclusion from that comparison is that Haskell's mutable hash tables are slow.
2. There is no purely functional weak hash table
In an imperative, garbage-collected language, the relationships between the nodes and edges of a graph can be expressed using weak hash tables. The garbage collector will then collect unreachable subgraphs for you. Since there are no purely functional weak hash tables, in a pure functional language you have to write your own garbage collection (translator's note: as I understand it, in the sense of removing unreachable subgraphs by hand).
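For illustration, here is a hedged sketch of what one ends up writing by hand: GHC does expose weak pointers through System.Mem.Weak, so graph edges can refer to their endpoint nodes weakly, but there is no ready-made weak hash table, and pruning dead entries out of whatever container holds them is your own bookkeeping. In practice the keys would usually be mutable node objects such as IORefs, since weak pointers keyed on plain immutable values can be collected earlier than expected.

    import System.Mem.Weak (Weak, mkWeakPtr, deRefWeak)
    import Data.Maybe (isJust)

    -- An edge that refers to its endpoints only weakly, so it does not keep them alive.
    data WeakEdge a = WeakEdge (Weak a) (Weak a)

    mkWeakEdge :: a -> a -> IO (WeakEdge a)
    mkWeakEdge u v = WeakEdge <$> mkWeakPtr u Nothing <*> mkWeakPtr v Nothing

    -- An edge is "live" only while both endpoints are still reachable elsewhere;
    -- removing dead edges from the surrounding collection is manual work.
    edgeAlive :: WeakEdge a -> IO Bool
    edgeAlive (WeakEdge wu wv) = do
      mu <- deRefWeak wu
      mv <- deRefWeak wv
      return (isJust mu && isJust mv)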
However, this is not the most serious drawback, given that most developers have never used weak hash tables ...
3. There are no purely functional thread-safe collections
By definition, immutable collections cannot support mutation, thread-safe or otherwise. So if you want a shared mutable collection (such as an in-memory database), there is no purely functional solution.
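The usual workaround, sketched below, is to wrap a persistent collection in a mutable transactional reference: the map itself stays immutable and all sharing and mutation goes through STM. This works, but the shared collection is no longer purely functional.

    import Control.Concurrent.STM (TVar, atomically, newTVarIO, readTVar, modifyTVar')
    import qualified Data.Map.Strict as M

    -- A persistent map shared between threads via a transactional variable.
    type SharedMap k v = TVar (M.Map k v)

    newShared :: IO (SharedMap k v)
    newShared = newTVarIO M.empty

    -- Each insert atomically replaces the old version of the map with a new one.
    insertShared :: Ord k => SharedMap k v -> k -> v -> IO ()
    insertShared ref k v = atomically (modifyTVar' ref (M.insert k v))

    lookupShared :: Ord k => SharedMap k v -> k -> IO (Maybe v)
    lookupShared ref k = M.lookup k <$> atomically (readTVar ref)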
4. Most graph algorithms look worse and run much slower when written in a functional style
Functional programming is a great tool for some kinds of tasks, but, in my observation, graph algorithms are an area where purely functional solutions are often worse both in terms of code simplicity and performance.
Compare Prim's algorithm in 12 lines of Python with Prim's algorithm in 20 lines of Haskell. And why, in fact, does Haskell use Prim's algorithm in its library? Probably because Kruskal's algorithm is built on the union-find (disjoint-set) data structure, and functional languages have no efficient implementation of it.
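For reference, here is a hedged sketch (not the library version linked above) of Prim's algorithm in a purely functional style, using Data.Set as a makeshift priority queue over an adjacency map; even this simple version carries noticeably more plumbing than the imperative loop.

    import qualified Data.Map.Strict as M
    import qualified Data.Set as S

    -- Undirected graph as an adjacency map: vertex -> [(edge weight, neighbour)].
    type Graph = M.Map Int [(Int, Int)]

    -- Minimum spanning tree edges as (weight, from, to), grown from an arbitrary start vertex.
    prim :: Graph -> [(Int, Int, Int)]
    prim g = case M.keys g of
      []      -> []
      (s : _) -> go (S.singleton s) (S.fromList [(w, s, v) | (w, v) <- adj s]) []
      where
        adj v = M.findWithDefault [] v g
        go visited frontier mst = case S.minView frontier of
          Nothing -> reverse mst
          Just ((w, u, v), rest)
            | v `S.member` visited -> go visited rest mst
            | otherwise ->
                let frontier' = S.union rest
                      (S.fromList [(w', v, x) | (w', x) <- adj v, x `S.notMember` visited])
                in  go (S.insert v visited) frontier' ((w, u, v) : mst)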
5. The inertia of traditional imperative data structures and algorithms is huge
Beyond graph algorithms, there are many areas of computer science in which 65 years of scientific literature have focused almost entirely on imperative solutions. Consequently, programmers in imperative languages can stand on the shoulders of giants and take advantage of these developments, while programmers in functional languages often have to start from scratch. After all, just a couple of years ago memoization in Haskell was the subject of a PhD thesis.
I once asked several Haskell programmers (some of whom had written dissertations on the language) to write an efficient parallel quicksort in it, and this is what came of it.
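For context, the usual starting point of such an exercise is the naive list-based version below, a sketch using Control.Parallel rather than anything from that discussion: it sparks one recursive call, but it still allocates fresh lists at every level, so it is nowhere near an efficient in-place parallel quicksort.

    import Control.Parallel (par, pseq)

    -- Naive "parallel" quicksort over lists: `par` only sparks evaluation of the
    -- first sublist to weak head normal form, and every level builds new lists,
    -- so the constant factors are poor compared with a good in-place sort.
    parQuicksort :: Ord a => [a] -> [a]
    parQuicksort []       = []
    parQuicksort (p : xs) = lesser `par` (greater `pseq` (lesser ++ p : greater))
      where
        lesser  = parQuicksort [x | x <- xs, x <  p]
        greater = parQuicksort [x | x <- xs, x >= p]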
6. As it turns out, all existing implementations of functional languages, both pure and impure, allocate far too much memory
Around 1960, McCarthy invented Lisp. Its main data structure was the singly linked list, and each list element was allocated separately on the heap. All modern functional languages evolved from this idea. In the 70s, Scheme used essentially the same data representation strategy as Lisp. In the 80s, SML added a bit of unboxing thanks to tuples, which are allocated on the heap as flat blocks of memory. In the 90s, OCaml added a little more unboxing with unboxed arrays of floating-point numbers. Haskell added the ability to unbox certain data types. But to this day, no functional language has unboxed tuples by default (translator's note: apparently the author chose this way of saying "tuples allocated on the stack by default"). Even .NET-based F#, in which arbitrary value types can be defined, still uses the boxed .NET tuples. Consequently, all modern functional languages carry the burden of frequent, unnecessary heap allocations, and therefore load their garbage collectors far more than necessary. This is a serious problem, not only because it makes single-threaded code slow, but also because the garbage collector is a shared resource, so the load on it leads to poor scalability of parallel programs.
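A small Haskell illustration of the boxing issue discussed above: an ordinary lazy tuple of two Doubles is a heap object holding pointers to two further heap-allocated Doubles, whereas a strict, UNPACK-annotated data type stores both values inline in a single heap object.

    -- Boxed: one constructor pointing at a (Double, Double) tuple, which in turn
    -- points at two separately heap-allocated Double boxes.
    data BoxedPoint = BoxedPoint (Double, Double)

    -- Unboxed fields: both doubles live inline in a single heap object,
    -- so there are no extra allocations per point (GHC-specific UNPACK pragma).
    data FlatPoint = FlatPoint {-# UNPACK #-} !Double {-# UNPACK #-} !Double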
It should be noted that calling such behavior a "flaw" can be controversial. Xavier Leroy from the OCaml community believes that representing data in OCaml the way Lisp does is an advantage, because it is the basis of OCaml's excellent performance in the Coq theorem prover. Xavier states that "OCaml's strategy for symbolic computing is close to optimal." Functional languages are often optimized for high-performance purely functional collections at the expense of imperative collections. Given that imperative collections are generally faster, this puts an artificially low ceiling on the performance of almost all functional languages.
7. Pure functional programming is theoretically good for parallelism, but bad for performance in practice, and performance in practice is the whole point of parallelism
Today there are two reasons to write parallel programs. The first is to write programs that are objectively fast in absolute terms. The second is to make a slow program less slow. In most cases, parallel programming in a functional language is a variation of the second reason. Almost no one in the high-performance computing world, that is, on supercomputers, runs functional code directly. When a developer parallelizes a program in a functional language, in most cases they do it not to achieve excellent absolute performance, but to improve the speed they already have.
Pure functional languages like Haskell are designed to abstract memory and time away from the programmer. This gives the programmer a bird's-eye view of the program, but it makes it very hard to estimate how much memory a Haskell program will consume and how long it will run before producing a result. In parallel programming, it is always very important to make sure that the gain from parallelization outweighs the administrative overhead of running code in parallel. In Haskell this is very hard. In fact, it is so hard that published research on parallel Haskell very selectively reports the degree of parallelization that maximizes performance, even though that degree cannot be predicted in advance without running the program many times. In my experience, in languages like C++ parallelization by the most standard methods often gives predictable speedups, unlike Haskell, where performance is unpredictable.
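As a hedged illustration of that tuning problem: with Control.Parallel.Strategies the work below is split into chunks evaluated in parallel, but the chunk size (1000 here, purely an assumed value) has to be found by repeatedly running and measuring the program, exactly the kind of trial and error described above.

    import Control.Parallel.Strategies (parListChunk, rdeepseq, withStrategy)

    -- Sum of squares with the list evaluated in parallel chunks; whether a chunk
    -- size of 1000 pays for the sparking overhead can only be found by measurement.
    sumSquares :: [Double] -> Double
    sumSquares xs = sum (withStrategy (parListChunk 1000 rdeepseq) (map (^ 2) xs))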
ATTENTION! Beware of people who talk about the scalability of a program without regard to absolute performance. You can improve the scalability of almost any parallel program by pointlessly computing Mandelbrot sets after every line of code, because most of the computation will then happen in embarrassingly parallel code. Scalability is a necessary but not sufficient condition. You absolutely must also look at absolute performance.
8. It took 50 years to dilute the concentration of arrogant snobs in the community enough that you could get a useful answer about functional programming
I have been involved in functional programming for over 20 years. For decades there was a social divide between programmers in functional languages and people who solved real problems. Thankfully this problem began to resolve itself as functional languages like Scala, Clojure and F# started to be used for real tasks; but for many years arrogant snobs dominated the functional programming community, which made it very difficult to get real solutions to real problems. The reason was that many communities (Lisp's, for decades), forgotten and left to their own devices, developed very elaborate (but incorrect) arguments about why their language (Lisp) was so good. It took me many years to understand what was wrong with these arguments.
9. A lot of false information circulates about functional programming.
If you criticize the performance of hash tables in Haskell (there is more recent criticism here), you may get absolutely wild advice from community luminaries: for example, they may literally advise you to simply turn off garbage collection.
For a long time, the functional programming community was captivated by remarkably short implementations of the sieve of Eratosthenes and quicksort. These were taught to students for years. Only after many years did the realization come that these implementations do not correspond to the original algorithms. Melissa O'Neill even published a scientific paper correcting the sieve of Eratosthenes in Haskell. In particular, her genuine sieve requires about a hundred times more code than the erroneous original version. The same is true of quicksort, where the "elegant" two-line version in Haskell is more than 1000 times slower than Sedgewick's version in C, because Haskell makes a deep copy of the lists on every quicksort call, ruining the asymptotic complexity of Hoare's original algorithm.
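For reference, this is the famous short "sieve" that O'Neill's paper analyses; as she shows, it actually performs trial division rather than the sieve of Eratosthenes, with far worse asymptotic complexity.

    -- The well-known two-liner: elegant, but it filters every candidate against
    -- every earlier prime (trial division), which is not Eratosthenes' algorithm.
    primes :: [Int]
    primes = sieve [2 ..]
      where sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]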
See also "Why is Haskell used so little in industry?", which examines the Haskell in Industry page in detail.
From the author: I also hope one day to translate the article "Why is Haskell used so little in industry?", because it tries to refute the opinion that Haskell has been widely used in industry for the last five years. But it should be noted (judging by the comments on the English original) that things are not so clear-cut, and the author himself seems to exaggerate in places.