houk August 8, 2017 at 20:48

Advanced Enumerations with Ruby

Translation

Enumeration, by definition, is "the act of mentioning a number of things one by one." In programming, instead of mentioning, we perform whatever action we choose, whether that is simply printing to the screen or selecting and/or transforming an element.

In programming, we have many ways to select and process a collection one item at a time, adding another transformation function to the chain at each step. Each step can either consume the entire collection before handing its results to the next step, or it can process the collection lazily, passing one or more elements of the collection through all of the transformation steps.

How Ruby Enumerations Work

In this post, I'll give you an overview of how blocks and the yield keyword work. The blocks we are interested in here are pieces of code defined inside methods, procs, or lambdas. You can think of yield as a spot in the current code where another block of code, defined somewhere else, is inserted. Let me show you.

def my_printer
  puts "Hello World!"
end
def thrice
  3.times do
    yield
  end
end
thrice &method(:my_printer)
# Hello World!
# Hello World!
# Hello World!
thrice { puts "Ruby" }
# Ruby
# Ruby
# Ruby

Methods accept two forms of blocks for yield: procs and literal blocks. The method method turns a method definition into a Method object, which the & operator converts to a proc so that it can be passed in as a block, as in the my_printer example above.
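As a small sketch of the forms a block argument can take, the snippet below passes a literal block, a proc, a lambda, and a Method object to the same yielding method. The names twice and shout are made up for illustration and are not part of the original example.

```ruby
# Hypothetical helper: yields a fixed word to its block two times
# and collects whatever the block returns.
def twice
  [yield("ruby"), yield("ruby")]
end

# Hypothetical plain method that will be passed in as a block.
def shout(word)
  word.upcase + "!"
end

twice { |w| w.reverse }          # literal block   => ["ybur", "ybur"]
twice(&proc { |w| w.length })    # Proc with &     => [4, 4]
twice(&->(w) { w.capitalize })   # lambda with &   => ["Ruby", "Ruby"]
twice(&method(:shout))           # Method object; & calls to_proc on it
                                 # => ["RUBY!", "RUBY!"]
```

In every case the method body is the same; only the way the block is supplied changes.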

Wherever yield appears, it behaves as if the code passed in as the block were written in its place. So in the first case, imagine the yield call replaced by puts "Hello World!", and in the second by puts "Ruby".

yield can also work as a simple enumerator. You can pass any values into the block or proc as parameters by adding them after yield.

def simple_enum
  yield 4
  yield 3
  yield 2
  yield 1
  yield 0
end
simple_enum do |value|
  puts value
end
# 4
# 3
# 2
# 1
# 0

Minimum Enumerator Requirements

The standard way to create an enumerator in Ruby is an each method that yields values. With this in mind, you can define each on any Ruby object and get all the benefits of the more than 50 collection-processing methods in the Enumerable module. Just add include Enumerable to a class that has a valid each method, and you can use all of those methods.
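To make that concrete, here is a minimal sketch of a class that defines each and includes Enumerable. The Playlist class and its contents are invented for this illustration.

```ruby
# Hypothetical collection class used only for illustration.
class Playlist
  include Enumerable   # brings in map, select, sort, include?, and more

  def initialize(*songs)
    @songs = songs
  end

  # The single method Enumerable relies on: yield each element in turn.
  def each
    @songs.each { |song| yield song }
  end
end

list = Playlist.new("Intro", "Overture", "Finale")
list.map(&:upcase)                # => ["INTRO", "OVERTURE", "FINALE"]
list.select { |s| s.length > 5 }  # => ["Overture", "Finale"]
list.sort.first                   # => "Finale"
list.include?("Intro")            # => true
```

None of map, select, sort, or include? is written in Playlist itself; they all come from Enumerable and are built on top of each.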

Enumerators are not limited to simple collections such as Array; they work with any collection that defines an each method (and that will usually have the Enumerable module among its ancestors).

Array.ancestors
# => [Array, Enumerable, Object, Kernel, BasicObject]
Hash.ancestors
# => [Hash, Enumerable, Object, Kernel, BasicObject]
Hash.method_defined? :each
# => true
require "set"
Set.ancestors
# => [Set, Enumerable, Object, Kernel, BasicObject]

"Lazy" and not "lazy" enumerators

Lazy enumerators are usually considered the better way to process collections, because they let you step through even an infinite sequence only as far as you need.

Imagine an assembly line of people making pizza, where each person is responsible for only one step in preparing the pizza. The first person tosses the dough into the right shape, the next adds the sauce, the next adds the cheese, one person handles each topping (sausage, peppers, tomatoes), another puts everything in the oven, and the last person delivers the finished pizza to you. In this example, the lazy version of the Ruby assembly line can have any number of pizza orders, but every other order waits until the first pizza has gone through all of the processing steps before the next pizza is started.

If you are not using a lazy enumerator, each step has to wait until the entire collection has gone through the previous step. For example, if you have 20 pizza orders, the person tossing the dough has to make all 20 bases before the next person in line can add sauce to any of them. Every step in the line waits in the same way. The larger the collection you need to process, the more ridiculous it seems to keep the rest of the assembly line waiting.
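The difference in ordering can be sketched by logging each step. The eager_log and lazy_log helpers and the step names below are made up for this illustration; only the use of lazy differs between them.

```ruby
# Eager: every step finishes the whole collection before the next step starts.
def eager_log(orders)
  log = []
  orders.map  { |n| log << "dough #{n}"; n }
        .map  { |n| log << "sauce #{n}"; n }
        .each { |n| log << "bake #{n}" }
  log
end

# Lazy: each order travels through all steps before the next order starts.
def lazy_log(orders)
  log = []
  orders.lazy
        .map  { |n| log << "dough #{n}"; n }
        .map  { |n| log << "sauce #{n}"; n }
        .each { |n| log << "bake #{n}" }
  log
end

eager_log([1, 2])
# => ["dough 1", "dough 2", "sauce 1", "sauce 2", "bake 1", "bake 2"]
lazy_log([1, 2])
# => ["dough 1", "sauce 1", "bake 1", "dough 2", "sauce 2", "bake 2"]
```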

A more practical example: processing a list of emails that need to be sent to all users. If the code contains an error and the list is not processed lazily, it is likely that no one will receive an email. With lazy execution, you would potentially have sent emails to most users before, say, an invalid email address caused an error. If each sending record stores whether the send succeeded, it is also easier to track down the record on which the error occurred.
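Here is a rough sketch of that scenario, under the assumption that building an email raises on a bad address. build_email, deliver_all, and the addresses are hypothetical stand-ins, and the sent array plays the role of the mail server.

```ruby
# Hypothetical builder: raises when the address is obviously invalid.
def build_email(address)
  raise ArgumentError, "bad address: #{address}" unless address.include?("@")
  "Hello, #{address}"
end

# Builds and "sends" emails, returning whatever was sent before any failure.
def deliver_all(addresses, lazily:)
  sent = []
  source = lazily ? addresses.lazy : addresses
  begin
    source.map { |a| build_email(a) }.each { |mail| sent << mail }
  rescue ArgumentError
    # Processing stops at the first bad record in both modes.
  end
  sent
end

addresses = ["ann@example.com", "bob@example.com", "oops", "cy@example.com"]
deliver_all(addresses, lazily: false)
# => []
deliver_all(addresses, lazily: true)
# => ["Hello, ann@example.com", "Hello, bob@example.com"]
```

In the eager run the map step fails before a single email reaches the sending step, while the lazy run has already sent the first two.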

Creating a lazy enumerator in Ruby is as simple as calling lazy on an object that includes the Enumerable module, or calling to_enum.lazy on an object that defines an each method.

class Thing
  def each
    yield "winning"
    yield "not winning"
  end
end
a = Thing.new.to_enum.lazy
Thing.include Enumerable
b = Thing.new.lazy
a.next
# => "winning"
b.next
# => "winning"

The to_enum call returns an object that is both an Enumerator and an Enumerable, with access to all of their methods.
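A quick way to check this for yourself is sketched below. The Thing class is repeated, without Enumerable, so that the snippet runs on its own.

```ruby
class Thing
  def each
    yield "winning"
    yield "not winning"
  end
end

enum = Thing.new.to_enum   # wraps Thing#each in an Enumerator

enum.is_a?(Enumerator)     # => true
enum.is_a?(Enumerable)     # => true
enum.map(&:upcase)         # => ["WINNING", "NOT WINNING"]
enum.next                  # => "winning"
```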

It is important to pay attention to which enumerator methods consume the entire collection at once and which consume it lazily. For example, partition consumes the entire collection at once, so it is not suitable for infinite collections. Better choices for lazy execution are methods like chunk, or select called on a lazy enumerator.

x = (0..Float::INFINITY)
y = x.chunk(&:even?)
# => #<Enumerator: #<Enumerator::Generator:0x...>:each>
y.next
# => [true, [0]]
y.next
# => [false, [1]]
y.next
# => [true, [2]]
z = x.lazy.select(&:even?)
# => #<Enumerator::Lazy: #<Enumerator::Lazy: 0..Infinity>:select>
z.next
# => 0
z.next
# => 2
z.next
# => 4

When using select with infinite collections, you must call lazy first. Otherwise select tries to consume the entire collection, and the program never finishes.
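As a short sketch, once lazy is in the chain you can ask for just the first few results and the infinite range is only read as far as necessary.

```ruby
evens = (0..Float::INFINITY).lazy.select(&:even?)

evens.first(3)                    # => [0, 2, 4]
evens.map { |n| n * n }.first(3)  # => [0, 4, 16]

# Without lazy, select would try to walk the whole infinite range
# before returning anything, so the equivalent eager call never finishes.
```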

Creating a Lazy Enumerator

Ruby has an Enumerator::Lazy class that lets you write your own enumerator methods, similar to Ruby's take.

(0..Float::INFINITY).take(2)
# => [0, 1]

For a good example, we implement FizzBuzz, which can be run on any number and which will allow you to get endless FizzBuzz results.

def divisible_by?(num)
  ->input{ (input % num).zero? }
end
def fizzbuzz_from(value)
  Enumerator::Lazy.new(value..Float::INFINITY) do |yielder, val|
    yielder << case val
    when divisible_by?(15)
      "FizzBuzz"
    when divisible_by?(3)
      "Fizz"
    when divisible_by?(5)
      "Buzz"
    else
      val
    end
  end
end
x = fizzbuzz_from(7)
# => #<Enumerator::Lazy: ...>
9.times { puts x.next }
# 7
# 8
# Fizz
# Buzz
# 11
# Fizz
# 13
# 14
# FizzBuzz

With Enumerator::Lazy, whatever you feed to the yielder becomes the value returned at each step of the sequence. The enumerator keeps track of its current position when you use next. But when you call each after several calls to next, it starts from the very beginning of the collection.
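The difference between the position tracked by next and a fresh pass through the collection can be sketched like this. The tens enumerator is just an illustrative stand-in for any lazy enumerator.

```ruby
tens = (1..Float::INFINITY).lazy.map { |n| n * 10 }

tens.next       # => 10
tens.next       # => 20

# Internal iteration (each, first, and friends) starts from the beginning again.
tens.first(2)   # => [10, 20]

# The position used by next is kept separately and carries on where it was.
tens.next       # => 30

tens.rewind     # resets the position used by next
tens.next       # => 10
```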

The parameter you pass to Enumerator::Lazy.new is the collection the enumerator will iterate over. If you were writing this method on an Enumerable or compatible object, you could simply pass self as the parameter. val is the single value produced at a time by the collection's each method, and yielder is the single entry point for any block of code you want to pass on, just as you would with each.
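Passing self might look something like the sketch below. The Countdown class and its lazy_squares method are hypothetical names chosen for this example.

```ruby
# Hypothetical Enumerable class that counts down to zero.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  def each
    @from.downto(0) { |n| yield n }
  end

  # Custom lazy method: self is the collection being walked,
  # val is one value from each, and yielder receives the output.
  def lazy_squares
    Enumerator::Lazy.new(self) do |yielder, val|
      yielder << val * val
    end
  end
end

Countdown.new(3).lazy_squares.first(2)  # => [9, 4]
Countdown.new(3).lazy_squares.to_a      # => [9, 4, 1, 0]
```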

Advanced Enumerator Use

When processing a collection of data, it is recommended to apply restrictive filters first; the rest of your code then has much less data to process and takes far less time. If the data comes from a database, apply the filters in the database's own query language where possible, before handing the data over to Ruby. That will be much more efficient.
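To see the effect of filtering first, the sketch below counts how often a transformation runs in each ordering. The expensive method is a hypothetical placeholder for any costly step, and the two orderings give the same result here only because a square is even exactly when its input is even.

```ruby
# Hypothetical "expensive" step that records how many times it was called.
def expensive(n, counter)
  counter[:calls] += 1
  n * n
end

# Transform everything, then filter: the expensive step runs for every element.
def transform_then_filter(numbers)
  counter = { calls: 0 }
  result = numbers.map { |n| expensive(n, counter) }.select(&:even?)
  [result, counter[:calls]]
end

# Filter first, then transform: the expensive step runs only for survivors.
def filter_then_transform(numbers)
  counter = { calls: 0 }
  result = numbers.select(&:even?).map { |n| expensive(n, counter) }
  [result, counter[:calls]]
end

transform_then_filter(1..10)  # => [[4, 16, 36, 64, 100], 10]
filter_then_transform(1..10)  # => [[4, 16, 36, 64, 100], 5]
```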

require "prime"
x = (0..34).lazy.select(&Prime.method(:prime?))
x.next
# => 2
x.next
# => 3
x.next
# => 5
x.next
# => 7
x.next
# => 11

After the select call above, you can chain further methods to process the data. Those methods will deal only with the limited set of data that made it through as primes, not with the entire original range.

Grouping

One nice way to arrange data into columns is to use group_by to convert it into a hash of groups and then simply pull out the values, if the results are all you are interested in.

[0,1,2,3,4,5,6,7,8].group_by.with_index {|_,index| index % 3 }.values
# => [[0, 3, 6], [1, 4, 7], [2, 5, 8]]

If you display the results on a web page, the data will be arranged in the following order:

0    3    6
1    4    7
2    5    8

The group_by call above passes both the value and the index into the block. We use an underscore for the array value to show that we are not interested in it, only in the index. The result is a hash with the keys 0, 1, and 2, each pointing to the group of values collected under it. Since we do not care about the keys, we call values on the hash to get an array of arrays, which we can then display however we need.

If we wanted to arrange the results from left to right in columns, we could do the following:

threes = (0..2).cycle
[0,1,2,3,4,5,6,7,8].slice_when { threes.next == 2 }.to_a
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

The threes enumerator simply cycles endlessly from 0 to 2, lazily. As a result, we get the following output:

0    1    2
3    4    5
6    7    8
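For comparison, and as a different technique from the slice_when approach above, the same left-to-right rows can be produced with each_slice, which cuts a collection into fixed-size chunks. This is only a sketch of an alternative, not a replacement for the example above.

```ruby
[0, 1, 2, 3, 4, 5, 6, 7, 8].each_slice(3).to_a
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# A trailing group is simply shorter when the size does not divide evenly.
(0..9).each_slice(3).to_a
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```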

Ruby also has a transpose method that flips the results above from one view to another.

x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
x = x.transpose
# => [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
x = x.transpose
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

"Folding"

Let's look at ways to combine a collection into a single result. In other languages this is usually done with a fold function. In Ruby it has long been done with reduce and inject. A more recent, and often preferred, way of doing this is each_with_object. The main idea is to process one collection into another object that serves as the result.

Summing integers is as simple as:

[1,2,3].reduce(:+)
# => 6
[1,2,3].inject(:+)
# => 6
class AddStore
  def add(num)
    @value = @value.to_i + num
  end
  def inspect
    @value.inspect
  end
end
[1,2,3].each_with_object(AddStore.new) {|val, memo| memo.add(val) }
# => 6
# As of Ruby 2.4
[1,2,3].sum
# => 6

each_with_object usually requires an object that can be updated in place. An integer cannot be mutated in place, which is why we created an AddStore object for this trivial example.

These methods are better demonstrated by taking data from one collection and putting it into another. Note that inject and reduce are aliases for the same method in Ruby, and the block must return the accumulated value at its end so that the next iteration can keep building on it.

each_with_object does not need the block to end by returning the object being built.

collection = [:a, 2, :p, :p, 6, 7, :l, :e]
collection.reduce("") { |memo, value|
  memo << value.to_s if value.is_a? Symbol
  memo # Note the return value needs to be the object/collection we're building
}
# => "apple"
collection.each_with_object("") { |value, memo|
  memo << value.to_s if value.is_a? Symbol
}
# => "apple"
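The same contrast shows up when the result is a hash. The sketch below counts words with both methods; the word list is made up for illustration.

```ruby
words = %w[apple pear apple fig pear apple]

# each_with_object: the block's return value is ignored and the memo is returned.
counts = words.each_with_object(Hash.new(0)) { |word, memo| memo[word] += 1 }

# reduce: the block has to hand the memo back for the next iteration.
counts_again = words.reduce(Hash.new(0)) do |memo, word|
  memo[word] += 1
  memo
end

counts == { "apple" => 3, "pear" => 2, "fig" => 1 }  # => true
counts == counts_again                               # => true
```

If the trailing memo line is left out of the reduce block, the block returns an integer (the result of the += assignment), and the next iteration fails when it tries to index into it with a word.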

Structures

Struct objects in Ruby are also enumerable, and you can use them to create convenient objects with methods defined on them.

class Pair < Struct.new(:first, :second)
  def same?;    inject(:eql?)  end
  def add;      inject(:+)     end
  def subtract; inject(:-)     end
  def multiply; inject(:*)     end
  def divide;   inject(:/)     end
  def swap!
    members.zip(entries.reverse) {|a,b| self[a] = b}
  end
end
x = Pair.new(23, 42)
x.same?
# => false
x.first
# => 23
x.swap!
x.first
# => 42
x.multiply
# => 966

Structs are usually not used for large collections; they are more useful as data objects, a way to organize related data together, which in turn encourages transparent use of data rather than data clumps.

A data clump is two or more variables that are always used together as a group but would make no sense if used separately. Such a group of variables should be gathered into an object or class.
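As a sketch of that refactoring, consider a latitude and longitude that always travel together. The Coordinates struct and the describe_loose method are hypothetical names used only here.

```ruby
# Before: the two values are passed around as loose arguments.
def describe_loose(latitude, longitude)
  "#{latitude}, #{longitude}"
end

# After: the pair becomes one small value object.
Coordinates = Struct.new(:latitude, :longitude) do
  def to_s
    "#{latitude}, #{longitude}"
  end
end

home = Coordinates.new(52.52, 13.4)
home.to_s          # => "52.52, 13.4"
home.to_a          # => [52.52, 13.4]
home.map(&:round)  # => [53, 13]
```

Because Struct includes Enumerable, the grouped values can still be iterated, mapped, or reduced like any other small collection.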

So a Struct in Ruby is usually a small collection of data, but nothing says that this data cannot itself represent an entirely different collection. In that case, the Struct may be a way to implement transformations over those collections, which is most likely the same thing you would do by writing your own class.

To summarize

Ruby is a wonderful language that makes it easy to work with and manipulate collections of data. Learning every bit of what Ruby offers will let you write more elegant code, as well as test and optimize that code for better performance.

If performance matters, benchmark the individual implementations, and be sure to apply filters and limits as early in the processing as possible. Consider restricting your input to small chunks by using the readline method on files rather than read or readlines, or by using LIMIT in SQL.
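One way to read a file in small pieces is sketched below. It uses File.foreach, which yields one line at a time, combined with lazy; this is a substitute for calling readline by hand, and the first_matches helper and sample log lines are invented for the example.

```ruby
require "tempfile"

# Hypothetical helper: returns the first `limit` lines matching `pattern`,
# reading the file only as far as needed to find them.
def first_matches(path, pattern, limit)
  File.foreach(path).lazy.grep(pattern).map(&:chomp).first(limit)
end

Tempfile.create("log") do |file|
  file.puts "INFO boot", "ERROR disk", "INFO ok", "ERROR net", "ERROR cpu"
  file.flush
  first_matches(file.path, /ERROR/, 2)  # => ["ERROR disk", "ERROR net"]
end
```

By contrast, File.readlines would load every line into memory before any filtering could start.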

Lazy iteration can be a great help when splitting work across threads or background jobs. The concept of lazy iteration has practically no downside, so you can choose it for consuming any collection anywhere. It offers flexibility, and some languages, such as Rust with its iterators, have made lazy evaluation the standard.

The possibilities are endless when it comes to manipulating and managing data, and it is a fun process to learn and create every way of manipulating data sets in programming. Ruby has well-documented examples for each of its Enumerable methods, so it helps to learn from them. I encourage you to experiment and discover many new things that will help make programming more enjoyable.
