How to quickly and easily write a DSL in Ruby

Original author: Nikhil Mathew
The text below is a translation of an article from the official ZenPayroll blog. Although I disagree with the author on some points, the general approach and the techniques shown in this article may be useful to a wide range of people who write Ruby. I apologize in advance if some bureaucratic terms are translated incorrectly. My own notes and comments are marked as translator's notes throughout.

At ZenPayroll, we try to hide the complexity of the task at hand. Payroll has traditionally been a bureaucratic hornet's nest, and building a modern, convenient solution in such an unfriendly environment is an attractive technical challenge that is very hard to solve without automation.

ZenPayroll is now building a nationwide service (already available in 24 states), which means that we have to satisfy many requirements unique to each state. At first, we noticed that we were spending a lot of time writing boilerplate code instead of concentrating on what makes each state unique. We soon realized that we could solve this problem by creating a DSL to speed up and simplify development.

In this article, we will create a DSL that is as close as possible to what we use ourselves.

When do we need a DSL?


Writing a DSL is a huge amount of work, and it will not always help you solve the problem. In our case, however, the advantages outweighed the disadvantages:

  1. All state-specific code is collected in one place.
    There are several models in our Rails application in which we must implement state-specific code. We need to generate forms and tables and to manage required information about employees, companies, filing schedules and tax rates. We make payments to government agencies, submit generated forms, calculate income tax and much more. The DSL allows us to collect all the state-specific code in one place.
  2. Standardization of the states.
    Instead of building every new state from scratch, the DSL lets us automate what the states have in common while still configuring each state flexibly.
  3. Fewer places where you can make a mistake.
    With a DSL that creates classes and methods for us, we reduce boilerplate code and leave fewer places where developers have to intervene. By testing the DSL thoroughly and protecting it from invalid input, we greatly reduce the likelihood of errors.
  4. Possibility of rapid expansion.
    We are creating a framework that makes it easier to implement the unique requirements of new states. The DSL is a set of tools that saves us time and lets development move on.

Writing a DSL


In this article, we will focus on creating a DSL that lets us store company identification numbers and payroll parameters (used to calculate taxes). Although this is only a quick look at what a DSL can give us, it is still a complete introduction to the topic. Our final code, written with the DSL we are about to build, will look something like this:

StateBuilder.build('CA') do
  company do
    edd { format '\d{3}-\d{4}-\d' }
    sos { format '[A-Z]\d{7}' }
  end
  employee do
    filing_status { options ['Single', 'Married', 'Head of Household'] }
    withholding_allowance { max 99 }
    additional_withholding { max 10000 }
  end
end  

Excellent! This is clean, understandable, and expressive code that uses an interface designed to solve our problem. Let's start.

Parameter Definition


First of all, let's decide what we want to get as a result. First question: what information do we want to store?

Each state requires companies to register with local authorities. When registering in most states, companies are given identification numbers that are required to pay taxes and file documents. At company level, we must be able to store different identification numbers for different states.

Withholding taxes are calculated from the withholding allowances an employee claims. These values are defined on each state's W-4 form. Each state asks many questions to determine tax rates: your filing status, allowances for dependents, disability allowances, and more. For employees, we need a flexible way to define different attributes for each state in order to calculate tax rates correctly.

The DSL we will write will handle company identification numbers and basic payroll information for employees. We will then use this tool to describe California. Since California has some additional conditions that must be considered when calculating payroll, we will concentrate on them in order to show how to develop a DSL.

I provide a link to a simple Rails application so that you can follow the steps that will be taken in this article.

The following models are used in the application:

  • Company. Describes a company. Stores its name, type and founding date.
  • Employee. Describes an employee working for a company. Stores the employee's name, payment information and hire date.
  • CompanyStateField. Each company is associated with many CompanyStateField records, each of which stores state-specific information about the company, for example, an identification number. In California, an employer is required to have two numbers: an Employment Development Department (EDD) number and a Secretary of State (SoS) number. More information on this subject can be found here.
  • EmployeeStateField. Each employee is associated with many EmployeeStateField records, each of which stores state-specific employee information. This is information that can be found on state W-4 forms, such as withholding allowances or filing status. California's DE 4 form asks for withholding allowances, additional withholding amounts in dollars, and filing status (single, married, head of household).

We create models that inherit from CompanyStateField and EmployeeStateField and use the same tables as the base classes (single table inheritance). This lets us define state-specific subclasses while using only one table to store the data for all such models. To make this possible, both tables contain a serialized hash, which we will use to store state-specific data. Although this data cannot be queried directly, this approach keeps us from bloating the database with unused columns.
Translator's note: when using Postgres, this data can be stored in a natively supported JSON column.

Our application is now ready to work with states, and our DSL must create the specific classes that implement the required functionality for California.

What will help us?


Metaprogramming is where Ruby really shines. We can create methods and classes at runtime and use a large set of metaprogramming methods, which makes building a DSL in Ruby a pleasure. Rails itself is a DSL for building web applications, and a great deal of its "magic" is based on Ruby's metaprogramming capabilities. Below is a short list of methods and objects that are useful for metaprogramming.

Blocks


Blocks allow us to group code and pass it as an argument to a method. They can be written with the do ... end construct or with braces. Both options are identical.
Translator's note: by convention, do ... end is used for multi-line blocks and braces for single-line blocks. There are also some differences (thanks to mudasobwa) that do not matter in this case but can cost you many fun minutes of debugging.
Restored original comment

You are both wrong :)

Actually, there is a difference, and it can cause a bug that will turn your hair gray and that is extremely hard to catch if you do not know what is going on. See:
require 'benchmark'
puts Benchmark.measure { "a"*1_000_000 }
# => 0.000000   0.000000   0.000000 (  0.000427)
puts Benchmark.measure do
  "a"*1_000_000
end
# => LocalJumpError: no block given (yield)
# =>     from IRRELEVANT_PATH_TO_RVM/lib/ruby/2.0.0/benchmark.rb:281:in `measure'
# =>     from (irb):9


Cool, huh?

Because braces bind more tightly than do ... end, the second example is actually parsed like this:
(puts Benchmark.measure) do
  # irrelevant code
end



Please correct the note. People read it :)

You have almost certainly used blocks if you have ever called a method like each:
[1,2,3].each { |number| puts number*2 }

Blocks are a great tool for building DSLs because they let us write code in one context and execute it in another. This lets us build a readable DSL by moving method definitions into other classes. We will see many examples of this later.
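As a minimal sketch of "written in one place, executed in another", the example below stores blocks and runs them later. The Callbacks class and its method names are made up for this illustration and are not part of the article's code.

```ruby
# Hypothetical helper: keeps blocks now and calls them later.
class Callbacks
  def initialize
    @blocks = []
  end

  # &block turns the block given by the caller into a Proc object.
  def register(&block)
    @blocks << block
  end

  # Calls every stored block with the given value and collects the results.
  def run_all(value)
    @blocks.map { |stored| stored.call(value) }
  end
end

factor = 3
callbacks = Callbacks.new
callbacks.register { |n| n * factor } # the block still sees `factor`
callbacks.register { |n| n + 1 }
p callbacks.run_all(2) # prints [6, 3]
```

The blocks are written next to the local variable factor but executed inside run_all, which is the property the DSL relies on.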

send


The send method lets us call an object's methods (even private ones) by passing the method name as a symbol. This is useful for calling methods that are normally called inside a class definition, or for building method names dynamically from variables.
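A short sketch of send follows. The Account class is a hypothetical example and does not come from the article's application.

```ruby
# Hypothetical class used only to demonstrate send.
class Account
  def initialize
    @balance = 0
  end

  def deposit(amount)
    @balance += amount
  end

  private

  def audit_log
    "balance=#{@balance}"
  end
end

account = Account.new
account.send(:deposit, 50)           # same as account.deposit(50)
action = 'deposit'
account.send(action, 25)             # method name chosen at runtime
puts account.send(:audit_log)        # prints balance=75; send bypasses private
puts account.respond_to?(:audit_log) # prints false; ordinary calls stay blocked
```

When bypassing visibility is not wanted, public_send performs the same dynamic dispatch but respects private.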

define_method


In Ruby, define_method lets us create methods without the usual def syntax inside a class body. It takes the name of the method (a symbol or a string) and a block that will be executed when the method is called.
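For illustration, here is a small sketch that generates one predicate method per value with define_method. The Filing class and its status list are assumptions made for this example.

```ruby
# Hypothetical class: one predicate method is generated per status.
class Filing
  STATUSES = %i[single married head_of_household].freeze

  attr_reader :status

  def initialize(status)
    @status = status
  end

  STATUSES.each do |name|
    # The first argument is the method name, the block is the method body.
    define_method("#{name}?") { status == name }
  end
end

filing = Filing.new(:married)
puts filing.married? # prints true
puts filing.single?  # prints false
```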

instance_eval


Like blocks, instance_eval is essential when creating a DSL. It takes a block and executes it in the context of the receiver object. For instance:

class MyClass
  def say_hello
    puts 'Hello!'
  end
end
MyClass.new.instance_eval { say_hello } # => 'Hello!'

In this example, the block contains a call to the say_hello method, although no such method exists in the context where the block is written. The instance returned by MyClass.new is the receiver of instance_eval, and the call to say_hello happens in its context.

class MyOtherClass
  def initialize(&block)
    instance_eval &block
  end
  def say_goodbye
    puts 'Goodbye!'
  end
end
MyOtherClass.new { say_goodbye } # => 'Goodbye!'

We again write a block that calls a method not defined in its own context. This time we pass the block to the constructor of MyOtherClass and execute it with self set to the receiver, which is an instance of MyOtherClass. Excellent!

method_missing


This is the magic that makes find_by_* work in Rails. Any call to an undefined method ends up in method_missing, which receives the name of the called method and all the arguments passed to it. This is another great tool for DSLs, because it lets us handle method calls dynamically when we do not know in advance what may be called. This gives us very flexible syntax.
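The sketch below imitates the find_by_* idea on an in-memory list. It is not how Rails implements its finders; the Directory class and its data are invented for the example.

```ruby
# Hypothetical class: answers any find_by_<key> call from an array of hashes.
class Directory
  def initialize(rows)
    @rows = rows
  end

  def method_missing(name, *args, &block)
    key = name.to_s[/\Afind_by_(\w+)\z/, 1]
    return super unless key # other unknown methods still raise NoMethodError

    @rows.find { |row| row[key.to_sym] == args.first }
  end

  # Keeps respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?('find_by_') || super
  end
end

directory = Directory.new([{ name: 'Ann', state: 'CA' }, { name: 'Bob', state: 'NY' }])
p directory.find_by_state('NY')        # the row for Bob
p directory.respond_to?(:find_by_name) # prints true
```

Defining respond_to_missing? alongside method_missing and falling back to super are the usual safeguards when using this technique.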

Designing and implementing the DSL


Now that we know our toolbox, it is time to think about what we want our DSL to look like and how people will work with it. In this case, we will work "backwards": instead of starting with classes and methods, we will design the ideal syntax and build everything else around it. We treat this syntax as a sketch of what we want to get. Let's take a look at how everything should look in the end:

StateBuilder.build('CA') do
  company do
    edd { format '\d{3}-\d{4}-\d' }
    sos { format '[A-Z]\d{7}' }
  end
  employee do
    filing_status { options ['Single', 'Married', 'Head of Household'] }
    withholding_allowance { max 99 }
    additional_withholding { max 10000 }
  end
end  

Let's break it into pieces and gradually write the code that turns our DSL into the classes and methods we need to describe California.


If you want to follow along using the provided code, you can run git checkout step-0 and add the code as you read.


Our DSL, which we call StateBuilder, is a class. We start building each state by calling the class method build with the state abbreviation and a block that describes the state. Inside this block, we can call methods that we will name company and employee and pass each of them its own configuration block, which will configure our specialized models (CompanyStateField::CA and EmployeeStateField::CA).

# app/states/ca.rb
StateBuilder.build('CA') do
  company do
    # Configure CompanyStateField::CA
  end
  employee do
    # Configure EmployeeStateField::CA
  end
end  

As mentioned earlier, our logic is encapsulated in the StateBuilder class. We execute the block passed to self.build in the context of a new StateBuilder instance, so company and employee must be defined, and each of them must accept a block as an argument. Let's start by creating a class skeleton that meets these conditions.

# app/models/state_builder.rb
class StateBuilder
  def self.build(state, &block)
    # Raise an exception if no block was given
    raise "You need a block to build!" unless block_given?
    StateBuilder.new(state, &block)
  end
  def initialize(state, &block)
    @state = state
    # Execute the given block in the context of this StateBuilder instance
    instance_eval &block
  end
  def company(&block)
    # Configure CompanyStateField::CA
  end
  def employee(&block)
    # Configure EmployeeStateField::CA
  end
end  

Now we have a foundation for our StateBuilder. Since the company and employee methods will define the CompanyStateField::CA and EmployeeStateField::CA classes, let's decide what the blocks passed to these methods should look like. We must define each attribute our models will have, along with some information about these attributes. What is especially nice about creating your own DSL is that we are not required to use the standard Rails syntax for getters, setters and validations. Instead, let's implement the syntax that we sketched earlier.
Translator's note: this is debatable. I would still try to minimize the zoo of syntaxes within an application, even at the cost of some code redundancy.


It's time to run git checkout step-1.


For California companies, we must store two identification numbers: one issued by the California Employment Development Department (EDD) and one issued by the Secretary of State (SoS).

The format of the EDD number is "###-####-#", and the format of the SoS number is "@#######", where @ means "any letter" and # means "any digit".
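As a quick check of how these formats map to regular expressions, the sketch below uses the same patterns that appear in the DSL. The \A and \z anchors and the sample numbers are additions made for this example; note also that [A-Z] accepts uppercase letters only.

```ruby
# Patterns from the article, anchored here so the whole string must match.
EDD_FORMAT = /\A\d{3}-\d{4}-\d\z/ # ###-####-#
SOS_FORMAT = /\A[A-Z]\d{7}\z/     # @#######

puts EDD_FORMAT.match?('123-4567-8') # prints true
puts EDD_FORMAT.match?('1234-567-8') # prints false
puts SOS_FORMAT.match?('C1234567')   # prints true
puts SOS_FORMAT.match?('c1234567')   # prints false (lowercase letter)
```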

Ideally, we would use the attribute name as the method name and pass it a block that defines the format of the field (it looks like the time has come for method_missing!).
Translator's note: maybe it's just me, but the syntax
field name, params
seems clearer and more logical to me than the one proposed by the author (compare with standard migrations). With the author's syntax, it is not at all obvious at first glance that any name may be written inside the blocks describing the company or employee, and you also get an excellent way to shoot yourself in the foot (see below).
Let's write out what these method calls will look like for the EDD and SoS numbers.

#app/states/ca.rb
StateBuilder.build('CA') do
  company do
    edd { format '\d{3}-\d{4}-\d' }
    sos { format '[A-Z]\d{7}' }
  end
  employee do
    # Configure EmployeeStateField::CA
  end
end  

Please note that here we switched the block syntax from do ... end to braces, but the result is the same: we still pass an executable block of code to the method. Now let's do the same for employees.

According to California's Employee's Withholding Allowance Certificate (form DE 4), employees are asked for their filing status, the number of withholding allowances, and any additional amount they want withheld. Filing status may be Single, Married, or Head of Household; the number of allowances must not exceed 99; and for the additional withholding, let's set a maximum of $10,000. Now let's describe them the same way we did for the company fields.

#app/states/ca.rb
StateBuilder.build('CA') do
  company do
    edd { format '\d{3}-\d{4}-\d' }
    sos { format '[A-Z]\d{7}' }
  end
  employee do
    filing_status { options ['Single', 'Married', 'Head of Household'] }
    withholding_allowance { max 99 }
    additional_withholding { max 10000 }
  end
end  

Now we have the final description of California. Our DSL describes the attributes and validations for CompanyStateField::CA and EmployeeStateField::CA using our custom syntax.

All that remains is to translate our syntax into classes, getters, setters and validations. Let's implement the company and employee methods in the StateBuilder class and get working code.


Act three: git checkout step-2


We implement our methods and validations by deciding what to do with each of the blocks in StateBuilder#company and StateBuilder#employee. Let's use an approach similar to the one we used for StateBuilder: create a "container" that holds these methods and executes the passed block in its own context using instance_eval. We will name our containers StateBuilder::CompanyScope and StateBuilder::EmployeeScope and create methods in StateBuilder that instantiate these classes.

#app/models/state_builder.rb
class StateBuilder
  def self.build(state, &block)
    # Raise an exception if no block was given
    raise "You need a block to build!" unless block_given?
    StateBuilder.new(state, &block)
  end
  def initialize(state, &block)
    @state = state
    # Execute the given block in the context of this StateBuilder instance
    instance_eval &block
  end
  def company(&block)
    StateBuilder::CompanyScope.new(@state, &block)
  end
  def employee(&block)
    StateBuilder::EmployeeScope.new(@state, &block)
  end
end  


#app/models/state_builder/company_scope.rb
class StateBuilder
  class CompanyScope
    def initialize(state, &block)
      @klass = CompanyStateField.const_set state, Class.new(CompanyStateField)
      instance_eval &block
    end
  end
end  


#app/models/state_builder/employee_scope.rb
class StateBuilder
  class EmployeeScope
    def initialize(state, &block)
      @klass = EmployeeStateField.const_set state, Class.new(EmployeeStateField)
      instance_eval &block
    end
  end
end  

We use const_set to define subclasses of CompanyStateField and EmployeeStateField named after our state. This creates the CompanyStateField::CA and EmployeeStateField::CA classes, each of which inherits from its respective parent.
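The same const_set technique can be tried outside Rails. In this sketch, BaseField is a hypothetical plain-Ruby stand-in for the model classes.

```ruby
# Plain-Ruby stand-in for a Rails model; BaseField is a hypothetical name.
class BaseField
  def self.describe
    "#{name} < #{superclass.name}"
  end
end

state = 'CA'
# Class.new(BaseField) builds an anonymous subclass; const_set registers it
# under the BaseField namespace and returns the new class.
klass = BaseField.const_set(state, Class.new(BaseField))

puts klass.describe              # prints BaseField::CA < BaseField
puts klass.equal?(BaseField::CA) # prints true
```

The constant name has to start with an uppercase letter, and Ruby prints a warning if the same constant is set a second time.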

Now we can focus on the last step: the blocks passed to each of our attributes (sos, edd, additional_withholding, etc.). They will be executed in the context of CompanyScope and EmployeeScope, but if we run our code now, we will get errors about calls to undefined methods.

We will use method_missing to handle these cases. For now, we can assume that any method called is the name of an attribute and that the block passed to it describes how we want to configure it. This gives us a "magical" way to define the necessary attributes and save them to the database.

Attention! Using method_missing in a way that never falls through to super can lead to unexpected behavior. Typos will be hard to track down, since they all end up in method_missing. When you build something on these principles, make sure there are cases in which method_missing calls super.
Translator's note: in general, it is best to minimize the use of method_missing because it can slow a program down considerably. Here this is not critical, since all of this code runs only when the application starts.
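One possible way to follow this advice is sketched below: only names from an explicit list are treated as attributes, and everything else goes to super. The GuardedScope class and its ALLOWED list are assumptions for this example; the article's scopes accept any name.

```ruby
# Hypothetical scope: only names listed in ALLOWED are treated as attributes.
class GuardedScope
  ALLOWED = %i[edd sos].freeze

  attr_reader :attributes

  def initialize(&block)
    @attributes = []
    instance_eval(&block)
  end

  def method_missing(name, *args, &block)
    # Anything that is not a known attribute called with a block goes to
    # super, so typos raise NoMethodError instead of creating attributes.
    return super unless ALLOWED.include?(name) && args.empty? && block

    @attributes << name
  end

  def respond_to_missing?(name, include_private = false)
    ALLOWED.include?(name) || super
  end
end

scope = GuardedScope.new do
  edd { }
  sos { }
end
p scope.attributes # prints [:edd, :sos]

begin
  GuardedScope.new { eddd { } } # typo in the attribute name
rescue NoMethodError => e
  puts "rejected: #{e.name}" # prints rejected: eddd
end
```

An explicit list trades some of the DSL's flexibility for early detection of typos; a looser guard could check only that a block was given and no arguments were passed.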


We define method_missing and pass its arguments to the last container we will create, AttributesScope. This container will call store_accessor and create validations based on the blocks we pass to it.

#app/models/state_builder/company_scope.rb
class StateBuilder
  class CompanyScope
    def initialize(state, &block)
      @klass = CompanyStateField.const_set state, Class.new(CompanyStateField)
      instance_eval &block
    end
    def method_missing(attribute, &block)
      AttributesScope.new(@klass, attribute, &block)
    end
  end
end  


#app/models/state_builder/employee_scope.rb
class StateBuilder
  class EmployeeScope
    def initialize(state, &block)
      @klass = EmployeeStateField.const_set state, Class.new(EmployeeStateField)
      instance_eval &block
    end
    def method_missing(attribute, &block)
      AttributesScope.new(@klass, attribute, &block)
    end
  end
end  

Now, every time we call a method inside the company block in app/states/ca.rb, the call ends up in the method_missing method we defined. Its first argument is the name of the called method, which is also the name of the attribute being defined. We create a new instance of AttributesScope, passing it the class to modify, the name of the attribute being defined, and the block that configures the attribute. In AttributesScope, we call store_accessor, which defines the getter and setter for the attribute and uses a serialized hash to store the data.
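To make the idea of hash-backed accessors concrete without Rails, here is a rough approximation in plain Ruby. It is not the Rails implementation; FieldRecord, hash_accessor and CaCompanyField are hypothetical names used only in this sketch.

```ruby
# Rough plain-Ruby approximation of hash-backed accessors.
class FieldRecord
  attr_reader :data

  def initialize
    @data = {} # stands in for the serialized hash stored in the table
  end

  # Defines a getter and a setter per key, both backed by the data hash.
  def self.hash_accessor(*keys)
    keys.each do |key|
      define_method(key) { data[key.to_s] }
      define_method("#{key}=") { |value| data[key.to_s] = value }
    end
  end
end

class CaCompanyField < FieldRecord
  hash_accessor :edd, :sos
end

field = CaCompanyField.new
field.edd = '123-4567-8'
puts field.edd # prints 123-4567-8
p field.data   # the hash now contains the "edd" key
```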

class StateBuilder
  class AttributesScope
    def initialize(klass, attribute, &block)
      klass.send(:store_accessor, :data, attribute)
      instance_eval &block
    end
  end
end  

We also need to define the methods that we call inside the attribute blocks (format, max, options) and turn them into validations. We do this by converting calls to these methods into the validation calls that Rails expects.

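class StateBuilder
  class AttributesScope
    def initialize(klass, attribute, &block)
      # Validation options collected from the block, keyed by validator name
      @validation_options = {}
      klass.send(:store_accessor, :data, attribute)
      instance_eval &block
      # validates takes a single options hash and needs at least one validator
      klass.send(:validates, attribute, @validation_options) unless @validation_options.empty?
    end
    private
    # format '...' becomes a format validation; anchors make the whole value match
    def format(regex)
      @validation_options[:format] = { with: Regexp.new("\\A(?:#{regex})\\z") }
    end
    # max 99 becomes a numericality validation for the range 0..99
    def max(value)
      @validation_options[:numericality] = {
        greater_than_or_equal_to: 0,
        less_than_or_equal_to: value
      }
    end
    # options [...] becomes an inclusion validation
    def options(values)
      @validation_options[:inclusion] = { in: values }
    end
  end
end  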

Our DSL is ready for battle. We have successfully defined the CompanyStateField::CA model, which stores and validates the EDD and SoS numbers, and the EmployeeStateField::CA model, which stores and validates withholding allowances, filing status, and additional withholding for employees. Although our DSL was created to automate fairly simple things, each of its components can easily be extended. We can add new hooks to the DSL, define more methods in the models, and develop it further on top of the functionality we have implemented.
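For readers who want to experiment without a Rails application, the whole chain can be imitated in plain Ruby. This is a simplified sketch, not the article's implementation: all Mini* names are hypothetical, and the rules are collected into hashes instead of becoming model validations.

```ruby
# Rails-free miniature of the builder; rules end up in nested hashes.
class MiniAttributeScope
  attr_reader :rules

  def initialize(&block)
    @rules = {}
    instance_eval(&block)
  end

  private

  def format(pattern)
    @rules[:format] = Regexp.new("\\A(?:#{pattern})\\z")
  end

  def max(value)
    @rules[:max] = value
  end

  def options(values)
    @rules[:options] = values
  end
end

class MiniGroupScope
  attr_reader :attributes

  def initialize(&block)
    @attributes = {}
    instance_eval(&block)
  end

  # Any name called with a block and no arguments is taken as an attribute.
  def method_missing(name, *args, &block)
    return super unless args.empty? && block

    @attributes[name] = MiniAttributeScope.new(&block).rules
  end
end

class MiniStateBuilder
  attr_reader :state, :groups

  def self.build(state, &block)
    raise ArgumentError, 'You need a block to build!' unless block

    new(state, &block)
  end

  def initialize(state, &block)
    @state = state
    @groups = {}
    instance_eval(&block)
  end

  def company(&block)
    @groups[:company] = MiniGroupScope.new(&block).attributes
  end

  def employee(&block)
    @groups[:employee] = MiniGroupScope.new(&block).attributes
  end
end

CA_CONFIG = MiniStateBuilder.build('CA') do
  company do
    edd { format '\d{3}-\d{4}-\d' }
  end
  employee do
    filing_status { options ['Single', 'Married', 'Head of Household'] }
    withholding_allowance { max 99 }
  end
end

puts CA_CONFIG.groups[:company][:edd][:format].match?('123-4567-8') # prints true
puts CA_CONFIG.groups[:employee][:withholding_allowance][:max]      # prints 99
```

Each scope runs its block with instance_eval, so every level of nesting in the DSL corresponds to one small class.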

Our implementation noticeably reduces repetition and boilerplate code in the backend, but it still requires each state to have its own client-side views. We have extended our internal tooling to cover the client side for new states, and if there is interest in the comments, I will write another post about how this works for us.

This article shows only part of how we use our own DSL as a tool for expansion. Such tools have proven tremendously useful in expanding our payroll service to the rest of the USA, and if you are interested in problems like these, we can work together!

Happy metaprogramming!
