EvoJ - a convenient framework for genetic algorithms
Hello colleagues!
Articles on the topic of genetic algorithms often appear here, and let me make my five cents.
For a couple of years now, I have been developing a Java framework for EvoJ dedicated to GA as a hobby. When I first started working with GA, the biggest inconvenience was the need to vectorize the variables that make up the solution, so in my framework I tried to make the vectorization of variables transparent for the programmer, putting all the dirty work on the shoulders of the framework. In addition, since GA is very well parallelizable, I tried to make the transition to multithreading no less easy.
To understand the basic idea of EvoJ, consider the following example. For simplicity, let’s say that we need to find the minimum of the function
So, first of all, we need to create a Java interface containing the variables we need. We have two variables: X and Y, create an interface containing getters for these variables.
Please note that setters are not required to be declared; EvoJ will be able to change variables without them. However, if you want to implement your own mutation strategy, you will have to declare setters - otherwise you yourself will not be able to change the variables.
Pay attention to the annotation
Another point worth paying attention to here is that all the parameters of all annotations in EvoJ are strings, this allows you to either specify the parameter value directly or specify the name property instead of a specific value, so that you do not hard-code the value of the annotation parameters in compile- time.
So, we have an interface with variables, let's write a fitness function. Fitness function in EvoJ is presented by the interface:
Here the parameter
It is recommended to implement this interface indirectly, through helper classes that take on some service functions: destruction of old solutions (if the maximum lifetime of the solution is specified), caching of the function value for solutions that were not eliminated at the previous iteration of the GA.
The fitness function for our case will look like this:
Everything is pretty obvious here, we just take and count our function using the variables from our first interface. Since we are looking for a minimum, and the class contract assumes that the best solutions should have a higher rating, then we return the value of the function times -1.
In addition, we filter out the left decisions (if NaN turned out) returning
Well, we have two main components, let's make it all work.
Let's break down the code above into lines:
This factory will give us many initial solutions, if you remember, our solution is represented by an interface, not a class, so we cannot create an instance on our own.
Here, a population of 200 random solutions (gene pool) is created. Pay attention to the last parameter - it should be here with the values of the properties, if you did not specify the annotation parameters explicitly.
Handler is the embodiment of a genetic algorithm, it is a combination of mutation, crossbreeding, selection and rating strategies. Of these, it is necessary to implement only the rating calculation on your own - for everything else, there are standard implementations that will be used if, instead of the corresponding parameter, you pass
This line performs 200 iterations of GA over our population. Since the task is quite primitive, 200 iterations is quite enough to find the minimum of our function. (Finding a solution for more complex tasks may require several hours and millions of iterations, fortunately, the selection of a solution can be accelerated using multithreading, but more on that later)
Here we get the best solution found. On an instance of this interface, you can freely call it
If the solution does not suit you, you can continue the iteration of the GA until the desired quality of the solution is achieved.
Thus, to solve the problem using EvoJ, you need:
Well, it was a warm-up, now let's see what the EvoJ really can.
Suppose we want to approximate a full-color image with a limited set of polygons. The solution to such a problem cannot be described by a single interface with simple variables, and this is where the full power of EvoJ is manifested. We can declare the following set of interfaces:
At the top of the interface is the whole picture, it consists of a single property - a list of polygons. Each polygon is a color plus a list of points. A point is two coordinates, a color is three components.
Of the new annotations, it is worth mentioning
Also note that the variables describing the color components have a fixed range of values (
Next, we need to implement a fitness function, as we did last time by expanding the helper class. Below, to save space, an incomplete class code is given (the original code is designed for multithreading and contains many non-obvious optimizations that make it poorly readable).
Here we draw a picture described by our interfaces, take its pixels and compare with the pixels of the original image. As a result, we return the accumulated deviation with a minus sign (more error - worse solution, less error - better solution).
Next, we write code similar to the example where we were looking for a minimum of the function.
There is one difference -
The multi-threaded code in EvoJ is written to minimize locks, which allows you to get an almost linear performance increase with an increase in the number of threads (for maximum performance, you need the number of individuals in the population to be a multiple of the number of threads, otherwise some threads will not have enough tasks).
Still worth paying attention to the line
It is necessary to complete pending executor threads.
So, that’s all I wanted to tell you about EvoJ. As a summary, I will give the main features of EvoJ:
If you are interested, you can add EvoJ to your Maven project as follows:
Those who are not friends with Maven can download EvoJ directly from the project website http://evoj-frmw.appspot.com/ . There is also a copy of this tutorial in English, as well as a detailed technical guide and the results of using EvoJ for various tasks (image approximation, training of neural networks).
Thanks for attention. I will be happy to answer your questions.
UPD: Below are examples of what is obtained by approximation
Articles on the topic of genetic algorithms often appear here, and let me make my five cents.
For a couple of years now, I have been developing a Java framework for EvoJ dedicated to GA as a hobby. When I first started working with GA, the biggest inconvenience was the need to vectorize the variables that make up the solution, so in my framework I tried to make the vectorization of variables transparent for the programmer, putting all the dirty work on the shoulders of the framework. In addition, since GA is very well parallelizable, I tried to make the transition to multithreading no less easy.
To understand the basic idea of EvoJ, consider the following example. For simplicity, let’s say that we need to find the minimum of the function
Z(x, y)=12 * x^2 + 8 * x + 9 * y^2.
Naturally, this problem can be more effectively solved without resorting to the GA, but first, let's look at it, and then we will analyze an example that reveals the full power of EvoJ. So, first of all, we need to create a Java interface containing the variables we need. We have two variables: X and Y, create an interface containing getters for these variables.
public interface Solution {
@Range(min = "-10", max = "10")
double getX();
@Range(min = "-10", max = "10")
double getY();
}
Please note that setters are not required to be declared; EvoJ will be able to change variables without them. However, if you want to implement your own mutation strategy, you will have to declare setters - otherwise you yourself will not be able to change the variables.
Pay attention to the annotation
@Range
, it sets the range of values that a variable can take. Variables are triggered by random values from a given range. However, as a result of mutations, they can potentially go beyond the specified range. This can be prevented by using the parameter strict=”true”
, which will not allow the variable to accept an invalid value, even if you try to set it using the setter method.Another point worth paying attention to here is that all the parameters of all annotations in EvoJ are strings, this allows you to either specify the parameter value directly or specify the name property instead of a specific value, so that you do not hard-code the value of the annotation parameters in compile- time.
So, we have an interface with variables, let's write a fitness function. Fitness function in EvoJ is presented by the interface:
public interface SolutionRating {
Comparable calcRating(T solution);
}
Here the parameter
T
is our interface with variables. The larger the return value (according to the contract Comparable
), the more adapted the decision is. You can return null
, it is considered the smallest value of fitness function. It is recommended to implement this interface indirectly, through helper classes that take on some service functions: destruction of old solutions (if the maximum lifetime of the solution is specified), caching of the function value for solutions that were not eliminated at the previous iteration of the GA.
The fitness function for our case will look like this:
public class Rating extends AbstractSimpleRating {
public static double calcFunction(Solution solution) {
double x = solution.getX();
double y = solution.getY();
return 12 * x * x + 8 * x + 9 * y * y;
}
@Override
public Comparable doCalcRating(Solution solution) {
double fn = calcFunction(solution);
if (Double.isNaN(fn)) {
return null;
} else {
return -fn;
}
}
}
Everything is pretty obvious here, we just take and count our function using the variables from our first interface. Since we are looking for a minimum, and the class contract assumes that the best solutions should have a higher rating, then we return the value of the function times -1.
In addition, we filter out the left decisions (if NaN turned out) returning
null
. Well, we have two main components, let's make it all work.
DefaultPoolFactory pf = new DefaultPoolFactory();
GenePool pool = pf.createPool(200, Solution.class, null);
DefaultHandler handler = new DefaultHandler(new Rating(), null, null, null);
handler.iterate(pool, 200);
Solution solution = pool.getBestSolution();
Let's break down the code above into lines:
DefaultPoolFactory pf = new DefaultPoolFactory();
This factory will give us many initial solutions, if you remember, our solution is represented by an interface, not a class, so we cannot create an instance on our own.
GenePool pool = pf.createPool(200, Solution.class, null);
Here, a population of 200 random solutions (gene pool) is created. Pay attention to the last parameter - it should be here with the values of the properties, if you did not specify the annotation parameters explicitly.
Map
DefaultHandler handler = new DefaultHandler(new Rating(), null, null, null);
Handler is the embodiment of a genetic algorithm, it is a combination of mutation, crossbreeding, selection and rating strategies. Of these, it is necessary to implement only the rating calculation on your own - for everything else, there are standard implementations that will be used if, instead of the corresponding parameter, you pass
null
.handler.iterate(pool, 200);
This line performs 200 iterations of GA over our population. Since the task is quite primitive, 200 iterations is quite enough to find the minimum of our function. (Finding a solution for more complex tasks may require several hours and millions of iterations, fortunately, the selection of a solution can be accelerated using multithreading, but more on that later)
Solution solution = pool.getBestSolution();
Here we get the best solution found. On an instance of this interface, you can freely call it
getX()/getY()
to get the values of the desired variables. If the solution does not suit you, you can continue the iteration of the GA until the desired quality of the solution is achieved.
Thus, to solve the problem using EvoJ, you need:
- Create an interface with a variable
- Implement a fitness feature interface
- Create a population of solutions and implement the required number of GA iterations over them using the code above
Well, it was a warm-up, now let's see what the EvoJ really can.
Suppose we want to approximate a full-color image with a limited set of polygons. The solution to such a problem cannot be described by a single interface with simple variables, and this is where the full power of EvoJ is manifested. We can declare the following set of interfaces:
public interface PolyPicture {
@ListParams(length = "50")
List getPolygons();
}
public interface Polygon {
Colour getColor();
@ListParams(length = "8")
List getPoints();
}
public interface Point {
@Range(min = "-5", max = "325", strict = "true")
float getX();
@Range(min = "-5", max = "245", strict = "true")
float getY();
void setX(float x);
void setY(float y);
}
public interface Colour {
@MutationRange("0.2")
@Range(min = "0", max = "1", strict = "true")
float getRed();
@MutationRange("0.2")
@Range(min = "0", max = "1", strict = "true")
float getGreen();
@MutationRange("0.2")
@Range(min = "0", max = "1", strict = "true")
float getBlue();
void setRed(float red);
void setGreen(float green);
void setBlue(float blue);
}
At the top of the interface is the whole picture, it consists of a single property - a list of polygons. Each polygon is a color plus a list of points. A point is two coordinates, a color is three components.
Of the new annotations, it is worth mentioning
@ListParams
- specifying the length of the list and some other properties, and @MutationRange
- specifying the maximum radius of the mutation of the variable, i.e. how much the variable can change in one act of mutation. Also note that the variables describing the color components have a fixed range of values (
strict=”true”
).Next, we need to implement a fitness function, as we did last time by expanding the helper class. Below, to save space, an incomplete class code is given (the original code is designed for multithreading and contains many non-obvious optimizations that make it poorly readable).
public class PictureRating extends AbstractSimpleRating {
public PictureRating(BufferedImage originalImage){....}
@Override
protected Comparable doCalcRating(PolyPicture picture) {
BufferedImage testImage = drawPicture(picture);
int[] originalPix = getPixels(originalImage);
int[] testPix = getPixels(testImage);
return -calcError(originalPix, testPix);
}
}
Here we draw a picture described by our interfaces, take its pixels and compare with the pixels of the original image. As a result, we return the accumulated deviation with a minus sign (more error - worse solution, less error - better solution).
Next, we write code similar to the example where we were looking for a minimum of the function.
DefaultPoolFactory factory = new DefaultPoolFactory();
GenePool pool = factory.createPool(40, PolyPicture.class, null);
DemoRating rating = new DemoRating(PICTURE_TO_APPROXIMATE);
MultithreadedHandler handler = new MultithreadedHandler(2, rating, null, null, null);
handler.iterate(pool, 1000);
handler.shutdown();
There is one difference -
MultithreadedHandler
. It differs from a simple handler in that iterates the GA over a population of solutions in several threads (the number of threads is specified by the first parameter of the constructor). Thus, if your processor has several cores, you can really speed up the search for a solution just by increasing the number of threads. For example, a solution of acceptable quality for the considered problem of image approximation on Core i7 with HyperThreading enabled in 8 threads is found in 15-20 minutes. The multi-threaded code in EvoJ is written to minimize locks, which allows you to get an almost linear performance increase with an increase in the number of threads (for maximum performance, you need the number of individuals in the population to be a multiple of the number of threads, otherwise some threads will not have enough tasks).
Still worth paying attention to the line
handler.shutdown();
It is necessary to complete pending executor threads.
So, that’s all I wanted to tell you about EvoJ. As a summary, I will give the main features of EvoJ:
- no need to independently vectorize variables, you can immediately write code in terms of the domain domain
- standard implementation of the main GA strategies (breeding, crossbreeding, mutation)
- declarative mutation control (annotations)
- multithreading out of the box
- small size (120 kb) and no external dependencies (yes, you only need one jar file)
- extensibility at any point (you can write your own implementation of any GA strategy - selection, crossing and mutation)
If you are interested, you can add EvoJ to your Maven project as follows:
- First, you need to add a link to the repository (I haven’t laid out the central one yet)
EvojHome EvoJ http://evoj-frmw.appspot.com/repository
- then directly addicted
net.sourceforge.evoj evoj 2.1
Those who are not friends with Maven can download EvoJ directly from the project website http://evoj-frmw.appspot.com/ . There is also a copy of this tutorial in English, as well as a detailed technical guide and the results of using EvoJ for various tasks (image approximation, training of neural networks).
Thanks for attention. I will be happy to answer your questions.
UPD: Below are examples of what is obtained by approximation
Source image | Matched image |
---|---|