Dynamically creating an Apache NiFi cluster
Apache NiFi is a convenient platform for processing many kinds of data in real time, with the ability to build the processing flows visually. The purpose of this article is to describe the options for creating an Apache NiFi cluster.
Fig. 1. GUI Apache NiFi.
Features:
- Visual creation and management of directed graphs of processors.
- Asynchronous by design, which provides high throughput and natural buffering even when flow and processing rates diverge.
- Enables the creation of cohesive and loosely coupled components that can then be reused in other contexts.
- Convenient error handling, which makes it easier to work with flows and to find problem areas.
- The sources from which data is received, as well as how it flows and is processed, are visible and easily tracked.
Configure Apache NiFi Cluster
An Apache NiFi cluster can be started with either the embedded Apache ZooKeeper or an external one; the choice is made in conf/nifi.properties. We will use the embedded one.
Fig. 2. Apache NiFi Cluster Diagram
To configure the Apache NiFi cluster, we need at least 3 nodes in order to provide a quorum. It is generally recommended that you run ZooKeeper on 3 or 5 nodes. Running on fewer than 3 nodes provides less durability in the face of failure. Running on more than 5 nodes usually produces more network traffic than necessary. For all three instances, the general cluster properties can be left at their default settings. Note, however, that if these parameters are changed, they must be the same on every node of the future cluster.
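The arithmetic behind the 3-node minimum can be sketched as follows. ZooKeeper stays available only while a strict majority of its servers can reach each other. The function names are illustrative and not part of NiFi or ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print the quorum and the number of tolerated failures for small ensembles.
    for n in (1, 2, 3, 4, 5):
        print(f"servers={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")
```

With 2 servers no failure is tolerated, so 2 nodes are no better than 1. An even ensemble of 4 tolerates the same single failure as 3, which is why odd sizes such as 3 or 5 are the usual recommendation.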
For a minimal configuration of the Apache NiFi cluster, perform the following operations on each node of the future cluster:
- set the necessary parameters in nifi.properties
- list the cluster servers in zookeeper.properties
- set the ZooKeeper id of the local node
- specify the connection string to the ZooKeeper cluster in state-management.xml
Each step is described in more detail below.
1. Set the following in nifi.properties. The values left empty here are specific to each node (its host name and the ZooKeeper connect string):
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=3030
nifi.state.management.embedded.zookeeper.start=true
nifi.remote.input.host=
nifi.web.http.host=
nifi.zookeeper.connect.string=
The connect string is a comma-separated list of the ZooKeeper servers. For example: nifi01:2181,nifi02:2181,nifi03:2181
2. In zookeeper.properties, list the cluster servers, putting the host name of each server before :2888:3888:
server.1=:2888:3888
server.2=:2888:3888
server.3=:2888:3888
initLimit=5
syncLimit=2
3. Set the id in the ./state/zookeeper/myid file if the local node is part of the ZooKeeper cluster.
4. In the state-management.xml file, specify the connection string to the ZooKeeper cluster.
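The host-dependent values in steps 1 to 3 all follow from one ordered list of host names, so they can be generated rather than typed by hand. The following is a minimal sketch, assuming the host names nifi01 to nifi03 used elsewhere in this article and the default ports shown above. The helper names are hypothetical, and the sketch only builds strings; it does not write any NiFi files.

```python
def zookeeper_server_lines(hosts, peer_port=2888, election_port=3888):
    """Lines for zookeeper.properties: server.N=<host>:2888:3888, with N starting at 1."""
    return [
        f"server.{index}={host}:{peer_port}:{election_port}"
        for index, host in enumerate(hosts, start=1)
    ]


def connect_string(hosts, client_port=2181):
    """Comma-separated host:port list for the ZooKeeper connect string."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


def myid_for(hosts, local_host):
    """Content of ./state/zookeeper/myid: must equal the N of this host's server.N line."""
    return hosts.index(local_host) + 1


if __name__ == "__main__":
    hosts = ["nifi01", "nifi02", "nifi03"]  # assumed host names
    print("\n".join(zookeeper_server_lines(hosts)))
    print(connect_string(hosts))
    print(myid_for(hosts, "nifi02"))
```

Keeping the host list in the same order on every node matters: the number in myid has to match the server.N entry for that node in every copy of zookeeper.properties.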
To start Apache NiFi on each node, just run the command:
bin/nifi.sh start
The order in which Apache NiFi is launched on the nodes does not matter. You can monitor the cluster startup process in the logs/nifi-app.log file.
Starting a local cluster in a virtual environment
To learn how to work with a cluster, it is useful to be able to launch an Apache NiFi cluster locally in a virtual environment. HashiCorp Vagrant and Oracle VM VirtualBox were used for this. You must install the vagrant-vbguest and vagrant-hostmanager plugins. To speed up and simplify the startup process, Vagrant provisioning scripts were written that start the Apache NiFi cluster in a virtual environment with one command:
vagrant up
Within five to seven minutes after starting, the user interface will be available in the browser at localhost:8080/. You can also check by opening VirtualBox: you should see three running virtual machines, nifi01, nifi02 and nifi03.
The source code of the Vagrant provisioning scripts for starting a NiFi cluster is available on GitHub.
Dynamic cluster formation
In some situations, a newly connected device needs to locate the cluster on the network and join it. For this purpose, an “agent” program was written that searches for devices on the network and, when it finds a cluster (it checks through the Apache NiFi REST API), connects to it. The source code of this program is available on GitHub.
Agent startup example:
java -cp cluster-joiner-0.0.1-jar-with-dependencies.jar ru.itis.suc.NodeAgent /home/user/nifi/nifi-1.2.0 8085
where the arguments are the path to NiFi and the port that the agent will listen on when creating a new cluster.
After starting, the agent searches the local network for a cluster and connects to it. If no cluster is found, it attempts to create one, provided that at least 2 more devices are ready to become part of the new cluster.
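The join-or-create rule described above can be written down as a small decision function. This is only a sketch of the rule as stated in the text, not the agent's actual code (the agent itself is a Java program). The names are hypothetical, and the behaviour when fewer than 2 peers are ready is an assumption, since the text does not say what the agent does in that case.

```python
from enum import Enum


class Action(Enum):
    JOIN = "join the existing cluster"
    CREATE = "create a new cluster"
    WAIT = "keep searching"  # assumption: not specified in the article


# From the text: a new cluster needs 2 more devices besides the local one.
MIN_EXTRA_PEERS = 2


def decide(cluster_found: bool, ready_peers: int) -> Action:
    """Choose the next step from the result of a network scan."""
    if cluster_found:
        return Action.JOIN
    if ready_peers >= MIN_EXTRA_PEERS:
        return Action.CREATE
    return Action.WAIT


if __name__ == "__main__":
    # No cluster on the network, two other agents are ready: a cluster is created.
    print(decide(cluster_found=False, ready_peers=2).value)
```

The threshold of 2 extra peers gives 3 nodes in total, which matches the minimum needed for a ZooKeeper quorum described earlier.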
Fig. 3. Apache NiFi GUI running in the cluster.
Fig. 4. List of cluster nodes.
Conclusion
This work was an experiment to verify that an Apache NiFi cluster can be created automatically on a local network.
The algorithms used for searching and connecting are admittedly primitive, but the purpose of the work was only to check that this is possible.