Dynamically creating an Apache NiFi cluster

    Apache NiFi is a convenient platform for processing diverse data in real time, with the ability to build the processing flows visually. This article describes how an Apache NiFi cluster can be created.

    Fig. 1. Apache NiFi GUI.

    Features:

    • Visual creation and management of directed graphs of processors.
    • Asynchronous operation, which provides high throughput and natural buffering even when the flow rate and the processing rate diverge.
    • Support for building cohesive, loosely coupled components that can be reused in other contexts.
    • Convenient error handling, which makes it easier to locate problem areas.
    • The sources of the data, and the way the data flows and is processed, are visible and easy to trace.

    More details here

    Configure Apache NiFi Cluster


    An Apache NiFi cluster can be started with either the embedded or an external Apache ZooKeeper; the choice is made in conf/nifi.properties. We will use the embedded one.
    Fig. 2. Apache NiFi cluster diagram.

    To configure the Apache NiFi cluster, we need at least 3 nodes in order to have a quorum. It is generally recommended to run ZooKeeper on 3 or 5 nodes. Running on fewer than 3 nodes provides less durability in the face of failure. Running on more than 5 nodes usually produces more network traffic than necessary. On all three instances, the general cluster properties can be left at their defaults. Note, however, that if these parameters are changed, they must be identical on every node of the future cluster.
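
    The sizing advice above follows from majority-quorum arithmetic: an ensemble stays available only while more than half of its servers are up. A minimal sketch of that arithmetic (the function names are illustrative and not part of NiFi or ZooKeeper):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be positive")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print size, quorum and tolerated failures for small ensembles.
    for n in (1, 2, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

    Two servers tolerate no failure at all, and four tolerate no more than three do, which is why odd ensemble sizes such as 3 or 5 are the usual choice.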

    For a minimal configuration of the Apache NiFi cluster, perform the following operations on each node of the future cluster:

    1. set the necessary parameters in nifi.properties
    2. specify the cluster servers in zookeeper.properties
    3. set the ZooKeeper id of the local node
    4. specify the connection string to the ZooKeeper cluster in state-management.xml

    We describe each step in more detail.
    1. Set the following in nifi.properties:

    nifi.cluster.is.node=true
    nifi.cluster.node.address=
    nifi.cluster.node.protocol.port=3030
    nifi.state.management.embedded.zookeeper.start=true
    nifi.remote.input.host=
    nifi.web.http.host=
    nifi.zookeeper.connect.string=

    The connect string (nifi.zookeeper.connect.string) is a comma-separated list of the ZooKeeper servers, for example: nifi01:2181,nifi02:2181,nifi03:2181

    2. Register the cluster servers in zookeeper.properties:

    server.1=:2888:3888
    server.2=:2888:3888
    server.3=:2888:3888
    initLimit=5
    syncLimit=2

    3. Set the id in the ./state/zookeeper/myid file if the local node is part of the ZooKeeper cluster.

    4. Register the connection string to the cluster in the state-management.xml file.
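
    The per-node values from the four steps can be derived from a single list of host names. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the host-valued properties are filled with the node's own host name and that nifi01, nifi02 and nifi03 are the cluster hosts.

```python
def connect_string(hosts, client_port=2181):
    """Comma-separated ZooKeeper connect string (steps 1 and 4)."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


def zookeeper_server_lines(hosts, peer_port=2888, election_port=3888):
    """server.N lines for zookeeper.properties (step 2)."""
    return [
        f"server.{index}={host}:{peer_port}:{election_port}"
        for index, host in enumerate(hosts, start=1)
    ]


def myid_for(hosts, local_host):
    """Content of ./state/zookeeper/myid for the local node (step 3)."""
    return hosts.index(local_host) + 1


def nifi_overrides(hosts, local_host, protocol_port=3030):
    """nifi.properties values for one node (step 1).

    Assumption: every host-valued property is the node's own host name.
    """
    return {
        "nifi.cluster.is.node": "true",
        "nifi.cluster.node.address": local_host,
        "nifi.cluster.node.protocol.port": str(protocol_port),
        "nifi.state.management.embedded.zookeeper.start": "true",
        "nifi.remote.input.host": local_host,
        "nifi.web.http.host": local_host,
        "nifi.zookeeper.connect.string": connect_string(hosts),
    }


if __name__ == "__main__":
    hosts = ["nifi01", "nifi02", "nifi03"]
    for key, value in nifi_overrides(hosts, "nifi02").items():
        print(f"{key}={value}")
    print("\n".join(zookeeper_server_lines(hosts)))
    print("myid:", myid_for(hosts, "nifi02"))
```

    Deriving everything from one host list keeps the server.N index and the myid value of each node in agreement, which ZooKeeper requires.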





    To start Apache NiFi on each node, just run the command:

    bin/nifi.sh start

    The order in which Apache NiFi is launched on the nodes does not matter. You can monitor the cluster startup in the logs/nifi-app.log file.

    Starting a local cluster in a virtual environment


    To experiment with a cluster, it is useful to be able to launch an Apache NiFi cluster locally in a virtual environment. HashiCorp Vagrant and Oracle VM VirtualBox were used for this. The vagrant-vbguest and vagrant-hostmanager plugins must be installed. To speed up and simplify startup, Vagrant provisioning scripts were written that start the Apache NiFi cluster in a virtual environment with one command:

    vagrant up

    Five to seven minutes after startup, the user interface becomes available in the browser at localhost:8080/. You can also check by opening VirtualBox: three virtual machines, nifi01, nifi02 and nifi03, should be running.

    The source code of the Vagrant provisioning scripts for starting a NiFi cluster is available on GitHub.

    Dynamic cluster formation


    In some situations, a newly connected device needs to locate the cluster on the network and join it. For this purpose, an "agent" program was written that searches for devices on the network and, when it finds a cluster (it checks through the Apache NiFi REST API), connects to it. The source code of this program is available on GitHub.
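
    The search described above can be sketched as a scan of a subnet that asks each address for its cluster summary over the NiFi REST API. This is not the agent's actual code, and the source does not say which endpoint the agent uses. The sketch assumes unsecured HTTP on port 8080 and the /nifi-api/flow/cluster/summary endpoint of NiFi 1.x; the function names and the example network are hypothetical.

```python
import http.client
import ipaddress
import json
import urllib.request

# Assumed endpoint: cluster summary of a NiFi 1.x instance.
SUMMARY_PATH = "/nifi-api/flow/cluster/summary"


def candidate_urls(network, port=8080):
    """Summary URLs for every usable host address in the network."""
    net = ipaddress.ip_network(network)
    return [f"http://{host}:{port}{SUMMARY_PATH}" for host in net.hosts()]


def is_clustered(body):
    """True if the response body reports a clustered NiFi instance."""
    try:
        return bool(json.loads(body)["clusterSummary"]["clustered"])
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like a cluster summary.
        return False


def find_cluster(network, port=8080, timeout=0.5):
    """Return the first URL that answers as a clustered node, else None."""
    for url in candidate_urls(network, port):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if is_clustered(response.read()):
                    return url
        except (OSError, http.client.HTTPException):
            continue  # host is down, refused, timed out, or is not NiFi
    return None


if __name__ == "__main__":
    # Hypothetical private network; adjust to the local setup.
    print(find_cluster("192.168.56.0/28"))
```

    The scan is sequential, so the worst case takes roughly the timeout multiplied by the number of addresses; a real agent would probably probe in parallel.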

    Agent startup example:

    java -cp cluster-joiner-0.0.1-jar-with-dependencies.jar ru.itis.suc.NodeAgent /home/user/nifi/nifi-1.2.0 8085

    Here the arguments are the path to NiFi and the port on which the agent listens when a new cluster is being created.

    After starting, the agent searches the local network for a cluster and connects to it. If no cluster is found, it attempts to create one, provided that 2 more devices are ready to become part of the new cluster.
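
    The decision rule in the previous paragraph can be written down in a few lines. This is only a sketch of the rule as described, not the agent's implementation; the names Action, MIN_CLUSTER_SIZE and next_action are hypothetical.

```python
from enum import Enum


class Action(Enum):
    JOIN = "join the existing cluster"
    CREATE = "create a new cluster"
    WAIT = "keep searching"


# Three nodes are needed for a quorum: this node plus two ready peers.
MIN_CLUSTER_SIZE = 3


def next_action(cluster_found: bool, ready_peers: int) -> Action:
    """Choose what the agent does after one search round."""
    if ready_peers < 0:
        raise ValueError("ready_peers must not be negative")
    if cluster_found:
        return Action.JOIN
    if ready_peers + 1 >= MIN_CLUSTER_SIZE:
        return Action.CREATE
    return Action.WAIT


if __name__ == "__main__":
    print(next_action(cluster_found=False, ready_peers=2).value)
```

    Joining takes priority over creating, so a device never starts a second cluster while an existing one is reachable.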


    Fig. 3. Apache NiFi GUI running in cluster mode.


    Fig. 4. List of cluster nodes.

    Conclusion


    This work was an experiment to verify that an Apache NiFi cluster can be created automatically on a local network.

    Admittedly, the search and connection algorithms are primitive, but the goal of the work was only to confirm that this is feasible.
