Distributed Data Warehouse in Data Lake Concept: Cluster Administration

    Administering a Cloudera cluster is a broad topic that cannot be covered in a single article. This post focuses on instructions for the most common tasks involving the cluster and the services installed on it. For a deeper dive, I recommend the official documentation and the forum, where you can find information on almost any issue.




    Cluster startup


    On the Cloudera Manager home page, click on the button with the arrow to the right of the cluster name and select Start:




    Restarting the cluster


    Follow the same steps as for starting the cluster, but select Restart.


    Stopping the cluster


    Follow the same steps as for starting the cluster, but select Stop.
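
    The same start, restart and stop actions can also be scripted through the Cloudera Manager REST API. The sketch below only builds the request and does not send it. The host name, port, API version (`v19`) and cluster name are placeholders, and the endpoint path follows my understanding of the API, so check it against the API reference for your Cloudera Manager version.

```python
import urllib.request
from urllib.parse import quote

# Cluster-wide commands covered in the sections above.
ALLOWED_COMMANDS = ("start", "restart", "stop")


def cluster_command_request(base_url, api_version, cluster, command):
    """Build (but do not send) a POST request for a cluster-wide command."""
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    # Cluster names may contain spaces, so percent-encode the path segment.
    url = (f"{base_url}/api/{api_version}"
           f"/clusters/{quote(cluster, safe='')}/commands/{command}")
    return urllib.request.Request(url, data=b"", method="POST")


# Placeholder host, API version and cluster name (assumptions for illustration).
req = cluster_command_request(
    "http://cm-host.example.com:7180", "v19", "Cluster 1", "restart")
print(req.get_method(), req.full_url)
```

    To actually run the command, the request would have to be sent with `urllib.request.urlopen` together with valid Cloudera Manager credentials.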


    Starting service roles


    On the Cloudera Manager home page, click the Clusters button and, in the required cluster, select the service whose role you need to start:




    Go to the Instances tab of this service:




    Each role's status is shown to the right of its name; stopped roles have the status Stopped. In the table, select the checkbox of the role that needs to be started:




    Click on the Actions for Selected button and select Start:




    Press the Start button to confirm the launch:




    Restarting service roles


    Follow the same steps as for starting a role, but select Restart after clicking the Actions for Selected button.


    Stopping service roles


    Follow the same steps as for starting a role, but select Stop after clicking the Actions for Selected button.
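
    Role actions have a REST counterpart as well. As far as I know, the API accepts a list of role names under an `items` key for role commands. The sketch below builds such a request without sending it. The service name `hdfs`, the role name and the API version are hypothetical placeholders.

```python
import json
import urllib.request
from urllib.parse import quote


def role_command_request(base_url, api_version, cluster, service,
                         command, role_names):
    """Build (but do not send) a start/restart/stop request for roles."""
    if command not in ("start", "restart", "stop"):
        raise ValueError(f"unsupported command: {command}")
    url = (f"{base_url}/api/{api_version}"
           f"/clusters/{quote(cluster, safe='')}"
           f"/services/{quote(service, safe='')}/roleCommands/{command}")
    # The selected roles are passed as a JSON list of role names.
    body = json.dumps({"items": list(role_names)}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})


# Hypothetical role name; real names are listed on the Instances tab.
req = role_command_request(
    "http://cm-host.example.com:7180", "v19", "Cluster 1", "hdfs",
    "start", ["hdfs-DATANODE-1"])
print(req.full_url)
print(req.data.decode("utf-8"))
```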


    Adding a role


    On the Cloudera Manager home page, click the Clusters button and, in the required cluster, select the service to which you need to add a role:




    Go to the Instances tab of this service and click Add Role Instances:




    For each role that you want to add, choose the hosts on which it should be installed:




    Confirm the installation of selected roles on the specified hosts:
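
    For repeatable setups, the role-to-host assignment can be described as data. The sketch below builds a request body with one role entry per host, which is how I understand the role creation endpoint of the Cloudera Manager REST API to work. The field names (`type`, `hostRef`, `hostId`), the role type and the host IDs are assumptions to verify against the API reference for your version.

```python
import json
import urllib.request
from urllib.parse import quote


def add_roles_request(base_url, api_version, cluster, service,
                      role_type, host_ids):
    """Build (but do not send) a request that adds one role per host."""
    url = (f"{base_url}/api/{api_version}"
           f"/clusters/{quote(cluster, safe='')}"
           f"/services/{quote(service, safe='')}/roles")
    # One entry per host: the same role type installed on each of them.
    items = [{"type": role_type, "hostRef": {"hostId": host_id}}
             for host_id in host_ids]
    body = json.dumps({"items": items}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})


# Hypothetical role type and host IDs, for illustration only.
req = add_roles_request(
    "http://cm-host.example.com:7180", "v19", "Cluster 1", "hdfs",
    "DATANODE", ["host-id-1", "host-id-2"])
print(req.data.decode("utf-8"))
```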




    Deleting a role


    On the Cloudera Manager home page, click the Clusters button and, in the desired cluster, select the service whose role you want to delete:




    Go to the Instances tab of this service:




    Select the checkboxes of the roles that need to be removed (stop them first):




    Click the Actions for Selected button and select Delete:




    Confirm the deletion by pressing the Delete button:




    Adding a service


    Adding a service was already described in the section “Installing additional parcels”, so we will not go over the process in detail here.


    Deleting a service


    On the Cloudera Manager home page, click on the Clusters button and select the service to be deleted in the desired cluster:




    Go to the Instances tab of this service:




    Select the checkboxes of the active roles:




    Click the Actions for Selected button and select Stop:




    Confirm the stop by pressing the Stop button:




    Go to the Cloudera Manager home page, click on the button with the arrow to the right of the name of the service that needs to be deleted, and select Delete:




    Confirm the deletion by pressing the Delete button:
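
    The stop-then-delete order described above matters for automation too. The sketch below builds the two requests in that order without sending them, assuming the service-level `commands/stop` endpoint and a `DELETE` on the service resource. The service name `hbase` and the API version are placeholders.

```python
import urllib.request
from urllib.parse import quote


def service_removal_requests(base_url, api_version, cluster, service):
    """Return the requests for removing a service, in the order to send them."""
    service_url = (f"{base_url}/api/{api_version}"
                   f"/clusters/{quote(cluster, safe='')}"
                   f"/services/{quote(service, safe='')}")
    # Step 1: stop the service. The command runs asynchronously, so a real
    # script must wait for it to finish before sending the second request.
    stop = urllib.request.Request(
        f"{service_url}/commands/stop", data=b"", method="POST")
    # Step 2: delete the stopped service.
    delete = urllib.request.Request(service_url, method="DELETE")
    return [stop, delete]


for req in service_removal_requests(
        "http://cm-host.example.com:7180", "v19", "Cluster 1", "hbase"):
    print(req.get_method(), req.full_url)
```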




    Redeploying services after changing configuration files


    After you modify the configuration of a service, the service must be redeployed. Cloudera Manager indicates this with a file icon with an arrow to the right of the affected service. Click it:




    In the lower right corner, click Restart Stale Services:




    Confirm the restart by clicking Restart Now in the lower right corner. If there is no need to deploy the client configuration, clear the corresponding checkbox on this page:




    The restart page displays the status of the services being restarted. If a configuration is incorrect, click the arrow to the right of the task to see the error details. After the restart is complete, click Finish:
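
    The two choices made in this dialog (restart only stale services, and optionally redeploy the client configuration) map onto arguments of the cluster restart command in the REST API, as far as I can tell. The sketch below builds such a request without sending it. The argument names `restartOnlyStaleServices` and `redeployClientConfiguration` should be checked against the API reference for your Cloudera Manager version.

```python
import json
import urllib.request
from urllib.parse import quote


def restart_stale_request(base_url, api_version, cluster,
                          redeploy_client_config=True):
    """Build (but do not send) a restart limited to stale services."""
    url = (f"{base_url}/api/{api_version}"
           f"/clusters/{quote(cluster, safe='')}/commands/restart")
    body = {
        # Restart only services whose configuration is out of date.
        "restartOnlyStaleServices": True,
        # Mirrors the client configuration checkbox in the UI dialog.
        "redeployClientConfiguration": redeploy_client_config,
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="POST",
        headers={"Content-Type": "application/json"})


req = restart_stale_request(
    "http://cm-host.example.com:7180", "v19", "Cluster 1",
    redeploy_client_config=False)
print(req.data.decode("utf-8"))
```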




    Setting up monitoring tools


    When you add hosts to a cluster, Cloudera Manager installs agents on them that collect the system metrics of these machines. Charts for all collected metrics are available on the Charts Library tab in the All Hosts \ Hostname section. Cloudera Manager also has a flexible mechanism for visualizing metrics based on SQL-like queries and filters, which lets you quickly assemble a set of monitors on the home screen that gives a fairly complete picture of the system. Let us look at these mechanisms using the example of adding a chart of one system metric to the home page.


    On the Cloudera Manager home page, click on the Hosts button and select All Hosts:




    Select the server whose metrics you want to add to the dashboard:




    Select one of the charts, click the gear button in its upper right corner and select Add to Dashboard (alternatively, go to the Charts Library tab and select the desired chart from the full catalog):




    Specify the chart name (you can keep the default), select the dashboard on which to place it (select Home Page to place it on the Cloudera Manager home page) and click Save Chart:




    After that, the selected chart appears on the corresponding dashboard:




    If necessary, the added chart can be changed by clicking on the button with the gear in its upper right corner and selecting Open in Chart Builder.
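
    The query-based charts mentioned above are defined in Cloudera Manager's SQL-like time-series query language, and the same queries can be sent to the `timeseries` endpoint of the REST API. The sketch below assembles a query for one host metric and the matching URL. The metric name `cpu_user_rate`, the `hostname` filter, the host names and the API version are illustrative assumptions; use Chart Builder to check which metrics and attributes exist in your installation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def host_metric_tsquery(metric, hostname):
    """Compose a time-series query for one metric on one host."""
    return f'select {metric} where hostname = "{hostname}"'


def timeseries_url(base_url, api_version, tsquery):
    """Build the URL that passes the query to the timeseries endpoint."""
    return (f"{base_url}/api/{api_version}/timeseries?"
            f"{urlencode({'query': tsquery})}")


# Hypothetical metric and host name, for illustration only.
query = host_metric_tsquery("cpu_user_rate", "node1.example.com")
url = timeseries_url("http://cm-host.example.com:7180", "v19", query)
print(query)
print(url)
```

    The query string itself can also be pasted into Chart Builder to preview the chart before adding it to a dashboard.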


    Conclusion


    After monitoring is set up, the Cloudera cluster is ready for operation: you can run data loading jobs, transform the data and connect Data Mining tools. Although the final goals are still a long way off, this can be considered the starting point.


    As a result of this project, all of the goals were achieved: the routine tasks of the credit risk calculation department were automated, and the data scientists got high-quality tools for collaboration. Along the way there were quite a few nuances and difficulties, which I will be happy to share in the following articles. They will focus on building continuous integration to speed up development, as well as on installing and configuring Data Mining tools.


    In conclusion, I want to say that working with the stack of applications built around Apache Hadoop is not always easy, but it is very interesting. These technologies open up many opportunities and have already gathered a fairly large community that is always ready to help in difficult times. A little practice and you will succeed.


    P.S. In the next article I will explain how to organize continuous integration effectively for projects developed on CDH. See you soon!


    Links to previous articles:
    Distributed data storage in the Data Lake concept: where to start
    Distributed data storage in the Data Lake concept: installation of CDH

