Best practices CI / CD with Kubernetes and GitLab (review and video report)



    On November 7, at the HighLoad ++ 2017 conference , in the DevOps and Operations section, a report was presented on “Best CI / CD Practices with Kubernetes and GitLab”. In it, we share practical experience in solving problems that arise when building an effective CI / CD process based on these Open Source solutions.

    By tradition, we are pleased to present a video with a report (about an hour, much more informative than the article) and the main squeeze in text form.

    CI / CD and key requirements


    By CI / CD (Continuous Integration, Continuous Delivery, Continuous Delivery) we understand all the stages of code delivery from the Git repository to production and its subsequent maintenance right up to the "removal" from production. Existing interpretations of the terms CI / CD allow a different attitude towards this last stage (production operation), but our experience says that its exclusion from CI / CD leads to numerous problems.

    Before describing specific practices, we summarized the main factors that affect the complexity of CI / CD:

    1. How is the main process built ? Possible options: one or more environments, dynamic environments, several distributed sites.
    2. How is testing done ? Running a code analyzer, unit tests without an environment (that is, without runtime dependencies), functional / integration / component tests in an environment, tests in a “complete” environment for microservices (end-to-end, regressions).
    3. What kind of separation of rights is required? Simple (only write code / roll out for certain users), different rights to the environment, multi-stage approval (team lead allows roll-out to stage, and QA to roll-out to production), a collegiate solution.
    4. What is the architecture of the application ? Stateless service, stateful application, multicomponent application, microservice architecture.

    All of these options are summarized in a general graph, which illustrates what tools you can close these or other needs:



    There are others - more general (but no less important!) - CI / CD requirements:


    Finally, as a company, introducing and maintaining system for CI / CD, there are additional requirements for the products used:

    • they should be Open Source (this is our “outlook on life”, and the practical side of the issue: the availability of code, the absence of restrictions on application);
    • Multi-scale ” (products should work equally effectively in companies with one developer and fifty);
    • interoperability (compatibility with each other, independence from the equipment supplier, cloud provider, etc.);
    • simplicity of operation ;
    • focus on the future (we must be sure not only that the products are suitable now, but also have development prospects).

    CI / CD Solutions


    A stack of products that satisfy all the requirements, we have the following:

    • Gitlab
    • Docker
    • Kubernetes;
    • Helm (for package management in Kubernetes);
    • dapp (our Open Source utility for simplifying / improving build and deployment processes). Updated August 13, 2019: the dapp project has now been renamed werf , its code has been rewritten to Go, and the documentation has been significantly improved.

    And the general CI / CD cycle from Git to operation using the above tools looks like this:



    Practice


    No. 1. Docker image composition


    The image should contain everything that is required for the application to work:



    No. 2. One look for everything


    A once-assembled Docker image should be used everywhere. Otherwise, not the same thing that will be pumped to production (reassembled at another point in time) may get to the QA for verification.

    At home we collect sp e mennye images of git branchand release version of the images git tag:



    No. 3. Rollout and migration


    The operation of the various components described in Kubernetes (backend as Deployment , DBMS as StatefulSet , etc.) may depend on each other. In the case of a “straight-forward” roll-out, K8s will update the components and restart them (in case of a failed start):



    And even if the components handle all these events correctly (which should also be foreseen), the total roll-out time will be delayed due to various timeouts in Kubernetes that are triggered (and accumulating) due to the need to wait for the launch of other services (for example, the backend is waiting for the availability of the updated database, i.e., with the migration).

    To reduce the roll-out time, it is necessary to take into account (and prescribe) not only explicit dependencies between the components (the backend is waiting for the DBMS to be available), but also indirect ones (the backend should not be updated before the migration).



    Number 4. Bootstrap db


    DBMSs are filled with data in two ways: by loading a finished dump or by loading seeds / fixtures. Kubernetes path in the first case - it is after the start of the database to run the job ( Job jobseeker ), which loads the dump. After that - the start of migrations (they indirectly depend on the dump), and then - the start of the backend (indirectly depends on migrations).

    The startup sequence for the case with sids changes and looks like this: 1) DBMS, 2) migration, 3) task with sids, 4) backend (you can run in parallel with sids, because it should not depend on seeds).



    After going through the various paths of the bootstrap database, we came to the optimality of two: a night dump with seeds / fixtures from master and a night dump for staging.



    No. 5. Roll out without downtime


    Even with the correct deployment of new versions of the application in Kubernetes, not all the nuances will be automatically taken into account. To truly guarantee complete downtime:

    • do not forget about the correct completion of all HTTP requests,
    • check the real availability of the application (open port) using the readiness probes prescribed in K8s,
    • simulate the load during the deployment (check if all requests are completed),
    • Provide a margin of productivity for the hearth available in the infrastructure during the deployment.



    No. 6. "Atomicity" rolled out


    If an error occurred during the upgrade of any component of the infrastructure (the update could not roll out), there will be a corresponding notification in the GitLab pipelines, but the infrastructure itself will remain in a transitional state. The problem component will not save the original (old) version, as GitLab “thinks”, but will be able to not fully skip the new version.

    The problem is solved using the built-in rollback feature in case of such an error. And most likely this should be provided only in production, because in dev-circuits, you will most likely want to look at the problem component in its transition state (to understand the causes of the problem).

    Number 7. Dynamic environments


    GitLab’s built-in capabilities allow git push of any branch (with a Helm chart for a full infrastructure bootstrap) to deploy a ready-to-use application installation (with the same namespace) in Kubernetes. Similarly, a namespace will be deleted when a branch is deleted.

    However, with the active use of this opportunity, the resources required for infrastructure are growing rapidly. Therefore, there are several recommendations for optimizing dynamic environments:



    Number 8. Tests


    If everything is simple with the implementation of simple tests (not requiring an environment), then for functional and integration testing we suggest creating a dynamic environment using Helm:



    Summary


    Returning to the initial schedule with the requirements for CI / CD ...



    The described practices based on Open Source-products allow us to solve problems marked in green. Violet highlights the area of ​​problems that we at Flant work with every day.

    Videos and slides


    Video from the performance (about an hour) was published on YouTube .

    Presentation of the report:



    PS


    Other reports on our blog:


    You may also be interested in the following publications:


    Also popular now: