AWS ElasticBeanstalk: Tips and Tricks

    AWS ElasticBeanstalk is a PaaS built on AWS infrastructure. In my opinion, a significant advantage of this service is direct access to the underlying infrastructure elements (load balancers, instances, queues, etc.). In this article I have collected some tricks for solving typical problems when using ElasticBeanstalk. I will add more as I find them. Questions and suggestions in the comments are welcome.



    Options for adding application configuration


    In my opinion, an obvious drawback of the platform is its unclear mechanism for storing configuration. I therefore use the following methods to supply configuration.

    The most obvious method, and the one native to ElasticBeanstalk, is to set environment variables. These variables are not visible in an ordinary shell session on the instance; they are available only in the application's environment. The most convenient way to set them is the eb setenv command from the awsebcli package, which is also used to deploy the application (suitable for small projects), or the AWS API.

    eb setenv RDS_PORT=5432 \
        PYTHONPATH=/opt/python/current/app/myapp:$PYTHONPATH \
        RDS_PASSWORD=12345 \
        DJANGO_SETTINGS_MODULE=myapp.settings \
        RDS_USERNAME=dbuser \
        RDS_DB_NAME=appdb \
        RDS_HOSTNAME=dbcluster.us-east-1.rds.amazonaws.com
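
    On the application side, these variables can be read from the process environment. The following is a minimal sketch of how a Django settings module might consume them; the function name database_settings and the PostgreSQL engine (guessed from port 5432) are my assumptions, not something ElasticBeanstalk prescribes.

```python
import os


def database_settings(environ=os.environ):
    """Build a Django-style DATABASES dict from the RDS_* variables.

    The variable names match the eb setenv example above; the engine is an
    assumption based on the PostgreSQL default port.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": environ["RDS_DB_NAME"],
            "USER": environ["RDS_USERNAME"],
            "PASSWORD": environ["RDS_PASSWORD"],
            "HOST": environ["RDS_HOSTNAME"],
            # Fall back to the PostgreSQL default if RDS_PORT is not set.
            "PORT": environ.get("RDS_PORT", "5432"),
        }
    }
```

    In settings.py this would be used as DATABASES = database_settings(). A missing required variable raises KeyError at startup, which is usually easier to diagnose than a failed connection later.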

    The second option is to inject the config into the application version as it is built. To explain this, I first need to describe how deployment works. A zip archive containing the application code is created, manually or with a script, and uploaded to a special S3 bucket that is unique for each region (named elasticbeanstalk-<region_name>-<my_account_id>; do not try to use another one, it won't work, I checked). You can create this package manually or modify it programmatically. I prefer an alternative deployment approach in which, instead of awsebcli, I use my own code to build the version package.
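
    A minimal sketch of such package-building code is shown below. It is not the code I use in production: the function names, the config file name, and the decision to build the archive in memory are all illustrative assumptions. Uploading the archive to the bucket and registering the new application version are left out.

```python
import io
import zipfile
from pathlib import Path


def bucket_name(region, account_id):
    """Return the per-region bucket name in the format described above."""
    return f"elasticbeanstalk-{region}-{account_id}"


def build_bundle(source_dir, config_name, config_text):
    """Zip source_dir and inject one extra config file; return the zip bytes.

    config_name and config_text are hypothetical: use whatever file your
    application expects to find next to its code.
    """
    source = Path(source_dir)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            relative = path.relative_to(source).as_posix()
            # Skip directories and any stale copy of the injected config.
            if path.is_file() and relative != config_name:
                archive.write(path, relative)
        archive.writestr(config_name, config_text)
    return buffer.getvalue()
```

    Paths inside the archive are relative to the project root, so the application code ends up at the top level of the zip rather than inside a wrapper directory.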



    The third option is to load the configuration remotely from an external configuration store during the deployment phase. IMHO this is the most correct approach, but it is beyond the scope of this article. I store configs on S3 and proxy requests to S3 through API Gateway, which gives the most flexible configuration management. S3 also supports versioning.

    Enabling cron jobs


    ElasticBeanstalk supports scheduled tasks through the cron.yaml file. However, this config only works in the worker environment, the configuration used to process a task queue or periodic tasks. To get scheduled tasks in the WebServer environment, add a file with the following contents to the project's .ebextensions directory:

    files:
      "/etc/cron.d/cron_job":
        mode: "000644"
        owner: root
        group: root
        content: |
          # Add commands below
          15 10 * * * root curl www.google.com >/dev/null 2>&1
      "/usr/local/bin/cron_job.sh":
        mode: "000755"
        owner: root
        group: root
        content: |
          #!/bin/bash
          /usr/local/bin/test_cron.sh || exit
          echo "Cron running at " `date` >> /tmp/cron_job.log
          # Now do tasks that should only run on 1 instance ...
      "/usr/local/bin/test_cron.sh":
        mode: "000755"
        owner: root
        group: root
        content: |
          #!/bin/bash
          METADATA=/opt/aws/bin/ec2-metadata
          INSTANCE_ID=`$METADATA -i | awk '{print $2}'`
          # Strip the trailing zone letter (awk strings are 1-indexed).
          REGION=`$METADATA -z | awk '{print substr($2, 1, length($2)-1)}'`
          # Find our Auto Scaling Group name.
          ASG=`aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" \
            --region $REGION --output text | awk '/aws:autoscaling:groupName/ {print $5}'`
          # Find the first instance in the Group
          FIRST=`aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names $ASG \
            --region $REGION --output text | awk '/InService$/ {print $4}' | sort | head -1`
          # Test if they're the same.
          [ "$FIRST" = "$INSTANCE_ID" ]
    commands:
      rm_old_cron:
        command: "rm *.bak"
        cwd: "/etc/cron.d"
        ignoreErrors: true
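
    The check in test_cron.sh boils down to a simple rule: the instance whose ID sorts first among the InService members of the autoscaling group is treated as the leader, and only it runs the job. The same rule can be sketched in Python as follows; the function name is hypothetical, and the dict keys are an assumption modeled on the fields the AWS API returns for group members.

```python
def is_cron_leader(instance_id, group_instances):
    """Return True if instance_id sorts first among the InService instances.

    group_instances is a list of dicts with "InstanceId" and
    "LifecycleState" keys (an assumed shape for illustration).
    """
    in_service = sorted(
        item["InstanceId"]
        for item in group_instances
        if item["LifecycleState"] == "InService"
    )
    # No InService instances means nobody is the leader.
    return bool(in_service) and in_service[0] == instance_id
```

    Because every instance evaluates the same sorted list, exactly one of them passes the test without any coordination between instances. The leader can change when instances are replaced, so the jobs themselves should tolerate that.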
    

    Automatically applying Django migrations and collecting static files during deployment


    Add to the config file in .ebextensions:

    container_commands:
      01_migrate:
        command: "python manage.py migrate --noinput"
        leader_only: true
      02_collectstatic:
        command: "python manage.py collectstatic --noinput"
    

    Alembic migrations can be applied in the same way. The leader_only parameter ensures that migrations run on a single instance rather than on every instance of the autoscaling group.

    Using hooks when deploying applications


    By creating scripts under /opt/elasticbeanstalk/hooks/, you can add your own control scripts and, in particular, modify the application deployment process. Scripts that run before deployment live in /opt/elasticbeanstalk/hooks/appdeploy/pre/*, those that run during deployment in /opt/elasticbeanstalk/hooks/appdeploy/enact/*, and those that run afterwards in /opt/elasticbeanstalk/hooks/appdeploy/post/*. Within each directory, scripts are executed in alphabetical order, so you can arrange the deployment steps in the right sequence.
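
    To check where a new hook will land in the sequence, it can help to list the scripts in the order described above. The helper below is a small sketch of that ordering (pre, then enact, then post, alphabetical within each stage); the function name is hypothetical, and it only lists files without executing anything.

```python
from pathlib import Path


def hook_run_order(hooks_root):
    """List hook scripts stage by stage, alphabetically within each stage.

    hooks_root is the directory that contains pre/, enact/ and post/,
    for example /opt/elasticbeanstalk/hooks/appdeploy.
    """
    order = []
    for stage in ("pre", "enact", "post"):
        stage_dir = Path(hooks_root) / stage
        if stage_dir.is_dir():
            scripts = (p for p in stage_dir.iterdir() if p.is_file())
            order.extend(sorted(scripts, key=lambda p: p.name))
    return order
```

    A numeric prefix in the file name (01_, 02_, and so on) is a simple way to make the alphabetical order match the intended sequence.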

    Adding a Celery daemon to an existing supervisor config


    files:
      "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
        mode: "000755"
        owner: root
        group: root
        content: |
          #!/usr/bin/env bash
          # Get django environment variables
          celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
          celeryenv=${celeryenv%?}
          # Create celery configuration script
          celeryconf="[program:celeryd]
          ; Set full path to celery program if using virtualenv
          command=/opt/python/run/venv/bin/celery worker -A yourapp -B --loglevel=INFO -s /tmp/celerybeat-schedule
          directory=/opt/python/current/app
          user=nobody
          numprocs=1
          stdout_logfile=/var/log/celery-worker.log
          stderr_logfile=/var/log/celery-worker.log
          autostart=true
          autorestart=true
          startsecs=10
          ; Need to wait for currently executing tasks to finish at shutdown.
          ; Increase this if you have very long running tasks.
          stopwaitsecs = 600
          ; When resorting to send SIGKILL to the program to terminate it
          ; send SIGKILL to its whole process group instead,
          ; taking care of its children as well.
          killasgroup=true
          ; if rabbitmq is supervised, set its priority higher
          ; so it starts first
          priority=998
          environment=$celeryenv"
          # Create the celery supervisord conf script
          echo "$celeryconf" | tee /opt/python/etc/celery.conf
          # Add configuration script to supervisord conf (if not there already)
          if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
              then
              echo "[include]" | tee -a /opt/python/etc/supervisord.conf
              echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
          fi
          # Reread the supervisord config
          supervisorctl -c /opt/python/etc/supervisord.conf reread
          # Update supervisord in cache without restarting all services
          supervisorctl -c /opt/python/etc/supervisord.conf update
          # Start/Restart celeryd through supervisord
          supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
    

    By the way, I tried the experimental option of using SQS as the Celery broker, and it has paid off. Admittedly, flower does not yet support this setup.

    Automatically redirecting HTTP to HTTPS


    I use the following addition to the Apache config inside ElasticBeanstalk:

    files:
        "/etc/httpd/conf.d/ssl_rewrite.conf":
            mode: "000644"
            owner: root
            group: root
            content: |
                RewriteEngine On
                <If "-n '%{HTTP:X-Forwarded-Proto}' && %{HTTP:X-Forwarded-Proto} != 'https'">
                RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R,L]
                </If>

    Using multiple SSL domains


    Amazon lets domain owners use SSL certificates for free, including wildcard certificates, but only inside AWS. To use several domains with SSL in one environment, obtain a certificate through AWS Certificate Manager, add another ELB load balancer, and configure SSL on it. You can also use certificates obtained from another provider.



    UPDATE: In the comments below, darken99 shared a couple of useful tricks; I am adding them here with some explanation.

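    Shutting down the environment on a schedule

    Here the number of instances in the autoscaling group is switched between 1 and 0 on a schedule: the Start action sets it to 1 at 09:00 on weekdays, and the Stop action sets it to 0 at 18:00. The Recurrence values are cron expressions evaluated in UTC.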

    
    option_settings:
      - namespace: aws:autoscaling:scheduledaction
        resource_name: Start
        option_name: MinSize 
        value: 1
      - namespace: aws:autoscaling:scheduledaction
        resource_name: Start
        option_name: MaxSize
        value: 1
      - namespace: aws:autoscaling:scheduledaction
        resource_name: Start
        option_name: DesiredCapacity
        value: 1
      - namespace: aws:autoscaling:scheduledaction
        resource_name: Start
        option_name: Recurrence
        value: "0 9 * * 1-5"
      - namespace: aws:autoscaling:scheduledaction
        resource_name: Stop
        option_name: MinSize
        value: 0
      - namespace: aws:autoscaling:scheduledaction
        resource_name: Stop
        option_name: MaxSize
        value: 0
      - namespace: aws:autoscaling:scheduledaction
        resource_name: Stop
        option_name: DesiredCapacity
        value: 0
      - namespace: aws:autoscaling:scheduledaction
        resource_name: Stop
        option_name: Recurrence
        value: "0 18 * * 1-5"
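
    The config above repeats the same four options for each action. If you generate your .ebextensions files with a script, the entries can be built by a small helper like the sketch below; the function name is hypothetical, and it assumes, as in the example above, that MinSize, MaxSize and DesiredCapacity are always set to the same value.

```python
def scheduled_action(name, size, recurrence):
    """Build the four option_settings entries for one scheduled action.

    size is used for MinSize, MaxSize and DesiredCapacity alike, which
    matches the Start/Stop example above but is not required by AWS.
    """
    namespace = "aws:autoscaling:scheduledaction"
    values = {
        "MinSize": size,
        "MaxSize": size,
        "DesiredCapacity": size,
        "Recurrence": recurrence,
    }
    return [
        {
            "namespace": namespace,
            "resource_name": name,
            "option_name": option,
            "value": value,
        }
        for option, value in values.items()
    ]


# The two actions from the config above.
option_settings = (
    scheduled_action("Start", 1, "0 9 * * 1-5")
    + scheduled_action("Stop", 0, "0 18 * * 1-5")
)
```

    The resulting list has the same shape as the entries under option_settings in the config above.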
    


    Replacing Apache with Nginx

    option_settings:
      aws:elasticbeanstalk:environment:proxy:
        ProxyServer: nginx
    
    This does not work for Python.
