The book "Pragmatic AI. Machine Learning and Cloud Technologies"

Hi, Habr readers! This book by Noah Gift is for anyone interested in AI, machine learning, cloud computing, or any combination of these topics. Programmers and simply curious technologists alike will find useful information here. Code examples are in Python. The book covers many advanced topics, such as using cloud platforms (for example, AWS, GCP, and Azure), machine learning techniques, and implementing AI. Jedi who are fluent in Python, cloud computing, and ML will also find plenty of useful ideas that they can apply immediately in their current work.

We invite you to read an excerpt from the book: "Creating an Intelligent Slack Bot on AWS"

People have long dreamed of creating "artificial life." For now, this is most often achieved by building bots. Bots are becoming an increasingly integral part of our daily lives, especially since the arrival of Apple's Siri and Amazon's Alexa. In this chapter, we reveal all the secrets of creating bots.

Creating a bot

To create the bot, we will use the Slack client library for Python (https://github.com/slackapi/python-slackclient). To get started with Slack, you need to generate an authentication token. In general, when working with such tokens, it makes sense to export them as environment variables. I often do this in a virtualenv, so the variable is available automatically whenever that environment is active. To do this, you need to "hack" virtualenv a little by editing its activate script.

Once the Slack variable is exported in the ~/.env/bin/activate script, the relevant part of the script looks like this:

_OLD_VIRTUAL_PATH="$PATH"
PATH="$VIRTUAL_ENV/bin:$PATH"
export PATH
SLACK_API_TOKEN=<Your Token Here>
export SLACK_API_TOKEN

By the way, if you want to keep up with the latest developments, consider pipenv (https://github.com/pypa/pipenv), the newer, officially recommended Python tool for managing environments.

To check whether the environment variable is set, the printenv command on OS X and Linux is handy. Then, to test sending a message, you can use the following short script:

import os
from slackclient import SlackClient
slack_token = os.environ["SLACK_API_TOKEN"]
sc = SlackClient(slack_token)
sc.api_call(
   "chat.postMessage",
   channel="#general",
   text="Hello from my bot! :tada:"
)
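If you prefer to check from Python instead of with printenv, a small guard can fail fast with a readable message when the token is missing, rather than raising a bare KeyError from os.environ. This is a minimal sketch; the helper name `require_env` is hypothetical and not part of the slackclient library.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail clearly.

    Hypothetical helper: slackclient itself does not provide this check.
    """
    value = os.environ.get(name)
    if not value:
        # Covers both "not set" and "set to an empty string"
        raise RuntimeError(
            f"{name} is not set; export it (for example in bin/activate) "
            "before running the bot"
        )
    return value
```

With this in place, the script above could use `SlackClient(require_env("SLACK_API_TOKEN"))` instead of indexing os.environ directly.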

It is also worth noting that pipenv is the recommended solution because it combines the capabilities of pip and virtualenv in a single tool. It has become the new standard, so it is worth considering for package management.

Convert library to command line utility

As with the other examples in this book, it is a good idea to turn our code into a command-line utility so that new ideas are easier to try out. Many novice developers prefer other approaches to command-line utilities, for example working exclusively in Jupyter notebooks. Let me play devil's advocate for a moment and ask a question readers may well have: "Why do we need command-line utilities in a project based on Jupyter notebooks? Don't Jupyter notebooks make the shell and the command line unnecessary?" Adding a command-line utility to a project is useful because it lets you quickly try different inputs. Jupyter notebook code cells do not accept input; in a sense, they are scripts with hardcoded data.

The many command-line utilities on both GCP and AWS exist for a reason: they offer flexibility and capabilities that graphical interfaces lack. Science fiction writer Neal Stephenson wrote a wonderful essay on this subject called "In the Beginning... Was the Command Line." In it, Stephenson says: "GUIs tend to impose a large overhead on every single piece of software, even the smallest, and this overhead completely changes the programming environment." He ends the essay with the words: "...life is a very hard and complicated thing; no interface can change that; and anyone who believes otherwise is a sucker..." Harsh, but in my experience it is quite true. Life with the command line gets better. Try it, and you won't want to go back to the GUI.

For this, we will use the click package, as shown below. Sending messages through the new interface is very simple:

./clibot.py send --message "from cli"
sending message from cli to #general

Figure 7.1 shows the default values as well as a custom message sent from the CLI utility.

#!/usr/bin/env python
import os
import click
from slackclient import SlackClient
SLACK_TOKEN = os.environ["SLACK_API_TOKEN"]
def send_message(channel="#general",
                 message="Hello from my bot!"):
    """Send a message to a channel"""
    slack_client = SlackClient(SLACK_TOKEN)
    res = slack_client.api_call(
        "chat.postMessage",
        channel=channel,
        text=message
    )
    return res
@click.group()
@click.version_option("0.1")
def cli():
    """
    Command-line utility for Slack
    """
@cli.command("send")
@click.option("--message", default="Hello from my bot!",
              help="text of message")
@click.option("--channel", default="#general",
              help="channel to post to")
def send(channel, message):
    click.echo(f"sending message {message} to {channel}")
    send_message(channel, message=message)
if __name__ == '__main__':
    cli()
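Keeping send_message separate from the click command pays off when testing: the sending logic can be exercised without a real Slack token. The sketch below assumes a hypothetical variant, `send_message_with`, that receives the client as a parameter (a refactoring, not part of the book's code), and uses unittest.mock as a stand-in for SlackClient.

```python
from unittest import mock


def send_message_with(client, channel="#general", message="Hello from my bot!"):
    """Post `message` to `channel` through any object exposing api_call().

    Hypothetical variant of send_message that takes the client as an argument
    instead of building a SlackClient from the environment.
    """
    return client.api_call("chat.postMessage", channel=channel, text=message)


# A Mock records every call, so no network request or token is needed
fake_client = mock.Mock()
fake_client.api_call.return_value = {"ok": True}

result = send_message_with(fake_client, channel="#bots", message="from test")

# Verify the Slack Web API method name and arguments that would have been sent
fake_client.api_call.assert_called_once_with(
    "chat.postMessage", channel="#bots", text="from test"
)
```

The same idea extends to the CLI itself: the click command only parses options and delegates, so most behavior can be checked at the function level.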

Taking the bot to the next level with AWS Step Functions

Now that we have a way to send messages to Slack, we can improve our code: run it on a schedule and have it do something useful. AWS Step Functions is a great fit for this. In the next section, our Slack bot will learn to scrape NBA player profiles on Yahoo! Sports, extract the players' birthplaces, and then send this data to Slack.

Figure 7.2 shows the finished step function in action. The first step retrieves the URLs of NBA player profiles, and the second uses the Beautiful Soup library to find each player's birthplace. When the step function finishes, the results are sent back to Slack.

AWS Lambda and Chalice can be used to coordinate the individual parts of the work inside the step function. Lambda (https://aws.amazon.com/lambda/) lets you run functions in AWS, and the Chalice framework (http://chalice.readthedocs.io/en/latest/) lets you create serverless Python applications. Here are the prerequisites:

the user must have an AWS account;
the user needs credentials to use the API;
the Lambda role (created by Chalice) must have a policy with the privileges needed to call the relevant AWS services, for example S3.
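For the third prerequisite, the role's policy needs at least permission to write objects to the bucket used later in this chapter (aiwebscraping). The sketch below builds such an IAM policy document as a Python dict. Limiting it to s3:PutObject is an assumption based on what create_s3_file does; the helper name `s3_write_policy` is hypothetical. If you prefer not to write it by hand, `chalice gen-policy` (listed in the Chalice help output) can produce a starting point from your code.

```python
import json


def s3_write_policy(bucket):
    """Build an IAM policy document that allows PutObject on one bucket.

    Minimal sketch: a real Lambda role also needs CloudWatch Logs permissions,
    which Chalice's auto-generated policy typically includes.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object ARNs are the bucket ARN followed by /* for "all keys"
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


policy_json = json.dumps(s3_write_policy("aiwebscraping"), indent=2)
```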

Configuring IAM Credentials

Detailed instructions for setting up AWS credentials can be found at boto3.readthedocs.io/en/latest/guide/configuration.html. Information on exporting AWS variables on Windows and Linux can be found here. There are many ways to configure credentials, but virtualenv users can place AWS credentials in the local virtual environment's bin/activate script:

#Add AWS keys
AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=xxxxxxxx
AWS_SECRET_ACCESS_KEY=xxxxxxxx
#Only needed for temporary credentials
AWS_SESSION_TOKEN=xxxxxxxx

#Export the keys
export AWS_DEFAULT_REGION
export AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY
export AWS_SESSION_TOKEN

Working with Chalice. Chalice has a command-line utility with many commands:

Usage: chalice [OPTIONS] COMMAND [ARGS]...
Options:
  --version             Show the version and exit.
  --project-dir TEXT    The project directory. Defaults to CWD
  --debug / --no-debug  Print debug logs to stderr.
  --help                Show this message and exit.
Commands:
  delete
  deploy
  gen-policy
  generate-pipeline  Generate a cloudformation template for a...
  generate-sdk
  local
  logs
  new-project
  package
  url

The code in the app.py template can be replaced with Lambda functions. A convenient feature of AWS Chalice is that, in addition to web services, it can create "standalone" Lambda functions. This lets you create several Lambda functions, wire them into a step function, and assemble them like Lego bricks.

For example, you can easily create a scheduled Lambda function that performs some action:

@app.schedule(Rate(1, unit=Rate.MINUTES))
def every_minute(event):
    """Event scheduled to run every minute"""
    #Send a message to the Slack bot
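Behind the scenes, a scheduled Chalice function is triggered by an Amazon EventBridge (CloudWatch Events) rule with a rate expression such as `rate(1 minute)`. The hypothetical helper below only illustrates that syntax; it is not Chalice's own code. Note that AWS expects the singular unit when the value is 1 and the plural otherwise.

```python
VALID_UNITS = {"minute", "hour", "day"}


def rate_expression(value, unit):
    """Return an EventBridge rate expression such as "rate(5 minutes)".

    Hypothetical helper: `unit` is given in singular form ("minute", "hour", "day").
    """
    if unit not in VALID_UNITS:
        raise ValueError(f"unit must be one of {sorted(VALID_UNITS)}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # AWS requires "1 minute" but "5 minutes"
    suffix = unit if value == 1 else unit + "s"
    return f"rate({value} {suffix})"
```

So `Rate(1, unit=Rate.MINUTES)` in the snippet above corresponds to `rate(1 minute)`.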

To build the web-scraping part of the bot, we need several functions. The file starts with the imports and a few variable declarations:

import logging
import csv
from io import StringIO
import boto3
from bs4 import BeautifulSoup
import requests
from chalice import (Chalice, Rate)
APP_NAME = 'scrape-yahoo'
app = Chalice(app_name=APP_NAME)
app.log.setLevel(logging.DEBUG)

The bot may need to store some data in S3. The following function uses Boto3 to save the results to a CSV file:

def create_s3_file(data, name="birthplaces.csv"):
    csv_buffer = StringIO()
    app.log.info(f"Creating file with {data} for {name}")
    writer = csv.writer(csv_buffer)
    for key, value in data.items():
        writer.writerow([key, value])
    s3 = boto3.resource('s3')
    res = s3.Bucket('aiwebscraping').\
        put_object(Key=name, Body=csv_buffer.getvalue())
    return res
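Because create_s3_file mixes CSV formatting with the S3 upload, it is awkward to test without AWS. One option is to pull the formatting into a pure function, sketched below under the hypothetical name `to_csv_string`; create_s3_file would then pass its result as Body.

```python
import csv
from io import StringIO


def to_csv_string(data):
    """Render a {url: birthplace} dict as CSV text, one row per key."""
    buffer = StringIO()
    writer = csv.writer(buffer)
    for key, value in data.items():
        writer.writerow([key, value])
    return buffer.getvalue()


sample = {
    "https://sports.yahoo.com/nba/players/4624/": "Indianapolis",
    "https://sports.yahoo.com/nba/players/5185/": "Athens",
}
csv_text = to_csv_string(sample)
```

The csv module ends each row with \r\n by default and quotes a field only when it contains the delimiter, a quote, or a line break, so a value such as "Athens, Greece" would be written in double quotes.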

The fetch_page function uses the Beautiful Soup library to parse the HTML page at the Yahoo! NBA statistics URL and returns a soup object:

def fetch_page(url="https://sports.yahoo.com/nba/stats/"):
    """Fetch the Yahoo URL"""
    #Download the page and convert it into a
    # Beautiful Soup object
    app.log.info(f"Fetching urls from {url}")
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'html.parser')
    return soup

The get_player_links and fetch_player_urls functions collect links to player profile URLs:

def get_player_links(soup):
    """Get links to player URLs
    Finds all URLs in 'a' tags on the page and filters them for the
    string 'nba/players'
    """
    nba_player_urls = []
    for link in soup.find_all('a'):
        link_url = link.get('href')
        #Discard unsuitable links
        if link_url:
            if "nba/players" in link_url:
                print(link_url)
                nba_player_urls.append(link_url)
    return nba_player_urls
def fetch_player_urls():
    """Return player URLs"""
    soup = fetch_page()
    urls = get_player_links(soup)
    return urls
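The log output later in this section lists some profile URLs twice (for example .../4563/ and .../3704/), likely because a player can be linked more than once on the stats page. If duplicates matter, they can be dropped while preserving order. The sketch below performs the same filtering with the standard library's html.parser instead of Beautiful Soup, only so that it runs without extra packages; `PlayerLinkParser` and `unique_player_links` are hypothetical names.

```python
from html.parser import HTMLParser


class PlayerLinkParser(HTMLParser):
    """Collect href values of <a> tags that contain 'nba/players'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href and links that are not player profiles
        if href and "nba/players" in href:
            self.links.append(href)


def unique_player_links(html):
    """Return player profile links in page order with duplicates removed."""
    parser = PlayerLinkParser()
    parser.feed(html)
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    return list(dict.fromkeys(parser.links))


page = """
<a href="https://sports.yahoo.com/nba/players/4563/">A</a>
<a href="https://sports.yahoo.com/nba/teams/bos/">Team</a>
<a>no href</a>
<a href="https://sports.yahoo.com/nba/players/3704/">B</a>
<a href="https://sports.yahoo.com/nba/players/4563/">A again</a>
"""
links = unique_player_links(page)
```

With Beautiful Soup, the same deduplication is a one-line change at the end of get_player_links: `return list(dict.fromkeys(nba_player_urls))`.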

Next, the find_birthplaces function extracts the players' birthplaces from those URLs:

def find_birthplaces(urls):
    """Get birthplaces from the Yahoo profile pages
    of NBA players"""
    birthplaces = {}
    for url in urls:
        profile = requests.get(url)
        profile_url = BeautifulSoup(profile.content, 'html.parser')
        lines = profile_url.text
        res2 = lines.split(",")
        key_line = []
        for line in res2:
            if "Birth" in line:
                #print(line)
                key_line.append(line)
        try:
            birth_place = key_line[0].split(":")[-1].strip()
            app.log.info(f"birth_place: {birth_place}")
        except IndexError:
            app.log.info(f"skipping {url}")
            continue
        birthplaces[url] = birth_place
        app.log.info(birth_place)
    return birthplaces
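The parsing inside find_birthplaces depends on how the profile text is laid out: the page text is split on commas, the first fragment containing "Birth" is kept, and everything after its last colon is taken as the birthplace. The pure function below isolates that logic so it can be tried without fetching pages. The name `parse_birthplace` and the sample strings are hypothetical; they show why the results contain only a city, since the state or country after the next comma ends up in a separate fragment.

```python
def parse_birthplace(text):
    """Mimic find_birthplaces: return the birthplace found in `text`, or None."""
    # Same steps as the book's code: split on commas, keep fragments with "Birth"
    fragments = [part for part in text.split(",") if "Birth" in part]
    if not fragments:
        return None
    # Take the text after the last colon of the first matching fragment
    return fragments[0].split(":")[-1].strip()


profile_text = "Height: 6-7, Weight: 210, Birth Place: Indianapolis, IN, College: Butler"
```

A fragment such as "Birth Date: ..." appearing before the place would be picked up instead, which is one reason this kind of text scraping is fragile.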

Now on to the Chalice functions. Note that Chalice needs a default route:

#These can be called with HTTP requests
@app.route('/')
def index():
    """Root URL"""
    app.log.info(f"/ Route: for {APP_NAME}")
    return {'app_name': APP_NAME}

The following route connects an HTTP URL to the function written earlier:

@app.route('/player_urls')
def player_urls():
    """Fetch player URLs"""
    app.log.info(f"/player_urls Route: for {APP_NAME}")
    urls = fetch_player_urls()
    return {"nba_player_urls": urls}

The following Lambda functions are standalone; they can be invoked from within a step function:

#This is a standalone Lambda function
@app.lambda_function()
def return_player_urls(event, context):
    """Standalone Lambda function that returns player URLs"""
    app.log.info(f"standalone lambda 'return_player_urls' "
                 f"{APP_NAME} with {event} and {context}")
    urls = fetch_player_urls()
    return {"urls": urls}
#This is a standalone Lambda function
@app.lambda_function()
def birthplace_from_urls(event, context):
    """Find the players' birthplaces"""
    app.log.info(f"standalone lambda 'birthplace_from_urls' "
                 f"{APP_NAME} with {event} and {context}")
    payload = event["urls"]
    birthplaces = find_birthplaces(payload)
    return birthplaces
#This is a standalone Lambda function
@app.lambda_function()
def create_s3_file_from_json(event, context):
    """Create an S3 file from JSON data"""
    app.log.info(f"Creating s3 file with event data {event} "
                 f"and context {context}")
    print(type(event))
    res = create_s3_file(data=event)
    app.log.info(f"response of putting file: {res}")
    return True
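Standalone Lambda functions receive whatever JSON the caller sends as `event`. birthplace_from_urls expects a "urls" key, which is exactly what return_player_urls returns, so the two chain cleanly in a step function. When the function is invoked by hand, however, a payload with a different key produces a bare KeyError. A small validation step, sketched below with the hypothetical helper `urls_from_event`, makes that failure easier to read in the function's CloudWatch logs.

```python
def urls_from_event(event):
    """Return event["urls"] as a list, or raise ValueError with a clear message."""
    if not isinstance(event, dict) or "urls" not in event:
        # Report what was received so a typo such as "url" is obvious
        received = sorted(event) if isinstance(event, dict) else type(event).__name__
        raise ValueError(f'expected an event with a "urls" list, got {received}')
    urls = event["urls"]
    if not isinstance(urls, list):
        raise ValueError('"urls" must be a list of profile URLs')
    return urls
```

Inside birthplace_from_urls, `payload = urls_from_event(event)` would then replace `payload = event["urls"]`.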

If you run the resulting Chalice application locally, the following results will be displayed:

→ scrape-yahoo git:(master)  chalice local
Serving on 127.0.0.1:8000
scrape-yahoo - INFO - / Route: for scrape-yahoo
127.0.0.1 - - [12/Dec/2017 03:25:42] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2017 03:25:42] "GET /favicon.ico"
scrape-yahoo - INFO - / Route: for scrape-yahoo
127.0.0.1 - - [12/Dec/2017 03:25:45] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2017 03:25:45] "GET /favicon.ico"
scrape-yahoo - INFO - /player_urls Route: for scrape-yahoo
scrape-yahoo - INFO - https://sports.yahoo.com/nba/stats/
https://sports.yahoo.com/nba/players/4563/
https://sports.yahoo.com/nba/players/5185/
https://sports.yahoo.com/nba/players/3704/
https://sports.yahoo.com/nba/players/5012/
https://sports.yahoo.com/nba/players/4612/
https://sports.yahoo.com/nba/players/5015/
https://sports.yahoo.com/nba/players/4497/
https://sports.yahoo.com/nba/players/4720/
https://sports.yahoo.com/nba/players/3818/
https://sports.yahoo.com/nba/players/5432/
https://sports.yahoo.com/nba/players/5471/
https://sports.yahoo.com/nba/players/4244/
https://sports.yahoo.com/nba/players/5464/
https://sports.yahoo.com/nba/players/5294/
https://sports.yahoo.com/nba/players/5336/
https://sports.yahoo.com/nba/players/4390/
https://sports.yahoo.com/nba/players/4563/
https://sports.yahoo.com/nba/players/3704/
https://sports.yahoo.com/nba/players/5600/
https://sports.yahoo.com/nba/players/4624/
127.0.0.1 - - [12/Dec/2017 03:25:53] "GET /player_urls"
127.0.0.1 - - [12/Dec/2017 03:25:53] "GET /favicon.ico"

To deploy an application, run the chalice deploy command:

→ scrape-yahoo git:(master)  chalice deploy
Creating role: scrape-yahoo-dev
Creating deployment package.
Creating lambda function: scrape-yahoo-dev
Initiating first time deployment.
Deploying to API Gateway stage: api
https://bt98uzs1cc.execute-api.us-east-1.amazonaws.com/api/

Using HTTPie, a command-line HTTP client (https://github.com/jakubroztocil/httpie), we call the HTTP route on AWS and retrieve the links available at /api/player_urls:

→ scrape-yahoo git:(master)  http \
https://<a lambda route>.amazonaws.com/api/player_urls
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 941
Content-Type: application/json
Date: Tue, 12 Dec 2017 11:48:41 GMT
Via: 1.1 ba90f9bd20de9ac04075a8309c165ab1.cloudfront.net (CloudFront)
X-Amz-Cf-Id: ViZswjo4UeHYwrc9e-5vMVTDhV_Ic0dhVIG0BrDdtYqd5KWcAuZKKQ==
X-Amzn-Trace-Id: sampled=0;root=1-5a2fc217-07cc12d50a4d38a59a688f5c
X-Cache: Miss from cloudfront
x-amzn-RequestId: 64f24fcd-df32-11e7-a81a-2b511652b4f6
{
       "nba_player_urls": [
              "https://sports.yahoo.com/nba/players/4563/",
              "https://sports.yahoo.com/nba/players/5185/",
              "https://sports.yahoo.com/nba/players/3704/",
              "https://sports.yahoo.com/nba/players/5012/",
              "https://sports.yahoo.com/nba/players/4612/",
              "https://sports.yahoo.com/nba/players/5015/",
              "https://sports.yahoo.com/nba/players/4497/",
              "https://sports.yahoo.com/nba/players/4720/",
              "https://sports.yahoo.com/nba/players/3818/",
              "https://sports.yahoo.com/nba/players/5432/",
              "https://sports.yahoo.com/nba/players/5471/",
              "https://sports.yahoo.com/nba/players/4244/",
              "https://sports.yahoo.com/nba/players/5464/",
              "https://sports.yahoo.com/nba/players/5294/",
              "https://sports.yahoo.com/nba/players/5336/",
              "https://sports.yahoo.com/nba/players/4390/",
              "https://sports.yahoo.com/nba/players/4563/",
              "https://sports.yahoo.com/nba/players/3704/",
              "https://sports.yahoo.com/nba/players/5600/",
              "https://sports.yahoo.com/nba/players/4624/"
       ]
}

Another convenient way to work with Lambda functions is to call them directly using the click package and the Python Boto3 library.

We can create a new command-line utility called wscli.py (short for "web-scraping command-line interface"). The first part of the code sets up logging and imports the libraries:

#!/usr/bin/env python
import logging
import json
import boto3
import click
from pythonjsonlogger import jsonlogger
#Initialize logging
log = logging.getLogger(__name__)
log.setLevel(logging.INFO)
LOGHANDLER = logging.StreamHandler()
FORMATTER = jsonlogger.JsonFormatter()
LOGHANDLER.setFormatter(FORMATTER)
log.addHandler(LOGHANDLER)

The following three functions create a Lambda client and call a Lambda function through invoke_lambda:

###Boto Lambda API calls
def lambda_connection(region_name="us-east-1"):
    """Create a connection to Lambda"""
    lambda_conn = boto3.client("lambda", region_name=region_name)
    extra_msg = {"region_name": region_name, "aws_service": "lambda"}
    log.info("instantiate lambda client", extra=extra_msg)
    return lambda_conn
def parse_lambda_result(response):
    """Get the results from the Boto response as JSON"""
    body = response['Payload']
    json_result = body.read()
    lambda_return_value = json.loads(json_result)
    return lambda_return_value
def invoke_lambda(func_name, lambda_conn, payload=None,
                  invocation_type="RequestResponse"):
    """Invoke a Lambda function"""
    extra_msg = {"function_name": func_name, "aws_service": "lambda",
                 "payload": payload}
    log.info("Calling lambda function", extra=extra_msg)
    if not payload:
        payload = json.dumps({"payload": "None"})
    response = lambda_conn.invoke(FunctionName=func_name,
                                  InvocationType=invocation_type,
                                  Payload=payload
                                  )
    log.info(response, extra=extra_msg)
    lambda_return_value = parse_lambda_result(response)
    return lambda_return_value
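The invoke call returns the function's result as a streaming body under the "Payload" key, which is why parse_lambda_result calls read() before json.loads. When the function itself raises an exception, the call still returns normally and the response carries a "FunctionError" key, with the error details in the payload. The sketch below checks for that case. The names `parse_lambda_result_strict` and `LambdaInvocationError` are hypothetical, and io.BytesIO stands in for botocore's StreamingBody so the example runs without AWS.

```python
import io
import json


class LambdaInvocationError(RuntimeError):
    """Raised when the invoked Lambda function itself reported an error."""


def parse_lambda_result_strict(response):
    """Decode the JSON payload of a Lambda invoke response.

    Raises LambdaInvocationError if the response contains FunctionError.
    """
    body = json.loads(response["Payload"].read())
    if "FunctionError" in response:
        raise LambdaInvocationError(f"{response['FunctionError']}: {body}")
    return body


# io.BytesIO offers the same read() method as botocore's StreamingBody
ok_response = {"StatusCode": 200, "Payload": io.BytesIO(b'{"urls": []}')}
error_response = {
    "StatusCode": 200,
    "FunctionError": "Unhandled",  # example value
    "Payload": io.BytesIO(b'{"errorMessage": "urls", "errorType": "KeyError"}'),
}
```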

Next, wrap invoke_lambda in a command-line utility with the click package. Note that the --func option defaults to the Lambda function we deployed earlier:

@click.group()
@click.version_option("1.0")
def cli():
    """Helper command-line utility for web scraping"""
@cli.command("lambda")
@click.option("--func",
              default="scrape-yahoo-dev-return_player_urls",
              help="name of the Lambda function to invoke")
@click.option("--payload", default='{"cli":"invoke"}',
              help="JSON payload to send")
def call_lambda(func, payload):
    """Invoke a Lambda function
    ./wscli.py lambda
    """
    click.echo(click.style("Lambda Function invoked from cli:",
                           bg='blue', fg='white'))
    conn = lambda_connection()
    lambda_return_value = invoke_lambda(func_name=func,
                                        lambda_conn=conn,
                                        payload=payload)
    formatted_json = json.dumps(lambda_return_value,
                                sort_keys=True, indent=4)
    click.echo(click.style(
        "Lambda Return Value Below:", bg='blue', fg='white'))
    click.echo(click.style(formatted_json, fg="red"))
if __name__ == "__main__":
    cli()

The output of this utility is similar to the HTTP call:

→ X ./wscli.py lambda \
--func=scrape-yahoo-dev-birthplace_from_urls \
--payload '{"urls":["https://sports.yahoo.com/nba/players/4624/","https://sports.yahoo.com/nba/players/5185/"]}'
Lambda Function invoked from cli:
{"message": "instantiate lambda client",
"region_name": "us-east-1", "aws_service": "lambda"}
{"message": "Calling lambda function",
"function_name": "scrape-yahoo-dev-birthplace_from_urls",
"aws_service": "lambda", "payload":
"{\"urls\":[\"https://sports.yahoo.com/nba/players/4624/\",
\"https://sports.yahoo.com/nba/players/5185/\"]}"}
{"message": null, "ResponseMetadata":
{"RequestId": "a6049115-df59-11e7-935d-bb1de9c0649d",
"HTTPStatusCode": 200, "HTTPHeaders":
{"date": "Tue, 12 Dec 2017 16:29:43 GMT", "content-type":
"application/json", "content-length": "118", "connection":
"keep-alive", "x-amzn-requestid":
"a6049115-df59-11e7-935d-bb1de9c0649d",
"x-amzn-remapped-content-length": "0", "x-amz-executed-version":
"$LATEST", "x-amzn-trace-id":
"root=1-5a3003f2-2583679b2456022568ed0682;sampled=0"},
"RetryAttempts": 0}, "StatusCode": 200,
"ExecutedVersion": "$LATEST", "Payload":
"<botocore.response.StreamingBody object at 0x10ee37dd8>",
"function_name": "scrape-yahoo-dev-birthplace_from_urls",
"aws_service": "lambda", "payload":
"{\"urls\":[\"https://sports.yahoo.com/nba/players/4624/\",
\"https://sports.yahoo.com/nba/players/5185/\"]}"}
Lambda Return Value Below:
{
        "https://sports.yahoo.com/nba/players/4624/": "Indianapolis",
        "https://sports.yahoo.com/nba/players/5185/": "Athens"
}

Completing the step function

The final step in creating the step function, as described in the AWS documentation (https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-creating-activity-state-machine.html), is to use the web console to define the state machine structure in JavaScript Object Notation (JSON). The following code shows this pipeline, starting with the original Lambda functions that scrape Yahoo!, then saving the data to an S3 file, and finally sending the content to Slack:

{
    "Comment": "Fetch Player Urls",
    "StartAt": "FetchUrls",
    "States": {
        "FetchUrls": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:561744971673:function:scrape-yahoo-dev-return_player_urls",
            "Next": "FetchBirthplaces"
        },
        "FetchBirthplaces": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:561744971673:function:scrape-yahoo-dev-birthplace_from_urls",
            "Next": "WriteToS3"
        },
        "WriteToS3": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:561744971673:function:scrape-yahoo-dev-create_s3_file_from_json",
            "Next": "SendToSlack"
        },
        "SendToSlack": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:561744971673:function:send_message",
            "Next": "Finish"
        },
        "Finish": {
            "Type": "Pass",
            "Result": "Finished",
            "End": true
        }
    }
}
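To reason about how data moves between these states, it can help to walk the definition locally. The toy interpreter below is a hypothetical illustration, not how AWS Step Functions is implemented: it supports only the Task and Pass fields used above, maps each Resource to a local stub function (short placeholder keys stand in for the ARNs), and, like Step Functions with its default ResultPath, passes each state's output as the next state's input.

```python
def run_state_machine(definition, handlers, state_input=None):
    """Walk a simplified Amazon States Language definition locally.

    Supports only Task (calls handlers[Resource]) and Pass (optional Result)
    states linked with Next/End; any other state type raises NotImplementedError.
    """
    name = definition["StartAt"]
    data = state_input if state_input is not None else {}
    while True:
        state = definition["States"][name]
        if state["Type"] == "Task":
            # A Lambda task receives the current data as its event
            data = handlers[state["Resource"]](data, None)
        elif state["Type"] == "Pass":
            # With the default ResultPath, a Pass state's Result replaces its input
            data = state.get("Result", data)
        else:
            raise NotImplementedError(state["Type"])
        if state.get("End"):
            return data
        name = state["Next"]


# Local stubs standing in for the deployed Lambda functions (hypothetical data)
def fake_return_player_urls(event, context):
    return {"urls": ["https://sports.yahoo.com/nba/players/4624/"]}


def fake_birthplace_from_urls(event, context):
    return {url: "Indianapolis" for url in event["urls"]}


written = []


def fake_create_s3_file_from_json(event, context):
    written.append(event)
    return True


def fake_send_message(event, context):
    return {"sent": event}


definition = {
    "StartAt": "FetchUrls",
    "States": {
        "FetchUrls": {"Type": "Task", "Resource": "urls", "Next": "FetchBirthplaces"},
        "FetchBirthplaces": {"Type": "Task", "Resource": "births", "Next": "WriteToS3"},
        "WriteToS3": {"Type": "Task", "Resource": "s3", "Next": "SendToSlack"},
        "SendToSlack": {"Type": "Task", "Resource": "slack", "Next": "Finish"},
        "Finish": {"Type": "Pass", "Result": "Finished", "End": True},
    },
}
handlers = {
    "urls": fake_return_player_urls,
    "births": fake_birthplace_from_urls,
    "s3": fake_create_s3_file_from_json,
    "slack": fake_send_message,
}
result = run_state_machine(definition, handlers)
```

Tracing it shows one consequence of the default data flow: create_s3_file_from_json returns True, so under the default ResultPath the SendToSlack step receives True rather than the birthplaces. If the Slack message should contain the data, WriteToS3 would need a ResultPath or would have to return the data it wrote.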

Figure 7.2 showed the first part of this pipeline. Being able to see the intermediate results of the state machine is extremely useful, and monitoring each part of the state machine in real time is very convenient for debugging.

Figure 7.3 shows the complete pipeline, including the steps that write to an S3 file and send content to Slack. All that remains is to decide how to run this scraping utility: at a fixed interval or in response to some event.

Summary

This chapter introduced many powerful concepts for building AI applications. We created a Slack bot and a web-scraping utility, then connected them with AWS serverless services. You can add much more to this initial framework, for example a Lambda function that uses natural language processing to read web pages and summarize them, or an unsupervised clustering algorithm that groups new NBA players by arbitrary attributes.

» More about the book on the publisher's website
» Table of contents
» Excerpt

Habr readers get a 20% discount with the coupon code Gift.

P.S.: 7% of the book's price goes toward translating new computer books; the list of books sent to print is here.
