The book "Machine learning and TensorFlow"

    Getting acquainted with machine learning and the TensorFlow library is like your first lessons at driving school: you struggle with parallel parking, try to shift gears at the right moment and keep the mirrors straight, and feverishly recall the sequence of actions while your foot twitches nervously on the gas pedal. It is a difficult but necessary exercise. The same is true of machine learning: before you can use modern face recognition systems or stock market forecasting algorithms, you have to get to grips with the relevant tools and instruction sets so that you can then build your own systems without trouble.

    Newcomers to machine learning will appreciate the applied focus of this book: its goal is to introduce the basics so that you can quickly move on to solving real problems. Starting with an overview of machine learning concepts and the principles of working with TensorFlow, you will move on to the basic algorithms, study neural networks, and learn to solve classification, clustering, regression, and prediction problems on your own.

    Excerpt. Convolutional neural networks

    Shopping after a grueling day is a real burden. My eyes are bombarded with too much information. Sales, coupons, a riot of colors, small children, flashing lights, and crowded aisles are just a few of the signals sent to my visual cortex, whether or not I want to pay attention to them. The visual system absorbs an abundance of information.

    Surely you know the saying "a picture is worth a thousand words." That may be true for you and me (that is, for people), but can a machine find meaning in images? Our photoreceptors pick up wavelengths of light, but that information does not seem to reach our consciousness. After all, I cannot say exactly which wavelengths of light I am looking at. Similarly, a camera captures image pixels. But we want something of a higher level instead, such as the names or positions of objects. How do we get from pixels to information at the level of human perception?

    To make sense of raw data, you need to design a neural network model. Previous chapters introduced several types of neural network models, such as fully connected models (Chapter 8) and autoencoders (Chapter 7). This chapter introduces another type of model, the convolutional neural network (CNN). This model works very well on images and other sensory data, such as audio. For example, a CNN can reliably classify which object is shown in a picture.

    The CNN model discussed in this chapter will be trained to classify images into one of 10 possible categories. In this case, “a picture is worth only one word,” since there are only 10 possible outputs. It is a tiny step toward human-level perception, but we have to start somewhere, right?

    9.1. Disadvantages of neural networks

    Machine learning is a constant struggle to design a model that is expressive enough to represent the data, yet not so flexible that it overfits and simply memorizes patterns. Neural networks are offered as a way to increase expressiveness, although, as you might guess, they are highly prone to the pitfalls of overfitting.

    NOTE Overfitting occurs when a trained model is exceptionally accurate on the training dataset but performs poorly on the test dataset. Such a model is probably too flexible for the small amount of data available, and in the end it just memorizes the training data.

    To compare the flexibility of two machine learning models, you can use a quick and dirty heuristic: count the number of parameters that have to be learned during training. As shown in fig. 9.1, a fully connected neural network that takes a 256 × 256 image and maps it onto a layer of 10 neurons has 256 × 256 × 10 = 655,360 parameters! Compare that with a model containing only five parameters. It is reasonable to assume that the fully connected network can represent more complex data than the five-parameter model.
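    The parameter count above is easy to check by hand. The following is a minimal sketch of that arithmetic; the function name dense_param_count and its with_bias option are hypothetical helpers introduced for illustration, not part of the book's code. The figure in the text counts weights only, so biases are excluded by default.

```python
def dense_param_count(height, width, n_outputs, with_bias=False):
    """Count the trainable parameters of one fully connected layer."""
    n_inputs = height * width              # one input per pixel
    weights = n_inputs * n_outputs         # every pixel connects to every neuron
    biases = n_outputs if with_bias else 0
    return weights + biases

params = dense_param_count(256, 256, 10)
print(params)  # 655360
```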


    The following section discusses convolutional neural networks, which are a sensible way to reduce the number of parameters. Instead of relying on full connectivity, a CNN reuses the same parameters many times.

    9.2. Convolutional neural networks

    The main idea behind convolutional neural networks is that a local understanding of an image is good enough. The practical benefit is that, with fewer parameters, training takes much less time and the model needs much less training data.

    Instead of a fully connected network with a weight for each pixel, a CNN has just enough weights to look at a small patch of the image. It is like reading a book with a magnifying glass: eventually you read the whole page, but at any given moment you look at only a small part of it.

    Imagine a 256 × 256 image. Instead of having your TensorFlow code process the whole image at once, you can scan it patch by patch, say with a 5 × 5 window. The 5 × 5 window slides across the image (usually left to right and top to bottom), as shown in fig. 9.2. How “fast” it slides is called the stride length. For example, a stride length of 2 means that the 5 × 5 sliding window moves 2 pixels at a time until it has covered the entire image. In TensorFlow, as will be shown shortly, you can adjust the stride length and window size using the built-in library functions.
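    To make the stride length concrete, the sketch below lists where the window lands along one axis of the image. It assumes no padding, so the window always stays fully inside the image; the function name window_positions is a hypothetical helper, not a TensorFlow call.

```python
def window_positions(image_size, window_size, stride):
    """Top-left offsets of a sliding window along one axis, with no padding."""
    return list(range(0, image_size - window_size + 1, stride))

offsets = window_positions(256, 5, 2)
print(len(offsets))              # number of window placements per axis
print(offsets[:3], offsets[-1])  # first few offsets and the last one
```

    A larger stride means fewer placements per axis, and therefore a smaller output image.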


    This 5 × 5 window has an associated 5 × 5 weight matrix.

    DEFINITION A convolution is a weighted sum of the image's pixel intensity values as the window slides across the whole image. It turns out that convolving an image with a weight matrix in this way produces another image (of the same size, depending on the convention). Convolving is the process of applying a convolution.
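    The definition can be illustrated with a few lines of NumPy. This is a minimal sketch rather than how TensorFlow implements it: it assumes a single-channel image and no padding, so the output is smaller than the input, and, like most deep learning libraries, it does not flip the window before multiplying. The name convolve2d and the averaging weights are illustrative assumptions.

```python
import numpy as np

def convolve2d(image, weights, stride=1):
    """Slide `weights` over `image` (no padding) and take a weighted sum at each position."""
    k_h, k_w = weights.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + k_h, left:left + k_w]
            out[i, j] = np.sum(patch * weights)  # the weighted sum for this window
    return out

image = np.arange(36, dtype=np.float64).reshape(6, 6)  # toy 6 x 6 "image"
weights = np.ones((5, 5)) / 25.0                       # averaging window, purely illustrative
result = convolve2d(image, weights)
print(result.shape)
```

    Keeping the output the same size as the input would require padding the image borders first, which is one of the conventions the definition alludes to.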

    All the sliding-window manipulations happen in the convolutional layer of the neural network. A typical convolutional neural network has several convolutional layers. Each convolutional layer usually produces many alternative convolutions, so the weight matrix is a 5 × 5 × n tensor, where n is the number of convolutions.

    As an example, let the image pass through a convolutional layer with a 5 × 5 × 64 weight matrix. This produces 64 convolutions by sliding a 5 × 5 window. The corresponding model therefore has 5 × 5 × 64 = 1,600 parameters, far fewer than a fully connected network, which needs 256 × 256 = 65,536 weights for every output neuron.

    The appeal of convolutional neural networks (CNNs) is that the number of parameters the model uses does not depend on the size of the original image. You can run the same convolutional neural network on 300 × 300 images, and the number of parameters in the convolutional layer will not change!
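    A short calculation makes the contrast visible. The sketch below assumes a single-channel image and ignores bias terms; both function names are hypothetical helpers introduced for illustration.

```python
def conv_param_count(window_size, n_convolutions):
    """Weights in one convolutional layer on a single-channel image (biases ignored)."""
    return window_size * window_size * n_convolutions

def dense_weights_per_neuron(height, width):
    """Weights one fully connected neuron needs in order to see every pixel."""
    return height * width

for size in (256, 300):
    # The convolutional count never changes; the fully connected count grows with the image
    print(size, conv_param_count(5, 64), dense_weights_per_neuron(size, size))
```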

    9.3. Image preparation

    Before using a CNN model with TensorFlow, let's prepare some images. The listings in this section will help you set up a training dataset for the rest of the chapter.

    First of all, download the CIFAR-10 dataset from python.tar.gz. The dataset contains 60,000 images evenly distributed across 10 categories, which makes it a sufficiently large resource for classification tasks. Then place the extracted dataset in your working directory. Fig. 9.3 shows examples of images from this dataset.

    We already used the CIFAR-10 dataset in the previous chapter on auto-encoders, and now we’ll look at this code again. The following listing is taken directly from the CIFAR-10 documentation located on the . Place the code in the file.


    Listing 9.1. Loading images from a CIFAR-10 file in Python

    import pickle

    def unpickle(file):
        # The batches were pickled under Python 2, hence encoding='latin1'
        with open(file, 'rb') as fo:
            data = pickle.load(fo, encoding='latin1')
        return data
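    In the Python version of CIFAR-10, each loaded batch is a dictionary whose 'data' entry holds one row of 3,072 values per image: 1,024 red values, then 1,024 green, then 1,024 blue, each stored row by row. The sketch below shows how such a row maps back to a 32 × 32 color image. The helper name row_to_image is hypothetical, and the row used here is synthetic rather than real dataset content.

```python
import numpy as np

def row_to_image(row):
    """Turn one 3,072-value CIFAR-10 row into a 32 x 32 x 3 (height, width, RGB) array."""
    return np.asarray(row).reshape(3, 32, 32).transpose(1, 2, 0)

# Synthetic stand-in for one row of batch['data'] (not real CIFAR-10 pixels)
fake_row = np.arange(3072) % 256
image = row_to_image(fake_row)
print(image.shape)
```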

    Neural networks are prone to overfitting, so it is important to do everything you can to minimize this error. To that end, always remember to clean the data before processing it.

    Data cleaning is a core process in machine learning pipelines. The code in listing 9.2 cleans a set of images using the following three steps:

    1. If you have a color image, try converting it to grayscale to reduce the dimensionality of the input data and, consequently, the number of parameters.

    2. Consider center-cropping the image, because the edges might not provide any useful information.

    3. Normalize the input by subtracting the mean and dividing by the standard deviation of each data sample so that the gradients during backpropagation do not change too dramatically.

    The following listing shows how to clean a dataset using these techniques.
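    The three steps above can be sketched in NumPy as follows. Treat this as an illustration under stated assumptions, not necessarily the book's exact listing: it assumes each input row is a flattened 32 × 32 RGB image in CIFAR-10's channel-first layout, crops to the central 24 × 24 region (the crop size is an assumption), and puts a floor under the standard deviation so that a perfectly flat image does not cause a division by zero.

```python
import numpy as np

def clean(data):
    """Grayscale, center-crop, and normalize rows of flattened 32 x 32 RGB images."""
    n = data.shape[0]
    imgs = data.reshape(n, 3, 32, 32).astype(np.float64)
    # Step 1: average the three color channels to get grayscale images
    grayscale = imgs.mean(axis=1)
    # Step 2: keep the central 24 x 24 region (crop size is an assumption)
    cropped = grayscale[:, 4:28, 4:28]
    flat = cropped.reshape(n, -1)
    # Step 3: normalize each sample by its own mean and standard deviation
    means = flat.mean(axis=1, keepdims=True)
    stds = flat.std(axis=1, keepdims=True)
    # Floor on the divisor avoids division by zero for flat images (an assumption)
    safe_stds = np.maximum(stds, 1.0 / np.sqrt(flat.shape[1]))
    return (flat - means) / safe_stds

# Synthetic stand-in for a small batch of CIFAR-10 rows
rng = np.random.RandomState(0)
fake_batch = rng.randint(0, 256, size=(4, 3072)).astype(np.uint8)
cleaned = clean(fake_batch)
print(cleaned.shape)
```

    After cleaning, every sample has 24 × 24 = 576 values with zero mean and unit standard deviation.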


    Load all the images from the CIFAR-10 dataset into memory and run the clean function on them. The following listing defines a convenient method for reading, cleaning, and structuring the data for use in TensorFlow. Include this code in the same file as well.

    Listing 9.3. Pre-processing of all CIFAR-10 files

    import numpy as np

    def read_data(directory):
        # Category names are stored in the batches.meta file
        names = unpickle('{}/batches.meta'.format(directory))['label_names']
        print('names', names)
        data, labels = [], []
        for i in range(1, 6):  # the training set is split into five batch files
            filename = '{}/data_batch_{}'.format(directory, i)
            batch_data = unpickle(filename)
            if len(data) > 0:
                # Append this batch to the rows collected so far
                data = np.vstack((data, batch_data['data']))
                labels = np.hstack((labels, batch_data['labels']))
            else:
                # First batch: nothing to stack onto yet
                data = batch_data['data']
                labels = batch_data['labels']
        print(np.shape(data), np.shape(labels))
        data = clean(data)
        data = data.astype(np.float32)
        return names, data, labels

    From another file, you can now use this function by importing cifar_tools. Listings 9.4 and 9.5 show how to select several images from the dataset and visualize them.

    Listing 9.4. Using the cifar_tools helper function

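    import cifar_tools
    # Assumed path: the folder the CIFAR-10 archive extracts to
    names, data, labels = \
        cifar_tools.read_data('./cifar-10-batches-py')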

    You can randomly select several images and draw them with their corresponding labels. The following listing does exactly that, so you can get a better sense of the kind of data you will be dealing with.
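    A minimal sketch of such a visualization is shown below. It assumes that matplotlib is installed and that each row of data is a flattened square grayscale image, as produced by the cleaning step. The function names, the 4 × 4 grid, and the seed parameter are illustrative assumptions rather than the book's exact code.

```python
import random
import numpy as np

def pick_random_indices(n_items, n_samples, seed=None):
    """Choose n_samples distinct indices from range(n_items)."""
    return random.Random(seed).sample(range(n_items), n_samples)

def show_some_examples(names, data, labels, rows=4, cols=4,
                       path='cifar_examples.png'):
    """Draw a rows x cols grid of random images, each titled with its label name."""
    # Imported here so the index helper above stays usable without matplotlib
    import matplotlib
    matplotlib.use('Agg')  # render to a file, no display needed
    import matplotlib.pyplot as plt

    side = int(np.sqrt(data.shape[1]))  # assumes square, single-channel images
    plt.figure()
    for i, j in enumerate(pick_random_indices(len(data), rows * cols)):
        plt.subplot(rows, cols, i + 1)
        plt.title(names[labels[j]])
        plt.imshow(np.reshape(data[j, :], (side, side)), cmap='Greys_r')
        plt.axis('off')
    plt.tight_layout()
    plt.savefig(path)
    plt.close()
```

    With the dataset loaded as in listing 9.4, calling show_some_examples(names, data, labels) would write the grid of images to cifar_examples.png.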


    Running this code creates a file named cifar_examples.png, which will look like fig. 9.3.

