Python machine learning with interactive Jupyter demos

Published on December 21, 2018



    Hello Readers!

    Recently, I launched the Homemade Machine Learning repository, which contains examples of popular machine learning algorithms and approaches, such as linear regression, logistic regression, the K-means method and neural networks (multilayer perceptron). Each algorithm comes with interactive demo pages that run in Jupyter NBViewer or Binder. This means anyone can change the training data and training parameters and immediately see the training results, visualizations and model predictions in their browser, without installing Jupyter locally.

    The purpose of this repository is to implement the algorithms nearly from scratch, in order to gain a more detailed understanding of the mathematical models behind each of them. The main libraries used are NumPy and Pandas: they provide efficient matrix operations and handle loading and parsing of CSV data. The demo pages also use Matplotlib and Plotly for plotting graphs and visualizing training data. For logistic regression, the SciPy library is used to minimize the loss function, while in the other cases gradient descent is implemented in pure NumPy/Python. Libraries such as PyTorch or TensorFlow are deliberately avoided because of the repository's learning goal.

    The repository currently implements the following algorithms:

    Regression. Linear regression.

    In regression tasks, we try to predict a real number based on the input data. In essence, we fit a line, plane or n-dimensional hyperplane to the training data so that we can make predictions for inputs that do not appear in the training set. For example, we might want to predict the price of a two-room apartment on the 7th floor in the center of city N.
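    As a rough illustration of gradient descent on pure NumPy for linear regression, here is a minimal sketch assuming a mean squared error cost and batch gradient descent. The function names (`add_bias`, `linear_regression_gd`, `predict`), the learning rate `alpha` and the toy data are hypothetical choices for this example, not the repository's actual API.

```python
import numpy as np


def add_bias(X):
    """Prepend a column of ones so that theta[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def linear_regression_gd(X, y, alpha=0.1, num_iterations=1000):
    """Fit parameters by batch gradient descent on the mean squared error."""
    Xb = add_bias(X)
    m = Xb.shape[0]
    theta = np.zeros(Xb.shape[1])
    for _ in range(num_iterations):
        errors = Xb @ theta - y
        # Gradient of J(theta) = 1/(2m) * sum((Xb @ theta - y) ** 2)
        gradient = Xb.T @ errors / m
        theta -= alpha * gradient
    return theta


def predict(X, theta):
    """Evaluate the fitted line/plane on new inputs."""
    return add_bias(X) @ theta


# Toy data lying exactly on y = 1 + 2x
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta = linear_regression_gd(X, y, alpha=0.1, num_iterations=5000)
print(theta)                              # approximately [1. 2.]
print(predict(np.array([[4.0]]), theta))  # approximately [9.]
```

    Because the toy points lie exactly on y = 1 + 2x, the learned parameters approach an intercept of 1 and a slope of 2. On real data such as apartment prices, features would usually be normalized first so that a single learning rate works well for all of them.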

    Classification. Logistic regression.

    In classification tasks, we divide data into classes based on its features. A typical example is spam recognition: depending on the text of an email (the input data), we assign each email to one of two classes ("spam" or "not spam").
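    The repository minimizes the logistic regression loss with SciPy; the sketch below swaps in plain batch gradient descent so that it depends only on NumPy. It assumes a sigmoid hypothesis and a cross-entropy cost, and the two numeric features per "email" are made-up, already-normalized toy values, not a real spam dataset. The function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    """Prepend a column of ones for the intercept term."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def logistic_regression_gd(X, y, alpha=0.5, num_iterations=1000):
    """Fit parameters by batch gradient descent on the mean cross-entropy cost."""
    Xb = add_bias(X)
    m = Xb.shape[0]
    theta = np.zeros(Xb.shape[1])
    for _ in range(num_iterations):
        probabilities = sigmoid(Xb @ theta)
        # Gradient of the mean cross-entropy; same form as in linear regression
        gradient = Xb.T @ (probabilities - y) / m
        theta -= alpha * gradient
    return theta


def predict_proba(X, theta):
    return sigmoid(add_bias(X) @ theta)


def predict_class(X, theta, threshold=0.5):
    """Return 1 ("spam") when the predicted probability reaches the threshold."""
    return (predict_proba(X, theta) >= threshold).astype(int)


# Hypothetical toy features per email; label 1 = "spam", 0 = "not spam"
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [-1.0, -1.0],
              [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

theta = logistic_regression_gd(X, y)
print(predict_class(X, theta))  # [0 0 0 1 1 1]
```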

    Clustering. K-means method.

    In clustering tasks, we divide data into clusters that are not known in advance. These algorithms can be used for market segmentation and for analyzing social and other networks.
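    A minimal NumPy sketch of the K-means loop: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points, until the centroids stop moving. Initializing the centroids from randomly chosen training points and the names `kmeans` and `assign_clusters` are assumptions made for this example.

```python
import numpy as np


def assign_clusters(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    # distances has shape (num_points, num_centroids)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(distances, axis=1)


def kmeans(X, k, max_iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct training points chosen at random
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(max_iterations):
        labels = assign_clusters(X, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Labels are recomputed so they always match the returned centroids
    return centroids, assign_clusters(X, centroids)


# Two well-separated toy groups of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(X, k=2)
print(labels)
```

    K-means only finds a local optimum, so in practice it is common to run it from several random initializations and keep the result with the lowest within-cluster distance.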

    Neural networks. Multilayer perceptron (MLP).

    A neural network is not so much an algorithm as a “pattern” or “framework” for combining different machine learning algorithms into one system for analyzing complex input data.
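    To make the "framework" idea concrete, the sketch below shows only the forward pass of a multilayer perceptron: each layer is a weight matrix followed by a sigmoid, and the layers are simply chained. The layer sizes, the uniform random initialization and the bias-column layout are hypothetical choices; training the weights (for example with backpropagation and gradient descent) is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_thetas(layer_sizes, seed=0):
    """One weight matrix per layer; column 0 multiplies the bias unit."""
    rng = np.random.default_rng(seed)
    return [rng.uniform(-0.5, 0.5, size=(n_out, n_in + 1))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def feedforward(X, thetas):
    """Propagate a batch of inputs (one row per example) through all layers."""
    activations = X
    for theta in thetas:
        # Prepend a bias column of ones, apply the affine map, then the sigmoid
        with_bias = np.hstack([np.ones((activations.shape[0], 1)), activations])
        activations = sigmoid(with_bias @ theta.T)
    return activations


# Hypothetical sizes: 4 input features, 5 hidden units, 3 output classes
thetas = init_thetas([4, 5, 3])
X = np.random.default_rng(1).normal(size=(10, 4))

outputs = feedforward(X, thetas)
predicted_classes = np.argmax(outputs, axis=1)
print(outputs.shape)  # (10, 3)
```

    The output layer produces one sigmoid score per class, and the predicted class is the index of the largest score.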

    Anomaly detection. Gaussian distribution.

    In anomaly detection tasks, we try to isolate data instances that look "suspicious" compared with most other instances. An example is detecting atypical (suspicious) bank card transactions.
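    One common way to use the Gaussian distribution for anomaly detection is to fit a separate normal distribution to each feature, multiply the per-feature densities, and flag instances whose density falls below a threshold `epsilon`. The sketch below assumes independent features; the synthetic "transaction" data (amount and hour of day), the function names and the hand-picked `epsilon` are purely illustrative.

```python
import numpy as np


def fit_gaussian(X):
    """Estimate a mean and variance for every feature (column)."""
    return X.mean(axis=0), X.var(axis=0)


def probability(X, mu, sigma2):
    """Product of independent one-dimensional normal densities, one per feature."""
    densities = np.exp(-((X - mu) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return np.prod(densities, axis=1)


def detect_anomalies(X, mu, sigma2, epsilon):
    """Flag rows whose estimated density is below epsilon."""
    return probability(X, mu, sigma2) < epsilon


# Synthetic "normal" transactions: amount around 50, hour of day around 12
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[50.0, 12.0], scale=[10.0, 2.0], size=(1000, 2))
mu, sigma2 = fit_gaussian(X_train)

X_new = np.array([[52.0, 13.0],    # looks typical
                  [400.0, 3.0]])   # unusually large amount at 3 a.m.
print(detect_anomalies(X_new, mu, sigma2, epsilon=1e-4))  # [False  True]
```

    In practice, `epsilon` is usually tuned on a validation set containing some known anomalies rather than picked by hand.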

    I hope you find the repository useful, whether you experiment with the demos of each algorithm, read about the mathematical models behind them, or study the implementation details of each one.

    Happy coding!