Python machine learning with interactive Jupyter demos
Hello Readers!
Recently, I launched the Homemade Machine Learning repository, which contains examples of popular machine learning algorithms and approaches, such as linear regression, logistic regression, the K-means method, and a neural network (multilayer perceptron). Each algorithm comes with interactive demo pages that run in Jupyter NBViewer or Binder. This means anyone can change the training data and training parameters and immediately see the results of the model's training, visualization, and prediction right in the browser, without installing Jupyter locally.
The purpose of this repository is to implement the algorithms nearly from scratch, in order to gain a more detailed understanding of the mathematical models behind each of them. The main libraries used are NumPy and Pandas. They handle efficient matrix operations as well as loading and parsing CSV data. The Matplotlib and Plotly libraries are used in the demo pages for plotting graphs and visualizing training data. In the case of logistic regression, the SciPy library is used to minimize the loss function, but in the other cases gradient descent is implemented in pure NumPy/Python. Libraries like PyTorch and TensorFlow are deliberately avoided because of the repository's learning goal.
At the moment, the following algorithms are implemented in the repository ...
Regression. Linear regression.
In regression tasks, we try to predict a real number based on the input data. In essence, we fit a line / plane / hyperplane to the training data so that we can make predictions for inputs that are missing from the training set. This happens, for example, when we want to predict the cost of a two-room apartment in the center of city N on the 7th floor.
- ∑ Mathematical model - theory and links for further reading.
- ✎ Example of implementation in Python.
- ➤ Demonstration of linear regression with one parameter - predicting a country's "happiness level" based on its GDP.
- ➤ Demonstration of linear regression with several parameters - predicting a country's "happiness level" based on its GDP and freedom index.
- ➤ Demonstration of non-linear regression - an example of polynomial / sinusoidal expansion of the input parameters for predicting non-linear dependencies.
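To illustrate the idea behind these demos, here is a minimal sketch of linear regression trained with batch gradient descent on pure NumPy. The function name, toy data, and hyperparameters are my own for this example, not the repository's actual API.

```python
import numpy as np

def train_linear_regression(X, y, lr=0.1, epochs=1000):
    """Fit y ≈ X @ w + b by batch gradient descent on the MSE loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y
        w -= lr * (X.T @ error) / n   # gradient of MSE w.r.t. the weights
        b -= lr * error.mean()        # gradient of MSE w.r.t. the bias
    return w, b

# Toy data: y = 2x + 1 with a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.01, size=100)
w, b = train_linear_regression(X, y)  # w[0] ends up close to 2, b close to 1
```

The same loop extends to several input parameters without modification, since `X` may have any number of columns.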
Classification. Logistic regression.
In classification tasks, we divide data into classes based on the parameters of that data. An example of a classification task is spam detection. Depending on the text of an email (the input data), we assign each email to one of two classes ("spam" or "not spam").
- ∑ Mathematical model - theory and links for further reading.
- ✎ Example of implementation in Python.
- ➤ Logistic regression demonstration with linear boundaries - classification of flowers according to the width and length of their petals.
- ➤ Demonstration of logistic regression with non-linear boundaries - microchip classification (good / bad) according to two parameters.
- ➤ Demonstration of logistic regression with many parameters - recognition of handwritten digits.
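As a rough sketch of the approach, logistic regression can also be trained with plain gradient descent on the log-loss (the repository uses SciPy's minimizer for this case; the simpler loop below, with my own toy data and function names, is just to show the mechanics).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit P(y=1|x) = sigmoid(X @ w + b) by gradient descent on log-loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        grad = p - y                 # derivative of log-loss w.r.t. the logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

# Toy data: class 1 whenever x0 + x1 > 1 (a linear boundary)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = (X.sum(axis=1) > 1).astype(float)
w, b = train_logistic_regression(X, y)
predictions = (sigmoid(X @ w + b) > 0.5).astype(float)
accuracy = (predictions == y).mean()
```

Non-linear boundaries, as in the microchip demo, come from expanding the input features (e.g. with polynomial terms) before running the same training loop.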
Clustering. K-means method.
In clustering tasks, we divide our data into clusters that are not known to us in advance. Such algorithms can be used for market segmentation, social network analysis, and more.
- ∑ Mathematical model - theory and links for further reading.
- ✎ Example of implementation in Python.
- ➤ Demonstration of the K-means method — clustering flowers into groups depending on the length and width of their petals.
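The core of K-means fits in a few lines of NumPy: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This is a minimal sketch on synthetic blobs, not the repository's code.

```python
import numpy as np

def k_means(X, k, iterations=100, seed=0):
    """Cluster the rows of X into k groups by alternating assign/update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct random training examples
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated synthetic blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centroids = k_means(X, k=2)
```

Note that the result depends on the random initialization; in practice the algorithm is usually restarted several times and the clustering with the lowest total distance is kept.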
Neural networks. Multilayer perceptron (MLP).
Neural networks are not so much an algorithm as a "pattern" or "framework" for organizing different machine learning algorithms into a single system for the analysis of complex input data.
- ∑ Mathematical model - theory and links for further reading.
- ✎ Example of implementation in Python.
- ➤ Demonstration of a multilayer perceptron - recognition of handwritten digits.
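To give a flavor of what "multilayer" buys you, here is a minimal one-hidden-layer perceptron trained by backpropagation on the XOR problem, which no single linear model can solve. The architecture, hyperparameters, and function names are my own illustration, not the repository's implementation.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=1.0, epochs=5000, seed=0):
    """One-hidden-layer perceptron trained with backpropagation on log-loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)              # hidden activations
        p = 1 / (1 + np.exp(-(h @ W2 + b2)))  # output probability
        # Backward pass: gradients of the log-loss, layer by layer
        dz2 = (p - y[:, None]) / n
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (1 - h ** 2)     # tanh derivative
        dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

# XOR: not linearly separable, so the hidden layer is essential
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
W1, b1, W2, b2 = train_mlp(X, y)
h = np.tanh(X @ W1 + b1)
p = 1 / (1 + np.exp(-(h @ W2 + b2)))
predictions = (p[:, 0] > 0.5).astype(int)
```

Recognizing handwritten digits uses the same forward/backward structure, only with far more inputs, hidden units, and output classes.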
Anomaly detection using the Gaussian distribution
In anomaly detection tasks, we try to isolate those data instances that look "suspicious" compared to the majority of other instances. An example is detecting atypical (suspicious) bank card transactions.
- ∑ Mathematical model - theory and links for further reading.
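The idea can be sketched in a few lines: fit a Gaussian to each feature of the "normal" data, then flag any example whose probability under that model falls below a threshold ε. The data, threshold, and function names below are my own toy example, not the repository's code.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate per-feature mean and variance from 'normal' training data."""
    return X.mean(axis=0), X.var(axis=0)

def gaussian_probability(X, mu, var):
    """Product of independent univariate Gaussian densities per example."""
    coef = 1.0 / np.sqrt(2 * np.pi * var)
    exponent = np.exp(-((X - mu) ** 2) / (2 * var))
    return np.prod(coef * exponent, axis=1)

# Mostly normal points around (0, 0), plus one obvious outlier to test
rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(300, 2))
mu, var = fit_gaussian(X)
p = gaussian_probability(X, mu, var)

epsilon = 1e-4  # threshold: flag examples whose probability is below epsilon
outlier = np.array([[6.0, 6.0]])
is_anomaly = gaussian_probability(outlier, mu, var)[0] < epsilon
```

In practice ε is not picked by hand but chosen on a labeled validation set, e.g. by maximizing the F1 score.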
I hope you find the repository useful, whether you are experimenting with the demonstrations of each algorithm, reading about the mathematical models behind them, or analyzing the details of their implementation.
Happy coding!