A selection of working examples of data processing
Hello reader.
Following up on my first post, a selection of datasets for machine learning, here is a selection of relatively recent datasets accompanied by working examples of data processing. It is no secret that learning from good examples is faster and more effective. Let's look at some of the best examples of data processing.
The way to use this post is the same as for my post about the best notebooks on ML and DS: save it to bookmarks → pass it on to a colleague.
Plus a bonus at the end of the article: a cool course from FPMI MIPT.

So let's get started.
A selection of datasets with working examples of data processing:
Suicide Rates Overview 1985 to 2016 - Comparison of socio-economic information with suicide rates by year and country.
Processing Examples:
- Suicide data - Full interactive dashboard;
- Mental Health, Happiness, Economics, Human Freedom;
- Data Visualization of Suicide Rates
Spotify's Worldwide Daily Song Ranking - a daily ranking of the 200 songs most listened to by Spotify users in 53 countries during 2017 and 2018.
Processing Example:
- Top Songs on Spotify: What makes them popular?;
- Spotify Chart Trend + Seasonal ARIMA;
- Trends in Spotify's Worldwide Daily Songs 17-18.
Crimes in Boston - records from the Boston crime incident reporting system, including information about when and where each incident happened.
Processing Example:
Google Play Store Apps - categories, ratings, size of all Google Play applications.
Processing Example:
- All that you need to know about the Android market;
- How to get “High” Rating on Play Store;
- Google Play Store EDA.
Pokémon for Data Mining and Machine Learning - statistics and features of Pokémon.
Processing Example:
A Million News Headlines - news headlines published over the past 15 years.
Processing Example:
- What is with News headlines;
- Meaningful Random Headlines by Markov Chain;
- Topic Modeling with LSA and LDA.
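The Markov chain idea named in the second notebook title above can be sketched in a few lines. This is only a minimal illustration, not the code of that notebook: the sample headlines are made up, and the names `build_chain` and `generate` are hypothetical helpers introduced here.

```python
import random
from collections import defaultdict

# Made-up sample headlines, for illustration only (not taken from the dataset).
HEADLINES = [
    "council approves new water plan",
    "council rejects new housing plan",
    "farmers welcome new water plan",
]

def build_chain(headlines):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for line in headlines:
        # "<s>" and "</s>" are start and end markers added for this sketch.
        words = ["<s>"] + line.split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, rng, max_words=12):
    """Walk the chain from the start marker until the end marker or max_words."""
    word, out = "<s>", []
    while len(out) < max_words:
        word = rng.choice(chain[word])
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

chain = build_chain(HEADLINES)
rng = random.Random(0)  # fixed seed so repeated runs give the same headline
print(generate(chain, rng))
```

Every generated headline only uses word pairs that occurred in the input, so on a real corpus the output tends to look plausible locally even when the whole sentence is nonsense.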
Airplane Crashes Since 1908 - A complete history of air crashes around the world, from 1908 to the present.
Processing Example:
News Headlines Dataset For Sarcasm Detection is a high-quality dataset for sarcasm detection.
Processing Example:
- Detecting Sarcasm Using Different Embeddings;
- Sarcasm with Keras;
- Beginner's guide to NLP using spaCy.
Historical Air Quality - Air quality data collected on outdoor monitors throughout the United States.
Processing Example:
Nutrition Facts for McDonald's Menu - Nutrition Facts for each menu item in McDonald's USA.
Processing Example:
LEGO Database - parts, sets, colors and inventories of every official LEGO set in the Rebrickable database.
Processing Example:
- LEGO- Let's play;
- Finding Lego color themes with topic models;
- Have LEGO sets been getting bigger over time?
Global Commodity Trade Statistics - import and export volumes for 5,000 products in most countries of the world over the past 30 years.
Processing Example:
Crime in India - complete information on various aspects of crimes committed in India since 2001.
Processing Example:
Predicting a Pulsar Star - pulsar data collected during a survey of the universe.
Processing Examples:
French employment, salaries, population per town - data showing equality and inequality in France.
Processing Example:
- Using Regression to Predicting Earnings in France;
- Interactive Map & Graph - job & salary inequality;
- How big is French Industry?
United States Census - US Census.
Processing Example:
California Housing Prices - the price of housing in California.
Processing Example:
- Introduction to machine learning in R (tutorial);
- Gradient Boosting and Parameter Tuning in R;
- Geospatial Feature Engineering and Visualization
US Unemployment Rate by County, 1990-2016 - US Department of Labor unemployment data.
Processing Example:
- Maps are beautiful, Unemployment is not;
- Analysis of world crime;
- Time Series Analysis on US Unemployment Rate.
World of Warcraft Avatar History - a set of records detailing information about players' characters in the game over time.
Processing Example:
The Gravitational Waves Discovery Data - data on the gravitational wave event GW150914.
Processing Example:
Bonus!
Today's bonus is a great Deep Learning course designed for high school students interested in programming and mathematics, as well as university students who want to get started with deep learning.
The purpose of the course is to introduce the basic principles of deep learning (neural networks) in an interactive format and through practical tasks.
Course program
- Python: basics, Google Colab;
- Introduction to linear algebra. Vectors. Matrices and operations on them. The NumPy library;
- The Pandas and Matplotlib libraries. The basics of machine learning;
- Elements of optimization theory. Gradient. Gradient descent. Linear models;
- Introduction to deep learning. Perceptron. A neuron with a sigmoid (and other activation functions). OOP basics in Python;
- PyTorch library. Multilayer neural networks;
- Training neural networks in practice. CIFAR-10, notMNIST;
- Convolutional neural networks. Convolutional layer. Pooling layer;
- The practice of training neural networks. Classification of road signs;
- Transfer learning. Popular architectures in computer vision;
- Image segmentation. U-Net;
- Participation in Kaggle competitions;
- Object detection. YOLOv3;
- Classic GAN. Neural style transfer;
- Basic text processing methods;
- Word Embeddings;
- Recurrent neural networks;
- LSTM, GRU cells;
- Language models;
- Machine translation;
- Text2Speech;
- SuperResolution.
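To give a feel for the early topics in the program (gradient descent and a neuron with a sigmoid activation), here is a minimal sketch in plain Python. It is not taken from the course materials: the toy data is made up, the function name `train_neuron` is hypothetical, and the learning rate and epoch count are arbitrary choices for this example.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(xs, ys, lr=0.1, epochs=1000):
    """Fit the weight w and bias b of a single sigmoid neuron
    by batch gradient descent on the mean binary cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # For sigmoid + cross-entropy, dLoss/dz simplifies to (prediction - label).
            error = sigmoid(w * x + b) - y
            grad_w += error * x / n
            grad_b += error / n
        w -= lr * grad_w  # step against the gradient
        b -= lr * grad_b
    return w, b

# Toy, made-up data: the label is 1 when x is positive.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_neuron(xs, ys)
predictions = [round(sigmoid(w * x + b)) for x in xs]
print(predictions)
```

The same loop, with the gradients computed automatically and many neurons stacked in layers, is essentially what a library such as PyTorch does during training.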
You can also take a look at the YouTube channel of Deep Learning School. There are a lot of great videos ;)
That concludes our short selection of data processing examples. I hope you learned something new. As is customary on Habr: if you liked the post, give it a plus. Do not forget to share it with colleagues. And if you have something of your own to share, write in the comments. You can find more about machine learning and Data Science on Habr and in the Telegram channel Neuron (@neurondata).
Knowledge to all!