Selection: more than 70 machine learning resources for beginners

    Indicator of a cam analog computer / Wiki

    On our blog we have already written about developing a quantum communication system and about how ordinary students are trained into advanced programmers. Today we return to the topic of machine learning with an adapted selection (source) of useful materials.

    This list is intended for those who are just starting to study machine learning, for example with Python (if you want to start learning Python, this article will help you).

    Machine learning is just one of the mathematical disciplines associated with the concept of “data”. To understand what data analytics, data analysis, data science, machine learning, and big data are, read this material.

    Here are the tools you'll need:

    You can install Python 3 and all the necessary packages in a few clicks using the Anaconda Python distribution. Anaconda is fairly popular among machine learning practitioners.

    It's okay if you have Python 2.7 installed. Upgrading to Python 3 is not necessary. Instead of Anaconda, you can use pip or virtualenv. Can't decide? Read this material.

    To get started, get to know the IPython Notebook (it takes 5-10 minutes). You can also watch this video. Next, work through a small example (it will take 10 minutes) that classifies handwritten digits using the scikit-learn library.
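    A minimal sketch of such a digit-classification exercise is shown here. It is not the linked example itself: the choice of an SVM classifier, the `gamma` value, and the 50/50 train/test split are assumptions made for illustration.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the 8x8 handwritten digit images bundled with scikit-learn.
digits = datasets.load_digits()
X, y = digits.data, digits.target  # X: flattened pixel values, y: labels 0-9

# Hold out half of the data so the model is scored on images it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Assumption: a support vector classifier with a hand-picked gamma.
clf = svm.SVC(gamma=0.001)
clf.fit(X_train, y_train)

predicted = clf.predict(X_test)
accuracy = accuracy_score(y_test, predicted)
print(f"Test accuracy: {accuracy:.3f}")
```

    Running this in an IPython Notebook cell is a quick way to check that Python, scikit-learn, and the notebook environment are all installed correctly.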

    A visual introduction to machine learning theory

    Let's learn more about the ideas and features of machine learning. Read the article by Stephanie Yee and Tony Chu, “A visual introduction to machine learning. Part 1”.

    Read the article by Professor Pedro Domingos. Take your time while reading and take notes. Two main points stand out in the article:

    Data alone is not enough. Domingos wrote: “... there is nothing surprising in the fact that knowledge is needed for learning. Machine learning cannot get something out of nothing, but it can get more out of less. Learning is like agriculture, where nature does most of the work. Farmers give seeds nutrients to grow crops. So it is here: to create a program, you need to combine knowledge and data.”

    A large amount of data beats a well-thought-out algorithm. Do not try to reinvent the wheel or overcomplicate your solutions: choose the shortest path to the goal. Domingos says: “Typically, a ‘stupid’ algorithm with lots of data is superior to a ‘smart’ algorithm with little data. In machine learning, data always plays a major role.”

    So, knowledge and data are crucial. This means that you should make your algorithms more complex only when you really have no other choice.

    The diagram is based on a slide from Alex Pinto's talk, “Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring”.
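    Domingos's second point can be tried out on a small scale. The sketch below is a toy illustration under stated assumptions, not a proof: the dataset (scikit-learn's digits), the two models, and the sample sizes (1000 versus 50 training examples) are all choices made for this example, and `holdout_accuracy` is a hypothetical helper.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

# Keep a fixed test set; the remaining rows form the pool of training data.
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=500, random_state=0, stratify=y
)


def holdout_accuracy(model, n_train):
    """Fit the model on the first n_train pool rows and score it on the test set."""
    model.fit(X_pool[:n_train], y_pool[:n_train])
    return model.score(X_test, y_test)


# A very simple model (1-nearest-neighbour) with a lot of data...
simple_big = holdout_accuracy(KNeighborsClassifier(n_neighbors=1), n_train=1000)

# ...against a more elaborate model (random forest) with very little data.
complex_small = holdout_accuracy(
    RandomForestClassifier(n_estimators=200, random_state=0), n_train=50
)

print(f"1-NN, 1000 examples:        {simple_big:.3f}")
print(f"Random forest, 50 examples: {complex_small:.3f}")
```

    On this dataset the simple model with more data should come out ahead, but the gap depends on the task; treat the result as an illustration of the idea rather than a general rule.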

    Learn from examples

    Choose and review one or two of the examples below.

    • Face recognition in photos from the Labeled Faces in the Wild website database.
    • Machine learning on data from the Titanic disaster. It demonstrates data transformation and analysis methods, as well as visualization techniques. There are examples of supervised machine learning methods.
    • Election prediction: using Nate Silver's model, as published by The New York Times, to predict the results of the 2012 U.S. presidential election.

    Here are more guides and reviews:

    Other sources where you can find IPython notebooks:

    • A gallery of interesting IPython Notebooks: statistics, machine learning, and data science.
    • Fabian Pedregosa's large gallery.

    Machine Learning Courses

    It helps to start working on a small independent project, so that you can put the knowledge you acquire into practice. You can use one of these data sets.

    The book The Elements of Statistical Learning is often recommended, but it usually serves as a reference. The book is free, so download it or bookmark it.

    There are also these online courses:

    • The “Machine Learning” course by Professor Pedro Domingos of the University of Washington.
    • Data Science Workshop.
    • Data Science.
    • The video “Introduction to machine learning with scikit-learn” by Kevin Markham. After watching the video, you can take an interactive course on data science (there are earlier versions: 7, 5, 4, 3).
    • Harvard course CS109: Data Science.
    • Advanced statistical computing course (BIOS8366, Vanderbilt University).

    Feedback on courses and various discussions:

    • Check out Jack Golding's answer on Quora. There you will find a link to the “Data Science” specialization on Coursera; if you do not need a certificate, you can take all 9 courses for free.
    • Another discussion on Quora: how do you become a data scientist?
    • A large list of data science resources from Data Science Weekly, as well as a list of open online courses.

    Learning Pandas

    To work with data in Python, you need to become familiar with the Pandas package. Here is a list of materials that will help with this:

    • Essential: getting to know Pandas,
    • Guide: a few things in Pandas that I wish I had known earlier (IPython notebooks),
    • Useful Pandas code snippets.
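    To give a feel for what these materials cover, here is a minimal sketch of a few everyday Pandas operations: building a DataFrame, filling missing values, filtering rows, and aggregating by group. The passenger records and column names are made up for illustration and do not come from any of the linked resources.

```python
import pandas as pd

# Hypothetical passenger records, invented for this example.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "pclass": [1, 3, 3, 1],
    "age": [29.0, None, 41.0, 35.0],
    "survived": [1, 0, 0, 1],
})

# Replace the missing age with the median of the known ages.
df["age"] = df["age"].fillna(df["age"].median())

# Boolean indexing: keep adult passengers travelling first class.
adults_first = df[(df["pclass"] == 1) & (df["age"] >= 18)]

# Group by class and compute the share of survivors in each group.
survival_by_class = df.groupby("pclass")["survived"].mean()

print(adults_first)
print(survival_by_class)
```

    The same pattern (load, clean, filter, aggregate) carries over to real data sets such as the Titanic data mentioned earlier, where `pd.read_csv` would replace the hand-built DataFrame.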

    You should also pay attention to these resources:

    More materials and articles

    • An accessible book by John Foreman, Data Smart,
    • IPython Notebook data science course,
    • Article: The main challenges of the data science section (read Joseph McCarthy's article and commentary),
    • IPython: key data skills.

    Questions, Answers, Chats

    At the moment, the best place to find answers to your questions is the machine learning section on There is also a subreddit: /r/machinelearning. Join the scikit-learn channel on Gitter! It is also worth looking at the discussions on Quora and the large list of data science materials from Data Science Weekly.

    Other things to know

    • Data science: an article by John Foreman, data scientist at MailChimp.
    • Article: eleven factors that lead to overfitting, and how to avoid them.
    • A worthwhile article: “Machine Learning: The High-Interest Credit Card of Technical Debt”. Its aim is to identify specific risk factors in machine learning and to offer patterns for avoiding them.
    • John Foreman: The Dangerous World of Machine Learning.
    • KDnuggets: “The Costs of Machine Learning Systems”.

    You need practice. A Hacker News user with the nickname Olympus noted that this means taking part in contests and competitions. Kaggle and ChaLearn are research platforms where you can try your hand at various competitions. Here you will find sample code for a Kaggle contest. Another option is HackerRank.

    Listen to and read what Kaggle winners say about their solutions. For example, check out the No Free Hunch blog.

    Competitions are just one way to practice. You can also start your own research:

    1. Start with a question. “The most important thing in data science is the question,” says Dr. Jeff T. Leek. Start with a question, then find real data and analyze it.
    2. Publish your results and seek expert feedback.
    3. Fix the problems found. Share your discoveries.

    You can learn more about the scientific method here and here.

    Here are a couple more machine learning guides:
