# Overview of the most interesting materials on data analysis and machine learning No. 3 (an overview of online courses)

This issue of the review of the most interesting materials on data analysis and machine learning is devoted entirely to online courses on Data Science. The previous issue presented a list of online courses that were about to start. In this issue, I have tried to collect the most interesting online courses on data analysis. Note that some of the courses have already ended, but for most of them an archive of all the training materials is still available.

The review begins with a set of courses from Johns Hopkins University on Coursera that are combined into a single “Data Science Specialization”, so it makes sense to consider them separately from the rest. It comprises 9 official specialization courses plus two additional ones, Mathematical Biostatistics Boot Camp 1 and 2, which are not officially part of the specialization. It is important to note that the whole set of courses restarts regularly, so in general you can flexibly build your own schedule for progressing through the specialization. Most courses last 4 weeks. R is the main programming language in this set of courses. The following is a list of courses from the Data Science Specialization from Johns Hopkins University:

- The Data Scientist's Toolbox is a basic course in the specialization and gives an overview of the various tools used by data analysis specialists. The amount of material is small, and the course can be completed in 3-4 hours.
- R Programming is a basic course in the specialization and is devoted to the basics of working with the R programming language.
- Getting and Cleaning Data is also a basic course in the specialization and is devoted to the very important topic of preparing and processing raw input data for further analysis.
- Exploratory Data Analysis - the course is devoted to exploratory data analysis and data visualization using the R language and popular visualization packages such as lattice and ggplot2.
- Reproducible Research - This course covers the important data analysis topic of reproducible research. It looks at the knitr package for the R language, as well as the R Markdown markup language.
- Statistical Inference - formally, the course is devoted to statistical inference, but in fact it is a course on the basics of statistics and probability theory. The material is presented in a very rushed and chaotic form. One of the most controversial courses in this specialization. I hope that the course will be seriously revised in future versions.
- Regression Models - The course focuses on regression analysis. There are also concerns about how the material is presented in this course, and I hope that its creators will pay attention to students' comments and seriously revise it in the future.
- Practical Machine Learning - This course focuses on the basics of machine learning.
- Developing Data Products - a course devoted to the development of modern data analysis products. Popular frameworks such as Shiny and Slidify are covered.
- Mathematical Biostatistics Boot Camp 1 - the first part of the biostatistics course from Johns Hopkins University. It is an unofficial addition to the Data Science Specialization and covers the basics of statistics and probability theory well.
- Mathematical Biostatistics Boot Camp 2 - the second part of the biostatistics course from Johns Hopkins University. It is an unofficial addition to the Data Science Specialization and covers the basics of statistics and probability theory well.

Next, we’ll look at courses that will help improve the general skills needed for a data analyst:

- Intro to Hadoop and MapReduce (Udacity) - This course focuses on the basics of working with Hadoop and large datasets.
- Data Wrangling with MongoDB (Udacity) - this course focuses on working with data in MongoDB, a popular NoSQL database.
- Programming Foundations with Python (Udacity) - The course focuses on the basics of the Python programming language, which is rapidly gaining popularity among data analysts.
- Introduction to Databases (Coursera - Stanford University) - the course covers working with relational data sources, as well as with other popular data storage formats (XML, JSON).

Now let's move on to courses on probability theory and statistics. Of course, knowledge of these disciplines will be useful to anyone who claims to be a data analysis specialist. In some cases, the division of courses into categories is quite arbitrary, since many courses cover various aspects related to data analysis. The following is a list of courses in this category:

- Probability and Statistics (Khan Academy) is an excellent set of introductory materials on statistics and probability theory from Khan Academy.
- Case-Based Introduction to Biostatistics (Coursera - Johns Hopkins University) - the course provides in an accessible form the basics of statistics and probability theory with examples from biostatistics.
- Probabilistic Graphical Models (Coursera - Stanford University) is a course on probabilistic graphical models, such as Bayesian and Markov networks, that builds on probability theory.
- Statistics: Making Sense of Data (Coursera - University of Toronto) is another course on the basics of statistics.
- Data Analysis and Statistical Inference (Coursera - Duke University) is an excellent course in data analysis, which provides an overview of the basics of probability theory and statistics.
- Statistics One (Coursera - Princeton University) is a good course on the basics of statistics. The material is presented at an accessible level and requires no special prior knowledge.
- Statistics in Medicine (Stanford Online) - the basics of statistics, illustrated with examples from medicine.
- Statistics for Medical Professionals (CME) (Stanford Online) - the basics of statistics, illustrated with examples from medicine.
- Stat_2.1x - Introduction to Statistics: Descriptive Statistics (edX - BerkeleyX) is the first part of a series of courses on statistics and probability theory. The first part is devoted to descriptive statistics.
- Stat_2.2x - Introduction to Statistics: Probability (edX - BerkeleyX) - The second part of a series of courses on statistics and probability theory. The second part is devoted to the basics of probability theory.
- Stat_2.3x - Introduction to Statistics: Inference (edX - BerkeleyX) - the third part of a series of courses on statistics and probability theory. The third part is devoted to statistical inference.
- 6.041x Introduction to Probability - The Science of Uncertainty (edX - MITx) - A course on probability theory from MIT.
- Explore Statistics with R (edX - KIx) is a new course on working with the statistical programming language R. The first session of this course begins on September 9, 2014.
- Intro to Statistics (Udacity) is another course on the basics of statistics.
- Statistics (Udacity) is a fairly simple course in probability theory and statistics.

The following is a list of courses that focus on various aspects of the topic of data analysis, such as machine learning, natural language processing, neural networks, recommendation systems, social network analysis, artificial intelligence and others:

- Data Analysis (Coursera - Johns Hopkins University) - an 8-week course in data analysis using the R language.
- Introduction to Data Science (Coursera - University of Washington) - The course lasts 8 weeks. One of the most popular online courses on the basics of Data Science.
- Machine Learning (Coursera - University of Washington) is a great machine learning course from the University of Washington that lasts 10 weeks.
- Machine Learning (Coursera - Stanford University) is one of the most famous Machine Learning courses, taught by Stanford University professor Andrew Ng. The course lasts 10 weeks. It is quite simple and clear and requires no special knowledge to complete successfully, yet it covers quite a lot of areas of Machine Learning.
- Natural Language Processing (Coursera - Stanford University) is one of the most popular online natural language processing courses from Stanford University.
- Introduction to Recommender Systems (Coursera - University of Minnesota) - an introduction to recommender systems. The course cannot be called thoroughly polished, but there are not many courses on this topic, so it may be of interest to those who work with recommender systems.
- Neural Networks for Machine Learning (Coursera - University of Toronto) - a course on the use of neural networks in machine learning.
- Natural Language Processing (Coursera - Columbia University) is another course on natural language processing.
- Social Network Analysis (Coursera - University of Michigan) - The course focuses on the popular topic of social network data analysis.
- Statistical Learning (Stanford Online) - A course on the basics of supervised machine learning.
- SABR101x Sabermetrics: Introduction to Baseball Analytics (edX - BUx-Boston University) - This course explains many aspects of Data Science and Big Data through the analysis of sports statistics (in this case, baseball).
- PH525x Data Analysis for Genomics (edX - HarvardX) is a fairly simple course on data analysis.
- 15.071x The Analytics Edge (edX - MITx) - A course with excellent material on data analysis and machine learning.
- Learning From Data (edX - CaltechX) is one of the best machine learning courses. It explains many machine learning topics in an accessible way.
- CS188.1x Artificial Intelligence (edX - BerkeleyX) is probably one of the most interesting online courses on artificial intelligence. The course uses the Python programming language.
- Intro to Data Science (Udacity) - An introduction to Data Science by Udacity.
- Machine Learning 1 — Supervised Learning (Udacity) is the first part of a series of machine learning courses from Udacity. The first part is devoted to the topic of supervised learning.
- Machine Learning 2 — Unsupervised Learning (Udacity) is the second part of a series of machine learning courses from Udacity. The second part is devoted to the topic of unsupervised learning.
- Machine Learning 3 — Reinforcement Learning (Udacity) is the third part of a series of machine learning courses from Udacity. The third part is devoted to the popular machine learning technique Reinforcement Learning.
- Exploratory Data Analysis (Udacity) - a course on data visualization using the R language.
- Artificial Intelligence for Robotics (Udacity) - an introduction to artificial intelligence programming, using a self-driving car as the example.
- Intro to Artificial Intelligence (Udacity) - A course on the basics of artificial intelligence.
- CS109 Data Science (Harvard) - a video lecture course on the basics of Data Science from Harvard Extension School.

Previous issue: Overview of the most interesting materials on data analysis and machine learning No. 2 (June 16 - 23, 2014)