
Overview of the most interesting materials on data analysis and machine learning No. 16 (September 29 - October 5, 2014)

Here is the next issue of the review of the most interesting materials on data analysis and machine learning.
General
Using the Data-Driven Approach in Machine Learning
Another interesting article from the MachineLearningMastery blog, this time about the available ways to improve the efficiency of machine learning algorithms.
Introduction to machine learning for developers
A good introduction to machine learning for developers, which covers many aspects needed for working with machine learning algorithms.
Top 30 Data Science Blogs
A ranking of the best Data Science blogs by DataScienceCentral.
Improving Machine Learning Skills
Several helpful tips from the author of the MachineLearningMastery blog on improving your machine learning skills.
How to successfully pass an interview for a position in the field of Data Science
An interesting and useful article that will help you prepare for an interview for a Data Science position.
Vowpal Wabbit Modules in Azure ML
A follow-up post on the Microsoft TechNet Machine Learning blog about Vowpal Wabbit in Azure ML, Microsoft's cloud machine learning service.
22 Skills Required by Data Scientist
An interesting article by Vincent Granville on the popular DataScienceCentral portal about the skills that a data scientist needs.
First week of Stanford's Machine Learning course
The author shares his impressions of the first week of the popular machine learning course from Andrew Ng and Stanford University, a new session of which recently started on Coursera.
Theory and algorithms of machine learning, code examples
Naive Bayes and Text Classification (Part 1)
On the computational complexity of MapReduce
A good article on the theoretical foundations of the MapReduce programming model.
Introduction to neural networks
A rather lengthy article from the blog of Andrej Karpathy (CS PhD student at Stanford), in which the author talks about machine learning and neural networks, gives code examples, and says that the article will be supplemented with new material over time.
Using machine learning and NodeJS to determine the sex of Instagram users
A good example of a predictive model based on neural networks that determines the sex of Instagram users from various input parameters, implemented with NodeJS.
Introduction to support vector machines
A useful article from the Analytics Vidhya blog that describes how Support Vector Machines work in fairly simple language.
Evaluation of the effectiveness of a binary classification system
A brief introduction to evaluating the effectiveness of binary classification systems.
miniCRAN: your own package repository
An article that briefly covers the miniCRAN package for the R programming language, which allows you to create your own package repository.
Running RStudio in the cloud
An article on how to quickly and easily launch RStudio in a browser using a cloud solution and Docker.
Plotting several variables on a line chart in ggplot2
A small practical example of plotting several variables on a line chart using the R programming language and the ggplot2 package.
Machine Learning Competitions
Interview with Diogo Ferreira
A useful interview on the MachineLearningMastery blog with Diogo Ferreira, a successful participant in machine learning competitions.
A simple model for Kaggle “Bike Sharing Demand”
A description of a fairly simple model for the “Bike Sharing Demand” machine learning competition on Kaggle, with examples in the R programming language.
Online courses, training materials and literature
The Mining Massive Datasets online course started
On September 29, 2014, an online course that has attracted a lot of attention started on Coursera: Mining Massive Datasets from Stanford University.
The Field Guide to Data Science book
A brief description and a free version of an interesting book on the basics of Data Science, The Field Guide to Data Science.
Announcement of the book “Practical Data Science Cookbook”
A short announcement of the rather interesting book “Practical Data Science Cookbook”.
Reading List (October)
A list of books from the blog of Dave Giles (Professor of Economics at the University of Victoria) that the professor believes may be interesting to read.
Book “Getting Started with Impala”
An announcement of the interesting book “Getting Started with Impala” on Cloudera’s blog.
Videos
Martin Maechler on practicing good R code
Martin Maechler (a member of the R-Core team) gave an interesting presentation at the useR! 2014 conference. In this video he talks about good coding practice, both in the R programming language specifically and in terms of the best techniques and practices in programming in general.
Materials from the meeting “PostgreSQL 9.4 news and something else”
An interesting meeting dedicated to the PostgreSQL DBMS recently took place at the Yandex office, and videos from it have now been published.
Nando de Freitas on decision trees
An excellent lecture on decision trees by Professor Nando de Freitas of the University of British Columbia.
Jürgen Schmidhuber about Deep Learning
An interesting video in which Professor Jürgen Schmidhuber of IDSIA (Dalle Molle Institute for Artificial Intelligence Research) talks about the history of Deep Learning and the current renewed interest in this machine learning method.
Data engineering
Using Pinot for real-time analytics
An interesting article on LinkedIn's blog about the architecture of their real-time analytics solution, which uses a proprietary product called Pinot.
NoSQL storage performance test results
A fresh and interesting comparison of the performance of various NoSQL stores (Apache Cassandra, MongoDB, Couchbase) under various load profiles.
Scalable Decision Trees in Apache Spark
Continuing the discussion of the new Apache Spark 1.1 release, this article covers decision trees and how they scale in the MLlib machine learning library.
ForestDB Beta Announcement
Announcement of the new open-source ForestDB key-value store from the creators of Couchbase.
What is Apache Storm
An article that briefly describes Apache Storm.
Reviews
DataScienceCentral Weekly Digest
The regular weekly data analysis digest from DataScienceCentral.
Top Nuit Blanche Content (September)
The best September content from the popular Nuit Blanche blog.
Hadoop Weekly: Weekly Review No. 89 (September 28)
Weekly Hadoop ecosystem news and content.
Hadoop Weekly: Weekly Review No. 88 (September 21)
Weekly Hadoop ecosystem news and content.
The most interesting materials from Freakonometrics No. 170
A collection of the most interesting materials from the popular Freakonometrics portal.
The most interesting materials from Freakonometrics No. 169
A collection of the most interesting materials from the popular Freakonometrics portal.
The most interesting materials from Freakonometrics No. 168
A collection of the most interesting materials from the popular Freakonometrics portal.
The most interesting materials on High Scalability
An overview of the most interesting materials from the popular High Scalability portal.
Previous issue: Overview of the most interesting materials on data analysis and machine learning No. 15 (September 22 - 28, 2014)