
Overview of the most interesting materials on data analysis and machine learning No. 21 (November 3 - 9, 2014)

Here is the next issue of the review of the most interesting materials on data analysis and machine learning.
General
Machine Learning as a Service - Free and in the Cloud
Microsoft Azure Big Data
How we built analytics for a high-load website
DeepMind creates a computer that mimics human short-term memory
3 questions to answer when choosing a Data Science program
A useful article listing 3 important questions, with commentary, that you should answer when choosing a Data Science training program.
22 Data Science Tips
In this article you can find 22 Data Science tips from Vincent Granville, a renowned data scientist and the creator of the Data Science Central portal.
Flexibility of the data model
Some thoughts on flexibility, an important property of any data model.
Open problems in working with data at Facebook
A post from Facebook's company blog describing various unresolved problems and questions the company faces in working with data.
10 recommendations for implementing Big Data principles
10 useful recommendations from the popular Big Data Analytics News portal.
Launching R in the Azure ML Cloud
A short article about running R in Microsoft's Azure ML cloud.
Machine learning theory and algorithms, code examples
How to take control of a huge list of machine learning algorithms
The author of the popular MachineLearningMastery blog gives some tips to help you make sense of the large number of different machine learning algorithms.
Hello World machine learning
Another great article from the author of the MachineLearningMastery blog that will interest beginners and help them navigate the huge number of machine learning algorithms.
Clustering and the distributed computing model
An overview of various clustering methods and of running these clustering algorithms on a distributed computing model.
Outlier Detection - Using Machine Learning to Detect Anomalies in Time Series Analysis
A post from the Microsoft TechNet Machine Learning blog about detecting anomalies in time series using machine learning and Azure ML.
Analysis of R code coverage by unit tests
A very interesting article on measuring unit test code coverage in R with the testCoverage library.
Tweet sentiment analysis using the AYLIEN Text Analysis API
Another interesting article on text analysis, in this case on analyzing the text of tweets.
Introduction to neural networks
Another article covering the basics of neural networks, a topic that is both interesting and popular right now.
Intuition behind regularized logistic regression
A short article to help you better understand regularized logistic regression.
Introduction to principal component analysis
A short, good article on the basics of principal component analysis.
The importance of the baseline result
The author of the MachineLearningMastery blog explains what a baseline result is and why it is important.
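The baseline idea above can be made concrete with the simplest possible baseline, often called Zero Rule: always predict the majority class (for classification) or the mean of the training targets (for regression). The sketch below is our own illustration, not code from the linked article; the function names and the toy spam/ham labels are hypothetical. Any real model should beat the score this baseline produces.

```python
from collections import Counter
from statistics import mean


def zero_rule_classification(train_labels, n_test):
    """Baseline: predict the most frequent training label for every test row."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test


def zero_rule_regression(train_targets, n_test):
    """Baseline: predict the mean of the training targets for every test row."""
    return [mean(train_targets)] * n_test


def accuracy(actual, predicted):
    """Fraction of predictions that match the true labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical toy data: 3 of 5 training labels are "ham".
train = ["spam", "ham", "ham", "ham", "spam"]
test = ["ham", "spam", "ham", "ham"]
preds = zero_rule_classification(train, len(test))
print(preds)                  # ['ham', 'ham', 'ham', 'ham']
print(accuracy(test, preds))  # 0.75
```

Here a model would need more than 75% accuracy on this test set before it adds anything over simply guessing the majority class.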
Machine Learning Competitions
First Place: The Hunt for Prohibited Content
An interview with the winners of the Avito.ru "The Hunt for Prohibited Content" machine learning competition on Kaggle.
Runner-up: The Hunt for Prohibited Content
An interview with the runners-up in the Avito.ru "The Hunt for Prohibited Content" machine learning competition on Kaggle.
Online courses, training materials and literature
Mining of Massive Datasets website
This site has links to the book of the same name and to various online courses on the topic.
Videos
Hadley Wickham: An Introduction to dplyr (Part 1)
Hadley Wickham's talk at useR! 2014 on the popular dplyr library for the R programming language.
Hadley Wickham: An Introduction to dplyr (Part 2)
Part two of Hadley Wickham's talk at useR! 2014 on the popular dplyr library for the R programming language.
Data engineering
HighLoad++ 2014: The architecture of a modern distributed object storage, using LeoFS as an example (Alexander Chistyakov, Git in Sky)
Another talk from the HighLoad++ 2014 conference for developers of high-load systems: Alexander Chistyakov of Git in Sky explains the architecture of a modern distributed object storage, using LeoFS as an example.
HighLoad++ 2014: Sharding: patterns and antipatterns (Konstantin Osipov, Alexey Rybak)
Slides from another interesting talk, "Sharding: patterns and antipatterns", which opened the HighLoad++ 2014 conference.
Using Apache Spark and Neo4j to analyze large graphs
An article on using the popular Apache Spark and Neo4j to work with large graphs.
Netflix Dynomite: Making Non-Distributed Databases Distributed
An interesting article about Dynomite, Netflix's open-source solution.
Flafka: Apache Flume and Apache Kafka for event processing
Earlier issues of this review have already linked to several materials on Apache Kafka; this time it is an interesting article from the Cloudera company blog about using Apache Kafka together with Apache Flume for event processing.
NoSQL in the Hadoop World
An interesting article from Cloudera's blog about NoSQL in the Hadoop world.
Near-real-time sessionization with Spark Streaming and Apache Hadoop
An interesting article from Cloudera's blog about applying Spark Streaming to group events into user sessions.
Three tips for modeling data in a document-oriented database (Part 1)
The first part of a series of articles on data modeling in a document-oriented database.
10 tips for data modeling in the world of relational and NoSQL stores
A short article with 10 tips for data modeling across relational and NoSQL stores.
Introduction to Hadoop MapReduce
An article that clearly explains the basic concepts of Hadoop MapReduce.
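To illustrate the basic MapReduce model covered by the Hadoop article, here is a minimal single-process Python sketch of the classic word-count job: a map step emits (word, 1) pairs, a shuffle step groups values by key, and a reduce step sums each group. It only simulates the data flow in memory and does not use Hadoop's actual API; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key, here by summing the counts."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, then shuffle, then reduce over the input lines in one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    # Sorting keys mimics Hadoop handing each reducer its keys in sorted order.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the cat sat", "the cat ran"]))
    # {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```

In a real cluster the map and reduce functions run on many machines and the framework performs the shuffle over the network; the in-memory version only shows why the three steps compose into a correct count.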
Reviews
Best Resources of the Week from Data Elixir
A collection of links to various data analysis materials gathered by Data Elixir over the past week.
DataScienceCentral Weekly Digest
The regular weekly data analysis digest from DataScienceCentral.
Digest of the best resources from DataScienceCentral
A good list of fresh, interesting articles and resources from DataScienceCentral.
10 best materials of the week
The 10 best materials of the week on Data Science from the Data Science Report portal.
The best materials of the week (October 26 - November 1)
The best materials of the week on data analysis from the KDnuggets portal.
Data Mining News
A short list of interesting Data Mining resources dated November 5.
The most interesting materials from Freakonometrics No. 181
A collection of the most interesting materials from the popular Freakonometrics blog.
The most interesting materials from Freakonometrics No. 180
A collection of the most interesting materials from the popular Freakonometrics blog.
The most interesting materials on High Scalability
An overview of the most interesting recent materials from the popular High Scalability portal.
Previous issue: Overview of the most interesting materials on data analysis and machine learning No. 20 (October 27 - November 2, 2014)