# Overview of the most interesting materials on data analysis and machine learning No. 12 (September 1 - 8, 2014)

Here is the next issue of my review of the most interesting materials on data analysis and machine learning. This issue turned out to be quite large and contains a lot of material on Data Engineering. More and more materials from the KDD 2014 conference are appearing. As usual, there are articles about various machine learning competitions, including the recent ImageNet Large Scale Visual Recognition Challenge (ILSVRC). There are also quite a few code examples in R and Python. Finally, there is a mention of what seems to me a very interesting online course, "Introduction to Computational Finance and Financial Econometrics".

## Data Analysis and Machine Learning Materials

- Analysis of ILSVRC results

Analysis of the results of the recent ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the annual image processing competition in which the Google team took first place.

- Guide to Data Modeling in MongoDB

The Daprota Data Modeling Adviser for MongoDB, a very useful guide to modeling data in the NoSQL database MongoDB, was recently published on the company's website.

- AVITO.ru competition on Kaggle

The author describes the experience gained from participating in the AVITO.ru competition on Kaggle and analyzes the various approaches that other participants used to solve the problem.

- A framework for building a dictionary for text analysis

A continuation of a series of articles on text analysis and working with unstructured data. In this article, the author discusses possible approaches to building a dictionary for text data analysis.

- Improving image processing algorithms

A short article about the annual image processing competition, in which the team from Google won first place, doubling last year’s result.

- “Introduction to Computational Finance and Financial Econometrics” online course

An online course recently started on Coursera that will be useful to those who are interested in statistics and the R programming language, as well as to those who are interested in applying statistical methods in the financial sector.

- About linear regression in simple language

A brief introduction to linear regression, written in fairly simple language.

- Stinger.next: Improved SQL with Hadoop and Hive

An article from the Hortonworks blog about plans for the new Stinger.next product, which is expected to significantly improve many quality metrics of SQL queries when working with Hadoop.

- Using a graph database for text analysis

An example of using the Neo4j graph database and Graphify to classify text with a Deep Learning algorithm.

- Slides from the KDD 2014 Conference

Slides from several talks at KDD 2014.

- Introduction to Machine Learning Studio for Microsoft Azure ML

An article about Machine Learning Studio, the tool for working with Microsoft Azure ML, the new cloud-based machine learning product.

- Deep Learning at Google

A short news article about Google’s progress in machine learning, specifically Deep Learning. The article does not go into the technical details of how Deep Learning algorithms are implemented.

- ShinyTree: jsTree + shiny

A short example of rendering a tree using the shinyTree library for the R programming language and the jsTree JavaScript library.

- Creating an Excel document using Python and Pandas

Sample code that demonstrates creating an Excel document using the Python programming language and the Pandas library.

- NoSQL Trends: August 2014

Current trends for the major NoSQL systems, based on data from various online recruitment sites (Indeed, SimplyHired).

- My favorite graphs

The author describes several types of graphs that make it possible to visualize different kinds of source data simply and clearly.

- Video lectures from the Big Data, Large Scale Machine Learning course

Video materials from the Big Data, Large Scale Machine Learning course, which took place in 2013 and lasted 14 weeks, with Yann LeCun and John Langford as the main instructors.

- Sampling error and non-sampling error

A short article that clearly explains the difference between two concepts: sampling error and non-sampling error.

- Machine Learning with R

The author of the MachineLearningMastery blog explains how to quickly start applying machine learning algorithms in the R programming language.

- An exciting year for Apache Spark

A short article on how the popularity of Apache Spark has evolved over the past year.

- How to translate MapReduce queries into Apache Spark

A useful article from the Cloudera blog on how to translate MapReduce queries into Apache Spark, which is gaining in popularity, and on how the concepts in the two approaches differ.

- What is Big Data?

More than 40 experts answer the question "What is Big Data?" on the Berkeley blog.

- Assessing the accuracy of a predictive model using R Caret

5 methods for evaluating the accuracy of a predictive model that are available in the Caret machine learning library for the R programming language, described by the author of the popular MachineLearningMastery blog.

- Hadoop Ecosystem Digest

A collection of the best materials for August on the Apache Hadoop ecosystem from Cloudera’s blog.

- Introduction to Predictive Analytics

The first part of a new series of articles from the insideBIGDATA portal, this time on the topic of Predictive Analytics.

- Using Google Charts in R Markdown

A short article that gives an example of using Google Charts in R Markdown documents.

- Who is a Data Scientist?

In my view, a good attempt to describe what a Data Scientist does.

- Using templates in D3.js

An article on using templates in D3, the popular JavaScript visualization library.

- 6 types of Data Scientist activities

An interesting article about 6 different areas of activity that a Data Scientist has to deal with in day-to-day work.

- 9 Tips for Choosing a NoSQL Storage (Part 1)

The first part of a series of articles on how to choose the right NoSQL storage.

- High Performance Content Overview

A weekly digest of the most interesting high performance content from the popular HighScalability portal.

- Apache Pig with Apache Spark

An interesting article from the Cloudera blog about using Apache Pig with Apache Spark.

- Cumulative frequency diagram in R

An example of constructing a cumulative frequency diagram using the R programming language and the ggplot2 library.

- Image analysis using EBImage

An example of working with images using the R programming language and the EBImage library.

- 5 ways to create two-dimensional diagrams in R

5 examples of creating two-dimensional diagrams using the R programming language.

- A few words about “linear” regression

An interesting article on linear regression with examples in the R programming language.

- Working with MongoDB from R

A useful and very timely article on how to work with the NoSQL database MongoDB from the R programming language.

- Machine Learning and Data Analysis Newsletters

It is often difficult to keep track of all the news in data analysis and machine learning. The author of the popular MachineLearningMastery blog offers a short list of newsletters that make it easier to follow the latest news in Data Science.

- Notifications in R

Example code that lets you receive a notification when a script in the R programming language has finished running.

- Notifications about errors in R

Another code example, this one for sending notifications when errors occur while a script in the R programming language is running.

- Statistical modeling versus machine learning

An interesting comparison of statistical modeling and machine learning.

- Neural networks step by step

A good illustrated example of how a neural network works.

- Interesting datasets

Several different social media datasets with a short description of each.

- An example of a decision tree implementation

An example of a decision tree implementation in the Python programming language.

Previous issue: Overview of the most interesting materials on data analysis and machine learning No. 11 (August 25 - September 1, 2014)