
Overview of the most interesting materials on data analysis and machine learning No. 17 (October 6 - 12, 2014)

Here is the next issue of the review of the most interesting materials on data analysis and machine learning.
General
Data mining makes scientific discoveries
A simple way to assess the intelligibility of a text in Russian
16 options for developing your Data Science skills
An excellent article from the author of the MachineLearningMastery blog, offering many different directions in which a data analysis specialist can develop.
Introduction to Big Data in the financial sector (Part 3)
The third part of a series of articles from the insideBIGDATA portal about the use of Big Data in banking and finance. This part covers topics such as Credit Scoring and Back Trading / Testing.
How to start an analytics career
A useful article from the Analytics Vidhya blog with a list of resources and a set of practical tips for those interested in a career in data analysis.
Introduction to In-Memory Computing (Part 3)
Continuation of a series of articles on In-Memory Computing from insideBIGDATA. This part discusses the types of In-Memory Computing.
Role of Julia in Data Science
An interesting article about the Julia programming language and its role in the field of data analysis.
7 things to keep in mind about Big Data
An interesting article from the Big Data Analytics News blog that lists 7 things to keep in mind before introducing Big Data technologies.
Azure ML Helps CMU Use Electricity More Efficiently
An interesting post from the Microsoft TechNet Machine Learning blog on how Microsoft's new cloud-based product Azure ML helps Carnegie Mellon University (CMU) use electricity more efficiently.
Why R is better than Excel for data analysis
A useful post from Fantasy Football Analytics describing the advantages of the R programming language over Excel for data analysis.
Microsoft Prediction Lab
A short post from the Microsoft TechNet Machine Learning blog about the Microsoft Prediction Lab.
The 200 best DataScienceCentral bloggers
The 200 best data analysis bloggers from the popular DataScienceCentral portal.
Theory and algorithms of machine learning, code examples
Working with Data Frame in R
A good article about manipulating Data Frame objects in the R programming language, from the basics to using the dplyr library.
Introduction to Feature Selection
Another interesting and useful article from the author of the MachineLearningMastery blog, this time about Feature Selection, an important step in the machine learning process.
Introduction to the k Nearest Neighbors Method
A fairly simple description of the k nearest neighbors method from the Analytics Vidhya blog.
Machine Learning Competitions
Avito.ru-2014 competition: recognizing contact information in images
A competition to solve an applied problem in image analysis.
Tradeshift Text Classification Machine Learning Contest
A new machine learning contest, Tradeshift Text Classification, has begun on the Kaggle website.
Online courses, training materials and literature
The online course “Social Network Analysis” has started
The online course “Social Network Analysis” recently began on Coursera. It is devoted to the analysis of social networks and may be interesting and useful to many.
The free book “DBA's Guide to NoSQL”
In an article on the DataStax blog, Robin Schumacher announced that his short but quite interesting book “DBA's Guide to NoSQL” is now freely available. It may be of interest to those new to NoSQL storage.
Review of the book "Modern Optimization with R"
A review from the KDnuggets portal of the new book "Modern Optimization with R", which is devoted to working effectively with the R programming language.
Announcement of the second edition of the book “Doing Bayesian Data Analysis”
Announcement of the second edition of the interesting book “Doing Bayesian Data Analysis”, which will be released shortly.
Review of the book "Monte Carlo simulation and resampling methods for social science"
Another review, this time of the interesting book "Monte Carlo simulation and resampling methods for social science". The book's examples use the R programming language.
Review of the book “Analytics in a Big Data World”
A short review of “Analytics in a Big Data World”, a curious book on data analysis.
Videos
Materials from the meeting “Moscow Cassandra Meetup at Yandex”
On October 4, a meeting dedicated to the popular Apache Cassandra data store was held at the Yandex office. This post contains videos from the meeting.
Ruslan Salakhutdinov on Deep Learning at the KDD 2014 conference
A post about an interesting talk by Ruslan Salakhutdinov of the University of Toronto on applying machine learning, specifically Deep Learning.
Data engineering
Storage systems: how to choose?!
Meeting "PostgreSQL in Avito.ru"
Announcement of a meeting dedicated to the PostgreSQL DBMS, to be held in Moscow.
Apache Spark broke the previous record for the speed of sorting a large amount of data
An article from the Databricks blog describing the results of performance tests in which Apache Spark sorted a large amount of data.
7 most popular APIs in the field of Big Data (part 1)
This series of articles focuses on various options for working with big data.
7 most popular APIs in the field of Big Data (part 2)
The second part of a series of articles on various options for working with big data.
The Story of Apache Storm
Nathan Marz, the author of Apache Storm, published a very interesting article on his blog about the history of the emergence and development of Apache Storm.
How to choose a data warehouse
A short article on how to choose the right data warehouse for a specific task.
Cloudera Live Service
A useful service from Cloudera called Cloudera Live, which helps beginners quickly learn how to work with the Hadoop ecosystem.
What is Write Concern in MongoDB?
An article whose author briefly describes the various write modes of the MongoDB NoSQL database.
Announcement of Couchbase Server 3.0
Announcement of the release of a new version of one of the most popular NoSQL data stores.
Reviews
Data Mining News
A short list of interesting Data Mining resources for October 8.
DataScienceCentral Weekly Digest
The regular weekly data analysis digest from DataScienceCentral.
Digest of the best resources from DataScienceCentral
A good list of fresh, interesting articles and resources from DataScienceCentral.
The best materials of the week (September 28 - October 4)
The best materials of the week on data analysis from the KDnuggets portal.
The best materials for September
The best materials for September on data analysis from the KDnuggets portal.
The 10 best materials of the week
The 10 best materials of the week on Data Science from the "Data Science Report" portal.
The most interesting materials from Freakonometrics No. 172
A collection of the most interesting materials from the popular Freakonometrics portal.
The most interesting materials from Freakonometrics No. 171
A collection of the most interesting materials from the popular Freakonometrics portal.
The most interesting materials on High Scalability
A review of the most interesting materials on high scalability from the popular High Scalability portal.
Best materials: NoSQL Zone (October 3 - 9)
A collection of the best NoSQL materials from the popular DZone portal.
Previous issue: Overview of the most interesting materials on data analysis and machine learning No. 16 (September 29 - October 5, 2014)