
Overview of the most interesting materials on data analysis and machine learning No. 19 (October 20 - 26, 2014)

Here is the next issue of the review of the most interesting materials on data analysis and machine learning.
General
IBM announces new Watson technology implementation projects and Watson Group headquarters in New York
Results of the Russian AI Cup 2014
How to make data speak
The role of big data in private investigations and analysis
Yandex.Maps can now create heat maps
HDConf Conference: Photo-Video-Slide Report
50+ libraries for face recognition
More than 50 libraries and face detection/recognition APIs that you can use in your applications.
Introduction to Big Data in the financial sector (Part 5)
The fifth and final part of a series of articles from the insideBIGDATA portal on the use of Big Data in banking and finance.
Popular questions at an interview for an analyst position (Part 2)
The second part of a series of articles covering popular questions asked at interviews for analyst positions.
New Java Machine Learning Library
An article discussing the pros and cons of Datumbox, a new open source machine learning framework for Java.
MIT Scientists Can Predict Bitcoin Value
An article about a group of MIT scientists who built a regression model to predict short-term fluctuations in the bitcoin exchange rate, which allowed them to double their investment in two months.
Introduction to In-Memory Computing (Part 4)
The continuation of a series of articles on In-Memory Computing from insideBIGDATA. This part focuses on measuring the performance of In-Memory Computing.
Introduction to In-Memory Computing (Part 5)
The fifth and final part of a series of articles on In-Memory Computing from insideBIGDATA. This part focuses on the GridGain Data Fabric product.
SQL or NoSQL?
Another short article with the author's thoughts on a currently popular question: which technology to choose for a data warehouse.
Information Search with Apache Lucene and Tika (Part 1)
The first part of a series of articles on information search with Apache Lucene and the Tika library.
Information Search with Apache Lucene and Tika (Part 2)
The second part of a series of articles on information search with Apache Lucene and the Tika library.
Information Search with Apache Lucene and Tika (Part 3)
The third part of a series of articles on information search with Apache Lucene and the Tika library.
15 timeless articles on Data Science
A list of 15 articles from the DataScienceCentral portal that were published 1-2 years ago but have not lost their popularity or relevance.
Theory and algorithms of machine learning, code examples
Benford's law and distributions falling under it
Markov random fields
How to master machine learning algorithms
5 great tips from the author of the MachineLearningMastery blog on how to approach studying machine learning algorithms.
Nonlinear regression
A fairly simple description of the concept of nonlinear regression.
First look at Distributed R
A short note about a very interesting project from HP Labs called Distributed R.
How MKL speeds up Revolution R Open
The previous review linked to the announcement of Revolution R Open. This article covers the implementation details of this R distribution, namely how it speeds up some operations using the Intel Math Kernel Library (MKL).
Text Analysis with RapidMiner (Part 1)
The first part of a series of articles on text analysis with RapidMiner.
Text Analysis with RapidMiner (Part 2)
The second part of a series of articles on text analysis with RapidMiner.
Introduction to Neural Networks (Part 2)
A fairly simple description of how neural networks work, from the Analytics Vidhya blog.
Machine Learning Competitions
What is a Data Hackathon?
An interesting video about a data hackathon held in mid-September under the auspices of MIT.
How to choose a model for the final evaluation in a Kaggle competition
A very useful article from a participant in machine learning competitions on how to choose the model for the final evaluation in a Kaggle competition.
Tips for choosing a model in a machine learning competition
Continuing the previous topic of choosing a final model in a machine learning competition, this is the view of the author of the popular MachineLearningMastery blog.
Online courses, training materials and literature
Online course “Data visualization. The Basics”
New MIT Big Data courses on edX
Some time ago, MIT announced an initiative on its website: the first session of MIT Professional Education's Tackling the Challenges of Big Data course on edX, which will be open to everyone.
3 Great Free Books on Data Science
A set of three books on Data Science, available for free, with a short description of each.
Book "Data Fluency"
A review of the interesting new book "Data Fluency" from its authors.
Books Foundations of Signal Processing and Fourier and Wavelet Signal Processing
A short note from the author of the popular blog Nuit Blanche about two interesting books, Foundations of Signal Processing and Fourier and Wavelet Signal Processing.
Videos
Scaling fuzzy search algorithms
An interesting talk by Ken Krugler (President, Scale Unlimited) at the Cassandra Summit 2014 conference on scaling fuzzy matching, here used to compare the degree of similarity of customer data in the banking sector, with Apache Cassandra.
Using Apache Spark for working with data
This post presents a set of videos dedicated to Apache Spark.
Data engineering
Microsoft DocumentDB: Article One, Introduction
Microsoft DocumentDB: Article Two, Resources and Concepts
Kylin by eBay
An interesting open source product from eBay called Kylin, a distributed analytics engine with a SQL interface and OLAP on Hadoop.
Hadoop in the corporate sector
An interesting infographic about the use of Apache Hadoop in the corporate sector.
Apache Kafka stress testing on AWS
An interesting article that shows the results of stress testing Apache Kafka on AWS.
Sharding traps (Part 2)
The second part of a series of articles on the intricacies of sharding in the popular NoSQL database MongoDB.
Sharding traps (Part 3)
The third part of a series of articles on the intricacies of sharding in the popular NoSQL database MongoDB.
Reviews
DataScienceCentral Weekly Digest
The regular weekly data analysis digest from DataScienceCentral.
The best materials of the week (October 12 - 18)
The best materials of the week on data analysis from the KDnuggets portal.
Data Mining news
A short list of interesting Data Mining resources from October 8.
The most interesting materials from Freakonometrics No. 177
A collection of the most interesting materials from the popular Freakonometrics portal.
The most interesting materials from Freakonometrics No. 176
A collection of the most interesting materials from the popular Freakonometrics portal.
The most interesting materials from Freakonometrics No. 175
A collection of the most interesting materials from the popular Freakonometrics portal.
The most interesting materials on High Scalability
An overview of the most interesting materials from the popular High Scalability portal.
The best materials: NoSQL Zone (October 17 - 24)
A collection of the best materials on NoSQL from the popular DZone portal.
Previous issue: Overview of the most interesting materials on data analysis and machine learning No. 18 (October 13 - 19, 2014)