Overview of the most interesting materials on data analysis and machine learning No. 14 (September 15-21, 2014)
![](https://habrastorage.org/files/4cf/086/e49/4cf086e49a2b4f66a420085544e4c2e9.jpg)
I present to you the next issue of a review of the most interesting materials on data analysis and machine learning. I would also like to note that I have released the first digest on high performance and Data Engineering: Overview of the most interesting materials on high performance (September 15-21, 2014). I think it may be of interest to some of you as well.
General
KDD 2014: Google KV and Topic Modeling
The authors of the URX blog share their impressions of the recent KDD 2014 conference in New York. In particular, they discuss Google Knowledge Vault, a system Google actively uses to improve search quality, and another interesting topic, topic modeling.
Top 10 SlideShare Presentations on Data Science and Big Data
An article listing the 10 most-viewed SlideShare presentations on Data Science and Big Data.
cuDNN library for Deep Learning
An announcement of NVIDIA's library for Deep Learning algorithms. It runs computations on the GPU, which can make machine learning algorithms considerably more efficient.
Statistics vs. heuristics
The author's interesting thoughts on when it is reasonable to use heuristic approaches.
The "Effective Applications of the R Language" conference was held in London
The author of the blog "R: Analysis and Visualization of Data" reports on the Effective Applications of the R Language (EARL) conference, dedicated to the use of the R programming language.
Introduction to Predictive Analytics (Part 2)
The second part of a new series of articles on Predictive Analytics from the insideBIGDATA portal. This part focuses on the areas where Predictive Analytics is applied in corporate business.
Introduction to Predictive Analytics (Part 3)
The third part of a new series of articles on Predictive Analytics from the insideBIGDATA portal. It describes the basic approaches used in machine learning: regression and classification (supervised learning), as well as clustering.
Popular interview questions for an analyst position
A short article with several questions that are commonly asked in interviews for an analyst position.
Vincent Granville on Big Data
Vincent Granville of the DataScienceCentral portal shares his thoughts and defines the concept of Big Data.
How to succeed in Big Data
A short article with infographics on the main factors that influence a company's success in Big Data.
How to become a Data Scientist
A few tips on how to become a Data Scientist and succeed in the field of data analysis.
Support for R in Azure ML
A short article from the Microsoft TechNet Machine Learning blog about using R in the Azure ML cloud service.
5 key ideas for understanding Big Data
An interesting post from the Smart Data Collective portal describing 5 key points that will help you get the most out of your data.
Machine learning for trading (part 2)
A continuation of the topic of using machine learning for trading.
10 experts in machine learning
A list of 10 well-known people in the field of data analysis and machine learning.
Data Mining vs. (?) Data Science
Some more interesting thoughts on terminology.
Introduction to machine learning and a quick start with Azure ML
An interesting article describing the capabilities of Azure ML, Microsoft's new cloud-based machine learning product.
Machine Learning Competitions
Description of the Higgs Boson Machine Learning Challenge winning methodology
An interesting account by the winner of the Higgs Boson Machine Learning Challenge on Kaggle, in which he describes the approach that brought him success in this competition.
Kaggle in Class competition on decoding Morse code
A short post about a new Kaggle in Class competition called Morse Learning Machine - v1. Participants are expected to build a system that decodes Morse-encoded messages contained in audio files.
Microsoft Machine Learning Hackathon
A post from the Microsoft TechNet Machine Learning blog about the Microsoft Machine Learning Hackathon.
Online courses and training materials
New online course “Process Mining: Data science in Action” announced
A new Coursera online course on data analysis, “Process Mining: Data science in Action”, presented by Eindhoven University of Technology, was recently announced.
Literature
Forecasting: Principles and Practice available for free
Rob J Hyndman announced on his blog that his popular book Forecasting: Principles and Practice is now available online for free.
Theory and algorithms of machine learning, code examples
Visualization of GPS data
A good code example for visualizing data from a GPS device using the R programming language.
Configuring .Rprofile
This article covers a useful topic: configuring R startup parameters with the .Rprofile configuration file.
Visualizing data with R caret
The author of the MachineLearningMastery blog discusses the data visualization options in caret, a popular machine learning library for the R programming language.
Using R caret for predictive modeling
The author of the MachineLearningMastery blog discusses using the popular caret library for R for predictive modeling.
Improving a model with R caret
The author of the MachineLearningMastery blog discusses ways to improve a model with the caret library for R.
A series of slides on data analysis in R
In this slide set, Yanchang Zhao covers seven interesting data analysis topics, using R for the code examples.
Diagnostics of linear regression models. Part 1
The first part of a series of articles on diagnostics of linear regression models from the blog “R: Data Analysis and Visualization”. The code examples in the article are written in R.
Introduction to Probabilistic Programming
A fairly good introduction to probabilistic programming with code examples.
Sentiment analysis of movie reviews
An interesting example of text analysis, namely sentiment analysis of movie reviews, using the popular graph database Neo4j and the Java programming language.
Machine learning in production
Colin Ristig discusses an interesting and important question that is sometimes forgotten: running a machine learning algorithm in a production environment.
Deep Learning bibliography
A large, categorized list of scientific papers on Deep Learning, a popular machine learning approach.
Videos
Andrew Ng on Deep Learning
Andrew Ng of Stanford University gave an interesting talk on Deep Learning at the 2014 Robotics: Science and Systems conference.
Moscow Data Science. September 2014 Meetup
On September 5, I attended an interesting meetup, Moscow Data Science “September 2014 Meetup”, organized by Mail.ru. The link leads to the video of the meeting; for convenience, I have marked the start time and duration of each speaker's talk.
Data engineering
Who and how uses Hadoop
An interesting article about the current state of the Hadoop ecosystem: who uses it and how, as well as its development prospects.
Upcoming Data Science meetings in Moscow
Several interesting meetings are scheduled in the near future, so I decided to publish a short list of upcoming meetings on data analysis and high performance in Moscow.
10 ways to work with Hadoop via SQL queries
10 tools and approaches for working with Hadoop through SQL queries, with a short description of each.
Join us at HadoopKitchen
An announcement of a Hadoop meetup to be held at the Mail.ru office. I also plan to attend this event.
Introduction to HBase
An article with a video and explanatory material on HBase, a data store from the Hadoop ecosystem. It also discusses when this solution should and should not be used.
Apache Spark 1.1 announcement
An announcement of the new Apache Spark 1.1 release and a description of its main new features.
Streaming data processing in Apache Spark 1.1
An article about the new streaming capabilities in Apache Spark 1.1 and their use cases.
Statistical computing in Apache Spark 1.1
A description of the extended statistical computing capabilities in Apache Spark 1.1.
Reviews
Weekly digest from DataScienceCentral
The regular weekly digest of data analysis articles from the DataScienceCentral portal.
Digest of the best resources from DataScienceCentral
A good list of fresh, interesting articles and resources from DataScienceCentral.
Top KDnuggets articles (September 7-13)
A list of the best articles on the popular KDnuggets portal for September 7-13.
Data Mining News
A short list of interesting Data Mining resources dated September 17.
The most interesting materials from Freakonometrics
A collection of the most interesting materials from the popular Freakonometrics portal.
Previous issue: Overview of the most interesting materials on data analysis and machine learning No. 13 (September 8-14, 2014)
PS I think many readers would like to see more material in Russian, so if anyone can recommend such sources, I will be very grateful and will add them to the list of resources I follow.