Overview of the most interesting materials on data analysis and machine learning No. 18 (October 13-19, 2014)
Here is the next issue of our review of the most interesting materials on data analysis and machine learning.
General
- Why and how to use data visualization?
- Search Technology at Airbnb
- Online broadcast of YaC 2014
On October 30, Yandex will hold “Yet another Conference 2014”. The conference will be broadcast online and includes quite a few interesting talks.
- Meet Revolution R Open
On October 15, 2014, Revolution Analytics announced its enhanced R distribution, Revolution R Open.
- Comparison of Vowpal Wabbit, Liblinear / SBM and StreamSVM
A performance comparison of Vowpal Wabbit, Liblinear / SBM and StreamSVM on the Webspam dataset.
- Using R Notebook in the cloud
An interesting adaptation of IPython Notebook for the R programming language, from Domino Data Lab.
- How Big Data can improve our lives
An interesting infographic on Big Data.
- Publishing machine learning web services in Azure ML
An article from the Microsoft TechNet Machine Learning blog describing the Azure ML cloud service features that let data analysts host their services in the cloud and publish them in the Azure Marketplace application store.
- Revolution R Open and Revolution R Plus
Revolution Analytics announces Revolution R Open and Revolution R Plus.
- 9 ways to use BigML
An article from the BigML blog that uses infographics to present 9 different ways to use the BigML machine learning platform.
- Machine Learning in the Cloud
Another interesting article from the Microsoft TechNet Machine Learning blog, this time about distributed cloud computing for machine learning and, of course, Azure ML.
- Interesting Datasets for Data Science Projects (Part 1)
A good list of different data sources.
- Interesting Datasets for Data Science Projects (Part 2)
A good list of different data sources.
- 5 areas you should develop as a machine learning specialist
A good article from the author of the MachineLearningMastery blog about 5 areas to focus on when developing your machine learning skills.
- Introduction to Big Data in the financial sector (Part 4)
The fourth part of a series of articles from the insideBIGDATA portal about the use of Big Data in banking and finance.
- 12 Data Science bootcamps
An interesting list of Data Science bootcamps that, I think, will be updated regularly.
Theory and algorithms of machine learning, code examples
- Latent semantic analysis: implementation
- How we cluster gifts at OK
- How to identify sales losses
- The process of machine learning (part 1)
The first part of a series in which the author walks through the stages of the machine learning process, focusing mainly on supervised learning.
- Deep Learning with Caffe and cuDNN
An interesting article from the NVIDIA blog about deep learning with the Caffe framework and the cuDNN library.
- Deep Learning on Amazon EC2 GPUs using Python and nolearn
An article about using Amazon EC2 GPUs for deep learning with the Python programming language and the nolearn library.
- Analyzing Instagram data with R
A short article on working with data from the popular Instagram service in the R programming language.
- Implementing the k-nearest neighbors method from scratch
An implementation of the k-nearest neighbors method from scratch in Python, by the author of the popular MachineLearningMastery blog.
- Introduction to Python Pandas
A collection of resources on Pandas, the data analysis library for the Python programming language.
- Visualizing Bayes' rule
An animated visualization of Bayes' rule.
- Introduction to Neural Networks
A fairly simple explanation of how neural networks work, from the Analytics Vidhya blog.
- Linear regression and matrix operations in Excel
How to use linear regression and matrix operations in Excel.
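The k-nearest neighbors item in this list is about implementing the method from scratch in Python. The sketch below is a rough illustration of the idea and is not the code from that article. It assumes Euclidean distance and a simple majority vote. The function names and the toy two-cluster dataset are hypothetical.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def get_neighbors(train, query, k):
    """Return the k training rows closest to query; each row is (features, label)."""
    return sorted(train, key=lambda row: euclidean_distance(row[0], query))[:k]


def predict(train, query, k=3):
    """Classify query by a majority vote among its k nearest neighbors."""
    neighbors = get_neighbors(train, query, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
train = [
    ((1.0, 1.1), "a"),
    ((1.2, 0.9), "a"),
    ((0.8, 1.0), "a"),
    ((5.0, 5.2), "b"),
    ((5.1, 4.9), "b"),
    ((4.8, 5.0), "b"),
]

print(predict(train, (1.0, 1.0)))  # a
print(predict(train, (5.0, 5.0)))  # b
```

This version sorts the whole training set for every query. That is fine for a toy example, but practical implementations usually rely on spatial indexes such as k-d trees.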
Online courses, training materials and literature
- New Specializations at Coursera
Coursera has announced 18 new specializations.
- Foundations of Data Analysis online course
edX recently announced Foundations of Data Analysis, a new online course on data analysis from the University of Texas at Austin.
- Review of the book Scaling Apache Solr
A review of Scaling Apache Solr, a book about scaling the popular full-text search platform Apache Solr.
- Book “Data Mining for Managers”
Announcement of a new book on data analysis “Data Mining for Managers”.
Videos
- Yoshua Bengio on Deep Learning at the 2014 KDD Conference
Yoshua Bengio (Department of Computer Science and Operations Research, University of Montreal) gives an interesting talk on Deep Learning at KDD 2014.
- Using the dplyr library to work with data in R
An interesting video on using the dplyr library for various data manipulations in the R programming language.
- Interactive visualization with rCharts
In this short video from the useR! 2014 conference, Ramnath Vaidyanathan (Assistant Professor at McGill University) talks about interactive visualization in R with the rCharts library.
Data engineering
- Hadoop: what, where and why
- What's New in RavenDB 3.0
A set of materials about the fairly popular RavenDB database and its new version, RavenDB 3.0.
- Using RethinkDB with the Compose cloud
In addition to MongoDB and Elasticsearch, the Compose cloud supports the popular RethinkDB database. This article covers the details of working with RethinkDB on Compose.
- Using Apache Helix at LinkedIn
Describes the features of Apache Helix and how the framework is used in LinkedIn's infrastructure.
- Modeling in Document-Oriented Databases (Part 1)
The first part of an interesting series from the Couchbase blog on building effective data models in document-oriented databases.
- The Current State of Hadoop
An article on the current state of the Hadoop ecosystem, presented as an infographic.
- Sharding Pitfalls (Part 1)
The first part of a series on the intricacies of sharding in the popular NoSQL store MongoDB.
- Comparing NoSQL and SQL
A short article offering yet another comparison of NoSQL and SQL.
Reviews
- DataScienceCentral Weekly Digest
A regular weekly data analysis digest from DataScienceCentral.
- Digest of the best resources from DataScienceCentral
A good list of fresh and interesting articles and resources from DataScienceCentral.
- Data Mining News
A short list of interesting Data Mining resources for October 15.
- The best materials of the week (October 5 - 11)
The best data analysis materials of the week from the KDnuggets portal.
- Top 10 materials of the week
The 10 best Data Science materials of the week from the “Data Science Report” portal.
- The most interesting materials from Freakonometrics No. 174
A collection of the most interesting materials from the popular Freakonometrics portal.
- The most interesting materials from Freakonometrics No. 173
A collection of the most interesting materials from the popular Freakonometrics portal.
- The most interesting materials on High Scalability
A review of the most interesting scalability materials from the popular High Scalability portal.
- Best of: NoSQL Zone (October 10 - 16)
A collection of the best NoSQL materials from the popular DZone portal.
Previous issue: Overview of the most interesting materials on data analysis and machine learning No. 17 (October 6 - 12, 2014)