Overview of the most interesting materials on data analysis and machine learning No. 11 (August 25 - September 1, 2014)
I present to you the next issue of the review of the most interesting materials on data analysis and machine learning. This issue contains a lot of diverse information. There are many articles on Data Engineering, as well as materials for beginners and several video lectures. As usual, a Kaggle machine learning competition is mentioned. There is also an interesting article about startups in the field of Data Science and a curious article about improving game AI using machine learning.
Data Analysis and Machine Learning Materials
- Predictive modeling, supervised learning and pattern classification
A good article on machine learning that will also be interesting for beginners. It touches on topics such as supervised learning, visualization in machine learning, processing of input data, feature engineering, sampling and others.
- Ruslan Salakhutdinov on Deep Learning at the 2014 KDD Conference
Materials from a presentation by Ruslan Salakhutdinov from the University of Toronto at the 2014 KDD Conference in New York.
- Talk about Hadoop
An introduction to the Hadoop ecosystem in Russian. At the end there is a good set of links to useful materials on this topic.
- How to become a Data Scientist
An interesting article from the DataScienceCentral portal for those interested in the topic of Data Science. The article briefly describes the concept of a Data Scientist, identifies 4 areas within this profession and discusses the tools that a data analysis specialist needs.
- Using the pbapply() function
An interesting example of using the pbapply() function from the pbapply package for the R programming language.
- Azure DocumentDB
An article about the new NoSQL database from Microsoft called Azure DocumentDB. DocumentDB is still in preview. At the end of the article there is a good set of related links.
- Data Science startups from Y Combinator
There are quite a lot of business opportunities in the field of Data Science. This article provides a list of 2014 Data Science startups from the famous startup incubator Y Combinator.
- New Kaggle Competition: Epilepsy Seizure Prediction Challenge
Not long ago, a new machine learning competition, the American Epilepsy Society Seizure Prediction Challenge, started on Kaggle. The competition will last until November 17, 2014.
- 33 unusual problems that can be solved using Data Science
In a short post, Vincent Granville, the author of the popular portal DataScienceCentral, published a list of 33 problems from various areas of life that he believes can be solved using Data Science.
- DataScienceCentral Weekly Digest
Regular weekly data analysis digest from DataScienceCentral.
- List of interesting literature
A list of books worth reading for those interested in the topic of data analysis.
- A new dataset from Microsoft Research
Just yesterday, an interesting dataset called Microsoft Research Dense Visual Annotation Corpus was published on the Microsoft Research website.
- How machine learning helped improve game AI
A rather interesting article, written in lively language, about how machine learning techniques helped its author greatly simplify the AI of a game bot and make it more effective.
- The convergence of machine learning and Big Data
The article presents interesting observations by Mikio Braun, a well-known specialist in data analysis, on the need for the machine learning and Big Data communities to converge. He notes that they are currently quite far apart, which leads to certain problems and inconveniences.
- Mind Maps for Machine Learning and Data Mining
This short post contains two very interesting and useful mind maps on the topics of machine learning and Data Mining.
- Analysis of unstructured data
A continuation of a series of articles on text analysis and working with unstructured data. In this installment, the author moves from posing questions to practical aspects and discusses processing and cleaning unstructured text data in preparation for further analysis.
- So you want to be a Data Scientist
An interesting short article describing the main aspects of a profession called Data Scientist.
- Using Big Data on the Securities Market
The author of the article offers 3 practical tips on using Big Data for investment in the securities markets, which everyone can use.
- 100 Popular Machine Learning Videos
A great list of one hundred machine learning videos from VideoLectures.Net.
- Online Course "Data Analysis and Statistical Inference"
On Monday, September 1, Coursera launches, for the second time, the well-regarded online course on data analysis and statistics "Data Analysis and Statistical Inference" from Duke University.
- Digest of the best resources from DataScienceCentral (September 1)
A good list of recent interesting articles and resources from DataScienceCentral.
- Applying Bayesian machine learning methods with Apache Spark
A short, interesting article from the Cloudera blog that gives an example of applying Bayesian machine learning methods using Apache Spark, a popular product in the Hadoop family, together with the PyMC library for the Python programming language.
- Facts and myths about Big Data
A short, interesting article from the popular portal insideBIGDATA, in which the author discusses the now popular topic of Big Data and shares his thoughts on common misconceptions in this area.
- 12 MongoDB Tips
A short article that contains 12 useful tips for those who want to use the popular NoSQL database MongoDB in production.
- John Chambers: Interfaces, Efficiency and Big Data
In this video from the useR! 2014 conference, John Chambers discusses the past, present and future of the R programming language in a talk called "Interfaces, Efficiency and Big Data".
- Using Hadoop for large amounts of data
A fairly large article on the Hadoop ecosystem and its real use when working with large amounts of data.
- Write operations in MongoDB
An article that explains well the subtleties of write and update operations in MongoDB, covering several write modes used when updating data: Unacknowledged, Acknowledged, Journaled, etc.
- Nonlinear classification in R using decision trees
Seven types of nonlinear classification using decision trees, with code examples in the R programming language, from the author of the popular data analysis blog MachineLearningMastery.
- Impala: plans for the future
A short article from the Cloudera blog about the company's plans for the future of Cloudera Impala, a popular Hadoop product that allows you to work with data in Hadoop using SQL queries.
- SlamData: SQL queries in MongoDB
An announcement of a rather interesting product, SlamData, that will allow you to execute SQL queries on data in MongoDB. The product is currently in beta testing, with a release scheduled for early October of this year.
Previous issue: Overview of the most interesting materials on data analysis and machine learning No. 10 (August 18 - 25, 2014)