Overview of the most interesting materials on data analysis and machine learning No. 10 (August 18 - 25, 2014)

    Here is the next issue of the review of the most interesting materials on data analysis and machine learning. This issue has a lot of interesting material for beginners, a couple of interesting videos, and several items on Data Engineering. As usual, some articles are devoted to code examples related to data analysis and machine learning, and, as has become traditional, several articles cover participation in machine learning competitions.

    Data Analysis and Machine Learning Materials

    • EN Literature MIT Deep Learning book
      An MIT book on Deep Learning, a very popular area of machine learning. The book is not yet complete, but many chapters are already available to readers.
    • EN Literature R Processing data using R
      A short book that can be useful to anyone who works with data in the R programming language. It is dedicated to processing and cleaning data in the preprocessing phase, which, as you know, takes a lot of time and effort from data analysis specialists.
    • EN For newbies Video lectures Python The Difficult Way to Learn Machine Learning - The Pony Story
      In this video, Nathan Taggart (Product Manager at New Relic) tells the story of how he learned machine learning and which mistakes to avoid in this difficult task. The video is intended for beginners in data analysis and machine learning.
    • EN For newbies R What is R?
      A short but comprehensive overview of the R programming language, with a description of its advantages and disadvantages.
    • EN What companies need to know about Big Data
      An article arguing that many companies may need to change their approach to working with their data and focus more on current trends in Data Science.
    • EN Guide for the analysis of unstructured text data
      The first part of a series of articles from the popular portal Analytics Vidhya, devoted to the interesting topic of text analysis. This article describes the basic problems and issues; future articles will describe in detail how solutions to them are implemented.
    • EN Data Analysis with Microsoft
      Mario Garzia, a data analyst at Microsoft, shares some interesting insights into the current state of data science in his article on the Microsoft TechNet Machine Learning Blog.
    • EN Machine Learning Competitions 5 benefits of participating in machine learning competitions
      Another interesting article from the author of the MachineLearningMastery blog. This time the topic is the benefits of participating in machine learning competitions on Kaggle.
    • RU R Visualizing time series using the googleVis library
      Version 0.5.5 of googleVis was released not long ago. This short post provides a very simple code example for visualizing time series using the googleVis library for the R programming language.
    • EN Data engineering Microsoft Azure DocumentDB
      A short article on Microsoft's new NoSQL database, Azure DocumentDB.
    • RU Habr The use of machine learning for trading (Part 1)
      An introduction to the use of machine learning for trading. This series of articles has already been featured in earlier issues of this review; this entry is the Russian translation of the first part.
    • EN Data engineering Improving Query Performance in Apache Hive with Partitioning
      A short article from the Cloudera blog on how to improve query performance in Apache Hive using partitioning.
    • RU Online course Stanford University announces a new online course: Mining Massive Datasets
      Mining Massive Datasets, a very interesting online course from Stanford University, starts on Coursera on September 29, 2014.
    • EN Python Fast HDF5 with Pandas
      An example of working with the HDF5 data storage format from Pandas, the data analysis library for the Python programming language.
    • EN Interesting resources on Deep Learning
      A list of resources on the popular machine learning technique Deep Learning, compiled by the renowned portal KDnuggets.
    • EN For newbies Data engineering It's not NoSQL versus RDBMS, it's ACID + Foreign Keys versus Eventual Consistency
      A short, thought-provoking discussion of NoSQL and RDBMS data stores.
    • EN Machine Learning Competitions An example of solving a problem on Kaggle
      An example of a possible solution to the popular Kaggle machine learning competition “Predict Bike Sharing Demand” using the Gradient Boosted Trees technique. The example uses the GraphLab Create machine learning tool.
    • EN For newbies Logistic regression visualization
      Logistic regression is widely used in machine learning. This short post shows how logistic regression works in the form of an animated image.
    • EN Machine Learning and Computer Vision (Part 2)
      The second part of a series of articles from the Microsoft TechNet Machine Learning Blog devoted to the use of machine learning and computer vision technologies for pattern recognition problems. The article is short and written in simple language, without diving into the details of this rather complex topic.
    • EN For newbies Data engineering The Hadoop Ecosystem
      A short, helpful article that briefly describes the basic elements of the Hadoop ecosystem.
    • EN For newbies What is Big Data?
      An interesting short article in which the author discusses what Big Data is and tries to give the simplest possible description of the term.
    • EN R Using expression in R
      An interesting article about using the expression() function in the R programming language.
    • RU For newbies Supervised learning flowchart
      Many people are familiar with supervised learning as a type of machine learning. This short post presents, as a flowchart, a good visualization of the typical sequence of steps in supervised learning.
    • EN 21 great graphs
      A collection of excellent data visualization examples using various kinds of graphs and charts, from the DataScienceCentral portal.
    • EN Machine Learning Competitions How to successfully compete in Kaggle competitions
      Another useful article on how to compete successfully in machine learning competitions on Kaggle.
    • EN Online course Announcement of the Capstone project in data analysis specialization from Coursera
      A short article announcing the Capstone project, the final phase of the Data Science Specialization from Johns Hopkins University. You can join the project once you have successfully completed all 9 courses of the specialization.
    • EN Data engineering Video lectures Sybil: Google Machine Learning Scaling System
      In this talk, Tushar Chandra discusses how Sybil has fared at Google. Sybil is an important research project at Google that implements various machine learning algorithms in a way that allows them to scale. It is widely used within Google.
    • EN Four main languages for data analysis
      The results of a poll conducted by the popular KDnuggets portal on the most popular languages used for data analysis.
    • EN For newbies Mathematics for machine learning
      The article is devoted to the mathematical skills needed to master the basics of machine learning. The author notes that the article is a draft and that additional information will be added to it over time.
    • EN R Where libraries are installed in RStudio
      A short article on the curious question of where RStudio installs libraries.
    • EN 44 data analysis articles
      An interesting selection of articles and resources from the best data analysis experts, compiled by DataScienceCentral.

    Previous issue:  Overview of the most interesting materials on data analysis and machine learning No. 9 (August 11 - 18, 2014)
