Overview of the most interesting materials on data analysis and machine learning No. 9 (August 11 - 18, 2014)


    Here is the next issue of the review of the most interesting materials on data analysis and machine learning. This issue has many interesting videos, and a number of items are devoted to Data Engineering. It also includes many practical code examples in R and Python. As usual, much of the material covers machine learning algorithms.


    Data Analysis and Machine Learning Materials


    • EN Visualization using the D3 framework
      A short article on data visualization using the popular JavaScript framework D3.
    • EN Python Your own image search
      The author describes a tool he developed in Python that makes it easier to work with images on a local computer.
    • EN Data engineering Video lectures Alex Smola talks about scalable machine learning
      This is another lecture from the series presented at the Machine Learning Summer School (MLSS '14) in Pittsburgh. In this video lecture, Alex Smola (a researcher at Google and a professor at Carnegie Mellon University), a well-known specialist in computer science and machine learning, addresses the important topic of scaling machine learning.
    • EN The Future of Content Consumption through the Eyes of Yahoo
      An interesting article about Yahoo's future plans for artificial intelligence and machine learning.
    • EN R 21 navigation tools in R
      A useful set of 21 navigation tools for the R programming language, which will be useful to everyone.
    • EN The development of artificial intelligence technologies at Facebook depends on this person
      An interesting article about Yann LeCun, one of the most famous experts in data analysis and machine learning, who is one of the founders of Deep Learning and now works on machine learning technologies at Facebook.
    • EN List of leading researchers in the field of data analysis
      An interesting list of leading researchers and scientists in data analysis and Data Science from the popular portal KDnuggets, compiled from Microsoft Academic Search data.
    • RU R Selecting a subset of records from a large file
      When working with a large file in the R programming language, it is often more convenient to work with a small random subset of records from the entire data set. This short article provides sample code for extracting a subset of records from a file.
    • EN Python Apache Spark with IPython
      A short post from the Cloudera blog on integrating Apache Spark and IPython.
    • EN Python PyStruct machine learning library
      A Python library for machine learning, specifically structured learning. The library was designed to resemble the popular machine learning library scikit-learn.
    • EN Quick Learning with Vowpal Wabbit
      A short article from the Microsoft TechNet Machine Learning Blog about the open source machine learning system Vowpal Wabbit, which is developed at Microsoft Research and can be integrated with the Microsoft Azure ML cloud-based machine learning platform.
    • EN Video lectures The best videos of the first half of the year on data analysis
      A list of the best videos published on the IBM Big Data & Analytics Hub portal in the first half of 2014.
    • EN QuickML Machine Learning Library
      An interesting machine learning library for the Java programming language.
    • EN SAS in the cloud
      This article briefly describes running SAS in the Amazon AWS cloud, as well as integrating the SAS platform with some AWS services.
    • EN 38 articles on data analysis that everyone should read
      An excellent list of 38 articles on data analysis that will appeal to anyone interested in the topic.
    • EN R How to make angled labels on the axes of a plot
      How to rotate axis labels is a question that often arises when using the standard visualization tools in R. This article has a small code example that draws axis labels at different angles.
    • EN For newbies How to Improve Your Machine Learning Skills
      A nice little article written in simple language on how to improve your machine learning skills.
    • EN Comparison of data analysis software
      A comparison table of software products (R, MATLAB, SAS, Stata and SPSS) by their built-in support for various statistical analysis tools.
    • EN Data engineering 18 main tools of the Hadoop family
      The number of new tools around Hadoop is growing rapidly, and keeping track of everything new in this area is quite difficult. This article lists the 18 main tools with a brief description of each.
    • EN R semPlot library for the R language
      A small example of using the semPlot library, which visualizes Structural Equation Modeling (SEM) results. SEM lets you explore complex relationships between variables.
    • EN R Prisoner's Dilemma: an example in R
      An interesting implementation of the Prisoner's Dilemma, a fundamental problem in game theory, in the R programming language.
    • EN For newbies Python Some basic statistics
      A few simple statistical operations with examples in the Python programming language.
    • RU Python Transforming data from SAS to SQLite
      A useful Python code example for transforming data from SAS to SQLite.
    • EN R GrapheR: GUI visualization system for R
      GrapheR is a library for the R programming language for visualizing various data. Importantly, this library has its own GUI.
    • EN Theory Convolutional neural networks
      This publication is devoted to convolutional neural networks and goes fairly deep into the theory behind this interesting and popular topic.
    • EN For newbies So you wanted to try Deep Learning?
      This article is on the popular topic of Deep Learning, but it is really a collection of useful and interesting resources that will help you understand the subject better.
    • EN A brief description of OpenML
      A short article about the increasingly popular OpenML machine learning portal, where you can also participate in machine learning competitions.
    • EN For newbies Python Exploratory data analysis using Python and Pandas
      A very interesting article about exploratory data analysis using Python and Pandas, with code examples based on the popular "Titanic" dataset from Kaggle.
    • EN Data engineering Video lectures Building Machine Learning Infrastructure
      In this interesting video with a very accessible presentation style, Josh Willis (Senior Director of Data Science at Cloudera) talks about what Cloudera is currently working on and how to use machine learning in a production environment with a lot of data. This kind of Industrial Machine Learning is often much more difficult than academic machine learning.
    • EN Data engineering New in CDH 5.1: Read Caching in HDFS
      This article describes new functionality in CDH 5.1, Read Caching in HDFS, which can potentially increase read speed significantly on systems that use HDFS.
    • EN R Nonlinear classification in R
      Eight kinds of nonlinear classification with examples in the programming language R.
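
To illustrate the item above on selecting a subset of records from a large file: one common way to draw a random sample without loading the whole file into memory is reservoir sampling. The sketch below is in Python rather than R and is not the code from the linked article; the function name `sample_lines` and its parameters `k` and `seed` are hypothetical names chosen for this example.

```python
import random
from typing import Iterable, List, Optional


def sample_lines(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k records chosen uniformly at random from an iterable.

    Uses reservoir sampling, so the input is read once and only k records
    are held in memory at any time. (Hypothetical helper, for illustration.)
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(line)
        else:
            # Keep the new record with probability k / (i + 1),
            # replacing a uniformly chosen slot in the reservoir.
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = line
    return reservoir


if __name__ == "__main__":
    # Stand-in for a large file: any iterable of lines works,
    # including an open file object.
    records = (f"record {n}\n" for n in range(100_000))
    subset = sample_lines(records, k=5, seed=42)
    print(len(subset))
```

With a real file, the same function can be given the open file object directly, for example after reading the header line separately with `next(f)`.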

    Previous issue:  Overview of the most interesting materials on data analysis and machine learning No. 8 (August 4 - 11, 2014)
