Overview of the most interesting materials on high performance (September 15-21, 2014)
I present the first issue of a review of the most interesting materials on high performance. While preparing the next issue of my review of materials on data analysis and machine learning, I realized that the materials on high performance form a self-contained topic that deserves its own review. I hope that this type of review will also be useful and interesting. I will try to expand the list of resources that I follow when preparing these reviews.
High Performance Materials
- Using Apache Samza at LinkedIn
An article from the LinkedIn blog about how they use Apache Samza in their application and how it helped them solve data problems.
- Who uses Hadoop and how
An interesting article about the current state of the Hadoop ecosystem: who uses it and how, as well as its development prospects.
- Upcoming Data Science meetings in Moscow
Several interesting meetings are scheduled for the near future, so I decided to publish a short list of upcoming meetings on data analysis and high performance in Moscow.
- A new type of aggregation in Elasticsearch
An article from the Elasticsearch blog about the new aggregation function top_hits, which was added to the large list of such functions in version 1.3.0.
- New version of Apache Tez
A short article from the Hortonworks blog about the features of the new version of Apache Tez 0.5.
- Hadoop SQL Queries with Apache Drill
A short article about Apache Drill, which lets you work with Hadoop using SQL query syntax.
- Investigation of the effect of multi-user workload on Cloudera Impala
An article from the Cloudera blog, which presents the results of an interesting study conducted on the Cloudera Impala product with various load profiles.
- Top 10 SlideShare Presentations on Data Science and Big Data
The article with a list of the 10 most-viewed presentations from SlideShare on Data Science and Big Data topics.
- Using disk space in MongoDB
A short article to help you better understand how the NoSQL MongoDB database uses disk space.
- Weak isolation is a serious problem
Interesting thoughts about database isolation levels.
- 10 lessons from Microsoft Azure
A very interesting post that gives 10 useful recommendations, based on the author's own experience, for properly scaling an application on the Microsoft Azure cloud.
- Using Redis at Twitter
An interesting video in which Yao Yu talks about using Redis at Twitter for scaling. The linked article contains excellent material based on this talk.
- KDD 2014: Google KV and Topic Modeling
The authors of the URX blog share their impressions of the recent KDD 2014 conference in New York. They talk about Google Knowledge Vault, a system that Google actively uses to improve search quality, and about topic modeling.
- Why Loggly chose AWS Route 53 rather than ELB
An interesting article from Loggly’s blog about why they chose Amazon Route 53 DNS rather than AWS Elastic Load Balancing (ELB).
- FireBox: building block for Warehouse-Scale Computers in 2020
A video from the FAST'14 conference entitled “FireBox: A Hardware Building Block for 2020 Warehouse-Scale Computers”, in which Krste Asanović (University of California, Berkeley) presents his view on the future development of Warehouse-Scale Computers (WSC).
- About caching at @Scale
The authors of the OpenDNS blog share their impressions of the @Scale conference, organized by Facebook, and talk about the various modern approaches to caching that were described there.
- Facebook completely disconnected one data center to check fault tolerance
At the @Scale conference in San Francisco, Jay Parikh from Facebook spoke about an interesting experiment conducted at Facebook: completely disconnecting one of the data centers to check the overall fault tolerance of the system.
- Announcement of Apache Spark 1.1
The announcement of the new Apache Spark 1.1 release and a description of its main new features.
- Streaming data in Apache Spark 1.1
An article about the new streaming features in Apache Spark 1.1 and options for using this functionality.
- Statistical computing in Apache Spark 1.1
An article describing the advanced statistical computing features in Apache Spark 1.1.
- Elasticsearch Metrics
A short article from the Compose blog about Elasticsearch metrics.
- News from the Apache Software Foundation Blog
A short list of the latest news from the Apache Software Foundation Blog.
- Weekly digest from Rackspace
A weekly digest of interesting materials from Rackspace.
- 10 ways to work with Hadoop through SQL queries
10 tools and ways to work with Hadoop through SQL queries and a short description of each.
- Overview of the most interesting materials on Hadoop No. 87
The regular weekly digest of the most interesting materials on Hadoop from the Hadoop Weekly portal.
- 174 open-source drivers for MongoDB
A large collection of 174 open-source drivers for the NoSQL MongoDB database, covering different programming languages.
- What's New in RavenDB 3.0
Description of the features of the new version of the popular RavenDB database.
- Syncing MongoDB and Elasticsearch
A short article about the Transporter service, which allows you to quickly synchronize MongoDB and Elasticsearch.
- Introduction to HBase
An article with a video and explanatory material on HBase, a data store from the Hadoop ecosystem, including when this solution is worth using and when it is not.
- Using ORCFile in Cascading and Apache Crunch
An example of using ORCFile with Cascading and Apache Crunch, which can improve their performance.
- We invite you to HadoopKitchen
An announcement of a meeting dedicated to Hadoop, which will be held at the Mail.ru office. I am also going to attend this event.
- How to succeed in Big Data
A short article with infographics about the main factors that influence a company's success in Big Data.
- Vincent Granville about Big Data
Vincent Granville, the author of the DataScienceCentral portal, shares his thoughts on Big Data and defines the concept.
- 5 key ideas for understanding Big Data
An interesting post from the Smart Data Collective portal that presents 5 key points to help you get the most benefit from your data.