Qizmt - an analogue of MapReduce for Windows

An interesting announcement came from Myspace.com yesterday:

Today we are open-sourcing Qizmt, an internal distributed computing framework created by the Data Mining team at Myspace. Qizmt can be used for numerous operations that require processing large amounts of data, such as filtering in the recommendation system and analytics.

Some sources have already picked this up and described it as a framework for a recommendation system. That is not accurate: it is a full MapReduce implementation written for Windows.

.NET enthusiasts do not often come across open source projects of this caliber. Although the system is labeled Alpha, the declared feature list is substantial (which is not surprising, since Myspace itself apparently runs on the framework):

Rapid development of MapReduce jobs in C#
Easy installer
Built-in IDE / Debugger (including step-through debugging of jobs on the cluster)
From any machine in the cluster:
Cluster Assembly Cache (CAC) - a cache of .NET assemblies for MapReduce jobs
3 types of jobs:
- MapReduce - set logic over large amounts of data
- Remote - for tasks that do not fit the MapReduce pattern
- Local - orchestration of the interaction between MapReduce and Remote jobs
3 ways to exchange data in MapReduce:
- Sorted - key / value pairs are sorted and distributed evenly across the cluster
- Grouped - unsorted, but like key / value pairs end up on the same reducer
- Hash sorted - a very fast way to sort random data
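
To make the difference between the Grouped and Sorted exchange modes more concrete, here is a minimal single-process sketch in Python. It is not Qizmt's API (Qizmt jobs are written in C#); the function names `map_words`, `reduce_count`, `exchange_grouped`, `exchange_sorted` and `run_job` are hypothetical, and the partitioning rules are only an assumption about how such modes are typically implemented. Hash sorted is left out because the announcement does not describe how it works.

```python
import zlib
from collections import defaultdict


def map_words(line):
    """Mapper: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_count(key, values):
    """Reducer: sum the counts collected for one key."""
    return key, sum(values)


def exchange_grouped(pairs, num_reducers):
    """Grouped: route each pair by a stable hash of its key.

    Equal keys always land on the same reducer, but there is no
    ordering guarantee across or within reducers.
    """
    parts = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        idx = zlib.crc32(key.encode("utf-8")) % num_reducers
        parts[idx][key].append(value)
    return parts


def exchange_sorted(pairs, num_reducers):
    """Sorted: give each reducer a contiguous, roughly equal range of keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    keys = sorted(grouped)
    chunk = max(1, -(-len(keys) // num_reducers))  # ceiling division
    return [
        {k: grouped[k] for k in keys[i * chunk:(i + 1) * chunk]}
        for i in range(num_reducers)
    ]


def run_job(lines, exchange, num_reducers=3):
    """Map every line, exchange the pairs, then reduce each partition in turn."""
    pairs = [kv for line in lines for kv in map_words(line)]
    parts = exchange(pairs, num_reducers)
    return [reduce_count(k, vs) for part in parts for k, vs in part.items()]


lines = ["the quick brown fox", "the lazy dog", "The fox"]
grouped_out = run_job(lines, exchange_grouped)
sorted_out = run_job(lines, exchange_sorted)
```

Both modes produce the same counts; the only difference is that concatenating the reducers' outputs in the sorted mode yields globally ordered keys, while the grouped mode skips that ordering work.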

All of this looks pretty impressive, although I think it no longer matters much what such frameworks are written in. Either way, they are consumed through platform-independent means: lightweight services a la REST / REST2. They say even Bing uses Hadoop. In any case, it's nice that our colleagues at Myspace shared the code.

And yes, the code is hosted on Google Code.
