A Data Scientist's blog: GraphLab: A parallel framework for machine learning (ML)

Thursday, May 17, 2012

GraphLab: A parallel framework for machine learning (ML)

GraphLab is a powerful new system (a tool) for designing and implementing parallel algorithms in machine learning. It was originally developed for scientific purposes.

GraphLab is a flexible tool, developed in Java 1.6, that provides graph facilities. These facilities have been developed and used to implement CAD (computer-aided design) flows for embedded system design, and more precisely for hardware IP generation.

The GraphLab tool includes many design flows for high-level synthesis, along with many other functions. Its key features for hardware designers are:

High-level synthesis under latency constraint,
High-level synthesis under area constraint (tutorial),
High-level synthesis using non-uniform word-length (tutorial),
High-level synthesis for multimode design (mutually exclusive applications sharing resources in a single design),
High-level synthesis using redundancy techniques for high-reliability applications,
Full pipeline design generation for high-throughput applications,
FSM controller optimization for low-area and low-power design.

The supported input languages include MATLAB and C/C++, among others.

There is no released version of GraphLab, since the tool gains new functionality every day.

GraphLab vs. MapReduce

The MapReduce abstraction is defined in two parts:

1. A Map stage, which performs computation on independent problems that can be solved in isolation, and
2. A Reduce stage, which combines the results.
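
The two stages above can be sketched in a few lines of plain Python. This is only an illustration of the abstraction, not Hadoop or any real MapReduce API; the names `map_stage`, `reduce_stage`, and the sample `documents` are made up for the example.

```python
from functools import reduce

def map_stage(records, fn):
    # Each record is processed in isolation; no record sees another one,
    # which is what makes this stage easy to run in parallel.
    return [fn(record) for record in records]

def reduce_stage(mapped, combine, initial):
    # Combine the independent results into a single value.
    return reduce(combine, mapped, initial)

# Hypothetical input: count the words in a few short documents.
documents = ["graph lab", "map reduce", "parallel machine learning"]

word_counts = map_stage(documents, lambda doc: len(doc.split()))
total_words = reduce_stage(word_counts, lambda a, b: a + b, 0)

print(word_counts, total_words)
```

Note that the function passed to `map_stage` only ever sees one document at a time; it has no way to read or change the result of another document.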

GraphLab provides an analog to Map in the form of an Update Function. The Update Function, however, is able to read and modify overlapping sets of data (program state) in a controlled fashion, as defined by the user-provided data graph. The data graph represents the program state, with arbitrary blocks of memory associated with each vertex and edge. In addition, update functions can be triggered recursively, with one update function spawning the application of update functions to other vertices in the graph, which enables dynamic iterative computation. GraphLab uses powerful scheduling primitives to control the order in which update functions are executed.
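
To make the idea of an update function more concrete, here is a minimal single-threaded sketch in Python. It is not GraphLab's API: the names `vertex_data`, `neighbors`, `update`, and `run`, the FIFO queue used as the scheduler, and the toy task (each vertex takes the smallest label among itself and its neighbors) are all assumptions chosen for illustration.

```python
from collections import deque

# Hypothetical data graph: one value per vertex plus an adjacency list.
vertex_data = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}

def update(v, schedule):
    # Read an overlapping scope: this vertex and its neighbors.
    smallest = min([vertex_data[v]] + [vertex_data[n] for n in neighbors[v]])
    if smallest != vertex_data[v]:
        vertex_data[v] = smallest
        # The value changed, so ask the scheduler to revisit the neighbors.
        for n in neighbors[v]:
            schedule(n)

def run(initial_vertices):
    # A simple FIFO scheduler; a vertex is queued at most once at a time.
    queue = deque(initial_vertices)
    pending = set(initial_vertices)

    def schedule(v):
        if v not in pending:
            pending.add(v)
            queue.append(v)

    while queue:
        v = queue.popleft()
        pending.discard(v)
        update(v, schedule)

run(list(vertex_data))
print(vertex_data)
```

Unlike a Map function, `update` reads its neighbors' data, and it reschedules other vertices only when something changed, so the computation stops on its own once no vertex has work left. A real system would run many such updates in parallel while keeping overlapping reads and writes consistent, which this sketch does not attempt.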
The GraphLab analog to Reduce is the Sync Operation. The Sync Operation can also perform reductions in the background while other computation is running. Like the update function, sync operations can look at multiple records simultaneously, which lets them operate on larger dependent contexts.
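
A sync-style reduction can be pictured as a fold over all the vertex data followed by a final step. The sketch below is an assumption-laden illustration in plain Python, not GraphLab's API: `sync`, `fold`, `initial`, and `finalize` are hypothetical names, and the reduction runs in the foreground rather than in the background.

```python
from functools import reduce

# Hypothetical per-vertex values produced by some earlier computation.
vertex_data = {0: 0.5, 1: 1.5, 2: 2.0, 3: 4.0}

def sync(data, fold, initial, finalize):
    # Fold every vertex value into an accumulator, then post-process it.
    accumulator = reduce(fold, data.values(), initial)
    return finalize(accumulator)

# Global mean of the vertex values: accumulate (sum, count), then divide.
mean_value = sync(
    vertex_data,
    fold=lambda acc, x: (acc[0] + x, acc[1] + 1),
    initial=(0.0, 0),
    finalize=lambda acc: acc[0] / acc[1],
)

print(mean_value)
```

A global value like this could serve, for example, as a convergence check that update functions consult while they keep running.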

GraphLab performs around 50 to 100 times faster than Hadoop-based Mahout. (Mahout is a machine learning framework and part of the Apache Software Foundation; one of its sub-frameworks, Taste, is used specifically for collaborative filtering.)
