Hadoop 2.0: Let’s Yarn!
“Multiple applications running on the same platform”
With the increasing use of big data in business comes the need to extend Hadoop beyond batch processing to other types of workloads, such as interactive, streaming, and online processing. In addition, as the amount of resources managed in Hadoop grows, a system is needed that uses these resources more efficiently. YARN meets both needs, and its arrival marks the birth of Hadoop 2.
YARN is a re-architecture of Hadoop that allows multiple applications to run on the same platform. With YARN, applications run “in” Hadoop, instead of “on” Hadoop.
YARN allows the simultaneous execution of multiple applications on HDFS, the distributed file system, while providing better monitoring of data throughout its lifecycle. It enables us to run not only batch processes but also data streams and interactive queries with Hadoop.
YARN offers the following traits:
Flexible – Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming
Efficient – Doubles processing in Hadoop on the same hardware while providing predictable performance and quality of service
Shared – Provides a stable, reliable, secure foundation and shared operational services across multiple workloads
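To make the "Shared" trait more concrete, here is a minimal sketch of how one cluster's capacity could be divided among several workloads running at the same time. It uses max-min fairness as an illustrative policy. This is an assumption for the example, not a description of YARN's actual schedulers, and the function name, application names, and numbers are all hypothetical.

```python
def max_min_fair_share(capacity, demands):
    """Divide `capacity` among applications using max-min fairness.

    Toy model only: no application receives more than it asked for,
    and capacity left over by small applications is shared by the rest.
    """
    allocation = {}
    remaining = float(capacity)
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        # Applications that need no more than an equal share are fully satisfied.
        small = [app for app, need in pending.items() if need <= share]
        if not small:
            # Every remaining application wants more than an equal share,
            # so the remaining capacity is split evenly among them.
            for app in pending:
                allocation[app] = share
            break
        for app in small:
            allocation[app] = float(pending.pop(app))
            remaining -= allocation[app]
    return allocation


# Hypothetical workloads competing for 100 units of cluster capacity.
shares = max_min_fair_share(100, {"batch": 80, "streaming": 30, "interactive": 10})
print(shares)  # {'streaming': 30.0, 'interactive': 10.0, 'batch': 60.0}
```

In this sketch the small interactive and streaming workloads get everything they ask for, and the large batch job receives whatever is left instead of starving the others.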
YARN enhances the power of a Hadoop compute cluster in the following ways:
- Scalability
The processing power in data centers continues to grow quickly. Because the YARN ResourceManager focuses exclusively on scheduling, it can manage those larger clusters much more easily.
- Compatibility with MapReduce
Existing MapReduce applications and users can run on top of YARN without disruption to their existing processes.
- Improved cluster utilization
The ResourceManager is a pure scheduler that optimizes cluster utilization according to criteria such as capacity guarantees, fairness, and SLAs. Also, unlike in Hadoop 1, there are no named map and reduce slots, which helps to better utilize cluster resources.
- Support for workloads other than MapReduce
Additional programming models such as graph processing and iterative modeling are now possible for data processing. These added models allow enterprises to realize near real-time processing and increased ROI on their Hadoop investments.
- Agility
With MapReduce becoming a user-land library, it can evolve independently of the underlying resource manager layer and in a much more agile manner.
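The point about named map and reduce slots, made under improved cluster utilization, can be illustrated with a toy comparison. The sketch below is a simplified model, not YARN's scheduling code: the `Task` class, the function names, the 8 GB node, and the 4 + 4 slot split are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str        # "map" or "reduce"
    memory_mb: int   # memory the task needs


def started_with_slots(tasks, map_slots, reduce_slots):
    """Slot model: a task can only use a slot of its own kind."""
    maps = sum(1 for t in tasks if t.kind == "map")
    reduces = sum(1 for t in tasks if t.kind == "reduce")
    return min(maps, map_slots) + min(reduces, reduce_slots)


def started_with_containers(tasks, total_memory_mb):
    """Container model: any task starts wherever enough memory is free."""
    free = total_memory_mb
    started = 0
    for t in tasks:
        if t.memory_mb <= free:
            free -= t.memory_mb
            started += 1
    return started


# Hypothetical map-heavy workload on one node with 8 GB of memory.
workload = [Task("map", 1024) for _ in range(8)]

# With 4 map slots and 4 reduce slots, the reduce slots sit idle.
print(started_with_slots(workload, map_slots=4, reduce_slots=4))  # 4

# With generic containers, all 8 GB can be used by map tasks.
print(started_with_containers(workload, total_memory_mb=8192))    # 8
```

In the slot model half of the node stays unused while map tasks wait, whereas the container model lets the same hardware run all eight tasks at once.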
The implementation of YARN (and therefore Hadoop 2) opens wider possibilities for businesses to gain advantages from big data.