And here comes the most awaited version of Hadoop, the one that ends Hadoop's sole reliance on the MapReduce programming model! It's Hadoop 2.0, commonly referred to as Hadoop 2. This version brings some cool new features that overcome the limitations of the first version. The most recently released version of Hadoop is 2.4.0, which is making headlines for its biggest highlight: support for automatic failover of the YARN ResourceManager.
A question, or perhaps a whole bunch of questions, must be popping into your mind about YARN. Here at Besant Technologies in Chennai, we give you a brief sketch that answers such queries, and we suggest you get started with Hadoop Training, as this is the need of the hour to keep pace with the latest IT market and to earn more bucks!
YARN is the headline feature of Hadoop 2.0. YARN stands for Yet Another Resource Negotiator! It moves job scheduling and resource management into a separate layer just below the data processing layer, which lets Hadoop 2.0 run a wide range of applications. The key point is that applications running on YARN can process the terabytes and petabytes of data stored in HDFS without using MapReduce. Non-MapReduce frameworks such as Apache Giraph and MPI can run on it.
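To make the layering concrete, here is a minimal Python sketch, assuming a cluster reduced to a fixed count of identical containers. The `ResourceLayer` and `ContainerRequest` names are hypothetical and are not part of the YARN API; the point is only that the resource layer grants containers without caring whether the requester is MapReduce, Giraph or MPI.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerRequest:
    app_name: str
    framework: str  # label only, e.g. "MapReduce", "Giraph", "MPI"
    containers: int


@dataclass
class ResourceLayer:
    """Hypothetical stand-in for YARN's resource layer: it hands out generic
    containers and never looks at which processing framework is asking."""
    total_containers: int
    allocations: dict = field(default_factory=dict)

    def free(self) -> int:
        return self.total_containers - sum(self.allocations.values())

    def request(self, req: ContainerRequest) -> int:
        # Grant up to the number of free containers; the framework is ignored.
        granted = min(req.containers, self.free())
        self.allocations[req.app_name] = self.allocations.get(req.app_name, 0) + granted
        return granted


cluster = ResourceLayer(total_containers=10)
print(cluster.request(ContainerRequest("wordcount", "MapReduce", 4)))  # 4
print(cluster.request(ContainerRequest("pagerank", "Giraph", 5)))      # 5
print(cluster.request(ContainerRequest("simulation", "MPI", 3)))       # 1 (only one left)
```

Because the framework field never influences the grant, MapReduce becomes just one tenant among many on the same cluster.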
Another salient feature is HDFS Federation, which allows several independent NameNodes, each managing its own namespace, to share the same DataNodes. The cluster storage subsystem has also been generalized to support frameworks besides HDFS, much as YARN generalizes the processing layer. The new storage architecture generalizes the block storage layer so that it can be used not only by HDFS but also by other storage services.
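A toy model of the federation layout, in Python: several NameNodes each manage their own namespace and block pool, while one set of DataNodes stores blocks for all of them. The classes, the pool IDs (`BP-1`, `BP-2`) and the round-robin placement are hypothetical simplifications; real HDFS also replicates blocks and uses rack-aware placement.

```python
from collections import defaultdict


class DataNode:
    """Shared storage node: keeps blocks for every block pool it is given."""

    def __init__(self, name):
        self.name = name
        self.blocks = defaultdict(set)  # block pool id -> set of block ids

    def store(self, pool_id, block_id):
        self.blocks[pool_id].add(block_id)


class NameNode:
    """Manages one namespace and allocates blocks from its own block pool."""

    def __init__(self, namespace, pool_id, datanodes):
        self.namespace = namespace
        self.pool_id = pool_id
        self.datanodes = datanodes
        self.files = {}  # path -> list of block ids
        self._next_block = 0

    def create(self, path, num_blocks):
        block_ids = []
        for _ in range(num_blocks):
            block_id = self._next_block
            self._next_block += 1
            # Round-robin placement (a simplification; no replication here).
            node = self.datanodes[block_id % len(self.datanodes)]
            node.store(self.pool_id, block_id)
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids


shared = [DataNode("dn1"), DataNode("dn2")]
users = NameNode("/user", "BP-1", shared)
logs = NameNode("/logs", "BP-2", shared)
users.create("/user/alice/data.csv", 3)
logs.create("/logs/app.log", 2)
# Both namespaces reuse block id 0 without clashing, because each id is
# scoped to its own block pool on the shared DataNodes.
print(dict(shared[0].blocks))
```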
The third feature is NameNode High Availability (HA). In Hadoop 1.0, if the NameNode went down for any reason, the whole Hadoop cluster went down with it! This gave sleepless nights to Hadoop developers and operators alike, because the NameNode holds all of the file system metadata. Hadoop 2.0 comes to the rescue by running two redundant NameNodes in the same cluster in an Active/Standby configuration. One NameNode acts as the active NameNode and the other as a standby, so if the active one fails for any reason, the standby NameNode becomes active and takes over.
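The Active/Standby switch can be sketched as a small state machine. This is a hypothetical simplification: in a real Hadoop 2 cluster, automatic NameNode failover is coordinated by the ZKFailoverController together with ZooKeeper, and the two NameNodes share their edit log (for example through JournalNodes), none of which is modeled here. The class names below are illustrative only.

```python
class NameNodeProcess:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.state = "standby"


class FailoverController:
    """Hypothetical controller that promotes the standby when the active fails."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby
        self.active.state = "active"

    def check(self):
        # Fail over only if the active is unhealthy and the standby can take over.
        if not self.active.healthy and self.standby.healthy:
            self.active.state = "fenced"  # keep the failed node from acting as active
            self.standby.state = "active"
            self.active, self.standby = self.standby, self.active
        return self.active.name


nn1, nn2 = NameNodeProcess("nn1"), NameNodeProcess("nn2")
controller = FailoverController(active=nn1, standby=nn2)
print(controller.check())  # nn1
nn1.healthy = False  # simulate a crash of the active NameNode
print(controller.check())  # nn2
```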
Improved resource utilization is the fourth feature on this list! Because YARN splits the two major functions of the old JobTracker (job scheduling/monitoring and resource management) into two separate daemons, cluster resources are used more efficiently. These two daemons are:
- Global RM (ResourceManager)
- Per-application AM (ApplicationMaster)
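One concrete reason utilization improves: in Hadoop 1 each TaskTracker had a fixed number of map slots and reduce slots, so idle reduce slots could not run map tasks, whereas YARN hands out generic containers. The sketch below works through that arithmetic for a single node and shows a global ResourceManager granting containers to a per-application ApplicationMaster. It assumes identically sized containers and a 4 map + 2 reduce slot split; the class and function names are hypothetical, not Hadoop's Java API.

```python
def hadoop1_running_tasks(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Hadoop 1 style: map tasks may only use map slots, reduce tasks only reduce slots."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


class ResourceManager:
    """Global daemon: tracks free containers for the whole (toy) cluster."""

    def __init__(self, containers):
        self.free = containers

    def allocate(self, wanted):
        granted = min(wanted, self.free)
        self.free -= granted
        return granted


class ApplicationMaster:
    """One per application: requests containers for its own pending tasks."""

    def __init__(self, app_name, rm, pending_tasks):
        self.app_name = app_name
        self.rm = rm
        self.pending = pending_tasks
        self.running = 0

    def heartbeat(self):
        granted = self.rm.allocate(self.pending)
        self.pending -= granted
        self.running += granted
        return granted


# A node with 6 task "units": Hadoop 1 splits them into 4 map + 2 reduce slots.
print(hadoop1_running_tasks(6, 0, map_slots=4, reduce_slots=2))  # 4; reduce slots sit idle
rm = ResourceManager(containers=6)
am = ApplicationMaster("map-only-job", rm, pending_tasks=6)
print(am.heartbeat())  # 6; any container can run a map task
```

For a map-only job with six tasks, the slot model runs four at once while the container model runs all six on the same hardware.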
The fifth feature is DataNode caching, which allows faster access to data. Applications (such as Pig, Hive or HBase) as well as users can now explicitly choose which files should be cached in DataNode memory. This speeds up queries that read frequently accessed tables.
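A minimal sketch of explicit, user-directed caching, assuming an in-memory dict stands in for the disk. In real HDFS this is managed with the `hdfs cacheadmin` command through cache pools and cache directives; the `CachingDataNode` class and its methods below are hypothetical and only illustrate why repeated reads of a pinned file avoid the disk.

```python
class CachingDataNode:
    """Toy DataNode: serves reads from memory for paths a user chose to cache."""

    def __init__(self, disk):
        self.disk = disk    # path -> bytes (stand-in for blocks on disk)
        self.memory = {}    # path -> bytes pinned in memory
        self.disk_reads = 0

    def add_directive(self, path):
        # Explicit, user-chosen caching: load the file from disk once and pin it.
        self.disk_reads += 1
        self.memory[path] = self.disk[path]

    def remove_directive(self, path):
        self.memory.pop(path, None)

    def read(self, path):
        if path in self.memory:
            return self.memory[path]
        self.disk_reads += 1
        return self.disk[path]


node = CachingDataNode({"/warehouse/dim_country": b"IN,Chennai", "/warehouse/sales": b"..."})
node.add_directive("/warehouse/dim_country")  # pin a frequently queried table
for _ in range(100):
    node.read("/warehouse/dim_country")
node.read("/warehouse/sales")
print(node.disk_reads)  # 2: one load into the cache plus one uncached read
```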
File System Snapshots is another cool feature worth mentioning. A snapshot is a point-in-time image of a subtree of the file system or of the entire file system. Snapshots help in three major areas: taking backups, protecting against human error, and disaster recovery.
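The core idea behind cheap snapshots is that a snapshot records file metadata (here, each file's block list) rather than copying data blocks. Below is a hypothetical Python sketch of recovering from an accidental delete. In real HDFS a directory must first be made snapshottable with `hdfs dfsadmin -allowSnapshot`, snapshots are created with `hdfs dfs -createSnapshot`, and files are recovered by copying them out of the read-only `.snapshot` directory; the `restore` method here is an illustrative stand-in, not an HDFS command.

```python
class SnapshottableDir:
    """Toy directory: a snapshot copies file -> block-list metadata, not block data."""

    def __init__(self):
        self.files = {}      # path -> tuple of block ids
        self.snapshots = {}  # snapshot name -> frozen copy of self.files

    def write(self, path, blocks):
        self.files[path] = tuple(blocks)

    def delete(self, path):
        del self.files[path]

    def create_snapshot(self, name):
        # Copy only metadata; the data blocks themselves are shared, so this is cheap.
        self.snapshots[name] = dict(self.files)

    def restore(self, name, path):
        """Recover a file as it existed when the snapshot was taken."""
        self.files[path] = self.snapshots[name][path]


home = SnapshottableDir()
home.write("/data/report.csv", [101, 102])
home.create_snapshot("s0")
home.delete("/data/report.csv")  # an accidental delete (human error)
home.restore("s0", "/data/report.csv")
print(home.files)  # {'/data/report.csv': (101, 102)}
```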
There is still a lot more to add to this list. We don't think we need to explain Hadoop's importance and market value; most of you are already well aware of it. If you know how Hadoop 1.0 fared, it's easy to imagine the market now that Hadoop 2.0 is here. This is the right time to get on board with this fast-growing technology and give your career a nice push. Get started with Hadoop Training in Chennai with Besant Technologies, which has a team of top Hadoop professionals to train you!