Comparison between Hadoop 1.x, Hadoop 2.x and Hadoop 3.x

Hadoop 1.X	Hadoop 2.X	Hadoop 3.X
Hadoop 1.x was released in 2011	Hadoop 2.x was first released in 2012 (alpha), with general availability in 2013	Hadoop 3.x was released in 2017
It introduced MapReduce and HDFS. The MapReduce framework handles both data processing and resource management.	YARN (Yet Another Resource Negotiator) was added for better resource management, which enabled multi-tenancy: the same cluster can be used by MapReduce as well as by other processing frameworks running on YARN.	In Hadoop 3.x, the YARN resource model is generalized to support user-defined resource types beyond CPU and memory. For example, the administrator can define resources like GPUs, software licenses, or locally-attached storage. YARN tasks can then be scheduled based on the availability of these resources.
Supports single tenancy only	Supports multiple tenants using YARN	Supports multiple tenants using YARN
Hadoop 1.x uses a master-slave architecture that consists of a single master and multiple slaves. If the master node fails, the entire cluster becomes unavailable.	Hadoop 2.x also uses a master-slave architecture, but it has multiple masters: an active NameNode and a standby NameNode. If the active master node fails, the standby takes over. As a result, Hadoop 2.x fixes the single point of failure.	It adds support for more than two NameNodes: one active NameNode and multiple standby NameNodes.
Hadoop 1.x is limited to 4000 nodes per cluster.	It supports up to 10000 nodes per cluster.	Scalability is improved in Hadoop 3.x, and a cluster can have more than 10000 nodes.
	Manual intervention is needed for NameNode recovery.	Manual intervention is not needed for NameNode recovery.
	Java 7 is the minimum supported version.	Java 8 is the minimum supported version.
	It supports the HDFS (default), FTP, Amazon S3 and Windows Azure Storage Blobs (WASB) file systems.	It supports all of these file systems, plus the Microsoft Azure Data Lake file system.
	It uses a 3x replication scheme, which results in 200% storage overhead.	Hadoop 3.x supports erasure coding in HDFS, which reduces the storage overhead to as little as 50%.
		It added support for GPU hardware, which can be used to run deep learning algorithms on a Hadoop cluster.

Comparison between Hadoop 1, Hadoop 2 and Hadoop 3
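
The storage overhead figures in the table follow from simple arithmetic. The sketch below is only an illustration, not Hadoop code: the function names are made up for this example, and the erasure coding layout of 6 data blocks plus 3 parity blocks is an assumption chosen because it yields the 50% figure quoted above.

```python
def replication_overhead(replicas: int) -> float:
    """Extra storage, as a fraction of the original data, for n-way replication.

    With 3 replicas, 2 of the 3 copies are "extra", so the overhead is 2.0 (200%).
    """
    return float(replicas - 1)


def erasure_coding_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Extra storage, as a fraction of the original data, for erasure coding.

    Only the parity blocks are extra, so the overhead is parity / data.
    """
    return parity_blocks / data_blocks


def as_percent(fraction: float) -> str:
    """Format a fraction such as 0.5 as a whole-number percentage string."""
    return f"{fraction:.0%}"


# 3x replication, as used by default in Hadoop 2.x
print("3x replication:", as_percent(replication_overhead(3)))

# Assumed layout: 6 data blocks + 3 parity blocks
print("Erasure coding (6 data + 3 parity):", as_percent(erasure_coding_overhead(6, 3)))
```

In other words, storing 6 blocks of data takes 18 blocks of raw storage with 3x replication, but only 9 blocks with the assumed 6+3 erasure coding layout.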

Now, to summarize the above points, we can have a look at the below image:

Difference between Hadoop 1.x, Hadoop 2.x and Hadoop 3.x
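
To make the idea of scheduling on user-defined resource types more concrete, here is a minimal toy sketch. It is not YARN's scheduler or its API: the `fits` and `allocate` helpers and the resource names ("memory_mb", "vcores", "gpu") are hypothetical and exist only for this illustration. The point is that once a resource is just a named quantity, a GPU can be checked in the same way as memory or CPU.

```python
from typing import Dict

# A resource vector maps a resource name to an integer quantity.
Resources = Dict[str, int]


def fits(request: Resources, available: Resources) -> bool:
    """Return True if every requested resource is available in sufficient quantity."""
    return all(available.get(name, 0) >= amount for name, amount in request.items())


def allocate(request: Resources, available: Resources) -> Resources:
    """Return the resources left on a node after granting the request.

    Raises ValueError if the request does not fit.
    """
    if not fits(request, available):
        raise ValueError("request does not fit on this node")
    remaining = dict(available)
    for name, amount in request.items():
        remaining[name] = remaining.get(name, 0) - amount
    return remaining


# Hypothetical node with one GPU (names and sizes are illustrative only)
node = {"memory_mb": 8192, "vcores": 4, "gpu": 1}
task = {"memory_mb": 2048, "vcores": 1, "gpu": 1}

node_after_first = allocate(task, node)
print("Remaining after first task:", node_after_first)

# A second identical task cannot run here: memory and CPU are free, but the GPU is taken.
print("Second task fits:", fits(task, node_after_first))
```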

Thanks for reading. Please share your thoughts in the comments section.


