How To Connect DataNode (Slave) To NameNode (Master)

Shubham Khandelwal
5 min read · Nov 4, 2020

Master node

This node manages all cluster services and operations. One Master node is enough for a cluster, but adding a secondary one improves scalability and high availability. The Master node's main job is running the NameNode process, which coordinates Hadoop storage operations.

Slave node

This node provides the required infrastructure, such as CPU, memory, and local disk, for storing and processing data. It runs all the slave processes, the main one being the DataNode process. A cluster generally comprises at least three Slave nodes, but it can easily be scaled up by adding more Slave nodes.

Namenode

This process runs on the Master node and is responsible for coordinating HDFS functions. For example, when the location of a file block is requested, the Master node gets the location from the NameNode process.

Datanode

DataNode is a process that handles the actual reading and writing of data blocks from/to storage. It runs on a Slave node and acts as a slave to the NameNode.

Steps :-

First we have to configure the HDFS cluster on both the DataNode and the NameNode.

  • For this we need the Hadoop and Java software to set up the Hadoop cluster.
  • Both the Hadoop and Java packages can be downloaded from the link below.

Step 1: Installation of Hadoop and Java software

To install the Hadoop and Java software on each node of the cluster, use the commands below:

Note: the installation is the same on both the DataNode and the NameNode.

rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force

Installation of hadoop software

rpm -ivh jdk-8u171-linux-x86.rpm --force

Installation of java software
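
As a quick sanity check before configuring anything (a minimal sketch; it assumes the two RPMs above installed a package named hadoop and put the hadoop and java binaries on the PATH), you can confirm both installations:

rpm -q hadoop     # prints the installed Hadoop package version
java -version     # prints the installed JDK version
hadoop version    # confirms the hadoop CLI itself works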

Inside the NameNode:

Step 2: Configuration of “hdfs-site.xml” and “core-site.xml”.

Now we will configure the “hdfs-site.xml” and “core-site.xml” files inside the /etc/hadoop directory.

  • Inside “hdfs-site.xml” we will add the code below:

<property>
<name>dfs.name.dir</name>
<value>/nn3</value>
</property>

The property name inside <name></name> must be the same on every NameNode, but we can give any directory inside <value></value>. I have given “/nn3”; you can give any path, e.g. /nn, /nn1, etc.

namenode hdfs-site.xml configuration
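
For reference, the complete file might look like this (a minimal sketch: the <configuration> root element is the standard wrapper for Hadoop config files, and /nn3 is just the example directory chosen above):

<?xml version="1.0"?>
<!-- /etc/hadoop/hdfs-site.xml on the NameNode -->
<configuration>
<property>
<name>dfs.name.dir</name>
<!-- local directory where the NameNode keeps its metadata -->
<value>/nn3</value>
</property>
</configuration>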
  • Inside “core-site.xml” we will add the code below:

<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:9001</value>
</property>

The same rule applies to “core-site.xml”: the property name must stay exactly as given, and inside <value></value> we give the address and port the NameNode will listen on.

Inside the NameNode we gave the neutral IP 0.0.0.0. This is a wildcard address: it makes the NameNode accept connections on all of its network interfaces, so other systems can reach it over either its private or its public IP.

In my case I am using port 9001; you can give any free port number, as long as the DataNodes use the same one.

namenode core-site.xml configuration
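
Putting it together, the NameNode's core-site.xml might look like this (again a sketch; only the fs.default.name property is needed for this setup):

<?xml version="1.0"?>
<!-- /etc/hadoop/core-site.xml on the NameNode -->
<configuration>
<property>
<name>fs.default.name</name>
<!-- listen on all interfaces, port 9001 -->
<value>hdfs://0.0.0.0:9001</value>
</property>
</configuration>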
  • Before connecting any DataNode or using any storage, we need to format the NameNode directory using:

hadoop namenode -format
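
After formatting (a sketch; /nn3 is the example directory configured above), the NameNode metadata structure should exist on disk:

ls /nn3/current    # should show files such as fsimage, edits, and VERSION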

Step 3: Start NameNode services

  • To start or stop the NameNode services, use the commands below:

hadoop-daemon.sh start namenode

hadoop-daemon.sh stop namenode

We can verify whether the NameNode services have started by using the “jps” command.

We can check on which port it is listening through the “netstat -tnlp” command, as shown below.
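
For example (a sketch; the grep pattern assumes the port 9001 chosen above):

jps                          # a “NameNode” entry should appear in the list
netstat -tnlp | grep 9001    # shows the NameNode process listening on port 9001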

  • Here, all the NameNode services are started.
  • Now we will move on to the DataNode side.

Inside the DataNode:

Step 4: Configuration of “hdfs-site.xml” and “core-site.xml”.

In the same way, we will configure the “hdfs-site.xml” and “core-site.xml” files inside the /etc/hadoop directory on the DataNode.

  • Inside “hdfs-site.xml” we will add the code below:

<property>
<name>dfs.data.dir</name>
<value>/dn6</value>
</property>

The property name inside <name></name> must be the same on every DataNode, but we can give any directory inside <value></value>. I have given “/dn6”; you can give any path, e.g. /dn, /dn1, etc.

datanode hdfs-site.xml configuration
  • Inside “core-site.xml” we will add the code below:

<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.2:9001</value>
</property>

In the <value></value> we give the IP of the NameNode and the port number 9001 that we configured on the NameNode.

datanode core-site.xml configuration
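
Before starting the DataNode, it can help to confirm the NameNode is reachable from this machine (a sketch; 192.168.1.2 and 9001 are the example IP and port from above, and nc may need to be installed separately):

ping -c 2 192.168.1.2     # basic network reachability
nc -zv 192.168.1.2 9001   # checks that the NameNode port accepts connections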

Note: unlike the NameNode, the DataNode directory does not need a separate format step; it is initialized automatically the first time the DataNode daemon starts and registers with the NameNode.

Step 5: Start DataNode services.

  • To start or stop the DataNode services, use the commands below:

hadoop-daemon.sh start datanode

hadoop-daemon.sh stop datanode

We can verify whether the DataNode services have started by using the “jps” command.

  • Here, all the DataNode services are started.
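
A quick check on the DataNode side (a sketch; the exact log directory and file name depend on the RPM install, the user, and the hostname, so adjust the path as needed):

jps                                                 # a “DataNode” entry should appear
tail -f /var/log/hadoop/*/hadoop-*-datanode-*.log   # watch for successful registration with the NameNode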

Hence, HDFS, i.e. the Hadoop cluster, is configured successfully 👍🏻. We can verify this by using the command:

hadoop dfsadmin -report

Here, you can see that the “datanode” is connected to the “namenode”.

cluster is configured

We can check the report on both the NameNode and the DataNode using the same “hadoop dfsadmin -report” command.
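
As a final smoke test (a sketch using the standard HDFS shell commands; test.txt is just a throwaway example file), you can write a file into HDFS and read it back:

echo "hello hdfs" > test.txt
hadoop fs -put test.txt /    # upload the file into HDFS (blocks land on the DataNode)
hadoop fs -ls /              # the uploaded file should be listed
hadoop fs -cat /test.txt     # read it back through the cluster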

Thank you for reading.😃

  • Here I have explained every concept needed to connect the nodes of a Hadoop cluster.

LinkedIn :- linkedin.com/in/shubham-khandelwal-a04613144

Instagram:-https://www.instagram.com/shubham.006pvt
