In this blog I discuss how to add a DataNode to an existing Hadoop cluster.
Here I am adding the node d4n to the cluster.
Step 1
[As root - Passwordless ssh setup on namenode and snn]
ssh-copy-id d4n
[As hdfs - Passwordless ssh setup on namenode and snn]
ssh-copy-id d4n
[As yarn,mapred,spark - Passwordless ssh setup on rm]
ssh-copy-id d4n
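The ssh-copy-id step assumes each user already has a key pair; under the hood it simply appends the user's public key to authorized_keys on d4n. A minimal local sketch of that mechanism (temp-directory paths stand in for the real home directories, so this can be tried without touching d4n):

```shell
# Sketch of what the passwordless-ssh setup amounts to; paths are
# temporary stand-ins, not the real ~/.ssh directories.
T=$(mktemp -d)
# Generate a key pair if the user does not have one yet (no passphrase).
ssh-keygen -q -t rsa -N "" -f "$T/id_rsa"
# ssh-copy-id then appends the public key to ~/.ssh/authorized_keys on d4n:
mkdir -p "$T/remote_ssh"                      # stands in for d4n:~/.ssh
cat "$T/id_rsa.pub" >> "$T/remote_ssh/authorized_keys"
chmod 600 "$T/remote_ssh/authorized_keys"
```

After the real ssh-copy-id, verify with `ssh d4n hostname`; it should print the hostname without prompting for a password.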
I now refer to one of my previous blogs to complete the prerequisite setup: the system-level configuration needed to support a Hadoop installation, including user creation, group creation, and other required setup.
Hadoop V2 - Pre-req Completion
Step 2
[As root - copy hadoop to d4n]
cd /usr/local
scp -r nn:/usr/local/hadoop-2.7.5 .
Step 3
[As root - conf files]
mkdir /etc/hadoop
cd /etc/hadoop
scp -r nn:/etc/hadoop/conf .
chmod -R 775 /etc/hadoop/
Step 4
[As root - soft link creation]
ln -s /usr/local/hadoop-2.7.5 /usr/local/hadoop
ln -s /etc/hadoop/conf /usr/local/hadoop-2.7.5/etc/hadoop
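The two links decouple paths from the release version: /usr/local/hadoop always points at the current release, and the release's etc/hadoop points back at the shared config directory. A sketch of the pattern in a temp directory (stand-in paths, not the real /usr/local):

```shell
# Version-agnostic path via symlink; temp dir stands in for /usr/local.
T=$(mktemp -d)
mkdir "$T/hadoop-2.7.5"
ln -s "$T/hadoop-2.7.5" "$T/hadoop"
# A later upgrade would only repoint the link, e.g.:
#   ln -sfn "$T/hadoop-2.7.6" "$T/hadoop"
readlink "$T/hadoop"
```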
Step 5
[As root - Directories creation]
mkdir -p /opt/HDPV2/logs /opt/HDPV2/pids /opt/HDPV2/1 /opt/HDPV2/2 /opt/HDPV2/tmp
chmod 775 /opt/HDPV2/logs /opt/HDPV2/pids /opt/HDPV2/1 /opt/HDPV2/2 /opt/HDPV2/tmp
chown hdfs:hadoop /opt/HDPV2/logs /opt/HDPV2/pids /opt/HDPV2/1 /opt/HDPV2/2 /opt/HDPV2/tmp
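The same directory setup can be written as a loop. In this sketch BASE is a temp directory so it runs without root; on the real node BASE would be /opt/HDPV2 and the chown would apply (the 1 and 2 directories presumably back dfs.datanode.data.dir):

```shell
# Data/log/pid directory layout from Step 5, as a loop.
BASE=$(mktemp -d)            # /opt/HDPV2 on the real node
for d in logs pids 1 2 tmp; do
  mkdir -p "$BASE/$d"
  chmod 775 "$BASE/$d"
  # chown hdfs:hadoop "$BASE/$d"   # requires root; omitted in this sketch
done
ls "$BASE"
```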
At this point your Hadoop node is ready.
Now comes the easy part.
Step 6
[As root - Update conf files on Namenode]
Update your hdfs-site.xml file
<property>
<name>dfs.hosts</name>
<value>/etc/hadoop/conf/dfs.hosts.include</value>
</property>
Similarly for yarn-site.xml
<property>
<name>yarn.resourcemanager.nodes.include-path</name>
<value>/etc/hadoop/conf/dfs.hosts.include</value>
</property>
I had already added these properties as part of my initial installation; if you have not, you should do so now, since they secure the installation by allowing only the listed hosts to connect to the NameNode and ResourceManager.
Now update the dfs.hosts.include file and the slaves file in the same directory to include the new host.
cat slaves
d1n
d2n
d3n
d4n
Once done, distribute the slaves and dfs.hosts.include files to nn, snn and rm.
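Updating both files is the same idempotent edit. A sketch using a temp copy of the conf directory (so it is safe to try), appending the new host only if it is not already listed:

```shell
# Append d4n to slaves and dfs.hosts.include, skipping hosts already present.
CONF=$(mktemp -d)                      # stands in for /etc/hadoop/conf
printf 'd1n\nd2n\nd3n\n' > "$CONF/slaves"
cp "$CONF/slaves" "$CONF/dfs.hosts.include"
NEW=d4n
for f in slaves dfs.hosts.include; do
  grep -qx "$NEW" "$CONF/$f" || echo "$NEW" >> "$CONF/$f"
done
# Distribution to the masters would then be along the lines of:
#   for h in nn snn rm; do scp "$CONF"/slaves "$CONF"/dfs.hosts.include "$h":/etc/hadoop/conf/; done
cat "$CONF/slaves"
```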
Step 7
[As hdfs on nn and snn]
hdfs dfsadmin -refreshNodes
Note - You might need to restart your snn for this to take effect.
[As yarn on rm]
yarn rmadmin -refreshNodes
Step 8
[As hdfs - Start hadoop on d4n]
hadoop-daemon.sh start datanode
[As yarn - start nodemanager on d4n]
yarn-daemon.sh start nodemanager
Step 9
Verify that the daemons are running.
[As yarn - on namenode]
yarn node -list -all
[As hdfs - on namenode]
hdfs dfsadmin -report -live
Step 10
To configure Spark, follow my blog on Spark Configuration. That post covers the complete cluster, but the same steps extend to a single-node addition.
The key change required is in the Spark slaves configuration file -
[As root on rm]
cd /etc/spark/conf
Append d4n to the slaves file.
Step 11
[As spark - on d4n]
start-slave.sh spark://rm.novalocal:7077
This will start a Spark worker on d4n.
You can verify the status from http://rm:8080 (the Spark web UI).
Step 12
Finally, it is a good idea to run the balancer utility now so that existing data is spread onto the new node.
hdfs balancer -threshold 1
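The -threshold argument is a percentage: the balancer keeps moving blocks until every DataNode's disk utilization is within that many percentage points of the cluster average. A quick back-of-the-envelope with hypothetical utilizations (the figures below are made up for illustration):

```shell
# Hypothetical per-node utilization (%): the new node starts near empty.
d1n=60; d2n=55; d3n=65; d4n=0
avg=$(( (d1n + d2n + d3n + d4n) / 4 ))
echo "cluster average utilization: ${avg}%"
# With -threshold 1, any node whose utilization falls outside
# [avg-1, avg+1] has blocks moved until it is back inside that band,
# so d4n will receive data until it is close to the average.
```

A threshold of 1 is aggressive and generates a lot of block movement; the default is 10.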