
Monday, May 7, 2018

Hadoop V2 - Adding New Node

In this blog I discuss how to add a DataNode to an existing cluster.

I am adding node d4n to the cluster

Step 1

[As root - Passwordless ssh setup on namenode and snn]
ssh-copy-id d4n
[As hdfs - Passwordless ssh setup on namenode and snn]
ssh-copy-id d4n

[As yarn,mapred,spark - Passwordless ssh setup on rm]
ssh-copy-id d4n
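
If a user does not have a key pair yet, ssh-copy-id will fail. A minimal sketch for that case (run as the user in question, accepting the default key location):

ssh-keygen -t rsa        # generate a key pair if ~/.ssh/id_rsa does not exist yet
ssh-copy-id d4n          # push the public key to d4n
ssh d4n hostname         # should return the hostname without prompting for a password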

I now refer to one of my previous blogs to complete the pre-requisite setup: the system-level configuration needed to support the Hadoop installation, including user creation, group creation and other required setup.
Hadoop V2 - Pre-req Completion
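
For orientation, the users and groups that d4n needs before Hadoop is copied over look roughly like the sketch below; the exact IDs, group memberships and any additional users are covered in the pre-req blog, so treat this only as a reminder.

[As root - on d4n, sketch only]
groupadd hadoop
useradd -g hadoop hdfs
useradd -g hadoop yarn
useradd -g hadoop mapred
useradd -g hadoop spark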



Step 2
[As root - copy hadoop to d4n]
cd /usr/local
scp -r nn:/usr/local/hadoop-2.7.5 .
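
A quick sanity check that the copy is usable and matches the rest of the cluster (assuming JAVA_HOME is already set on d4n as part of the pre-reqs):

/usr/local/hadoop-2.7.5/bin/hadoop version    # should report 2.7.5, same as on nn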


Step 3
[As root - conf files]
mkdir /etc/hadoop
cd /etc/hadoop
scp -r nn:/etc/hadoop/conf .
chmod -R 775 /etc/hadoop/


Step 4
[As root -  soft link creation]
ln -s /usr/local/hadoop-2.7.5 /usr/local/hadoop
ln -s /etc/hadoop/conf /usr/local/hadoop-2.7.5/etc/hadoop
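
Note that if the scp in Step 2 brought over a real etc/hadoop directory (it will if that path was a plain directory, or a symlink that scp followed, on nn), the second ln would create the link inside it rather than at it. A hedged sketch of the check and fix:

ls -ld /usr/local/hadoop-2.7.5/etc/hadoop                                        # symlink or real directory?
mv /usr/local/hadoop-2.7.5/etc/hadoop /usr/local/hadoop-2.7.5/etc/hadoop.dist    # only if it is a real directory
ln -s /etc/hadoop/conf /usr/local/hadoop-2.7.5/etc/hadoop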


Step 5
[As root - Directories creation]
mkdir -p /opt/HDPV2/logs /opt/HDPV2/pids  /opt/HDPV2/1 /opt/HDPV2/2  /opt/HDPV2/tmp
chmod 775 /opt/HDPV2/logs /opt/HDPV2/pids  /opt/HDPV2/1 /opt/HDPV2/2  /opt/HDPV2/tmp
chown hdfs:hadoop /opt/HDPV2/logs /opt/HDPV2/pids  /opt/HDPV2/1 /opt/HDPV2/2  /opt/HDPV2/tmp
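
These paths only matter if they line up with what the configuration copied in Step 3 expects, so it is worth grepping the conf on d4n rather than trusting the values here; the property and variable names below are standard Hadoop ones, but the values are specific to this cluster:

grep -A1 dfs.datanode.data.dir /etc/hadoop/conf/hdfs-site.xml    # expect /opt/HDPV2/1 and /opt/HDPV2/2
grep -A1 hadoop.tmp.dir /etc/hadoop/conf/core-site.xml           # expect /opt/HDPV2/tmp
grep -E "LOG_DIR|PID_DIR" /etc/hadoop/conf/hadoop-env.sh         # expect the logs and pids directories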



At this point your Hadoop node is ready. Now comes the easy part.


Step 6
[As root - Update conf files on Namenode]

Update your hdfs-site.xml file
<property>
        <name>dfs.hosts</name>
        <value>/etc/hadoop/conf/dfs.hosts.include</value>
</property>


Similarly for yarn-site.xml
<property>
        <name>yarn.resourcemanager.nodes.include-path</name>
        <value>/etc/hadoop/conf/dfs.hosts.include</value>
</property>


Though I had already done this as part of my initial installation, you might want to do so to secure your installation and allow only specific hosts to connect to the NameNode.

Now update your dfs.hosts.include and slaves files in the same directory to include the new host:
cat slaves
d1n
d2n
d3n
d4n
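
The dfs.hosts.include file needs the new host as well; in a layout like this one it ends up listing the same DataNodes as slaves (a sketch, assuming no other hosts are whitelisted):

cat dfs.hosts.include
d1n
d2n
d3n
d4n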

Once done, distribute the slaves and dfs.hosts.include files to nn, snn and rm.
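
If the edits were made on nn, one way to push them out is the loop below; it assumes passwordless ssh between the master nodes from the initial install, so adjust to however you normally move conf files around:

[As root - on nn]
for h in snn rm; do scp /etc/hadoop/conf/slaves /etc/hadoop/conf/dfs.hosts.include $h:/etc/hadoop/conf/; done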

Step 7
[As hdfs on nn and snn]
hdfs dfsadmin -refreshNodes
Note - You might need to restart your snn for this to take effect.
[As yarn on rm]
yarn rmadmin -refreshNodes

Step 8
[As hdfs -  Start hadoop on d4n]
hadoop-daemon.sh start datanode

[As yarn -  start nodemanager on d4n]
yarn-daemon.sh start nodemanager
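
A quick check that both daemons actually came up on d4n; jps ships with the JDK and only lists the invoking user's JVMs, so run it as hdfs and as yarn (or once as root). The log path assumes the layout created in Step 5:

jps                                 # expect DataNode (as hdfs) and NodeManager (as yarn)
tail -n 20 /opt/HDPV2/logs/*.log    # scan for startup errors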

Step 9
Verify the daemons running
[As yarn - on namenode]
yarn node -list -all
[As hdfs - on namenode]
hdfs dfsadmin -report -live
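
To check the new node specifically instead of reading the whole report, filtering on the hostname is enough (a sketch; the grep pattern assumes the node registers under a name containing d4n):

hdfs dfsadmin -report -live | grep -i d4n
yarn node -list -all | grep -i d4n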


Step 10
To configure Spark, follow my blog on Spark Configuration. That post covers the complete cluster, but you can extend the same steps to a single-node addition.


The key change required is to the Spark slaves configuration file.
[As root on rm]
cd /etc/spark/conf
Append d4n to the slaves file.
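
For example:

echo d4n >> /etc/spark/conf/slaves

Spark's slaves file is only read by the slave-launching helper scripts (start-slaves.sh and friends) on the master, which is why the next step starts the worker directly on d4n.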

Step 11    
[As spark - on d4n]
start-slave.sh spark://rm.novalocal:7077

This will start the Spark worker on d4n.
You can verify the status from http://rm:8080 (the Spark master web UI).

Step 12  
Finally, it's a good idea to run the HDFS balancer now so that existing blocks are spread onto the new node.
hdfs balancer -threshold 1 
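
The -threshold argument is the allowed deviation, in percent, of each DataNode's utilisation from the cluster average, so 1 is quite aggressive and the run can take a while. If the rebalancing traffic becomes a problem, the per-DataNode transfer rate can be capped before starting it (a sketch; the 100 MB/s value is arbitrary):

hdfs dfsadmin -setBalancerBandwidth 104857600    # bytes per second, here roughly 100 MB/s
hdfs balancer -threshold 5                       # a more relaxed threshold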
