This post continues my previous blog, Hadoop Software Setup and Environment Configuration.
cd /etc/hadoop/conf
[As root or sudo hduser]
Make the following changes in hadoop-env.sh:
cat hadoop-env.sh
export JAVA_HOME=/usr/java/latest ## CustomSet
export HADOOP_LOG_DIR=/opt/HDPV1/logs ## CustomSet
export HADOOP_PID_DIR=/opt/HDPV1/pids ## CustomSet
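The log and PID directories must exist on every node and be writable by hduser before the daemons start. A minimal sketch, assuming the /opt/HDPV1 layout and the /tmp/hosts node list from the previous post (the hadoop group name here is an assumption):
# for i in $(cat /tmp/hosts); do ssh ${i} 'mkdir -p /opt/HDPV1/logs /opt/HDPV1/pids; chown -R hduser:hadoop /opt/HDPV1/logs /opt/HDPV1/pids'; done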
Contents of core-site.xml (property section only)
<property>
<name>fs.default.name</name>
<value>hdfs://nn:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>600</value>
</property>
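For completeness, these property blocks sit inside the standard <configuration> root element of the file; a full core-site.xml skeleton looks like this (values as listed above):
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://nn:8020</value>
  </property>
  <!-- remaining properties as listed above -->
</configuration>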
Contents of hdfs-site.xml (property section only)
<property>
<name>dfs.http.address</name>
<value>nn:50070</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/HDPV1/1/dfs/nn,/opt/HDPV1/2/dfs/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/HDPV1/1/dfs/dn,/opt/HDPV1/2/dfs/dn</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>snn:50090</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/opt/HDPV1/1/dfs/snn</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>1048576</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>4294967296</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>20</value>
</property>
<property>
<name>dfs.hosts</name>
<value>/etc/hadoop/conf/dfs.hosts.include</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/etc/hadoop/conf/dfs.hosts.exclude</value>
</property>
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>0</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
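Note the units: dfs.block.size is in bytes (134217728 = 128 MB), dfs.balance.bandwidthPerSec is in bytes per second (1048576 = 1 MB/s), dfs.datanode.du.reserved is in bytes reserved per volume for non-DFS use (4294967296 = 4 GB), and fs.trash.interval in core-site.xml is in minutes (600 = 10 hours).
The storage directories referenced above must also exist before first start, and the exclude file should be present even if empty. A sketch, as root, assuming the same ownership conventions as the previous post:
[On nn]
# mkdir -p /opt/HDPV1/1/dfs/nn /opt/HDPV1/2/dfs/nn
[On snn]
# mkdir -p /opt/HDPV1/1/dfs/snn
[On each datanode]
# mkdir -p /opt/HDPV1/1/dfs/dn /opt/HDPV1/2/dfs/dn
[On all nodes]
# chown -R hduser:hadoop /opt/HDPV1
# touch /etc/hadoop/conf/dfs.hosts.exclude ## should exist, even if empty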
Contents of slaves file
cat slaves
d1n
d2n
d3n
d4n
## The contents of slaves and dfs.hosts.include are purposely left in different formats. The entries in dfs.hosts.include must be FQDNs, because that is how the Hadoop DataNode daemons register themselves with the NN.
The slaves file, on the other hand, is what start-dfs.sh (start-all.sh) uses to ssh into the nodes and start the Hadoop daemons.
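A quick way to confirm that each short name in slaves maps to the FQDN listed in dfs.hosts.include is to check the FQDN each slave reports (a sketch; assumes the passwordless ssh set up in the previous post):
$ for i in $(cat slaves); do ssh ${i} hostname -f; done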
Contents of dfs.hosts.include
cat dfs.hosts.include
d1node.cluster.com
d2node.cluster.com
d3node.cluster.com
d4node.cluster.com
Contents of masters file
(Remember: the masters file on the SNN should contain the NN, for failover.)
[hduser@namenode conf]$ cat masters
snn
[On NameNode - As hduser]
NameNode Format
hadoop namenode -format
18/02/20 10:19:44 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/HDPV1/1/dfs/nn/current/edits
18/02/20 10:19:44 INFO common.Storage: Storage directory /opt/HDPV1/1/dfs/nn has been successfully formatted.
18/02/20 10:19:44 INFO common.Storage: Image file /opt/HDPV1/2/dfs/nn/current/fsimage of size 112 bytes saved in 0 seconds.
18/02/20 10:19:44 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/HDPV1/2/dfs/nn/current/edits
18/02/20 10:19:44 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/HDPV1/2/dfs/nn/current/edits
18/02/20 10:19:44 INFO common.Storage: Storage directory /opt/HDPV1/2/dfs/nn has been successfully formatted.
18/02/20 10:19:44 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode.cluster.com/192.168.10.51
************************************************************/
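A successful format can also be verified on disk: both dfs.name.dir locations should now contain a fresh storage layout (file names per Hadoop 1.x; a sketch):
$ ls /opt/HDPV1/1/dfs/nn/current/
edits  fsimage  fstime  VERSION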
[As root]
Copy the Configuration to all the nodes
# for i in $(cat /tmp/hosts) ;do scp hadoop-env.sh dfs.hosts.include core-site.xml hdfs-site.xml slaves masters ${i}:/etc/hadoop/conf/ ; done
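Here /tmp/hosts is the plain node list carried over from the previous post; its assumed contents are simply the short host names, one per line:
# cat /tmp/hosts
nn
snn
d1n
d2n
d3n
d4n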
[As root - Give Permissions]
# for i in $(cat /tmp/hosts) ;do ssh ${i} chmod -R 755 /etc/hadoop ; done;
[As hduser - On NameNode]
start-dfs.sh
starting namenode, logging to /opt/HDPV1/logs/hadoop-hduser-namenode-namenode.cluster.com.out
d1n: starting datanode, logging to /opt/HDPV1/logs/hadoop-hduser-datanode-d1node.cluster.com.out
d3n: starting datanode, logging to /opt/HDPV1/logs/hadoop-hduser-datanode-d3node.cluster.com.out
d4n: starting datanode, logging to /opt/HDPV1/logs/hadoop-hduser-datanode-d4node.cluster.com.out
d2n: starting datanode, logging to /opt/HDPV1/logs/hadoop-hduser-datanode-d2node.cluster.com.out
snn: starting secondarynamenode, logging to /opt/HDPV1/logs/hadoop-hduser-secondarynamenode-snamenode.cluster.com.out
Verify Java processes (Hadoop Processes)
[As hduser - On NameNode]
$ for i in $(cat /tmp/hosts) ; do ssh ${i} 'hostname; jps | grep -vi jps; echo' ; done
namenode.cluster.com
28557 NameNode
snamenode.cluster.com
13643 SecondaryNameNode
d1node.cluster.com
4476 DataNode
d2node.cluster.com
7285 DataNode
d3node.cluster.com
1928 DataNode
d4node.cluster.com
17210 DataNode
At this point the Hadoop cluster is up and running with 1 NN, 1 SNN, and 4 DNs. The NameNode web UI should also be reachable at http://nn:50070 (per dfs.http.address above).
hadoop dfsadmin -report
Configured Capacity: 133660540928 (124.48 GB)
Present Capacity: 133660540928 (124.48 GB)
DFS Remaining: 133660311552 (124.48 GB)
DFS Used: 229376 (224 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)
Name: 192.168.10.54:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 57344 (56 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415077888(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Thu Feb 22 04:03:50 CET 2018
Name: 192.168.10.57:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 57344 (56 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415077888(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Thu Feb 22 04:03:49 CET 2018
Name: 192.168.10.55:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 57344 (56 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415077888(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Thu Feb 22 04:03:50 CET 2018
Name: 192.168.10.58:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 57344 (56 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415077888(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Thu Feb 22 04:03:50 CET 2018
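Before opening the cluster up to users, a small read/write smoke test confirms HDFS works end to end (a sketch; file names are arbitrary, and hduser can write to / as the HDFS superuser):
$ echo "smoke test" > /tmp/smoke.txt
$ hadoop fs -put /tmp/smoke.txt /smoke.txt
$ hadoop fs -cat /smoke.txt
$ hadoop fs -rm /smoke.txt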
# Finally, give permissions 777 to /tmp on HDFS
[hduser@namenode ~]$ hadoop fs -chmod 777 /tmp
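On a freshly formatted HDFS the /tmp directory may not exist yet; if the chmod complains, create it first and then verify (a sketch):
$ hadoop fs -mkdir /tmp    ## only if /tmp does not exist yet
$ hadoop fs -chmod 777 /tmp
$ hadoop fs -ls /          ## /tmp should now show drwxrwxrwx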