In this blog I discuss setting up NameNode High Availability using QJM (Quorum Journal Manager).
Functioning of QJM
1. An odd number of nodes (3, 5, ...) run the JournalNode daemons
2. The active NN writes edit logs to these nodes
3. The standby NN (SNN) reads and applies changes from these edit logs
4. ZKFC (ZooKeeper Failover Controller) runs on both NN and SNN
5. Whenever a failure is detected, a new active NN is elected
Failover can be manual or automatic.
In this blog we only do manual failover; in the following blogs I will add the configuration for automatic failover.
Stop the cluster and Spark
[nn-hdfs] - stop-dfs.sh
[rm-yarn] - stop-yarn.sh
[rm-spark] - stop-all.sh
[rm-mapred] - mr-jobhistory-daemon.sh stop historyserver
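Before touching the configuration it is worth confirming that no HDFS, YARN or history-server daemons are still running. A minimal check, reusing the /tmp/all_hosts file that appears later in this post:
[As root]
for i in $(cat /tmp/all_hosts) ;do echo ${i}; ssh ${i} jps ; done
jps lists the Java processes on each host; if nothing but Jps itself shows up, the cluster is fully stopped.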
Setup Instructions
hdfs-site.xml
1. Create logical nameservice
<property>
<name>dfs.nameservices</name>
<value>devcluster</value>
</property>
2. Set up the path prefix (core-site.xml)
<property>
<name>fs.defaultFS</name>
<value>hdfs://devcluster</value>
</property>
3. Create NameNode identifiers
<property>
<name>dfs.ha.namenodes.devcluster</name>
<value>nn1,nn2</value>
</property>
4. Set Journal Nodes
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://d1.novalocal:8485;d2.novalocal:8485;d3.novalocal:8485/devcluster</value>
</property>
Make sure that the hosts are fully qualified and resolvable.
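A quick way to verify resolution from the NameNode hosts (substitute your own JournalNode hostnames if they differ):
getent hosts d1.novalocal
getent hosts d2.novalocal
getent hosts d3.novalocal
Each command should print the IP address and fully qualified name; an empty result means DNS or /etc/hosts needs fixing before you continue.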
5. Provide the local edits directory for the JournalNodes
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/HDPV2/journal/node/local/data</value>
</property>
This is the directory on the nodes that will act as JournalNodes.
6. Configure RPC Address
<property>
<name>dfs.namenode.rpc-address.devcluster.nn1</name>
<value>nn:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.devcluster.nn2</name>
<value>snn:8020</value>
</property>
7. Configure HTTP Listen addresses
<property>
<name>dfs.namenode.http-address.devcluster.nn1</name>
<value>nn:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.devcluster.nn2</name>
<value>snn:50070</value>
</property>
8. Configure the Java class for failover
This class is used by clients to determine which NameNode is currently active when they contact the nameservice.
<property>
<name>dfs.client.failover.proxy.provider.devcluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
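With the proxy provider in place, clients address the logical nameservice instead of an individual host. Once the HA cluster is up (step 18 onwards), a quick sanity check could be:
hdfs dfs -ls hdfs://devcluster/
If the proxy provider is configured correctly, this resolves to whichever NameNode is currently active, without the client needing to know which one that is.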
9. Configure Fencing
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
Note - you must install fuser (psmisc package) on both NameNode hosts.
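On RHEL/CentOS style systems (an assumption; use your distribution's package manager otherwise) that would be:
[As root, on nn and snn]
yum install -y psmisc
which fuser
sshfence logs in to the node being fenced and uses fuser to kill the NameNode process there, so the binary must exist on both NameNode hosts.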
9.1 Specify Fencing keys
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hdfs/.ssh/id_rsa</value>
</property>
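Fencing only works if the hdfs user can actually SSH between the two NameNodes with this key. A quick check, assuming the key path above:
[As hdfs on nn]
ssh -i /home/hdfs/.ssh/id_rsa snn hostname
[As hdfs on snn]
ssh -i /home/hdfs/.ssh/id_rsa nn hostname
Both commands should return the remote hostname without prompting for a password.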
10. Finally, remove the following properties from your hdfs-site.xml. They are no longer required because the per-NameNode addresses are now provided via the nameservice and NameNode IDs.
(Also remove the old RPC address configuration, if you had set it.)
dfs.http.address
dfs.secondary.http.address
11. Now scp hdfs-site.xml and core-site.xml to snn, d1n, d2n, d3n and rm
[As root]
cd $CONF
for i in $(cat /tmp/all_hosts) ;do scp hdfs-site.xml core-site.xml ${i}:/etc/hadoop/conf/ ; done
for i in $(cat /tmp/all_hosts) ;do ssh ${i} chmod -R 755 /etc/hadoop/conf ; done;
12. Setup Journal Nodes d1n,d2n and d3n
[As root]
mkdir -p /opt/HDPV2/journal/node/local/data
chown -R hdfs:hadoop /opt/HDPV2/journal
[As hdfs] hadoop-daemon.sh start journalnode
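To confirm each JournalNode came up and is listening on its RPC port (8485, matching the qjournal URI above), run on d1n, d2n and d3n:
jps | grep JournalNode
ss -tln | grep 8485
jps should show a JournalNode process and ss (or netstat) should show a listener on port 8485.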
13. Format ZKFC [As hdfs on nn]
hdfs zkfc -formatZK -force
[hdfs@nn conf]$ hdfs zkfc -formatZK -force
18/04/18 04:58:40 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at nn/192.168.2.101:8020
18/04/18 04:58:40 FATAL ha.ZKFailoverController: Automatic failover is not enabled for NameNode at nn/192.168.2.101:8020. Please ensure that automatic failover is enabled in the configuration before running the ZK failover controller.
This FATAL error is expected at this point: automatic failover is not enabled in the configuration, so the ZK failover controller refuses to format ZooKeeper. For the manual-failover setup in this post it can be ignored; it will succeed once automatic failover is configured in a later post.
14. Start NameNode [As hdfs on nn]
hadoop-daemon.sh start namenode
15. Bootstrap the standby [As hdfs on snn]
hdfs namenode -bootstrapStandby
=====================================================
About to bootstrap Standby ID nn2 from:
Nameservice ID: devcluster
Other Namenode ID: nn1
Other NN's HTTP address: http://nn:50070
Other NN's IPC address: nn/192.168.2.101:8020
Namespace ID: 1878602801
Block pool ID: BP-1256862429-192.168.2.101-1523506309307
Cluster ID: CID-65c51f92-e1c0-4a17-a921-ae0e61f3251f
Layout version: -63
isUpgradeFinalized: true
16. Start the standby NameNode [As hdfs on snn]
hadoop-daemon.sh start namenode
17. Stop both namenodes [As hdfs on nn and snn]
hadoop-daemon.sh stop namenode
17.1 Stop all journal daemons. [As hdfs on d1n,d2n and d3n]
hadoop-daemon.sh stop journalnode
18. Start the Cluster [As hdfs on nn]
start-dfs.sh
19. Force Transition one node to Active
This is an important point: when you first start the cluster, you will see that both NameNodes are in standby mode.
This is because we have not yet configured automatic failover and the ZooKeeper failover controller processes.
I will cover that next, but for now let's focus on the task at hand.
Check Service State
[hdfs@nn ~]$ hdfs haadmin -getServiceState nn1
standby
[hdfs@nn ~]$ hdfs haadmin -getServiceState nn2
standby
hdfs haadmin -failover -forcefence -forceactive nn2 nn1
Failover from nn2 to nn1 successful
Remember: when you do a forced failover, the other node must be fenced, so it will be shut down automatically. You can start it again manually, as shown below.
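For example, if nn2 was fenced and shut down by the forced failover, bring it back as a standby with the same command used earlier:
[As hdfs on snn]
hadoop-daemon.sh start namenode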
Now check the Service State
[hdfs@nn ~]$ hdfs haadmin -getServiceState nn1
active
[hdfs@nn ~]$ hdfs haadmin -getServiceState nn2
standby
You can look at the web UI of each NameNode to confirm the status.
Get configuration of namenodes
[hdfs@nn ~]$ hdfs getconf -namenodes
nn snn
Start Cluster
[rm-yarn] - start-yarn.sh
[rm-spark] - start-all.sh
[rm-mapred] - mr-jobhistory-daemon.sh start historyserver
The directory structure of the QJM edits directory is similar to the NameNode's metadata directory; however, it contains only edit logs, since those are all the standby NameNode needs to roll forward its in-memory fsimage.
/opt/HDPV2/journal/node/local/data/devcluster/current
[root@d1n current]# ls -rlht
total 1.1M
-rw-r--r--. 1 hdfs hadoop 155 Apr 18 04:39 VERSION
-rw-r--r--. 1 hdfs hadoop 42 Apr 18 04:39 edits_0000000000000000701-0000000000000000702
-rw-r--r--. 1 hdfs hadoop 42 Apr 18 05:23 edits_0000000000000000703-0000000000000000704
-rw-r--r--. 1 hdfs hadoop 42 Apr 18 05:25 edits_0000000000000000705-0000000000000000706
-rw-r--r--. 1 hdfs hadoop 42 Apr 18 05:26 edits_0000000000000000707-0000000000000000708
-rw-r--r--. 1 hdfs hadoop 2 Apr 18 05:26 last-promised-epoch
drwxr-xr-x. 2 hdfs hadoop 6 Apr 18 05:26 paxos
-rw-r--r--. 1 hdfs hadoop 2 Apr 18 05:26 last-writer-epoch
-rw-r--r--. 1 hdfs hadoop 42 Apr 18 05:28 edits_0000000000000000709-0000000000000000710
-rw-r--r--. 1 hdfs hadoop 42 Apr 18 05:30 edits_0000000000000000711-0000000000000000712
-rw-r--r--. 1 hdfs hadoop 42 Apr 18 05:32 edits_0000000000000000713-0000000000000000714
-rw-r--r--. 1 hdfs hadoop 42 Apr 18 05:34 edits_0000000000000000715-0000000000000000716
-rw-r--r--. 1 hdfs hadoop 1.0M Apr 18 05:34 edits_inprogress_0000000000000000717
-rw-r--r--. 1 hdfs hadoop 8 Apr 18 05:34 committed-txid
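If you are curious what these segments contain, the offline edits viewer can dump any finalized segment to XML. A small example against one of the files above:
[As hdfs on d1n]
hdfs oev -i /opt/HDPV2/journal/node/local/data/devcluster/current/edits_0000000000000000715-0000000000000000716 -o /tmp/edits.xml
cat /tmp/edits.xml
Each record shows the transaction ID and the operation that the standby NameNode replays to keep its namespace in sync.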