Sunday, March 4, 2018

Hadoop V1/V2 - Administrative Commands Starter

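This post walks through some basic Hadoop administrative commands. They are shown in their Hadoop 1 form; in Hadoop 2 the same operations are also available as hdfs dfs, hdfs dfsadmin, hdfs fsck, and mapred job.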


1. Status Report of DFS (Distributed File System)
 hadoop dfsadmin -report
Configured Capacity: 133660540928 (124.48 GB)
Present Capacity: 133660540928 (124.48 GB)
DFS Remaining: 133660270592 (124.48 GB)
DFS Used: 270336 (264 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)

Name: 192.168.10.54:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415065600(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Fri Mar 02 12:59:27 CET 2018


Name: 192.168.10.57:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 61440 (60 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415073792(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Fri Mar 02 12:59:27 CET 2018


Name: 192.168.10.55:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415065600(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Fri Mar 02 12:59:27 CET 2018


Name: 192.168.10.58:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415065600(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Fri Mar 02 12:59:27 CET 2018
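
On a larger cluster the report gets long, so a small filter can help. The sketch below is not part of Hadoop: summarize_datanodes is a hypothetical helper that reads report text from standard input and prints one line per datanode. It assumes the field labels appear exactly as in the output above; other Hadoop versions may format the report differently.

```shell
# Hypothetical helper: condense `hadoop dfsadmin -report` output (read from
# stdin) to one line per datanode with its address, used % and remaining %.
summarize_datanodes() {
  awk '
    /^Name:/           { name = $2 }      # start of a datanode section
    /^DFS Used%:/      { used = $3 }      # also matches the cluster summary line
    /^DFS Remaining%:/ {
      # Only datanode sections carry this field, so print here.
      if (name != "") print name, "used=" used, "remaining=" $3
      name = ""
    }
  '
}

# Usage (needs a running cluster):
#   hadoop dfsadmin -report | summarize_datanodes
```

For the report above this should give four lines, one per datanode address.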


2. Check Contents of DFS
[hduser@namenode ~]$ hadoop dfs -ls /
Found 1 items
drwxr-xr-x   - hduser supergroup          0 2018-02-22 04:14 /tmp

3. Create Directory
[hduser@namenode ~]$ hadoop dfs -mkdir /user/vin

4. Copy and Delete File / Directory in HDFS
[hduser@namenode hadoop]$ ls -l conf/
total 84
-rwxr-xr-x 1 root root 7457 Jul 23  2013 capacity-scheduler.xml
-rwxr-xr-x 1 root root 1095 Jul 23  2013 configuration.xsl
-rwxr-xr-x 1 root root  415 Feb 22 03:48 core-site.xml
-rwxr-xr-x 1 root root   76 Feb 22 03:48 dfs.hosts.include
-rwxr-xr-x 1 root root  327 Jul 23  2013 fair-scheduler.xml
-rwxr-xr-x 1 root root 2497 Feb 22 03:48 hadoop-env.sh
-rwxr-xr-x 1 root root 2052 Jul 23  2013 hadoop-metrics2.properties
-rwxr-xr-x 1 root root 4644 Jul 23  2013 hadoop-policy.xml
-rwxr-xr-x 1 root root 1357 Feb 22 03:48 hdfs-site.xml
-rwxr-xr-x 1 root root 5018 Jul 23  2013 log4j.properties
-rwxr-xr-x 1 root root 2033 Jul 23  2013 mapred-queue-acls.xml
-rwxr-xr-x 1 root root 1359 Feb 25 13:49 mapred-site.xml
-rwxr-xr-x 1 root root    4 Feb 22 03:48 masters
-rwxr-xr-x 1 root root   16 Feb 22 03:48 slaves
-rwxr-xr-x 1 root root 2042 Jul 23  2013 ssl-client.xml.example
-rwxr-xr-x 1 root root 1994 Jul 23  2013 ssl-server.xml.example
-rwxr-xr-x 1 root root  382 Jul 23  2013 taskcontroller.cfg
-rwxr-xr-x 1 root root 3890 Jul 23  2013 task-log4j.properties

Copy a file to a particular directory in HDFS
[hduser@namenode etc]$ hadoop dfs -put group /user/vin/


Copy a file to a directory under a different target file name
[hduser@namenode etc]$ hadoop dfs -put group /user/vin/mygroupfile

Wildcard listing of files
[hduser@namenode etc]$ hadoop dfs -ls /user/vin/*group*
-rw-r--r--   3 hduser supergroup        682 2018-03-02 13:19 /user/vin/group
-rw-r--r--   3 hduser supergroup        682 2018-03-02 13:19 /user/vin/mygroupfile


[hduser@namenode hadoop]$ hadoop dfs -put conf/ /user/vin/

[hduser@namenode hadoop]$ hadoop dfs -ls /user/vin/conf
Found 18 items
-rw-r--r--   3 hduser supergroup       7457 2018-03-02 13:08 /user/vin/conf/capacity-scheduler.xml
-rw-r--r--   3 hduser supergroup       1095 2018-03-02 13:08 /user/vin/conf/configuration.xsl
-rw-r--r--   3 hduser supergroup        415 2018-03-02 13:08 /user/vin/conf/core-site.xml
-rw-r--r--   3 hduser supergroup         76 2018-03-02 13:08 /user/vin/conf/dfs.hosts.include
-rw-r--r--   3 hduser supergroup        327 2018-03-02 13:08 /user/vin/conf/fair-scheduler.xml
-rw-r--r--   3 hduser supergroup       2497 2018-03-02 13:08 /user/vin/conf/hadoop-env.sh
-rw-r--r--   3 hduser supergroup       2052 2018-03-02 13:08 /user/vin/conf/hadoop-metrics2.properties
-rw-r--r--   3 hduser supergroup       4644 2018-03-02 13:08 /user/vin/conf/hadoop-policy.xml
-rw-r--r--   3 hduser supergroup       1357 2018-03-02 13:08 /user/vin/conf/hdfs-site.xml
-rw-r--r--   3 hduser supergroup       5018 2018-03-02 13:08 /user/vin/conf/log4j.properties
-rw-r--r--   3 hduser supergroup       2033 2018-03-02 13:08 /user/vin/conf/mapred-queue-acls.xml
-rw-r--r--   3 hduser supergroup       1359 2018-03-02 13:08 /user/vin/conf/mapred-site.xml
-rw-r--r--   3 hduser supergroup          4 2018-03-02 13:08 /user/vin/conf/masters
-rw-r--r--   3 hduser supergroup         16 2018-03-02 13:08 /user/vin/conf/slaves
-rw-r--r--   3 hduser supergroup       2042 2018-03-02 13:08 /user/vin/conf/ssl-client.xml.example
-rw-r--r--   3 hduser supergroup       1994 2018-03-02 13:08 /user/vin/conf/ssl-server.xml.example
-rw-r--r--   3 hduser supergroup       3890 2018-03-02 13:08 /user/vin/conf/task-log4j.properties
-rw-r--r--   3 hduser supergroup        382 2018-03-02 13:08 /user/vin/conf/taskcontroller.cfg

Delete Directory
[hduser@namenode ~]$ hadoop fs -rmr /tmp/o1
Moved to trash: hdfs://nn:8020/tmp/o1

Delete File(s)
[hduser@namenode ~]$ hadoop fs -rm /user/vin/conf/mapred-site.xml
Moved to trash: hdfs://nn:8020/user/vin/conf/mapred-site.xml

[hduser@namenode ~]$ hadoop fs -rm /user/vin/conf/hadoop*
Moved to trash: hdfs://nn:8020/user/vin/conf/hadoop-env.sh
Moved to trash: hdfs://nn:8020/user/vin/conf/hadoop-metrics2.properties
Moved to trash: hdfs://nn:8020/user/vin/conf/hadoop-policy.xml


5. Set Quota on Directory
[hduser@namenode hadoop]$ hadoop dfs -mkdir /user/nish
[hduser@namenode hadoop]$ hadoop dfsadmin -setSpaceQuota 10983040 /user/nish # about 10.5 MB quota

6. Check Quota
[hduser@namenode hadoop]$ hadoop dfs -count -q /user/nish
        none             inf        10983040        10983040            1            0                  0 hdfs://nn:8020/user/nish

7. Put File After Quota
 [hduser@namenode conf]$ ls -l mapred-site.xml
-rwxr-xr-x 1 root root 1359 Feb 25 13:49 mapred-site.xml

[hduser@namenode conf]$ hadoop dfs -put mapred-site.xml /user/nish/
18/03/02 13:16:50 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/nish is exceeded: quota=10983040 diskspace consumed=384.0m

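*** The space quota should be at least block size * replication factor. The 1359-byte file is rejected because HDFS charges a full block for every replica when the write starts (the 384.0m in the error message).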
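
The arithmetic behind that rule can be sketched in a few lines of shell. min_space_quota is a hypothetical helper, and the 128 MB block size is an assumption inferred from the 384.0m in the error message divided by the replication factor of 3; check dfs.block.size on your own cluster.

```shell
# Hypothetical helper: smallest space quota (in bytes) that still lets one
# full block be written, i.e. block size multiplied by replication factor.
min_space_quota() {
  block_bytes=$1
  replication=$2
  echo $(( block_bytes * replication ))
}

# Assumed values: 128 MB blocks, replication factor 3.
min_space_quota $(( 128 * 1024 * 1024 )) 3   # prints 402653184 (384 MB)
```

Any space quota below that value, such as the 10983040 bytes set above, rejects even a tiny file.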

8. Add Quota to number of files
First, increase the space quota to 1 GB
[hduser@namenode etc]$ hadoop dfsadmin -setSpaceQuota 1073741824 /user/nish
Then set the name quota (number of files and directories) to 10
[hduser@namenode etc]$ hadoop dfsadmin -setQuota 10 /user/nish

Check Quota Output
[hduser@namenode etc]$ hadoop dfs -count -q /user/nish
          10               8      1073741824      1073741824            1            1                  0 hdfs://nn:8020/user/nish
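
The -count -q output has no header, so the columns are easy to mix up. In order they are: name quota, remaining name quota, space quota, remaining space quota, directory count, file count, content size, and path. As a rough sketch, the hypothetical helper below (label_count_q is not a Hadoop command) reads such lines from standard input and labels each field.

```shell
# Hypothetical helper: label the eight columns of `hadoop dfs -count -q`.
# Reads one or more output lines from stdin.
label_count_q() {
  while read -r quota rem_quota space_quota rem_space dirs files bytes path; do
    printf '%s: name_quota=%s name_remaining=%s space_quota=%s space_remaining=%s dirs=%s files=%s bytes=%s\n' \
      "$path" "$quota" "$rem_quota" "$space_quota" "$rem_space" "$dirs" "$files" "$bytes"
  done
}

# Usage (needs a running cluster):
#   hadoop dfs -count -q /user/nish | label_count_q
```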

9. Put 10 Files After Quota
[hduser@namenode hadoop]$ hadoop dfs -put conf/* /user/nish/
put: org.apache.hadoop.hdfs.protocol.NSQuotaExceededException: The NameSpace quota (directories and files) of directory /user/nish is exceeded: quota=10 file count=11

Check how many files were placed
hadoop dfs -ls /user/nish
Found 9 items
-rw-r--r--   3 hduser supergroup       7457 2018-03-02 13:26 /user/nish/capacity-scheduler.xml
-rw-r--r--   3 hduser supergroup       1095 2018-03-02 13:26 /user/nish/configuration.xsl
-rw-r--r--   3 hduser supergroup        415 2018-03-02 13:26 /user/nish/core-site.xml
-rw-r--r--   3 hduser supergroup         76 2018-03-02 13:26 /user/nish/dfs.hosts.include
-rw-r--r--   3 hduser supergroup        327 2018-03-02 13:26 /user/nish/fair-scheduler.xml
-rw-r--r--   3 hduser supergroup       2497 2018-03-02 13:26 /user/nish/hadoop-env.sh
-rw-r--r--   3 hduser supergroup       2052 2018-03-02 13:26 /user/nish/hadoop-metrics2.properties
-rw-r--r--   3 hduser supergroup       4644 2018-03-02 13:26 /user/nish/hadoop-policy.xml
-rw-r--r--   3 hduser supergroup          0 2018-03-02 13:16 /user/nish/mapred-site.xml
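
Only 9 files are listed because the name quota of 10 also counts the /user/nish directory itself, and the empty mapred-site.xml left behind by the earlier failed put already used one slot. A small sketch of that bookkeeping follows; name_quota_remaining is a hypothetical helper, not a Hadoop command.

```shell
# Hypothetical helper: how many more files or directories fit under a name
# quota. The directory counts against its own quota.
name_quota_remaining() {
  quota=$1
  dirs=$2
  files=$3
  echo $(( quota - dirs - files ))
}

# Values from the -count -q output before the put: quota 10, 1 dir, 1 file.
name_quota_remaining 10 1 1   # prints 8
```

Eight new files plus the existing one give the 9 items found; the next file would have been entry number 11, which is the count reported in the exception.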

10. Clear Quotas
[hduser@namenode hadoop]$ hadoop dfsadmin -clrSpaceQuota /user/nish
[hduser@namenode hadoop]$ hadoop dfsadmin -clrQuota /user/nish

Verify Quota
[hduser@namenode hadoop]$ hadoop dfs -count -q /user/nish
        none             inf            none             inf            1            9              18563 hdfs://nn:8020/user/nish

11. FSCK
[hduser@namenode ~]$ hadoop fsck /
FSCK started by hduser from /192.168.10.51 for path / at Sat Mar 03 14:30:30 CET 2018
................................Status: HEALTHY
 Total size:    57952 B
 Total dirs:    13
 Total files:   32
 Total blocks (validated):      31 (avg. block size 1869 B)
 Minimally replicated blocks:   31 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          4
 Number of racks:               1
FSCK ended at Sat Mar 03 14:30:30 CET 2018 in 25 milliseconds


The filesystem under path '/' is HEALTHY

[hduser@namenode ~]$ hadoop fsck /user/vin/conf/capacity-scheduler.xml -files -blocks -locations
FSCK started by hduser from /192.168.10.51 for path /user/vin/conf/capacity-scheduler.xml at Sat Mar 03 14:31:36 CET 2018
/user/vin/conf/capacity-scheduler.xml 7457 bytes, 1 block(s):  OK
0. blk_4751168492478894034_1008 len=7457 repl=3 [192.168.10.55:50010, 192.168.10.57:50010, 192.168.10.54:50010]

Status: HEALTHY
 Total size:    7457 B
 Total dirs:    0
 Total files:   1
 Total blocks (validated):      1 (avg. block size 7457 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          4
 Number of racks:               1
FSCK ended at Sat Mar 03 14:31:36 CET 2018 in 1 milliseconds


The filesystem under path '/user/vin/conf/capacity-scheduler.xml' is HEALTHY
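
For a cron job or monitoring script, the last line of the fsck output is the simplest thing to test. The sketch below assumes the summary line looks exactly like the one shown above; fsck_is_healthy is a hypothetical helper that reads fsck output from standard input and returns success only when that line reports HEALTHY.

```shell
# Hypothetical helper: succeed (exit status 0) only if the fsck output on
# stdin ends with the "is HEALTHY" summary line.
fsck_is_healthy() {
  grep -q "^The filesystem under path '.*' is HEALTHY"
}

# Usage (needs a running cluster):
#   if hadoop fsck / | fsck_is_healthy; then echo "ok"; else echo "check HDFS"; fi
```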


12. Killing a Job in Hadoop
Find the job ID as the superuser, then kill the job:

[hduser@namenode ~]$ hadoop job -list
1 jobs currently running
JobId   State   StartTime       UserName        Priority        SchedulingInfo
job_201803041408_0003   1       1520169058581   mapred  NORMAL  NA

[hduser@namenode ~]$ hadoop job  -kill job_201803041408_0003
Killed job job_201803041408_0003
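
When several jobs are running, copying IDs by hand gets tedious. As a sketch, the hypothetical helper below (list_job_ids is not a Hadoop command) pulls the job IDs out of the listing, assuming the ID is the first column and has the job_<timestamp>_<number> form shown above.

```shell
# Hypothetical helper: print only the job IDs from `hadoop job -list`
# output read on stdin, skipping the count line and the column header.
list_job_ids() {
  awk '$1 ~ /^job_[0-9]+_[0-9]+$/ { print $1 }'
}

# Usage (needs a running cluster). The echo makes this a dry run that only
# prints the kill commands instead of executing them:
#   hadoop job -list | list_job_ids | while read -r id; do
#     echo hadoop job -kill "$id"
#   done
```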

13. Killing Specific Task Attempt in Hadoop
[hduser@namenode ~]$ hadoop job -list
1 jobs currently running
JobId   State   StartTime       UserName        Priority        SchedulingInfo
job_201803041408_0006   4       1520169519215   mapred  NORMAL  NA
[hduser@namenode ~]$ hadoop job -list-attempt-ids job_201803041408_0006 reduce running
attempt_201803041408_0006_r_000000_0
attempt_201803041408_0006_r_000001_0
attempt_201803041408_0006_r_000002_0
attempt_201803041408_0006_r_000003_0
attempt_201803041408_0006_r_000004_0
attempt_201803041408_0006_r_000005_0
attempt_201803041408_0006_r_000006_0
attempt_201803041408_0006_r_000007_0
[hduser@namenode ~]$ hadoop job -kill-task attempt_201803041408_0006_r_000006_0
Killed task attempt_201803041408_0006_r_000006_0
