In this blog we will discuss Hadoop 2 commands.
The key difference between Hadoop 1 and Hadoop 2 commands is that Hadoop 2 uses hdfs instead of hadoop as the invoking utility (along with other differences and additions in version 2).
Refer to one of the earlier blogs published to find details of many other commands.
Those earlier blogs cover Hadoop 1, but the overall concepts remain the same across Hadoop 1 and Hadoop 2.
A. hdfs dfs
1. List Files by File Names
hdfs dfs -ls /
[hdfs@nn ~]$ hdfs dfs -ls /
Found 2 items
drwxrwxrwx - hdfs admingroup 0 2018-04-12 00:25 /tmp
drwxrwxrwx - hdfs admingroup 0 2018-04-12 00:35 /user
[hdfs@nn ~]$ hdfs dfs -ls /user/
Found 3 items
drwxrwxrwx - hdfs admingroup 0 2018-04-12 00:35 /user/app-logs
drwxrwxrwx - hdfs admingroup 0 2018-04-12 00:25 /user/history
-rw-r--r-- 3 hdfs admingroup 416 2018-04-12 02:47 /user/hosts
[The number 3 in the second column of the last line above is the replication factor. Note that directories have no replication factor, shown as -.]
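If you script against listings like the one above, the replication factor can be pulled straight out of the second column. A minimal sketch, using the sample line from this listing:

```shell
# Extract the replication factor (second column) from a saved
# 'hdfs dfs -ls' output line; directories show '-' there instead.
line='-rw-r--r--   3 hdfs admingroup        416 2018-04-12 02:47 /user/hosts'
echo "$line" | awk '{ print "replication =", $2 }'
```

This prints `replication = 3` for the file above.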
2. List Specific Directory
hdfs dfs -ls -d /user/
[hdfs@nn ~]$ hdfs dfs -ls -d /user/
drwxrwxrwx - hdfs admingroup 0 2018-04-12 02:47 /user
3. Get Specific Details of a File
hdfs dfs -stat "specifier" filename
[hdfs@nn ~]$ hdfs dfs -stat "%n-%b" /user/hosts
hosts-416
[hdfs@nn ~]$ hdfs dfs -stat "%n" /user/hosts
hosts
List of Specifiers
%b Size of file in bytes
%F Will return "file", "directory", or "symlink" depending on the type of inode
%g Group name
%n Filename
%o HDFS Block size in bytes ( 128MB by default )
%r Replication factor
%u Username of owner
%y Formatted mtime of inode
%Y mtime of inode as milliseconds since UNIX epoch
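Note that %Y is the raw mtime in milliseconds since the epoch, while %y is the same instant formatted. You can convert a %Y value locally; a sketch assuming GNU date is available:

```shell
# %Y is mtime in milliseconds since the UNIX epoch; %y is the same
# instant formatted. Convert a sample %Y value (GNU date assumed).
epoch_ms=1523501220000
date -u -d "@$((epoch_ms / 1000))" '+%Y-%m-%d %H:%M:%S'   # prints 2018-04-12 02:47:00
```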
4. Directory Creation
hdfs dfs -mkdir [-p] <dir>
[hdfs@nn ~]$ hdfs dfs -mkdir /user/hadoop
[Create Directory]
[hdfs@nn ~]$ hdfs dfs -mkdir -p /user/hadoop/dir1
[Create a directory along with its parent directories, just as in UNIX]
5. Directory Deletion
hdfs dfs -rm -R <dir>
This deletes a directory (empty or non-empty) and its contents recursively.
[hdfs@nn ~]$ hdfs dfs -rm -R /user/hadoop/dir1
18/04/12 02:57:52 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
18/04/12 02:57:52 INFO fs.TrashPolicyDefault: Moved: 'hdfs://nn:8020/user/hadoop/dir1' to trash at: hdfs://nn:8020/user/hdfs/.Trash/Current/user/hadoop/dir1
Moved: 'hdfs://nn:8020/user/hadoop/dir1' to trash at: hdfs://nn:8020/user/hdfs/.Trash/Current
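As the log above shows, the trash destination follows a predictable pattern: the deleted path is mirrored under the deleting user's .Trash/Current directory. A small sketch of how that path is composed (the user and path here just reproduce the example above):

```shell
# The trash destination mirrors the deleted path under the deleting
# user's .Trash/Current directory.
user=hdfs
orig=/user/hadoop/dir1
echo "/user/${user}/.Trash/Current${orig}"   # prints /user/hdfs/.Trash/Current/user/hadoop/dir1
```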
hdfs dfs -rmdir <dir>
Delete Empty Directory
[hdfs@nn ~]$ hdfs dfs -rmdir /user/hadoop
hdfs dfs -rm -R -skipTrash <dir>
Skip Trash during Deletion
hdfs dfs -expunge
Empty Trash
6. Changing Ownership of a File
hdfs dfs -chown username:group <file_name>
Change ownership of a specific file
hdfs dfs -chown -R username:group <dir>
Recursively change ownership
[hdfs@nn ~]$ hdfs dfs -ls /user/hosts
-rw-r--r-- 3 hdfs admingroup 416 2018-04-12 02:47 /user/hosts
[hdfs@nn ~]$ hdfs dfs -chown mapred:hadoop /user/hosts
[hdfs@nn ~]$ hdfs dfs -ls /user/hosts
-rw-r--r-- 3 mapred hadoop 416 2018-04-12 02:47 /user/hosts
In Hadoop, the user and group can be any names; they are not required to exist on the system.
You must be the superuser (hdfs) to change ownership.
7. Changing Group Membership of a File
hdfs dfs -chgrp <group_name> <file_name>
hdfs dfs -chgrp mygroup /user/hosts
[hdfs@nn ~]$ hdfs dfs -ls /user/hosts
-rw-r--r-- 3 mapred mygroup 416 2018-04-12 02:47 /user/hosts
hdfs dfs -chgrp -R <group_name> <dir>
Recursively change group membership
8. Changing Permissions of file/directory
hdfs dfs -chmod [-R] <mode> <file/dir>
-R: Recursive
<mode>: Octal mode as in Linux/UNIX
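Since hdfs dfs -chmod accepts the same octal modes as Linux chmod, you can reason about a mode locally before applying it in HDFS. A sketch using a local temp file (GNU stat assumed):

```shell
# hdfs dfs -chmod takes the same octal modes as Linux chmod.
# Demonstrate locally on a temp file (GNU stat assumed).
tmp=$(mktemp)
chmod 640 "$tmp"          # rw- r-- --- : owner rw, group r, others none
stat -c '%a' "$tmp"       # prints 640
rm -f "$tmp"
```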
9. Finding Free Space
hdfs dfs -df -h
[hdfs@nn logs]$ hdfs dfs -df -h
Filesystem Size Used Available Use%
hdfs://nn:8020 575.9 G 96 K 575.9 G 0%
10. Utilization
hdfs dfs -du -h
[hdfs@nn logs]$ hdfs dfs -du -h /
152.4 T 457.2 T /data_d
0 0 /home
0 0 /lost+found
110.1 K 330.3 K /schema_d
Column 1 is the raw size of the file and column 2 is the total size including all replicated blocks.
So in general, column 2 = column 1 x 3 (if the replication factor is uniformly 3 across all files).
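You can sanity-check this relationship directly on du output; a small awk sketch using the /schema_d values from the listing above:

```shell
# Sanity-check 'hdfs dfs -du' output: with a uniform replication
# factor of 3, column 2 should be about 3x column 1.
raw=110.1; with_replicas=330.3    # values for /schema_d above
echo "$raw $with_replicas" | awk '{ printf "factor = %.1f\n", $2 / $1 }'   # prints factor = 3.0
```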
Show Utilization Summary
[hdfs@nn]$ hdfs dfs -du -s -h /
110.0 T 321.1 T /
11. Create Empty File
hdfs dfs -touchz /user/empty.txt
[hdfs@nn logs]$ hdfs dfs -ls /user/empty.txt
-rw-r--r-- 3 hdfs admingroup 0 2018-04-12 05:22 /user/empty.txt
12. Change Replication Factor
hdfs dfs -setrep -w <new_replication_factor> <filename>
[hdfs@nn ~]$ hdfs dfs -setrep -w 2 /user/hosts
Replication 2 set: /user/hosts
Waiting for /user/hosts ...
WARNING: the waiting time may be long for DECREASING the number of replications.
. done
hdfs dfs -setrep -w 2 -R <dir>
[Change the replication factor of all files in the directory, recursively through all subdirectories and files]
B. hdfs dfsadmin
1. hdfs dfsadmin -report
Reports details of HDFS as a whole and of each DataNode
[hdfs@nn ~]$ hdfs dfsadmin -report
Configured Capacity: 618396254208 (575.93 GB)
Present Capacity: 618396254208 (575.93 GB)
DFS Remaining: 618396155904 (575.93 GB)
DFS Used: 98304 (96 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):
2. hdfs dfsadmin -printTopology
Print the rack topology (as currently in effect)
[hdfs@nn ~]$ hdfs dfsadmin -printTopology
Rack: /default-rack
192.168.2.104:50010 (d1.novalocal)
192.168.2.105:50010 (d2.novalocal)
192.168.2.106:50010 (d3.novalocal)
3. hdfs dfsadmin -refreshNodes
Updates the NameNode with the list of nodes that are allowed to connect, as configured by the dfs.hosts parameter in hdfs-site.xml
[hdfs@nn ~]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
4. hdfs dfsadmin -metasave <file>
[hdfs@nn ~]$ hdfs dfsadmin -metasave out.txt
Created metasave file out.txt in the log directory of namenode hdfs://nn:8020
Additional information in out.txt includes:
- Blocks waiting for replication
- Total number of blocks
- Blocks currently being replicated
C. hdfs balancer
1. Set threshold and run balancer
hdfs balancer -threshold 10
hdfs balancer
If the balancer is run without specifying a threshold, it uses the default threshold of 10 percent.
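The threshold is a band around the average DataNode utilization: the balancer moves blocks until every DataNode's utilization is within that many percentage points of the cluster average. A quick illustration (the utilization numbers here are hypothetical):

```shell
# With an average utilization of 60% and a threshold of 10, the
# balancer targets every DataNode landing in the 50%-70% band.
avg=60; threshold=10
echo "target range: $((avg - threshold))% to $((avg + threshold))%"   # prints target range: 50% to 70%
```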
2. Limit B.W to balancer
hdfs dfsadmin -setBalancerBandwidth <bandwidth in bytes/second>
[hdfs@nn logs]$ hdfs dfsadmin -setBalancerBandwidth 1024000
Balancer bandwidth is set to 1024000
(approximately 1 MB/second)
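Since the argument is in bytes per second, it is worth converting before setting a cap; a one-liner for the value used above:

```shell
# Convert the bandwidth cap from bytes/second to MiB/second.
bytes_per_sec=1024000
awk -v b="$bytes_per_sec" 'BEGIN { printf "%.2f MiB/s\n", b / (1024 * 1024) }'   # prints 0.98 MiB/s
```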
Note: although -setBalancerBandwidth is a dfsadmin command, it is discussed here because it controls the balancer.