
Thursday, April 12, 2018

Hadoop V2 - Administration

In this blog we will discuss Hadoop 2 commands.
The key difference between Hadoop 1 and Hadoop 2 commands is that we use hdfs instead of hadoop as the invoking utility (with some obvious differences and additions in Version 2).

Refer to one of the earlier blogs published here for details of many other commands.
As mentioned, those were written for Hadoop 1, but the overall concepts remain the same in Hadoop 1 and Hadoop 2.




A. hdfs dfs
    1. List Files by File Names
    hdfs dfs -ls /
   
    [hdfs@nn ~]$ hdfs dfs -ls /
    Found 2 items
    drwxrwxrwx   - hdfs admingroup          0 2018-04-12 00:25 /tmp
    drwxrwxrwx   - hdfs admingroup          0 2018-04-12 00:35 /user
    [hdfs@nn ~]$ hdfs dfs -ls /user/
    Found 3 items
    drwxrwxrwx   - hdfs admingroup          0 2018-04-12 00:35 /user/app-logs
    drwxrwxrwx   - hdfs admingroup          0 2018-04-12 00:25 /user/history
    -rw-r--r--   3 hdfs admingroup        416 2018-04-12 02:47 /user/hosts
   
    [The number 3 in the second column above specifies the replication factor; note that directories have no replication factor]
   
   
    2. List Specific Directory
    hdfs dfs -ls -d /user/
   
    [hdfs@nn ~]$ hdfs dfs -ls -d /user/
    drwxrwxrwx   - hdfs admingroup          0 2018-04-12 02:47 /user

    3. Get Specific Details of a File
    hdfs dfs -stat "specifier" filename
   
    [hdfs@nn ~]$ hdfs dfs -stat "%n-%b" /user/hosts
    hosts-416
    [hdfs@nn ~]$ hdfs dfs -stat "%n" /user/hosts
    hosts

    List of Specifiers
    %b Size of file in bytes
    %F Will return "file", "directory", or "symlink" depending on the type of inode
    %g Group name
    %n Filename
    %o HDFS Block size in bytes (128 MB by default)
    %r Replication factor
    %u Username of owner
    %y Formatted mtime of inode
    %Y UNIX Epoch mtime of inode

   
    4. Directory Creation
    hdfs dfs -mkdir [-p] <dir>
   
    [hdfs@nn ~]$ hdfs dfs -mkdir /user/hadoop
    [Create Directory]
    [hdfs@nn ~]$ hdfs dfs -mkdir -p /user/hadoop/dir1
    [Create directory along with parent directories, just as in Unix]
   
    5. Directory Deletion
    hdfs dfs -rm -R <dir>
    This will delete directory (empty / non-empty) and contents recursively.
   
    [hdfs@nn ~]$ hdfs dfs -rm -R /user/hadoop/dir1
    18/04/12 02:57:52 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
    18/04/12 02:57:52 INFO fs.TrashPolicyDefault: Moved: 'hdfs://nn:8020/user/hadoop/dir1' to trash at: hdfs://nn:8020/user/hdfs/.Trash/Current/user/hadoop/dir1
    Moved: 'hdfs://nn:8020/user/hadoop/dir1' to trash at: hdfs://nn:8020/user/hdfs/.Trash/Current
   
    hdfs dfs -rmdir <dir>
    Delete Empty Directory
    [hdfs@nn ~]$ hdfs dfs -rmdir /user/hadoop
   
    hdfs dfs -rm -skipTrash <dir>
    Skip Trash during deletion
   
    hdfs dfs -expunge
    Empty Trash

    6. Changing Ownership of a File
    hdfs dfs -chown username:group <file_name>
    Change ownership of a specific file
   
    hdfs dfs -chown -R username:group <dir>
    Recursively change ownership
   
   
    [hdfs@nn ~]$ hdfs dfs -ls /user/hosts
    -rw-r--r--   3 hdfs admingroup        416 2018-04-12 02:47 /user/hosts
   
    [hdfs@nn ~]$ hdfs dfs -chown mapred:hadoop /user/hosts
    [hdfs@nn ~]$ hdfs dfs -ls /user/hosts
    -rw-r--r--   3 mapred hadoop        416 2018-04-12 02:47 /user/hosts
   
    In Hadoop, the user and group can be anything; it is not mandatory that they exist on the system.
    You must be the superuser (hdfs) to change ownership.

    7. Changing Group Membership of a File
       
    hdfs dfs -chgrp <group_name> <file_name>
   
    hdfs dfs -chgrp mygroup /user/hosts
       
    [hdfs@nn ~]$ hdfs dfs -ls /user/hosts
    -rw-r--r--   3 mapred mygroup        416 2018-04-12 02:47 /user/hosts
   
    hdfs dfs -chgrp -R <group_name> <dir>
    Recursively change group membership
   
    8. Changing Permissions of a File/Directory
    hdfs dfs -chmod [-R] <mode> <file/dir>
    -R: Recursive
    <mode>: Octal mode as in Linux/UNIX
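    The octal notation works exactly as on a local Linux filesystem, so it can be sketched locally (the HDFS invocation would be analogous, e.g. `hdfs dfs -chmod 640 /user/hosts`; the path is hypothetical):

    ```shell
    # Demonstrate octal mode 640 (rw-r-----) on a local scratch file,
    # the same semantics hdfs dfs -chmod applies to HDFS paths
    f=$(mktemp)
    chmod 640 "$f"
    stat -c '%a %A' "$f"   # prints: 640 -rw-r-----
    rm -f "$f"
    ```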
   
    9. Find Free Space
    hdfs dfs -df -h
   
    [hdfs@nn logs]$ hdfs dfs -df -h
    Filesystem         Size  Used  Available  Use%
    hdfs://nn:8020  575.9 G  96 K    575.9 G    0%
   
    10. Utilization
    hdfs dfs -du -h
    [hdfs@nn logs]$    hdfs dfs -du -h /
    152.4 T 457.2 T /data_d
    0 0 /home
    0 0 /lost+found
    110.1 K 330.3 K /schema_d
   
    Column 1 is the logical size of the files and Column 2 is the total space consumed on disk, including all replicated blocks.
    So in general Col 2 = Col 1 * 3 (if replication is uniformly 3 across all the files)
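    The relationship between the two columns can be checked with a little shell arithmetic; for instance, the /schema_d line above shows roughly 110.1 K logical and 330.3 K raw:

    ```shell
    # Raw (on-disk) usage = logical size x replication factor
    logical_bytes=110100   # column 1: ~110.1 K logical size
    replication=3          # uniform replication factor
    raw_bytes=$((logical_bytes * replication))
    echo "$raw_bytes"      # 330300, i.e. ~330.3 K as in column 2
    ```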
   
    Show Utilization Summary

    [hdfs@nn]$    hdfs dfs -du -s -h /
    110.0 T 321.1 T /
   
    11. Create Empty File
    hdfs dfs -touchz /user/empty.txt
   
    [hdfs@nn logs]$ hdfs dfs -ls /user/empty.txt
    -rw-r--r--   3 hdfs admingroup          0 2018-04-12 05:22 /user/empty.txt
   
    12. Change Replication Factor
    hdfs dfs -setrep -w <new_replication_factor> <filename>
   
    [hdfs@nn ~]$ hdfs dfs -setrep -w 2 /user/hosts
    Replication 2 set: /user/hosts
    Waiting for /user/hosts ...
    WARNING: the waiting time may be long for DECREASING the number of replications.
    . done
   
    hdfs dfs -setrep -w 2 -R <dir>
    [Change replication factor of all files in the directory - recursively for all subdirectories and files]


   
B. hdfs dfsadmin
    1. hdfs dfsadmin -report
    Report details of HDFS as a whole and of each datanode

   
    [hdfs@nn ~]$ hdfs dfsadmin -report
    Configured Capacity: 618396254208 (575.93 GB)
    Present Capacity: 618396254208 (575.93 GB)
    DFS Remaining: 618396155904 (575.93 GB)
    DFS Used: 98304 (96 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
   
    -------------------------------------------------
    Live datanodes (3):
   
    2. hdfs dfsadmin -printTopology
    Print the rack topology (as currently running)

    [hdfs@nn ~]$ hdfs dfsadmin -printTopology
    Rack: /default-rack
        192.168.2.104:50010 (d1.novalocal)
        192.168.2.105:50010 (d2.novalocal)
        192.168.2.106:50010 (d3.novalocal)
       
    3. hdfs dfsadmin -refreshNodes
    Update the Namenode with the list of nodes that are allowed to connect, as configured by the dfs.hosts parameter in hdfs-site.xml

   
    [hdfs@nn ~]$ hdfs dfsadmin -refreshNodes
    Refresh nodes successful
   
    4. hdfs dfsadmin -metasave <file>
    [hdfs@nn ~]$ hdfs dfsadmin -metasave out.txt

    Created metasave file out.txt in the log directory of namenode hdfs://nn:8020
   
    Additional information in out.txt, viz.
        - Blocks waiting for replication
        - Total number of blocks
        - Blocks being replicated

       
C. hdfs balancer
    1. Set threshold and run balancer
    hdfs balancer -threshold 10
   
    hdfs balancer
    If the balancer is run without specifying a threshold, it uses the default threshold of 10 (percent).
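    The threshold is the maximum allowed deviation (in percent) of a datanode's utilization from the cluster average. A sketch of the over-utilization check, using hypothetical numbers (cluster average 60% used, node at 75%):

    ```shell
    # Hypothetical figures to illustrate the balancer threshold check
    cluster_avg=60   # cluster-wide DFS Used%
    threshold=10     # value passed to: hdfs balancer -threshold 10
    node_used=75     # this datanode's DFS Used%

    # A node is over-utilized when it exceeds average + threshold
    if [ "$node_used" -gt "$((cluster_avg + threshold))" ]; then
        echo "over-utilized: balancer would move blocks off this node"
    else
        echo "within threshold: no rebalancing needed"
    fi
    ```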
   
    2. Limit Bandwidth for Balancer
    hdfs dfsadmin -setBalancerBandwidth <bandwidth in bytes/second>
   
    [hdfs@nn logs]$ hdfs dfsadmin -setBalancerBandwidth 1024000
    Balancer bandwidth is set to 1024000
    (1024000 bytes/second is approximately 1 MB/second)
    Note - this command is discussed here because it is a balancer command, even though it is invoked via dfsadmin
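    Since the parameter is in bytes/second, a desired rate in MB/second needs converting first; for example, for a 10 MB/second limit:

    ```shell
    # Convert a desired balancer limit from MB/s to the bytes/s the command expects
    mb_per_sec=10
    bytes_per_sec=$((mb_per_sec * 1024 * 1024))
    echo "$bytes_per_sec"   # 10485760
    # then: hdfs dfsadmin -setBalancerBandwidth "$bytes_per_sec"
    ```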

   
       
   
