
Monday, April 16, 2018

Hadoop V2 - Snapshot

In this blog post I discuss the HDFS snapshot feature.

HDFS Snapshot -
1. A snapshot is a read-only, point-in-time copy of a directory tree. It protects data against user errors such as accidental deletion.
2. Snapshots can be used to query older versions of the data.
3. By default, no directory is snapshottable; a superuser must enable snapshots on each directory first.
4. Only the NameNode knows about snapshots, because it maintains the metadata. DataNodes have no knowledge of them.
5. Snapshot names must be unique per directory, i.e. you cannot create two snapshots with the same name on a given directory.
6. No data is copied: a snapshot records only the block list and the file sizes, and normal operations continue as usual.
7. Snapshots are read-only: files under a .snapshot directory can only be listed and copied, not deleted or modified.



1. Enabling Snapshot
    hdfs dfsadmin -allowSnapshot <path>
   [hdfs@nn ~]$ hdfs dfsadmin -allowSnapshot /data/conf

   
    Allowing snaphot on /data/conf succeeded
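If you need to enable snapshots on several directories, the command can be wrapped in a small loop. This is a minimal sketch, assuming it runs as the HDFS superuser (allowing snapshots requires superuser privilege) and that the hdfs client is on the PATH; the function name enable_snapshots is my own, hypothetical choice.

```shell
# Hypothetical helper (not part of Hadoop): allow snapshots on each directory
# given as an argument. Must run as the HDFS superuser, e.g. the hdfs account.
enable_snapshots() {
  local dir status=0
  for dir in "$@"; do
    # Keep going on failure, but remember it in the return status.
    hdfs dfsadmin -allowSnapshot "$dir" || {
      echo "failed to allow snapshots on $dir" >&2
      status=1
    }
  done
  return "$status"
}
```

For example, enable_snapshots /data/conf /data1 would make both directories snapshottable, and the function returns non-zero if any of them failed.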

2. Create Snapshot
    hdfs dfs -createSnapshot <path> [<snapshotname>]
    [hdfs@nn ~]$ hdfs dfs -createSnapshot /data/conf
    Created snapshot /data/conf/.snapshot/s20180416-005602.358

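    If you do not specify a name, a default name is generated from the current timestamp in the format s<yyyyMMdd-HHmmss.SSS>, as above. To choose the name yourself, pass it as the second argument: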
    [hdfs@nn ~]$ hdfs dfs -createSnapshot /data/conf Snap1
    Created snapshot /data/conf/.snapshot/Snap1
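Snapshots are often taken on a schedule with predictable names. Below is a minimal sketch of such a helper, assuming a naming convention of <prefix>-YYYYmmdd-HHMMSS; the convention, the default prefix "auto" and the function name take_snapshot are all hypothetical choices, not HDFS defaults. Creating a snapshot requires owner privilege on the snapshottable directory.

```shell
# Hypothetical helper (not part of Hadoop): create a snapshot named
# <prefix>-YYYYmmdd-HHMMSS, e.g. daily-20180416-005602.
# The caller must own the snapshottable directory.
take_snapshot() {
  local dir="$1" prefix="${2:-auto}" name
  # Zero-padded timestamp, so sorting names by bytes also sorts them by time.
  name="$prefix-$(date +%Y%m%d-%H%M%S)"
  hdfs dfs -createSnapshot "$dir" "$name"
}
```

For example, take_snapshot /data/conf daily would create something like /data/conf/.snapshot/daily-20180416-005602.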


3. Deleting Snapshot
    hdfs dfs -deleteSnapshot <path> <snapshotname>
    [hdfs@nn ~]$ hdfs dfs -deleteSnapshot /data/conf s20180416-005602.358

   

4. Listing Snapshots
    hdfs dfs -ls <path>/.snapshot
    [hdfs@nn ~]$ hdfs dfs -ls /data/conf/.snapshot
    Found 1 items
    drwxr-xr-x   - hdfs admingroup          0 2018-04-16 00:56 /data/conf/.snapshot/Snap1
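Since snapshots show up as ordinary directory entries under .snapshot, the listing can drive simple housekeeping. Here is a rough sketch that prints snapshot names and deletes all but the newest N of them. It assumes snapshots are named <prefix>-YYYYmmdd-HHMMSS (a hypothetical convention where name order equals creation order) and that names contain no whitespace; list_snapshots and prune_snapshots are hypothetical names.

```shell
# Hypothetical helpers (not part of Hadoop) built on "hdfs dfs -ls <dir>/.snapshot".
# Assumption: snapshot names contain no whitespace.

# Print the snapshot names of <dir>, one per line, in byte order.
list_snapshots() {
  hdfs dfs -ls "$1/.snapshot" |
    awk '$1 ~ /^d/ { n = split($NF, parts, "/"); print parts[n] }' |
    LC_ALL=C sort
}

# Delete all but the newest <keep> snapshots named "<prefix>-YYYYmmdd-HHMMSS".
prune_snapshots() {
  local dir="$1" prefix="$2" keep="$3" snap
  list_snapshots "$dir" |
    grep "^$prefix-" |
    awk -v keep="$keep" '{ names[NR] = $0 }
      END { for (i = 1; i <= NR - keep; i++) print names[i] }' |
    while read -r snap; do
      # </dev/null keeps the hdfs client from consuming the loop's input.
      hdfs dfs -deleteSnapshot "$dir" "$snap" </dev/null
    done
}
```

For example, prune_snapshots /data/conf daily 7 would keep the seven newest daily-* snapshots and leave manually named ones such as Snap1 alone.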

   
5. List Directories on which Snapshots are enabled
    [hdfs@nn ~]$ hdfs lsSnapshottableDir
    drwxr-xr-x 0 hdfs admingroup 0 2018-04-16 00:53 0 65536 /data1
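    drwxr-xr-x 0 hdfs admingroup 0 2018-04-16 00:56 1 65536 /data/conf
    The last three columns are the number of snapshots, the snapshot quota (65536 by default) and the directory path.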
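The output can be condensed into a short report by taking the last three whitespace-separated fields of each line (snapshot count, snapshot quota, path). A minimal sketch, assuming directory paths contain no spaces; snapshottable_report is a hypothetical name.

```shell
# Hypothetical helper (not part of Hadoop): print "<path> snapshots=<n> quota=<q>"
# for every snapshottable directory visible to the current user.
# Assumption: paths contain no whitespace, since awk splits on it.
snapshottable_report() {
  hdfs lsSnapshottableDir |
    awk 'NF >= 3 { printf "%s snapshots=%s quota=%s\n", $NF, $(NF-2), $(NF-1) }'
}
```

On the output shown above this would print /data1 snapshots=0 quota=65536 and /data/conf snapshots=1 quota=65536.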


6. Difference Between Two Snapshots
    hdfs snapshotDiff <path> <Snap1_name> <Snap2_name>
    [hdfs@nn ~]$ hdfs snapshotDiff /data/conf Snap1 Snap2
    Difference between snapshot Snap1 and snapshot Snap2 under directory /data/conf:
    M       .
    +       ./hosts
    -       ./ssl-server.xml.example
    -       ./yarn-site.xml
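    Here '+' means a file or directory was created, '-' deleted, 'M' modified and 'R' renamed. The entry 'M .' means the snapshot root directory itself was modified.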
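For a quick overview of a large diff, the entries can be counted by their first column. This is a sketch; summarize_snapshot_diff is a hypothetical name. snapshotDiff also accepts "." in place of a snapshot name to mean the current state of the directory, so the same helper can compare a snapshot with the live data.

```shell
# Hypothetical helper (not part of Hadoop): count snapshotDiff entries by type.
# Markers: "+" created, "-" deleted, "M" modified, "R" renamed.
# The header line ("Difference between ...") matches none of them.
summarize_snapshot_diff() {
  hdfs snapshotDiff "$1" "$2" "$3" |
    awk '$1 == "+" { c++ } $1 == "-" { d++ } $1 == "M" { m++ } $1 == "R" { r++ }
         END { printf "created=%d deleted=%d modified=%d renamed=%d\n", c, d, m, r }'
}
```

On the diff shown above, summarize_snapshot_diff /data/conf Snap1 Snap2 would print created=1 deleted=2 modified=1 renamed=0, and summarize_snapshot_diff /data/conf Snap1 . would compare Snap1 with the current contents.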


7. Listing Old Contents
    hdfs dfs -ls /data/conf/.snapshot/Snap2
    You can list the old contents by listing the files in the snapshot directory; it shows the directory exactly as it was when the snapshot was taken.
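Because a snapshot is readable like any other path, an old version of a file can be compared with the current one. A minimal sketch, assuming the files are small enough to copy to local temporary files; diff_with_snapshot is a hypothetical name.

```shell
# Hypothetical helper (not part of Hadoop): unified diff between <file> as it
# was in snapshot <snap> of <dir> and its current version.
# Returns diff's status: 0 identical, 1 different, 2 on trouble.
diff_with_snapshot() {
  local dir="$1" snap="$2" file="$3" old new rc
  old=$(mktemp) || return 2
  new=$(mktemp) || { rm -f "$old"; return 2; }
  hdfs dfs -cat "$dir/.snapshot/$snap/$file" > "$old"
  hdfs dfs -cat "$dir/$file" > "$new"
  diff -u "$old" "$new"
  rc=$?
  rm -f "$old" "$new"
  return "$rc"
}
```

For example, diff_with_snapshot /data/conf Snap1 hosts would show what changed in hosts since Snap1 was taken.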
   
8. Recovering
    Files can be recovered by copying them from the snapshot directory back to their original location, for example with hdfs dfs -cp.
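A minimal sketch of such a restore, assuming the file no longer exists at its original location (hdfs dfs -cp refuses to overwrite an existing file unless -f is given); restore_from_snapshot is a hypothetical name. The -p flag preserves timestamps, ownership and permissions.

```shell
# Hypothetical helper (not part of Hadoop): copy <relpath> from snapshot <snap>
# of <dir> back to its original location, preserving timestamps, ownership
# and permissions (-p). Add -f to overwrite a file that still exists.
restore_from_snapshot() {
  local dir="$1" snap="$2" relpath="$3"
  hdfs dfs -cp -p "$dir/.snapshot/$snap/$relpath" "$dir/$relpath"
}
```

In the diff example above, yarn-site.xml was deleted between Snap1 and Snap2, so restore_from_snapshot /data/conf Snap1 yarn-site.xml would bring it back from Snap1.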

9. Deleting
    A snapshottable directory cannot be deleted (or renamed) while it still has snapshots, so first delete all of its snapshots manually using
    hdfs dfs -deleteSnapshot <path> <snapshotname>
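When a directory has many snapshots, the deletions can be scripted from the .snapshot listing. This is a rough, destructive sketch, assuming snapshot names contain no whitespace; purge_snapshottable_dir is a hypothetical name, and it refuses an empty path or "/" as a basic safety check.

```shell
# Hypothetical helper (not part of Hadoop): delete every snapshot of <dir>,
# then remove <dir> itself. Destructive: double-check the path first.
# Assumption: snapshot names contain no whitespace.
purge_snapshottable_dir() {
  local dir="$1" snap
  case "$dir" in
    "" | /) echo "refusing to purge '$dir'" >&2; return 1 ;;
  esac
  # Snapshot names are the last path component of each entry in .snapshot.
  for snap in $(hdfs dfs -ls "$dir/.snapshot" |
                awk '$1 ~ /^d/ { n = split($NF, parts, "/"); print parts[n] }'); do
    hdfs dfs -deleteSnapshot "$dir" "$snap" || return 1
  done
  hdfs dfs -rm -r "$dir"
}
```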
   
