In this post I discuss the HDFS snapshot feature.
HDFS Snapshot -
1. Snapshots are read-only, point-in-time copies of a directory that protect against user errors such as accidental deletion.
2. They can be used to query old versions of data.
3. By default, directories are not enabled for snapshots.
4. Only the Namenode knows about snapshots, as it maintains the metadata; Datanodes have no knowledge of them.
5. Names are unique, i.e. you cannot create two snapshots with the same name for a given directory.
6. No data copying happens; the snapshot only records the block list and file size, and normal operations continue as usual.
7. Files cannot be deleted from HDFS snapshot directories, as snapshots are read-only; they can only be listed and copied.
1. Enabling Snapshot
hdfs dfsadmin -allowSnapshot <path>
[hdfs@nn ~]$ hdfs dfsadmin -allowSnapshot /data/conf
Allowing snaphot on /data/conf succeeded
2. Create Snapshot
hdfs dfs -createSnapshot <path> [<snapshotname>]
[hdfs@nn ~]$ hdfs dfs -createSnapshot /data/conf
Created snapshot /data/conf/.snapshot/s20180416-005602.358
If you do not specify a name, a system-generated name is used for the snapshot.
[hdfs@nn ~]$ hdfs dfs -createSnapshot /data/conf Snap1
Created snapshot /data/conf/.snapshot/Snap1
3. Deleting Snapshot
hdfs dfs -deleteSnapshot <path> <snapshotname>
[hdfs@nn ~]$ hdfs dfs -deleteSnapshot /data/conf s20180416-005602.358
4. Listing Snapshots
hdfs dfs -ls <path/.snapshot>
[hdfs@nn ~]$ hdfs dfs -ls /data/conf/.snapshot
Found 1 items
drwxr-xr-x - hdfs admingroup 0 2018-04-16 00:56 /data/conf/.snapshot/Snap1
5. List Directories on which Snapshots are enabled
[hdfs@nn ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs admingroup 0 2018-04-16 00:53 0 65536 /data1
drwxr-xr-x 0 hdfs admingroup 0 2018-04-16 00:56 1 65536 /data/conf
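To pick out only the directories that already hold snapshots, the listing above can be filtered. This is a minimal sketch, assuming the last three columns of each row are the snapshot count, the snapshot quota, and the path (as the sample output suggests). The function name dirs_with_snapshots is hypothetical.

```shell
# Reads `hdfs lsSnapshottableDir` output on stdin and prints
# "<path> <snapshot count>" for directories with at least one snapshot.
# Assumption: the last three columns are snapshot count, quota, path.
dirs_with_snapshots() {
  awk 'NF >= 3 && $(NF-2) > 0 { print $NF, $(NF-2) }'
}

# Usage on a cluster:
#   hdfs lsSnapshottableDir | dirs_with_snapshots
```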
6. Difference in 2 Snapshots
hdfs snapshotDiff <path> <Snap1_name> <Snap2_name>
[hdfs@nn ~]$ hdfs snapshotDiff /data/conf Snap1 Snap2
Difference between snapshot Snap1 and snapshot Snap2 under directory /data/conf:
M .
+ ./hosts
- ./ssl-server.xml.example
- ./yarn-site.xml
Where '+' is a file added, '-' is removed, 'R' is renamed, and 'M' is modified.
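For a long diff it can help to count the entries per change type. The sketch below assumes the report format shown above, with the marker in the first column. The function name summarize_diff is hypothetical.

```shell
# Reads `hdfs snapshotDiff` output on stdin and prints one summary line.
# The header line ("Difference between ...") matches no marker and is skipped.
summarize_diff() {
  awk '
    $1 == "+" { added++ }
    $1 == "-" { removed++ }
    $1 == "M" { modified++ }
    $1 == "R" { renamed++ }
    END {
      printf "added=%d removed=%d modified=%d renamed=%d\n",
             added, removed, modified, renamed
    }
  '
}

# Usage on a cluster:
#   hdfs snapshotDiff /data/conf Snap1 Snap2 | summarize_diff
```

For the sample report above this would print added=1 removed=2 modified=1 renamed=0.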
7. Listing Old Contents
hdfs dfs -ls /data/conf/.snapshot/Snap2
You can list the old content by listing files in the snapshot directory.
8. Recovering
Files can be recovered by copying them from the snapshot directory.
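As an illustration, yarn-site.xml was removed between Snap1 and Snap2 in the diff above, so it could be copied back from Snap1. This is a minimal sketch; the helper names snapshot_file_path and restore_from_snapshot are hypothetical, and the copy uses the ordinary hdfs dfs -cp command.

```shell
# Builds the read-only path of a file inside a named snapshot.
# Arguments: <snapshottable dir> <snapshot name> <relative file path>
snapshot_file_path() {
  printf '%s/.snapshot/%s/%s\n' "$1" "$2" "$3"
}

# Copies one file from a snapshot back into the live directory.
restore_from_snapshot() {
  hdfs dfs -cp "$(snapshot_file_path "$1" "$2" "$3")" "$1/$3"
}

# Usage on a cluster:
#   restore_from_snapshot /data/conf Snap1 yarn-site.xml
```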
9. Deleting
A directory can be deleted only if no snapshots are present, so delete all of its snapshots manually using
hdfs dfs -deleteSnapshot <path> <snapshotname>
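When a directory has many snapshots, the deletion can be looped. The sketch below assumes the hdfs dfs -ls output format shown in the listing section and snapshot names without spaces. The function names snapshot_names and delete_all_snapshots are hypothetical.

```shell
# Reads `hdfs dfs -ls <path>/.snapshot` output on stdin and prints one
# snapshot name per line (the last path component of each directory row).
# The "Found N items" line does not start with "d" and is skipped.
snapshot_names() {
  awk '/^d/ { n = split($NF, parts, "/"); print parts[n] }'
}

# Deletes every snapshot of the given snapshottable directory.
delete_all_snapshots() {
  hdfs dfs -ls "$1/.snapshot" | snapshot_names | while read -r name; do
    # Redirect stdin so the command cannot consume the list of names.
    hdfs dfs -deleteSnapshot "$1" "$name" < /dev/null
  done
}

# Usage on a cluster:
#   delete_all_snapshots /data/conf
```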