Monday, April 16, 2018

Hadoop V2 - Trash

This post covers the HDFS trash feature.

Trash is an HDFS feature similar to the Recycle Bin in Windows, with a few differences:


1. It is a user-level feature, i.e. files are moved to trash only when deleted with the 'hdfs dfs' command.
Files removed programmatically are deleted permanently by default
(to send them to trash from code, call moveToTrash() on a Trash instance).
2. Trash is disabled by default (fs.trash.interval = 0) and is enabled by setting the property fs.trash.interval to a retention period in minutes.
The example below sets it to 2 days (2 x 24 x 60 = 2880 minutes):

<property>
  <name>fs.trash.interval</name>
  <value>2880</value>
</property>

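3. Once the retention period has passed, files in trash are deleted permanently.

4. fs.trash.checkpoint.interval sets how often, in minutes, the trash directory is checkpointed and checked for checkpoints older than fs.trash.interval, which are then removed. It should be less than or equal to fs.trash.interval; a value of 0 means the value of fs.trash.interval is used.

5. fs.trash.interval should be set on all client nodes, not just on the NameNode. If trash is enabled on the NameNode, its value is used; if it is disabled there, the client-side value applies, so different clients can have different settings.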
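
Both intervals are given in minutes, which is easy to get wrong by hand. Below is a minimal shell sketch of the arithmetic. The function names are made up for this post and are not Hadoop commands, and the capping rule in the second function is my understanding of the default trash policy, so treat it as an assumption.

```shell
# Hypothetical helpers for working out trash settings; these are
# illustrations, not Hadoop commands.

# Convert days to the minutes value that fs.trash.interval expects.
days_to_minutes() {
  echo $(( $1 * 24 * 60 ))
}

# Checkpoint interval actually used, assuming that 0 means
# "same as fs.trash.interval" and that a value larger than
# fs.trash.interval is capped to it.
effective_checkpoint_interval() {
  interval=$1
  checkpoint=$2
  if [ "$checkpoint" -eq 0 ] || [ "$checkpoint" -gt "$interval" ]; then
    echo "$interval"
  else
    echo "$checkpoint"
  fi
}

days_to_minutes 2                       # 2880
effective_checkpoint_interval 2880 0    # 2880
effective_checkpoint_interval 2880 60   # 60
```

The first call gives the value used in the fs.trash.interval example above.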

Deleting File
[hdfs@nn conf]$ hdfs dfs -rm /user/hosts
18/04/16 00:27:43 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
18/04/16 00:27:43 INFO fs.TrashPolicyDefault: Moved: 'hdfs://nn:8020/user/hosts' to trash at: hdfs://nn:8020/user/hdfs/.Trash/Current/user/hosts
Moved: 'hdfs://nn:8020/user/hosts' to trash at: hdfs://nn:8020/user/hdfs/.Trash/Current


Restoring File
Copy
[hdfs@nn conf]$ hdfs dfs -cp /user/hdfs/.Trash/Current/user/hosts /user/
or Move
[hdfs@nn conf]$ hdfs dfs -mv /user/hdfs/.Trash/Current/user/hosts /user/
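
As the paths above show, a deleted file keeps its full original path underneath the user's .Trash/Current directory. A small sketch of that mapping in both directions follows. The function names are hypothetical, and the second one only handles files still under Current, not files already moved into a timestamped checkpoint directory.

```shell
# Hypothetical helpers, not Hadoop commands.

# Where 'hdfs dfs -rm' places a file: the full original path is kept
# under /user/<username>/.Trash/Current.
# Usage: trash_path <username> <absolute-path>
trash_path() {
  printf '/user/%s/.Trash/Current%s\n' "$1" "$2"
}

# Inverse: strip the trash prefix to recover the original path.
# Usage: original_path <path-under-.Trash/Current>
original_path() {
  printf '%s\n' "${1#/user/*/.Trash/Current}"
}

# Example restore command built from the helper (not executed here):
#   hdfs dfs -mv "$(trash_path hdfs /user/hosts)" /user/
```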

Deleting Trash
The expunge command only affects the trash of the user who runs it. It creates a new checkpoint from Current and permanently deletes checkpoints older than fs.trash.interval.
[mapred@nn ~]$ hdfs dfs -expunge
18/04/16 00:31:56 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
18/04/16 00:31:56 INFO fs.TrashPolicyDefault: Created trash checkpoint: /user/mapred/.Trash/180416003156
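
The checkpoint directory name in the log above (180416003156) lines up with the log timestamp 18/04/16 00:31:56, so it appears to be encoded as yyMMddHHmmss. Assuming that format, and assuming the year is in the 2000s, a hypothetical helper to make checkpoint names readable could look like this:

```shell
# Hypothetical helper, not a Hadoop command: turn a checkpoint
# directory name (assumed format yyMMddHHmmss) into a readable
# timestamp. The "20" century prefix is an assumption.
checkpoint_to_timestamp() {
  printf '%s\n' "$1" |
    sed -E 's/^(..)(..)(..)(..)(..)(..)$/20\1-\2-\3 \4:\5:\6/'
}

checkpoint_to_timestamp 180416003156   # 2018-04-16 00:31:56
```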


Skipping Trash
[hdfs@nn ~]$ hdfs dfs -rm -skipTrash /user/hosts/
Deleted /user/hosts



Note

  1. The trash directory is created automatically on the first delete; make sure the user has write permission under the /user directory.
  2. The trash directory carries the timestamp of when the first file was deleted.
  3. A deleted file keeps the timestamp of when it was created.
  4. If you delete two files with the same name, the second file gets a timestamp appended to its name.
  5. Trash enabled --> delete file; trash disabled --> configuration refreshed --> file still present. In other words, files already in trash stay there even if you disable trash and restart.
  6. /user/<username>/.Trash/Current is where the trash files are
