Thursday, April 5, 2018

Hadoop V2 - Storage Policies

In this post I discuss how to set up heterogeneous storage and storage policies in HDFS.

(All steps are run as the hdfs user)


Step 1 - Go to the DataNode where you want to make changes

Step 2 - Edit hdfs-site.xml and add the disk directives
cd $CONF
sudo vi hdfs-site.xml
Make the following changes:
<property>
        <name>dfs.data.dir</name>
        <value>[DISK]file:///opt/HDPV2/1/dfs/dn,[SSD]file:///opt/HDPV2/2/dfs/dn,[ARCHIVE]file:///opt/HDPV2/3/dfs/dn,[RAM_DISK]file:///opt/HDPV2/4/dfs/dn</value>
</property>

<property>
        <name>dfs.storage.policy.enabled</name>
        <value>true</value>
</property>


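I have used all four directives (DISK, SSD, ARCHIVE and RAM_DISK) in my example, though physically they all sit under the same parent directory, /opt/HDPV2. Note that dfs.data.dir is the deprecated name of dfs.datanode.data.dir; Hadoop 2 accepts both.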
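To make the value format above concrete, here is a minimal sketch that splits such a value into "TYPE path" pairs. The function name parse_data_dirs is my own helper for illustration, not a Hadoop command. It relies on one documented behaviour: a directory without a tag is treated as DISK.

```shell
#!/bin/sh
# Sketch: split a storage-tagged data-directory value into "TYPE path" lines.
# parse_data_dirs is a hypothetical helper, not part of Hadoop.
# The body runs in a subshell so the IFS and globbing changes do not leak.
parse_data_dirs() (
  IFS=','
  set -f
  for entry in $1; do
    case $entry in
      \[*\]*)
        type=${entry#\[}     # drop the leading "["
        type=${type%%\]*}    # keep the text before the first "]"
        path=${entry#*\]}    # keep the text after the first "]"
        ;;
      *)
        # An untagged directory is treated as DISK by HDFS.
        type=DISK
        path=$entry
        ;;
    esac
    printf '%s %s\n' "$type" "$path"
  done
)

parse_data_dirs "[DISK]file:///opt/HDPV2/1/dfs/dn,[SSD]file:///opt/HDPV2/2/dfs/dn,[ARCHIVE]file:///opt/HDPV2/3/dfs/dn,[RAM_DISK]file:///opt/HDPV2/4/dfs/dn"
```

Each output line pairs one storage type with the directory that will hold blocks of that type on this DataNode.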

Step 3 - Restart the Hadoop DataNode daemon

[hdfs@d1node conf]$ hadoop-daemon.sh stop datanode
stopping datanode
[hdfs@d1node conf]$ hadoop-daemon.sh start datanode
starting datanode, logging to /opt/HDPV2/logs/hadoop-hdfs-datanode-d1node.cluster.com.out

(Repeat Steps 2 and 3 on every DataNode you want to change)


Step 4 - Create a parent directory and four subdirectories, one per policy
hdfs dfs -mkdir /storage/
hdfs dfs -mkdir /storage/Hot
hdfs dfs -mkdir /storage/Cold
hdfs dfs -mkdir /storage/Warm
hdfs dfs -mkdir /storage/one_ssd




Step 5 - Set the storage policies

[hdfs@namenode conf]$ hdfs storagepolicies -setStoragePolicy -path /storage/Hot -policy Hot
Set storage policy Hot on /storage/Hot
[hdfs@namenode conf]$ hdfs storagepolicies -setStoragePolicy -path /storage/Cold -policy Cold
Set storage policy Cold on /storage/Cold
[hdfs@namenode conf]$ hdfs storagepolicies -setStoragePolicy -path /storage/one_ssd -policy ONE_SSD
Set storage policy ONE_SSD on /storage/one_ssd
[hdfs@namenode conf]$ hdfs storagepolicies -setStoragePolicy -path /storage/Warm -policy Warm
Set storage policy Warm on /storage/Warm
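Steps 4 and 5 can also be combined into one small script. The sketch below is only an illustration: the DRY_RUN variable and the run and apply_policies helpers are my own assumptions, not Hadoop features. With DRY_RUN=1 (the default here) it prints the commands instead of running them, so you can review them first.

```shell
#!/bin/sh
# Sketch: create each directory and attach its storage policy in one pass.
# DRY_RUN, run and apply_policies are hypothetical helpers for illustration.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"        # print the command only
  else
    "$@"             # execute the command
  fi
}

apply_policies() {
  # Each argument has the form "<hdfs path>:<policy name>".
  for pair in "$@"; do
    path=${pair%%:*}
    policy=${pair#*:}
    run hdfs dfs -mkdir -p "$path" || return 1
    run hdfs storagepolicies -setStoragePolicy -path "$path" -policy "$policy" || return 1
  done
}

apply_policies /storage/Hot:Hot /storage/Cold:Cold /storage/Warm:Warm /storage/one_ssd:ONE_SSD
```

Set DRY_RUN=0 to execute the commands. The -p flag of hdfs dfs -mkdir creates missing parent directories, so /storage does not need a separate mkdir.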


Step 6 - List the policy definitions

[hdfs@namenode conf]$ hdfs storagepolicies -listPolicies
Block Storage Policies:
        BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
        BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
        BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
        BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
        BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
        BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
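Each line of that listing has the form NAME:ID followed by storageTypes (the preferred placement of the replicas), creationFallbacks (used when the preferred type is unavailable at file creation) and replicationFallbacks (used during re-replication). As a rough sketch, the sed filter below reduces the listing to name, ID and preferred storage types. The function name summarize_policies is my own, and the sample input is copied from the output above.

```shell
#!/bin/sh
# Sketch: reduce the -listPolicies output to "NAME ID storageTypes".
# summarize_policies is a hypothetical helper; it reads the listing on stdin.
summarize_policies() {
  sed -n 's/^[[:space:]]*BlockStoragePolicy{\([A-Z_]*\):\([0-9]*\), storageTypes=\[\([^]]*\)\].*/\1 \2 \3/p'
}

# Sample input taken from the listing above. On a live cluster you would run:
#   hdfs storagepolicies -listPolicies | summarize_policies
summarize_policies <<'EOF'
Block Storage Policies:
        BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
        BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
        BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
EOF
```

For WARM, for example, the result reads "WARM 5 DISK, ARCHIVE": one replica goes to DISK and the remaining replicas to ARCHIVE.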


Step 7 - Get the storage policy for a path
[hdfs@namenode conf]$ hdfs storagepolicies -getStoragePolicy -path /storage/Hot
The storage policy of /storage/Hot:
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}



Step 8 - Run the mover (to redistribute existing data according to the new policies)
[hdfs@namenode conf]$ hdfs mover


(Caution: run the mover against specific paths to limit the load at any one time)
hdfs mover [-p <files/dirs> | -f <local file name>]
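One way to follow that caution is to run the mover one directory at a time. The sketch below is only an illustration: DRY_RUN, run and move_each are my own helper names, and the paths are the ones created in Step 4. With DRY_RUN=1 (the default here) the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: run the mover on one path at a time instead of the whole namespace.
# DRY_RUN, run and move_each are hypothetical helpers for illustration.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"        # print the command only
  else
    "$@"             # execute the command
  fi
}

move_each() {
  for path in "$@"; do
    # Stop at the first failure so a problem is noticed before moving on.
    run hdfs mover -p "$path" || {
      echo "mover failed for $path" >&2
      return 1
    }
  done
}

move_each /storage/Hot /storage/Cold /storage/Warm /storage/one_ssd
```

Set DRY_RUN=0 to actually start the mover. Each path is processed only after the previous run has finished.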
