In this blog post I discuss how to set up Heterogeneous Storage and Storage Policies in HDFS.
(All steps are run as the hdfs user.)
Step 1 - Log in to the DataNode where you want to make the changes
Step 2 - Edit hdfs-site.xml in the Hadoop configuration directory ($CONF in my environment) and tag each data directory with a storage type
cd $CONF
sudo vi hdfs-site.xml
Make the changes below:
<property>
<name>dfs.datanode.data.dir</name>
<value>[DISK]file:///opt/HDPV2/1/dfs/dn,[SSD]file:///opt/HDPV2/2/dfs/dn,[ARCHIVE]file:///opt/HDPV2/3/dfs/dn,[RAM_DISK]file:///opt/HDPV2/4/dfs/dn</value>
</property>
<property>
<name>dfs.storage.policy.enabled</name>
<value>true</value>
</property>
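I have used all four storage types ([DISK], [SSD], [ARCHIVE] and [RAM_DISK]) in my example, although physically these directories are not on four different kinds of media. dfs.datanode.data.dir is the current name of the older dfs.data.dir property, and dfs.storage.policy.enabled already defaults to true, so the second property only makes the setting explicit.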
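The value of the data-directory property is a comma-separated list where each entry is a storage type in square brackets followed by a file:// URI. The sketch below assembles that string from "TYPE:path" pairs, which can help avoid typos when a node has many disks. The helper name build_data_dir_value and the "TYPE:path" argument format are my own assumptions for illustration, not part of Hadoop.

```bash
#!/usr/bin/env bash
# Sketch: build a [TYPE]file:///path,... value for the DataNode data directories.
# build_data_dir_value and its TYPE:path argument format are hypothetical.

build_data_dir_value() {
  local value="" pair type dir
  for pair in "$@"; do
    type=${pair%%:*}   # text before the first colon, e.g. SSD
    dir=${pair#*:}     # text after the first colon, e.g. /opt/HDPV2/2/dfs/dn
    case "$type" in
      DISK|SSD|ARCHIVE|RAM_DISK) ;;   # the four storage types used in this post
      *) echo "unknown storage type: $type" >&2; return 1 ;;
    esac
    # Append "[TYPE]file://<absolute path>", separated by commas.
    value="${value:+$value,}[$type]file://$dir"
  done
  printf '%s\n' "$value"
}

# Reproduces the value used in the hdfs-site.xml snippet above.
build_data_dir_value \
  DISK:/opt/HDPV2/1/dfs/dn \
  SSD:/opt/HDPV2/2/dfs/dn \
  ARCHIVE:/opt/HDPV2/3/dfs/dn \
  RAM_DISK:/opt/HDPV2/4/dfs/dn
```

Paste the printed string into the <value> element; the helper only rejects type names outside the four shown here.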
Step 3 - Restart the Hadoop DataNode daemon
[hdfs@d1node conf]$ hadoop-daemon.sh stop datanode
stopping datanode
[hdfs@d1node conf]$ hadoop-daemon.sh start datanode
starting datanode, logging to /opt/HDPV2/logs/hadoop-hdfs-datanode-d1node.cluster.com.out
(Repeat Steps 2 and 3 on every DataNode you want to change.)
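When several DataNodes need the restart, a small loop can repeat Step 3 host by host. This is a minimal sketch assuming passwordless ssh as the hdfs user and hadoop-daemon.sh on each host's PATH; the function names and host names are hypothetical. Restarting one node at a time avoids taking every replica of a block offline at once. Setting DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: rolling restart of DataNodes over ssh (hypothetical helpers and hosts).
# Assumes passwordless ssh and hadoop-daemon.sh on the remote PATH.

run() {
  # With DRY_RUN=1, only print the command line; otherwise execute it.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

restart_datanodes() {
  local host
  for host in "$@"; do
    run ssh "$host" hadoop-daemon.sh stop datanode || return 1
    run ssh "$host" hadoop-daemon.sh start datanode || return 1
  done
}

# Example (hypothetical host names):
# DRY_RUN=1 restart_datanodes d1node.cluster.com d2node.cluster.com
```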
Step 4 - Create a parent directory and four subdirectories, one per policy
hdfs dfs -mkdir /storage/
hdfs dfs -mkdir /storage/Hot
hdfs dfs -mkdir /storage/Cold
hdfs dfs -mkdir /storage/Warm
hdfs dfs -mkdir /storage/one_ssd
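The same directories can be created in one call: hdfs dfs -mkdir accepts several paths, and -p creates missing parents (like the Unix mkdir -p), so /storage does not need to exist first. The wrapper name create_policy_dirs is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: create the Step 4 directories with a single hdfs command.
# create_policy_dirs is a hypothetical helper; the base path defaults to /storage.

create_policy_dirs() {
  local base=${1:-/storage}
  hdfs dfs -mkdir -p "$base/Hot" "$base/Cold" "$base/Warm" "$base/one_ssd"
}

# Example: create_policy_dirs /storage
```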
Step 5 - Set up the storage policies
[hdfs@namenode conf]$ hdfs storagepolicies -setStoragePolicy -path /storage/Hot -policy Hot
Set storage policy Hot on /storage/Hot
[hdfs@namenode conf]$ hdfs storagepolicies -setStoragePolicy -path /storage/Cold -policy Cold
Set storage policy Cold on /storage/Cold
[hdfs@namenode conf]$ hdfs storagepolicies -setStoragePolicy -path /storage/one_ssd -policy ONE_SSD
Set storage policy ONE_SSD on /storage/one_ssd
[hdfs@namenode conf]$ hdfs storagepolicies -setStoragePolicy -path /storage/Warm -policy Warm
Set storage policy Warm on /storage/Warm
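The four -setStoragePolicy calls can also be driven from a list of path/policy pairs. The sketch below assumes a hypothetical "path=POLICY" argument format (so paths must not contain "="); the hdfs storagepolicies syntax is the one used above. I use upper-case policy names to match the -listPolicies output, although the mixed-case names in Step 5 (Hot, Cold) were accepted as well.

```bash
#!/usr/bin/env bash
# Sketch: apply storage policies from "path=POLICY" pairs (hypothetical format).
# Stops at the first failing hdfs call.

set_policies() {
  local pair path policy
  for pair in "$@"; do
    path=${pair%%=*}    # everything before the first "="
    policy=${pair#*=}   # everything after the first "="
    hdfs storagepolicies -setStoragePolicy -path "$path" -policy "$policy" || return 1
  done
}

# Example, matching Step 5:
# set_policies /storage/Hot=HOT /storage/Cold=COLD /storage/Warm=WARM /storage/one_ssd=ONE_SSD
```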
Step 6 - List the policy definitions
[hdfs@namenode conf]$ hdfs storagepolicies -listPolicies
Block Storage Policies:
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
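Each line shows the policy name and numeric ID, then storageTypes (where replicas are placed, e.g. WARM keeps one replica on DISK and the rest on ARCHIVE, ONE_SSD keeps one on SSD and the rest on DISK), then creationFallbacks and replicationFallbacks (types used when the preferred ones are unavailable for new blocks or for re-replication). To get a compact view of just the name, ID and storage types, the output can be filtered with sed. This is only a sketch based on the output format shown above; summarize_policies is a hypothetical helper, and the pattern may need adjusting if your Hadoop version prints the policies differently.

```bash
#!/usr/bin/env bash
# Sketch: reduce each BlockStoragePolicy{...} line to "NAME ID [storage types]".
# Reads the listing on stdin; lines not starting with BlockStoragePolicy{ are dropped.

summarize_policies() {
  sed -n 's/^BlockStoragePolicy{\([A-Z_]*\):\([0-9]*\), storageTypes=\[\([^]]*\)].*/\1 \2 [\3]/p'
}

# Example: hdfs storagepolicies -listPolicies | summarize_policies
```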
Step 7 - Get the storage policy for a path
[hdfs@namenode conf]$ hdfs storagepolicies -getStoragePolicy -path /storage/Hot
The storage policy of /storage/Hot:
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
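To confirm that every directory from Step 5 ended up with the intended policy, the -getStoragePolicy output can be checked for the expected BlockStoragePolicy{NAME: prefix. A minimal sketch; check_policy is a hypothetical helper and expects the upper-case policy name as printed by HDFS.

```bash
#!/usr/bin/env bash
# Sketch: verify that a path reports the expected storage policy name.
# check_policy is hypothetical; it matches the "BlockStoragePolicy{NAME:" prefix.

check_policy() {
  local path=$1 expected=$2 out
  out=$(hdfs storagepolicies -getStoragePolicy -path "$path") || return 1
  case "$out" in
    *"BlockStoragePolicy{$expected:"*) echo "OK   $path -> $expected" ;;
    *) echo "FAIL $path: expected $expected" >&2; return 1 ;;
  esac
}

# Example:
# check_policy /storage/Hot HOT && check_policy /storage/Cold COLD
```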
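Step 8 - Run the mover to migrate existing data according to the new policies (setting a policy only affects where new blocks are written; existing blocks stay where they are until the mover relocates them)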
[hdfs@namenode conf]$ hdfs mover
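(Caution: the mover can generate a lot of disk and network traffic, so run it on specific paths to limit the load at any given time. Without -p or -f it processes the whole namespace, starting from the root directory.)
hdfs mover [-p <files/dirs> | -f <local file name>]
(-p takes a space-separated list of HDFS files/directories; -f takes a local file containing that list.)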
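Following the caution above, the mover can be run on one path at a time so only a single migration is in flight. A sketch only; move_paths_one_by_one is a hypothetical helper, and it stops at the first path where hdfs mover exits with a non-zero status.

```bash
#!/usr/bin/env bash
# Sketch: run "hdfs mover -p" sequentially, one HDFS path per invocation.
# move_paths_one_by_one is hypothetical; it stops on the first failure.

move_paths_one_by_one() {
  local path
  for path in "$@"; do
    echo "Running mover on $path"
    hdfs mover -p "$path" || { echo "mover failed on $path" >&2; return 1; }
  done
}

# Example:
# move_paths_one_by_one /storage/Hot /storage/Cold /storage/Warm /storage/one_ssd
```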