
Monday, April 23, 2018

Hadoop V2 - Capacity Scheduler Configuration


In this blog I discuss how to configure the Capacity Scheduler for Hadoop 2 (YARN).

I will design the queues and their capacities as per the diagram below.

(The complete configuration is given in the Appendix at the end of the blog.)
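In the Capacity Scheduler each queue's capacity is a percentage of its parent, so a child queue's share of the whole cluster is the product of the percentages along its path. As a rough aid to reading the design, the Python sketch below restates the capacity and maximum-capacity values from the Appendix in a dictionary and prints each queue's absolute guaranteed and maximum share. `HIERARCHY` and `absolute` are illustrative names of my own, not part of Hadoop.

```python
# Capacity and maximum-capacity of each queue, as a percentage of its parent,
# restated from capacity-scheduler.xml in the Appendix.
HIERARCHY = {
    "root.research": (30, 30),
    "root.research.analytics": (40, 60),
    "root.research.data": (60, 60),
    "root.support": (40, 50),
    "root.support.training": (60, 70),
    "root.support.services": (40, 40),
    "root.production": (30, 100),
}


def absolute(path: str, index: int) -> float:
    """Share of the whole cluster for `path`, in percent.

    index 0 uses capacity, index 1 uses maximum-capacity; the relative
    percentages are multiplied along the path from root down to the queue.
    """
    parts = path.split(".")
    share = 100.0
    for depth in range(2, len(parts) + 1):
        ancestor = ".".join(parts[:depth])
        share = share * HIERARCHY[ancestor][index] / 100.0
    return share


for queue in HIERARCHY:
    print(f"{queue:24s} guaranteed {absolute(queue, 0):5.1f}%  "
          f"max {absolute(queue, 1):5.1f}%")
```

For example, analytics is guaranteed 30% × 40% = 12% of the cluster and may grow to 30% × 60% = 18%, while the five leaf queues (analytics, data, training, services, production) are together guaranteed 12 + 18 + 24 + 16 + 30 = 100%.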



Steps (on the ResourceManager node)
1. Make a backup of the Capacity Scheduler file
sudo cp /etc/hadoop/conf/capacity-scheduler.xml /etc/hadoop/conf/capacity-scheduler.xml.bkp

2. Configure /etc/hadoop/conf/capacity-scheduler.xml as shown in the Appendix

3. Configure the following properties in yarn-site.xml
yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(enables the Capacity Scheduler)
yarn.resourcemanager.scheduler.monitor.enable = true
(enables the scheduling monitor, which performs preemption)

4. Restart the YARN services

stop-yarn.sh
start-yarn.sh

5. Run any application


6. Verify the applications and queues in the ResourceManager web UI





7. Check the queues on the RM from the command line


[yarn@rm ]$ hadoop queue -list

18/04/23 03:06:36 INFO client.RMProxy: Connecting to ResourceManager at rm/192.168.2.102:8032
======================
Queue Name : research
Queue State : running
Scheduling Info : Capacity: 30.000002, MaximumCapacity: 30.000002, CurrentCapacity: 0.0
    ======================
    Queue Name : analytics
    Queue State : running
    Scheduling Info : Capacity: 40.0, MaximumCapacity: 60.000004, CurrentCapacity: 0.0
    ======================
    Queue Name : data
    Queue State : running
    Scheduling Info : Capacity: 60.000004, MaximumCapacity: 60.000004, CurrentCapacity: 0.0
======================
Queue Name : support
Queue State : running
Scheduling Info : Capacity: 40.0, MaximumCapacity: 50.0, CurrentCapacity: 0.0
    ======================
    Queue Name : services
    Queue State : running
    Scheduling Info : Capacity: 40.0, MaximumCapacity: 40.0, CurrentCapacity: 0.0
    ======================
    Queue Name : training
    Queue State : running
    Scheduling Info : Capacity: 60.000004, MaximumCapacity: 70.0, CurrentCapacity: 0.0
======================
Queue Name : production
Queue State : running
Scheduling Info : Capacity: 30.000002, MaximumCapacity: 100.0, CurrentCapacity: 22.222223
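The Capacity values in this output are relative to the parent queue (analytics has 40% of research, not of the cluster), and they carry small floating-point noise such as 30.000002 instead of 30. The Python sketch below assumes the command output has been saved as text; it parses it and compares each queue with the configured values within a tolerance. `parse_queue_list`, `EXPECTED` and `mismatches` are illustrative names, not Hadoop APIs, and `SAMPLE` is only an excerpt of the output above.

```python
import math
import re


def parse_queue_list(text: str) -> dict[str, dict[str, float]]:
    """Turn `hadoop queue -list` output into {queue: {field: value}}."""
    queues: dict[str, dict[str, float]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Queue Name :"):
            current = line.split(":", 1)[1].strip()
            queues[current] = {}
        elif line.startswith("Scheduling Info :") and current is not None:
            # Pull out every "Key: number" pair, e.g. "Capacity: 40.0".
            for key, value in re.findall(r"(\w+): ([\d.]+)", line):
                queues[current][key] = float(value)
    return queues


# Configured (capacity, maximum-capacity) per queue, relative to its parent,
# copied from capacity-scheduler.xml in the Appendix.
EXPECTED = {
    "research": (30, 30), "analytics": (40, 60), "data": (60, 60),
    "support": (40, 50), "training": (60, 70), "services": (40, 40),
    "production": (30, 100),
}


def mismatches(parsed: dict[str, dict[str, float]]) -> list[str]:
    """List queues whose reported capacities differ from the configuration."""
    problems = []
    for name, (cap, max_cap) in EXPECTED.items():
        info = parsed.get(name)
        if info is None:
            problems.append(f"{name}: missing from output")
        elif not (math.isclose(info["Capacity"], cap, abs_tol=0.01)
                  and math.isclose(info["MaximumCapacity"], max_cap, abs_tol=0.01)):
            problems.append(f"{name}: got {info}")
    return problems


# Excerpt of the output shown above.
SAMPLE = """\
Queue Name : research
Queue State : running
Scheduling Info : Capacity: 30.000002, MaximumCapacity: 30.000002, CurrentCapacity: 0.0
    ======================
    Queue Name : analytics
    Queue State : running
    Scheduling Info : Capacity: 40.0, MaximumCapacity: 60.000004, CurrentCapacity: 0.0
"""

parsed = parse_queue_list(SAMPLE)
print(mismatches(parsed))
```

With the full output pasted into `SAMPLE`, an empty list would mean every queue is running with the capacities configured in the Appendix; with the excerpt, only the five queues it omits are reported as missing.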



Appendix
capacity-scheduler.xml: this file is created on the YARN ResourceManager node in /etc/hadoop/conf, owned by root with permissions 0755.

This file defines the queue configuration used by the RM.

<configuration>
<property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>1.0</value>
</property>

<property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>2000</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
        <value>*</value>
</property>

<property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>research,support,production</value>
</property>


<property>
        <name>yarn.scheduler.capacity.root.research.capacity</name>
        <value>30</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.maximum-capacity</name>
        <value>30</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.state</name>
        <value>RUNNING</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.user-limit-factor</name>
        <value>1</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.minimum-user-limit-percent</name>
        <value>80</value>
</property>



<property>
        <name>yarn.scheduler.capacity.root.research.analytics.capacity</name>
        <value>40</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.analytics.maximum-capacity</name>
        <value>60</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.analytics.state</name>
        <value>RUNNING</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.analytics.user-limit-factor</name>
        <value>1</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.analytics.minimum-user-limit-percent</name>
        <value>20</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.queues</name>
        <value>analytics,data</value>
</property>



<property>
        <name>yarn.scheduler.capacity.root.research.data.capacity</name>
        <value>60</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.data.maximum-capacity</name>
        <value>60</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.data.state</name>
        <value>RUNNING</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.data.user-limit-factor</name>
        <value>1</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.research.data.minimum-user-limit-percent</name>
        <value>20</value>
</property>




<property>
        <name>yarn.scheduler.capacity.root.production.capacity</name>
        <value>30</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.production.maximum-capacity</name>
        <value>100</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.production.state</name>
        <value>RUNNING</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.production.user-limit-factor</name>
        <value>1</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.production.minimum-user-limit-percent</name>
        <value>20</value>
</property>



<property>
        <name>yarn.scheduler.capacity.root.support.capacity</name>
        <value>40</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.maximum-capacity</name>
        <value>50</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.state</name>
        <value>RUNNING</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.user-limit-factor</name>
        <value>1</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.minimum-user-limit-percent</name>
        <value>20</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.queues</name>
        <value>training,services</value>
</property>



<property>
        <name>yarn.scheduler.capacity.root.support.training.capacity</name>
        <value>60</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.training.maximum-capacity</name>
        <value>70</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.training.state</name>
        <value>RUNNING</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.training.user-limit-factor</name>
        <value>1</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.training.minimum-user-limit-percent</name>
        <value>20</value>
</property>




<property>
        <name>yarn.scheduler.capacity.root.support.services.capacity</name>
        <value>40</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.services.maximum-capacity</name>
        <value>40</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.services.state</name>
        <value>RUNNING</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.services.user-limit-factor</name>
        <value>1</value>
</property>

<property>
        <name>yarn.scheduler.capacity.root.support.services.minimum-user-limit-percent</name>
        <value>20</value>
</property>
<property>
        <name>yarn.scheduler.capacity.queue-mappings</name>
        <value>u:sqoop:production,u:hdfs:production,g:hadoop:services,g:analytics:analytics,g:data:data,g:training:training,g:services:services,u:%user:%user</value>
</property>
</configuration>
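A misspelled property name is not rejected; the scheduler simply never reads it, so the file can look correct while a setting has no effect. The Python sketch below flags a few such problems before a restart, under assumptions of my own rather than rules taken from Hadoop: every property should start with `yarn.scheduler.capacity.`, sibling capacities should add up to 100, no capacity should exceed its maximum-capacity, and, because queue mappings are evaluated left to right with the first match used, nothing should follow the catch-all `u:%user:%user`. `load_properties` and `check` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.scheduler.capacity."
CATCH_ALL = "u:%user:%user"


def load_properties(xml_text: str) -> dict[str, str]:
    """Collect <property><name>/<value> pairs from a Hadoop configuration XML."""
    root = ET.fromstring(xml_text)
    return {
        (prop.findtext("name") or "").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }


def check(props: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means no check fired."""
    problems = []

    # 1. Names with a wrong prefix (e.g. "yarn-scheduler.") are never read.
    for name in props:
        if not name.startswith(PREFIX):
            problems.append(f"unused property (wrong prefix): {name}")

    # 2. For every parent queue, children must sum to 100 and respect their max.
    for key, value in props.items():
        if not (key.startswith(PREFIX) and key.endswith(".queues")):
            continue
        parent = key[len(PREFIX):-len(".queues")]  # e.g. "root.research"
        total = 0.0
        for child in (c.strip() for c in value.split(",") if c.strip()):
            base = f"{PREFIX}{parent}.{child}"
            cap = float(props.get(base + ".capacity", "0"))  # missing -> 0 here
            max_cap = float(props.get(base + ".maximum-capacity", "100"))
            total += cap
            if cap > max_cap:
                problems.append(
                    f"{parent}.{child}: capacity {cap} > maximum-capacity {max_cap}")
        if abs(total - 100.0) > 0.01:
            problems.append(f"{parent}: child capacities sum to {total}, not 100")

    # 3. Mappings are tried left to right and the first match is used, so
    #    anything after the catch-all u:%user:%user can never be reached.
    mappings = [m.strip() for m in props.get(PREFIX + "queue-mappings", "").split(",")
                if m.strip()]
    if CATCH_ALL in mappings:
        for entry in mappings[mappings.index(CATCH_ALL) + 1:]:
            problems.append(f"mapping never reached (after {CATCH_ALL}): {entry}")

    return problems
```

Usage, on the RM node: `print(check(load_properties(open("/etc/hadoop/conf/capacity-scheduler.xml").read())))` prints a list of problems; an empty list means none of these checks fired. The checks are deliberately shallow and do not replace the ResourceManager's own validation at startup.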
