

Sunday, March 4, 2018

Hadoop - ENOENT: No such file or directory (Mapred Task Start)

You get the error below in your MapReduce job when certain directories need a permission fix.
The directories are:

1. <HADOOP_LOG>/userlogs - must have the right permissions for the mapred user
2. mr/userlogs - all the directories listed as MapReduce local directories (mapred.local.dir) in the configuration files

Once you fix the permissions so that the user starting the TaskTrackers (mapred) can write to them, task localization succeeds automatically on the next job start.
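A minimal fix sketch follows. The paths are assumptions taken from a typical layout in this series (substitute your own HADOOP_LOG_DIR and mapred.local.dir values); the `echo` prefix makes it a dry run, so remove it to actually apply the changes:

```shell
# Dry-run permission fix for TaskTracker localization failures.
# HADOOP_LOG_DIR and MAPRED_LOCAL_DIRS below are assumptions --
# substitute the values from your hadoop-env.sh and mapred-site.xml.
HADOOP_LOG_DIR=/opt/HDPV1/logs
MAPRED_LOCAL_DIRS="/opt/HDPV1/1/mr1 /opt/HDPV1/1/mr2"

for d in "${HADOOP_LOG_DIR}/userlogs" ${MAPRED_LOCAL_DIRS}; do
  echo "mkdir -p ${d}"                   # remove 'echo' to apply
  echo "chown -R mapred:hadoop ${d}"
  echo "chmod 755 ${d}"
done
```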

2018-03-03 15:02:40,011 WARN org.apache.hadoop.mapred.TaskTracker: Exception while localization ENOENT: No such file or directory
        at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
        at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:701)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:656)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
        at org.apache.hadoop.mapred.JobLocalizer.initializeJobLogDir(JobLocalizer.java:240)
        at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:205)
        at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1336)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1311)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1226)
        at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2603)
        at java.lang.Thread.run(Thread.java:748)

2018-03-03 15:02:40,012 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred cause:ENOENT: No such file or directory
2018-03-03 15:02:40,013 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201803031350_0001_m_000019_3:
ENOENT: No such file or directory

Hadoop V1/V2 - Administrative Commands Starter

In this blog we are going to look at some basic administrative commands in Hadoop:


1. Status Report of DFS (Distributed File Systems)
 hadoop dfsadmin -report
Configured Capacity: 133660540928 (124.48 GB)
Present Capacity: 133660540928 (124.48 GB)
DFS Remaining: 133660270592 (124.48 GB)
DFS Used: 270336 (264 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)

Name: 192.168.10.54:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415065600(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Fri Mar 02 12:59:27 CET 2018


Name: 192.168.10.57:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 61440 (60 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415073792(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Fri Mar 02 12:59:27 CET 2018


Name: 192.168.10.55:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415065600(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Fri Mar 02 12:59:27 CET 2018


Name: 192.168.10.58:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415065600(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Fri Mar 02 12:59:27 CET 2018


2. Check Contents of DFS
[hduser@namenode ~]$ hadoop dfs -ls /
Found 1 items
drwxr-xr-x   - hduser supergroup          0 2018-02-22 04:14 /tmp

3. Create Directory
[hduser@namenode ~]$ hadoop dfs -mkdir /user/vin

4. Copy and Delete Files / Directories in HDFS
[hduser@namenode hadoop]$ ls -l conf/
total 84
-rwxr-xr-x 1 root root 7457 Jul 23  2013 capacity-scheduler.xml
-rwxr-xr-x 1 root root 1095 Jul 23  2013 configuration.xsl
-rwxr-xr-x 1 root root  415 Feb 22 03:48 core-site.xml
-rwxr-xr-x 1 root root   76 Feb 22 03:48 dfs.hosts.include
-rwxr-xr-x 1 root root  327 Jul 23  2013 fair-scheduler.xml
-rwxr-xr-x 1 root root 2497 Feb 22 03:48 hadoop-env.sh
-rwxr-xr-x 1 root root 2052 Jul 23  2013 hadoop-metrics2.properties
-rwxr-xr-x 1 root root 4644 Jul 23  2013 hadoop-policy.xml
-rwxr-xr-x 1 root root 1357 Feb 22 03:48 hdfs-site.xml
-rwxr-xr-x 1 root root 5018 Jul 23  2013 log4j.properties
-rwxr-xr-x 1 root root 2033 Jul 23  2013 mapred-queue-acls.xml
-rwxr-xr-x 1 root root 1359 Feb 25 13:49 mapred-site.xml
-rwxr-xr-x 1 root root    4 Feb 22 03:48 masters
-rwxr-xr-x 1 root root   16 Feb 22 03:48 slaves
-rwxr-xr-x 1 root root 2042 Jul 23  2013 ssl-client.xml.example
-rwxr-xr-x 1 root root 1994 Jul 23  2013 ssl-server.xml.example
-rwxr-xr-x 1 root root  382 Jul 23  2013 taskcontroller.cfg
-rwxr-xr-x 1 root root 3890 Jul 23  2013 task-log4j.properties

Copy a file to a particular HDFS directory
[hduser@namenode etc]$ hadoop dfs -put group /user/vin/


Copy a file to a directory with a target file name
[hduser@namenode etc]$ hadoop dfs -put group /user/vin/mygroupfile

Wildcard listing of files
[hduser@namenode etc]$ hadoop dfs -ls /user/vin/*group*
-rw-r--r--   3 hduser supergroup        682 2018-03-02 13:19 /user/vin/group
-rw-r--r--   3 hduser supergroup        682 2018-03-02 13:19 /user/vin/mygroupfile


[hduser@namenode hadoop]$ hadoop dfs -put conf/ /user/vin/

[hduser@namenode hadoop]$ hadoop dfs -ls /user/vin/conf
Found 18 items
-rw-r--r--   3 hduser supergroup       7457 2018-03-02 13:08 /user/vin/conf/capacity-scheduler.xml
-rw-r--r--   3 hduser supergroup       1095 2018-03-02 13:08 /user/vin/conf/configuration.xsl
-rw-r--r--   3 hduser supergroup        415 2018-03-02 13:08 /user/vin/conf/core-site.xml
-rw-r--r--   3 hduser supergroup         76 2018-03-02 13:08 /user/vin/conf/dfs.hosts.include
-rw-r--r--   3 hduser supergroup        327 2018-03-02 13:08 /user/vin/conf/fair-scheduler.xml
-rw-r--r--   3 hduser supergroup       2497 2018-03-02 13:08 /user/vin/conf/hadoop-env.sh
-rw-r--r--   3 hduser supergroup       2052 2018-03-02 13:08 /user/vin/conf/hadoop-metrics2.properties
-rw-r--r--   3 hduser supergroup       4644 2018-03-02 13:08 /user/vin/conf/hadoop-policy.xml
-rw-r--r--   3 hduser supergroup       1357 2018-03-02 13:08 /user/vin/conf/hdfs-site.xml
-rw-r--r--   3 hduser supergroup       5018 2018-03-02 13:08 /user/vin/conf/log4j.properties
-rw-r--r--   3 hduser supergroup       2033 2018-03-02 13:08 /user/vin/conf/mapred-queue-acls.xml
-rw-r--r--   3 hduser supergroup       1359 2018-03-02 13:08 /user/vin/conf/mapred-site.xml
-rw-r--r--   3 hduser supergroup          4 2018-03-02 13:08 /user/vin/conf/masters
-rw-r--r--   3 hduser supergroup         16 2018-03-02 13:08 /user/vin/conf/slaves
-rw-r--r--   3 hduser supergroup       2042 2018-03-02 13:08 /user/vin/conf/ssl-client.xml.example
-rw-r--r--   3 hduser supergroup       1994 2018-03-02 13:08 /user/vin/conf/ssl-server.xml.example
-rw-r--r--   3 hduser supergroup       3890 2018-03-02 13:08 /user/vin/conf/task-log4j.properties
-rw-r--r--   3 hduser supergroup        382 2018-03-02 13:08 /user/vin/conf/taskcontroller.cfg

Delete Directory
[hduser@namenode ~]$ hadoop fs -rmr /tmp/o1
Moved to trash: hdfs://nn:8020/tmp/o1

Delete File(s)
[hduser@namenode ~]$ hadoop fs -rm /user/vin/conf/mapred-site.xml
Moved to trash: hdfs://nn:8020/user/vin/conf/mapred-site.xml

[hduser@namenode ~]$ hadoop fs -rm /user/vin/conf/hadoop*
Moved to trash: hdfs://nn:8020/user/vin/conf/hadoop-env.sh
Moved to trash: hdfs://nn:8020/user/vin/conf/hadoop-metrics2.properties
Moved to trash: hdfs://nn:8020/user/vin/conf/hadoop-policy.xml


5. Set Quota on Directory
[hduser@namenode hadoop]$ hadoop dfs -mkdir /user/nish
[hduser@namenode hadoop]$ hadoop dfsadmin -setSpaceQuota 10983040 /user/nish # ~10 MB quota

6. Check Quota
[hduser@namenode hadoop]$ hadoop dfs -count -q /user/nish
        none             inf        10983040        10983040            1            0                  0 hdfs://nn:8020/user/nish

7. Put File After Quota
 [hduser@namenode conf]$ ls -l mapred-site.xml
-rwxr-xr-x 1 root root 1359 Feb 25 13:49 mapred-site.xml

[hduser@namenode conf]$ hadoop dfs -put mapred-site.xml /user/nish/
18/03/02 13:16:50 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/nish is exceeded: quota=10983040 diskspace consumed=384.0m

*** The space quota should be at least block size * replication factor
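The failed put above reserves a full block per replica, so the minimum workable space quota is dfs.block.size times dfs.replication. A quick check with this cluster's values (128 MB blocks, replication 3):

```shell
# Minimum space quota = block size * replication factor.
# 134217728 bytes (dfs.block.size) * 3 (dfs.replication) = 402653184 bytes,
# which matches the "diskspace consumed=384.0m" in the error above.
BLOCK_SIZE=134217728
REPLICATION=3
MIN_QUOTA=$((BLOCK_SIZE * REPLICATION))
echo "${MIN_QUOTA} bytes = $((MIN_QUOTA / 1024 / 1024)) MB"
```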

8. Set a Quota on the Number of Files
# First Increase space Quota to 1G
[hduser@namenode etc]$ hadoop dfsadmin -setSpaceQuota 1073741824 /user/nish
Set # of Files to 10
[hduser@namenode etc]$ hadoop dfsadmin -setQuota 10 /user/nish

Check Quota Output
[hduser@namenode etc]$ hadoop dfs -count -q /user/nish
          10               8      1073741824      1073741824            1            1                  0 hdfs://nn:8020/user/nish

9. Put 10 Files After Quota
[hduser@namenode hadoop]$ hadoop dfs -put conf/* /user/nish/
put: org.apache.hadoop.hdfs.protocol.NSQuotaExceededException: The NameSpace quota (directories and files) of directory /user/nish is exceeded: quota=10 file count=11

Check how many files were placed:
 hadoop dfs -ls /user/nish
Found 9 items
-rw-r--r--   3 hduser supergroup       7457 2018-03-02 13:26 /user/nish/capacity-scheduler.xml
-rw-r--r--   3 hduser supergroup       1095 2018-03-02 13:26 /user/nish/configuration.xsl
-rw-r--r--   3 hduser supergroup        415 2018-03-02 13:26 /user/nish/core-site.xml
-rw-r--r--   3 hduser supergroup         76 2018-03-02 13:26 /user/nish/dfs.hosts.include
-rw-r--r--   3 hduser supergroup        327 2018-03-02 13:26 /user/nish/fair-scheduler.xml
-rw-r--r--   3 hduser supergroup       2497 2018-03-02 13:26 /user/nish/hadoop-env.sh
-rw-r--r--   3 hduser supergroup       2052 2018-03-02 13:26 /user/nish/hadoop-metrics2.properties
-rw-r--r--   3 hduser supergroup       4644 2018-03-02 13:26 /user/nish/hadoop-policy.xml
-rw-r--r--   3 hduser supergroup          0 2018-03-02 13:16 /user/nish/mapred-site.xml

10. Clear Quotas
[hduser@namenode hadoop]$ hadoop dfsadmin -clrSpaceQuota /user/nish
[hduser@namenode hadoop]$ hadoop dfsadmin -clrQuota /user/nish

Verify Quota
[hduser@namenode hadoop]$ hadoop dfs -count -q /user/nish
        none             inf            none             inf            1            9              18563 hdfs://nn:8020/user/nish

11.  FSCK
[hduser@namenode ~]$ hadoop fsck /
FSCK started by hduser from /192.168.10.51 for path / at Sat Mar 03 14:30:30 CET 2018
................................Status: HEALTHY
 Total size:    57952 B
 Total dirs:    13
 Total files:   32
 Total blocks (validated):      31 (avg. block size 1869 B)
 Minimally replicated blocks:   31 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          4
 Number of racks:               1
FSCK ended at Sat Mar 03 14:30:30 CET 2018 in 25 milliseconds


The filesystem under path '/' is HEALTHY

[hduser@namenode ~]$ hadoop fsck /user/vin/conf/capacity-scheduler.xml -files -blocks -locations
FSCK started by hduser from /192.168.10.51 for path /user/vin/conf/capacity-scheduler.xml at Sat Mar 03 14:31:36 CET 2018
/user/vin/conf/capacity-scheduler.xml 7457 bytes, 1 block(s):  OK
0. blk_4751168492478894034_1008 len=7457 repl=3 [192.168.10.55:50010, 192.168.10.57:50010, 192.168.10.54:50010]

Status: HEALTHY
 Total size:    7457 B
 Total dirs:    0
 Total files:   1
 Total blocks (validated):      1 (avg. block size 7457 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          4
 Number of racks:               1
FSCK ended at Sat Mar 03 14:31:36 CET 2018 in 1 milliseconds


The filesystem under path '/user/vin/conf/capacity-scheduler.xml' is HEALTHY


12. Killing a Job in Hadoop
Find the job ID as the superuser and kill it:

[hduser@namenode ~]$ hadoop job -list
hadoop 1 jobs currently running
JobId   State   StartTime       UserName        Priority        SchedulingInfo
job_201803041408_0003   1       1520169058581   mapred  NORMAL  NA

[hduser@namenode ~]$ hadoop job  -kill job_201803041408_0003
Killed job job_201803041408_0003

13. Killing Specific Task Attempt in Hadoop
[hduser@namenode ~]$ hadoop job -list
1 jobs currently running
JobId   State   StartTime       UserName        Priority        SchedulingInfo
job_201803041408_0006   4       1520169519215   mapred  NORMAL  NA
[hduser@namenode ~]$ hadoop job -list-attempt-ids job_201803041408_0006 reduce running
attempt_201803041408_0006_r_000000_0
attempt_201803041408_0006_r_000001_0
attempt_201803041408_0006_r_000002_0
attempt_201803041408_0006_r_000003_0
attempt_201803041408_0006_r_000004_0
attempt_201803041408_0006_r_000005_0
attempt_201803041408_0006_r_000006_0
attempt_201803041408_0006_r_000007_0
[hduser@namenode ~]$ hadoop job -kill-task attempt_201803041408_0006_r_000006_0
Killed task attempt_201803041408_0006_r_000006_0
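To kill several attempts at once, the output of `hadoop job -list-attempt-ids` can be piped into `-kill-task`. A dry-run sketch (the here-doc stands in for live listing output; drop the `echo` to actually kill):

```shell
# Kill every listed reduce attempt of a job (dry run -- remove 'echo' to apply).
# The here-doc below stands in for:
#   hadoop job -list-attempt-ids job_201803041408_0006 reduce running
while read -r attempt; do
  echo "hadoop job -kill-task ${attempt}"
done <<'EOF'
attempt_201803041408_0006_r_000006_0
attempt_201803041408_0006_r_000007_0
EOF
```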

Hadoop V1 - Fair Scheduler Configuration

This is in continuation of my previous blog on MapReduce configuration.
In this blog I am going to discuss Fair Scheduler configuration.

Below is my mapred-site.xml for Fair Scheduler Configuration - 


<property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>

<property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>

<property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>user.name</value>
</property>

<property>
    <name>mapred.fairscheduler.preemption</name>
    <value>true</value>
</property>

<property>
    <name>mapred.fairscheduler.sizebasedweight</name>
    <value>true</value>
</property>

<property>
    <name>mapred.fairscheduler.assignmultiple.maps</name>
    <value>5</value>
</property>

<property>
    <name>mapred.fairscheduler.assignmultiple.reduces</name>
    <value>3</value>
</property>


Fair-Scheduler configuration File
vi /etc/hadoop/conf/fair-scheduler.xml


<allocations>
    <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
    <pool name="prod-analytics">
        <minMaps>30</minMaps>
        <minReduces>10</minReduces>
    </pool>
    <pool name="dev-users">
        <weight>2</weight>
    </pool>
    <pool name="qa-users">
        <weight>1</weight>
    </pool>
    <user name="james">
        <maxRunningJobs>3</maxRunningJobs>
    </user>   

</allocations>


[As root]
# for i in $(cat /tmp/hosts) ;do scp mapred-site.xml fair-scheduler.xml ${i}:/etc/hadoop/conf/ ; done

[As root - Give Permissions]
# for i in $(cat /tmp/hosts) ;do ssh ${i} chmod -R 755 /etc/hadoop ; done;

[Stop and Restart mapred-Services]

stop-mapred.sh
start-mapred.sh


From the log file

[mapred@namenode logs]$ less hadoop-mapred-jobtracker-namenode.cluster.com.log | grep -i Fair
2018-03-03 13:43:03,169 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured FairScheduler



Verify Java Processes on all nodes

[mapred@namenode ~]$ for i in $(cat /tmp/hosts) ; do ssh ${i} 'hostname; jps | grep -vi jps; echo' ;  done;
namenode.cluster.com
4722 JobTracker


d1node.cluster.com
30419 TaskTracker

d2node.cluster.com
1600 TaskTracker

d3node.cluster.com
26777 TaskTracker

d4node.cluster.com
10144 TaskTracker

Thursday, February 22, 2018

Hadoop V1 Install - Mapred Configuration

MapReduce  - Basic Configuration to Start Hadoop Daemons
 

Configuration of mapred-site.xml
### Only Properties are mentioned ### 


<property>
        <name>mapred.job.tracker</name>
        <value>nn:8021</value>
</property>
<property>
        <name>mapred.local.dir</name>
        <value>/opt/HDPV1/1/mr1,/opt/HDPV1/1/mr2</value>
</property>


Copy the Configuration to all the nodes
[As root]
# for i in $(cat /tmp/hosts) ;do scp mapred-site.xml ${i}:/etc/hadoop/conf/ ; done

[As root - Give Permissions]
# for i in $(cat /tmp/hosts) ;do ssh ${i} chmod -R 755 /etc/hadoop ; done;


# for i in $(cat /tmp/hosts) ;do ssh ${i} chmod  775 /opt/HDPV1/1/ ; done;
# for i in $(cat /tmp/hosts) ;do ssh ${i} mkdir /opt/HDPV1/1/mr1   ; done;
# for i in $(cat /tmp/hosts) ;do ssh ${i} mkdir /opt/HDPV1/1/mr2   ; done;
# for i in $(cat /tmp/hosts) ;do ssh ${i} chown mapred:hadoop   /opt/HDPV1/1/mr1 ; done;
# for i in $(cat /tmp/hosts) ;do ssh ${i} chown mapred:hadoop   /opt/HDPV1/1/mr2 ; done;


[As mapred- on namenode]
Start mapred
start-mapred.sh


for i in $(cat /tmp/hosts) ; do ssh ${i} 'hostname; jps | grep -vi jps; echo' ;  done;


namenode.cluster.com
29378 JobTracker


d1node.cluster.com
4931 TaskTracker

d2node.cluster.com
7712 TaskTracker

d3node.cluster.com
2359 TaskTracker

d4node.cluster.com
17635 TaskTracker




To optimize performance, you can use the configuration below for mapred-site.xml and restart the daemons with stop-mapred.sh and start-mapred.sh.


MapReduce - Performance Configuration File


<property>
        <name>mapred.job.tracker</name>
        <value>nn:8021</value>
</property>
<property>
        <name>mapred.local.dir</name>
        <value>/opt/HDPV1/1/mr1,/opt/HDPV1/1/mr2</value>
</property>
<property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx1024m</value>
</property>
<property>
        <name>mapred.child.ulimit</name>
        <value>1572864</value>
</property>
<property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>4</value>
</property>
<property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>2</value>
</property>
<property>
        <name>io.sort.mb</name>
        <value>200</value>
</property>

<property>
        <name>io.sort.factor</name>
        <value>32</value>
</property>
<property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
</property>
<property>
        <name>mapred.map.output.compression.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
        <name>mapred.jobtracker.taskScheduler</name>
        <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
        <name>mapred.reduce.tasks</name>
        <value>8</value>
</property>
<property>
        <name>mapred.reduce.slowstart.completed.maps</name>
        <value>0.7</value>
</property>
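One sanity check worth doing on a tuning file like this: the worst-case child-JVM footprint is (map slots + reduce slots) times the child heap, and that total must fit in the node's RAM alongside the DataNode and TaskTracker daemons. With the values above:

```shell
# Worst-case task memory: (map slots + reduce slots) * child heap size.
MAP_SLOTS=4         # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS=2      # mapred.tasktracker.reduce.tasks.maximum
CHILD_HEAP_MB=1024  # -Xmx1024m from mapred.child.java.opts
echo "$(( (MAP_SLOTS + REDUCE_SLOTS) * CHILD_HEAP_MB )) MB peak task heap"
```

This prints 6144 MB, i.e. 6 GB of heap if every slot is busy at once.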

 

Hadoop V1 Install - Hadoop - Software Configuration

This is in continuation of my last blog, Hadoop Software Setup and Environment Configuration.

cd /etc/hadoop/conf
[As root or sudo hduser] 

Make changes in  hadoop-env.sh as below
cat hadoop-env.sh
export JAVA_HOME=/usr/java/latest   ## CustomSet
export HADOOP_LOG_DIR=/opt/HDPV1/logs #CustomSet
export HADOOP_PID_DIR=/opt/HDPV1/pids #CustomSet

 

Contents of core-site.xml (Only property section)
<property>
        <name>fs.default.name</name>
        <value>hdfs://nn:8020</value>
</property>
<property>
        <name>io.file.buffer.size</name>
        <value>65536</value>
</property>
<property>
        <name>fs.trash.interval</name>
        <value>600</value>
</property>

Contents of hdfs-site.xml
(Only property section)
<property>
        <name>dfs.http.address</name>
        <value>nn:50070</value>
</property>
<property>
        <name>dfs.name.dir</name>
        <value>/opt/HDPV1/1/dfs/nn,/opt/HDPV1/2/dfs/nn</value>
</property>
<property>
        <name>dfs.data.dir</name>
        <value>/opt/HDPV1/1/dfs/dn,/opt/HDPV1/2/dfs/dn</value>
</property>
<property>
        <name>dfs.secondary.http.address</name>
        <value>snn:50090</value>
</property>
<property>
        <name>fs.checkpoint.dir</name>
        <value>/opt/HDPV1/1/dfs/snn</value>
</property>
<property>
        <name>dfs.block.size</name>
        <value>134217728</value>
</property>
<property>
        <name>dfs.balance.bandwidthPerSec</name>
        <value>1048576</value>
</property>
<property>
        <name>dfs.datanode.du.reserved</name>
        <value>4294967296</value>
</property>
<property>
        <name>dfs.namenode.handler.count</name>
        <value>20</value>
</property>
<property>
        <name>dfs.hosts</name>
        <value>/etc/hadoop/conf/dfs.hosts.include</value>
</property>
<property>
        <name>dfs.hosts.exclude</name>
        <value>/etc/hadoop/conf/dfs.hosts.exclude</value>
</property>
<property>
        <name>dfs.datanode.failed.volumes.tolerated</name>
        <value>0</value>
</property>
<property>
        <name>dfs.replication</name>
        <value>3</value>
</property>
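The raw byte values above are easier to audit when converted to human-readable sizes; these are just the numbers from the file, restated:

```shell
# Convert the hdfs-site.xml byte values above to human-readable units.
echo "dfs.block.size:              $((134217728 / 1024 / 1024)) MB"
echo "dfs.datanode.du.reserved:    $((4294967296 / 1024 / 1024 / 1024)) GB"
echo "dfs.balance.bandwidthPerSec: $((1048576 / 1024 / 1024)) MB/s"
```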



Contents of slaves file

cat slaves
d1n
d2n
d3n
d4n


## The contents of slaves and dfs.hosts.include are purposely kept in different formats. The entries in the include file must be in FQDN format, as that is how the Hadoop DataNode daemons register themselves with the NN.
The slaves file, however, is used over ssh to start the Hadoop daemons via start-dfs.sh (start-all.sh).

Contents of dfs.hosts.include

 cat dfs.hosts.include
d1node.cluster.com
d2node.cluster.com
d3node.cluster.com
d4node.cluster.com


Contents of Masters file
(Remember: on the SNN, the masters file should contain the NN - for failover)
[hduser@namenode conf]$ cat masters
snn



[On Name Node - As hduser]

NameNode Format
hadoop namenode -format

18/02/20 10:19:44 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/HDPV1/1/dfs/nn/current/edits
18/02/20 10:19:44 INFO common.Storage: Storage directory /opt/HDPV1/1/dfs/nn has been successfully formatted.
18/02/20 10:19:44 INFO common.Storage: Image file /opt/HDPV1/2/dfs/nn/current/fsimage of size 112 bytes saved in 0 seconds.
18/02/20 10:19:44 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/HDPV1/2/dfs/nn/current/edits
18/02/20 10:19:44 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/HDPV1/2/dfs/nn/current/edits
18/02/20 10:19:44 INFO common.Storage: Storage directory /opt/HDPV1/2/dfs/nn has been successfully formatted.
18/02/20 10:19:44 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode.cluster.com/192.168.10.51
************************************************************/

[As root]
Copy the Configuration to all the nodes
# for i in $(cat /tmp/hosts) ;do scp hadoop-env.sh dfs.hosts.include core-site.xml hdfs-site.xml slaves masters ${i}:/etc/hadoop/conf/ ; done

[As root - Give Permissions]
# for i in $(cat /tmp/hosts) ;do ssh ${i} chmod -R 755 /etc/hadoop ; done;

[As hduser - On NameNode]
start-dfs.sh

starting namenode, logging to /opt/HDPV1/logs/hadoop-hduser-namenode-namenode.cluster.com.out
d1n: starting datanode, logging to /opt/HDPV1/logs/hadoop-hduser-datanode-d1node.cluster.com.out
d3n: starting datanode, logging to /opt/HDPV1/logs/hadoop-hduser-datanode-d3node.cluster.com.out
d4n: starting datanode, logging to /opt/HDPV1/logs/hadoop-hduser-datanode-d4node.cluster.com.out
d2n: starting datanode, logging to /opt/HDPV1/logs/hadoop-hduser-datanode-d2node.cluster.com.out
snn: starting secondarynamenode, logging to /opt/HDPV1/logs/hadoop-hduser-secondarynamenode-snamenode.cluster.com.out



Verify Java processes (Hadoop Processes)
[As hduser - On NameNode]


for i in $(cat /tmp/hosts) ; do ssh ${i} 'hostname; jps | grep -vi jps; echo' ;  done;
namenode.cluster.com
28557 NameNode

snamenode.cluster.com
13643 SecondaryNameNode

d1node.cluster.com
4476 DataNode

d2node.cluster.com
7285 DataNode

d3node.cluster.com
1928 DataNode

d4node.cluster.com
17210 DataNode


At this point the Hadoop cluster is up and running with 1 NN, 1 SNN and 4 DNs.

 hadoop dfsadmin -report
Configured Capacity: 133660540928 (124.48 GB)
Present Capacity: 133660540928 (124.48 GB)
DFS Remaining: 133660311552 (124.48 GB)
DFS Used: 229376 (224 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)

Name: 192.168.10.54:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 57344 (56 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415077888(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Thu Feb 22 04:03:50 CET 2018


Name: 192.168.10.57:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 57344 (56 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415077888(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Thu Feb 22 04:03:49 CET 2018


Name: 192.168.10.55:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 57344 (56 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415077888(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Thu Feb 22 04:03:50 CET 2018


Name: 192.168.10.58:50010
Decommission Status : Normal
Configured Capacity: 33415135232 (31.12 GB)
DFS Used: 57344 (56 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 33415077888(31.12 GB)
DFS Used%: 0%
DFS Remaining%: 100%
Last contact: Thu Feb 22 04:03:50 CET 2018

 
# Finally, set permissions 777 on /tmp in HDFS
[hduser@namenode ~]$ hadoop fs -chmod 777 /tmp

 

Hadoop V1 Install - Hadoop Software Setup and Environment Configuration

This is a continuation of my last blog, Hadoop V1 Pre-req.

Step 1
[As root -  Namenode - Send Hadoop Binaries]
# for i in $(cat hosts) ; do echo "scp hadoop-1.2.1.tar.gz ${i}:/tmp &" >> /tmp/sendhdpv1.bash ; done
bash /tmp/sendhdpv1.bash


Step 2
[As root - Extract Hadoop]
#for i in $(cat hosts) ;do ssh ${i} tar -xzf /tmp/hadoop-1.2.1.tar.gz -C /usr/local; done

Step 3 

[As root - Setup sudoers configuration]
#for i in $(cat hosts) ; do ssh ${i}  echo '"hduser        ALL=(ALL)       NOPASSWD: ALL" >> /etc/sudoers'; done
Step 4
[As root - Create Conf Directory]
#for i in $(cat hosts) ;do ssh ${i} mkdir /etc/hadoop; done



Step 5 - All Other Configurations
[As root - Move the conf directory]
# for i in $(cat hosts) ;do ssh ${i} mv /usr/local/hadoop-1.2.1/conf /etc/hadoop/conf; done

[As root - Give Permissions]
# for i in $(cat hosts) ;do ssh ${i} chmod -R 755 /etc/hadoop ; done;


# for i in $(cat /tmp/hosts) ; do ssh ${i} 'hostname; mkdir -p /opt/HDPV1/logs ; echo' ;  done;
# for i in $(cat /tmp/hosts) ; do ssh ${i} 'hostname; chmod 777 /opt/HDPV1/logs ; echo' ;  done;

# for i in $(cat /tmp/hosts) ; do ssh ${i} 'hostname; mkdir -p /opt/HDPV1/pids ; echo' ;  done;
# for i in $(cat /tmp/hosts) ; do ssh ${i} 'hostname; chmod 777 /opt/HDPV1/pids ; echo' ;  done;


[As root - Create Soft Link to Hadoop]
# for i in $(cat hosts) ;do ssh ${i} ln -s /usr/local/hadoop-1.2.1 /usr/local/hadoop  ; done


[As root - Create Soft Link]
# for i in $(cat hosts) ;do ssh ${i} ln -s /etc/hadoop/conf /usr/local/hadoop-1.2.1/conf ; done


[As hduser and mapred - Set Environment Variables (repeat with hduser changed to mapred)]

# for i in $(cat hosts) ; do ssh ${i} echo 'export HADOOP_PREFIX=/usr/local/hadoop >> /home/hduser/.bashrc' ; done
#for i in $(cat hosts) ; do ssh ${i} echo 'export JAVA_HOME=/usr/java/latest >> /home/hduser/.bashrc' ; done

#for i in $(cat hosts) ; do ssh ${i} echo 'export LOG=/opt/HDPV1/logs >> /home/hduser/.bashrc' ; done

#for i in $(cat hosts) ; do ssh ${i} echo 'export CONF=/etc/hadoop/conf >> /home/hduser/.bashrc' ; done


#for i in $(cat hosts) ; do ssh ${i} echo 'PATH=\$JAVA_HOME/bin:\$HADOOP_PREFIX/bin:\$HADOOP_PREFIX/sbin:\$PATH >> /home/hduser/.bashrc' ; done
#for i in $(cat hosts) ; do ssh ${i} echo 'export PATH >> /home/hduser/.bashrc' ; done
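If the loops above run correctly, each user's ~/.bashrc should end with lines equivalent to this fragment (shown for review; the mapred copy is identical apart from the home directory):

```shell
# Expected tail of /home/hduser/.bashrc after the loops above.
export HADOOP_PREFIX=/usr/local/hadoop
export JAVA_HOME=/usr/java/latest
export LOG=/opt/HDPV1/logs
export CONF=/etc/hadoop/conf
PATH=$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$PATH
export PATH
```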