In this blog I discuss the setup of HttpFS in Hadoop.
In Hadoop, HttpFS:
1. Acts as a proxy server catering to REST requests
2. Acts as a single point of contact for all clients; clients do not need connectivity to the DataNodes, as they do in the case of WebHDFS (illustrated below)
3. Can work with a multi-NameNode cluster, unlike WebHDFS
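The REST API itself is the same as WebHDFS; only the endpoint changes. A quick illustration (the host names and the WebHDFS port are assumptions for this cluster; HttpFS listens on 14000 by default):
# WebHDFS - the client talks to the NameNode and is then redirected to DataNodes
curl 'http://nn:50070/webhdfs/v1/tmp?op=LISTSTATUS&user.name=hdfs'
# HttpFS - the client only ever talks to the proxy node
curl 'http://192.168.1.71:14000/webhdfs/v1/tmp?op=LISTSTATUS&user.name=hdfs'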
(All the steps are run on the server that will act as the proxy/edge node, as user root, except where mentioned.)
1. Create User httpfs
groupadd -g 1000 hadoop
useradd -u 1010 -g hadoop httpfs
2. Setup Java and Hadoop
rpm -Uvh /tmp/jdk-8u152-linux-x64.rpm
scp -r root@nn:/usr/local/hadoop-2.7.5 /usr/local/
rm -rf /usr/local/hadoop-2.7.5/etc/hadoop
mkdir -p /etc/hadoop
scp -r nn:/etc/hadoop/conf /etc/hadoop
chmod -R 755 /etc/hadoop/conf
Create Soft Links
ln -s /usr/local/hadoop-2.7.5 /usr/local/hadoop
ln -s /etc/hadoop/conf /usr/local/hadoop-2.7.5/etc/hadoop
3. Setup Profile
scp root@nn:/tmp/profile.sh /etc/profile.d
source /etc/profile.d/profile.sh
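profile.sh comes straight from the NameNode; as a rough sketch, it is assumed to set the environment along these lines (exact paths may differ on your cluster):
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export CONF=/etc/hadoop/conf   #Used in step 7 below
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin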
4. Setup Sudo (add the entry below for the httpfs user)
httpfs ALL=(ALL) NOPASSWD: ALL
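One way to add this entry is a sudoers drop-in file (a sketch; the file name is arbitrary):
echo 'httpfs ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/httpfs
chmod 0440 /etc/sudoers.d/httpfs
visudo -cf /etc/sudoers.d/httpfs   #Validates the syntax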
5. Create Directories
mkdir -p /opt/HDPV2/logs /opt/HDPV2/pids /opt/HDPV2/1 /opt/HDPV2/2 /opt/HDPV2/tmp /opt/HDPV2/temp
chown -R httpfs:hadoop /opt/HDPV2/logs /opt/HDPV2/pids /opt/HDPV2/1 /opt/HDPV2/2 /opt/HDPV2/tmp /opt/HDPV2/temp
chmod -R 755 /opt/HDPV2
chmod 0755 /usr/local/hadoop/share/hadoop/httpfs/tomcat/conf/*
6. Edit core-site.xml [as hdfs - on NN, SNN and httpfs server]
Add the proxyuser entries below so that the httpfs user can impersonate end users:
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>192.168.1.71</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
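To confirm the values are in place on each node (this only reads the local configuration files):
hdfs getconf -confKey hadoop.proxyuser.httpfs.hosts
hdfs getconf -confKey hadoop.proxyuser.httpfs.groups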
Stop and start the NameNode and SecondaryNameNode so the proxyuser settings take effect
hadoop-daemon.sh stop namenode
hadoop-daemon.sh start namenode
hadoop-daemon.sh stop secondarynamenode
hadoop-daemon.sh start secondarynamenode
7. Edit httpfs-env.sh [as httpfs on httpfs server]
cd $CONF
sudo vi httpfs-env.sh
Add the lines below:
export HTTPFS_LOG=/opt/HDPV2/logs #Custom
export HTTPFS_TEMP=/opt/HDPV2/temp #Custom
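Other settings can be overridden in the same file if required, for example the HTTP port (14000 is already the default; shown only as an illustration):
export HTTPFS_HTTP_PORT=14000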
8. Start httpfs [as httpfs on httpfs server]
httpfs.sh start
Test, and your HttpFS should be ready:
curl -sS 'http://192.168.1.71:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs'
{"Path":"\/user\/hdfs"}
You can use the same API as with WebHDFS, except that now you are going through the proxy host.
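A couple more calls through the proxy (the paths are just examples):
curl -sS 'http://192.168.1.71:14000/webhdfs/v1/user/hdfs?op=LISTSTATUS&user.name=hdfs'
curl -sS -X PUT 'http://192.168.1.71:14000/webhdfs/v1/tmp/httpfs_test?op=MKDIRS&user.name=hdfs'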