Monday, April 16, 2018

Hadoop V2 - WebHdfs

In this post I discuss how to set up WebHDFS.


Set the property below in hdfs-site.xml on the NameNode:

<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
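
A quick way to confirm the property is set in a given copy of the file is to parse it. The snippet below is only a sketch: `get_property` is a hypothetical helper of mine, and the inline sample stands in for your real hdfs-site.xml (which wraps its properties in a `<configuration>` root element).

```python
import xml.etree.ElementTree as ET

# Inline stand-in for hdfs-site.xml; in practice you would read the real
# file from your Hadoop configuration directory instead.
SAMPLE = """<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

def get_property(xml_text, name):
    """Return the value of a Hadoop property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

webhdfs_enabled = get_property(SAMPLE, "dfs.webhdfs.enabled")
print(webhdfs_enabled)
```

Running the same check against the file on each node is one way to verify that the change was actually distributed.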

Distribute this file to all nodes.

The change requires a bounce (restart) of the NameNode and all DataNodes (all NameNodes in case of a Federated / Standby cluster).

[As hdfs on namenode]
stop-dfs.sh
start-dfs.sh


Now, in any browser of your choice, try to open a file using the format below:
 http://<namenode>:<port>/webhdfs/v1/<path_to_file>?op=OPEN&user.name=<username>

where <username> is a user that has permission to read the file. For example:
 

http://nn:50070/webhdfs/v1/data/conf/hosts?op=OPEN&user.name=hdfs 
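
If you build these URLs from a script, the format above can be assembled as in the sketch below. `webhdfs_url` is a hypothetical helper name of mine, not part of Hadoop; the host, port, path and user are the ones from the example above.

```python
from urllib.parse import quote, urlencode

def webhdfs_url(namenode, port, path, op, **params):
    """Build a WebHDFS REST URL in the format shown above (hypothetical helper)."""
    # Percent-encode the HDFS path but keep the slashes between components.
    encoded_path = quote(path.lstrip("/"), safe="/")
    # Put 'op' first, followed by any extra query parameters (e.g. user.name).
    query = urlencode([("op", op)] + list(params.items()))
    return f"http://{namenode}:{port}/webhdfs/v1/{encoded_path}?{query}"

# 'user.name' contains a dot, so it is passed through a dict rather than
# as a plain keyword argument.
url = webhdfs_url("nn", 50070, "/data/conf/hosts", "OPEN", **{"user.name": "hdfs"})
print(url)  # prints the same URL as the example above
```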

You will see that the request is redirected to one of the DataNodes that can serve your file.
 (This is why the DataNodes need the bounce as well.)
  


To do the same from the command line using curl:


[hdfs@nn tmp]$ curl -i -L "http://nn:50070/webhdfs/v1/data/conf/hosts?op=OPEN&user.name=hdfs"
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Mon, 16 Apr 2018 09:20:33 GMT
Date: Mon, 16 Apr 2018 09:20:33 GMT
Pragma: no-cache
Expires: Mon, 16 Apr 2018 09:20:33 GMT
Date: Mon, 16 Apr 2018 09:20:33 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1523906433975&s=gI+p66RzmNMV1f7DKQM1oZ4aEoE="; Path=/; Expires=Mon, 16-Apr-2018 19:20:33 GMT; HttpOnly
Location: http://d1.novalocal:50075/webhdfs/v1/data/conf/hosts?op=OPEN&user.name=hdfs&namenoderpcaddress=nn:8020&offset=0
Content-Length: 0
Server: Jetty(6.1.26)

HTTP/1.1 200 OK
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
Content-Type: application/octet-stream
Connection: close
Content-Length: 416

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.2.101 nn.novalocal      nn
192.168.2.102 rm.novalocal      rm
192.168.2.103 snn.novalocal     snn
192.168.2.104 d1.novalocal        d1n
192.168.2.105 d2.novalocal        d2n
192.168.2.106 d3.novalocal        d3n
192.168.2.107 d4.novalocal        d4n
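
The first response in the output above is the NameNode's 307 redirect, and its Location header tells you which DataNode served the file. As a rough sketch, the header can be picked apart like this; `describe_redirect` is a hypothetical helper of mine, and the Location value is copied from the output above.

```python
from urllib.parse import urlsplit, parse_qs

def describe_redirect(location):
    """Extract the DataNode and NameNode details from a WebHDFS Location header."""
    parts = urlsplit(location)
    query = parse_qs(parts.query)
    return {
        "datanode": parts.hostname,        # host that actually serves the data
        "port": parts.port,                # DataNode HTTP port
        "namenode_rpc": query["namenoderpcaddress"][0],
        "offset": int(query["offset"][0]), # byte offset the read starts from
    }

# Location header taken verbatim from the curl output above.
location = ("http://d1.novalocal:50075/webhdfs/v1/data/conf/hosts"
            "?op=OPEN&user.name=hdfs&namenoderpcaddress=nn:8020&offset=0")
info = describe_redirect(location)
print(info)
```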


Now, to list the status of a directory, use LISTSTATUS:


curl -i -L "http://nn:50070/webhdfs/v1/data/conf/?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Mon, 16 Apr 2018 09:23:47 GMT
Date: Mon, 16 Apr 2018 09:23:47 GMT
Pragma: no-cache
Expires: Mon, 16 Apr 2018 09:23:47 GMT
Date: Mon, 16 Apr 2018 09:23:47 GMT
Pragma: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26)

{"FileStatuses":{"FileStatus":[
{"accessTime":1523854423644,"blockSize":134217728,"childrenNum":0,"fileId":16420,"group":"admingroup","length":4436,"modificationTime":1523854423825,"owner":"hdfs","pathSuffix":"capacity-scheduler.xml","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1523854423837,"blockSize":134217728,"childrenNum":0,"fileId":16421,"group":"admingroup","length":1335,"modificationTime":1523854423863,"owner":"hdfs","pathSuffix":"configuration.xsl","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1523854423870,"blockSize":134217728,"childrenNum":0,"fileId":16422,"group":"admingroup","length":318,"modificationTime":1523854423890,"owner":"hdfs","pathSuffix":"container-executor.cfg","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
.
.
.
]}} 
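
The LISTSTATUS response is plain JSON, so it is easy to turn into a simple listing. The sketch below assumes you already have the response body as a string; the sample is trimmed to a few fields of the three entries shown above, and `list_entries` is a hypothetical helper name.

```python
import json

# Trimmed copy of the three entries shown above (only some fields kept).
sample = '''
{"FileStatuses":{"FileStatus":[
{"length":4436,"owner":"hdfs","pathSuffix":"capacity-scheduler.xml","permission":"644","type":"FILE"},
{"length":1335,"owner":"hdfs","pathSuffix":"configuration.xsl","permission":"644","type":"FILE"},
{"length":318,"owner":"hdfs","pathSuffix":"container-executor.cfg","permission":"644","type":"FILE"}
]}}
'''

def list_entries(body):
    """Return (name, type, length) tuples from a LISTSTATUS response body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]

entries = list_entries(sample)
for name, kind, length in entries:
    # One line per entry: type, size in bytes, file name.
    print(f"{kind:<6}{length:>8}  {name}")
```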



You can find the full list of WebHDFS operations here (pick the page that matches the version you use):

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
