In this blog post I discuss how to set up WebHDFS.
Set the property below in hdfs-site.xml on the NameNode:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
Distribute this file to all nodes.
This requires a bounce (restart) of all DataNodes and the NameNode (all NameNodes in the case of a federated or standby cluster).
[As the hdfs user on the NameNode]
stop-dfs.sh
start-dfs.sh
Now, in any browser of your choice, try to open a file using the following format:
http://<namenode>:<port>/webhdfs/v1/<path_to_file>?op=OPEN&user.name=<username>
where <username> is a user with permission to read the file. For example:
http://nn:50070/webhdfs/v1/data/conf/hosts?op=OPEN&user.name=hdfs
You will see the request redirected to one of the DataNodes that can serve your file (this is why all DataNodes needed a bounce).
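The same OPEN URL can be built programmatically. A minimal Python sketch, using the host, port, and path from the example above (the helper name `webhdfs_url` is my own, not part of any Hadoop library):

```python
from urllib.parse import quote, urlencode

def webhdfs_url(host, port, path, op, user, **params):
    """Build a WebHDFS v1 URL for the given operation.

    `path` is the absolute HDFS path (leading slash required).
    """
    query = urlencode({"op": op, "user.name": user, **params})
    # Percent-encode the path but keep the slashes as separators.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"

url = webhdfs_url("nn", 50070, "/data/conf/hosts", "OPEN", "hdfs")
print(url)
# http://nn:50070/webhdfs/v1/data/conf/hosts?op=OPEN&user.name=hdfs
```

Quoting the path matters once file names contain spaces or other special characters; extra keyword arguments (e.g. `offset`, `length` for OPEN) are appended to the query string.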
To do the same from the CLI using curl:
[hdfs@nn tmp]$ curl -i -L "http://nn:50070/webhdfs/v1/data/conf/hosts?op=OPEN&user.name=hdfs"
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Mon, 16 Apr 2018 09:20:33 GMT
Date: Mon, 16 Apr 2018 09:20:33 GMT
Pragma: no-cache
Expires: Mon, 16 Apr 2018 09:20:33 GMT
Date: Mon, 16 Apr 2018 09:20:33 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1523906433975&s=gI+p66RzmNMV1f7DKQM1oZ4aEoE="; Path=/; Expires=Mon, 16-Apr-2018 19:20:33 GMT; HttpOnly
Location: http://d1.novalocal:50075/webhdfs/v1/data/conf/hosts?op=OPEN&user.name=hdfs&namenoderpcaddress=nn:8020&offset=0
Content-Length: 0
Server: Jetty(6.1.26)
HTTP/1.1 200 OK
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
Content-Type: application/octet-stream
Connection: close
Content-Length: 416
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.2.101 nn.novalocal nn
192.168.2.102 rm.novalocal rm
192.168.2.103 snn.novalocal snn
192.168.2.104 d1.novalocal d1n
192.168.2.105 d2.novalocal d2n
192.168.2.106 d3.novalocal d3n
192.168.2.107 d4.novalocal d4n
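The two-step exchange shown above (a 307 from the NameNode, then the data from a DataNode) is handled transparently by most HTTP clients. A minimal Python sketch of the same read; the commented call uses the hypothetical hostnames from the examples above and would only work against a live cluster:

```python
from urllib.request import urlopen

def read_webhdfs_file(url):
    """Read a file via a WebHDFS OPEN URL.

    urlopen follows the NameNode's 307 TEMPORARY_REDIRECT to the
    DataNode automatically, just like `curl -L` does.
    """
    with urlopen(url) as resp:
        return resp.read().decode()

# Against the example cluster (hypothetical hostnames):
# print(read_webhdfs_file(
#     "http://nn:50070/webhdfs/v1/data/conf/hosts?op=OPEN&user.name=hdfs"))
```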
Now, to list a directory's status, use:
curl -i -L "http://nn:50070/webhdfs/v1/data/conf/?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Mon, 16 Apr 2018 09:23:47 GMT
Date: Mon, 16 Apr 2018 09:23:47 GMT
Pragma: no-cache
Expires: Mon, 16 Apr 2018 09:23:47 GMT
Date: Mon, 16 Apr 2018 09:23:47 GMT
Pragma: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26)
{"FileStatuses":{"FileStatus":[
{"accessTime":1523854423644,"blockSize":134217728,"childrenNum":0,"fileId":16420,"group":"admingroup","length":4436,"modificationTime":1523854423825,"owner":"hdfs","pathSuffix":"capacity-scheduler.xml","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1523854423837,"blockSize":134217728,"childrenNum":0,"fileId":16421,"group":"admingroup","length":1335,"modificationTime":1523854423863,"owner":"hdfs","pathSuffix":"configuration.xsl","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1523854423870,"blockSize":134217728,"childrenNum":0,"fileId":16422,"group":"admingroup","length":318,"modificationTime":1523854423890,"owner":"hdfs","pathSuffix":"container-executor.cfg","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
.
.
.
]}}
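The LISTSTATUS response is plain JSON, so it is easy to consume from scripts. A small sketch that pulls the entries out of a response like the one above (the sample payload is abbreviated from the output shown):

```python
import json

# Abbreviated sample of the LISTSTATUS payload shown above.
payload = """
{"FileStatuses":{"FileStatus":[
 {"pathSuffix":"capacity-scheduler.xml","type":"FILE","length":4436,
  "owner":"hdfs","group":"admingroup","permission":"644","replication":3},
 {"pathSuffix":"configuration.xsl","type":"FILE","length":1335,
  "owner":"hdfs","group":"admingroup","permission":"644","replication":3}
]}}
"""

statuses = json.loads(payload)["FileStatuses"]["FileStatus"]
for s in statuses:
    # Roughly mimic `hdfs dfs -ls` output.
    print(f'{s["permission"]} {s["owner"]}:{s["group"]} '
          f'{s["length"]:>6} {s["pathSuffix"]}')
```

In a real script you would fetch the payload from the LISTSTATUS URL instead of embedding it; note that, unlike OPEN, LISTSTATUS is answered by the NameNode directly with no redirect.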
You can find the full list of operations in the WebHDFS REST API documentation (use the page matching your Hadoop version):
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/WebHDFS.html