In this blog post I will demonstrate how to import data from Oracle into HDFS using Sqoop.
If you have followed my last blog post, you already have your Sqoop installation ready.
Step 1 - Download Jar
Download ojdbc8.jar from the link below (this one is for release 12c, so use the driver JAR that matches your database release)
http://www.oracle.com/technetwork/database/features/jdbc/jdbc-ucp-122-3110062.html
Step 2 - Copy Jar to lib
[root@oem13cr2 conf]# cd /usr/local/sqoop/lib
[root@oem13cr2 lib]# cp /tmp/ojdbc8.jar .
Provide read permissions to others if required
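For example, to make the JAR world-readable:
[root@oem13cr2 lib]# chmod 644 ojdbc8.jar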
Step 3 - Create directory for import [As hdfs on nn]
hdfs dfs -mkdir /sqoop
hdfs dfs -chown sqoop:admingroup /sqoop
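You can verify the directory and its ownership before running the import:
hdfs dfs -ls / | grep sqoop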
Step 4 - Import
sqoop-import --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--table SCOTT.EMP \
--fields-terminated-by '\t' --lines-terminated-by '\n' \
--username SCOTT --password TIGER \
--target-dir /sqoop/scott
18/04/20 05:37:51 INFO mapreduce.Job: map 75% reduce 0%
18/04/20 05:37:56 INFO mapreduce.Job: map 100% reduce 0%
18/04/20 05:37:57 INFO mapreduce.Job: Job job_1524137551614_0005 completed successfully
18/04/20 05:37:57 INFO mapreduce.Job: Counters: 31
File Output Format Counters
Bytes Written=817
18/04/20 05:37:57 INFO mapreduce.ImportJobBase: Transferred 817 bytes in 50.9334 seconds (16.0406 bytes/sec)
18/04/20 05:37:57 INFO mapreduce.ImportJobBase: Retrieved 14 records.
hdfs dfs -ls /sqoop/scott/
Found 5 items
-rw-r--r-- 3 sqoop admingroup 0 2018-04-20 05:37 /sqoop/scott/_SUCCESS
-rw-r--r-- 3 sqoop admingroup 115 2018-04-20 05:37 /sqoop/scott/part-m-00000
-rw-r--r-- 3 sqoop admingroup 117 2018-04-20 05:37 /sqoop/scott/part-m-00001
-rw-r--r-- 3 sqoop admingroup 238 2018-04-20 05:37 /sqoop/scott/part-m-00002
-rw-r--r-- 3 sqoop admingroup 347 2018-04-20 05:37 /sqoop/scott/part-m-00003
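Since the import used tab-delimited text, you can inspect the rows in any part file directly (each part file holds the EMP rows fetched by one mapper):
hdfs dfs -cat /sqoop/scott/part-m-00003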
Import All Tables Using Sqoop
Here -m 1 runs each table's import with a single mapper, so Sqoop does not need a primary key or a split-by column to parallelize the work.
sqoop import-all-tables --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--username SCOTT --password TIGER -m 1 -as-avrodatafile
hdfs dfs -ls /user/sqoop
Found 6 items
drwx------ - sqoop admingroup 0 2018-04-20 05:35 /user/sqoop/.Trash
drwx------ - sqoop admingroup 0 2018-04-20 06:01 /user/sqoop/.staging
drwxr-xr-x - sqoop admingroup 0 2018-04-20 06:00 /user/sqoop/BONUS
drwxr-xr-x - sqoop admingroup 0 2018-04-20 06:00 /user/sqoop/DEPT
drwxr-xr-x - sqoop admingroup 0 2018-04-20 06:00 /user/sqoop/EMP
drwxr-xr-x - sqoop admingroup 0 2018-04-20 06:01 /user/sqoop/SALGRADE
How does it all happen?
1. Sqoop connects to the RDBMS and fetches the metadata of the table.
2. It generates a Java class, compiles it, connects to the Hadoop cluster (Resource Manager), and submits an import job.
3. The MapReduce job does the actual work, and the data is imported into HDFS directories.
Sqoop itself only monitors the job and oversees its completion.
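If you want to see the generated class for yourself, Sqoop ships a codegen tool. A minimal example against the same EMP table (the exact output path and class name will vary on your system):
sqoop codegen --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--username SCOTT --password TIGER \
--table SCOTT.EMP
The tool's log output shows where the generated .java source and compiled .jar land, typically under /tmp/sqoop-<user>/compile/ on the machine running Sqoop.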
Import into a specific data file type
Avro: -as-avrodatafile
The -Dmapreduce.job.user.classpath.first=true property in the command below makes the job prefer Sqoop's bundled Avro JARs over the older Avro version that ships with Hadoop, which otherwise tends to break Avro imports.
sqoop import-all-tables -Dmapreduce.job.user.classpath.first=true --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--username SCOTT --password TIGER -m 1 -as-avrodatafile
The data is now imported in Avro format:
[hdfs@nn ~]$ hdfs dfs -ls /user/sqoop/EMP
Found 2 items
-rw-r--r-- 3 sqoop admingroup 0 2018-04-20 09:29 /user/sqoop/EMP/_SUCCESS
-rw-r--r-- 3 sqoop admingroup 1525 2018-04-20 09:29 /user/sqoop/EMP/part-m-00000.avro
Sequence: -as-sequencefile
Text: -as-textfile
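To peek inside an Avro data file you can dump its records as JSON with avro-tools (the JAR path and version here are assumptions; use whatever matches your environment):
hdfs dfs -get /user/sqoop/EMP/part-m-00000.avro /tmp/
java -jar /tmp/avro-tools-1.8.1.jar tojson /tmp/part-m-00000.avro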
Import Using an Options File
All the options can be placed in an options file and re-used, which makes the commands easier to manage.
Create an options file (here /tmp/sqoop_import.txt) with the following content:
--connect
jdbc:oracle:thin:@192.168.1.71:6633:EMPRD
--username
SCOTT
--password
TIGER
-m
1
-as-avrodatafile
Import using the options file:
sqoop import-all-tables -Dmapreduce.job.user.classpath.first=true \
--options-file /tmp/sqoop_import.txt
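Note that the options file holds the database password in plain text, so restrict its permissions:
chmod 600 /tmp/sqoop_import.txt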