Monday, April 23, 2018

Hadoop V2 - Sqoop Import from Oracle Database 12c

In this blog I will demonstrate how to import data from an Oracle database into HDFS using Sqoop.

If you have followed my last blog, your Sqoop installation is ready.


Step 1 - Download Jar
Download ojdbc8.jar from the link below (this is for release 12c, so use the driver jar that matches your database release):

http://www.oracle.com/technetwork/database/features/jdbc/jdbc-ucp-122-3110062.html

Step 2 - Copy Jar to lib
[root@oem13cr2 conf]# cd /usr/local/sqoop/lib
[root@oem13cr2 lib]# cp /tmp/ojdbc8.jar .

Provide read permission to others if required.
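For example, a minimal way to grant world read permission (assuming the jar location from Step 2):

[root@oem13cr2 lib]# chmod o+r /usr/local/sqoop/lib/ojdbc8.jar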

Step 3 - Create directory for import [As hdfs on nn]
hdfs dfs -mkdir /sqoop
hdfs dfs -chown sqoop:admingroup /sqoop
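If you are logged in as root rather than hdfs, the same two commands can be run through sudo (a sketch, assuming sudo access to the hdfs account):

sudo -u hdfs hdfs dfs -mkdir /sqoop
sudo -u hdfs hdfs dfs -chown sqoop:admingroup /sqoop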


Step 4 - Import

sqoop-import --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--table SCOTT.EMP \
--fields-terminated-by '\t' --lines-terminated-by '\n' \
--username SCOTT --password TIGER \
--target-dir /sqoop/scott       


18/04/20 05:37:51 INFO mapreduce.Job:  map 75% reduce 0%
18/04/20 05:37:56 INFO mapreduce.Job:  map 100% reduce 0%
18/04/20 05:37:57 INFO mapreduce.Job: Job job_1524137551614_0005 completed successfully
18/04/20 05:37:57 INFO mapreduce.Job: Counters: 31

        File Output Format Counters
                Bytes Written=817
18/04/20 05:37:57 INFO mapreduce.ImportJobBase: Transferred 817 bytes in 50.9334 seconds (16.0406 bytes/sec)
18/04/20 05:37:57 INFO mapreduce.ImportJobBase: Retrieved 14 records.

 hdfs dfs -ls /sqoop/scott/
Found 5 items
-rw-r--r--   3 sqoop admingroup          0 2018-04-20 05:37 /sqoop/scott/_SUCCESS
-rw-r--r--   3 sqoop admingroup        115 2018-04-20 05:37 /sqoop/scott/part-m-00000
-rw-r--r--   3 sqoop admingroup        117 2018-04-20 05:37 /sqoop/scott/part-m-00001
-rw-r--r--   3 sqoop admingroup        238 2018-04-20 05:37 /sqoop/scott/part-m-00002
-rw-r--r--   3 sqoop admingroup        347 2018-04-20 05:37 /sqoop/scott/part-m-00003
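The four part-m-* files correspond to Sqoop's default of four parallel mappers (tunable with -m). To eyeball the imported rows, which are tab-separated as requested by --fields-terminated-by, something like:

hdfs dfs -cat /sqoop/scott/part-m-*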


Import All Tables Using Sqoop

sqoop import-all-tables --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--username SCOTT --password TIGER -m 1 -as-avrodatafile



hdfs dfs -ls /user/sqoop
Found 6 items
drwx------   - sqoop admingroup          0 2018-04-20 05:35 /user/sqoop/.Trash
drwx------   - sqoop admingroup          0 2018-04-20 06:01 /user/sqoop/.staging
drwxr-xr-x   - sqoop admingroup          0 2018-04-20 06:00 /user/sqoop/BONUS
drwxr-xr-x   - sqoop admingroup          0 2018-04-20 06:00 /user/sqoop/DEPT
drwxr-xr-x   - sqoop admingroup          0 2018-04-20 06:00 /user/sqoop/EMP
drwxr-xr-x   - sqoop admingroup          0 2018-04-20 06:01 /user/sqoop/SALGRADE
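A couple of notes: import-all-tables writes under the invoking user's HDFS home directory (/user/sqoop here) by default, and -m 1 forces a single mapper, which avoids the need for a split column on tables without a primary key. If you want to skip some tables, Sqoop provides --exclude-tables; a sketch (the table list is illustrative):

sqoop import-all-tables --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--username SCOTT --password TIGER -m 1 \
--exclude-tables BONUS,SALGRADE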



How does it all happen
1. Sqoop connects to the RDBMS and fetches the table's metadata.
2. It generates a Java class, compiles it, connects to the Hadoop cluster (Resource Manager) and submits an import job.
3. The MapReduce job does the actual work, and the data lands in HDFS directories.

Sqoop itself only monitors the job and oversees its completion.
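You can inspect the generated class yourself with the sqoop codegen tool, which runs only the code-generation step (a sketch, reusing the connection details above):

sqoop codegen --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--username SCOTT --password TIGER --table SCOTT.EMP

The generated .java source (its location is controlled by --outdir) is the ORM class each mapper uses to serialize rows.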

Import into a specific data file format
Avro: -as-avrodatafile
sqoop  import-all-tables -Dmapreduce.job.user.classpath.first=true --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--username SCOTT --password TIGER -m 1 -as-avrodatafile


The data is now imported in Avro format:

[hdfs@nn ~]$ hdfs dfs -ls /user/sqoop/EMP
Found 2 items
-rw-r--r--   3 sqoop admingroup          0 2018-04-20 09:29 /user/sqoop/EMP/_SUCCESS
-rw-r--r--   3 sqoop admingroup       1525 2018-04-20 09:29 /user/sqoop/EMP/part-m-00000.avro

Sequence: -as-sequencefile
Text: -as-textfile (the default; see the example below)
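For example, the single-table import from Step 4 could be rewritten to produce a SequenceFile (a sketch; the target directory here is hypothetical):

sqoop-import --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--table SCOTT.EMP --username SCOTT --password TIGER \
--target-dir /sqoop/scott_seq -as-sequencefile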


Import Using Options file
All options can be placed in an options file and re-used, which makes the commands easier to manage.

Create an options file (here /tmp/sqoop_import.txt) with each option and its value on its own line, as below; lines beginning with # are treated as comments.
--connect
jdbc:oracle:thin:@192.168.1.71:6633:EMPRD
--username
SCOTT
--password
TIGER
-m
1
-as-avrodatafile


Import using Options file

sqoop  import-all-tables -Dmapreduce.job.user.classpath.first=true \
--options-file /tmp/sqoop_import.txt
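One caveat: both the command line and the options file above hold the password in plain text. Sqoop also supports --password-file, which reads the password from a file instead (ideally stored on HDFS with permissions 400); a sketch, where the path is hypothetical:

sqoop import-all-tables --connect jdbc:oracle:thin:@192.168.1.71:6633:EMPRD \
--username SCOTT --password-file /user/sqoop/.oracle.password \
-m 1 -as-avrodatafile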
