Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. Sqoop Helps in efficiently transferring bulk data between Hadoop and the Oracle database.

Prerequisites : 

1.Oracle Database 10g express edition should be installed.

2.Oracle connector(ojdbc6_g.jar) ,The jar file can be downloaded from http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html

To import data from oracle database to HDFS through Sqoop follow these steps :

Step 1: Create a table in the oracle database and Load data into the table .

For Example :

create table monthlypassengers (Activity_Period varchar(50),Operating_Airline varchar(50),Operating_Airline_IATA_Code varchar(50),Published_Airline varchar(50),Published_Airline_IATA_Code varchar(50),GEO_Summary varchar(50),GEO_Region varchar(50),Activity_Type_Code varchar(50),Cargo_Type_Code varchar(50),Cargo_Aircraft_Type varchar(50),Cargo_Weight_LBS varchar(50),Cargo_Metric_TONS varchar(50));

The data set used in the example is:

Activity

 Period    

Operating    AirlineOperating Airline IATA CodePublished AirlinePublished Airline IATA CodeGEO SummaryGEO RegionActivity Type CodeCargo Type CodeCargo Aircraft TypeCargo Weight LBS

Cargo Metric TONS

200507

ABX AirGBABX AirGBDomesticUSDeplanedCargoFreighter4542320.60387

200507

ABX AirGBABX AirGBDomesticUSEnplanedCargoFreighter10686948.47578

200507

ATA AirlinesTZATA AirlinesTZDomesticUSDeplanedCargoPassenger55427

25.14169

200507

ATA AirlinesTZATA AirlinesTZDomesticUSDeplanedMailPassenger50278

22.8061

200507    ATA        Airlines      TZ     ATA      Airlines     TZ Domestic   USEnplanedCargoPassenger7418333.64941

 

Step 2 : Import the data of the table emp present in Oracle database to HDFS. Note that the oracle connector shoud be present in the sqoop directory and the command should be executed from the sqoop library.

Syntax :

/usr/bin/sqoop import --connect jdbc:oracle:thin:system/system@<IP address>:1521:xe  --username <username> -P--table <database name>.<table name> --columns "<column names>" --target-dir  <target directory path> -m 1

Example :

[cloudera@localhost sqoop]$ /usr/bin/sqoop import --connect jdbc:oracle:thin:system/system@192.168.197.1:1521:xe 
--username system -P --table system.monthlypassengers --columns "Activity_Period,Operating_Airline,Operating_Airline_IATA_Code,Published_Airline,Published_Airline_IATA_Code,GEO_Summary,GEO_Region,Activity_Type_Code,Cargo_Type_Code,Cargo_Aircraft_Type,Cargo_Weight_LBS,Cargo_Metric_TONS"--target-dir /user/cloudera/sqoop_out -m 1