Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. Sqoop Helps in efficiently transferring bulk data between Hadoop and the Oracle database.
Prerequisites :
1.Oracle Database 10g express edition should be installed.
2.Oracle connector(ojdbc6_g.jar) ,The jar file can be downloaded from http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html
To import data from oracle database to HDFS through Sqoop follow these steps :
Step 1: Create a table in the oracle database and Load data into the table .
For Example :
create table monthlypassengers (Activity_Period varchar(50),Operating_Airline varchar(50),Operating_Airline_IATA_Code varchar(50),Published_Airline varchar(50),Published_Airline_IATA_Code varchar(50),GEO_Summary varchar(50),GEO_Region varchar(50),Activity_Type_Code varchar(50),Cargo_Type_Code varchar(50),Cargo_Aircraft_Type varchar(50),Cargo_Weight_LBS varchar(50),Cargo_Metric_TONS varchar(50));
The data set used in the example is:
Activity Period | Operating Airline | Operating Airline IATA Code | Published Airline | Published Airline IATA Code | GEO Summary | GEO Region | Activity Type Code | Cargo Type Code | Cargo Aircraft Type | Cargo Weight LBS | Cargo Metric TONS |
200507 | ABX Air | GB | ABX Air | GB | Domestic | US | Deplaned | Cargo | Freighter | 45423 | 20.60387 |
200507 | ABX Air | GB | ABX Air | GB | Domestic | US | Enplaned | Cargo | Freighter | 106869 | 48.47578 |
200507 | ATA Airlines | TZ | ATA Airlines | TZ | Domestic | US | Deplaned | Cargo | Passenger | 55427 | 25.14169 |
200507 | ATA Airlines | TZ | ATA Airlines | TZ | Domestic | US | Deplaned | Passenger | 50278 | 22.8061 | |
200507 | ATA Airlines | TZ | ATA Airlines | TZ | Domestic | US | Enplaned | Cargo | Passenger | 74183 | 33.64941 |
Step 2 : Import the data of the table emp present in Oracle database to HDFS. Note that the oracle connector shoud be present in the sqoop directory and the command should be executed from the sqoop library.
Syntax :
/usr/bin/sqoop import --connect jdbc:oracle:thin:system/system@<IP address>:1521:xe --username <username> -P--table <database name>.<table name> --columns "<column names>" --target-dir <target directory path> -m 1
Example :
[cloudera@localhost sqoop]$ /usr/bin/sqoop import --connect jdbc:oracle:thin:system/system@192.168.197.1:1521:xe --username system -P --table system.monthlypassengers --columns "Activity_Period,Operating_Airline,Operating_Airline_IATA_Code,Published_Airline,Published_Airline_IATA_Code,GEO_Summary,GEO_Region,Activity_Type_Code,Cargo_Type_Code,Cargo_Aircraft_Type,Cargo_Weight_LBS,Cargo_Metric_TONS"--target-dir /user/cloudera/sqoop_out -m 1