HBase tables can be integrated with HIVE so that they can be queried from IMPALA. IMPALA is designed for low-latency queries and uses familiar SQL syntax. In this post we load transactional data into an HBase table that is integrated with HIVE, using the ImportTsv method, and then query the corresponding HIVE table from IMPALA.
The transactional data set has the following fields:
Step 1: Create a table in HIVE and map it to HBase using the org.apache.hadoop.hive.hbase.HBaseStorageHandler storage handler class. Note that the HIVE table is deliberately given a different name from the HBase table, so the two are easy to tell apart. Here ‘transactions’ is the HIVE table and it is mapped to the HBase table ‘transactions_hbase’.
CREATE TABLE transactions(
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
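For reference, a complete mapping statement could be sketched as follows. Only the two table names and the storage handler class come from this post; the column names (txn_id, txn_date, cust_id, amount) and the column family txn are hypothetical placeholders, not the real fields of the data set. The first HIVE column is mapped to the HBase row key through the special :key entry.

```shell
# Write the HiveQL to a file. The columns and the 'txn' column family are
# hypothetical placeholders, not the real fields of the data set.
cat > create_transactions.hql <<'EOF'
CREATE TABLE transactions(
  txn_id   STRING,
  txn_date STRING,
  cust_id  STRING,
  amount   DOUBLE
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,txn:txn_date,txn:cust_id,txn:amount"
)
TBLPROPERTIES ("hbase.table.name" = "transactions_hbase");
EOF

# Run the statement only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f create_transactions.hql
else
  echo "hive CLI not found; wrote create_transactions.hql only"
fi
```

The hbase.columns.mapping list must have one entry per HIVE column, in the same order as the column definitions.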
Step 2: Verify from the HIVE and HBase shells that the table was created in each system.
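One way to run this check non-interactively is sketched below. It assumes the hive and hbase command-line tools are on the PATH of the machine you are working on; the commands are skipped otherwise.

```shell
HIVE_TABLE=transactions
HBASE_TABLE=transactions_hbase

# HIVE side: confirm the table is listed and show its storage handler.
if command -v hive >/dev/null 2>&1; then
  hive -e "SHOW TABLES LIKE '${HIVE_TABLE}'; DESCRIBE FORMATTED ${HIVE_TABLE};"
else
  echo "hive CLI not found; skipping HIVE check"
fi

# HBase side: feed 'exists' and 'describe' to the HBase shell on stdin.
if command -v hbase >/dev/null 2>&1; then
  printf "exists '%s'\ndescribe '%s'\n" "$HBASE_TABLE" "$HBASE_TABLE" | hbase shell
else
  echo "hbase CLI not found; skipping HBase check"
fi
```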
Step 3: Load the transactions data from HDFS into the HBase table created above, i.e. transactions_hbase, using the ImportTsv method. You may also load this data into the HBase table using a PIG script. To load using a PIG script refer to this post. Note that no separate load into HIVE is needed: the HIVE table reads directly from the HBase table it is mapped to, so the data is visible in HIVE as soon as it is in HBase.
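An ImportTsv invocation for this step might look like the sketch below. The HDFS input path, the comma separator, and the txn column family with its qualifiers are all assumptions made for illustration; substitute the real location and fields of your data. HBASE_ROW_KEY marks which input field becomes the row key, and ImportTsv expects tab-separated input unless importtsv.separator is set.

```shell
TABLE=transactions_hbase
# Hypothetical HDFS location of the input file(s).
INPUT=/user/hypothetical/transactions
# One entry per input field, in file order; qualifiers are placeholders.
COLUMNS="HBASE_ROW_KEY,txn:txn_date,txn:cust_id,txn:amount"

if command -v hbase >/dev/null 2>&1; then
  hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.separator=, \
    -Dimporttsv.columns="$COLUMNS" \
    "$TABLE" "$INPUT"
else
  echo "hbase CLI not found; skipping ImportTsv"
fi
```

The column family and qualifiers passed to importtsv.columns need to match the ones used in the HIVE table's hbase.columns.mapping, otherwise the HIVE columns will come back as NULL.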
Step 5: Now let's try to access this HIVE table (transactions) from the IMPALA shell. Note that the INVALIDATE METADATA command is needed to refresh IMPALA's metadata, so that tables created through HIVE become visible in IMPALA.
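As a minimal sketch, the refresh and a first query can be run from a file with impala-shell. The SELECT is only an illustrative query, and the script assumes impala-shell can reach an Impala daemon on its default host; use the -i option to point it at another one.

```shell
# Statements to run in IMPALA; the SELECT is only an illustrative query.
cat > query_transactions.sql <<'EOF'
INVALIDATE METADATA transactions;
SELECT * FROM transactions LIMIT 10;
EOF

# Run the file only where impala-shell is installed.
if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -f query_transactions.sql
else
  echo "impala-shell not found; wrote query_transactions.sql only"
fi
```

Naming the table in INVALIDATE METADATA limits the refresh to that table; running it without a table name reloads metadata for all tables, which can be slow on a large catalog.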