Technology Tips


How to Configure Replication Factor and Block Size for HDFS?

Hadoop Distributed File System (HDFS) stores files as data blocks and distributes these blocks across the entire cluster. As HDFS was designed to be fault-tolerant and to run on commodity hardware, blocks are replicated a number of times to ensure high data availability. The replication factor is a property...
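As a rough illustration of how block size and replication factor interact, the sketch below estimates how many blocks a single file occupies and how much raw disk space its replicas consume. The function name `hdfs_footprint` is hypothetical, and the default values (128 MiB blocks, replication factor 3) are assumed here as common Hadoop defaults; an actual cluster may be configured differently.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MIB, replication=3):
    """Estimate (blocks, block replicas, raw bytes stored) for one file.

    Assumed defaults: 128 MiB block size and replication factor 3.
    The last block of a file only occupies as much disk as the data it
    holds, so raw storage is the file size times the replication factor.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("size must be >= 0, block size > 0, replication >= 1")
    # Integer ceiling division: a partially filled last block still counts.
    blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    replicas = blocks * replication
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes


if __name__ == "__main__":
    blocks, replicas, raw = hdfs_footprint(300 * MIB)
    print(blocks, replicas, raw // MIB)
```

Under these assumptions, a 300 MiB file splits into 3 blocks (128 + 128 + 44 MiB), is stored as 9 block replicas, and uses 900 MiB of raw disk across the cluster.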

Loading Customer Data


Loading Customer Data into HBase Using a Pig Script

There are several ways to load data into HBase tables: the ‘put’ command for loading records manually, ImportTSV, and bulk load. Alternatively, let's load a large customer data file into HBase using Apache Pig. The data set has the following fields:
