How to Configure Replication Factor and Block Size for HDFS?

By Ravi Karamsetty | September 1, 2014 |

Hadoop Distributed File System (HDFS) stores files as data blocks and distributes these blocks across the entire cluster. As HDFS was designed to be fault-tolerant and to run on commodity hardware, blocks are replicated a number of times to ensure high data availability. The replication factor is a property that can be set in the …
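As a sketch of where these properties live (property names are for Hadoop 2.x; the values and file location are illustrative, not required defaults), the cluster-wide settings go in hdfs-site.xml:

```shell
# Write an illustrative hdfs-site.xml with a replication factor of 3
# and a 128 MB block size (134217728 bytes). Values are examples only.
cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
EOF
# Count the properties we just configured.
grep -c '<property>' hdfs-site.xml
```

Replication can also be changed per file after the fact with hadoop fs -setrep -w 2 /path/to/file, where -w waits until the new replication level is reached.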

Read More

Loading Customer Data into HBase using a PIG script

By Ravi Karamsetty | August 29, 2014 |

There are several ways to load data into HBase tables: the ‘put’ command to load records manually, ImportTSV, and bulk load options. Alternatively, let's try to load a huge customer data file into HBase using an Apache Pig script. The data set has the following fields:
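A minimal sketch of such a script, assuming a comma-separated input file and invented field names (id, name, city; the post's actual fields are not shown in this excerpt), with the first field serving as the HBase row key:

```shell
# Write an illustrative Pig script (field names, file name, and table
# name are assumptions, not taken from the post).
cat > load_customers.pig <<'EOF'
-- Load the customer file; the first field becomes the HBase row key.
customers = LOAD 'customers.csv' USING PigStorage(',')
    AS (id:chararray, name:chararray, city:chararray);

-- Store the remaining fields into column family 'info' of table 'customers'.
STORE customers INTO 'hbase://customers'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name info:city');
EOF
# On a cluster this would be run with: pig load_customers.pig
grep -c 'HBaseStorage' load_customers.pig
```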

Read More

Using FileSystem API to read and write data to HDFS

By Ravi Karamsetty | August 27, 2014 |

Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in many ways. Let us start by using the FileSystem API to create and write to a file in HDFS, followed by an application that reads a file from HDFS and writes it back to the local …
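A sketch of the write-then-read round trip with the FileSystem API (class name and HDFS path are illustrative; the source is only written to a file here, since compiling and running it requires the Hadoop jars and a live cluster):

```shell
# Write an illustrative Java source file using the Hadoop FileSystem API.
cat > HdfsReadWrite.java <<'EOF'
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();     // reads core-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/training/sample.txt");  // example path

        // Create the file in HDFS and write a line to it.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("hello hdfs");
        }
        // Open the same file and copy its bytes to stdout.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}
EOF
grep -c 'FileSystem' HdfsReadWrite.java
```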

Read More

33 Frequently Used HDFS Shell Commands

By Ravi Karamsetty | August 26, 2014 |

Open a terminal window to the current working directory (/home/training).

1. Print the Hadoop version: hadoop version
2. List the contents of the root directory in HDFS: hadoop fs -ls /
3. Report the amount of space used and available on the currently mounted filesystem: hadoop fs -df …
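The first few commands from the excerpt, collected into a script file for reference (running it requires a Hadoop installation, so here the file is only written, not executed):

```shell
# Save the first three commands from the post into a script
# (the -h flag on -df prints human-readable sizes).
cat > hdfs_basics.sh <<'EOF'
hadoop version        # 1. print the Hadoop version
hadoop fs -ls /       # 2. list the HDFS root directory
hadoop fs -df -h /    # 3. report used/available filesystem space
EOF
# Count the hadoop invocations we recorded.
grep -c '^hadoop' hdfs_basics.sh
```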

Read More