There are several ways to load data into HBase tables: ‘put’ to load records manually, ImportTsv, and the bulk load options. Alternatively, let's try to load a huge customer data file into HBase using Apache Pig. The data set has the following fields:
Conceptually, MapReduce jobs are relatively simple. In the map phase, each input record has a function applied to it, resulting in one or more key-value pairs. The reduce phase receives a group of the key-value pairs and performs some function over that group. Testing mappers and reducers should be as easy as testing any other […]
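To make the two phases concrete, here is a minimal in-memory sketch in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count job are illustrative assumptions, chosen only to show why a mapper and a reducer can be tested like any other function.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and yield its key-value pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its group of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Chain map, shuffle, and reduce entirely in memory."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical example job: word count.
def word_count_mapper(line):
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(key, values):
    return sum(values)
```

Because the mapper and reducer are ordinary functions, each can be called directly in a unit test, without a cluster.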
Reading data from and writing data to Hadoop Distributed File System (HDFS) can be done in a lot of ways. Now let us start by using the FileSystem API to create and write to a file in HDFS, followed by an application to read a file from HDFS and write it back to the local […]
# Open a terminal window to the current working directory.
# /home/training
# 1. Print the Hadoop version
hadoop version
# 2. List the contents of the root directory in HDFS
#
hadoop fs -ls /
# 3. Report the amount of space used and
# available on currently mounted filesystem
#
hadoop fs -df […]
General Sense of Moving Average: Moving Average is a widely used indicator in technical analysis that helps smooth out price action by filtering out the “noise” from random price fluctuations. A moving average (MA) is a trend-following or lagging indicator because it is based on past prices.
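As a small illustration of the smoothing idea, the sketch below computes a simple (unweighted) moving average over a list of prices. The function name `simple_moving_average` and the choice of a simple rather than weighted or exponential average are assumptions made for this example.

```python
from collections import deque


def simple_moving_average(prices, window):
    """Return the average of each run of `window` consecutive prices."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    recent = deque(maxlen=window)  # holds the last `window` prices
    total = 0.0
    averages = []
    for price in prices:
        if len(recent) == window:
            total -= recent[0]  # the oldest price is about to be dropped
        recent.append(price)
        total += price
        if len(recent) == window:
            averages.append(total / window)
    return averages
```

Each output value depends only on the current and earlier prices, which is why the indicator lags the underlying series.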
The Hadoop fsimage is an “image” file, and its contents cannot be read easily with normal Unix file system tools such as cat or more. At times, it is very important to read a clear-text version of the fsimage, which holds the metadata of the file system. You can perform NameSpace Analysis, find out health […]
MapReduce has a feature known as Hadoop Streaming that gives you the flexibility to write code in a language other than Java. You can use Ruby, Perl, or Python, or even quickly write a MapReduce job as a shell script.
The following Python code, mapper.py, takes a text file as input and tokenizes it to create a set of <key, value> pairs. The key is the number of characters in each word, and the value is the word itself.
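A minimal sketch of such a mapper might look like the following. It is not the original post's code: it assumes whitespace tokenization with no punctuation handling, and it writes each pair as `key<TAB>value`, the default separator Hadoop Streaming expects. The helper names `tokenize_line` and `main` are hypothetical.

```python
#!/usr/bin/env python
import sys


def tokenize_line(line):
    """Yield (word_length, word) pairs for each whitespace-separated word."""
    for word in line.split():
        yield len(word), word


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read text lines and emit one tab-separated <key, value> pair per word."""
    for line in stdin:
        for length, word in tokenize_line(line):
            stdout.write("%d\t%s\n" % (length, word))


if __name__ == "__main__":
    main()
```

The script can be checked locally before submitting a job, for example with `cat input.txt | python mapper.py`, where `input.txt` stands for any text file.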