Loading Customer Data into HBase Using a Pig Script

There are several ways to load data into HBase tables: the ‘put’ command for loading records manually, ImportTsv, and bulk load. Alternatively, let's try loading a huge customer data file into HBase using Apache Pig. The data set has the following fields:



Using MRUnit to Develop and Test MapReduce Jobs

Conceptually, MapReduce jobs are relatively simple. In the map phase, a function is applied to each input record, producing zero or more key-value pairs. The reduce phase receives a group of key-value pairs that share a key and applies some function to that group. Testing mappers and reducers should be...
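The map and reduce phases described above can be sketched in a few lines. This is a plain-Python illustration of the idea only, not the MRUnit API (MRUnit is a Java library); the word-count job and the names `map_record`, `reduce_group`, and `run_job` are hypothetical, chosen for the example.

```python
from collections import defaultdict


def map_record(line):
    """Map phase: emit one (word, 1) pair per word in the input record."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_group(key, values):
    """Reduce phase: collapse all values grouped under one key."""
    return key, sum(values)


def run_job(records):
    """Tiny in-memory driver: map, group by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_group(k, v) for k, v in groups.items())


# Each phase can be checked in isolation, then end to end.
assert map_record("Hadoop hadoop Pig") == [("hadoop", 1), ("hadoop", 1), ("pig", 1)]
assert reduce_group("hadoop", [1, 1]) == ("hadoop", 2)
assert run_job(["Hadoop hadoop Pig", "pig"]) == {"hadoop": 2, "pig": 2}
```

Because the mapper and reducer are ordinary functions, each can be tested on a handful of records without a cluster, which is the same style of testing that a harness such as MRUnit is meant to support for real Hadoop jobs.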



Computing the Moving Average of Stocks in Hadoop Hive

General Sense of Moving Average: A moving average (MA) is a widely used indicator in technical analysis that smooths out price action by filtering out the “noise” from random price fluctuations. It is a trend-following, or lagging, indicator because it is based on past prices.
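As a small worked illustration of the smoothing idea, here is a minimal sketch of a simple moving average in plain Python. It is not the Hive implementation the post goes on to describe; the function name `simple_moving_average`, the closing prices, and the 3-day window are assumptions made for the example.

```python
def simple_moving_average(prices, window):
    """Return the average of each run of `window` consecutive prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            # Drop the price that just slid out of the window.
            running_sum -= prices[i - window]
        if i >= window - 1:
            # A full window is available from index window - 1 onward.
            averages.append(running_sum / window)
    return averages


# Hypothetical closing prices, not real market data.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(simple_moving_average(closes, 3))  # [11.0, 12.0, 13.0]
```

Each output value depends only on the current and earlier prices, which is why the indicator lags the price series it smooths.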



Princeton IT Service is hosting the NJ Hadoop – Apache Storm meetup.

Storm is a distributed, high-performance real-time computation system used by Twitter, Yahoo, Spotify, and WebMD. Storm is a top-level Apache project, which brings the brand, governance, and large community of the Apache Software Foundation. Storm scales linearly, is fault-tolerant, provides reliable processing semantics, and is language agnostic (e.g. Java, Ruby, Python,...
