TECHNOLOGY BLOGS


Technology Tips

Integrate HIVE with HBase and Query using IMPALA

HBase tables can be integrated with HIVE so that they can be queried using IMPALA. IMPALA queries are fast and as easy to write as standard SQL queries. We shall load transactional data into an HBase table integrated with HIVE using the ImportTSV method, and then query the corresponding HIVE table...

Technology Tips

How to Configure Replication Factor and Block Size for HDFS?

Hadoop Distributed File System (HDFS) stores files as data blocks and distributes these blocks across the entire cluster. As HDFS was designed to be fault-tolerant and to run on commodity hardware, blocks are replicated a number of times to ensure high data availability. The replication factor is a property...

Loading Customer Data

Loading Customer Data into HBase using a PIG script

There are several ways to load data into HBase tables, such as ‘put’ to load records manually, ImportTSV, and bulk load. Alternatively, let's load a large customer data file into HBase using Apache PIG. The data set has the following fields:

Hadoop Framework

Using MRUnit to Develop and Test MapReduce Jobs

Conceptually, MapReduce jobs are relatively simple. In the map phase, each input record has a function applied to it, resulting in one or more key-value pairs. The reduce phase receives a group of the key-value pairs and performs some function over that group. Testing mappers and reducers should be...
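
The map and reduce phases described above can be sketched in a few lines of plain Python. This is only an illustration of the concept, assuming a word-count job; it does not use Hadoop or MRUnit, and the names map_record, reduce_group, and run_job are hypothetical helpers made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_record(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input record."""
    for word in line.split():
        yield word.lower(), 1


def reduce_group(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: collapse all values that share a key into one total."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Apply the mapper to each record, group pairs by key, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_group(key, values) for key, values in groups.items())


if __name__ == "__main__":
    # Hypothetical input record, used only for illustration.
    print(run_job(["to be or not to be"]))
```

Because the mapper and reducer are ordinary functions, each one can be checked on its own with a small input and an expected list of key-value pairs, which is the same idea a unit-testing harness for MapReduce builds on.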

Software Development

Computing Moving-Average of Stocks in Hadoop HIVE

General Sense of Moving Average: A moving average is a widely used indicator in technical analysis that helps smooth out price action by filtering out the “noise” from random price fluctuations. A moving average (MA) is a trend-following, or lagging, indicator because it is based on past prices.
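
To make the smoothing idea concrete, here is a minimal Python sketch of a simple moving average over a fixed window. It is not the HIVE query from the post; the function name simple_moving_average, the window size of 3, and the sample prices are all assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List


def simple_moving_average(prices: Iterable[float], window: int) -> List[float]:
    """Return the average of each run of `window` consecutive prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent: Deque[float] = deque()  # prices currently inside the window
    running_sum = 0.0
    averages: List[float] = []
    for price in prices:
        recent.append(price)
        running_sum += price
        if len(recent) > window:
            # Drop the oldest price so the window keeps a fixed length.
            running_sum -= recent.popleft()
        if len(recent) == window:
            averages.append(running_sum / window)
    return averages


if __name__ == "__main__":
    closing_prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical data
    print(simple_moving_average(closing_prices, 3))  # prints [11.0, 12.0, 13.0]
```

Each output value depends only on the current and earlier prices, which is why the indicator lags the price series it smooths.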

Company News

Princeton IT Service is hosting the NJ Hadoop – Apache Storm meetup.

Storm is a distributed, high-performance real-time computation system used by Twitter, Yahoo, Spotify, and WebMD. Storm is a top-level Apache project, which brings the brand, governance, and large community of the Apache Software Foundation. Storm scales linearly, is fault-tolerant, provides reliable processing semantics, and is language agnostic (e.g. Java, Ruby, Python,...
