What is HIVE – Learn Hadoop Online

This post continues our Learn Hadoop blog series, following our earlier posts Introduction to Big Data Hadoop Online Training, Learn Hadoop Ecosystem Basics, and Learn Spark, a Hadoop Ecosystem Component.

This Hadoop online training tutorial is an Apache HIVE tutorial for beginners.

Big Data Hadoop Tutorial - What is HIVE

After this tutorial you should be able to answer the following questions: What are the features of Apache HIVE? Why use Apache HIVE? What is Apache HIVE used for? Which use cases is Apache HIVE suited for?

A Brief History of Apache HIVE

Facebook started building HIVE in 2007, when its data grew from a few GB to a few TB per day, including text, images, videos and many other formats. Traditional databases could not handle this volume of data. Facebook could process huge datasets by storing them in Hadoop and running MapReduce jobs in parallel, but every query then had to be written as a MapReduce program, which was a hurdle for people who knew SQL. To solve this, Facebook built HIVE, which provides an SQL-like interface whose queries are converted into MapReduce jobs. Facebook later handed HIVE over to Apache as an open-source project.
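To make the idea of a query being "converted into a MapReduce job" more concrete, here is a minimal Python sketch of the phases such a job goes through for a simple aggregation, SELECT page, COUNT(*) FROM page_views GROUP BY page. The page_views table, its rows and the helper names are hypothetical, and the sketch only mimics the map, shuffle and reduce steps in memory; it is not how HIVE's query compiler is actually implemented.

```python
from collections import defaultdict

# Hypothetical rows of a page_views(user_id, page) table.
ROWS = [
    ("u1", "home"),
    ("u2", "search"),
    ("u1", "home"),
    ("u3", "profile"),
    ("u2", "home"),
]


def map_phase(rows):
    """Map: emit a (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user_id, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Shuffle: collect all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's values; COUNT(*) is the sum of the 1s."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows):
    """Conceptual equivalent of: SELECT page, COUNT(*) FROM page_views GROUP BY page"""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    for page, count in sorted(group_by_count(ROWS).items()):
        print(page, count)
```

In a real cluster the map and reduce functions run on many machines in parallel, and Hadoop performs the shuffle over the network between them.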

Note: Yahoo developed PIG to address the same problem of exponentially growing data.

Using the HIVE Query Language (HQL) we can create databases and tables, read data, and use partitions and buckets to organize how table data is stored. HIVE offers schema flexibility, and JDBC/ODBC drivers are available for reading data from HIVE. Creating tables in HIVE is straightforward, and storing and processing data with HQL is easy.
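Partitions and buckets change how a table's data is laid out in HDFS. The sketch below, assuming HIVE's default warehouse directory /user/hive/warehouse, shows how each partition maps to a column=value subdirectory, and how rows could be spread across a fixed number of buckets by hashing the bucketing column. The sales table, its columns and the helper names are hypothetical, and the hash used here (the integer value itself) is a simplification: HIVE's actual bucketing hash depends on the column type and the HIVE version.

```python
# Hive's default table location (hive.metastore.warehouse.dir).
WAREHOUSE = "/user/hive/warehouse"


def partition_path(table, partition):
    """Directory for one partition: <warehouse>/<table>/<col1>=<val1>/<col2>=<val2>...

    `partition` is an ordered list of (column, value) pairs, outermost column first.
    """
    parts = [f"{column}={value}" for column, value in partition]
    return "/".join([WAREHOUSE, table, *parts])


def bucket_for(value, num_buckets):
    """Bucket index for an integer bucketing column.

    SIMPLIFICATION: the integer itself is used as its hash; the sign bit is masked
    off so negative values still map to a valid bucket. Hive's real hash varies
    by column type and Hive version.
    """
    return (value & 0x7FFFFFFF) % num_buckets


def assign_buckets(ids, num_buckets):
    """Distribute rows (represented here by their ids) into num_buckets lists."""
    buckets = [[] for _ in range(num_buckets)]
    for row_id in ids:
        buckets[bucket_for(row_id, num_buckets)].append(row_id)
    return buckets


if __name__ == "__main__":
    print(partition_path("sales", [("country", "US"), ("dt", "2024-01-01")]))
    print(assign_buckets([1, 2, 3, 4, 5, 6, 7, 8], 4))
```

Because a given value always hashes to the same bucket, rows with equal bucketing keys end up in the same bucket, which is what makes bucketing useful for sampling and joins.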

Advantages of HIVE

HIVE is a data warehouse package built on top of Hadoop. All the usual data warehouse functions, such as creating databases, tables and views, can be performed in HIVE. It is mainly used for data analysis by business analysts with SQL expertise, so it is targeted at SQL experts and can be used without knowing Java or the Hadoop APIs.

Limitations of HIVE

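HIVE is designed for managing and querying structured data. It is not designed for online transaction processing: classic HIVE does not provide selective (row-level) inserts or updates, and although later versions add limited row-level INSERT, UPDATE and DELETE on transactional (ACID) tables, HIVE still does not offer real-time queries. Query latency is high, because each query runs as a batch job over large datasets.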

Advantages of HQL

Data can be filtered using the WHERE clause. Partitioning is supported (partitions can be created, dropped and altered), which speeds up reading and consuming data because a query that filters on the partition column only reads the matching partitions. The results of one query can be stored in another table or written to an HDFS directory.
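The speed-up from partitioning comes from partition pruning: when the WHERE clause filters on a partition column, whole partition directories can be skipped instead of scanning every file. Below is a minimal in-memory sketch of that idea, assuming a hypothetical table partitioned by a dt (date) column; the table contents and the query helper are illustrative only, not HIVE's implementation.

```python
# Hypothetical table partitioned by dt: each key stands for one partition
# directory (dt=<value>), each value for the (user_id, clicks) rows stored in it.
PARTITIONS = {
    "2024-01-01": [("u1", 10), ("u2", 5)],
    "2024-01-02": [("u3", 7)],
    "2024-01-03": [("u1", 2), ("u4", 9)],
}


def query(partitions, partition_filter, row_filter=lambda row: True):
    """Conceptual SELECT * FROM t WHERE <partition_filter(dt)> AND <row_filter(row)>.

    Returns (matching_rows, partitions_read).
    """
    # Partition pruning: choose which directories to read from the dt value alone.
    to_read = [dt for dt in partitions if partition_filter(dt)]
    # Only rows inside the surviving partitions are read and filtered.
    rows = [row for dt in to_read for row in partitions[dt] if row_filter(row)]
    return rows, to_read


if __name__ == "__main__":
    rows, read = query(PARTITIONS, lambda dt: dt >= "2024-01-02",
                       row_filter=lambda row: row[1] > 5)
    print(read)  # only two of the three partitions are read
    print(rows)
```

A filter on an ordinary column (the row_filter here) still has to examine every row of the selected partitions; only predicates on the partition column reduce how much data is read.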

Difference between HIVE and PIG (if PIG already exists, why use HIVE?)

HIVE | PIG
Used by analysts generating daily reports | Preferred by programmers and researchers
SQL-like query language | Pig Latin, a procedural language
Supports partitioning for better processing of data | No partitioning support
Limited JDBC/ODBC support | No JDBC/ODBC support
Web interface supported | Web interface not supported
Shells, streaming and Java supported | Shells, streaming and Java supported
