This post continues our Learn Hadoop blog series, following our earlier posts Introduction to Big Data Hadoop Online Training, Learn Hadoop Ecosystem Basics, and Learn Spark, a Hadoop Ecosystem component.
This Hadoop Online Training tutorial is the Apache HIVE Tutorial for Beginners.
After this training you will be able to answer the following questions: What are the features of Apache HIVE? Why use Apache HIVE? What is Apache HIVE used for? Which use cases suit Apache HIVE?
A Brief History of Apache HIVE
Facebook started building Apache HIVE in 2007, when its data grew from a few GB to a few TB per day and included text, images, videos and many other formats. Traditional databases could not handle this volume of data. Hadoop storage and parallel MapReduce processing let Facebook process these huge datasets, but expressing basic SQL queries as MapReduce jobs was a challenge. Facebook analyzed this problem and came up with HIVE, which provides an SQL-like interface whose queries are converted into MapReduce jobs. Facebook later handed HIVE over to Apache as an open source project.
Note – Yahoo developed PIG to address the same problem of exponentially growing data.
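To make the idea of an SQL-like query being converted into a MapReduce job a little more concrete, here is a minimal Python sketch of how a query such as SELECT format, COUNT(*) FROM uploads GROUP BY format could be split into a map phase, a shuffle and a reduce phase. The table name uploads, the sample rows and the function names are made up for illustration; this is only the general idea, not how HIVE itself is implemented.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a table "uploads(user, format)".
ROWS = [
    ("alice", "image"),
    ("bob", "video"),
    ("carol", "image"),
    ("dave", "text"),
    ("erin", "image"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed on the GROUP BY column."""
    for _user, fmt in rows:
        yield fmt, 1


def shuffle(pairs):
    """Collect all values that share a key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which corresponds to COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_format(rows):
    """Run the three phases in sequence on an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_format(ROWS))
```

On a real cluster the map and reduce steps would run in parallel on many machines; here everything runs in one process so the flow is easy to follow.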
Using the HIVE Query Language we can create databases and tables, read data, and use partitions and buckets to restructure the data. HIVE offers schema flexibility, and JDBC/ODBC drivers are available for reading the data. Table creation in HIVE is quite easy, and so is storing and processing data with Hive Query Language (HQL).
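As a rough illustration of the buckets mentioned above: a bucket is chosen by hashing a column value and taking the result modulo a fixed number of buckets, so the same value always lands in the same bucket. The sketch below is plain Python, not HIVE. The user ids, the bucket count and the use of zlib.crc32 as the hash function are assumptions made for illustration (HIVE applies its own hash function).

```python
import zlib

# Assumed bucket count, similar in spirit to declaring a table
# CLUSTERED BY (user_id) INTO 4 BUCKETS.
NUM_BUCKETS = 4


def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    """Return the bucket index for a value: hash(value) modulo bucket count."""
    # crc32 is used only because it is deterministic across runs;
    # it is a stand-in, not the hash function HIVE uses.
    return zlib.crc32(user_id.encode("utf-8")) % num_buckets


def assign_buckets(user_ids, num_buckets=NUM_BUCKETS):
    """Distribute values over buckets; equal values share a bucket."""
    buckets = {i: [] for i in range(num_buckets)}
    for uid in user_ids:
        buckets[bucket_for(uid, num_buckets)].append(uid)
    return buckets


if __name__ == "__main__":
    ids = ["alice", "bob", "carol", "dave", "alice"]
    for index, members in assign_buckets(ids).items():
        print(index, members)
```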
Advantages of HIVE
Data warehouse package built on top of Hadoop. All the data warehouse functions, such as creating databases, tables and views, can be performed in HIVE. Mainly used for data analysis by business analysts with SQL expertise. Targeted towards SQL experts. Can be used without knowing Java or Hadoop APIs.
Limitations of HIVE
Only used for managing and querying structured data. Not designed for online transaction processing (doesn’t provide selective insert / update). Does not offer real-time queries or row-level updates. Latency for Hive queries is very high.
Advantages of HQL
Filter data using the WHERE clause. Partitioning is supported to speed up reading and consuming data (partitions can be created, dropped and altered). Ability to store the results of a query in another table. Ability to store the results of a query in an HDFS directory.
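To see why partitioning can speed up reads, here is a small Python sketch that compares a full scan with a scan that only touches the partition named in the filter. The page_views data, the dates and the function names are hypothetical and only mimic the idea of one directory per partition; HIVE's actual storage and query planning are more involved.

```python
# Hypothetical table "page_views" partitioned by date: each key stands in
# for one partition, and its list holds that partition's (user, url) rows.
PARTITIONS = {
    "2015-01-01": [("alice", "/home"), ("bob", "/cart")],
    "2015-01-02": [("carol", "/home")],
    "2015-01-03": [("dave", "/home"), ("erin", "/pay"), ("bob", "/home")],
}


def full_scan(partitions, view_date, page):
    """Without partition pruning: every row is read and checked."""
    rows_read = 0
    matches = []
    for part_date, rows in partitions.items():
        for user, url in rows:
            rows_read += 1
            if part_date == view_date and url == page:
                matches.append(user)
    return matches, rows_read


def pruned_scan(partitions, view_date, page):
    """With partition pruning: only the partition in the filter is read."""
    rows = partitions.get(view_date, [])
    matches = [user for user, url in rows if url == page]
    return matches, len(rows)


if __name__ == "__main__":
    print(full_scan(PARTITIONS, "2015-01-03", "/home"))
    print(pruned_scan(PARTITIONS, "2015-01-03", "/home"))
```

Both functions return the same users, but the pruned version reads only the rows of one partition instead of the whole table.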
Difference between HIVE and PIG (you may ask: if PIG is there, why use HIVE?)
|HIVE|PIG|
|---|---|
|Used by analysts generating daily reports|Preferred by programmers and researchers|
|SQL-like query language|Pig Latin, a procedural language|
|Supports partitioning for better processing of data|No partitioning support|
|Limited JDBC/ODBC support|No JDBC/ODBC support|
|Web interface supported|Web interface not supported|
|Shells / Streaming / Java supported|Shells / Streaming / Java supported|