HCatalog Create View and Index

posted on Nov 20th, 2016

HCatalog

Apache HCatalog is a table management layer that exposes Hive metadata to other Hadoop applications. HCatalog's table abstraction presents users with a relational view of data in the Hadoop Distributed File System (HDFS), so users need not worry about where or in what format their data is stored. HCatalog can present data stored as RCFile, text files, or sequence files in a tabular view. It also provides REST APIs so that external systems can access these tables' metadata.

HCatalog is built on top of the Hive metastore and incorporates components from the Hive DDL. HCatalog provides read and write interfaces for Pig and MapReduce and uses Hive's command line interface for issuing data definition and metadata exploration commands. It also presents a REST interface to allow external tools access to Hive DDL (Data Definition Language) operations, such as "create table" and "describe table".
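The REST interface mentioned above is served by WebHCat (formerly Templeton). As a rough sketch, the snippet below builds two request URLs and calls them with curl. It assumes a WebHCat server is already running on localhost at the default port 50111 and that the Hadoop user is hduser; both values are assumptions to adjust for your setup.

```shell
# Assumed WebHCat location and user name; adjust for your cluster.
webhcat="http://localhost:50111/templeton/v1"
user="hduser"

# URLs for the server status check and for listing databases.
status_url="$webhcat/status"
databases_url="$webhcat/ddl/database?user.name=$user"

# Query the endpoints only if curl is installed; --max-time keeps the call from hanging.
for url in "$status_url" "$databases_url"; do
  if command -v curl >/dev/null 2>&1; then
    curl -s --max-time 5 "$url" || echo "no response from $url"
  else
    echo "would request: $url"
  fi
done
```

If the server is not running, the loop simply reports that no response was received.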

Prerequisites

1) A machine with Ubuntu 14.04 LTS operating system installed.

2) Apache Hive 2.1.0 Pre Installed (How to Install Hive on Ubuntu 14.04)

3) Apache HCatalog. HCatalog merged with Hive in March 2013 and is now released as part of Hive, so this post uses the HCatalog version that ships with the installed Hive. (How to Install Hcatalog on Ubuntu 14.04)

Creating View and Indexes

This post describes how to create and drop views and indexes in HCatalog. Database views are created using the CREATE VIEW statement. A view can be created from a single table, multiple tables, or another view. To create a view, a user must have the appropriate system privileges for the specific implementation.

Step 1 - Open a new terminal (CTRL + ALT + T) and change the directory to $HCAT_HOME/bin (in this setup, /usr/local/hive/hcatalog/bin).

$ cd $HCAT_HOME/bin

Step 2 - Creating a new employee table

$ ./hcat -e "CREATE TABLE IF NOT EXISTS employee( eid int, name String, salary String, destination String) \
COMMENT 'Employee details' \
ROW FORMAT DELIMITED \
FIELDS TERMINATED BY ' ' \
LINES TERMINATED BY '\n' \
STORED AS TEXTFILE;"

Step 3 - Check that the table was created. This command lists all the tables that are present.

$ ./hcat -e "show tables;"

Step 4 - Create a new sample.txt file to load into employee table.

$ gedit sample.txt

Add the following lines to sample.txt, then save and close the file.

sample.txt

1201 Gopal 45000 Technicalmanager
1202 Manisha 45000 Proofreader
1203 Masthanvali 40000 Technicalwriter
1204 Kiran 40000 HrAdmin
1205 Kranthi 30000 OpAdmin
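If no graphical editor is available, the same file can be written directly from the shell. The sketch below writes sample.txt into the current directory and uses awk to confirm that every row has exactly four whitespace-separated fields, which is what the table definition in Step 2 expects.

```shell
# Write the five sample rows to sample.txt in the current directory.
cat > sample.txt <<'EOF'
1201 Gopal 45000 Technicalmanager
1202 Manisha 45000 Proofreader
1203 Masthanvali 40000 Technicalwriter
1204 Kiran 40000 HrAdmin
1205 Kranthi 30000 OpAdmin
EOF

# Count rows that do not have exactly four whitespace-separated fields.
bad_rows=$(awk 'NF != 4 { n++ } END { print n + 0 }' sample.txt)
echo "rows with a wrong field count: $bad_rows"
```

A count other than 0 points to a row that would load with missing or shifted columns.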

Step 5 - Execute the load operation. In my case the sample.txt file is saved in /home/hduser/Desktop/HCATALOG/ folder.

$ ./hcat -e "LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HCATALOG/sample.txt' OVERWRITE INTO TABLE employee;"

Step 6 - Create View

$ ./hcat -e "CREATE VIEW Emp_Deg_View \
COMMENT 'salary of 35,000 or more' \
AS SELECT eid, name, salary, destination FROM employee WHERE salary >= 35000;"
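To sanity-check which rows the view should return, the view's WHERE clause can be approximated locally with awk. This is only a preview against the sample rows, not a Hive query, and it assumes the third whitespace-separated field is the salary.

```shell
# Sample rows from Step 4, inlined so the snippet is self-contained.
rows='1201 Gopal 45000 Technicalmanager
1202 Manisha 45000 Proofreader
1203 Masthanvali 40000 Technicalwriter
1204 Kiran 40000 HrAdmin
1205 Kranthi 30000 OpAdmin'

# Keep rows whose salary (field 3) is at least 35000, mirroring the view's WHERE clause.
preview=$(printf '%s\n' "$rows" | awk '$3 >= 35000')
printf '%s\n' "$preview"
```

With the sample data, only the row for employee 1205 (salary 30000) is filtered out.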

Step 7 - Drop View

$ ./hcat -e "DROP VIEW Emp_Deg_View;"
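The view statements can also be kept in a script file and run with hcat -f, which makes the view easy to recreate. This is a minimal sketch, assuming the employee table from Step 2 already exists. The file name employee_view.hql is hypothetical, and the script drops the view first so that it can be run repeatedly.

```shell
# Collect the view statements in one script file (hypothetical file name).
cat > employee_view.hql <<'EOF'
DROP VIEW IF EXISTS Emp_Deg_View;
CREATE VIEW Emp_Deg_View
COMMENT 'salary of 35,000 or more'
AS SELECT eid, name, salary, destination FROM employee WHERE salary >= 35000;
EOF

# Run the script with hcat -f when hcat is available; otherwise only report the file.
if [ -x "${HCAT_HOME:-}/bin/hcat" ]; then
  "$HCAT_HOME/bin/hcat" -f employee_view.hql
else
  echo "hcat not found; wrote employee_view.hql"
fi
```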

Creating Index

An index is a pointer on a particular column of a table. Creating an index lets Hive locate rows by that column's values without scanning the entire table.

Step 8 - Create Index

$ ./hcat -e "CREATE INDEX index_salary ON TABLE employee(salary) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;"
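An index created WITH DEFERRED REBUILD starts out empty and has to be populated with ALTER INDEX ... REBUILD before it is useful. Rebuilding launches a job, so this sketch assumes the statements are run through the hive CLI rather than hcat, which is limited to DDL that does not start a job. The run_hive helper is a hypothetical wrapper that only prints the statement when hive is not installed.

```shell
# Statements to list the indexes on employee and to populate index_salary.
show_stmt="SHOW FORMATTED INDEX ON employee;"
rebuild_stmt="ALTER INDEX index_salary ON employee REBUILD;"

# Run a statement through the hive CLI if it is installed; otherwise just print it.
run_hive() {
  if command -v hive >/dev/null 2>&1; then
    hive -e "$1"
  else
    echo "hive not found, would run: $1"
  fi
}

run_hive "$show_stmt"
run_hive "$rebuild_stmt"
```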

Step 9 - Drop Index

$ ./hcat -e "DROP INDEX index_salary ON employee;"
