HCatalog Load Operation

posted on Nov 20th, 2016

Apache HCatalog

Apache HCatalog is a table management layer that exposes Hive metadata to other Hadoop applications. HCatalog's table abstraction presents users with a relational view of data in the Hadoop Distributed File System (HDFS), so users need not worry about where or in what format their data is stored. HCatalog displays data from RCFile format, text files, or sequence files in a tabular view. It also provides REST APIs so that external systems can access these tables' metadata.

HCatalog is built on top of the Hive metastore and incorporates components from the Hive DDL. HCatalog provides read and write interfaces for Pig and MapReduce and uses Hive's command line interface for issuing data definition and metadata exploration commands. It also presents a REST interface to allow external tools access to Hive DDL (Data Definition Language) operations, such as "create table" and "describe table".

Prerequisites

1) A machine with Ubuntu 14.04 LTS operating system installed.

2) Apache Hive 2.1.0 installed (see How to Install Hive on Ubuntu 14.04)

3) Apache HCatalog. HCatalog was merged with Hive in March 2013 and is now released as part of Hive, so here we are using the latest version of HCatalog bundled with Hive. (see How to Install Hcatalog on Ubuntu 14.04)

HCatalog Load Operation

Generally, after creating a table in SQL, we can insert data using the INSERT statement. In HCatalog, we insert data using the LOAD DATA statement, which is the better choice for storing bulk records. There are two ways to load data: from the local file system, or from the Hadoop file system (HDFS).

Step 1 - Open a new terminal (CTRL + ALT + T) and change the directory to /usr/local/hive/hcatalog/bin (referred to below as $HCAT_HOME/bin).

$ cd $HCAT_HOME/bin

Step 2 - Create a new employee table.

$ ./hcat -e "CREATE TABLE IF NOT EXISTS employee( eid int, name String, salary String, destination String) \
COMMENT 'Employee details' \
ROW FORMAT DELIMITED \
FIELDS TERMINATED BY ' ' \
LINES TERMINATED BY '\n' \
STORED AS TEXTFILE;"
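
Long -e statements are easy to mistype inside shell quotes. As an alternative, here is a minimal sketch that keeps the same DDL in a script file and passes it to hcat with the -f option. The file name employee.hcatalog is my own choice, not something HCatalog requires, and the hcat call is guarded so it only runs when ./hcat is present in the current directory.

```shell
# Write the DDL from Step 2 to a script file (hypothetical file name).
# Quoting the 'EOF' delimiter keeps the shell from interpreting '\n'.
cat > employee.hcatalog <<'EOF'
CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
EOF

# Run the script only when hcat is available in the current directory.
if [ -x ./hcat ]; then
    ./hcat -f employee.hcatalog
else
    echo "hcat not found here; DDL saved to employee.hcatalog"
fi
```

Keeping the DDL in a file also makes it easier to rerun the same table definition later.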

Step 3 - Check whether the table was created. The following command lists all the tables that are present.

$ ./hcat -e "show tables;"

Step 4 - Create a new sample.txt file to load into employee table.

$ gedit sample.txt

Add the following lines to sample.txt, then save and close the file.

sample.txt

1201 Gopal 45000 Technicalmanager
1202 Manisha 45000 Proofreader
1203 Masthanvali 40000 Technicalwriter
1204 Kiran 40000 HrAdmin
1205 Kranthi 30000 OpAdmin
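
If you prefer not to use a graphical editor, the same file can be written from the terminal. The sketch below creates sample.txt with a here-document and then checks that every row has four fields, matching the four columns of the employee table. The awk check is my own addition and is looser than Hive: awk splits on any run of whitespace, while the table expects exactly one space between fields.

```shell
# Create sample.txt in the current directory with the five rows shown above.
cat > sample.txt <<'EOF'
1201 Gopal 45000 Technicalmanager
1202 Manisha 45000 Proofreader
1203 Masthanvali 40000 Technicalwriter
1204 Kiran 40000 HrAdmin
1205 Kranthi 30000 OpAdmin
EOF

# Collect the line numbers of rows that do not have exactly four fields.
bad=$(awk 'NF != 4 { print NR }' sample.txt)
if [ -z "$bad" ]; then
    echo "sample.txt: all rows have 4 fields"
else
    echo "sample.txt: unexpected field count on line(s): $bad" >&2
fi
```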

Step 5 - Execute the load operation. In my case, sample.txt is saved in the /home/hduser/Desktop/HCATALOG/ folder.

$ ./hcat -e "LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HCATALOG/sample.txt' OVERWRITE INTO TABLE employee;"
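
Step 5 covers the local file system. The second way, loading from the Hadoop file system, uses the same statement without the LOCAL keyword. Here is a rough sketch of that variant. The helper build_load_stmt and the HDFS path /user/hduser/hcatalog/sample.txt are hypothetical names I picked for illustration, and the Hadoop commands are guarded so they only run when hdfs and ./hcat are both available.

```shell
# Build a LOAD DATA statement (hypothetical helper, not part of HCatalog).
# $1 = "local" or "hdfs", $2 = file path, $3 = table name
build_load_stmt() {
    case "$1" in
        local) keyword="LOCAL " ;;   # read from the local file system
        hdfs)  keyword="" ;;         # read from HDFS: no LOCAL keyword
        *) echo "unknown source: $1" >&2; return 1 ;;
    esac
    printf "LOAD DATA %sINPATH '%s' OVERWRITE INTO TABLE %s;\n" "$keyword" "$2" "$3"
}

# Assumed HDFS location for the staged file; adjust to your cluster.
HDFS_FILE=/user/hduser/hcatalog/sample.txt
stmt=$(build_load_stmt hdfs "$HDFS_FILE" employee)
echo "$stmt"

# Stage the file in HDFS and run the load only when the tools are present.
if command -v hdfs >/dev/null 2>&1 && [ -x ./hcat ]; then
    hdfs dfs -mkdir -p "$(dirname "$HDFS_FILE")"
    hdfs dfs -put -f /home/hduser/Desktop/HCATALOG/sample.txt "$HDFS_FILE"
    ./hcat -e "$stmt"
fi
```

One difference to keep in mind: with LOCAL the file is copied into the table's storage location, while a load from an HDFS path moves the file, so it no longer exists at the original HDFS location afterwards.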
