In Part I we looked at Apache Hive, Cloudera Impala, and IBM Big SQL. We now continue with Pivotal HAWQ, Apache Drill, Spark SQL, Presto, ..
Apache Zeppelin is a multipurpose notebook for interactive data analytics. It supports Scala and Python (with Spark), Spark SQL, Hive, Shel..
Offloading relational data to HDFS is a common Hadoop use case known as "online archive". This approach allows companies to solve a qu..
We can attribute the first attempt at getting SQL functionality on top of Hadoop to Apache Hive. Initially developed at Facebook, this ope..
Apache Flume is a scalable, high-volume data ingestion system that allows users to load streaming data into HDFS. Typical use cases for Fl..
In the previous post I wrote about the importance of the Open Data Platform initiative. In this tutorial I will go over the steps of insta..
"Can we put it in Hadoop?" is a question I get asked more and more often as customers start to recognize "big data" as something that migh..
Apache Sqoop is a bulk data transfer tool that links traditional relational databases (like Oracle Database) and Apache Hadoop (HDF..