Apache Zeppelin is a multipurpose notebook for interactive data analytics. It supports Scala and Python (with Spark), SparkSQL, Hive, Shel..
Offloading relational data to HDFS is a common use-case for Hadoop known as "online archive". This approach allows companies to solve a qu..
We can attribute the first attempt at getting SQL functionality on top of Hadoop to Apache Hive. Initially developed at Facebook, this ope..
Apache Flume is a scalable, high-volume data ingestion system that allows users to load streaming data into HDFS. Typical use cases for Fl..
In the previous post I wrote about the importance of the Open Data Platform initiative. In this tutorial I will go over the steps of insta..
"Can we put it in Hadoop?" is a question I get asked more and more often as customers start to recognize "big data" as something that migh..
Apache Sqoop is a bulk data transferring tool that can link traditional relational databases (like Oracle Database) and Apache Hadoop (HDF..
Several weeks ago I had to compare three machine learning algorithm implementations and decide if one of them performed significantly bett..
If you are planning to run Hadoop on a 64-bit OS you might want to compile it from source instead of using the pre-built 32-bit i386-Linux..