This is a quick tutorial on installing Jupyter and setting up the PySpark and R (IRkernel) kernels for Spark development. The pre-reqs ..
It's been over a month since IBM released version 4.2 of their Hadoop distribution (BigInsights), so I decided to do a quick write-up on t..
Apache Zeppelin is a multipurpose notebook for interactive data analytics. It supports Scala and Python (with Spark), SparkSQL, Hive, Shel..
Offloading relational data to HDFS is a common Hadoop use case known as "online archive". This approach allows companies to solve a qu..
Apache Flume is a scalable, high-volume data ingestion system that allows users to load streaming data into HDFS. Typical use cases for Fl..
In the previous post I wrote about the importance of the Open Data Platform initiative. In this tutorial I will go over the steps of insta..
"Can we put it in Hadoop?" is a question I get asked more and more often as customers start to recognize "big data" as something that migh..