This is a quick tutorial on installing Jupyter and setting up the PySpark and the R kernel (IRkernel) for Spark development. The pre-reqs ..
It's been over a month since IBM released version 4.2 of their Hadoop distribution (BigInsights), so I decided to do a quick wirte up on t..
Here is a quick recording about Apache SystemML - the declarative large-scale machine learning platform.
If you feel like installing an..
Apache SystemML is a declarative, large-scale machine learning platform that provides automatic optimisation for custom machine learning a..
This is a brief note on doing some rough estimates for sizing Hadoop worker nodes.
This post is in no way exhaustive and there is much..
In Part I we looked at Apache Hive, Cloudera Impala, and IBM Big SQL. We now continue with Pivotal HAWQ, Apache Drill, Spark SQL, Presto, ..
Apache Zeppelin is a multipurpose notebook for interactive data analytics. It supports Scala and Python (with Spark), SparkSQL, Hive, Shel..
Offloading relational data to HDFS is a common use-case for Hadoop known as "online archive". This approach allows companies to solve a qu..
We can attribute the first attempt at getting SQL functionality on top of Hadoop to Apache Hive. Initially developed at Facebook, this ope..