This is a quick tutorial on installing Jupyter and setting up the PySpark and the R kernel (IRkernel) for Spark development. The pre-reqs ..
It's been over a month since IBM released version 4.2 of their Hadoop distribution (BigInsights), so I decided to do a quick write-up on t..
It's been a couple of weeks since I was accepted into the closed beta testing programme for IBM Data Science Experience (DSX), and it is abo..
Here is a quick recording about Apache SystemML - the declarative large-scale machine learning platform.
If you feel like installing an..
Apache SystemML is a declarative, large-scale machine learning platform that provides automatic optimisation for custom machine learning a..
This is a brief note on doing some rough estimates for sizing Hadoop worker nodes.
This post is in no way exhaustive and there is much..
The Curse of Dimensionality, a term initially introduced by Richard Bellman, is a phenomenon that arises when applying machine learning alg..
I recently needed to generate some data for one variable as a function of another, with some added Gaussian noise. This comes in handy when you wan..
In Part I we looked at Apache Hive, Cloudera Impala, and IBM Big SQL. We now continue with Pivotal HAWQ, Apache Drill, Spark SQL, Presto, ..