Installing Apache Zeppelin 0.5.5


Apache Zeppelin is a multipurpose notebook for interactive data analytics. It supports Scala and Python (with Spark), SparkSQL, Hive, Shell, and Markdown interpreters.

In this tutorial I'll quickly show you how to install Apache Zeppelin on the IBM BigInsights Hadoop distribution (using the QSE VMware image).

Power on your BigInsights VM and make sure the relevant services (e.g. HDFS, YARN, MapReduce, and Spark) are up and running.

BigInsights Ambari Dashboard

Log in as root and create a dedicated account for running Zeppelin. The account is placed in the hadoop group.

[root@rvm ~]# useradd zeppelin -g hadoop
[root@rvm ~]# passwd zeppelin
Changing password for user zeppelin.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@rvm ~]#
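
If you want to confirm that the account ended up in the right group before moving on, a small check like the one below should do. This is only a sketch: has_primary_group is a hypothetical helper name, not something shipped with BigInsights or Zeppelin, and it relies on the standard id command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zeppelin or BigInsights).
# Succeeds when the user's primary group matches the expected group.
has_primary_group() {
    user=$1
    group=$2
    # id -gn prints the name of the user's primary group
    [ "$(id -gn "$user" 2>/dev/null)" = "$group" ]
}

# Example, run as root after creating the account:
#   has_primary_group zeppelin hadoop && echo "zeppelin account looks fine"
```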

Switch to the newly created account and download Apache Zeppelin into its home directory.

[root@rvm ~]# su - zeppelin
[zeppelin@rvm ~]$ wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/incubator/zeppelin/0.5.5-incubating/zeppelin-0.5.5-incubating-bin-all.tgz
--2015-12-10 21:50:42--  http://mirrors.ukfast.co.uk/sites/ftp.apache.org/incubator/zeppelin/0.5.5-incubating/zeppelin-0.5.5-incubating-bin-all.tgz
Resolving mirrors.ukfast.co.uk... 78.109.175.117
Connecting to mirrors.ukfast.co.uk|78.109.175.117|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 499990209 (477M) [application/x-gzip]
Saving to: “zeppelin-0.5.5-incubating-bin-all.tgz”

100%[===========================================================================================================================>] 499,990,209  630K/s   in 12m 52s

2015-12-10 22:03:34 (632 KB/s) - “zeppelin-0.5.5-incubating-bin-all.tgz” saved [499990209/499990209]

[zeppelin@rvm ~]$
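
The archive is close to 500 MB, so it is worth checking that the download is complete before unpacking it. The sketch below compares the file size on disk with the length wget reported above (499990209 bytes). check_size is a hypothetical helper name, and a size match is only a sanity check, not a substitute for verifying a checksum or signature.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's size in bytes with an expected value.
# Returns 0 when they match, 1 otherwise.
check_size() {
    file=$1
    expected=$2
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    # wc -c counts bytes; tr strips the padding some wc versions add
    actual=$(wc -c < "$file" | tr -d ' ')
    if [ "$actual" -eq "$expected" ]; then
        echo "size ok: $actual bytes"
    else
        echo "size mismatch: got $actual, expected $expected" >&2
        return 1
    fi
}

# Example, run from /home/zeppelin after the download:
#   check_size zeppelin-0.5.5-incubating-bin-all.tgz 499990209
```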

Unpack the archive, placing Zeppelin in /home/zeppelin/zeppelin-0.5.5.

[zeppelin@rvm ~]$ tar xzf zeppelin-0.5.5-incubating-bin-all.tgz
[zeppelin@rvm ~]$ mv zeppelin-0.5.5-incubating-bin-all zeppelin-0.5.5
[zeppelin@rvm ~]$
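
As an alternative to the tar and mv pair above, the unpack and rename can be done in one step. This is a minimal sketch, assuming the archive keeps everything under a single top-level directory (as the mv command above implies) and that your tar supports --strip-components, which GNU tar does. unpack_into is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: unpack a .tgz into a target directory,
# dropping the archive's single top-level directory.
unpack_into() {
    archive=$1
    target=$2
    mkdir -p "$target" &&
        tar xzf "$archive" -C "$target" --strip-components=1
}

# Example, run from /home/zeppelin:
#   unpack_into zeppelin-0.5.5-incubating-bin-all.tgz /home/zeppelin/zeppelin-0.5.5
```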

Create a zeppelin-env.sh script based on the template provided in zeppelin-0.5.5/conf.

[zeppelin@rvm ~]$ cp /home/zeppelin/zeppelin-0.5.5/conf/zeppelin-env.sh.template /home/zeppelin/zeppelin-0.5.5/conf/zeppelin-env.sh
[zeppelin@rvm ~]$

Edit /home/zeppelin/zeppelin-0.5.5/conf/zeppelin-env.sh, adding the following entries to it. The port is changed because Zeppelin's default, 8080, is already used by Ambari.

export ZEPPELIN_PORT=8090
export SPARK_HOME=/usr/iop/4.1.0.0/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/usr/bin/python

This is what your zeppelin-env.sh should look like after the changes:

…
# export ZEPPELIN_NOTEBOOK_S3_USER      # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
# export ZEPPELIN_IDENT_STRING                  # A string representing this instance of zeppelin. $USER by default.
# export ZEPPELIN_NICENESS                      # The scheduling priority for daemons. Defaults to 0.
export ZEPPELIN_PORT=8090 # Change the default Zeppelin port as 8080 is used by Ambari 
…
#### Spark interpreter configuration ####
## Use provided spark installation ##
## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
##
export SPARK_HOME=/usr/iop/4.1.0.0/spark        # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
# export SPARK_SUBMIT_OPTIONS                   # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".

## Use embedded spark binaries ##
## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
## however, it is not encouraged when you can define SPARK_HOME
##
# Options read in YARN client mode
export HADOOP_CONF_DIR=/etc/hadoop/conf         # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
# Pyspark (supported with Spark 1.2.1 and above)
# To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
export PYSPARK_PYTHON=/usr/bin/python           # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
# export PYTHONPATH
…
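
If you would rather script the change than edit the file by hand, something like the following should work. It is a sketch under one assumption: the template leaves these four variables commented out, so appending the exports at the end of the file has the same effect as editing them in place. append_once and configure_zeppelin_env are hypothetical helper names; the values are the ones used in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact
# line is not already there, so the script is safe to re-run.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: add this tutorial's four settings to a zeppelin-env.sh.
configure_zeppelin_env() {
    env_file=$1
    append_once "$env_file" 'export ZEPPELIN_PORT=8090'
    append_once "$env_file" 'export SPARK_HOME=/usr/iop/4.1.0.0/spark'
    append_once "$env_file" 'export HADOOP_CONF_DIR=/etc/hadoop/conf'
    append_once "$env_file" 'export PYSPARK_PYTHON=/usr/bin/python'
}

# Example:
#   configure_zeppelin_env /home/zeppelin/zeppelin-0.5.5/conf/zeppelin-env.sh
```

Running it a second time leaves the file unchanged, because each line is only added when an identical line is missing.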

Start Zeppelin using the zeppelin-daemon.sh script.

[zeppelin@rvm ~]$ /home/zeppelin/zeppelin-0.5.5/bin/zeppelin-daemon.sh start
Log dir doesn't exist, create /home/zeppelin/zeppelin-0.5.5/logs
Pid dir doesn't exist, create /home/zeppelin/zeppelin-0.5.5/run
Zeppelin start                                             [  OK  ]
[zeppelin@rvm ~]$
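
The daemon script returns before the web server is necessarily ready to answer requests, so the first page load can fail if you are quick. A small retry loop can wait for it. This is a sketch: retry is a hypothetical helper, and the example assumes curl is installed on the VM and that Zeppelin listens on port 8090 as configured above.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or attempts run out.
# usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep only when another attempt is still to come
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example: poll the Zeppelin web UI for up to a minute
#   retry 30 2 curl -sf -o /dev/null http://localhost:8090/ && echo "Zeppelin is up"
```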

Open a browser and point it at the IP address or hostname of the BigInsights VM on port 8090. You should see the home screen of Apache Zeppelin.

Apache Zeppelin Home Screen

Open the Zeppelin Tutorial notebook and run the "Load Data Into Table" example to confirm everything is working properly.

Apache Zeppelin Tutorial Notebook

Enjoy using Apache Zeppelin.