Installation of IBM Open Platform with Apache Hadoop (version 4.1)


In a previous post I wrote about the importance of IBM Open Data Platform and I also wrote a brief HOWTO on installing BigInsights 4.0. Well, BigInsights 4.1 has been out for some time now and it is about time I put together a quick and dirty installation guide on 4.1.

Prerequisites

  • You will need a machine running Red Hat Enterprise Linux Server 6 or 7. These are the only operating systems supported by BigInsights 4.1, as you can confirm in the detailed system requirements.
    The machine can be a VM, as long as it has enough resources. According to IBM’s documentation, you will need at least 24 GB of RAM and 80 GB of free disk space.
    I have successfully deployed the Open Platform, with some of the value adds on top, in a VM with 12 GB of RAM without encountering any issues. This was a test configuration, however, so I recommend following the official documentation for production deployments.
  • You’ll need to download the IBM Open Platform RPMs from https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=iopah4. You will need to register for an IBMid but it’s a quick and straightforward process.
    IOP 4.1 Download
  • Note that the machine I will be using in this tutorial is called big.example.com, its IP address is 192.168.59.200, and it is running Red Hat Enterprise Linux Server 6.7.

System preparation

Start by making sure that all devices have assigned UUIDs and confirm that /etc/fstab is using the UUIDs for identifying the devices.

[root@big ~]# blkid
/dev/mapper/vg_big-lv_root: UUID="35757d7d-a8d8-4433-be20-e72141dab488" TYPE="ext4"
/dev/sda1: UUID="aef9dc20-b6f7-447b-95ef-45d0f7c9df36" TYPE="ext4"
/dev/sda2: UUID="tIQ4ff-IjOs-Sp9H-nR5M-SIu7-7ijC-RLjGT5" TYPE="LVM2_member"
/dev/mapper/vg_big-lv_swap: UUID="e24d88ef-5a30-4fc3-abff-975a8c42c8f1" TYPE="swap"
/dev/mapper/vg_big-lv_home: UUID="22a70f06-2381-47d1-8930-693c069b25ae" TYPE="ext4"
[root@big ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Tue Sep  8 19:43:07 2015
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=35757d7d-a8d8-4433-be20-e72141dab488 /                       ext4    defaults        1 1
UUID=aef9dc20-b6f7-447b-95ef-45d0f7c9df36 /boot                   ext4    defaults        1 2
UUID=22a70f06-2381-47d1-8930-693c069b25ae /home                   ext4    defaults        1 2
UUID=e24d88ef-5a30-4fc3-abff-975a8c42c8f1 swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
[root@big ~]#
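
If the fstab is long, the comparison can be automated. The following is a minimal sketch, not part of the official procedure; the function name fstab_dev_entries is made up for illustration. It lists every entry whose first field is a /dev/... path rather than a UUID.

```shell
# Print every fstab entry that identifies its device by a /dev path
# instead of UUID=... (comment lines and pseudo filesystems are skipped).
# Usage: fstab_dev_entries [FILE]   (FILE defaults to /etc/fstab)
fstab_dev_entries() {
    awk '$1 !~ /^#/ && $1 ~ /^\/dev\// { print $1 " -> " $2 }' "${1:-/etc/fstab}"
}
```

Empty output means no entry refers to a device by path. Any line that is printed names a device and mount point that should be switched to the UUID reported by blkid.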

Confirm that /etc/hosts has an entry for your host name and that you can resolve the name to an IP address.

[root@big ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.59.200  big.example.com big
[root@big ~]# ping big.example.com
PING big.example.com (192.168.59.200) 56(84) bytes of data.
64 bytes from big.example.com (192.168.59.200): icmp_seq=1 ttl=64 time=0.015 ms
64 bytes from big.example.com (192.168.59.200): icmp_seq=2 ttl=64 time=0.018 ms
^C
--- big.example.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1201ms
rtt min/avg/max/mdev = 0.015/0.016/0.018/0.004 ms
[root@big ~]#
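
For a scripted check of the same thing, here is a small sketch that looks a name up in a hosts file. The function name hosts_lookup is hypothetical, and the function only reads the file; it does not exercise DNS or the rest of the resolver.

```shell
# Print the address that a hosts file maps NAME to (first match wins).
# Usage: hosts_lookup NAME [FILE]   (FILE defaults to /etc/hosts)
hosts_lookup() (
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # drop trailing comments
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
)
```

On the machine used in this tutorial, both hosts_lookup big.example.com and hosts_lookup big should print 192.168.59.200. No output means the name is missing from the file.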

Make sure that the hostname command returns correct values for the long and short host name. Do not skip this step: the Big SQL value add cannot be installed later without it, because its monitoring script relies on getting correct long and short host names.

[root@big ~]# hostname --long
big.example.com
[root@big ~]# hostname --short
big
[root@big ~]#
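
The relationship between the two names can also be checked in a script. This is a sketch under the assumption that "correct" means the short name has no dots and is the first label of a fully qualified long name; the function name hostnames_consistent is invented for this example.

```shell
# Succeed only when SHORT contains no dots and LONG is SHORT followed by
# a dot and a non-empty domain part.
# Usage: hostnames_consistent LONG SHORT
hostnames_consistent() {
    case $2 in
        ''|*.*) return 1 ;;
    esac
    case $1 in
        "$2".?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A typical call would be hostnames_consistent "$(hostname --long)" "$(hostname --short)", which returns a non-zero status if, for example, both commands report the same unqualified name.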

Next we have to configure password-less login with SSH keys.

[root@big ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
a5:39:a1:71:fe:92:4f:5d:4b:66:ae:f9:0b:49:84:08 root@big
The key's randomart image is:
+--[ RSA 2048]----+
|      E          |
|       . . .     |
|      . + o .    |
|       = = .     |
|      . S   . =  |
|         + o B . |
|        o o + o  |
|         +   +   |
|          . o.o. |
+-----------------+
[root@big ~]# cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
[root@big ~]# chmod 700 /root/.ssh
[root@big ~]# chmod 640 /root/.ssh/authorized_keys
[root@big ~]#
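
The transcript above overwrites authorized_keys, which is fine on a fresh machine. If the file might already contain keys, an append-if-missing variant is safer. The sketch below is one way to do it; the function name authorize_key is hypothetical, and it sets mode 600 on the file, which is stricter than the 640 used above.

```shell
# Add the public key in PUBKEY_FILE to SSH_DIR/authorized_keys unless the
# exact same line is already there, then tighten the permissions.
# Usage: authorize_key PUBKEY_FILE [SSH_DIR]   (SSH_DIR defaults to ~/.ssh)
authorize_key() (
    pubkey_file=$1
    ssh_dir=${2:-$HOME/.ssh}
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || exit 1
    touch "$ssh_dir/authorized_keys"
    # -x: whole-line match, -F: fixed string, -f: read the pattern from file
    grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys" ||
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
)
```

Running authorize_key /root/.ssh/id_rsa.pub twice leaves a single copy of the key in the file.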

Make sure both the short and long host names are added to known_hosts.

[root@big ~]# ssh root@big.example.com date
The authenticity of host 'big.example.com (192.168.59.200)' can't be established.
RSA key fingerprint is 26:15:71:25:08:a9:56:c7:06:c4:a0:e6:d1:57:45:b0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'big.example.com,192.168.59.200' (RSA) to the list of known hosts.
Tue Sep  1 09:10:16 BST 2015
[root@big ~]# ssh root@big date
The authenticity of host 'big (192.168.59.200)' can't be established.
RSA key fingerprint is 26:15:71:25:08:a9:56:c7:06:c4:a0:e6:d1:57:45:b0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'big' (RSA) to the list of known hosts.
Tue Sep  1 09:10:23 BST 2015
[root@big ~]#

Disable SELinux permanently by setting SELINUX to disabled in /etc/selinux/config.

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Don’t forget to reboot the machine for the changes to take effect.
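
If you prefer not to edit the file by hand, the change can be made with sed. This is a minimal sketch assuming GNU sed (as shipped with RHEL) for the -i option; the function name selinux_disable is made up for illustration.

```shell
# Rewrite the SELINUX= line of a config file to "disabled".
# The trailing "=" in the pattern keeps SELINUXTYPE= untouched.
# Usage: selinux_disable [FILE]   (FILE defaults to /etc/selinux/config)
selinux_disable() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "${1:-/etc/selinux/config}"
}
```

After the reboot, getenforce should print Disabled.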

Disable the firewall (again, the assumption is that this is an isolated test system).

[root@big ~]# chkconfig iptables off
[root@big ~]# service iptables stop
[root@big ~]#

Next we have to disable Transparent Huge Pages. Append the following to /etc/rc.local:

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi

To avoid another restart we can also disable Transparent Huge Pages in the running OS:

[root@big ~]# echo never > /sys/kernel/mm/transparent_hugepage/enabled
[root@big ~]#
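
To confirm the setting took effect, read the file back: the kernel lists all modes and puts the active one in square brackets, for example "always madvise [never]". The sketch below extracts the bracketed value; the function name thp_mode is hypothetical. On some RHEL 6 kernels the directory may be named redhat_transparent_hugepage instead, in which case pass that path as the argument.

```shell
# Print the active Transparent Huge Pages mode, i.e. the word shown in
# square brackets in the sysfs file.
# Usage: thp_mode [FILE]
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}
```

After the echo command above, thp_mode should print never.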

We also have to disable IPv6. Set the following parameters in /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Use sysctl to load the new settings from sysctl.conf.

[root@big ~]# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
[root@big ~]#
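
When this step is scripted, appending the three lines blindly can leave duplicate or conflicting entries if the script is run twice. Here is a sketch of an idempotent helper, assuming GNU sed; the function name sysctl_set is invented for this example.

```shell
# Set "KEY = VALUE" in a sysctl-style file: rewrite the line if the key is
# already present, append it otherwise.
# Usage: sysctl_set KEY VALUE [FILE]   (FILE defaults to /etc/sysctl.conf)
sysctl_set() (
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    # Escape the dots so they match literally in the regular expressions.
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*$pat[[:space:]]*=" "$file"; then
        sed -i "s|^[[:space:]]*$pat[[:space:]]*=.*|$key = $value|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
)
```

For example, sysctl_set net.ipv6.conf.all.disable_ipv6 1 followed by sysctl -p applies the first of the three settings.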

Finally, we have to configure and enable ntpd. If NTP isn’t already installed, start by adding the ntp package.

[root@big ~]# yum install -y ntp
Loaded plugins: fastestmirror, refresh-packagekit
Setting up Install Process
...
Complete!
[root@big ~]#

Make sure /etc/ntp.conf lists a set of NTP servers, then enable and start the ntpd daemon.

[root@big ~]# chkconfig --add ntpd
[root@big ~]# chkconfig ntpd on
[root@big ~]# service ntpd start
Starting ntpd:                                             [  OK  ]
[root@big ~]#

Make sure that the host is now synchronized:

[root@big ~]# ntpstat
synchronised to NTP server (94.125.132.7) at stratum 3
   time correct to within 354 ms
   polling server every 64 s
[root@big ~]#
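
Right after ntpd starts, ntpstat usually reports that the clock is not yet synchronised, and it can take a few minutes to settle. A small retry loop avoids checking by hand. The sketch below relies on ntpstat exiting with status 0 only when the clock is synchronised; the function name retry and the numbers in the example are arbitrary choices.

```shell
# Run a command up to TRIES times, sleeping DELAY seconds after each
# failure; succeed as soon as the command succeeds.
# Usage: retry TRIES DELAY COMMAND [ARG...]
retry() (
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && exit 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    exit 1
)
# Example: poll ntpstat every 10 seconds for up to 5 minutes.
#   retry 30 10 ntpstat
```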

Installing additional packages

With 4.1 we don’t have to manually install MySQL and PostgreSQL anymore. The only additional packages we need are nc and OpenJDK 1.8.0.

[root@big ~]# yum install -y nc
Loaded plugins: fastestmirror, refresh-packagekit
Setting up Install Process
...
Installed:
  nc.x86_64 0:1.84-24.el6
Complete!
[root@big ~]# yum install -y java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64
Loaded plugins: fastestmirror, refresh-packagekit
Setting up Install Process
...
Installed:
  java-1.8.0-openjdk.x86_64 1:1.8.0.51-1.b16.el6_7                                           java-1.8.0-openjdk-devel.x86_64 1:1.8.0.51-1.b16.el6_7
Dependency Installed:
  giflib.x86_64 0:4.1.6-3.1.el6          java-1.8.0-openjdk-headless.x86_64 1:1.8.0.51-1.b16.el6_7       jpackage-utils.noarch 0:1.7.5-3.14.el6       ttmkfdir.x86_64 0:3.0.9-32.1.el6
  tzdata-java.noarch 0:2015f-1.el6       xorg-x11-fonts-Type1.noarch 0:7.2-11.el6
Complete!
[root@big ~]#
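
Before moving on, it can be worth confirming that the tools are actually on the PATH. This is a minimal sketch; the function name missing_commands is hypothetical.

```shell
# Print the name of every given command that cannot be found on the PATH.
# Usage: missing_commands COMMAND...
missing_commands() (
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
)
```

Here, missing_commands nc java javac should print nothing once both packages are installed.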

Installing IBM Open Platform 4.1

Copy the iop-4.1.0.0-1.el6.x86_64.rpm package, which you’ve downloaded from the IBM website, onto the host and install it.

[root@big ~]# yum install -y iop-4.1.0.0-1.el6.x86_64.rpm
Loaded plugins: fastestmirror, refresh-packagekit
Setting up Install Process
...
Installed:
  IOP.x86_64 0:4.1.0.0-1
Complete!
[root@big ~]#

This package adds the IBM Open Platform repository to the local Yum configuration. We should now be able to use Yum to install Ambari.

[root@big ~]# yum clean all
Loaded plugins: fastestmirror, refresh-packagekit
Cleaning repos: BI_AMBARI-2.1.0 base extras updates
Cleaning up Everything
Cleaning up list of fastest mirrors
[root@big ~]# yum install -y ambari-server
Loaded plugins: fastestmirror, refresh-packagekit
Setting up Install Process
Determining fastest mirrors
...
Installed:
  ambari-server.x86_64 0:2.1.0_IBM-4

Dependency Installed:
  postgresql.x86_64 0:8.4.20-3.el6_6                        postgresql-libs.x86_64 0:8.4.20-3.el6_6                        postgresql-server.x86_64 0:8.4.20-3.el6_6

Complete!
[root@big ~]#

Run ambari-server setup to configure Ambari, and use the -j option to point it at the OpenJDK 1.8.0 installation.

[root@big ~]# ambari-server setup -j /usr/lib/jvm/java-1.8.0-openjdk.x86_64/
Using python  /usr/bin/python2.6
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)?
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Checking JDK...
WARNING: JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk must be valid on ALL hosts
WARNING: JCE Policy files are required for configuring Kerberos security. If you plan to use Kerberos,please make sure JCE Unlimited Strength Jurisdiction Policy Files are valid on all hosts.
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)?
Configuring database...
Default properties detected. Using built-in database.
Configuring ambari database...
Checking PostgreSQL...
Running initdb: This may take upto a minute.
Initializing database: [  OK  ]

About to start PostgreSQL
Configuring local database...
Connecting to local database...done.
Configuring PostgreSQL...
Restarting PostgreSQL
Extracting system views...
.ambari-admin-2.1.0_IBM_4.jar
.....
Adjusting ambari-server permissions and ownership...
Ambari Server 'setup' completed successfully.
[root@big ~]#

Start Ambari.

[root@big ~]# ambari-server start
Using python  /usr/bin/python2.6
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................
Ambari Server 'start' completed successfully.
[root@big ~]#

Open a web browser and connect to the IP address of the target host at port 8080.

IOP 4.1 Ambari Login

Log in as admin/admin (the default Ambari administrative account).

IOP 4.1 Installation Wizard

At the Welcome page click Launch Install Wizard.

IOP 4.1 Get Started

Provide a name for your IBM Open Platform cluster and click Next.

IOP 4.1 Select Stack

Select the BigInsights 4.1 services stack and click Next.

IOP 4.1 Install Options

Provide the fully qualified name of your target host. Copy the contents of the SSH private key (/root/.ssh/id_rsa) and paste it in the Host Registration Information section. Click Register and Confirm.

IOP 4.1 Confirm Hosts

Select the target host and click Next. The system will run a set of checks and after the validation completes you will be presented with a list of services you can deploy as part of the installation.

IOP 4.1 Select Services

Leave the default service selection as it is and click Next.

IOP 4.1 Assign Masters

You can’t actually assign masters as you’ve only got a single machine so leave everything as it is and click Next.

IOP 4.1 Assign Slaves

Again, no role separation is possible in a single node installation so just click Next.

IOP 4.1 Customize Services

Select the individual tabs and provide the missing configuration information for Oozie and Knox. This mostly concerns default administrative users and passwords. Once you’ve set credentials for every flagged service, the red markers will go away and you can click Next.

IOP 4.1 Review

Review the summary and click Deploy. This initiates the services deployment. Once the services are installed and configured they will start automatically.

IOP 4.1 Install and Start

Click Next.

IOP 4.1 Summary

Review the installation summary and click Complete.

IOP 4.1 Ambari Dashboard

Congratulations! You now have a working installation of the IBM Open Platform 4.1 with Apache Hadoop. You can see the available services on the left-hand side of the Ambari console and the Dashboard tab provides a general overview of the system.

Test your installation (optional)

Open a console and become the ambari-qa user.

[root@big ~]# su - ambari-qa
[ambari-qa@big ~]$

Run the following commands to initiate a series of test jobs. Make sure every single job completes successfully.

[ambari-qa@big ~]$ export HADOOP_MR_DIR=/usr/iop/current/hadoop-mapreduce-client
[ambari-qa@big ~]$ yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teragen 1000 /tmp/tgout
15/09/01 10:50:23 INFO impl.TimelineClientImpl: Timeline service address: http://big.example.com:8188/ws/v1/timeline/
...
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=100000
[ambari-qa@big ~]$ yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar terasort /tmp/tgout /tmp/tsout
15/09/01 10:51:08 INFO terasort.TeraSort: starting
...
15/09/01 10:51:33 INFO mapreduce.Job: Job job_1441100130926_0008 completed successfully
...
        File Input Format Counters
                Bytes Read=100000
        File Output Format Counters
                Bytes Written=100000
15/09/01 10:51:33 INFO terasort.TeraSort: done
[ambari-qa@big ~]$ yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teravalidate /tmp/tsout /tmp/tvout
15/09/01 10:53:36 INFO impl.TimelineClientImpl: Timeline service address: http://big.example.com:8188/ws/v1/timeline/
...
15/09/01 10:53:56 INFO mapreduce.Job: Job job_1441100130926_0009 completed successfully
...
        File Input Format Counters
                Bytes Read=100000
        File Output Format Counters
                Bytes Written=21
[ambari-qa@big ~]$
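
The job output is long, so it is easy to miss a failure when scrolling through it. One option is to save each job's output to a file and search it for the success line shown above. The sketch below assumes the output was captured with both streams redirected (for example by appending 2>&1 | tee teragen.log to the yarn command); the function name job_succeeded and the log file name are hypothetical.

```shell
# Succeed when a saved MapReduce job log contains the
# "Job job_<id> completed successfully" line.
# Usage: job_succeeded LOGFILE
job_succeeded() {
    grep -q 'mapreduce.Job: Job job_[0-9_]* completed successfully' "$1"
}
# Example, after capturing the output of one of the jobs above:
#   job_succeeded teragen.log && echo "teragen OK"
```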