Monday, September 24, 2018

Data Science environment with Docker and Jupyter on the IBM Mainframe

Guide to getting started with Docker, Python and Jupyter Notebook on zLinux.

Here, I'm using Red Hat Enterprise Linux 7.5 on s390x to build and deploy Jupyter Notebook in an Ubuntu container. I will go over the steps used to build and run the Docker container.

In case you're wondering why anyone would do this, check out this snippet from the z14 announcement: "Microservices can be built on z14 with Node.js, Java, Go, Swift, Python, Scala, Groovy, Kotlin, Ruby, COBOL, PL/I, and more. They can be deployed in Docker containers where a single z14 can scale out to 2 million Docker containers".

A few basic commands

Establish the OS release and version. We're running on RHEL 7.5 for s390x.
[cmihai@rh74s390x ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

[cmihai@rh74s390x ~]$ uname -a
Linux rh74s390x.novalocal 3.10.0-693.17.1.el7.s390x #1 SMP Sun Jan 14 10:38:29 EST 2018 s390x s390x s390x GNU/Linux

[cmihai@rh74s390x ~]$ docker --version
Docker version 17.05.0-ce, build 89658be

Set up regular user access, sudo and SSH keys

Create a regular user account

useradd cmihai
passwd cmihai
usermod -aG wheel cmihai
su - cmihai

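Add your SSH public key to authorized_keys and tighten the permissions (sshd ignores key files that are group-writable, which the default RHEL user umask can produce)

mkdir -p ~/.ssh
echo "YOURKEYHERE" >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys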

Log in as your new user, and forward port 9000:

ssh -L 9000:127.0.0.1:9000 -i cmihai.pem cmihai@myzLinux

Set up Docker

Create the Docker group

sudo groupadd docker
sudo usermod -aG docker cmihai
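
New group membership only applies to sessions started after the change, so the docker commands further down can fail with a permission error until you log in again. A minimal sketch for checking this, assuming a POSIX shell; `in_group` is a helper name made up for this example, not part of Docker:

```shell
#!/bin/sh
# in_group: hypothetical helper; succeeds if group $1 appears in the
# space-separated list $2 (the format printed by `id -nG`).
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the groups of the current session.
if in_group docker "$(id -nG)"; then
    echo "docker group is active in this session"
else
    echo "docker group not active yet: log out and back in, or run: newgrp docker"
fi
```

Padding both strings with spaces makes the match exact, so a group such as dockerroot is not mistaken for docker.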

Start Docker

sudo systemctl enable docker
sudo systemctl restart docker.service
sudo systemctl status docker.service

Test Docker

docker run s390x/hello-world

Let’s run a simple Ubuntu interactive shell:

docker run --name s390x-ubuntu --hostname s390x-ubuntu --interactive --tty s390x/ubuntu /bin/bash

Building a Docker container for Jupyter Notebook

Create a Dockerfile from the s390x/ubuntu base image:

FROM s390x/ubuntu
LABEL maintainer="Mihai Criveti"

# Install Python 3, pip and Jupyter
RUN apt-get update \
    && apt-get install -y python3 python3-pip \
    && pip3 install jupyter \
    && apt-get clean

# Default command: start Jupyter Notebook on all interfaces, port 9000
CMD ["jupyter","notebook","--allow-root","--ip=0.0.0.0","--port=9000"]

# Port the notebook server listens on
EXPOSE 9000

Build the container:

docker build . --tag "cmihai/jupyter-lite:v1" -f Dockerfile

Run your new container:

docker run --name jupyter --hostname jupyter -p 9000:9000 cmihai/jupyter-lite:v1

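Connect to Jupyter Notebook

With the SSH tunnel from earlier in place, open http://127.0.0.1:9000 in your local browser and log in with the token from the notebook server's startup output.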
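
The login token appears in the URL that Jupyter logs at startup. A minimal sketch for pulling it back out of the container logs, assuming the container is named jupyter as in the run command above and that the log line contains a `token=` query parameter; `extract_token` is a helper name made up for this example:

```shell
#!/bin/sh
# extract_token: hypothetical helper; reads log text on stdin and prints the
# token from the first line that contains a "token=" query parameter.
extract_token() {
    sed -n 's/.*[?&]token=\([0-9a-zA-Z]*\).*/\1/p' | head -n 1
}

# Only query Docker when it is installed; prints nothing if no token is found.
if command -v docker >/dev/null 2>&1; then
    docker logs jupyter 2>&1 | extract_token
fi
```

Paste the printed value into the token field on the Jupyter login page.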

You can now install dependencies directly from Jupyter:

!apt-get install --yes zlib1g-dev libjpeg-dev

Potential next steps:
  • Consider setting up persistence for your notebooks (e.g. VOLUME ["/notebooks"] in the Dockerfile)
  • Set up Docker Compose and define multi-tier applications, such as connecting your Jupyter Notebook to PostgreSQL, Redis, Spark, etc.
  • Set up other programming languages or kernels (Java, R), or even Zeppelin Notebook
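
As one way to approach the persistence item above, a host directory can be bind-mounted into the container and Jupyter pointed at it. This is a sketch under assumptions: the `/notebooks` path follows the VOLUME hint, `run_with_notebooks` is a helper name made up for this example, and the function only prints the command so you can review it before running it:

```shell
#!/bin/sh
# run_with_notebooks: hypothetical helper; prints (does not execute) a
# docker run command that bind-mounts host directory $1 at /notebooks and
# starts Jupyter with that directory as its notebook root.
run_with_notebooks() {
    printf '%s\n' "docker run --name jupyter --hostname jupyter -p 9000:9000 \
-v $1:/notebooks cmihai/jupyter-lite:v1 \
jupyter notebook --allow-root --ip=0.0.0.0 --port=9000 --notebook-dir=/notebooks"
}

# Show the command for a notebooks directory under the current user's home.
run_with_notebooks "$HOME/notebooks"
```

Create the host directory first (mkdir -p ~/notebooks) and remove any earlier container named jupyter (docker rm -f jupyter) before running the printed command. The host path must be absolute and, in this sketch, free of spaces.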
For an interactive tutorial on using Docker for Data Science, check out: https://github.com/crivetimihai/docker-data-science