Configuring JupyterHub

Posted on Mon 04 December 2017 in JupyterHub

The process of creating a JupyterHub using Kubernetes is very well documented at http://zero-to-jupyterhub.readthedocs.io/en/latest/. That said, if we are willing to be the slightest bit opinionated, that is, to select Google Cloud as our cloud provider and GitHub OAuth as our authentication method, the instructions can be significantly streamlined.

Components

The Zero-to-JupyterHub method is already opinionated in that it is built around the use of Kubernetes as its orchestration software. We take this a step further. This guide will use:

  • Google Cloud and the Google Container Engine (GKE)
      • Google Cloud works better for managing a Kubernetes cluster because Kubernetes was created by Google
      • we'll say "geek" for ease
  • Kubernetes to manage resources on the cloud
  • Helm to configure and control Kubernetes
      • Helm is the package manager for Kubernetes
  • Docker to use containers that standardize computing environments
  • JupyterHub to manage users and deploy Jupyter notebooks
  • GitHub OAuth for authentication

Configure Google Cloud

We will first need to configure a Google Cloud Account.

  1. Go to https://console.cloud.google.com.
  2. If this is a new Google Cloud Account, click "Sign Up for Free Trial".
  3. You will be prompted to enter a credit card. The free trial will give us $300 in credits. This should be sufficient for our purposes. If you are uncomfortable entering a credit card, you may use Joshua's.
  4. Click the hamburger icon in the top left (the icon has three horizontal lines in one button). Go to “APIs & Services > Dashboard”.
  5. Click "Enable APIs & Services".
  6. Click "Google Compute Engine API". Click "Enable".
  7. Return to the Dashboard and enable the following APIs (you may need to search for them):
      • Google Container Engine API
      • Google Container Registry API
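
The same APIs can probably be enabled from a terminal once the gcloud tools from the next section are installed. The following is a minimal sketch, assuming a gcloud release that provides `gcloud services enable` and assuming the three service names below correspond to the APIs listed above. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences for this example, not part of gcloud; by default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: enable the three APIs from the command line instead of the console.
# DRY_RUN and run() are illustrative helpers, not part of gcloud.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumed service names for Compute Engine, Container Engine,
# and Container Registry.
APIS="compute.googleapis.com container.googleapis.com containerregistry.googleapis.com"

enable_apis() {
  for api in $APIS; do
    run gcloud services enable "$api"
  done
}

enable_apis
```

Set `DRY_RUN=0` to actually run the commands against the currently configured project.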

Install and Configure gcloud command line tools

We will do all of our cluster management with the gcloud command line tool, which sends commands to Google Cloud and lets us do things like create and delete clusters. We first need to install and configure it. See the gcloud downloads page or the gcloud documentation for more information on the gcloud SDK.

Install gcloud via Interactive Installer

We recommend the use of the Interactive Installer to install the tools:

curl https://sdk.cloud.google.com | bash
exec -l $SHELL

Here we curl a script from google.com and pipe (|) this script to bash for immediate execution.

When this process has been completed the gcloud tools will also need to be initialized.

Initialize the gcloud command-line tools.

gcloud init
  1. Visit the suggested link in your browser. Retrieve the verification code and paste it into your shell session.

  2. Select the automatically created project.

  3. Configure Google Compute Engine with a default zone. I chose us-west1-a.

Install kubectl for controlling Kubernetes.

gcloud components install kubectl

Create a Kubernetes cluster

Create a Kubernetes cluster on Google Cloud by typing in the following command:

gcloud container clusters create jhub \
    --num-nodes=4 \
    --machine-type=n1-standard-2 \
    --zone=us-west1-a

You should see something like this:

NAME  ZONE        MASTER_VERSION  MASTER_IP     MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
jhub  us-west1-a  1.6.9           35.197.83.28  n1-standard-2  1.6.9         4          RUNNING

To test if your cluster is initialized, run:

kubectl get node

You should see something like this:

NAME                                  STATUS    AGE       VERSION
gke-jhub-default-pool-065452e9-0w73   Ready     6m        v1.6.9
gke-jhub-default-pool-065452e9-4hpb   Ready     6m        v1.6.9
gke-jhub-default-pool-065452e9-4lpj   Ready     6m        v1.6.9
gke-jhub-default-pool-065452e9-pct2   Ready     6m        v1.6.9
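
When scripting the setup, it can be handy to check the node count programmatically rather than by eye. Here is a minimal sketch, assuming the table layout shown above (STATUS in the second column). The function name `count_ready_nodes` is made up for this example, and the sample text is copied from the output above so the snippet runs without a cluster.

```shell
# count_ready_nodes: read `kubectl get node` table output on stdin and
# print how many rows have a STATUS column equal to "Ready".
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Sample output copied from above, used here in place of a live cluster.
sample_nodes='NAME                                  STATUS    AGE       VERSION
gke-jhub-default-pool-065452e9-0w73   Ready     6m        v1.6.9
gke-jhub-default-pool-065452e9-4hpb   Ready     6m        v1.6.9
gke-jhub-default-pool-065452e9-4lpj   Ready     6m        v1.6.9
gke-jhub-default-pool-065452e9-pct2   Ready     6m        v1.6.9'

printf '%s\n' "$sample_nodes" | count_ready_nodes   # prints 4
```

Against a real cluster the same function would be fed with `kubectl get node | count_ready_nodes`, and the result compared with the `--num-nodes` value used at creation.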

Set up Helm

Helm is a package manager for Kubernetes Applications. In the words of the developers behind Helm:

Helm is the best way to find, share, and use software built for Kubernetes.

Here, we will be using it to install our JupyterHub software. Helm uses "Charts" to describe a Kubernetes application. In a moment we will download the Helm Chart defining JupyterHub.

Helm provides an installer script we can run in order to install the software:

curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash

As before we curl the script and pipe it to bash.

Once the installation has completed, Helm can be initialized with

helm init

Now we are ready to use Helm to install our application.

Install JupyterHub

In order to install JupyterHub, we first create a config.yaml file to define the configuration of our system.

Run the following commands to create the directory and files we will need for our project:
mkdir -p ~/src/jhub
cd ~/src/jhub
touch config.yaml

We will also need two random hex strings as security tokens. These can be generated by running the following commands:

openssl rand -hex 32
openssl rand -hex 32
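
To avoid copying the strings by hand, the two tokens can be captured in shell variables. This is a small sketch under a few assumptions: the function name `gen_token` and the variable names are invented for this example, and the fallback branch assumes `/dev/urandom` and `od` are available when `openssl` is not.

```shell
# gen_token: print 32 random bytes as 64 lowercase hex characters.
gen_token() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback (assumption: /dev/urandom and od exist on this system).
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

COOKIE_SECRET="$(gen_token)"   # will stand in for #STRING1#
SECRET_TOKEN="$(gen_token)"    # will stand in for #STRING2#

echo "cookieSecret: ${COOKIE_SECRET}"
echo "secretToken:  ${SECRET_TOKEN}"
```

Each value should be 64 hex characters long, since 32 bytes are encoded as two hex digits each.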

Using the two strings you just generated, add the following to config.yaml using your favorite text editor:

data:
  admin-users: #YOUR-GITHUB-HANDLE#
hub:
  cookieSecret: #STRING1#
proxy:
  secretToken: #STRING2#
# auth:
#   type: github
#   github:
#     clientId:
#     clientSecret:
#     callbackUrl:
singleuser:
  image:
    name: jupyter/scipy-notebook
    tag: ae885c0a6226
  memory:
    limit: 600M
    guarantee: 600M
  cpu:
    limit: 0.6
    guarantee: 0.6
Next, we download the Helm Chart.

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/

You should see

"jupyterhub" has been added to your repositories

We next sync our local Helm cache with the remote by running

helm repo update

You should see

Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "jupyterhub" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈

Finally, we use the Helm Chart to install JupyterHub on our Kubernetes Cluster

helm install jupyterhub/jupyterhub \
    --name=jhub \
    --namespace=jhub \
    -f config.yaml

When this has completed we can view the status of our pods via

kubectl --namespace=jhub get pod

You should see:

NAME                              READY     STATUS    RESTARTS   AGE
hub-deployment-2708999902-zt8zs   1/1       Running   0          6m
proxy-deployment-51742714-8dgxp   1/1       Running   0          6m

We can obtain our cluster's public IP address via

kubectl --namespace=jhub get svc proxy-public

You should see:

NAME           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
proxy-public   10.11.243.94   104.196.245.19   80:32443/TCP   6m
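
The external IP is needed again in the OAuth setup, so it may be convenient to pull it out of the table. Below is a minimal sketch that looks up the EXTERNAL-IP column by its header name, which should tolerate kubectl versions that print extra columns. The function name `external_ip` is invented for this example, and the sample text is copied from the output above.

```shell
# external_ip [SERVICE]: read `kubectl get svc` table output on stdin and
# print the EXTERNAL-IP value for SERVICE (default: proxy-public).
# The column is located via the header row rather than a fixed position.
external_ip() {
  awk -v svc="${1:-proxy-public}" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "EXTERNAL-IP") col = i; next }
    col && $1 == svc { print $col }
  '
}

# Sample output copied from above, used here in place of a live cluster.
sample_svc='NAME           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
proxy-public   10.11.243.94   104.196.245.19   80:32443/TCP   6m'

EXTERNAL_IP="$(printf '%s\n' "$sample_svc" | external_ip)"
echo "http://${EXTERNAL_IP}"
```

With a live cluster the input would come from `kubectl --namespace=jhub get svc proxy-public | external_ip`. While the load balancer is still being provisioned, the column shows `<pending>` instead of an address, so the command may need to be repeated after a minute or two.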

If you navigate to this IP in your browser, you should see the JupyterHub landing page.

Note that we do not have a login as we have not yet configured one. This was intentional, as we will be using GitHub OAuth to provide authentication into our Hub.

GitHub OAuth Authentication

Go to your personal GitHub account to register a new GitHub OAuth Application.

  1. Visit Settings.
  2. On the Settings Page, under Developer Settings, select OAuth Apps.
  3. Under Developer Applications, click "Register a new application".
  4. Give the Application a name such as "JHub-#LOCATION#"
  5. Under Homepage URL enter http://#EXTERNAL_IP# replacing #EXTERNAL_IP# with the External-IP for the JupyterHub Cluster.
  6. Under Application Description enter the following text from the JupyterHub documentation:

JupyterHub, a multi-user Hub, spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. JupyterHub can be used to serve notebooks to a class of students, a corporate data science group, or a scientific research group.

  7. Under Authorization callback URL enter http://#EXTERNAL_IP#/hub/oauth_callback.
  8. Click "Register Application".
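
The two URLs entered in the form differ only in their path, so they can be derived from the external IP in one place. A tiny sketch follows; the function name `oauth_urls` is hypothetical and the IP is the sample address from the service output earlier in this post.

```shell
# oauth_urls EXTERNAL_IP: print the Homepage URL and the Authorization
# callback URL that the GitHub form asks for, one per line.
oauth_urls() {
  echo "http://$1"
  echo "http://$1/hub/oauth_callback"
}

oauth_urls "104.196.245.19"
```

The second line is the value that must later match the callback URL in config.yaml exactly.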

Obtain the Security Credentials for the new Application. From the Application page copy the Client ID and Client Secret.

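In the config.yaml file, uncomment the auth block and fill in clientId, clientSecret, and callbackUrl with the Client ID, the Client Secret, and the Authorization callback URL you entered.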
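
For reference, the filled-in block could be generated as follows. This is only a sketch: the function name `render_auth_block` is made up, the key names are taken from the commented-out section of config.yaml shown earlier, and the credentials in the example call are dummy values rather than real ones.

```shell
# render_auth_block CLIENT_ID CLIENT_SECRET EXTERNAL_IP
# Print an auth section using the keys from the commented-out block
# in config.yaml. Values are quoted so YAML treats them as strings.
render_auth_block() {
  cat <<EOF
auth:
  type: github
  github:
    clientId: "$1"
    clientSecret: "$2"
    callbackUrl: "http://$3/hub/oauth_callback"
EOF
}

# Dummy credentials for illustration only.
render_auth_block "my-client-id" "my-client-secret" "104.196.245.19"
```

The printed block is meant to replace the commented-out auth lines in config.yaml, not to be added alongside them, so that the file contains a single auth section.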

Then apply the updated configuration by running:

helm upgrade jhub jupyterhub/jupyterhub -f config.yaml