The process of creating a JupyterHub using Kubernetes is very well documented in the Zero to JupyterHub guide (http://zero-to-jupyterhub.readthedocs.io/en/latest/). That said, if we are willing to be slightly more opinionated, that is, to select Google Cloud as our cloud provider and GitHub OAuth as our authentication protocol, the instructions can be significantly streamlined.
The Zero-to-JupyterHub method is already opinionated in that it is built around the use of Kubernetes as its orchestration software. We take this a step further. This guide will use:
- Google Cloud and the Google Container Engine (GKE)
  - Kubernetes was created at Google, and GKE provides a managed Kubernetes cluster, which makes Google Cloud a convenient place to run one
  - we'll pronounce GKE as "geek" for ease
- Kubernetes to manage resources on the cloud
- Helm to configure and control Kubernetes
  - Helm is the package manager for Kubernetes
- Docker to use containers that standardize computing environments
- JupyterHub to manage users and deploy Jupyter notebooks
- GitHub OAuth for authentication
Configure Google Cloud
We will first need to configure a Google Cloud Account.
- Go to https://console.cloud.google.com.
- If this is a new Google Cloud Account, click "Sign Up for Free Trial".
- You will be prompted to enter a credit card. The free trial will give us $300 in credits. This should be sufficient for our purposes. If you are uncomfortable entering a credit card, you may use Joshua's.
- Click the hamburger icon in the top left (the button with three horizontal lines). Go to "APIs & Services > Dashboard".
- Click "Enable APIs & Services".
- Click "Google Compute Engine API". Click "Enable".
- Return to the Dashboard and enable the following APIs:
  - Google Container Engine API
  - Google Container Registry API
  - Note: you may need to search for these APIs.
Install and Configure gcloud Command Line Tools
We will be doing all of our cluster management via the gcloud command line tool, so we next need to install and configure it. These tools send commands to Google Cloud and let you do things like create and delete clusters. See the gcloud downloads page or the gcloud documentation for more information on the gcloud SDK.
gcloud via Interactive Installer
We recommend the use of the Interactive Installer to install the tools:
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
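Here we curl a script from google.com and pipe (|) it to bash for immediate execution. The exec -l $SHELL line then restarts your shell as a login shell so that the newly installed tools are on your PATH.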
When this process has completed, the gcloud command-line tools also need to be initialized:
gcloud init
Visit the suggested link in your browser. Retrieve the verification code and paste it into your shell session.
Select the automatically created project.
Configure Google Compute Engine with a default zone (the cluster below is created in us-west1-a).
Next, install kubectl for controlling Kubernetes:
gcloud components install kubectl
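Before creating the cluster, it can be worth confirming that both tools actually landed on your PATH. The sketch below assumes bash; the function name require_tools is hypothetical and is not part of the gcloud SDK. It uses the shell builtin command -v to look up each name and fails if any is missing.

```shell
# Hypothetical pre-flight helper (not part of gcloud or kubectl):
# report any named command that is not on PATH, and fail if one is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage once the SDK and kubectl are installed:
#   require_tools gcloud kubectl
```

The same helper can be reused later, for example with helm once it is installed.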
Create a Kubernetes cluster
Create a Kubernetes cluster on Google Cloud by typing the following command:
gcloud container clusters create jhub \
    --num-nodes=4 \
    --machine-type=n1-standard-2 \
    --zone=us-west1-a
You should see something like this:
NAME  ZONE        MASTER_VERSION  MASTER_IP       MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
jhub  us-west1-a  1.6.9           22.214.171.124  n1-standard-2  1.6.9         4          RUNNING
To test if your cluster is initialized, run:
kubectl get node
You should see something like this:
NAME                                  STATUS    AGE       VERSION
gke-jhub-default-pool-065452e9-0w73   Ready     6m        v1.6.9
gke-jhub-default-pool-065452e9-4hpb   Ready     6m        v1.6.9
gke-jhub-default-pool-065452e9-4lpj   Ready     6m        v1.6.9
gke-jhub-default-pool-065452e9-pct2   Ready     6m        v1.6.9
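Nodes can take a few minutes to report Ready, so this check can be scripted rather than rerun by hand. The sketch below assumes bash and awk; the helper name all_nodes_ready is hypothetical. It reads the table printed by kubectl get node on stdin and succeeds only if there is at least one node and every node's STATUS column is exactly Ready.

```shell
# Hypothetical helper: read `kubectl get node` output on stdin and exit 0
# only if at least one node is listed and every STATUS (column 2) is "Ready".
all_nodes_ready() {
  awk 'NR > 1 { rows++; if ($2 != "Ready") bad++ }
       END { if (rows > 0 && bad == 0) exit 0; exit 1 }'
}

# Example usage against the live cluster (requires a configured kubectl):
#   until kubectl get node | all_nodes_ready; do sleep 10; done
```

Matching the status exactly is deliberately conservative: a node reported as, say, Ready,SchedulingDisabled is treated as not ready.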
Set up Helm
Helm is a package manager for Kubernetes applications. In the words of the developers behind Helm:
Helm is the best way to find, share, and use software built for Kubernetes.
Here, we will be using it to install our JupyterHub software. Helm uses "Charts" to describe a Kubernetes application. In a moment we will download the Helm Chart defining JupyterHub.
Helm provides an installer script we can run in order to install the software:
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
As before, we curl the script and pipe it to bash for immediate execution.
Once the installation has completed, Helm can be initialized with:
helm init
Now we are ready to use Helm to install our application.
In order to install JupyterHub, we first create a config.yaml file to define the configuration of our system.
Run the following commands to create the directory and file we will need for our project:
mkdir -p ~/src/jhub
cd ~/src/jhub
touch config.yaml
We will also need two random hex strings as security tokens. These can be generated by running the following commands:
openssl rand -hex 32
openssl rand -hex 32
Using the two strings you just generated, add the following to config.yaml using your favorite text editor:
data:
  admin-users: #YOUR-GITHUB-HANDLE#
hub:
  cookieSecret: #STRING1#
proxy:
  secretToken: #STRING2#
# auth:
#   type: github
#   github:
#     clientId:
#     clientSecret:
#     callbackUrl:
singleuser:
  image:
    name: jupyter/scipy-notebook
    tag: ae885c0a6226
  memory:
    limit: 600M
    guarantee: 600M
  cpu:
    limit: 0.6
    guarantee: 0.6
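If you would rather not paste the tokens by hand, the token generation and file creation above can be combined into one step. This is a sketch assuming bash and openssl are available; GITHUB_HANDLE is a hypothetical variable standing in for #YOUR-GITHUB-HANDLE#, and the file is written to the current directory, so run it from ~/src/jhub.

```shell
# Hypothetical placeholder: replace with your own GitHub username.
GITHUB_HANDLE="your-github-handle"

# Two independent 32-byte random tokens, hex-encoded (64 characters each).
COOKIE_SECRET="$(openssl rand -hex 32)"
SECRET_TOKEN="$(openssl rand -hex 32)"

# Write config.yaml in the current directory; the unquoted EOF delimiter
# lets the shell expand the variables above inside the heredoc.
cat > config.yaml <<EOF
data:
  admin-users: ${GITHUB_HANDLE}
hub:
  cookieSecret: "${COOKIE_SECRET}"
proxy:
  secretToken: "${SECRET_TOKEN}"
# auth:
#   type: github
#   github:
#     clientId:
#     clientSecret:
#     callbackUrl:
singleuser:
  image:
    name: jupyter/scipy-notebook
    tag: ae885c0a6226
  memory:
    limit: 600M
    guarantee: 600M
  cpu:
    limit: 0.6
    guarantee: 0.6
EOF
```

The tokens are quoted so that a hex string that happens to look like a number is still read by YAML as a string.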
Next, we add the JupyterHub Helm chart repository:
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
You should see
"jupyterhub" has been added to your repositories
We next sync our local Helm cache with the remote by running
helm repo update
You should see
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "jupyterhub" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈
Finally, we use the Helm chart to install JupyterHub on our Kubernetes cluster:
helm install jupyterhub/jupyterhub \
    --name=jhub \
    --namespace=jhub \
    -f config.yaml
When this has completed, we can view the status of our pods via:
kubectl --namespace=jhub get pod
You should see:
NAME                              READY     STATUS    RESTARTS   AGE
hub-deployment-2708999902-zt8zs   1/1       Running   0          6m
proxy-deployment-51742714-8dgxp   1/1       Running   0          6m
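Right after the install, pods may still be pulling images or starting, so rerunning this command until everything is up can be automated. The sketch below assumes bash and awk; the helper name all_pods_ready is hypothetical. It treats a pod as ready only when its STATUS is Running and its READY column shows all containers ready (for example 1/1).

```shell
# Hypothetical helper: read `kubectl get pod` output on stdin and exit 0
# only if every pod is "Running" with all containers ready (READY "n/n").
all_pods_ready() {
  awk 'NR > 1 {
         rows++
         split($2, r, "/")              # READY column, e.g. "1/1"
         if ($3 != "Running" || r[1] != r[2]) bad++
       }
       END { if (rows > 0 && bad == 0) exit 0; exit 1 }'
}

# Example usage against the live cluster (requires a configured kubectl):
#   until kubectl --namespace=jhub get pod | all_pods_ready; do sleep 10; done
```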
We can obtain our cluster's public IP address via
kubectl --namespace=jhub get svc proxy-public
You should see:
NAME           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
proxy-public   10.11.243.94   126.96.36.199   80:32443/TCP   6m
If you navigate to this IP in your browser, you should reach your new JupyterHub.
Note that we do not have a working login yet, as we have not configured one. This was intentional, as we will be using GitHub OAuth to provide authentication into our Hub.
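The EXTERNAL-IP of proxy-public is needed again when registering the GitHub application, and it may show as <pending> for a short time while Google Cloud provisions the load balancer. The sketch below assumes bash and awk; the helper name external_ip is hypothetical. It locates the EXTERNAL-IP column by its header name (so it also works if your kubectl version adds a TYPE column), prints the address, and fails while the address is still pending.

```shell
# Hypothetical helper: read `kubectl get svc proxy-public` output on stdin,
# print the EXTERNAL-IP value, and exit 1 while it is "<pending>" or absent.
external_ip() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "EXTERNAL-IP") col = i; next }
       NR == 2 && col { ip = $col }
       END { if (ip == "" || ip ~ /^</) exit 1; print ip }'
}

# Example usage against the live cluster (requires a configured kubectl):
#   EXTERNAL_IP="$(kubectl --namespace=jhub get svc proxy-public | external_ip)"
```

kubectl can also print the address directly with a JSONPath query, kubectl --namespace=jhub get svc proxy-public -o jsonpath='{.status.loadBalancer.ingress[0].ip}', which avoids parsing the table at all.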
GitHub OAuth Authentication
Go to your personal GitHub account to register a new GitHub OAuth application.
- Visit Settings.
- On the Settings Page, under Developer Settings, select OAuth Apps.
- Under Developer Applications, click "Register a new application".
- Give the Application a name such as "JHub-#LOCATION#"
- Under Homepage URL enter http://#EXTERNAL_IP#, replacing #EXTERNAL_IP# with the EXTERNAL-IP of the JupyterHub cluster.
- Under Application Description enter the following text from the JupyterHub documentation:
JupyterHub, a multi-user Hub, spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. JupyterHub can be used to serve notebooks to a class of students, a corporate data science group, or a scientific research group.
- Under Authorization callback URL enter
- Click "Register Application".
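Obtain the security credentials for the new application: from the application page, copy the Client ID and Client Secret. Open your config.yaml file, uncomment the auth section, and add these strings along with the Callback URL you entered. Then apply the updated configuration: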
helm upgrade jhub jupyterhub/jupyterhub -f config.yaml
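Filling in the auth section by hand is error-prone, so it can help to generate it. The sketch below assumes bash; the function name github_auth_block is hypothetical, and it mirrors the keys of the commented-out auth section in config.yaml. The callback URL uses http://<EXTERNAL-IP>/hub/oauth_callback, JupyterHub's usual OAuth callback route; treat that path as an assumption and make sure it matches exactly the Authorization callback URL you registered on GitHub.

```shell
# Hypothetical helper: print an `auth` section for config.yaml.
# Arguments: GitHub OAuth client ID, client secret, and the hub's external IP.
# ASSUMPTION: callback path /hub/oauth_callback; it must match the
# Authorization callback URL registered with GitHub.
github_auth_block() {
  local client_id="$1" client_secret="$2" external_ip="$3"
  cat <<EOF
auth:
  type: github
  github:
    clientId: "${client_id}"
    clientSecret: "${client_secret}"
    callbackUrl: "http://${external_ip}/hub/oauth_callback"
EOF
}

# Example usage (variables are placeholders for your own values):
#   github_auth_block "$CLIENT_ID" "$CLIENT_SECRET" "$EXTERNAL_IP"
```

Paste the printed lines in place of the commented-out auth section, then rerun the helm upgrade command above.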