Build a Simple Dataset Web App on EMBL Cloud Infrastructure

Learn to create and deploy a web application using Python, Streamlit, Docker/Podman, and Kubernetes on EMBL cloud computing infrastructure

Stable: This tutorial is considered complete and reliable.

Overview

Time estimation: 2H

Version: 1.0

Last update: 2026-02-03

Questions:
  • How do I create a simple web application for data visualization?

  • How do I containerize an application using Docker or Podman?

  • How do I deploy a web application on Kubernetes cloud infrastructure?

  • What are the basic components needed for cloud deployment?

Objectives:
  • Create a simple Python web app using Streamlit

  • Build a container image for your application

  • Deploy the application on EMBL Kubernetes infrastructure

  • Access your web app through a public URL


Introduction

In this tutorial we will create a simple web app in Python for a toy dataset stored in a CSV file. We will then run the app on EMBL infrastructure so that it is available online.

The goal of this tutorial is to give you the basics of making a simple web app that can run in the cloud. It is not an in-depth tutorial: it covers just enough to give you an intuition for what is possible and, hopefully, the inspiration to learn how to build a real academic-grade web app.

It uses only 6 files and does the bare minimum, but these steps can form the backbone of any cloud application.

In this tutorial, we will cover:

  1. Introduction
  2. Prerequisites
  3. Create Your Git Project
  4. Set Up Your Local Environment
  5. Create the Application Files
  6. Test the App Locally
  7. Containerize Your Application
    1. Create the Containerfile
    2. Build and Test the Container
    3. Upload the Image to Git Registry
  8. Deploy to Kubernetes
    1. Set Up Kubernetes Access
    2. Create the Deployment Configuration
    3. Deploy and Test
  9. Conclusion

Prerequisites

Before starting this tutorial, you will need:

  • A computer with bash, git, and Python installed
  • Access to EMBL infrastructure (VPN if working remotely)
  • Basic familiarity with the command line
  • An EMBL account
Installing required software

If bash, git, or Python are not installed on your system, try this installation guide.

Create Your Git Project

Version control is essential for tracking changes and sharing your code. We’ll start by setting up a Git repository.

Set up Git repository
  1. Go to https://git.embl.de/
  2. Click New project
  3. Select Create blank project
  4. Configure your project:
    • Project name: my-data-dashboard
    • Visibility level: Public
    Why public visibility?

    The repository needs to be public so Kubernetes can access your container image later in the tutorial.

  5. Click Create project

Set Up Your Local Environment

Now we’ll clone the repository to your computer and set up the project structure.

Clone and configure the project locally
  1. Create your projects folder if it doesn't exist, then navigate to it:

    Bash
    mkdir -p ~/projects
    cd ~/projects
    
  2. Clone your repository:

    Bash
    git clone https://git.embl.de/<username>/my-data-dashboard.git
    cd my-data-dashboard
    

    Replace <username> with your EMBL username.

Create the Application Files

We’ll create three core files for our web application: the Python app, a data file, and a requirements file.

Create application files
  1. Create app.py with the following content:

    Python (app.py)
    import streamlit as st
    import pandas as pd
    
    st.write("# Data Dashboard")
    
    df = pd.read_csv("data.csv")
    st.dataframe(df)
    

    This simple app uses Streamlit to quickly build a web interface that displays your data.

  2. Create data.csv with sample data:

    CSV (data.csv)
    a,b,c
    1,2,3
    
  3. Create requirements.txt to specify dependencies:

    Text (requirements.txt)
    streamlit
    pandas
    

Test the App Locally

Before deploying to the cloud, it’s important to test your application locally.

Run the webapp on your computer
  1. Create a conda environment and install the dependencies (this requires conda, for example from Miniconda or Miniforge):

    Bash
    conda create -n my-data-dashboard-env python -y
    conda activate my-data-dashboard-env
    python -m pip install -r requirements.txt
    
  2. Test Streamlit installation:

    Bash
    streamlit hello
    

    If everything went well, this will open a colorful webpage about Streamlit in your browser.

  3. Stop the test server:
    • Press Ctrl+C in the terminal
  4. Run your application:

    Bash
    streamlit run app.py
    

    If successful, a simple webpage will open showing “Data Dashboard” with your data table.

Questions:
  1. What happens when you run streamlit run app.py?
  2. How would you modify the app to display different data?

Answers:
  1. Streamlit starts a local web server and opens your default browser to display the app, typically at http://localhost:8501
  2. You can modify data.csv with your own data, as long as it’s in CSV format that pandas can read
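
As a starting point for experimenting with your own data, the sketch below writes a slightly larger toy dataset using only Python's standard library, then checks that every row has as many fields as the header. The column names, the values, and the helper names write_toy_csv and check_csv are illustrative assumptions and not part of the tutorial's files.

```python
import csv
import random


def write_toy_csv(path, n_rows=10, seed=0):
    """Write a small CSV with a header and n_rows rows of made-up numbers."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Hypothetical column names, chosen only for illustration
        writer.writerow(["sample", "length_mm", "weight_g"])
        for i in range(n_rows):
            length = round(rng.uniform(1, 10), 2)
            weight = round(rng.uniform(5, 50), 1)
            writer.writerow([f"s{i + 1}", length, weight])


def check_csv(path):
    """Return (header, row_count); raise ValueError if a row has the wrong width."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        count = 0
        for row_number, row in enumerate(reader, start=2):
            if len(row) != len(header):
                raise ValueError(
                    f"row {row_number}: expected {len(header)} fields, got {len(row)}"
                )
            count += 1
    return header, count
```

Calling write_toy_csv("data.csv") from the project folder would overwrite the sample file, after which the running Streamlit app should pick up the new table when you reload the page.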
Congratulations!

If you made it this far, you’ve successfully run a webapp on your computer! This is a great achievement!

Now you can experiment on your computer before deploying to the cloud. Feel free to modify the dataset or try different Streamlit features. When you’re ready, commit your changes to git and continue with the tutorial.

Containerize Your Application

Now we’ll create a container image of your application. An image is like a snapshot of your code and its environment, which ensures the application runs consistently anywhere.

Understanding Containers and Images

An image is like a snapshot or template that contains your code and all its dependencies. A container is a running instance of that image, providing an isolated environment for your application.

We’ll use Podman, which is free and open source. Podman's command-line interface is largely compatible with Docker's, so for the commands in this tutorial you can replace podman with docker if you prefer.

Install Podman
  1. Install Podman following these instructions

  2. Create and start your first Podman machine

  3. Test your installation:

    Bash
    podman info
    

Create the Containerfile

The containerfile tells Podman how to build your image.

Create the containerfile
  1. Create a file named containerfile (no extension) with the following content:

    Containerfile
    FROM --platform=linux/amd64 python:3.13-slim
    
    WORKDIR /app
    # Document the port the app listens on
    EXPOSE 8501
    
    # Upgrade pip and install requirements
    COPY requirements.txt requirements.txt
    RUN pip install -U pip
    RUN pip install -r requirements.txt
    
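    # Copy the app code into the working directory
    COPY . .
    
    # Clear any inherited entrypoint so that CMD runs exactly as written
    ENTRYPOINT []
    # Start Streamlit on port 8501, listening on all network interfaces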
    CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
    
Understanding the Containerfile

Let’s break down what each part does:

  • FROM --platform=linux/amd64 python:3.13-slim: Start from a Python image, specifying the platform for compatibility
  • WORKDIR /app: Set the working directory to avoid issues with the root folder
  • EXPOSE 8501: Document that the app listens on port 8501 (the port is only published when you run the container with -p)
  • COPY requirements.txt and RUN pip install: Install Python dependencies
  • COPY . .: Copy all project files into the container
  • CMD [...]: Command to run when the container starts, including server configuration

Build and Test the Container

Build and run the container locally
  1. Build the image:

    Bash
    podman build -t my-data-dashboard:0.0.1 -f containerfile .
    
    First build takes time

    The first time you run this, it might take a while because it’s downloading base images and dependencies. Subsequent builds will be much faster.

  2. Run the container:

    Bash
    podman run -p 8501:8501 localhost/my-data-dashboard:0.0.1
    
    Accessing the containerized app

    Unlike running locally, this command doesn’t automatically open a browser. Open your browser manually and navigate to localhost:8501 to see your webapp.

Questions:
  1. What is the difference between building an image and running a container?
  2. Why do we need to specify port mapping with -p 8501:8501?

Answers:
  1. Building creates the image (template); running creates a container (active instance) from that image
  2. Port mapping connects the container’s internal port 8501 to your computer’s port 8501, allowing you to access the app through your browser
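
If the page does not load, it can help to check whether anything is listening on the mapped port at all. The following is a minimal sketch using only the standard library; the helper name port_is_open is hypothetical, and the defaults simply mirror the host and port used in this tutorial.

```python
import socket


def port_is_open(host="localhost", port=8501, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when no connection can be established
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A result of False while the container is running suggests that the -p mapping is missing or points to a different port. The same check can be reused later when testing a Kubernetes port-forward on the same port.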

Upload the Image to Git Registry

Once your image works locally, upload it to the Git container registry so Kubernetes can access it.

Push image to registry
  1. Navigate to your project at https://git.embl.de/<username>/my-data-dashboard

  2. Go to Deploy > Container Registry

  3. Follow the instructions shown, adapting them for Podman:

    Bash
    podman login registry.git.embl.de
    podman build -t registry.git.embl.de/<username>/my-data-dashboard:0.1.0 -f containerfile .
    podman push registry.git.embl.de/<username>/my-data-dashboard:0.1.0
    

    Replace <username> with your EMBL username.

  4. Verify the upload at https://git.embl.de/<username>/my-data-dashboard/container_registry

Deploy to Kubernetes

The final step is deploying your application to the cloud using Kubernetes.

What is Kubernetes?

Kubernetes (K8s) is a system for running applications on clusters of computers, often in the cloud. You use kubectl (a command-line tool) to communicate with the Kubernetes cluster and ask it to run your application.

Think of Kubernetes as a manager that ensures your application runs smoothly on cloud infrastructure.

Set Up Kubernetes Access

Configure kubectl
  1. Log in to https://kubeportal.embl.de/

    Remote access requirements

    If accessing remotely, you’ll need:

  2. Create a tenant named my-data-dashboard with default cores and RAM

    What is a tenant?

    A tenant tells Kubernetes how many computing resources (CPU, RAM) your project needs. You can have multiple tenants for different projects, each with different resource requirements.

  3. Install kubectl:
    1. Download this setup script
    2. Make it executable and run it:

      Bash
      chmod +x setup-prod.sh
      ./setup-prod.sh
      
  4. Verify installation:

    Bash
    # The --short flag has been removed from recent kubectl releases
    kubectl version
    
  5. Create a namespace for your tenant:

    Bash
    kubectl create namespace my-data-dashboard-ns1
    
    Understanding namespaces

    Namespaces help organize resources within a tenant. They’re useful for separating different environments (development, testing, production). The namespace name should start with your tenant name to help administrators track resources.

  6. Verify your namespace at https://kubeportal.embl.de/tenants/tenants
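
The naming convention above can be checked before running kubectl create namespace. This sketch assumes only the general Kubernetes rule that namespace names are DNS labels (lowercase letters, digits, and '-', at most 63 characters) plus the tenant-prefix convention described in this tutorial; the function name check_namespace is hypothetical.

```python
import re

# Kubernetes namespace names must be DNS labels: lowercase letters, digits
# and '-', starting and ending with a letter or digit, at most 63 characters.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def check_namespace(namespace, tenant):
    """Return a list of problems with the namespace name (empty if none)."""
    problems = []
    if len(namespace) > 63:
        problems.append("longer than 63 characters")
    if not _DNS_LABEL.fullmatch(namespace):
        problems.append(
            "must use lowercase letters, digits and '-', "
            "and start and end with a letter or digit"
        )
    if not namespace.startswith(tenant):
        problems.append(f"does not start with the tenant name {tenant!r}")
    return problems
```

For example, check_namespace("my-data-dashboard-ns1", "my-data-dashboard") returns an empty list, while a name with uppercase letters or underscores is reported before the cluster rejects it.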

Create the Deployment Configuration

Now we’ll create a YAML file that tells Kubernetes how to deploy your application.

Create deploy-image.yaml
  1. Create a file named deploy-image.yaml

  2. Add a comment at the top (optional but helpful):

    YAML comment
    # Replace the following strings everywhere they appear:
    # <username>  - your EMBL username
    # <username2> - your supervisor's EMBL username
    # <appname>   - my-data-dashboard
    # After testing, create a ticket at https://itsupport.embl.de
    # to enable my-data-dashboard.embl.de
    
  3. Add the deployment section:

    YAML (deploy-image.yaml - Part 1)
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        owner-username: <username>  # your EMBL username
        fallback-username: <username2>  # supervisor's username
      name: <appname>-<username>
      namespace: <appname>-ns1
    spec:
      selector:
        matchLabels:
          app: <appname>
      replicas: 1
      template:
        metadata:
          labels:
            app: <appname>
        spec:
          containers:
            - name: <appname>
              image: registry.git.embl.de/<username>/<appname>:0.1.0
              imagePullPolicy: "Always"
              ports:
                - name: http
                  containerPort: 8501
                  protocol: TCP
              resources:
                limits:
                  cpu: 1
                  memory: 512Mi
                requests:
                  cpu: 300m
                  memory: 128Mi
    ---
    
  4. Add the ingress section (network access configuration):

    YAML (deploy-image.yaml - Part 2)
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        traefik.ingress.kubernetes.io/router.tls.certresolver: sectigo
      name: <appname>-<username>
      namespace: <appname>-ns1
    spec:
      ingressClassName: internal-users
      rules:
      - host: <appname>.embl.de
        http:
          paths:
          - backend:
              service:
                name: <appname>
                port:
                  name: http
            path: /
            pathType: Prefix
      tls:
      - hosts:
        - <appname>.embl.de
    ---
    
    Enabling the URL

    To enable the URL <appname>.embl.de (the host set in the ingress section), create a ticket at https://itsupport.embl.de/ requesting web support. The name might already be taken, so be prepared to choose an alternative.

  5. Add the service section:

    YAML (deploy-image.yaml - Part 3)
    apiVersion: v1
    kind: Service
    metadata:
      name: <appname>
      namespace: <appname>-ns1
    spec:
      ports:
      - name: http
        port: 8501
        protocol: TCP
        targetPort: 8501
      selector:
        app: <appname>
    
  6. Replace all instances of <username>, <username2>, and <appname> with your actual values
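
Replacing the placeholders by hand is easy to get wrong, so the substitution can also be scripted. The sketch below is one possible approach: it treats the YAML as plain text, fills in each placeholder, and fails loudly if any placeholder is left over. The function name fill_placeholders and the example usernames are hypothetical.

```python
import re

# Matches placeholders such as <username>, <username2> and <appname>
PLACEHOLDER = re.compile(r"<[a-z0-9]+>")


def fill_placeholders(text, values):
    """Replace every <key> in text with values[key]; fail if any is left."""
    for key, value in values.items():
        text = text.replace(f"<{key}>", value)
    leftover = sorted(set(PLACEHOLDER.findall(text)))
    if leftover:
        raise ValueError(f"unreplaced placeholders: {', '.join(leftover)}")
    return text


if __name__ == "__main__":
    # Hypothetical values; use your own usernames and app name
    template = "name: <appname>-<username>\nfallback-username: <username2>\n"
    print(fill_placeholders(
        template,
        {"username": "jdoe", "username2": "asmith", "appname": "my-data-dashboard"},
    ))
```

To apply it to the real file, you could read deploy-image.yaml into a string, pass it through fill_placeholders, and write the result back before running kubectl apply.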

Deploy and Test

Run your app in the cloud
  1. Apply the deployment configuration:

    Bash
    kubectl apply -f deploy-image.yaml
    
  2. Check if your container is running:

    Bash
    kubectl get pods -n my-data-dashboard-ns1
    

    Note the NAME of your pod (something like my-data-dashboard-<username>-<random-id>). You’ll need this for the next steps.

  3. Test the service using port-forwarding:

    Bash
    kubectl port-forward pod/<podname> 8501:8501 -n my-data-dashboard-ns1
    

    Replace <podname> with the pod name from step 2.

  4. Open your browser and navigate to localhost:8501

    Success!

    If you can see your Data Dashboard, congratulations! Your app is running in the cloud!

  5. To check the pod status and details:

    Bash
    kubectl describe pod <podname> -n my-data-dashboard-ns1
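
When a namespace holds several pods, reading the table by eye becomes tedious. As a rough sketch, the JSON that kubectl get pods -o json prints can be summarized with the standard library; the helper names pod_phases and not_running are hypothetical, and the code relies only on the items, metadata.name, and status.phase fields of that output.

```python
import json


def pod_phases(kubectl_json):
    """Map pod name -> phase from the output of `kubectl get pods -o json`."""
    data = json.loads(kubectl_json)
    return {
        item["metadata"]["name"]: item.get("status", {}).get("phase", "Unknown")
        for item in data.get("items", [])
    }


def not_running(kubectl_json):
    """Return the sorted names of pods whose phase is not 'Running'."""
    phases = pod_phases(kubectl_json)
    return sorted(name for name, phase in phases.items() if phase != "Running")
```

The input string could come from running kubectl get pods -n my-data-dashboard-ns1 -o json and capturing its output, for instance with subprocess.run(..., capture_output=True, text=True).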
    
Questions:
  1. What is the purpose of port-forwarding in Kubernetes?
  2. How would you update your app with new changes?

Answers:
  1. Port-forwarding creates a tunnel from your local computer to the pod in Kubernetes, allowing you to test the app before making it publicly accessible
  2. To update: modify your code, rebuild the container image with a new version number, push it to the registry, update the version in deploy-image.yaml, and run kubectl apply -f deploy-image.yaml again
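
The version bump in the update workflow can be sketched in a few lines. This example assumes MAJOR.MINOR.PATCH tags like the 0.1.0 used above and an image: line written as in deploy-image.yaml; the helper names bump_patch and set_image_tag are hypothetical, and the YAML is treated as plain text rather than parsed.

```python
import re


def bump_patch(version):
    """Increase the last number of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def set_image_tag(yaml_text, image, new_tag):
    """Rewrite the tag on lines of the form 'image: <image>:<tag>'."""
    pattern = re.compile(
        rf"^([ \t]*image:[ \t]*{re.escape(image)}):\S+$", re.MULTILINE
    )
    # A function replacement avoids backslash handling in the new tag
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", yaml_text)
    if count == 0:
        raise ValueError(f"no image line found for {image}")
    return updated
```

After rewriting the file, the new tag still has to be built and pushed with podman before kubectl apply can pull it.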

Conclusion

Congratulations on completing this tutorial! You’ve successfully:

  • Created a simple web application using Python and Streamlit
  • Containerized the application with Podman
  • Deployed it to EMBL’s Kubernetes cloud infrastructure
  • Made your application accessible (with port-forwarding, and potentially via a public URL)

This tutorial covered the fundamental workflow for cloud deployment. While we used a minimal example, these same principles apply to more complex applications.

Next Steps

Now that you understand the basics, consider:

  • Expanding your dataset and adding more visualizations
  • Exploring Streamlit’s advanced features
  • Learning more about Kubernetes resource management
  • Experimenting with different container configurations
  • Building more sophisticated data dashboards for your research

Take some time to review each step and understand how they connect. In the future, you might want to deploy different types of applications using this same workflow!

Key Points

  • A simple web app requires only 6 files and minimal code

  • Containers ensure your app runs consistently across different environments

  • Kubernetes enables cloud deployment and makes your app publicly accessible

  • Git is essential for version control, and the container registry on git.embl.de lets you share your container images

  • Testing locally before deploying to the cloud saves time and catches errors early


Contributions

Author(s): author 1, author 2