BilderSkript

Engineering Documentation


About

BilderSkript

BilderSkript is a compound created from the following words.

Bilder [ˈbɪldɐ], (German), images

Skript [skʁɪpt], (German), a written document or notes

BilderSkript automatically summarizes a lecture as a sequence of interesting image scenes.

Technically, it trains a deep neural net on object recognition and extracts interesting scenes from a large sequence of still image recordings. These scenes compile into the BilderSkript.

Project Definition

Motivation and Goal

A lecturer usually provides a lot of supplemental course material such as a written script, slides, reading material, and exercises.

However, students attending the course’s lectures experience an additional channel, which helps them to sort and weight the supplemental material. The way a lecturer presents the content lets attendees realize the weight and importance of certain parts of it. Hence, it supports the creation of a common thread throughout the entire supplemental material.

In contrast to a full lecture recording, the BilderSkript approach condenses the recording to those parts of a lecture which contribute to the understanding of the supplemental material.

Primarily, BilderSkript aims at on-site course attendees; however, it could also be valuable for remote attendees.

BilderSkript has the following goals:

  • partially re-create the course experience
  • support attending students in following up on the content
  • support students who have not attended all lessons in catching up on the course
  • improve the efficiency of working with the supplemental course material

Objectives

There are a couple of activities to perform:

  • record lectures as a sequence of wide-angle still images
  • train a computer to identify objects within still images
  • define a metric of interestingness over the sequence of recognized objects
  • extract interesting images and compile them into a lecture notebook, which ultimately forms the BilderSkript

Research Questions

BilderSkript relies on widely available open-source ML software. However, there are a couple of interesting research questions on the way.

  • How to automate image preprocessing? - Image recording does not happen in a controlled environment, i.e. lecture rooms change, the light varies, the camera positions are not fixed (resulting in varying fields of view), and the colors of background, clothes etc. vary.
  • Can we successfully transfer other object identification models to lecture recordings? Object identification utilizes labeled data from one or few lectures for training. Since labeling is laborious, we are interested in transferring trained models from other areas and applying them to object identification in our lecture recordings.

Introductory Example

BilderSkript takes a series of still images as input and compiles visual lecture notes as output. Below we illustrate step by step how the software processes the data.

Image recording: A 360° camera records the entire room. However, the lens towards the audience is covered to maintain privacy. The images have the typical distorted appearance due to the camera’s fisheye lenses. Nevertheless, the setup creates an approx. 200-degree wide-angle recording.

fisheye raw image

Image preparation: BilderSkript projects the fisheye images onto equirectangular images to correct for the distortion.

fisheye equirectangular projections

Finally, the perspective control corrects the deformation of vertical and horizontal lines which occurs when the camera records the projection wall or blackboard from an inclined position.

perspective control

Object identification: In this step BilderSkript identifies typical objects, such as blackboard, lecturer, and video projection, within each image.

tbd. include image

Classification: In particular, if there is a blackboard, it is interesting to know whether the board is empty or not. The classifier utilizes the blackboard identified in the previous step and assigns class labels which correspond to the blackboard’s state. This step may filter images before it inputs them to the classifier.

Label | Original Image   | Filtered Image
empty | empty blackboard | filtered empty blackboard
full  | full blackboard  | filtered full blackboard

Sequencing: Utilizing the object identification, BilderSkript transforms the set of images into a sequence of object compositions. A step consists of all identified objects in a single image. Note that each step links back to the original image.

tbd. include image

Interesting sequences: This step operates only on the sequence of identified objects. Based on a definition of interestingness, it quantifies the previously created sequences on this metric. The idea is that interestingness emerges from changes in the composition of identified objects.

tbd. include image

Compilation: BilderSkript applies a threshold on the degree of interestingness to extract the best sequences. Once it has found an interesting sequence, it links each step within this sequence back to the image it originates from. Finally, it compiles the scene from these images.

tbd. include image

System Design

This use case diagram depicts BilderSkript’s main services. Note that the use cases shown do not imply an execution order.

system's dataflow

The engineer performs the system setup and configures and trains the system so that it is beneficial to the user.

Docker Container Toolchain

Docker Images and Volumes

Docker images contain the various software pipelines of the toolchain. The responsibilities within the overall system design motivate the system boundaries induced by the distribution of pipelines across docker images. The following table lists the docker images and their respective pipeline functionalities.

Image     | Pipeline    | Notes
vscode    | n/a         | IDE VS code
builder   | Build       | makefile, doc, versioning via git and dvc, mlflow exp.
hugin     | Data prep   | image data preparation using hugin
hugin-vnc | n/a         | like hugin, but provides a VNC server for GUI interaction
mrcnn     | ML training | object detection
ludwig    | ML training | classification
cicd      | Deployment  | pipeline not yet implemented

When started as docker containers, they run scripts and communicate with each other utilizing shared volumes on the filesystem. The image below illustrates the docker toolchain.

docker toolchain

The most important volumes are:

  • ${APP_ROOT}/ipc : stores sockets for IPC between containers
  • ${APP_ROOT}/pipelines : contains all BilderSkript pipelines
  • ${APP_ROOT}/images : data directory
  • ${APP_ROOT}/src : stores the source code
  • ${APP_ROOT}/scripts : scripts
  • ${APP_ROOT}/ludwig : stores experiments, models, and predictions
  • ${APP_ROOT}/vscode : maintains the state of VS code
  • ${APP_ROOT}/docs : GitHub’s webserver serves the docu blog from this directory
  • ${APP_ROOT}/docs_site : the docu blog’s source

By default ${APP_ROOT} is set to /bilderskript. For instance, one accesses the pipeline scripts under /bilderskript/pipelines.
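
For illustration, the volume mappings might look like the following hypothetical sketch; service names and host paths are assumptions, the authoritative definitions live in the repository’s docker-compose.yml.

# hypothetical excerpt from docker-compose.yml
services:
  builder:
    volumes:
      - ./pipelines:/bilderskript/pipelines   # pipeline definitions
      - ./images:/bilderskript/images         # image data
      - ./ipc:/bilderskript/ipc               # sockets for container IPC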

Build and Startup

Docker Compose creates all images, containers, and volumes. The configuration is defined in the docker-compose.yml file.

Build all images using the following command:

docker-compose build

Start a container from the builder image and get a bash shell. The default container name is bilderskript_builder_1.

docker-compose up -d builder
docker exec -it bilderskript_builder_1 /bin/bash

Pipelines

Pipelines are the fundamental building blocks of the BilderSkript system.

Pipeline Definitions

The term pipeline is heavily used in machine learning (ML). It generally refers to a sequence of steps run in order to perform transformations. There are various kinds of pipelines, e.g. data pipelines, machine learning pipelines, deployment pipelines and others. Depending on the pipeline type, it takes a certain resource type as input and produces output resources by applying the pipeline’s transformation steps. For instance, a data pipeline takes data files as input and prepares them to be fed into a machine learning algorithm of the ML pipeline.

BilderSkript stores pipelines in the pipelines volume, usually accessed under /bilderskript/pipelines. They are encoded as snakefiles for version control.

Snakemake

Snakemake is a workflow management system to implement data analysis pipelines.

Snakemake compares the input and output resources. These resources are files. If the modification date of any input file is newer than that of the output file, snakemake runs the shell command. This behavior is encoded as rules, which transform input files into output files. All rules of a pipeline are defined in a snakefile.
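
As a minimal sketch (rule, file, and script names are hypothetical, not taken from the BilderSkript pipelines), a rule in a snakefile looks like this:

# re-runs scripts/flatten.sh whenever the input image is newer than the output
rule flatten_image:
    input:
        "images/raw/{name}.jpg"
    output:
        "images/flat/{name}.png"
    shell:
        "scripts/flatten.sh {input} {output}"

Some important snakemake commands are listed below.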

Run the snakemake pipeline

snakemake <pipeline name>

Generate a report

snakemake <pipeline name> --report <snakefile.html>

Generate a summary table displaying the current state of input and output files.

snakemake <pipeline name> --summary

Before the snakefile is run, snakemake generates a DAG which shows the dependencies. This command visualizes the DAG.

snakemake <pipeline name> --dag | dot -Tpng > <pipeline name>.png

BilderSkript Pipelines

The BilderSkript pipelines follow the guidelines of The Snakemake-Workflows project. There is a central Snakemake file which includes the configuration and the concrete workflows for the data and ML pipelines.

The workflow’s design separates the pipeline-specific parameters from the dataset-specific ones. A separate file stores each parameter set:

  • config.yaml contains the pipeline-specific parameters. This file exists only once and contains the parameters for all pipelines.
  • dataset.csv stores the parameters relevant for the dataset a pipeline processes. There may be several of these files, one for each class of data for a pipeline. There is no fixed schema. The file’s name depends on the content and the pipeline which sources the file. A pipeline which processes various ML models may utilize a models.csv file to store the set of available models and additional parameters for each model. See the sketch below.
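
The following sketch illustrates the split for the data preparation pipeline. The keys datasets_idx, out_dir, and img_dir appear in the pipeline descriptions below; the concrete values and the file layout are made up for illustration.

# config.yaml (hypothetical excerpt)
datasets_idx: 1                        # row in datasets.csv to process
out_dir: /bilderskript/images/prep     # where prepared images are stored

# datasets.csv (hypothetical row)
img_dir,pto_file
/bilderskript/images/lecture01,lecture01.pto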

The approach increases the flexibility of applying the pipelines to various datasets. The figure below depicts this separation.

workflow design for all pipelines

The pipelines are named after their snakefiles. All pipeline files reside in the /bilderskript/pipelines directory within the BilderSkript project dir.

  • doc.snakefile: describes the pipeline for documentation generation. You may view the pipeline’s report.

  • data_prep.snakefile: prepares the image files for the ML pipeline. It is a complex pipeline because it utilizes interprocess communication (IPC) with the hugin container. You may view the pipeline’s report. data_prep.snakefile requires the following parameters in config.yaml:

    • datasets_idx: line in datasets.csv defining the data to process
    • out_dir: the directory where prep’ed images as a result of the pipeline execution are stored

    The pipeline’s default behavior is started by the data_prep.sh script. The pipeline’s rule DAG is shown in data_prep.snakefile.png

  • ludwig.snakefile: detects the blackboard’s state of writing, i.e. whether the writing fills out the blackboard completely, partially, or whether the blackboard is empty.

    The pipeline’s default behavior is started by the ludwig.sh script.

Run the following script to create reports for all BilderSkript pipelines:

snakemake_report.sh

[data_prep.snakefile] Pipeline

The data preparation pipeline, data_prep.snakefile, comprises two phases:

  1. Configure the pipeline’s parameters
    • config.yaml defines pipeline specific parameters
    • datasets.csv defines the data input and data-specific processing parameters
  2. Run the pipeline on the images from the lecture recording

1. Configuration

The following activity diagram describes the steps to configure the data preparation pipeline. The configuration is stored in config.yaml and datasets.csv.

Precondition:

  • builder and hugin-vnc container up and running
  • at least one image from lecture recording available

Postcondition:

  • image directories set
  • .pto files created
  • (optionally) crop specification defined

The parameter values from above need to be stored in the config.yaml and datasets.csv files. The activity diagram indicates to which file each parameter belongs.

Commands

Spin up the hugin-vnc container.

docker-compose up -d hugin-vnc

Afterwards, use a VNC client to access Hugin’s display and create the .pto files. Modify the pipeline’s datasets.csv and config.yaml to tell the scripts where to look for the files. The activity diagram below displays the steps to create the files.

data prep configuration

2. Run the Pipeline

Finally, the engineer starts the data prep pipeline, which processes the input images from the img_dir and places the results in the out_dir.

Precondition:

  • builder and hugin container up and running
  • .pto files created
  • img_dir and out_dir image directories set

The parameter values from above are sourced from the datasets.csv file.

Postcondition:

  • processed images placed in out_dir

Pipeline Start

The following commands start the pipeline. The activity diagram below depicts the details of the pipeline run.

Spin up the builder and hugin container and get an interactive shell to the builder container.

docker-compose up -d builder hugin
docker exec -it bilderskript_builder_1 /bin/bash

Within the builder container run

cd /bilderskript/pipelines
./data_prep.sh

data prep run

[ludwig.snakefile] Pipeline for Blackboard Classification

The ludwig.snakefile pipeline takes blackboard images as input and assigns labels like full, partial, or empty. The label indicates whether the writing fills out the blackboard completely, partially, or whether the blackboard is empty.

[doc.snakefile] Pipeline Generating the Blog

The pipeline doc.snakefile generates

  1. UML diagrams using plantuml
  2. the project’s website using hugo

Run the pipeline from within the builder container in /bilderskript/pipelines

snakemake doc

Train the Blackboard Image Classifier

The image classifier detects the blackboard’s state of writing. In the BilderSkript context, it is interesting to know whether the writing fills out the blackboard completely, partially, or whether the blackboard is empty.

Experiment

We learn a model, a classifier, which maps images to class labels, e.g. full, partial, empty.

  • utilizes the Ludwig docker container
  • runs from the command line
  • the script tests whether the container is already running and spins it up if necessary
  • utilizes comet.ml to store experiment results
  • the container must be shut down manually afterwards

To utilize comet.ml you need to provide a .comet.env file in the project’s directory. The file contains the API key and the Comet project name.
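
A minimal .comet.env might look like this sketch; the variable names follow Comet’s standard environment variables, so check the Comet documentation for the authoritative names:

COMET_API_KEY=<your API key>
COMET_PROJECT_NAME=bilderskript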

Walkthrough

Start an experiment from scripts/ludwig.

STEP 1: Prepare the input dataset

Precondition:

  • .png images as input for training the model
  • within the <training image path> directory the images are organized in subdirectories, whose names are the class labels (see the layout sketch below)
  • an experiment name, e.g. stats_lecture
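
For illustration, the expected layout could look as follows; the class labels empty, partial, and full are taken from the classifier description above, and the file names are made up:

<training image path>/
    empty/
        img_0001.png
        img_0002.png
    partial/
        img_0003.png
    full/
        img_0004.png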

Command:

./ludwig_experiment.sh datacsv <experiment name> <training image path>

Postcondition:

  • directory created: ludwig/experiements/<experiment name>
  • directory contains <experiment name>.csv, which links images to class labels

STEP 2: Learn the model

Precondition: see above

Command:

./ludwig_experiment.sh experiment <experiment name> <training image path>

Postcondition:

  • Results available in ludwig/experiements/<experiment name>/results
  • comet.ml website contains experiment’s results

STEP 3: Visualize and Shutdown

Precondition: the experiment name from above

Command:

./ludwig_experiment.sh visualize <experiment name> 
./ludwig_experiment.sh shutdown <experiment name> 

Postcondition:

  • comet.ml website contains additional images from the experiment run
  • the Ludwig container is removed.

Multi Experiments

Multi experiments are repeated experiments with varying parameters. Particularly, we vary the parameters for the image pre-processing.

Start a multi experiment from scripts/ludwig. All directories created during a regular experiment are prefixed with exp<num>_, indicating the run number within the multi experiment (a sketch of the script’s loop follows after the postconditions).

Precondition:

  • see the initial preconditions of a single experiment
  • edit ludwig_multi_experiments.sh for
    • the training image path
    • the parameters to loop through

Command:

./ludwig_multi_experiments.sh <experiment root name>

Postcondition:

  • directories with results created: ludwig/experiements/exp<num>_<experiment root name>
  • comet.ml website contains all experiments’ results
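
Conceptually, the multi-experiment script wraps the single-experiment commands in a loop, roughly like this sketch; the looped parameter and its values are assumptions, the real ones are edited directly in ludwig_multi_experiments.sh:

#!/bin/bash
# hypothetical sketch of the loop inside ludwig_multi_experiments.sh
ROOT_NAME="$1"
TRAIN_IMG_PATH="/bilderskript/images/train"    # assumed location
NUM=1
for IMG_SIZE in 128 256 512; do                # assumed pre-processing parameter
    export IMG_SIZE                            # assumed: consumed by the image pre-processing
    EXP_NAME="exp${NUM}_${ROOT_NAME}"
    ./ludwig_experiment.sh datacsv "${EXP_NAME}" "${TRAIN_IMG_PATH}"
    ./ludwig_experiment.sh experiment "${EXP_NAME}" "${TRAIN_IMG_PATH}"
    NUM=$((NUM + 1))
done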

Training Process

The following activity diagram illustrates the overall sequence of actions for running one or more experiments. The swimlane Experiments depicts all artifacts which relate to an experiment. When an experiment is repeated with different parameters, the filtered images, the datacsv file, and the resulting model are created under the new experiment name.

Ludwig Multi Experiments

Labeling

This activity assigns names or descriptors to components within an image. Any supervised ML algorithm requires labeled data for training. Once successfully trained on the labeled data, the ML algorithm can identify and name similar components in images which do not have labels.

Labeling is usually done manually. A user marks regions within images and assigns labels, thereby identifying objects in the image. Labeled data is not questioned and provides the algorithm with ground truth.

Problem

BilderSkript receives images from various lectures. We will need to repeat the labeling process to find appropriate features which let the ML-based object identification algorithm generalize sufficiently well to identify objects in recordings from other lectures.

Tool Support

Labeling objects within images has a long history in the computer vision research community. For ML-based products, a couple of online services are available which support labeling activities distributed across crowd workers, integrate semi-automated quality checks, and provide other functions for large-scale applications.

Quora hosts a non-exhaustive list of labeling tools and services; some are commercial, some are open-source. Check out https://www.quora.com/What-is-the-best-image-labeling-tool-for-object-detection.

For BilderSkript we found the following open tools attractive and discuss them briefly.

ImgLab to Annotate Labels

BilderSkript utilizes ImgLab for label annotations. The web-based tool exports label annotations in multiple formats:

  • dlib XML
  • dlib pts
  • Pascal VOC: standardized; however, the annotation data export only works for the current image. One needs to proceed to the next image to export that image’s annotation.
  • COCO: standardized; however, the exported annotation text seems to be incomplete

The dlib XML format originates from the dlib toolkit, which contains various machine learning algorithms. Here is an example of using dlib tools via the Python API.

The XML format is reduced to the very basic tags required. An example is shown below.

<?xml version='1.0' encoding='ISO-8859-1'?>
<?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?>
<dataset>
<name>dlib face detection dataset generated by ImgLab</name>
<comment>
    This dataset is manually crafted or adjusted using ImgLab web tool
    Check more detail on https://github.com/NaturalIntelligence/imglab
</comment>
<images>
	<image file='IMG_20191219_094800_001_flat_pc_resize_mirror.png'>
		<box top='36' left='119' width='278' height='208'>
			<label>slide</label>
		</box>
		<box top='31' left='-2' width='206' height='319'>
			<label>person</label>
		</box>
	</image>
	<image file='IMG_20191219_094802_002_flat_pc_resize_mirror.png'>
		<box top='44' left='1' width='204' height='304'>
			<label>person</label>
		</box>
	</image>
</images>
</dataset>

BilderSkript Labels

We create rectangular shapes and annotate them with object labels, e.g. person and slide, as shown in the dlib XML example above.

Development IDE - VS code

We use VS code as BilderSkript’s development IDE. A docker image encapsulates the IDE and makes it accessible through the web browser. A developer’s software and programming effort focuses mostly on writing and editing shell scripts. As a consequence, the docker image provides appropriate extensions to support this activity.

Web-based VS code

Start VS code

docker-compose up -d vscode

and point the web browser to http://localhost:8080.

The vscode container starts with the BilderSkript repo mounted and opened. It stores its state, e.g. last file open, extensions, other settings, in the vscode directory within the BilderSkript’s project dir.

Extensions

BilderSkript’s vscode docker image and repo come pre-installed with the following extensions, especially for working with shell scripts.

The shellcheck extension runs on the fly and provides quick fixes for better shell script quality. The shfmt tool reformats shell scripts; use shift + alt + f to reformat the current script.
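
For example, shellcheck flags unquoted variable expansions as you type:

cp $SRC $DST      # shellcheck warning SC2086: Double quote to prevent globbing and word splitting
cp "$SRC" "$DST"  # after applying the suggested quick fix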

Git

Git version control within the vscode image lacks user.name and user.email for git actions like commit. There is an extensive discussion at https://stackoverflow.com/questions/42318673/changing-the-git-user-inside-visual-studio-code. The main message is

Changing the git user inside Visual Studio Code, is not inside rather outside.

That is, a developer would have to run git config --global user.* commands within vscode’s CLI.
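
For reference, these are the standard commands (the identity values are placeholders):

git config --global user.name "Jane Doe"
git config --global user.email "jane@example.com"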

BilderSkript’s docker-compose.yml file maps a .gitconfig file from the project dir onto the vscode’s container $HOME/.gitconfig. As a result, it makes git configuration data available to the vscode container.

By default, the .gitconfig is included in the project’s .gitignore file to avoid accidentally committing private git configuration details into the public repo.

Versioning and Repository Management

Pipeline Code and Data Versioning

The builder image takes care of versioning using git and dvc.

Pipelines are shared through the various common volumes.

Repository Management

The project’s repository is mounted as a separate volume in the builder container. As a consequence, only makefiles in the project’s root dir are able to commit changes to the repository. Scripts in subdirectories are not aware that the project’s root is under git version control. The volumes are organized under the following two mount points.

Mount point: /BilderSkriptRepo

Scripts which perform pipeline and data versioning start under this mount point. It contains all files and directories from the project repository:

  • .git/
  • Dockerfiles/
  • docker_compose.yml
  • README.md

Mount point: /bilderskript

Scripts which actually execute pipelines start under this mount point. It contains the following directories:

  • docs/
  • docs_site/
  • images/
  • scripts/
  • src/

Documentation Generation

Docu Blog

The project’s documentation takes the form of a single-page blog built with the Hugo static webpage generator. It lists notes taken during development and justifies design decisions. The builder docker image takes care of the generation utilizing the snakemake pipeline, pipelines/doc.snakefile.

The blog utilizes the OneDly theme. The theme is excluded from git versioning because it originates from a separate repository. One includes it as a git submodule:

cd docs_site
git submodule add https://github.com/cdeck3r/OneDly-Theme.git themes/onedly

The pipeline doc.snakefile generates the complete documentation by calling

cd /bilderskript/pipelines
snakemake -s doc.snakefile

The docu blog is available under http://cdeck3r.com/BilderSkript/.

APPENDIX: Pipeline Interprocess Communication (IPC)

The distribution of pipelines across various docker containers requires interprocess communication between the containers. Let’s discuss the design for the BilderSkript pipelines.

IPC Control Problem

We have the builder container, which controls the execution of all pipelines. A pipeline implements a sequence of process executions, where some of the processes run in their respective containers. As an example, consider the data preparation pipeline. The pipeline in the builder container initiates the image data preparation in the hugin container. Once the process in the hugin container completes, the pipeline continues with the next step.

ipc sequence diagram

Basically, the sequence diagram above depicts a synchronous remote procedure call (RPC). This is a classic facility in many programming environments, e.g. Java RMI.

IPC control problem formulation:

Implement a synchronous RPC which starts processes across containers and seamlessly integrates into pipelined workflows.

Constraints / requirements:

  • low effort for additional network setup
  • low effort for access control and for securing that access
  • script based, preferably in bash, as that makes it easy to integrate the RPC as a shell command

In the following sections, we discuss some IPC approaches and their qualification as a synchronous RPC.

File-based IPC

A very simple, but effective approach is the use of files for exchanging messages between container processes. The process in container A writes a message string into a file on a shared volume from which the process in container B reads the file’s content. In this scenario, a single file serves as a buffer between the processes. However, the file access needs to be coordinated, otherwise the processes would overwrite each other’s messages. Additionally, to get informed about new messages, the processes need to poll the commonly shared file.
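
A minimal bash sketch of the idea (file name and message format are made up):

# container A: write a message to the shared volume
echo "run data_prep" > /bilderskript/ipc/msg.txt

# container B: poll until a non-empty message arrives, then consume and clear it
while [ ! -s /bilderskript/ipc/msg.txt ]; do sleep 1; done
msg=$(cat /bilderskript/ipc/msg.txt)
: > /bilderskript/ipc/msg.txt
echo "received: ${msg}"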

Client / Server IPC

Processes within different containers may communicate as networked clients and servers. Some options:

  • REST or similar RPC-alike calls across the network
  • SSH remote script execution
  • Shared UNIX sockets

REST requests are easy to implement; however, a webserver such as nginx or apache is required to receive the requests. Alternatively, one could implement a simple webserver using python and flask. SSH remote script execution needs accounts to log in to the remote containers. One needs to enter a password before execution, which limits automation capabilities. Public key access can resolve this problem, but needs caution in handling public and private keys. Both options expose containers to the network. As a consequence, they require a network definition between containers and a firewall to block unauthorized access.

Finally, UNIX sockets appear to be an ideal approach to interprocess communication. Sockets are stored on a shared volume, thereby enabling local access control. However, this type of interprocess communication is limited to containers on the same host.

Message Queues and Brokers

A message broker maintains a message queue (MQ), where distributed processes can register themselves to exchange messages without directly knowing each other.

Message brokers are a preferred way to decouple distributed processes, in particular for long-running processes executed asynchronously. However, the setup and management of a MQ takes effort. Processes need to agree on the message format, and containers need to reach the broker’s queue via the network.

Databases for IPC

Databases store information which is accessible by clients. As such, this approach is related to file-based IPC. Process information is organized in tables, and table attributes store process states, e.g. process execution start or whether a process completed successfully.

Similar to file-based IPC, processes are required to poll the database to receive updates on the process state. Additionally, the setup and maintenance effort of even a simple DBMS is significant. A simple file-based database, e.g. sqlite, is an attractive solution to this last issue. However, write and read access must be coordinated to avoid (orphaned) file locks.

Final Design Decision

As a result of the discussion, BilderSkript implements interprocess communication using shared UNIX sockets. It combines the charm of file-based approaches with the advantage of bi-directional communication. Tool support for scripting is very mature. The next section provides more details on how this approach is implemented.

Limitation. The use of UNIX sockets limits IPC to containers on the same host. To overcome this, one may use socat to forward the UNIX socket to a TCP socket. See this stackoverflow post to get an idea of how this can be implemented. However, this would re-open the discussion on network access issues, although it enables a migration path from the local host approach to a distributed network approach.

Further Resources:

There are a couple of resources to follow up on these issues.

An interesting op-ed from Bozho’s tech blog is You Probably Don’t Need a Message Queue. It inspired some of the discussion above.

APPENDIX: IPC via Shared UNIX Sockets

This is a technical design description of the usage of shared UNIX sockets for docker container IPC. Both containers connect to a socket stored on a shared volume, e.g. under the mount point ipc. They exchange simple control messages to run scripts and read and write data from the shared volume.

ipc socket between docker containers

The approach utilizes

  • socat for creating sockets and socket communication
  • ss to check for the presence of the listening socket.

It follows a classical client / server approach where the server initializes the socket and waits for the client to start the communication. The following sequence diagram depicts the interaction over the socket.

ipc socket sequence diagram

Here are the script calls for the server and client.

Server:

./ipc_socket_server.sh <socket name> <path/to/server_app.sh>

Client:

./ipc_socket_client.sh <socket name> <path/to/client_app.sh>
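
Conceptually, both scripts wrap socat calls like the following sketch; the socket path and the message are made up, and the real scripts add error handling and use ss to verify the listening socket:

# server: create the socket and hand each connection to the server app
socat UNIX-LISTEN:/bilderskript/ipc/hugin.sock,fork EXEC:/path/to/server_app.sh

# check that the socket is listening
ss -xl | grep hugin.sock

# client: send a control message and read the reply
echo "run" | socat - UNIX-CONNECT:/bilderskript/ipc/hugin.sock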

Cross Container IPC Example

We illustrate the container IPC between the builder container and the hugin container. The hugin container processes all images taken with the 360° camera and converts the fisheye lens images into regular ones using an equirectangular projection. The server script runs on the hugin container while the client script runs on the builder container. The socket between both resides on a shared volume. The interaction starts when the snakemake file on the builder calls the hugin.sh script.

ipc socket communication between builder and hugin

Resources:

A brief tutorial on socat is the Socat Cheatsheet from Travis Clarke.