About
BilderSkript
BilderSkript is a compound created from the following words.
Bilder [ˈbɪldɐ], (German), images
Skript [skʁɪpt], (German), a written document or notes
BilderSkript automatically summarizes a lecture as a sequence of interesting image scenes.
Technically, it trains a deep neural net on object recognition and extracts interesting scenes from a large sequence of still image recordings. These scenes compile to the BilderSkript.
Project Definition
Motivation and Goal
A lecturer usually provides a lot of supplemental course material such as a written script, slides, reading material, and exercises.
Students attending the course’s lecture, however, experience an additional channel, which helps them to sort and weight the supplemental material. The way a lecturer presents the content lets attendees realize the weight and importance of certain parts of the content. Hence, it supports the creation of a red thread throughout the entire supplemental material.
In contrast to a full lecture recording, the BilderSkript approach condenses the recording to those parts of a lecture which contribute to the understanding of the supplemental material.
Primarily, BilderSkript aims at course attendees; however, it could also be valuable for remote attendees.
BilderSkript has the following goals:
- partially re-create the course experience
- support attending students to follow up on the content
- support students who have not attended all lessons to catch up on the course
- improve the efficiency when working with the supplemental course material
Objectives
There are a couple of activities to perform:
- record lectures as a sequence of wide-angle still images
- train a computer to identify objects within still images
- define a metric of interestingness from the sequence of recognized objects
- extract interesting images and compile a lecture notebook, which ultimately creates the BilderSkript
Research Questions
BilderSkript relies on widely available open-source ML software. However, there are a couple of interesting research questions along the way.
- How to automate image preprocessing? - Image recording does not happen in a controlled environment, i.e. lecture rooms change, the light varies, the camera positions are not fixed (resulting in varying fields of view), and the colors of background, clothes etc. vary.
- Can we successfully transfer other object identification models to lecture recordings? Object identification utilizes labeled data from one or a few lectures for training. Since labeling is laborious, we are interested in transferring trained models from other areas and applying them to object identification in our lecture recordings.
Introductory Example
BilderSkript takes a series of still images as input and compiles visual lecture notes as output. Below we illustrate a step-by-step walkthrough of how the software processes the data.
Image recording: A 360° camera records the entire room. However, the lens towards the audience is covered to maintain privacy. The images have the typical distorted appearance due to the camera’s fisheye lenses. Nevertheless, it creates an approx. 200-degree wide-angle recording.
Image preparation: BilderSkript projects the fisheye images onto equirectangular images to correct for the distortion.
Finally, the perspective control corrects the deformation of vertical and horizontal lines when the camera records the projection wall or blackboard from an inclined position.
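Conceptually, the fisheye-to-equirectangular remapping works as sketched below. This is a simplified, hypothetical numpy sketch assuming an ideal equidistant lens (r = f·θ) with the optical center in the image center; BilderSkript delegates the actual, calibrated projection to hugin.

```python
import numpy as np

def fisheye_to_equirect(img, fov_deg=200.0, out_w=800, out_h=400):
    """Remap an equidistant fisheye image onto an equirectangular grid."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    # focal length in pixels for an equidistant lens: r = f * theta
    f = (min(w, h) / 2.0) / np.radians(fov_deg / 2.0)

    # longitude/latitude grid of the output image
    lon = (np.arange(out_w) / out_w - 0.5) * np.radians(fov_deg)
    lat = (0.5 - np.arange(out_h) / out_h) * np.radians(fov_deg) / 2.0
    lon, lat = np.meshgrid(lon, lat)

    # direction vectors; the camera looks along +z
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    theta = np.arccos(np.clip(z, -1.0, 1.0))  # angle from the optical axis
    phi = np.arctan2(y, x)                    # angle around the optical axis
    r = f * theta                             # equidistant projection

    # nearest-neighbor lookup in the fisheye source image
    u = np.clip(cx + r * np.cos(phi), 0, w - 1).astype(int)
    v = np.clip(cy - r * np.sin(phi), 0, h - 1).astype(int)
    return img[v, u]
```

Real lenses deviate from the ideal equidistant model, which is why the pipeline relies on hugin and per-recording `.pto` calibration files instead.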
Object identification: In this step BilderSkript identifies typical objects, such as the blackboard, the lecturer, or the video projection, within each image.
tbd. include image
Classification: In particular, if there is a blackboard, it is interesting to know whether the board is empty or not. The classifier utilizes the identified blackboard from the previous step and assigns class labels which correspond to the blackboard’s state. This step may filter images before it inputs them to the classifier.
Label | Original Image | Filtered Image |
---|---|---|
empty | ![]() | ![]() |
full | ![]() | ![]() |
Sequencing: Utilizing the object identification, BilderSkript transforms the set of images into a sequence of object compositions. A step consists of all identified objects in a single image. Please note, each step links back to the original image.
tbd. include image
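The sequencing step can be sketched in a few lines; the detection data below is made up for illustration. Each step reduces a detection result to the set of object labels while keeping the back-link to the original image.

```python
def to_sequence(detections):
    """Transform per-image detection results into a sequence of steps.

    `detections` is a list of (image file, labels) pairs in recording
    order. Each step keeps the set of identified objects plus a
    back-link to the original image.
    """
    return [{"image": img, "objects": frozenset(labels)}
            for img, labels in detections]

# hypothetical output of the object identification step
detections = [
    ("IMG_001.png", ["lecturer", "blackboard"]),
    ("IMG_002.png", ["lecturer", "blackboard", "slide"]),
]
sequence = to_sequence(detections)
```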
Interesting sequences: This step operates only on the sequence of identified objects. Based on a definition of interestingness, it scores the previously created sequences on this metric. The idea is that interestingness emerges from changes in the composition of identified objects.
tbd. include image
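One way to quantify this idea, as a hedged sketch rather than the project’s actual metric, is to score each step by the symmetric difference between consecutive object sets:

```python
def interestingness(object_sets):
    """Score each step by how much the object composition changes.

    The score of a step is the size of the symmetric difference between
    its object set and the previous one; the first step scores 0.
    """
    scores = [0]
    for prev, curr in zip(object_sets, object_sets[1:]):
        scores.append(len(prev ^ curr))
    return scores

steps = [{"lecturer"}, {"lecturer", "blackboard"}, {"lecturer", "blackboard"}]
scores = interestingness(steps)  # the appearing blackboard makes step 2 interesting
```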
Compilation: BilderSkript applies a threshold to the degree of interestingness to extract the best sequences. Once it has found an interesting sequence, it links each step within this sequence back to the image it originates from. Finally, it compiles the scene from these images.
tbd. include image
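The thresholding and back-linking can be sketched as follows; the data and threshold value are hypothetical:

```python
def compile_scenes(sequence, scores, threshold):
    """Select interesting steps and resolve them back to their images."""
    return [step["image"]
            for step, score in zip(sequence, scores)
            if score >= threshold]

# hypothetical sequence with back-links and interestingness scores
sequence = [{"image": "IMG_001.png"},
            {"image": "IMG_002.png"},
            {"image": "IMG_003.png"}]
scores = [0, 2, 1]
scene = compile_scenes(sequence, scores, threshold=2)
```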
System Design
This use case diagram depicts BilderSkript’s main services. Note, the shown use cases do not imply an execution order.
The engineer performs the system setup, and configures and trains the system so that it is beneficial for the user.
Docker Container Toolchain
Docker Images and Volumes
Docker images contain the various software pipelines from the toolchain. The responsibilities within the overall system design motivate the system boundaries induced by the distribution of pipelines across docker images. The following table lists the docker images and their respective pipeline functionalities.
Image | Pipeline | Notes |
---|---|---|
vscode | n/a | IDE VS code |
builder | Build | makefile, doc, versioning via git and dvc, mlflow exp. |
hugin | Data prep | image data preparation using hugin |
hugin-vnc | n/a | like hugin, but provides VNC server for GUI interaction |
mrcnn | ML training | object detection |
ludwig | ML training | classification |
cicd | Deployment pipeline | not yet implemented |
When started as docker containers, they run scripts and communicate with each other utilizing shared volumes on the filesystem. The image below illustrates the docker toolchain.
The most important volumes are:
- ${APP_ROOT}/ipc : used to store sockets for IPCs between containers
- ${APP_ROOT}/pipelines : contains all BilderSkript pipelines
- ${APP_ROOT}/images : data directory
- ${APP_ROOT}/src : stores the source code
- ${APP_ROOT}/scripts : scripts
- ${APP_ROOT}/ludwig : stores experiments, models and prediction
- ${APP_ROOT}/vscode : maintains the state of VS code
- ${APP_ROOT}/docs : github’s webserver serves the docu blog from this directory
- ${APP_ROOT}/docs_site : the docu blog’s source
By default `${APP_ROOT}` is set to `/bilderskript`. For instance, one accesses the pipeline scripts under `/bilderskript/pipelines`.
Build and Startup
Docker Compose creates all images, containers and volumes. The configuration is defined in the `docker-compose.yml` file.
Build all images using the following command:
docker-compose build
Start a container from the `builder` image and get a `bash` shell. The default container name is `bilderskript_builder_1`:
docker-compose up -d builder
docker exec -it bilderskript_builder_1 /bin/bash
Pipelines
Pipelines are the fundamental building blocks of the BilderSkript system.
Pipeline Definitions
The term pipeline is heavily used in machine learning (ML). It generally refers to a sequence of steps to run in order to perform transformations. There are various kinds of pipelines, e.g. data pipelines, machine learning pipelines, deployment pipelines and others. Depending on the pipeline type, it takes a certain resource type as input and produces output resources by applying the pipeline’s transformation steps. For instance, a data pipeline takes data files as input and prepares them to be fed into a machine learning algorithm of the ML pipeline.
BilderSkript stores pipelines in the `pipelines` volume, usually accessed under `/bilderskript/pipelines`. They are encoded as snakemake files for version control.
Snakemake
Snakemake is a workflow management system to implement data analysis pipelines.
Snakemake compares the input and output resources. These resources are files. If the modification date of any of the input files is newer than the output file, snakemake runs the shell command. This behavior is encoded as rules, which transform input files into output files. All rules of a pipeline are defined in a snakefile. We list some important snakemake commands.
Run the snakemake pipeline
snakemake <pipeline name>
Generate a report
snakemake <pipeline name> --report <snakefile.html>
Generate a summary table displaying the current state of input and output files.
snakemake <pipeline name> --summary
Before the snakefile is run, snakemake generates a DAG which shows the dependencies. This command visualizes the DAG.
snakemake <pipeline name> --dag | dot -Tpng > <pipeline name>.png
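The rule mechanism can be illustrated with a minimal, hypothetical snakefile; the rule, file and script names below are made up and are not part of the BilderSkript pipelines.

```
rule flatten:
    input:
        "images/{name}.png"
    output:
        "prep/{name}_flat.png"
    shell:
        "scripts/flatten.sh {input} {output}"
```

Snakemake re-runs the `shell` command only if the input file is newer than the output file.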
BilderSkript Pipelines
The BilderSkript pipelines follow the guidelines of The Snakemake-Workflows project. There is a central `Snakemake` file which includes the configuration and the concrete workflows for the data and ML pipelines.
The workflow’s design separates the pipeline-specific parameters from the dataset-specific ones. A separate file stores each parameter set:
- `config.yaml` contains the pipeline-specific parameters. This file exists only once and contains the parameters for all pipelines.
- `dataset.csv` stores the parameters relevant for the dataset a pipeline processes. There may be several of these files, one for each class of data for a pipeline. There is no fixed schema. The file’s name depends on the content and the pipeline which sources the file. A pipeline which processes various ML models may utilize a `models.csv` file to store the set of available models and additional parameters for each model.
The approach increases the flexibility to apply the pipelines to various datasets. The figure below depicts this separation
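A minimal, hypothetical example of this separation; the parameter names follow the pipeline description below, while all values are made up:

```
# config.yaml (pipeline specific, exists once)
datasets_idx: 1
out_dir: /bilderskript/images/prep
```

```
# datasets.csv (dataset specific, one line per dataset)
idx,img_dir,pto_file
1,/bilderskript/images/lecture01,/bilderskript/images/lecture01/proj.pto
```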
The pipelines are named after their snakefile. All pipeline files reside in the `/bilderskript/pipelines` directory within the BilderSkript project dir.
doc.snakefile: describes the pipeline for documentation generation. You may view the pipeline’s report.
data_prep.snakefile: prepares the image files for the ML pipeline. It’s a complex pipeline because it utilizes interprocess communication (IPC) with the `hugin` container. You may view the pipeline’s report. `data_prep.snakefile` requires parameters in `config.yaml`:
- datasets_idx: line in `datasets.csv` defining the data to process
- out_dir: the directory where the prepared images are stored as a result of the pipeline execution

The pipeline’s default behavior is started by the `data_prep.sh` script. The pipeline’s rule DAG is shown in `data_prep.snakefile.png`.
ludwig.snakefile: detects the blackboard’s state of writing, i.e. whether the writing on the blackboard fills out the blackboard completely, partially, or whether the blackboard is empty. The pipeline’s default behavior is started by the `ludwig.sh` script.
Run the following script to create reports for all BilderSkript pipelines:
snakemake_report.sh
[data_prep.snakefile] Pipeline
The data preparation pipeline, `data_prep.snakefile`, comprises two phases:
1. Configure the pipeline’s parameters
   - `config.yaml` defines pipeline-specific parameters
   - `datasets.csv` defines the data input and data-specific processing parameters
2. Run the pipeline on the images from the lecture recording
1. Configuration
The following activity diagram describes the steps to configure the data preparation pipeline. The configuration is stored in `config.yaml` and `dataset.csv`.
Precondition:
- `builder` and `hugin-vnc` container up and running
- at least one image from the lecture recording available
Postcondition:
- image directories set
- `.pto` files created
- (optionally) crop specification defined
The parameter values from above need to be stored in the `config.yaml` and `dataset.csv` files. The activity diagram indicates to which file each parameter belongs.
Commands
Spin up the `hugin-vnc` container.
docker-compose up -d hugin-vnc
Afterwards, use a VNC client to get Hugin’s display and create the `.pto` files. Modify the pipeline’s `dataset.csv` and `config.yaml` to tell the scripts where to look for the files. The activity diagram below displays the steps to create the files.
2. Run the Pipeline
Finally, the engineer starts the data prep pipeline; the pipeline processes the input images from the img_dir and places them in the out_dir.
Precondition:
- `builder` and `hugin` container up and running
- `.pto` files created
- img_dir and out_dir image directories set
Parameter values from above are sourced from the `dataset.csv` file.
Postcondition:
- processed images placed in out_dir
Pipeline Start
The following commands start the pipeline. The activity diagram below depicts the details of the pipeline run.
Spin up the `builder` and `hugin` containers and get an interactive shell to the `builder` container.
docker-compose up -d builder hugin
docker exec -it bilderskript_builder_1 /bin/bash
Within the `builder` container run
cd BilderSkript/pipelines
./data_prep.sh
[ludwig.snakefile] Pipeline for Blackboard Classification
The `ludwig.snakefile` pipeline takes blackboard images as input and assigns labels like full, partial, empty. It indicates whether the writing on the blackboard fills out the blackboard completely, partially, or whether the blackboard is empty.
Train the Blackboard Image Classifier
The image classifier detects the blackboard’s state of writing. In the BilderSkript context it is interesting to know whether the writing on the blackboard fills out the blackboard completely, partially, or whether the blackboard is empty.
Experiment
We learn a model, a classifier, which maps images to class labels, e.g. full, partial, empty.
- utilizes the Ludwig docker container
- runs from the command line
- the script tests whether the container already runs and spins it up if needed
- utilizes comet.ml to store experiment results
- the container must be shut down manually
For utilizing comet.ml you need to provide a `.comet.env` file in the project’s directory. The file contains the API key and comet’s project name.
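Assuming comet.ml’s standard environment variable names, a minimal `.comet.env` could look like this:

```
COMET_API_KEY=<your API key>
COMET_PROJECT_NAME=bilderskript
```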
Walkthrough
Start an experiment from `scripts/ludwig`.
STEP 1: Prepare the input dataset
Precondition:
- `.png` images as input for training the model
- within the `<training image path>` directory the images are organized in subdirs, where the subdir names are the class labels
- experiment name, e.g. stats_lecture
Command:
./ludwig_experiment.sh datacsv <experiment name> <training image path>
Postcondition:
- directory created: `ludwig/experiements/<experiment name>`
- directory contains `<experiment name>.csv`, which links images to class labels
STEP 2: Learn the model
Precondition: see above
Command:
./ludwig_experiment.sh experiment <experiment name> <training image path>
Postcondition:
- results available in `ludwig/experiements/<experiment name>/results`
- comet.ml website contains the experiment’s results
STEP 3: Visualize and Shutdown
Precondition: the experiment name from above
Command:
./ludwig_experiment.sh visualize <experiment name>
./ludwig_experiment.sh shutdown <experiment name>
Postcondition:
- comet.ml website contains additional images from the experiment run
- the Ludwig container is removed
Multi Experiments
Multi experiments are repeated experiments with varying parameters. In particular, we vary the parameters for the image pre-processing.
Start a multi experiment from `scripts/ludwig`. All directories created during a regular experiment are prefixed by `exp<num>_`, indicating the run number within the multi experiment.
Precondition:
- see the initial preconditions of a single experiment
- edit `ludwig_multi_experiments.sh` for
  - the training image path
  - the parameters to loop through
Command:
./ludwig_multi_experiments.sh <experiment root name>
Postcondition:
- directories with results created: `ludwig/experiements/exp<num>_<experiment root name>`
- comet.ml website contains all experiments’ results
Training Process
The following activity diagram illustrates the overall sequence of actions for running one or more experiments. The swimlane `Experiments` depicts all artifacts which relate to an experiment. When an experiment is repeated with different parameters, the filtered images, the datacsv file and the resulting model are created under the new experiment name.
Labeling
This activity assigns names or descriptors to components within an image. Any supervised ML algorithm requires labeled data for training. Once successfully trained on the labeled data, the ML algorithm can identify and name similar components in images which do not have labels.
Labeling is usually done manually. A user marks regions within images and assigns labels, therewith identifying objects in the image. Labeled data is not questioned and provides the algorithm with ground truth.
Problem
BilderSkript delivers images from various lectures. We will need to repeat the labeling process to find appropriate features which let the ML-based object identification algorithm generalize sufficiently well to identify objects in recordings from other lectures.
Tool Support
Labeling objects within images has a long history in the computer vision research community. For ML-based products there are a couple of online services available which support labeling activities distributed across crowd workers, integrate semi-automated quality checks and provide other functions when it comes to large-scale applications.
Quora lists a non-exhaustive collection of labeling tools and services; some are commercial, some are open-source. Check out https://www.quora.com/What-is-the-best-image-labeling-tool-for-object-detection.
For BilderSkript we found the following open tools attractive and discuss them briefly.
- ImgLab: web based tool; Online service for immediate use available under https://imglab.in
- LabelImg: the classic one, often mentioned on towardsdatascience.com. Google finds approx. 45 hits. It is a desktop tool.
- Alp’s Labeling Tool (ALT): more than just a labeling tool; it’s a desktop tool for Windows as well as Linux / Ubuntu.
- Computer Vision Annotation Tool (CVAT): web based online tool. Works with videos, too. Apart from the web GUI, there is also a REST API to programmatically access the tool.
ImgLab to annotate Labels
BilderSkript utilizes ImgLab for label annotations. The web-based tool exports label annotations in multiple formats:
- dlib XML
- dlib pts
- Pascal VOC: standardized; however, the annotation data export only works for the current image. One needs to proceed to the next image to export the next image’s annotation.
- COCO: standardized; however, the exported annotation text seems to be incomplete
The dlib XML format originates from the dlib toolkit, which contains various machine learning algorithms and provides a Python API for working with such files.
The XML format is reduced to the very basic tags required. An example is shown below.
<?xml version='1.0' encoding='ISO-8859-1'?>
<?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?>
<dataset>
<name>dlib face detection dataset generated by ImgLab</name>
<comment>
This dataset is manually crafted or adjusted using ImgLab web tool
Check more detail on https://github.com/NaturalIntelligence/imglab
</comment>
<images> <image file='IMG_20191219_094800_001_flat_pc_resize_mirror.png'>
<box top='36' left='119' width='278' height='208'>
<label>slide</label>
</box>
<box top='31' left='-2' width='206' height='319'>
<label>person</label>
</box>
</image>
<image file='IMG_20191219_094802_002_flat_pc_resize_mirror.png'>
<box top='44' left='1' width='204' height='304'>
<label>person</label>
</box>
</image>
</images></dataset>
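Annotations in this format are easy to process programmatically. A small sketch, using only Python’s standard library, extracts the labeled boxes from such a file; the sample data is a shortened, made-up variant of the example above.

```python
import xml.etree.ElementTree as ET

def parse_dlib_xml(xml_text):
    """Extract the labeled boxes from a dlib XML dataset string."""
    root = ET.fromstring(xml_text)
    annotations = []
    for image in root.iter("image"):
        for box in image.iter("box"):
            annotations.append({
                "file": image.get("file"),
                "label": box.findtext("label"),
                # box geometry as integers
                "box": {k: int(box.get(k))
                        for k in ("top", "left", "width", "height")},
            })
    return annotations

sample = """<dataset><images>
<image file='IMG_001.png'>
  <box top='36' left='119' width='278' height='208'><label>slide</label></box>
</image>
</images></dataset>"""
annotations = parse_dlib_xml(sample)
```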
BilderSkript Labels
We create rectangular shapes and define the following labels to annotate the shapes:
Development IDE - VS code
We use VS code as BilderSkript’s development IDE. A docker image encapsulates the IDE and makes it accessible through the web browser. A developer’s software and programming effort focuses mostly on writing and editing shell scripts. As a consequence, the docker image provides appropriate extensions to support this activity.
Web-based VS code
Start VS code
docker-compose up -d vscode
and point the web browser to http://localhost:8080.
The vscode container starts with the BilderSkript repo mounted and opened. It stores its state, e.g. last file open, extensions, other settings, in the `vscode` directory within the BilderSkript project dir.
Extensions
BilderSkript’s vscode docker image and repo come pre-installed with the following extensions, especially for working with shell scripts.
The shellchecker runs on-the-fly and provides quick fixes for better coding quality of shell scripts. The shfmt tool reformats the shell script. Use `shift + alt + f` to reformat the script.
Git
Git version control within the vscode image misses `user.name` and `user.email` for git actions like commit. There is an extensive discussion on https://stackoverflow.com/questions/42318673/changing-the-git-user-inside-visual-studio-code. The main message is that changing the git user for Visual Studio Code happens not inside, but outside of VS Code. This would require a developer to run `git config --global user....` commands within vscode’s CLI.
BilderSkript’s `docker-compose.yml` file maps a .gitconfig file from the project dir onto the vscode container’s `$HOME/.gitconfig`. As a result, the git configuration data is available to the vscode container.
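In `docker-compose.yml` such a mapping could look like the following hypothetical excerpt; the service name matches the table above, while the container home directory is an assumption:

```
services:
  vscode:
    volumes:
      - ./.gitconfig:/root/.gitconfig
```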
By default, the `.gitconfig` is included in the project’s `.gitignore` file to avoid accidentally committing private git configuration details into the public repo.
Versioning and Repository Management
Pipeline Code and Data Versioning
The `builder` image takes care of versioning using git and dvc. Pipelines are shared across containers via common volumes.
Repository Management
The project’s repository is mounted as a separate volume in the `builder` container. As a consequence, only makefiles in the project’s root dir are able to commit changes to the repository. Scripts in subdirectories are not aware that the project’s root is under git version control. The volumes are organized under the following two mount points.
Mount point: `/BilderSkriptRepo`
Scripts which perform pipeline and data versioning start under this mount point. It contains all files and directories from the project repository:
- .git/
- Dockerfiles/
- …
- docker-compose.yml
- README.md
Mount point: `/bilderskript`
Scripts which actually execute pipelines start under this mount point. It contains the following directories:
- docs/
- docs_site/
- images/
- scripts/
- src/
Documentation Generation
Docu Blog
The project’s documentation is a single-page blog using the Hugo static webpage generator. It lists notes taken during the development and justifies design decisions. The `builder` docker image takes care of the generation utilizing the snakemake script `pipelines/doc.snakemake`.
The blog utilizes the OneDly theme. This theme is excluded from git versioning because it originates from a separate repository. One includes it as a git submodule.
cd docs_site
git submodule add https://github.com/cdeck3r/OneDly-Theme.git themes/onedly
The pipeline `doc.snakemake` generates the complete documentation by calling
cd /bilderskript/pipelines
snakemake -s doc.snakemake
The docu blog is available under http://cdeck3r.com/BilderSkript/.
APPENDIX: Pipeline Interprocess Communication (IPC)
The distribution of pipelines across various docker containers requires interprocess communication between the docker containers. Let’s discuss the design for the BilderSkript pipelines.
IPC Control Problem
We have the `builder` container which controls the execution of all pipelines. A pipeline implements a sequence of process executions, where some of the processes run in their respective containers. As an example, consider the data preparation pipeline. The pipeline in the `builder` container initiates the image data preparation in the `hugin` container. Once the process in the `hugin` container completes, the pipeline can continue with the next step.
Basically, the sequence diagram above depicts a synchronous remote procedure call (RPC). This is a classical facility in many programming languages, e.g. Java RMI.
IPC control problem formulation:
Implement a synchronous RPC to start processes across containers and that seamlessly integrates into pipelined workflows.
Constraints / requirements:
- low effort for additional network setup
- low effort for access control and securing the access
- script based, preferably in bash, as this makes it easy to integrate the RPC as a shell command
In the following sections, we discuss some IPC approaches for their qualification as synchronous RPC.
File-based IPC
A very simple but effective approach is the use of files for exchanging messages between container processes. The process in container A writes a message character string into a file on a shared volume from which the process in container B can read the file’s content. In this scenario, a single file serves as a buffer between those processes. However, the file access needs to be coordinated, otherwise the processes would overwrite each other’s messages. Additionally, to get informed about new messages, the processes need to poll the commonly shared file.
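A minimal Python sketch of this scheme illustrates both issues: the writer coordinates access by writing atomically (write a temporary file, then rename), and the reader has to poll. This is for illustration only; the file and message names are made up.

```python
import os
import tempfile
import time

def send(msg_file, text):
    """Write the message atomically: write a temp file, then rename."""
    tmp = msg_file + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
    os.rename(tmp, msg_file)  # rename is atomic on POSIX filesystems

def receive(msg_file, timeout=5.0, interval=0.1):
    """Poll for the message file, then consume and delete it."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(msg_file):
            with open(msg_file) as f:
                text = f.read()
            os.remove(msg_file)  # free the buffer for the next message
            return text
        time.sleep(interval)
    raise TimeoutError("no message received")

msg_file = os.path.join(tempfile.gettempdir(), "bilderskript.msg")
send(msg_file, "DONE")
```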
Client / Server IPC
Processes within different containers may communicate as networked clients and servers. Some options are:
- REST or similar RPC-alike calls across the network
- SSH remote script execution
- Shared UNIX sockets
REST requests are easy to implement; however, a webserver such as nginx or apache is required to receive the requests. Alternatively, one could implement a simple webserver using python and flask. SSH remote script execution needs accounts to log in to the remote containers. One needs to enter a password before execution, which limits automation capabilities. Public key access can resolve this problem, but needs caution in handling public and private keys. Both options expose containers to the network. As a consequence, they require a network definition between containers and a firewall to block unauthorized access.
Finally, UNIX sockets appear to be an ideal approach to interprocess communication. Sockets are stored on a shared volume, therewith enabling local access control. However, this type of interprocess communication is limited to containers on the same host.
Message Queues and Brokers
A message broker maintains a message queue (MQ), where distributed processes can register themselves to exchange messages without directly knowing each other.
Message brokers are a preferred way to decouple distributed processes, in particular for long-running processes executed asynchronously. However, the setup and management of an MQ takes effort. Processes need to agree on the message format and containers need to reach the broker’s queue via the network.
Databases for IPC
Databases store information which is accessible by clients. As such, this is related to file-based IPC. Process information is organized in tables, and table attributes store process states, e.g. the process execution start or whether a process successfully completed.
Similar to file-based IPC, processes are required to poll the database to receive updates on the process state. Additionally, the setup and maintenance effort of even a simple DBMS is significant. A simple file-based database, e.g. sqlite, is an attractive solution for this last issue. However, write and read access must be coordinated to avoid (orphaned) file locks.
Final Design Decision
As a result of the discussion, BilderSkript implements interprocess communication using shared UNIX sockets. It combines the charm of file-based approaches with the advantage of bi-directional communication. Tool support for scripting is very mature. The next section provides more details on how this approach is implemented.
Limitation: The use of UNIX sockets limits IPC to containers on the same host. To overcome this, one may use `socat` to forward the UNIX socket to a TCP one. See this stackoverflow post to get an idea how this can be implemented. However, this would re-open the discussion on network access issues, while enabling a migration path to evolve the local host approach into a distributed network approach.
Further Resources:
There are a couple of resources to follow up on these issues.
- https://www.linode.com/docs/security/authentication/use-public-key-authentication-with-ssh/
- Dockerize an SSH service and ubuntu sshd
- sshpass and Docker secret
- Docker ssh considered evil
An interesting op-ed from Bozho’s tech blog is You Probably Don’t Need a Message Queue. It inspired some of the discussion above.
APPENDIX: IPC via Shared UNIX Sockets
This is a technical design description of the usage of shared UNIX sockets for docker container IPC. Both containers connect to a socket stored on a shared volume, e.g. under the mount point `ipc`. They exchange simple control messages to run scripts and read and write data from the shared volume.
The approach utilizes
- `socat` for creating sockets and socket communication
- `ss` to check for the presence of the listening socket
It follows a classical client/server approach where the server initializes the socket and waits for the client to start the communication. The following sequence diagram depicts the interaction over the socket.
Here are the script calls for the server and client.
./ipc_socket_server.sh <socket name> <path/to/server_app.sh>
./ipc_socket_client.sh <socket name> <path/to/client_app.sh>
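The synchronous call pattern behind these scripts can be illustrated with a minimal Python sketch. The actual implementation is bash and `socat`; the socket name and command strings below are made up.

```python
import os
import socket
import tempfile
import threading
import time

SOCK = os.path.join(tempfile.gettempdir(), "bilderskript_ipc.sock")

def server():
    """Listen on the shared socket, perform the 'remote' work, reply."""
    if os.path.exists(SOCK):
        os.remove(SOCK)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(SOCK)
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            cmd = conn.recv(1024).decode()          # e.g. "RUN hugin.sh"
            conn.sendall(("DONE " + cmd).encode())  # reply after the work

def call(command):
    """Synchronous RPC: send a command and block until the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(SOCK)
        cli.sendall(command.encode())
        return cli.recv(1024).decode()

t = threading.Thread(target=server)
t.start()
while not os.path.exists(SOCK):  # crude wait until the server listens
    time.sleep(0.05)
time.sleep(0.05)
reply = call("RUN hugin.sh")
t.join()
```

The client blocks in `recv` until the server has completed its work, which is exactly the synchronous behavior the pipeline needs before it continues with the next step.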
Cross Container IPC Example
We illustrate the container IPC between the `builder` container and the `hugin` container. The `hugin` container processes all images taken with the 360° camera and converts the fisheye lens images into regular ones using an equirectangular projection. The server script runs on the `hugin` container while the client script runs on the `builder` container. The socket between both resides on a shared volume.
It starts with the snakemake file on the `builder` calling the `hugin.sh` script.
Resources:
A brief tutorial on `socat` is the Socat Cheatsheet from Travis Clarke.