Posted: June 08, 2019 - Updated: April 10, 2024
This article is about Docker and the technologies it is built on. It is mostly theory. For more practical material, see the docker cookbook.
Before diving into the Docker world, let's clarify the basics of what Docker is:
The best short description of Docker: a lightweight virtual machine.
It's a common problem: developers build their software on their own computers, and then it runs only on their computers. Years ago, people used fully-featured virtual machines to create an isolated, clean environment to fight this. Docker is about the same idea. Whether the host machine is a Windows PC, a Linux laptop, a dedicated server, or a virtual host, when the application runs with Docker we can be sure the environment is the one we defined.
There is also the well-known "matrix of hell", which shows the wide variety of systems and their different behavior on different hardware:
Also, there is another problem: developers want to use the latest technologies like Haskell, Rust, Go, microservices, Cassandra, Hadoop, etc. But server operations and security want to use the same tooling as before, something that is proven, works, and doesn't require additional work to configure and secure.
Docker combines the wishes of both sides; it acts like an interaction contract between them.
Server operations have to care about only one thing: supporting container deployment. So they are satisfied. Developers are happy too: they can develop with whatever the fad of the day is, stick it into a container, and throw it over the wall to operations.
So Docker solves this matrix in a very straightforward way:
This is a really nice approach we have nowadays, but before, everything was different. Hosting has a long and quite complex history, and that history had an impact on Docker. To understand what requirements and problems existed prior to Docker, let's overview some basics of hosting history.
The first sites and services were hosted on office computers. You can imagine how many problems there were with this approach, starting from "I accidentally turned it off" and going up to real power supply stability issues. Later, DNS providers started offering hosting as an add-on service. This made sense: DNS servers must work 24/7 with extreme stability anyway, so the companies running them knew how to achieve the same stability for their users' sites and services. Specialization quickly took over, and entire data centers with dedicated computers (dedicated servers) appeared. Still, the prices were not flexible, and because a single user usually utilized only a small fraction of a server, the same physical machines started to be sold to many people, of course with some level of isolation between users' data. These isolation requirements caused the appearance and evolution of virtual server software. Another possible scenario in those days was to buy a physical server for your own use and put it in a data center, but that was far too expensive.
A virtual server is software that runs another virtual (emulated) computer as a regular application. Since the computer is emulated, its environment is emulated too, so running any program in it will not harm your primary operating system (if you have one; stepping a bit forward, there are virtual servers that run directly on hardware and don't require an OS to be installed).
The hypervisor, also referred to as a Virtual Machine Manager (VMM), is what enables virtualization (running several operating systems on one physical computer). It allows the host computer to share its resources between VMs. There exist two types of hypervisors: native (or bare-metal) and hosted.
The native hypervisor is installed right on top of the underlying machine's hardware (so, in this case, there is no host OS, only guest OSes). It is usually installed on a machine whose whole purpose is to run many virtual machines. The hypervisor here has its own device drivers and interacts with the hardware directly. Such hypervisors are faster, simpler, and hence more stable.
Examples: antsleOS, Xen, XCP-ng, Oracle VM Server, Microsoft Hyper-V, Xbox One system software, VMware ESX/ESXi and many more
The hosted hypervisor is a program (a regular executable file) that runs on top of the operating system. This type of hypervisor works like a "translator" that translates the guest operating system's system calls into the host operating system's system calls.
Examples: VirtualBox, Parallels Desktop for Mac, QEMU, VMware Workstation, VMware Player and many more
Pros:
The advantages of virtual servers are:
Cons:
Containerization is about creating an isolated virtual environment for processes without the need for a full VM.
Containerization is a lightweight alternative to full machine virtualization that involves encapsulating an application in a container with its own operating environment. Put simply, a container is just a native process running in its own environment, so its execution does not affect the host environment in any way.
Docker is an example of containerization technology.
The image describes the difference between a VM and Docker: instead of a hypervisor with guest OSes on top, Docker uses the Docker engine with containers on top. Let's look at a few more aspects:
Processes. A nice way of illustrating this difference is to list the running processes. On the host running a VM, there is only one visible process, the VM itself, even though many processes run inside it (emulated within that single process). On the host running Docker, all the containerized processes are visible because they execute natively on the host OS. They can be inspected and manipulated with normal commands like ps and kill.
Disk space. Another difference is disk space usage. A VM has to carry a full file system inside, which usually takes gigabytes of disk. The size of a small container based on busybox is just about 2.5 MB.
Startup time. The startup time of a fast virtual machine is measured in minutes. The startup time of a container is often less than a second.
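A minimal way to observe all three points yourself, assuming a Linux host with Docker installed (the container name demo below is arbitrary):

```sh
# Start a tiny container that just sleeps; the busybox image is only a few MB
# and the container starts in well under a second.
docker run -d --name demo busybox sleep 1000

# The containerized process shows up as an ordinary process on the host...
ps aux | grep 'sleep 1000'

# ...and can be managed with the usual tools, or through docker itself.
docker stop demo && docker rm demo
```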
Chroot - the first attempt at process isolation. chroot is just a Unix command (and system call) that changes the apparent root directory for the currently running process and its children. A program run in such a modified environment cannot name (and therefore normally cannot access) files outside the designated directory tree.
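As a rough illustration, here is a sketch that assumes a statically linked busybox binary is available on the host; the /tmp/jail path is arbitrary:

```sh
# Build a tiny root tree and "jail" a shell into it.
mkdir -p /tmp/jail/bin
cp /bin/busybox /tmp/jail/bin/                  # works if busybox is statically linked
/tmp/jail/bin/busybox --install /tmp/jail/bin   # create sh, ls, ... applets

sudo chroot /tmp/jail /bin/sh
# Inside this shell "/" is /tmp/jail; files outside the tree are not reachable.
ls /   # shows only "bin"
```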
The chroot system call was introduced during development of Unix in 1979, and added to BSD in 1982. An early use of the term “jail” as applied to chroot comes from Bill Cheswick creating a honeypot to monitor a cracker in 1991.
The first known way of breaking out of a chroot jail was published in 1999.
Docker is the result of the evolution of the concepts behind chroot. All Docker containers on a given host run under the same kernel, with other resources isolated per container. Docker isolates a process at multiple levels through namespaces and related utilities:
Utilizing the capabilities above, Docker provides:
Still, there are some disadvantages. The first is security: containers share the same hooks into the kernel, which is a problem because any kernel vulnerability can let someone get into other containers. Containers have not yet demonstrated that they can deliver the same secure boundaries that a VM provides. The next one is flexibility in choosing the OS: there is no way to run Windows apps under Linux and vice versa, and there is not even a way to use a kernel other than the existing one. Yes, we can install an emulator of the desired system in a container (like Wine on Linux or WSL on Windows), but that will be another intermediate layer of abstraction.
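The "one shared kernel" point is easy to verify with a small check, assuming Docker is installed on a Linux host:

```sh
# The kernel release reported inside a container is the host's kernel,
# no matter what distribution the image is based on.
uname -r
docker run --rm busybox uname -r   # prints the same kernel release
```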
Docker is implemented as a client-server system. The Docker software consists of two separate programs: the Docker engine, also known as the Docker daemon (because it is, in fact, a daemon running in the background), and the Docker client.
The Docker engine is responsible for running processes in isolated environments. For each process it creates a new Linux container, allocates a new filesystem for it, allocates a network interface, assigns it an IP, sets up NAT for it, and then runs the process inside it. It also manages things such as creating and removing images, fetching images from the registry of choice, creating, restarting, and removing containers, and much more. The Docker engine exposes a REST API which can be used to control the daemon.
The Docker client is usually a CLI app. The daemon is accessible via a socket connection (TCP or UNIX) and simply exposes a controlling API. The client may, but does not have to, be on the same machine as the daemon.
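The CLI is only one possible client; the daemon's REST API can be called directly. A sketch assuming the default UNIX socket path on Linux, curl 7.40+, and a user allowed to access the socket:

```sh
# Ask the daemon for its version over the UNIX socket...
curl --unix-socket /var/run/docker.sock http://localhost/version

# ...which is roughly what the regular client does under the hood.
docker version
```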
Terminology:
The "union file system" mentioned above is built on top of the copy-on-write principle and allows heavy reuse of file system layers.
For example: if you build most of your applications as PHP sites served by the Apache HTTP Server, the single layer with the Apache server will be shared between all of your applications, and the files of each application will be put on top of that base layer at runtime.
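A small way to see this sharing; this is only a sketch, where the base image tag, the app1/ and app2/ directories, and the file names are placeholders:

```sh
# Two apps built on the same php+apache base image.
printf 'FROM php:8.2-apache\nCOPY app1/ /var/www/html/\n' > Dockerfile.app1
printf 'FROM php:8.2-apache\nCOPY app2/ /var/www/html/\n' > Dockerfile.app2
docker build -f Dockerfile.app1 -t app1 .
docker build -f Dockerfile.app2 -t app2 .

# The base layers have identical digests, so they are stored only once on disk;
# each image adds just its own thin COPY layer on top.
docker image inspect --format '{{json .RootFS.Layers}}' app1
docker image inspect --format '{{json .RootFS.Layers}}' app2
```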
Docker is a powerful tool created to solve many common developer problems and tasks. But before using it, it is nice to have some insight into how we came to it and how it works inside; that is what this article was about. You may also want to look at the more practical side of using Docker in my docker cookbook.