Creating a Vertica image
This tutorial creates a Dockerfile that is compatible with Vertica versions 23.4 and lower.
Vertica on Kubernetes deploys an Eon Mode database in a Kubernetes StatefulSet. The Vertica server image is optimized for Kubernetes, using the minimum tools and libraries requried to containerize Vertica.
This tutorial describes the components of our minimal Vertica image so you can build a custom Vertica image for development or production purposes. The Dockerfile reduces image size with a multistage build, and orders instructions for the most efficient cache and build.
For additional guidance, refer to the Dockerfile hosted in the vertica-kubernetes GitHub repository.
Prerequisites
- Physical or virtual machine running x86 Linux
- Docker engine
make
build tool- Vertica 11.0.1 or higher RPM
- Local clone of the vertica-kubernetes GitHub repository
To build a container image with the GitHub repo, you must store the Vertica RPM in your local clone repo, in the docker-vertica/packages
directory, with the name vertica-x86_64.RHEL6.latest.rpm
.
Security Considerations
Vertica recommends the following open-source security scanners to protect against malware, and identify critical vulnerabilities that prevent unauthorized users from accessing sensitive information in your container runtime:
- Anchore Engine: A static analysis and policy-based compliance tool that evaluates images with user-defined security policies. Anchore identifies vulnerabilities and returns links to associated NIST reports.
- Trivy: A vulnerability scanner that you can add to your continuous integration (CI) toolchain.
For additional container file guidance, refer to Best practices for writing Dockerfiles.
Building
The command to build the image is a Makefile
target in the GitHub repository. Use the following command to initiate the build:
$ make docker-build-vertica VERTICA_IMG=<name of image> MINIMAL_VERTICA_IMG=<YES|NO>
Explanation and rationale about the contents of the Dockerfile is in the subsequent section.
Dockerfile Specifics
Multistage Build
Vertica uses a multistage build to reduce the size of the final Docker image. Each RUN
command adds a new layer to the Docker image. Each layer performs work using the previous layer, resulting in leftover build artifacts that increase the image size. Using a multistage build takes the final result of the first stage and use it as a single layer in the second stage, removing any unnecessary artifacts created by intermediary layers during any previous stages.
Setting Global ARG Variables
Before the build stages begin, we must define ARG variables that are scoped globally across the two stages. To make an ARG variable available in a particular stage, you must define it after the FROM tag.
The Vertica image uses two different operating systems during the entire build process:
- CentOS 7 as the base image for the builder stage
- Ubuntu as the base image for the
final image
Assign your operating system selection to the
BUILD_OS_VERSION
orBASE_OS_VERSION
variables:
ARG BASE_OS_VERSION="lunar"
ARG BUILDER_OS_VERSION="stream8"
Both versions that you select must correspond to a docker tag for the OS images. If a tag is one that is overridden then you must do a docker pull
prior to building the image so that you use the latest one.
When the MINIMAL
argument is set to YES
, the Dockerfile builds a smaller image. The smaller image omits large packages like Tensorflow and the Java runtime, which is required only to run Java UDx’s. This will result in over 300MB (uncompressed) of savings. By default, we build the full image.
ARG MINIMAL=""
The NO_KEYS
argument is optional. In some circumstances, you might want to manage the SSH keys that authenticate connections to the Vertica server container. When you set this argument to YES
, the Dockerfile requires user-provided SSH keys:
ARG NO_KEYS=""
We use s6 as the init program. This argument allows you to choose the version of that program. This version refers to one of the GitHub releases on the s6 GitHub repository.
ARG S6_OVERLAY_VERSION=3.1.2.1
First Stage
The first stage is named builder. The builder stage generates the /opt/vertica directory structure by downloading the required packages and dependencies, then running the Vertica installer. The FROM
command sets the base image for the builder and initiates the build of that stage. The BUILDER_OS_VERSION
was previously selected as the image version to use for this stage. The following FROM
command assigns the name builder to the first stage in the multistage build:
FROM quay.io/centos/centos:${BUILDER_OS_VERSION} as builder
Setting ARG Variables
Container files use the ARG
instruction to define build process variables. VERTICA_RPM stores the name of the Vertica RPM file. You must store the RPM in the /packages
directory to build a container image. If you do not have a Vertica license, use the free trial Community Edition RPM.
ARG VERTICA_RPM="vertica-x86_64.RHEL6.latest.rpm"
The MINIMAL
and NO_KEYS
arguments are already globally defined–the following lines makes them available in this stage:
ARG MINIMAL
ARG NO_KEYS
The next two variables define the default UID and GID of the dbadmin user account in the container:
ARG DBADMIN_GID=5000
ARG DBADMIN_UID=5000
Adding Files
The COPY
instruction copies files from the host filesystem into the container filesystem.
The following COPY
instruction copies your Vertica RPM to the container’s /tmp
folder. This is used to install Vertica in the container:
COPY ./packages/$VERTICA_RPM /tmp/
The following COPY
instructions add bash scripts that clean up your image and reduce its size. They are available in the vertica-kubernetes/docker-vertica/packages
folder:
- cleanup.sh strips debugging symbols from the libraries in
/opt/vertica/packages
directory, decreasing the image size. If you setARG MINIMAL=YES
, this script removes any packages that are not installed automatically, such as Tensorflow. - package-checksum-patcher.py patches the library installers to use new checksums that cleanup.sh created when removing the debugging symbols.
COPY ./packages/cleanup.sh /tmp/
COPY ./packages/package-checksum-patcher.py /tmp/
The following COPY
instructions add config files for sshd and ssh. These config files are used to ensure all environment variables are passed and accepted from the ssh client to the ssh server. This is needed so that when we start vertica, environment variables that are set in the pod are picked up by the server.
COPY ./packages/10-vertica-sshd.conf /etc/ssh/sshd_config.d/10-vertica-sshd.conf
COPY ./packages/10-vertica-ssh.conf /etc/ssh/ssh_config.d/10-vertica-ssh.conf
Installing Dependencies
This section incrementally builds a single RUN
instruction. The RUN
instruction executes Bash commands that persist in your container.
Each RUN
instruction adds a layer to the final image. To limit the number of RUN
instructions, use the Bash && operator to chain multiple RUN
commands into a single command. To chain commands that span multiple lines into a single command, enter the backslash ( \ ) character at the end of the line.
Set up the shell
The following command ensures that the build fails if any of the subsequent RUN
commands fail:
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
Begin the RUN Instruction
Add RUN set -x
to log each command to the console as it is executed:
RUN set -x \
Indent the following commands 2 spaces further than the RUN set -x \
command.
Update the Packages
To begin, update all packages:
&& yum -q -y update \
Vertica and Admintools Required Packages
Vertica and Admintools require the following packages to function properly:
&& yum install -y \
cronie \
dialog \
glibc \
glibc-langpack-en \
iproute \
openssh-server \
openssh-clients \
openssl \
sudo \
which \
zlib-devel \
Configure the dbadmin Role and Group
Create the required verticadba group and add the dbadmin user:
&& /usr/sbin/groupadd -r verticadba --gid ${DBADMIN_GID} \
&& /usr/sbin/useradd -r -m -s /bin/bash -g verticadba --uid ${DBADMIN_UID} dbadmin \
Install the Locally-Sourced RPM
Install the RPM from the docker-vertica/packages
directory in the container /tmp
directory:
&& yum localinstall -q -y /tmp/${VERTICA_RPM} \
Run install_vertica Script
To prepare the Vertica environment, run the install_vertica script:
&& /opt/vertica/sbin/install_vertica \
--accept-eula \
--debug \
--dba-user-password-disabled \
--failure-threshold NONE \
--license CE \
--hosts 127.0.0.1 \
--no-system-configuration \
--ignore-install-config \
-U \
--data-dir /home/dbadmin \
Add License Files
If you used the Community Edition license, create a directory to install the Community Edition license key:
&& mkdir -p /home/dbadmin/licensing/ce \
&& cp -r /opt/vertica/config/licensing/* /home/dbadmin/licensing/ce/ \
Configure logrotate to simplify log file administration:
&& mkdir -p /home/dbadmin/logrotate \
&& cp -r /opt/vertica/config/logrotate /home/dbadmin/logrotate/ \
&& cp /opt/vertica/config/logrotate_base.conf /home/dbadmin/logrotate/ \
Provide the dbadmin user ownership of the Vertica files:
&& chown -R dbadmin:verticadba /opt/vertica \
Clean Up Install Files
Run the cleanup.sh
script to reduce the size of the final image:
&& rm -rf /opt/vertica/lib64 \
&& yum clean all \
&& sh /tmp/cleanup.sh
Prepare the static ssh keys
We use static SSH key for the dbadmin id. This is required so that if the environment runs multiple versions of the image, then all nodes can communicate through SSH.
COPY dbadmin/.ssh /home/dbadmin/.ssh
Configure Container Network and Access Privileges
Begin the RUN Instruction
Add RUN set -x
to log each command to the console as it is executed:
RUN set -x \
Copy SSH Keys
The following commands copy the static SSH key to use for root and ensures all keys have the proper permissions:
&& mkdir -p /root/.ssh \
&& if [[ ${NO_KEYS^^} != "YES" ]] ; then \
cp -r /home/dbadmin/.ssh /root; \
chmod 700 /root/.ssh; \
chmod 600 /root/.ssh/*; \
fi \
Ensure proper ownership and permissions
Ensure that everything under /home/dbadmin has the correct ownership and the ssh config files have the correct permissions:
&& chown -R dbadmin:verticadba /home/dbadmin/ \
&& chmod go-w /etc/ssh/sshd_config.d/* /etc/ssh/ssh_config.d/* \
&& if [[ ${NO_KEYS^^} == "YES" ]] ; then \
rm -rf /home/dbadmin/.ssh/*; \
fi
Second Stage
The second stage prepares the OS for the final image. It updates packages to address all known security vulnerabilities. This stage hands off to the third and final stage to remove any intermediate files that are needed for the package update.
The beginning of the second stage is indicated by a row of # characters in the Dockerfile.
Choosing Your Operating System
As previously mentioned, the second stage uses Ubuntu as the operating system. You can set the OS version with the BASE_OS_VERSION
variable that you set earlier:
FROM ubuntu:${BASE_OS_VERSION}
Third Stage
The third and final build stage:
- copies only the necessary build artifacts from the first two stages to reduce the number of layers in the final image.
- creates environment variables.
- exposes ports for networking.
- adds metadata to the image.
Set Up Stage
Start with an empty image called scratch
and copy everything from the second stage:
FROM scratch
COPY --from=initial / /
This removes any intermediate layers that were created to update the OS packages.
Additional ARG Variables
Reuse the ARG variables from the first stage that define Vertica license and user information:
ARG DBADMIN_GID=5000
ARG DBADMIN_UID=5000
In the previous example:
- DBADMIN_GID is the default GID for the dbadmin account in the container.
- DBADMIN_UID is the default UID for the dbadmin account in the container.
Select the Java runtime version to install. This must correspond with a package name found in the Ubuntu distribution:
ARG JRE_PKG=openjdk-8-jre-headless
Inherit the arguments in the previous build stage:
ARG MINIMAL
ARG S6_OVERLAY_VERSION
COPY Artifacts from the First Stage
The COPY
command adds files into the image as a new layer. The --from=builder
COPY --from=builder /opt/vertica /opt/vertica
COPY --from=builder --chown=$DBADMIN_UID:$DBADMIN_GID /home/dbadmin /home/dbadmin
COPY --from=builder /root/.ssh /root/.ssh
COPY --from=builder /var/spool/cron /var/spool/cron
COPY --from=builder /etc/ssh/sshd_config.d/* /etc/ssh/sshd_config.d/
COPY --from=builder /etc/ssh/ssh_config.d/* /etc/ssh/ssh_config.d/
Setting Environment Variables
Values set with the ENV
instruction set environment variables available in the running container:
ENV PATH "$PATH:/opt/vertica/bin:/opt/vertica/sbin"
ENV DEBIAN_FRONTEND noninteractive
In the previous example:
- PATH sets the $PATH in the container to include the Vertica binaries and system binaries.
- DEBIAN_FRONTEND set to
noninteractive
ensures that there is zero interaction while installing or upgrading the system withapt
.
Install Daemon Scripts
Because the container does not run systemd by default, we provide functions to enable the vertica_agent
script to function properly:
ADD ./packages/init.d.functions /etc/rc.d/init.d/functions
Install init program
The init program that we use in the container is called s6. It is like systemd, but is designed for containers. It behaves like a true init program (PID 1): reaping zombie processes, passing signals down to child process, and restarting long-running services. Both cron and sshd are setup as long-running services. If any of those two services stop running, s6 restarts them.
The following commands copies over scripts and binaries needed to run s6:
ADD https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-noarch.tar.xz /tmp
ADD https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-x86_64.tar.xz /tmp
The following command copies our custom config for s6 so that sshd and cron are created as long-running services:
COPY s6-rc.d/ /etc/s6-overlay/s6-rc.d/
Installing Dependencies
This section incrementally builds a single RUN
instruction. The RUN
instruction executes Bash commands that persist in your container.
Each RUN
instruction adds a layer to the final image. To limit the number of RUN
instructions, use the Bash && operator to chain multiple RUN
commands into a single command.
Many of the following commands are similar to those added in the previous build stage.
Setup the SHELL
Setup the shell so that commands fail if a command flowing through a |
fails.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
Begin the RUN Instruction
Add RUN set -x
to log each command to the console as it is executed:
RUN set -x \
Be sure to indent all of the following commands 2 spaces further than the RUN set -x \
command.
Update the Packages
Update the package cache so that you can install packages:
&& apt-get -y update \
Vertica and Admintools Required Packages
Vertica and Admintools require the following packages to function properly. There are also some additional packages included to make it easier when running kubectl exec
for the container.
&& apt-get install -y --no-install-recommends \
ca-certificates \
cron \
dialog \
gdb \
iproute2 \
krb5-user \
less \
libkeyutils1\
libz-dev \
locales \
logrotate \
ntp \
openssh-client \
openssh-server \
openssl \
procps \
sysstat \
sudo \
Install Java
Add the following only if you are building a minimal image:
&& if [[ ${MINIMAL^^} != "YES" ]] ; then \
apt-get install -y --no-install-recommends $JRE_PKG; \
fi \
Install vim
Add the Vim text editor for debugging purposes. The vim
package can be old, and some security scanners might find vulnerabilities in it. For this reason, it is not included it in the NO_KEYS image, which should be used in environments with strict security requirements.
&& if [[ ${NO_KEYS^^} != "YES" ]] ; then \
apt-get install -y --no-install-recommends vim-tiny; \
fi \
Cleanup package manager
Remove any cached data brought in from the package manager:
&& apt-get clean \
&& apt-get autoremove \
&& rm -rf /var/lib/apt/lists/* \
Setup the locale
Make the en_US.UTF-8
locale so that Vertica will use utf-8 by default:
&& localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8 \
SSH Setup
This ensures the ssh daemon can start:
&& mkdir -p /run/sshd \
&& ssh-keygen -q -A \
Configure the dbadmin Role and Group
Create the required verticadba group and add the dbadmin user:
&& /usr/sbin/groupadd -r verticadba --gid ${DBADMIN_GID} \
&& /usr/sbin/useradd -r -m -s /bin/bash -g verticadba --uid ${DBADMIN_UID} dbadmin \
Allow Passwordless sudo Access for dbadmin
&& echo "dbadmin ALL=(ALL) NOPASSWD: ALL" | tee -a /etc/sudoers \
Setup limits for dbadmin
&& echo "dbadmin - nofile 65536" >> /etc/security/limits.conf \
Setup Java
We set the JAVA_HOME
environment variable if we included Java in this image. This is used by Vertica to know where to find the Java runtime:
&& if $MINIMAL != "YES" && $MINIMAL != "yes" ; then \
echo "JAVA_HOME=/usr" >> /etc/environment; \
fi \
Set Python Path
This step allows you to call Python from anywhere in the system. This is only required to allow us to run the UDx samples, as some samples use a Python script to generate data for ingestion:
&& update-alternatives --install /usr/bin/python python /opt/vertica/oss/python3/bin/python3 1 \
Make cron a setuid program
This step changes cron so that it’s setuid. This is done so that s6 doesn’t t have to run sudo cron ...
to start it.
&& chmod u+s /usr/sbin/cron \
Unpack s6
We copied s6 tar files in an earlier step. This will extract them into the root of the file system and delete the old host SSH keys:
&& tar -C / -Jxpf /tmp/s6-overlay-x86_64.tar.xz \
&& tar -C / -Jxpf /tmp/s6-overlay-noarch.tar.xz \
&& rm -rf /etc/ssh/ssh_host*
The Entrypoint Script
The entrypoint script is what executes to create a container from your image. We call the s6 init program and let it supervise the start of other processes:
ENTRYPOINT ["/init"]
Exposing Ports
Expose port 5433 for Vertica, 8443 for Vertica’s HTTP server, and 5444 for Vertica’s agent:
EXPOSE 5433
EXPOSE 8443
EXPOSE 5444
Configuring Image Access
Set the default user that runs the image to dbadmin:
USER dbadmin
Adding Labels to the Image
Labels enable you to add metadata to your image, which is helpful when storing images in repositories and tracking build information. Vertica uses the following LABELS
:
LABEL os-family="ubuntu"
LABEL image-name="vertica_k8s"
LABEL maintainer="K8s Team"
LABEL org.opencontainers.image.source=[https://github.com/vertica/vertica-kubernetes/tree/main/docker-vertica](https://github.com/vertica/vertica-kubernetes/tree/main/docker-vertica) \
org.opencontainers.image.title='Vertica Server' \
org.opencontainers.image.description='Runs the Vertica server that is optimized for use with the VerticaDB operator' \
org.opencontainers.image.url=[https://github.com/vertica/vertica-kubernetes/](https://github.com/vertica/vertica-kubernetes/) \
org.opencontainers.image.documentation=[https://www.vertica.com/docs/latest/HTML/Content/Authoring/Containers/ContainerizedVertica.htm](https://www.vertica.com/docs/latest/HTML/Content/Authoring/Containers/ContainerizedVertica.htm)