Before digging into this issue, let's first look at how drone is structured.

Structure

drone consists of three main parts: drone-server, drone-controller, and drone-agent.

drone-server

As the name suggests, drone-server is the server side of drone. It starts an HTTP service that handles incoming requests, such as the webhooks github sends on every push (and other repository events), as well as every request from drone-web-ui.

drone-controller

The controller's job is to initialize pipeline information. It defines the functions that run before and after each pipeline step, as well as the functions for obtaining and writing execution logs, and it ensures that the steps of each pipeline execute in order.

drone-agent

drone-agent plays roughly the same role in drone that the kubelet plays in k8s. Since this article focuses on how drone executes on k8s, where builds do not depend on drone-agent, this component is not covered in detail.

Execution Process

server

  • When a commit is pushed to github, github sends the commit information to /hook. After drone-server receives the request, it parses the payload into core.Hook and core.Repository.
  • It then looks up the repository in drone's database by the repository's namespace and name. If the repository cannot be found, or the project is not active, the build is terminated and an error is returned. Otherwise, the rest of the work is handed over to trigger (a simplified sketch of this flow follows).
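For readers who prefer code, here is a minimal sketch of the flow above. The types and helpers (Hook, Repository, parseWebhook, runTrigger) are illustrative stand-ins, not drone's actual definitions.

    package main

    import (
        "fmt"
        "net/http"
    )

    // Hypothetical stand-ins for drone's core.Hook and core.Repository.
    type Hook struct{ Event, Ref, Message string }
    type Repository struct {
        Namespace, Name string
        Active          bool
    }

    // repoStore stands in for the repository lookup in drone's database.
    var repoStore = map[string]*Repository{}

    // handleHook sketches the server side: parse the webhook payload, look the
    // repository up by namespace/name, and hand active repositories to the trigger.
    func handleHook(w http.ResponseWriter, r *http.Request) {
        hook, incoming := parseWebhook(r)
        repo, ok := repoStore[incoming.Namespace+"/"+incoming.Name]
        if !ok || !repo.Active {
            http.Error(w, "repository not found or not active", http.StatusNotFound)
            return
        }
        go runTrigger(repo, hook) // the rest of the work is handed to the trigger
        w.WriteHeader(http.StatusOK)
    }

    // parseWebhook is a placeholder; real parsing depends on the SCM provider's payload.
    func parseWebhook(r *http.Request) (*Hook, *Repository) {
        return &Hook{Event: "push"}, &Repository{Namespace: "octocat", Name: "demo"}
    }

    func runTrigger(repo *Repository, hook *Hook) { fmt.Println("triggering build for", repo.Name, hook.Event) }

    func main() {
        http.HandleFunc("/hook", handleHook)
        http.ListenAndServe(":8080", nil)
    }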

trigger

  1. After receiving core.Hook and core.Repository, trigger checks whether the commit message contains a directive such as [ci skip] that skips CI. If it does, execution ends immediately.
  2. trigger then verifies that the repo and owner are valid and checks whether the commit message is empty; if it is, trigger calls the relevant API to fetch the message of the latest commit.
  3. Next, trigger requests the build configuration from ConfigService; usually this is the content of .drone.yml. ConfigService can be extended via the DRONE_YAML_ENDPOINT environment variable; if it is not, the default FileService is used, which calls the relevant github API to fetch the file.
  4. After getting the config, trigger calls ConvertService to convert it into yaml (configuration files are not always yaml; they may be jsonnet, a script, or other formats). ConvertService currently supports jsonnet and starlark; starlark requires an external extension configured via DRONE_CONVERT_PLUGIN_ENDPOINT.
  5. After the conversion, trigger parses the yaml again: first to convert the yaml format of older drone versions to the new format, and second, because drone is compatible with gitlab-ci, to convert gitlab-ci configuration into drone's configuration format.
  6. Next, trigger parses the yaml into a yaml.Manifest struct, then calls ValidateService to validate the config, core.Build, repo owner, and repo. ValidateService is configured via the DRONE_VALIDATE_PLUGIN_ENDPOINT environment variable; if it is not set, this validation step is skipped.
  7. trigger then validates each pipeline in yaml.Manifest, checking for duplicate pipeline names, steps that depend on themselves, missing dependencies, permissions, and so on.
  8. Once every pipeline passes validation, trigger builds a directed acyclic graph (DAG) to check for circular dependencies between pipelines and to find pipelines that do not meet their execution conditions.
  9. At the same time, it checks whether each pipeline's execution conditions are satisfied, including branch, event, action, ref, and so on.
  10. Once all of the above checks pass, trigger updates the information in the database. Each pipeline is then built into a core.Stage: a stage with no dependencies gets status Pending, meaning it can be executed, while a stage with dependencies gets status Waiting (see the sketch after this list).
  11. trigger creates the build record in the database and sends the build status to github; at this point, the little yellow status dot appears on github.
  12. trigger then iterates over the stages and schedules those with status Pending.
  13. Finally, it sends the build information to the address configured by the DRONE_WEBHOOK_ENDPOINT environment variable. At this point, trigger's work is done.
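The Pending/Waiting split in step 10 can be pictured with a small sketch. The Pipeline and Stage types below are simplified stand-ins for drone's yaml and core types, not the real definitions.

    package sketch

    // Simplified stand-ins for drone's pipeline and core.Stage types.
    type Pipeline struct {
        Name      string
        DependsOn []string
    }

    type Stage struct {
        Name   string
        Status string
    }

    // buildStages mirrors step 10: pipelines without dependencies start out Pending
    // (ready to schedule), pipelines with dependencies start out Waiting.
    func buildStages(pipelines []Pipeline) []Stage {
        stages := make([]Stage, 0, len(pipelines))
        for _, p := range pipelines {
            status := "pending"
            if len(p.DependsOn) > 0 {
                status = "waiting"
            }
            stages = append(stages, Stage{Name: p.Name, Status: status})
        }
        return stages
    }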

(Figure: server steps)

controller

Thanks to the conveniences k8s provides, scheduling a stage only requires creating a Job. Before creating the Job, the scheduler injects most of drone-server's environment variables into it. The Job is named drone-job-stage.ID-randomString: k8s has rules for resource names (they cannot start or end with '.', '_' or '-'), which drone's other runtimes do not enforce, so drone appends a random string to keep the name compliant. When creating the Job, for reasons explained later, drone also mounts a HostPath volume into it with the path /tmp/drone. The Job's image is drone-controller.
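As an illustration only (not drone's actual code), a Job like the one described could be assembled with client-go roughly as follows; the container and volume names and the image reference are assumptions.

    package sketch

    import (
        "fmt"

        batchv1 "k8s.io/api/batch/v1"
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // buildControllerJob sketches the Job described above: a k8s-safe name built from
    // the stage id plus a random suffix, drone-server's environment copied in, and a
    // HostPath volume mounted at /tmp/drone.
    func buildControllerJob(stageID int64, randomSuffix string, env []corev1.EnvVar) *batchv1.Job {
        name := fmt.Sprintf("drone-job-%d-%s", stageID, randomSuffix)
        return &batchv1.Job{
            ObjectMeta: metav1.ObjectMeta{Name: name},
            Spec: batchv1.JobSpec{
                Template: corev1.PodTemplateSpec{
                    Spec: corev1.PodSpec{
                        RestartPolicy: corev1.RestartPolicyNever,
                        Containers: []corev1.Container{{
                            Name:  "controller",
                            Image: "drone/controller", // illustrative; the article only says the image is drone-controller
                            Env:   env,                // most of drone-server's environment is copied in
                            VolumeMounts: []corev1.VolumeMount{{
                                Name:      "tmp-drone",
                                MountPath: "/tmp/drone",
                            }},
                        }},
                        Volumes: []corev1.Volume{{
                            Name: "tmp-drone",
                            VolumeSource: corev1.VolumeSource{
                                HostPath: &corev1.HostPathVolumeSource{Path: "/tmp/drone"},
                            },
                        }},
                    },
                },
            },
        }
    }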

  1. drone-controller initializes the external SecretService, which is configured by DRONE_SECRET_ENDPOINT.
  2. Next, drone-controller initializes three registryServices: two external ones (configured via DRONE_SECRET_ENDPOINT and DRONE_REGISTRY_ENDPOINT) and a local file (its path defined by DRONE_DOCKER_CONFIG). A small sketch of this environment-driven setup follows this list.
  3. drone-controller also initializes an RPC client to communicate with drone-server.
  4. Finally, the controller initializes the k8s engine. At this point controller initialization is complete, and the remaining work is handed over to its internal runner component.
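The environment-driven setup above can be summarized in a few lines; the struct and field names are illustrative, only the environment variable names come from the article.

    package sketch

    import "os"

    // controllerConfig gathers the endpoints and paths mentioned above.
    type controllerConfig struct {
        SecretEndpoint   string // external SecretService
        RegistryEndpoint string // external registry extension
        DockerConfigPath string // local docker config file used as a registry source
    }

    func loadControllerConfig() controllerConfig {
        return controllerConfig{
            SecretEndpoint:   os.Getenv("DRONE_SECRET_ENDPOINT"),
            RegistryEndpoint: os.Getenv("DRONE_REGISTRY_ENDPOINT"),
            DockerConfigPath: os.Getenv("DRONE_DOCKER_CONFIG"),
        }
    }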

runner

  1. runner first fetches the stage's detailed information from drone-server by the stage's id.
  2. It then requests, by the obtained repo.ID, the token needed to clone the repo. When drone-server receives the request, it first verifies the repo and user; once that passes, it obtains the token for pulling the project from github.
  3. runner then checks the status of the build; if it has not been killed or skipped, it proceeds with the build.
  4. Once verification is complete, runner parses the yaml again; this is the same parsing step performed in trigger.
  5. runner then replaces every ${...} reference in the yaml with the corresponding environment variable (a minimal substitution sketch follows this list).
  6. runner then selects from the yaml the pipeline it needs to execute, by stage name (a yaml file often contains multiple pipelines), and lints that pipeline. This lint is the same as step 7 in trigger and ensures the metadata is valid.
  7. Next, runner sets up a series of transform functions for registries, secrets, volumes, and so on. During the subsequent Compile, these functions inject the corresponding resources into the Spec; the secret function, for example, fetches the relevant secret and adds it to the spec.
  8. When the above operations are completed, runner calls the compiler module to start compiling the pipeline.
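A minimal sketch of the ${...} substitution in step 5, using the standard library's os.Expand; drone's real implementation may differ in how it sources and escapes values.

    package main

    import (
        "fmt"
        "os"
    )

    // substitute replaces ${NAME} references in the yaml text with values from env.
    // Unknown names expand to an empty string in this sketch.
    func substitute(yamlText string, env map[string]string) string {
        return os.Expand(yamlText, func(name string) string {
            return env[name]
        })
    }

    func main() {
        src := "image: golang:${GO_VERSION}"
        fmt.Println(substitute(src, map[string]string{"GO_VERSION": "1.21"}))
        // prints: image: golang:1.21
    }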

compiler

  1. At the beginning of Compile, the compiler first determines whether all steps of the pipeline are serial (if any step declares dependencies, the pipeline is not serial). It then mounts a working directory for the pipeline, i.e. injects an EmptyDir volume into the spec.
  2. It then injects all volumes defined in the yaml into the spec.
  3. The compiler checks whether the pipeline needs to clone the repo. If the pipeline does not set

       clone:
         disable: true

     the compiler injects a clone step into the spec and initializes its step information, such as the step name, image, and workspace mount.
  4. After handling the clone step, the compiler handles all Services in the pipeline. Each Service is also converted into a step and injected into the spec, but unlike ordinary steps, a service is set to run independently; in other words, it does not depend on any step. The compiler performs initialization work similar to that of the clone step for each service step.
  5. Next comes handling the remaining steps. Ordinary steps fall into two cases. In one, the step uses the build configuration, which tells drone to package the repo automatically while the step executes; because docker is needed during packaging, the docker.sock file must additionally be mounted into the container. The other case follows the normal process described next.

Normal process (applies to clone, services, and steps):

  1. Copy everything copyable from the yaml data into the spec.
  2. Inject the volumes configured in the yaml into the spec.
  3. Inject the envs configured in the yaml into the spec.
  4. Inject each entry under settings in the yaml into the spec as an environment variable named "PLUGIN_" + key with the setting's value (some env and setting values may be from_secret; the corresponding secret is injected into the spec here). A small sketch of this conversion follows this list.
  5. Convert the commands defined in the yaml into a file at "/usr/drone/bin/init" and inject it into the spec (at run time, only this script needs to be executed).
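A small sketch of step 4 of the normal process: each settings entry becomes a PLUGIN_* environment variable on the step. The helper name is made up; the PLUGIN_ prefix follows drone's plugin convention described above.

    package sketch

    import "strings"

    // settingsToEnv converts a step's settings into PLUGIN_* environment variables,
    // e.g. a setting "repo: octocat/demo" becomes PLUGIN_REPO=octocat/demo.
    func settingsToEnv(settings map[string]string) map[string]string {
        env := make(map[string]string, len(settings))
        for key, value := range settings {
            env["PLUGIN_"+strings.ToUpper(key)] = value
        }
        return env
    }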

  6. Finally, the compiler runs all of the previously defined transform functions, injecting docker auth, environment variables from the controller and the DRONE_RUNNER_* settings, network rules, and secrets obtained from SecretService into the spec. At this point, all of the spec's information has been generated; an illustration of the transform pattern follows.
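The transform functions set up by the runner and applied here can be pictured as a list of functions applied to the spec in order; the types below are illustrative, not drone's actual ones.

    package sketch

    // Spec is a stand-in for the compiled pipeline spec.
    type Spec struct {
        Env     map[string]string
        Secrets []string
    }

    // Transform mutates the spec, e.g. adding docker auth, DRONE_RUNNER_* variables,
    // network rules, or secrets fetched from the SecretService.
    type Transform func(*Spec)

    // applyTransforms runs every transform in order, completing the spec.
    func applyTransforms(spec *Spec, transforms ...Transform) *Spec {
        for _, t := range transforms {
            t(spec)
        }
        return spec
    }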

hook

  1. Next, runner sets the status of every step in the stage to Pending, saves them to the execution list, and initializes runtime.Hook. The hook defines what must run before and after each step: before a step, a streamer is created to receive logs, the step's status is updated in the database, and the repo information is pushed to bound clients over long-lived connections; after a step, the database is updated, events are pushed, and the previously created streamer is deleted. The hook also defines a write-log function so that collected logs are written to the log store (a simplified picture of this hook contract follows this list).
  2. After defining the hook, runner sets the stage's status to Running. Before starting the build, it updates the stage's status, saves each step to the database, and then updates the build record as a whole.
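A simplified picture of the hook contract described in item 1; the struct and field names are illustrative and do not match drone's runtime.Hook exactly.

    package sketch

    // StepHooks collects the callbacks the runtime invokes around each step and for
    // each piece of log output, as described above.
    type StepHooks struct {
        // BeforeEach runs before a step: create the log streamer, mark the step
        // as running in the database, notify connected clients.
        BeforeEach func(step string) error

        // AfterEach runs after a step: update the database, push events, and
        // delete the streamer created in BeforeEach.
        AfterEach func(step string, exitCode int) error

        // WriteLine appends a collected log line to the log store.
        WriteLine func(step, line string) error
    }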

runtime

  1. After runner initializes the runtime information, it calls runtime.Run to execute the build; this is where the build really starts. runtime first calls the previously defined Before function to create the streamer, then creates a k8s namespace named with a random string. All subsequent steps run in this namespace. After creating the namespace, runtime creates the secrets required for the build, then creates a configmap for the commands of each step.
  2. runtime then starts executing the steps. It checks whether there are dependencies between steps: if not, the steps are executed one by one in order; if there are, steps without dependencies run first, and when a step finishes it is removed from the dependency lists of the steps that depend on it, and so on until every step has run (this process is concurrent; a simplified scheduling sketch follows this list).
  3. Executing a step actually means creating a pod in k8s. The pod's image is the image defined for that step in the yaml. All pods run in the namespace created above. To ensure the pods can share files, they must all be scheduled onto the same machine and mount a HostPath volume under the same directory; that machine is the one where drone-job-stage.ID-randomString was scheduled. While each pod runs, runtime registers a callback to watch for pod changes: when a pod's status becomes running, succeeded, or failed, runtime fetches the pod's logs and writes them into the streamer. Finally, after the pod finishes, runtime collects its exit status to determine whether it exited normally or abnormally.
  4. Once all pods have finished (or a pod has failed), runtime first updates the relevant records in the database, then performs cleanup, and finally checks all pipelines in the current build: if a pipeline depends on other pipelines and those pipelines have finished executing, that pipeline is scheduled.
  5. When every pipeline in the build has been scheduled and completed, the build ends.
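The dependency-driven execution in step 2 can be sketched as follows. runStep stands in for "create a pod and wait for it to finish"; cycles are not handled here, since the trigger's DAG check already rejected them.

    package main

    import (
        "fmt"
        "sync"
    )

    type Step struct {
        Name      string
        DependsOn []string
    }

    // runAll runs steps with no unmet dependencies concurrently; when a step finishes,
    // it is removed from the dependency lists of the remaining steps, which may unblock them.
    func runAll(steps []Step, runStep func(name string)) {
        remaining := make(map[string]map[string]bool, len(steps)) // step name -> unmet deps
        for _, s := range steps {
            deps := make(map[string]bool, len(s.DependsOn))
            for _, d := range s.DependsOn {
                deps[d] = true
            }
            remaining[s.Name] = deps
        }

        var mu sync.Mutex
        var wg sync.WaitGroup
        var launch func()
        launch = func() {
            mu.Lock()
            defer mu.Unlock()
            for name, deps := range remaining {
                if len(deps) != 0 {
                    continue
                }
                delete(remaining, name)
                wg.Add(1)
                go func(name string) {
                    defer wg.Done()
                    runStep(name)
                    mu.Lock()
                    for _, deps := range remaining {
                        delete(deps, name) // this step no longer blocks anyone
                    }
                    mu.Unlock()
                    launch() // start any steps that just became unblocked
                }(name)
            }
        }

        launch()
        wg.Wait()
    }

    func main() {
        steps := []Step{
            {Name: "clone"},
            {Name: "build", DependsOn: []string{"clone"}},
            {Name: "test", DependsOn: []string{"clone"}},
        }
        runAll(steps, func(name string) { fmt.Println("running", name) })
    }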

(Figure: controller steps)