Hosted Runners Debugging Guide
Debugging a hosted runner involves two main steps:
- Verifying a runner-manager’s ability to spin up ephemeral VMs.
- Ensuring the ephemeral VMs can connect to GitLab.com or the CI Gateway.
Quick Overview
For a visual walkthrough, check out this video: Hosted Runners Testing.
Part 1: Testing Ephemeral VM Creation
The most challenging aspect of testing runner-managers is composing the docker-machine command with all the required custom options. These options vary by manager, so we’ve created handy scripts to automate this process.
Using generate-create-machine.sh
This script is typically located in the /tmp folder of runner-manager VMs. It generates another script, create-machine.sh, based on the configuration in the /etc/gitlab-runner/config.toml file of each runner-manager.
Steps to Run
    $ sudo su
    # cd /tmp
    # export VM_MACHINE=test1
    # less create-machine.sh   # Review the generated script
    # ./create-machine.sh      # Run the script
Example Output of a Successful Run
    tmp# ./create-machine.sh
    Running pre-create checks...
    (test1) Check that the project exists
    (test1) Check if the instance already exists
    Creating machine...
    (test1) Generating SSH Key
    (test1) Creating host...
    (test1) Opening firewall ports
    (test1) Creating instance
    (test1) Waiting for Instance
    (test1) Uploading SSH Key
    Waiting for machine to be running, this may take a few minutes...
    Detecting operating system of created instance...
    Waiting for SSH to be available...
    Detecting the provisioner...
    Provisioning with cos...
    Copying certs to the local machine directory...
    Copying certs to the remote machine...
    Setting Docker configuration on the remote daemon...
    Checking connection to Docker...
    Docker is up and running!
    To connect your Docker Client to the Docker Engine running on this VM, run: docker-machine env test1
Part 2: Testing Ephemeral VM Connectivity
Once the ephemeral VM is created successfully, you can verify its connectivity.
Steps to Test Connectivity
    # docker-machine ssh test1
    cos@test1 ~ $ curl -IL https://us-east1-c.ci-gateway.int.gprd.gitlab.net:8989
    cos@test1 ~ $ curl -IL https://gitlab.com
Expected Outcome
- A successful call will return a 200 status code.
- If any command times out, it may indicate a network misconfiguration.
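The two outcomes above can be checked mechanically. The following is a minimal sketch, not part of the runner tooling: `probe_status` and `classify_status` are hypothetical helper names, and the 10-second default timeout is an assumption. It relies on curl printing `000` for `%{http_code}` when no HTTP response was received.

```shell
#!/usr/bin/env bash
# Sketch only: probe_status and classify_status are hypothetical helpers.

# Print the final HTTP status code for a URL (HEAD request, follows redirects).
# $1 = URL, $2 = timeout in seconds (assumed default: 10).
# curl prints 000 when no HTTP response was received (timeout, DNS failure, ...).
probe_status() {
  curl -s -o /dev/null -IL --max-time "${2:-10}" -w '%{http_code}' "$1"
}

# Map a status code to a short verdict.
classify_status() {
  case "$1" in
    200) echo "ok" ;;
    000) echo "no response: possible network misconfiguration" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}
```

On the ephemeral VM this could be used as `classify_status "$(probe_status https://gitlab.com)"`, and likewise for the CI Gateway URL.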
Part 3: Connecting to a running job
If there is a problem in an existing job that is still running, it is possible to connect to it directly. Note that this should only be done for our own workloads.
Get the runner-manager
This is visible on the web page for the job logs, either at the top right or in the logs themselves.
It will look something like this:
    Running with gitlab-runner 18.4.0~pre.115.gb2218bab (b2218bab) on blue-4.saas-linux-small-amd64.runners-manager.gitlab.com/default J2nyww-sK, system ID: s_cf1798852952
This needs to be translated into the actual hostname, which in this case would be:
    runners-manager-saas-linux-small-amd64-blue-4.c.gitlab-ci-155816.internal
This mapping is implicit, but can be discovered via:
    # Pick a node interactively from the chef-repo node list
    host="$(cd ~/code/chef-repo && knife node list | grep -vE '^INFO:' | fzf -0 -1 | awk -F: '{print $1}')"
    # If the selection is not already an FQDN, look it up
    if [[ -n $host && "$host" != *".internal" ]]
    then
      host="$(cd ~/code/chef-repo && knife node show "$host" | grep -vE '^INFO:' | yq '.FQDN')"
    fi
Or via:
    knife search 'roles:runners-manager' --attribute 'fqdn' --attribute 'cookbook-gitlab-runner.runners.default.global.name' --format json | grep -vE '^INFO:' | jq -r '.rows[].[]|[.fqdn, ."cookbook-gitlab-runner.runners.default.global.name"]|@tsv' | sort -n
Get runner (job VM) and container
This is also in the job logs and looks like this:
    Running on runner-j2nyww-sk-project-75050198-concurrent-0 via runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff...
The second name (after “via”) is the job VM; the first is the container name on that job VM.
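Both log lines can be picked apart with plain parameter expansion. The sketch below uses hypothetical helper names. `manager_fqdn_guess` is an assumption generalized from the single example in this guide (the `runners-manager-<shard>-<instance>` layout and the `c.gitlab-ci-155816.internal` suffix), so treat its output as a guess and confirm it with the knife commands.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# ASSUMPTION: the naming pattern is inferred from one example
# (blue-4.saas-linux-small-amd64.runners-manager.gitlab.com/default);
# it may not hold for every runner-manager.
manager_fqdn_guess() {
  local name instance shard
  name="${1%%/*}"                             # drop "/default"
  name="${name%.runners-manager.gitlab.com}"  # blue-4.saas-linux-small-amd64
  instance="${name%%.*}"                      # blue-4
  shard="${name#*.}"                          # saas-linux-small-amd64
  echo "runners-manager-${shard}-${instance}.c.gitlab-ci-155816.internal"
}

# Split a "Running on <container> via <job VM>..." log line.
parse_running_on() {
  local rest container vm
  rest="${1#Running on }"
  container="${rest%% via *}"
  vm="${rest#* via }"
  vm="${vm%...}"                              # strip the trailing dots
  echo "job_vm=${vm}"
  echo "container=${container}"
}
```

For example, `parse_running_on "$line"` prints the job VM name to pass to `docker-machine ssh` and the container name to look for with `docker ps`.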
SSH into the job
Now we have all the pieces to get a shell inside of the job.
First, SSH into the runner-manager:
    ssh runners-manager-saas-linux-small-amd64-blue-4.c.gitlab-ci-155816.internal
Next up, SSH into the job VM. We do this through docker-machine:
    iwiedler@runners-manager-saas-linux-small-amd64-blue-4.c.gitlab-ci-155816.internal:~# sudo -H docker-machine ssh runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff
This is a containerd-based container-optimized OS. It is possible to run a toolbox:
    cos@runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff ~ $ toolbox
Docker commands are available as well, so we can now get a shell inside of the job container:
    cos@runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff ~ $ docker exec -it runner-j2nyww-sk-project-75050198-concurrent-0-d0c939fb2a356dee-predefined bash
Debugging Step-Based Jobs (GitLab Functions)
Jobs that use GitLab Functions execute through a step-runner gRPC service inside the build container, using a bootstrap → serve → proxy pattern. This section covers how to debug issues specific to step-based execution.
For a general overview of step-based execution and common errors, see the Troubleshooting Guide.
Enable Debug Logging
For verbose step-runner output in the job log, set the CI_FUNCS_LOG_LEVEL CI/CD variable to debug on the job or project:
    variables:
      CI_FUNCS_LOG_LEVEL: debug
Verify Bootstrap
The bootstrap stage copies the gitlab-runner-helper binary into the build container’s shared volume.
- Check for the docker_bootstrap stage in the job logs. A successful bootstrap will show the bootstrap container being created and exiting with code 0.
- If you have access to the job VM, verify the binary exists in the container:
    docker exec <build_container> ls -la /opt/gitlab-runner/gitlab-runner-helper
  The binary should exist and be executable (-rwxr-xr-x).
- Verify the bootstrap volume is mounted:
    docker inspect <build_container> --format '{{json .Mounts}}' | python3 -m json.tool | grep -A5 "/opt/gitlab-runner"
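The binary check above can be wrapped in a small helper for repeated use on a job VM. This is a sketch under assumptions: `mode_is_executable` and `check_bootstrap` are hypothetical names, and the build image is assumed to ship `ls`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# True when an `ls -l` mode string has the owner execute bit set
# (e.g. -rwxr-xr-x). Setuid variants are deliberately not handled.
mode_is_executable() {
  case "$1" in
    -??x*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Report whether the helper binary exists and is executable in a container.
# $1 = build container name or ID.
check_bootstrap() {
  local helper=/opt/gitlab-runner/gitlab-runner-helper mode
  mode="$(docker exec "$1" ls -la "$helper" 2>/dev/null | awk '{print $1}')"
  if mode_is_executable "$mode"; then
    echo "helper binary: present and executable ($mode)"
  else
    echo "helper binary: missing or not executable"
    return 1
  fi
}
```

Call it as `check_bootstrap <build_container>`; a non-zero exit status suggests the bootstrap stage did not populate the volume.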
Inspect Serve Process
In step-based jobs, the build container’s main process is the helper binary running in serve mode.
- Check the container’s main process:
    docker exec <build_container> ps aux
  You should see a process like:
    /opt/gitlab-runner/gitlab-runner-helper steps serve bash
  If instead you see just bash or sh as PID 1, the job is using traditional execution, not steps.
- Confirm the step-runner started successfully by checking for the “step-runner is ready.” message in the job log. If this message is absent, the serve process did not initialize.
- Check the container’s command configuration:
    docker inspect <build_container> --format '{{json .Config.Cmd}}'
  For step-based jobs, this will include steps serve in the command chain.
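The command-configuration check lends itself to a one-line classification. The sketch below uses hypothetical helper names and assumes that `steps serve` shows up in `.Config.Cmd` as adjacent arguments; if the image sets it through the entrypoint instead, inspect `.Config.Entrypoint` as well. The `join` template function flattens the array into a single space-separated string.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Classify a flattened container command line.
classify_cmd() {
  case "$1" in
    *"steps serve"*) echo "step-based" ;;
    *)               echo "traditional" ;;
  esac
}

# Classify a container by its configured command.
# $1 = build container name or ID.
container_execution_mode() {
  classify_cmd "$(docker inspect "$1" --format '{{join .Config.Cmd " "}}')"
}
```

Running `container_execution_mode <build_container>` on the job VM prints `step-based` or `traditional`.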
Container Inspection
Inspect step-related volumes and mounts on the job VM:
    # List all volumes for the build container
    docker inspect <build_container> --format '{{json .Mounts}}' | python3 -m json.tool
    # Check for the /opt/gitlab-runner volume specifically
    docker inspect <build_container> --format '{{range .Mounts}}{{if eq .Destination "/opt/gitlab-runner"}}Type={{.Type}} Source={{.Source}} RW={{.RW}}{{end}}{{end}}'
The /opt/gitlab-runner volume should be present and writable.
Log Patterns
Step-based execution produces different log patterns compared to traditional execution:
Step-based execution:
- A docker_bootstrap stage appears before the build starts (ExecutorStageBootstrap internally).
- The bootstrap container is created and started with the command gitlab-runner-helper steps bootstrap /opt/gitlab-runner/gitlab-runner-helper.
- The build container command is prefixed with /opt/gitlab-runner/gitlab-runner-helper steps serve.
- Once the gRPC service is ready, the job log prints “step-runner is ready.”.
Traditional execution:
- No bootstrap stage.
- No /opt/gitlab-runner volume creation.
- The build container runs the shell command directly (e.g., bash, sh).
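When working from a saved job log rather than a live container, the patterns above can be matched with a short filter. This is a rough sketch: `classify_job_log` is a hypothetical name, and it assumes the strings `docker_bootstrap` and `step-runner is ready.` appear literally in the log text.

```shell
#!/usr/bin/env bash
# Sketch only: classify_job_log is a hypothetical helper.
# ASSUMPTION: the marker strings appear verbatim in the job log.

# Read a job log on stdin and report which execution pattern it matches.
classify_job_log() {
  local log
  log="$(cat)"
  case "$log" in
    *"step-runner is ready."*) echo "step-based: step-runner ready" ;;
    *docker_bootstrap*)        echo "step-based: bootstrap seen, step-runner not ready" ;;
    *)                         echo "traditional (no step-based markers found)" ;;
  esac
}
```

Usage: `classify_job_log < job.log`, where `job.log` is a downloaded copy of the job output. The middle verdict points at a serve process that never initialized.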
Unix Socket Verification
The step-runner gRPC service communicates over a Unix socket inside the build container. On Linux, the default path is /tmp/step-runner.sock.
- Verify the socket exists:
    docker exec <build_container> ls -la /tmp/step-runner.sock
  The socket should be present as a socket file (type s).
- If the socket does not exist, the serve process likely failed to start or has crashed. Check the container logs:
    docker logs <build_container>
- Verify the serve process is still running (see Inspect Serve Process above). If the serve process has exited, the socket will no longer accept connections, and the proxy will fail to communicate with the step-runner.
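The socket checks above can be folded into a single triage helper. This is a sketch with a hypothetical function name; it only inspects the `ls -la` listing, so it cannot tell whether the socket still accepts connections.

```shell
#!/usr/bin/env bash
# Sketch only: diagnose_step_socket is a hypothetical helper.

# $1 = build container name or ID
# $2 = socket path (defaults to the Linux path named in this guide)
diagnose_step_socket() {
  local container="$1" sock="${2:-/tmp/step-runner.sock}" listing
  # An empty listing means ls failed, i.e. the path does not exist.
  listing="$(docker exec "$container" ls -la "$sock" 2>/dev/null)" || true
  case "$listing" in
    s*) echo "socket present: $sock" ;;
    "") echo "socket missing: check 'docker logs $container' and the serve process"
        return 1 ;;
    *)  echo "path exists but is not a socket: $sock"
        return 1 ;;
  esac
}
```

A "socket present" result together with a dead serve process still means the proxy cannot reach the step-runner, so the process check remains necessary.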
Troubleshooting Tips
Common Issue: Network Misconfiguration
One frequent issue is a missing network configuration for the CI Gateway. Ensure that the network is allowed in the CI Gateway configuration.
If problems persist, verify the VM’s network settings and access permissions.