Workflow Language (YAML)
Basics
Minimal Example
The minimal example of a DAG file is as follows:
steps:
  - name: step 1
    command: echo hello
  - name: step 2
    command: echo world
    depends:
      - step 1
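Since depends is a list, a step can presumably name more than one upstream step. The following is a minimal sketch under that assumption. The step names and echo commands are illustrative only, and the idea that the two middle steps have no ordering between them is inferred from the dependency declarations rather than stated above.

```yaml
steps:
  - name: prepare
    command: echo preparing
  - name: build a
    command: echo building a
    depends:
      - prepare
  - name: build b
    command: echo building b
    depends:
      - prepare
  # Assumption: listing two names makes this step wait for both.
  - name: publish
    command: echo publishing
    depends:
      - build a
      - build b
```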
Working Directory
You can specify the working directory for each step using the dir field.
steps:
  - name: step 1
    dir: /path/to/working/directory
    command: some command
Code Snippet
The script field provides a way to run arbitrary snippets of code in any language.
steps:
  - name: step 1
    command: "bash"
    script: |
      cd /tmp
      echo "hello world" > hello
      cat hello
    output: RESULT
  - name: step 2
    command: echo ${RESULT} # hello world
    depends:
      - step 1
Environment Variables
You can define environment variables using the env field. The environment variables can be accessed by the DAG and its steps.
env:
  - SOME_DIR: ${HOME}/batch
  - SOME_FILE: ${SOME_DIR}/some_file
steps:
  - name: some task in some dir
    dir: ${SOME_DIR}
    command: python main.py ${SOME_FILE}
Parameters
You can pass parameters to the DAG and its steps using the params field. The parameters can be accessed by the steps using $1, $2, and so on.
params: param1 param2
steps:
  - name: some task with parameters
    command: python main.py $1 $2
Named Parameters
You can also use named parameters in the params field. The named parameters can be accessed by the steps using ${FOO}, ${BAR}, and so on.
params: FOO=1 BAR=`echo 2`
steps:
  - name: some task with parameters
    command: python main.py ${FOO} ${BAR}
Conditional Logic
Sometimes you have parts of a DAG that you only want to run under certain conditions. You can use the preconditions field to add conditional branches to your DAG.
For example, the task below only runs on the first day of each month.
steps:
  - name: A monthly task
    command: monthly.sh
    preconditions:
      - condition: "`date '+%d'`"
        expected: "01"
If you want the DAG to continue to the next step regardless of the step’s conditional check result, you can use the continueOn field:
steps:
  - name: A monthly task
    command: monthly.sh
    preconditions:
      - condition: "`date '+%d'`"
        expected: "01"
    continueOn:
      skipped: true
Capture Output
The output field can be used to set an environment variable to the step's standard output. Leading and trailing whitespace is trimmed automatically. The environment variable can be used in subsequent steps.
steps:
  - name: step 1
    command: "echo foo"
    output: FOO # will contain "foo"
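To illustrate use in a subsequent step, here is a minimal sketch in which a later step reads the captured variable. It reuses the ${VAR} reference style and the depends field shown in the Code Snippet section. The second step's name is illustrative.

```yaml
steps:
  - name: step 1
    command: "echo foo"
    output: FOO # will contain "foo"
  - name: step 2
    command: echo ${FOO} # expected to print "foo"
    depends:
      - step 1
```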
Redirect Standard Output and Error
The stdout field can be used to write standard output to a file.
steps:
  - name: create a file
    command: "echo hello"
    stdout: "/tmp/hello" # the content will be "hello\n"
The stderr field redirects standard error to a separate file instead of writing it to the normal log file.
steps:
  - name: output error file
    command: "echo error message >&2"
    stderr: "/tmp/error.txt"
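The two fields can presumably be combined on a single step. The following is a minimal sketch, assuming that stdout and stderr may be set together and that they also apply to output produced by a script. The step name and file paths are placeholders.

```yaml
steps:
  - name: split output streams
    command: bash
    script: |
      echo "progress message"
      echo "warning message" >&2
    stdout: "/tmp/job.out" # placeholder path for standard output
    stderr: "/tmp/job.err" # placeholder path for standard error
```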
Schedule
You can use the schedule field to schedule a DAG with a Cron expression.
schedule: "5 4 * * *" # Run at 04:05.
steps:
  - name: scheduled job
    command: job.sh
See Scheduler for more details.
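As a further illustration, the sketch below schedules a job for the first day of each month. It assumes the schedule field accepts the standard five-field Cron syntax (minute, hour, day of month, month, day of week), as the 04:05 example suggests. The step name is a placeholder.

```yaml
# minute hour day-of-month month day-of-week
schedule: "0 0 1 * *" # Run at 00:00 on the 1st of every month.
steps:
  - name: monthly job
    command: monthly.sh
```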
Docker Image
Execute an Image
Note: This requires the Docker daemon to be running on the host.
The docker executor allows us to run Docker containers instead of bare commands. This can be useful for running commands in isolated environments or for reproducibility purposes.
The example below pulls and runs Deno’s Docker image and prints ‘Hello World’.
steps:
  - name: deno_hello_world
    executor:
      type: docker
      config:
        image: "denoland/deno:1.10.3"
        autoRemove: true
    command: run https://examples.deno.land/hello-world.ts
You can configure the Docker container (e.g., volumes, env, etc.) by passing more detailed options.
For example:
steps:
  - name: deno_hello_world
    executor:
      type: docker
      config:
        image: "denoland/deno:1.10.3"
        container:
          volumes:
            /app:/app:
          env:
            - FOO=BAR
        host:
          autoRemove: true
    command: run https://examples.deno.land/hello-world.ts
See Docker’s API documentation for all available options.
For container, see ContainerConfig.
For host, see HostConfig.
Use Host’s Docker Environment
If you are running dagu inside a container, you need the setup below.
Run a socat container with the command below.
docker run -v /var/run/docker.sock:/var/run/docker.sock -p 2376:2375 bobrik/socat TCP4-LISTEN:2375,fork,reuseaddr UNIX-CONNECT:/var/run/docker.sock
Then you can set the DOCKER_HOST environment variable as follows.
env:
  - DOCKER_HOST: "tcp://host.docker.internal:2376"
steps:
  - name: deno_hello_world
    executor:
      type: docker
      config:
        image: "denoland/deno:1.10.3"
        autoRemove: true
    command: run https://examples.deno.land/hello-world.ts
For more details, see this page.
Advanced
Making HTTP Requests
The http executor allows us to make an arbitrary HTTP request. This can be useful for interacting with web services or APIs.
steps:
  - name: send POST request
    command: POST https://foo.bar.com
    executor:
      type: http
      config:
        timeout: 10
        headers:
          Authorization: "Bearer $TOKEN"
        silent: true # If silent is true, it outputs response body only.
        query:
          key: "value"
        body: "post body"
Sending Email
The mail executor can be used to send email. This can be useful for sending notifications or alerts.
Example:
smtp:
  host: "smtp.foo.bar"
  port: "587"
  username: "<username>"
  password: "<password>"
params: RECIPIENT=XXX
steps:
  - name: step1
    executor:
      type: mail
      config:
        to: <to address>
        from: <from address>
        subject: "Exciting New Features Now Available"
        message: |
          Hello [RECIPIENT],
          We hope you're enjoying your experience with MyApp!
          We're thrilled to announce that [] v2.0 is now available,
          and we've added some fantastic new features based on your
          valuable feedback.
          Thank you for choosing MyApp and for your continued support.
          We look forward to hearing from you and providing you with
          an even better MyApp experience.
          Best regards,
Command Execution over SSH
The ssh executor allows us to execute commands on remote hosts over SSH.
steps:
  - name: step1
    executor:
      type: ssh
      config:
        user: dagu
        ip: XXX.XXX.XXX.XXX
        port: 22
        key: /Users/dagu/.ssh/private.pem
    command: /usr/sbin/ifconfig
Command Substitution
You can use command substitution in field values. That is, a string enclosed in backquotes (`) is evaluated as a command and replaced with its standard output.
env:
  TODAY: "`date '+%Y%m%d'`"
steps:
  - name: hello
    command: "echo hello, today is ${TODAY}"
Lifecycle Hooks
It is often desirable to take action when a specific event happens, for example, when a DAG fails. To achieve this, you can use handlerOn fields.
handlerOn:
  failure:
    command: notify_error.sh
  exit:
    command: cleanup.sh
steps:
  - name: A task
    command: main.sh
Repeat a Step
If you want a task to repeat execution at regular intervals, you can use the repeatPolicy field. If you want to stop the repeating task, you can use the stop command to gracefully stop the task.
steps:
  - name: A task
    command: main.sh
    repeatPolicy:
      repeat: true
      intervalSec: 60
User Defined Functions
You can define functions in the DAG file and call them in steps. The params field is required for functions. The args field is used to pass arguments to functions. The arguments can be command substitutions or environment variables.
functions:
  - name: my_function
    params: param1 param2
    command: python main.py $param1 $param2
steps:
  - name: step 1
    call:
      function: my_function
      args:
        param1: 1
        param2: 2
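A call whose arguments are not literals might look like the sketch below. It assumes that backquoted command substitution and ${VAR} references are written in args the same way as in other fields. The SOME_DIR variable and its value are illustrative.

```yaml
env:
  - SOME_DIR: ${HOME}/batch
functions:
  - name: my_function
    params: param1 param2
    command: python main.py $param1 $param2
steps:
  - name: step 1
    call:
      function: my_function
      args:
        param1: "`date '+%Y%m%d'`" # command substitution
        param2: ${SOME_DIR} # environment variable
```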
JSON Processing
Executing jq Command
The jq executor can be used to transform, query, and format JSON. This can be useful for working with JSON data in pipelines or for data processing.
steps:
  - name: run query
    executor: jq
    command: '{(.id): .["10"].b}'
    script: |
      {"id": "sample", "10": {"b": 42}}
Output:
{
  "sample": 42
}
Formatting JSON
steps:
  - name: format json
    executor: jq
    script: |
      {"id": "sample", "10": {"b": 42}}
Output:
{
  "10": {
    "b": 42
  },
  "id": "sample"
}
All Available Fields
DAG
This section provides a comprehensive list of available fields that can be used to configure DAGs and their steps in detail. Each field serves a specific purpose, enabling granular control over how the DAG runs. The fields include:
name: The name of the DAG, which is optional. The default name is the name of the file.
description: A brief description of the DAG.
schedule: The execution schedule of the DAG in Cron expression format.
group: The group name to organize DAGs, which is optional.
tags: Free tags that can be used to categorize DAGs, separated by commas.
env: Environment variables that can be accessed by the DAG and its steps.
logDir: The directory where the standard output is written. The default value is ${DAGU_HOME}/logs/dags.
restartWaitSec: The number of seconds to wait after the DAG process stops before restarting it.
histRetentionDays: The number of days to retain execution history (not for log files).
delaySec: The interval time in seconds between steps.
maxActiveRuns: The maximum number of parallel running steps.
params: The default parameters that can be referred to by $1, $2, and so on.
preconditions: The conditions that must be met before a DAG or step can run.
mailOn: Whether to send an email notification when a DAG or step fails or succeeds.
MaxCleanUpTimeSec: The maximum time to wait after sending a TERM signal to running steps before killing them.
handlerOn: The command to execute when a DAG or step succeeds, fails, cancels, or exits.
steps: A list of steps to execute in the DAG.
In addition, a global configuration file, $DAGU_HOME/config.yaml, can be used to gather common settings, such as logDir or env.
Note: If the DAGU_HOME environment variable is not set, the default path is $HOME/.dagu/config.yaml.
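A minimal sketch of such a global configuration file, assuming it uses the same field syntax as a DAG file. Only the logDir and env field names are taken from the text above, and the values are placeholders.

```yaml
# $DAGU_HOME/config.yaml (or $HOME/.dagu/config.yaml if DAGU_HOME is unset)
logDir: ${HOME}/logs
env:
  - SOME_DIR: ${HOME}/batch
```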
Example:
name: DAG name
description: run a DAG
schedule: "0 * * * *"
group: DailyJobs
tags: example
env:
  - LOG_DIR: ${HOME}/logs
  - PATH: /usr/local/bin:${PATH}
logDir: ${LOG_DIR}
restartWaitSec: 60
histRetentionDays: 3
delaySec: 1
maxActiveRuns: 1
params: param1 param2
preconditions:
  - condition: "`echo $2`"
    expected: "param2"
mailOn:
  failure: true
  success: true
MaxCleanUpTimeSec: 300
handlerOn:
  success:
    command: "echo succeed"
  failure:
    command: "echo failed"
  cancel:
    command: "echo canceled"
  exit:
    command: "echo finished"
Step
Each step can have its own set of configurations, including:
name: The name of the step.
description: A brief description of the step.
dir: The working directory for the step.
command: The command and parameters to execute.
stdout: The file to which the standard output is written.
output: The variable to which the result is written.
script: The script to execute.
signalOnStop: The signal name (e.g., SIGINT) to be sent when the process is stopped.
mailOn: Whether to send an email notification when the step fails or succeeds.
continueOn: Whether to continue to the next step even if the step failed or was skipped because its preconditions were not met.
retryPolicy: The retry policy for the step.
repeatPolicy: The repeat policy for the step.
preconditions: The conditions that must be met before a step can run.
depends: The steps that this step depends on.
Example:
steps:
  - name: some task
    description: some task
    dir: ${HOME}/logs
    command: bash
    stdout: /tmp/outfile
    output: RESULT_VARIABLE
    script: |
      echo "any script"
    signalOnStop: "SIGINT"
    mailOn:
      failure: true
      success: true
    continueOn:
      failure: true
      skipped: true
    retryPolicy:
      limit: 2
      intervalSec: 5
    repeatPolicy:
      repeat: true
      intervalSec: 60
    preconditions:
      - condition: "`echo $1`"
        expected: "param1"
    depends:
      - some task name step
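For a narrower illustration of retryPolicy and continueOn, the sketch below retries a step and lets the DAG proceed if it still fails. It assumes that limit is the maximum number of retries and intervalSec is the wait in seconds between attempts, which the field names suggest but the text does not state. The step names and flaky.sh are placeholders.

```yaml
steps:
  - name: flaky task
    command: flaky.sh
    retryPolicy:
      limit: 3 # assumed: retry at most three times
      intervalSec: 10 # assumed: wait ten seconds between attempts
    continueOn:
      failure: true
  - name: follow up
    command: echo done
    depends:
      - flaky task
```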