Diffstat (limited to 'taskcluster/docs/taskgraph.rst')
-rw-r--r-- | taskcluster/docs/taskgraph.rst | 276 |
1 files changed, 276 insertions, 0 deletions
diff --git a/taskcluster/docs/taskgraph.rst b/taskcluster/docs/taskgraph.rst
new file mode 100644
index 000000000..5d3e7c7d3
--- /dev/null
+++ b/taskcluster/docs/taskgraph.rst
@@ -0,0 +1,276 @@
+======================
+TaskGraph Mach Command
+======================
+
+The task graph is built by linking different kinds of tasks together, pruning
+out tasks that are not required, then optimizing by replacing subgraphs with
+links to already-completed tasks.
+
+Concepts
+--------
+
+* *Task Kind* - Tasks are grouped by kind, where tasks of the same kind do not
+  have interdependencies but have substantial similarities, and may depend on
+  tasks of other kinds. Kinds are the primary means of supporting diversity,
+  in that a developer can add a new kind to do just about anything without
+  impacting other kinds.
+
+* *Task Attributes* - Tasks have string attributes which can be used for
+  filtering. Attributes are documented in :doc:`attributes`.
+
+* *Task Labels* - Each task has a unique identifier within the graph that is
+  stable across runs of the graph generation algorithm. Labels are replaced
+  with TaskCluster TaskIds at the latest time possible, facilitating analysis
+  of graphs without distracting noise from randomly-generated taskIds.
+
+* *Optimization* - replacement of a task in a graph with an equivalent,
+  already-completed task, or a null task, avoiding repetition of work.
+
+Kinds
+-----
+
+Kinds are the focal point of this system. They provide an interface between
+the large-scale graph-generation process and the small-scale task-definition
+needs of different kinds of tasks. Each kind may implement task generation
+differently. Some kinds may generate task definitions entirely internally (for
+example, symbol-upload tasks are all alike, and very simple), while other kinds
+may do little more than parse a directory of YAML files.
+
+A ``kind.yml`` file contains data about the kind, as well as referring to a
+Python class implementing the kind in its ``implementation`` key. That
+implementation may rely on lots of code shared with other kinds, or contain a
+completely unique implementation of some functionality.
+
+The full list of pre-defined keys in this file is:
+
+``implementation``
+    Class implementing this kind, in the form ``<module-path>:<object-path>``.
+    This class should be a subclass of ``taskgraph.kind.base:Kind``.
+
+``kind-dependencies``
+    Kinds which should be loaded before this one. This is useful when the kind
+    will use the list of already-created tasks to determine which tasks to
+    create, for example adding an upload-symbols task after every build task.
+
+Any other keys are subject to interpretation by the kind implementation.
+
+The result is a nice segmentation of implementation so that the more esoteric
+in-tree projects can do their crazy stuff in an isolated kind without making
+the bread-and-butter build and test configuration more complicated.
+
+Dependencies
+------------
+
+Dependencies between tasks are represented as labeled edges in the task graph.
+For example, a test task must depend on the build task creating the artifact it
+tests, and this dependency edge is named 'build'. The task graph generation
+process later resolves these dependencies to specific taskIds.
+
+Decision Task
+-------------
+
+The decision task is the first task created when a new graph begins. It is
+responsible for creating the rest of the task graph.
+
+The decision task for pushes is defined in-tree, in ``.taskcluster.yml``. That
+task description invokes ``mach taskgraph decision`` with some metadata about
+the push. That mach command determines the optimized task graph, then calls
+the TaskCluster API to create the tasks.
+
+Note that this mach command is *not* designed to be invoked directly by humans.
+Instead, use the mach commands described below, supplying ``parameters.yml``
+from a recent decision task. These commands allow testing everything the
+decision task does except the command-line processing and the
+``queue.createTask`` calls.
+
+Graph Generation
+----------------
+
+Graph generation, as run via ``mach taskgraph decision``, proceeds as follows:
+
+#. For all kinds, generate all tasks. The result is the "full task set"
+#. Create dependency links between tasks using kind-specific mechanisms. The
+   result is the "full task graph".
+#. Select the target tasks (based on try syntax or a tree-specific
+   specification). The result is the "target task set".
+#. Based on the full task graph, calculate the transitive closure of the target
+   task set. That is, the target tasks and all requirements of those tasks.
+   The result is the "target task graph".
+#. Optimize the target task graph based on kind-specific optimization methods.
+   The result is the "optimized task graph" with fewer nodes than the target
+   task graph.
+#. Create tasks for all tasks in the optimized task graph.
+
+Transitive Closure
+..................
+
+Transitive closure is a fancy name for this sort of operation:
+
+ * start with a set of tasks
+ * add all tasks on which any of those tasks depend
+ * repeat until nothing changes
+
+The effect is this: imagine you start with a linux32 test job and a linux64 test job.
+In the first round, each test task depends on the test docker image task, so add that image task.
+Each test also depends on a build, so add the linux32 and linux64 build tasks.
+
+Then repeat: the test docker image task is already present, as are the build
+tasks, but those build tasks depend on the build docker image task. So add
+that build docker image task. Repeat again: this time, none of the tasks in
+the set depend on a task not in the set, so nothing changes and the process is
+complete.
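The closure operation described above can be sketched in a few lines of Python. This is a minimal illustration and not the in-tree implementation: the function name ``transitive_closure``, the shape of the ``dependencies`` mapping, and all task labels are hypothetical, chosen to mirror the linux32/linux64 example.

```python
def transitive_closure(targets, dependencies):
    """Return the targets plus every task they transitively depend on.

    `dependencies` is an assumed mapping from a task label to the set of
    labels that task depends on; tasks without an entry have no dependencies.
    """
    closure = set(targets)
    while True:
        # Gather the direct dependencies of every task currently in the set.
        required = set()
        for label in closure:
            required |= dependencies.get(label, set())
        # Stop once a round adds nothing new.
        if required <= closure:
            return closure
        closure |= required


# Hypothetical labels mirroring the example in the text.
deps = {
    "test-linux32": {"docker-image-test", "build-linux32"},
    "test-linux64": {"docker-image-test", "build-linux64"},
    "build-linux32": {"docker-image-build"},
    "build-linux64": {"docker-image-build"},
    "build-macosx64": {"docker-image-build"},
}
closure = transitive_closure({"test-linux32", "test-linux64"}, deps)
print(sorted(closure))
```

The unrelated ``build-macosx64`` task stays out of the result because nothing in the target set depends on it, which is the pruning effect the target task graph relies on.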
+
+And as you can see, the graph we've built now includes everything we wanted
+(the test jobs) plus everything required to do that (docker images, builds).
+
+Optimization
+------------
+
+The objective of optimization is to remove as many tasks from the graph as
+possible, as efficiently as possible, thereby delivering useful results as
+quickly as possible. For example, ideally if only a test script is modified in
+a push, then the resulting graph contains only the corresponding test suite
+task.
+
+A task is said to be "optimized" when it is either replaced with an equivalent,
+already-existing task, or dropped from the graph entirely.
+
+A task can be optimized if all of its dependencies can be optimized and none of
+its inputs have changed. For a task on which no other tasks depend (a "leaf
+task"), the optimizer can determine what has changed by looking at the
+version-control history of the push: if the relevant files are not modified in
+the push, then it considers the inputs unchanged. For tasks on which other
+tasks depend ("non-leaf tasks"), the optimizer must replace the task with
+another, equivalent task, so it generates a hash of all of the inputs and uses
+that to search for a matching, existing task.
+
+In some cases, such as try pushes, tasks in the target task set have been
+explicitly requested and are thus excluded from optimization. In other cases,
+the target task set is almost the entire task graph, so targeted tasks are
+considered for optimization. This behavior is controlled with the
+``optimize_target_tasks`` parameter.
+
+Action Tasks
+------------
+
+Action Tasks are tasks which help you to schedule new jobs via Treeherder's
+"Add New Jobs" feature.
+The Decision Task creates a YAML file named
+``action.yml`` which can be used to schedule Action Tasks after suitably replacing
+``{{decision_task_id}}`` and ``{{task_labels}}``, which correspond to the decision
+task ID of the push and a comma-separated list of task labels which need to be
+scheduled.
+
+This task invokes ``mach taskgraph action-task`` which builds up a task graph of
+the requested tasks. This graph is optimized using the tasks running initially in
+the same push, due to the decision task.
+
+So for instance, if you had already requested a build task in the ``try`` command,
+and you wish to add a test which depends on this build, the original build task
+is re-used.
+
+Action Tasks are currently scheduled by
+`pulse_actions <https://github.com/mozilla/pulse_actions>`_. This feature is only
+present on ``try`` pushes for now.
+
+Mach commands
+-------------
+
+A number of mach subcommands are available aside from ``mach taskgraph
+decision`` to make this complex system more accessible to those trying to
+understand or modify it. They allow you to run portions of the
+graph-generation process and output the results.
+
+``mach taskgraph tasks``
+    Get the full task set
+
+``mach taskgraph full``
+    Get the full task graph
+
+``mach taskgraph target``
+    Get the target task set
+
+``mach taskgraph target-graph``
+    Get the target task graph
+
+``mach taskgraph optimized``
+    Get the optimized task graph
+
+Each of these commands takes a ``--parameters`` option giving a file with
+parameters to guide the graph generation. The decision task helpfully produces
+such a file on every run, and that is generally the easiest way to get a
+parameter file. The parameter keys and values are described in
+:doc:`parameters`; using that information, you may modify an existing
+``parameters.yml`` or create your own.
+
+Task Parameterization
+---------------------
+
+A few components of tasks are only known at the very end of the decision task
+-- just before the ``queue.createTask`` call is made. These are specified
+using simple parameterized values, as follows:
+
+``{"relative-datestamp": "certain number of seconds/hours/days/years"}``
+    Objects of this form will be replaced with an offset from the current time
+    just before the ``queue.createTask`` call is made. For example, an
+    artifact expiration might be specified as ``{"relative-datestamp": "1
+    year"}``.
+
+``{"task-reference": "string containing <dep-name>"}``
+    The task definition may contain "task references" of this form. These will
+    be replaced during the optimization step, with the appropriate taskId for
+    the named dependency substituted for ``<dep-name>`` in the string.
+    Multiple labels may be substituted in a single string, and ``<<>`` can be
+    used to escape a literal ``<``.
+
+Taskgraph JSON Format
+---------------------
+
+Task graphs -- both the graph artifacts produced by the decision task and those
+output by the ``--json`` option to the ``mach taskgraph`` commands -- are JSON
+objects, keyed by label, or for optimized task graphs, by taskId. For
+convenience, the decision task also writes out ``label-to-taskid.json``
+containing a mapping from label to taskId. Each task in the graph is
+represented as a JSON object.
+
+Each task has the following properties:
+
+``task_id``
+    The task's taskId (only for optimized task graphs)
+
+``label``
+    The task's label
+
+``attributes``
+    The task's attributes
+
+``dependencies``
+    The task's in-graph dependencies, represented as an object mapping
+    dependency name to label (or to taskId for optimized task graphs)
+
+``task``
+    The task's TaskCluster task definition.
+
+``kind_implementation``
+    The module and the class name which was used to implement this particular task.
+    It is always of the form ``<module-path>:<object-path>``
+
+The results from each command are in the same format, but with some differences
+in the content:
+
+* The ``tasks`` and ``target`` subcommands both return graphs with no edges.
+  That is, just collections of tasks without any dependencies indicated.
+
+* The ``optimized`` subcommand returns tasks that have been assigned taskIds.
+  The dependencies array, too, contains taskIds instead of labels, with
+  dependencies on optimized tasks omitted. However, the ``task.dependencies``
+  array is populated with the full list of dependency taskIds. All task
+  references are resolved in the optimized graph.
+
+The output of the ``mach taskgraph`` commands is suitable for processing with
+the `jq <https://stedolan.github.io/jq/>`_ utility. For example, to extract all
+tasks' labels and their dependencies:
+
+.. code-block:: shell
+
+    jq 'to_entries | map({label: .value.label, dependencies: .value.dependencies})'
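Where Python is more convenient than ``jq``, the same extraction can be sketched as follows. This assumes a graph in the JSON format described above. The task labels, attribute values, and embedded sample data are hypothetical; in practice the graph would come from a decision task artifact or from ``--json`` output.

```python
import json

# Hypothetical two-task graph in the documented format, keyed by label.
SAMPLE_GRAPH = json.loads("""
{
  "build-linux64": {
    "label": "build-linux64",
    "attributes": {"kind": "build"},
    "dependencies": {},
    "task": {}
  },
  "test-linux64": {
    "label": "test-linux64",
    "attributes": {"kind": "test"},
    "dependencies": {"build": "build-linux64"},
    "task": {}
  }
}
""")


def labels_and_dependencies(graph):
    """Return one {label, dependencies} record per task in the graph."""
    return [
        {"label": task["label"], "dependencies": task["dependencies"]}
        for task in graph.values()
    ]


summary = labels_and_dependencies(SAMPLE_GRAPH)
print(json.dumps(summary, indent=2))
```

Each record keeps the mapping from dependency name to label, so the edge named ``build`` on the test task remains visible in the output.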