1 files changed, 276 insertions, 0 deletions
diff --git a/taskcluster/docs/taskgraph.rst b/taskcluster/docs/taskgraph.rst
new file mode 100644
index 000000000..5d3e7c7d3
--- /dev/null
+++ b/taskcluster/docs/taskgraph.rst
@@ -0,0 +1,276 @@
+======================
+TaskGraph Mach Command
+======================
+
+The task graph is built by linking different kinds of tasks together, pruning
+out tasks that are not required, then optimizing by replacing subgraphs with
+links to already-completed tasks.
+
+Concepts
+--------
+
+* *Task Kind* - Tasks are grouped by kind, where tasks of the same kind do not
+  have interdependencies but have substantial similarities, and may depend on
+  tasks of other kinds.  Kinds are the primary means of supporting diversity,
+  in that a developer can add a new kind to do just about anything without
+  impacting other kinds.
+
+* *Task Attributes* - Tasks have string attributes by which can be used for
+  filtering.  Attributes are documented in :doc:`attributes`.
+
+* *Task Labels* - Each task has a unique identifier within the graph that is
+  stable across runs of the graph generation algorithm.  Labels are replaced
+  with TaskCluster TaskIds at the latest time possible, facilitating analysis
+  of graphs without distracting noise from randomly-generated taskIds.
+
+* *Optimization* - replacement of a task in a graph with an equivalent,
+  already-completed task, or a null task, avoiding repetition of work.
+
+Kinds
+-----
+
+Kinds are the focal point of this system.  They provide an interface between
+the large-scale graph-generation process and the small-scale task-definition
+needs of different kinds of tasks.  Each kind may implement task generation
+differently.  Some kinds may generate task definitions entirely internally (for
+example, symbol-upload tasks are all alike, and very simple), while other kinds
+may do little more than parse a directory of YAML files.
+
+A ``kind.yml`` file contains data about the kind, as well as referring to a
+Python class implementing the kind in its ``implementation`` key.  That
+implementation may rely on lots of code shared with other kinds, or contain a
+completely unique implementation of some functionality.
+
+The full list of pre-defined keys in this file is:
+
+``implementation``
+   Class implementing this kind, in the form ``<module-path>:<object-path>``.
+   This class should be a subclass of ``taskgraph.kind.base:Kind``.
+
+``kind-dependencies``
+   Kinds which should be loaded before this one.  This is useful when the kind
+   will use the list of already-created tasks to determine which tasks to
+   create, for example adding an upload-symbols task after every build task.
+
+Any other keys are subject to interpretation by the kind implementation.
+
+The result is a nice segmentation of implementation so that the more esoteric
+in-tree projects can do their crazy stuff in an isolated kind without making
+the bread-and-butter build and test configuration more complicated.
+
+Dependencies
+------------
+
+Dependencies between tasks are represented as labeled edges in the task graph.
+For example, a test task must depend on the build task creating the artifact it
+tests, and this dependency edge is named 'build'.  The task graph generation
+process later resolves these dependencies to specific taskIds.
+
+Decision Task
+-------------
+
+The decision task is the first task created when a new graph begins.  It is
+responsible for creating the rest of the task graph.
+
+The decision task for pushes is defined in-tree, in ``.taskcluster.yml``.  That
+task description invokes ``mach taskcluster decision`` with some metadata about
+the push.  That mach command determines the optimized task graph, then calls
+the TaskCluster API to create the tasks.
+
+Note that this mach command is *not* designed to be invoked directly by humans.
+Instead, use the mach commands described below, supplying ``parameters.yml``
+from a recent decision task.  These commands allow testing everything the
+decision task does except the command-line processing and the
+``queue.createTask`` calls.
+
+Graph Generation
+----------------
+
+Graph generation, as run via ``mach taskgraph decision``, proceeds as follows:
+
+#. For all kinds, generate all tasks.  The result is the "full task set"
+#. Create dependency links between tasks using kind-specific mechanisms.  The
+   result is the "full task graph".
+#. Select the target tasks (based on try syntax or a tree-specific
+   specification).  The result is the "target task set".
+#. Based on the full task graph, calculate the transitive closure of the target
+   task set.  That is, the target tasks and all requirements of those tasks.
+   The result is the "target task graph".
+#. Optimize the target task graph based on kind-specific optimization methods.
+   The result is the "optimized task graph" with fewer nodes than the target
+   task graph.
+#. Create tasks for all tasks in the optimized task graph.
+
+Transitive Closure
+..................
+
+Transitive closure is a fancy name for this sort of operation:
+
+ * start with a set of tasks
+ * add all tasks on which any of those tasks depend
+ * repeat until nothing changes
+
+The effect is this: imagine you start with a linux32 test job and a linux64 test job.
+In the first round, each test task depends on the test docker image task, so add that image task.
+Each test also depends on a build, so add the linux32 and linux64 build tasks.
+
+Then repeat: the test docker image task is already present, as are the build
+tasks, but those build tasks depend on the build docker image task.  So add
+that build docker image task.  Repeat again: this time, none of the tasks in
+the set depend on a task not in the set, so nothing changes and the process is
+complete.
+
+And as you can see, the graph we've built now includes everything we wanted
+(the test jobs) plus everything required to do that (docker images, builds).
+
+Optimization
+------------
+
+The objective of optimization to remove as many tasks from the graph as
+possible, as efficiently as possible, thereby delivering useful results as
+quickly as possible.  For example, ideally if only a test script is modified in
+a push, then the resulting graph contains only the corresponding test suite
+task.
+
+A task is said to be "optimized" when it is either replaced with an equivalent,
+already-existing task, or dropped from the graph entirely.
+
+A task can be optimized if all of its dependencies can be optimized and none of
+its inputs have changed.  For a task on which no other tasks depend (a "leaf
+task"), the optimizer can determine what has changed by looking at the
+version-control history of the push: if the relevant files are not modified in
+the push, then it considers the inputs unchanged.  For tasks on which other
+tasks depend ("non-leaf tasks"), the optimizer must replace the task with
+another, equivalent task, so it generates a hash of all of the inputs and uses
+that to search for a matching, existing task.
+
+In some cases, such as try pushes, tasks in the target task set have been
+explicitly requested and are thus excluded from optimization. In other cases,
+the target task set is almost the entire task graph, so targetted tasks are
+considered for optimization.  This behavior is controlled with the
+``optimize_target_tasks`` parameter.
+
+Action Tasks
+------------
+
+Action Tasks are tasks which help you to schedule new jobs via Treeherder's
+"Add New Jobs" feature. The Decision Task creates a YAML file named
+``action.yml`` which can be used to schedule Action Tasks after suitably replacing
+``{{decision_task_id}}`` and ``{{task_labels}}``, which correspond to the decision
+task ID of the push and a comma separated list of task labels which need to be
+scheduled.
+
+This task invokes ``mach taskgraph action-task`` which builds up a task graph of
+the requested tasks. This graph is optimized using the tasks running initially in
+the same push, due to the decision task.
+
+So for instance, if you had already requested a build task in the ``try`` command,
+and you wish to add a test which depends on this build, the original build task
+is re-used.
+
+Action Tasks are currently scheduled by
+[pulse_actions](https://github.com/mozilla/pulse_actions). This feature is only
+present on ``try`` pushes for now.
+
+Mach commands
+-------------
+
+A number of mach subcommands are available aside from ``mach taskgraph
+decision`` to make this complex system more accesssible to those trying to
+understand or modify it.  They allow you to run portions of the
+graph-generation process and output the results.
+
+``mach taskgraph tasks``
+   Get the full task set
+
+``mach taskgraph full``
+   Get the full task graph
+
+``mach taskgraph target``
+   Get the target task set
+
+``mach taskgraph target-graph``
+   Get the target task graph
+
+``mach taskgraph optimized``
+   Get the optimized task graph
+
+Each of these commands taskes a ``--parameters`` option giving a file with
+parameters to guide the graph generation.  The decision task helpfully produces
+such a file on every run, and that is generally the easiest way to get a
+parameter file.  The parameter keys and values are described in
+:doc:`parameters`; using that information, you may modify an existing
+``parameters.yml`` or create your own.
+
+Task Parameterization
+---------------------
+
+A few components of tasks are only known at the very end of the decision task
+-- just before the ``queue.createTask`` call is made.  These are specified
+using simple parameterized values, as follows:
+
+``{"relative-datestamp": "certain number of seconds/hours/days/years"}``
+    Objects of this form will be replaced with an offset from the current time
+    just before the ``queue.createTask`` call is made.  For example, an
+    artifact expiration might be specified as ``{"relative-timestamp": "1
+    year"}``.
+
+``{"task-reference": "string containing <dep-name>"}``
+    The task definition may contain "task references" of this form.  These will
+    be replaced during the optimization step, with the appropriate taskId for
+    the named dependency substituted for ``<dep-name>`` in the string.
+    Multiple labels may be substituted in a single string, and ``<<>`` can be
+    used to escape a literal ``<``.
+
+Taskgraph JSON Format
+---------------------
+
+Task graphs -- both the graph artifacts produced by the decision task and those
+output by the ``--json`` option to the ``mach taskgraph`` commands -- are JSON
+objects, keyed by label, or for optimized task graphs, by taskId.  For
+convenience, the decision task also writes out ``label-to-taskid.json``
+containing a mapping from label to taskId.  Each task in the graph is
+represented as a JSON object.
+
+Each task has the following properties:
+
+``task_id``
+   The task's taskId (only for optimized task graphs)
+
+``label``
+   The task's label
+
+``attributes``
+   The task's attributes
+
+``dependencies``
+   The task's in-graph dependencies, represented as an object mapping
+   dependency name to label (or to taskId for optimized task graphs)
+
+``task``
+   The task's TaskCluster task definition.
+
+``kind_implementation``
+   The module and the class name which was used to implement this particular task.
+   It is always of the form ``<module-path>:<object-path>``
+
+The results from each command are in the same format, but with some differences
+in the content:
+
+* The ``tasks`` and ``target`` subcommands both return graphs with no edges.
+  That is, just collections of tasks without any dependencies indicated.
+
+* The ``optimized`` subcommand returns tasks that have been assigned taskIds.
+  The dependencies array, too, contains taskIds instead of labels, with
+  dependencies on optimized tasks omitted.  However, the ``task.dependencies``
+  array is populated with the full list of dependency taskIds.  All task
+  references are resolved in the optimized graph.
+
+The output of the ``mach taskgraph`` commands are suitable for processing with
+the `jq <https://stedolan.github.io/jq/>`_ utility.  For example, to extract all
+tasks' labels and their dependencies:
+
+.. code-block:: shell
+
+    jq 'to_entries | map({label: .value.label, dependencies: .value.dependencies})'
+