1 files changed, 198 insertions, 0 deletions
diff --git a/taskcluster/docs/transforms.rst b/taskcluster/docs/transforms.rst
new file mode 100644
index 000000000..1679c5589
--- /dev/null
+++ b/taskcluster/docs/transforms.rst
@@ -0,0 +1,198 @@
+Transforms
+==========
+
+Many task kinds generate tasks by a process of transforming job descriptions
+into task definitions.  The basic operation is simple, although the sequence of
+transforms applied for a particular kind may not be!
+
+Overview
+--------
+
+To begin, a kind implementation generates a collection of items; see
+:doc:`loading`.  The items are simply Python dictionaries, and describe
+"semantically" what the resulting task or tasks should do.
+
+The kind also defines a sequence of transformations.  These are applied, in
+order, to each item.  Early transforms might apply default values or break
+items up into smaller items (for example, chunking a test suite).  Later
+transforms rewrite the items entirely, with the final result being a task
+definition.
+
+Transform Functions
+...................
+
+Each transformation looks like this:
+
+.. code-block::
+
+    @transforms.add
+    def transform_an_item(config, items):
+        """This transform ..."""  # always a docstring!
+        for item in items:
+            # ..
+            yield item
+
+The ``config`` argument is a Python object containing useful configuration for
+the kind, and is a subclass of
+:class:`taskgraph.transforms.base.TransformConfig`, which specifies a few of
+its attributes.  Kinds may subclass and add additional attributes if necessary.
+
+While most transforms yield one item for each item consumed, this is not always
+true: items that are not yielded are effectively filtered out.  Yielding
+multiple items for each consumed item implements item duplication; this is how
+test chunking is accomplished, for example.
+
+The ``transforms`` object is an instance of
+:class:`taskgraph.transforms.base.TransformSequence`, which serves as a simple
+mechanism to combine a sequence of transforms into one.
+
+Schemas
+.......
+
+The items used in transforms are validated against some simple schemas at
+various points in the transformation process.  These schemas accomplish two
+things: they provide a place to add comments about the meaning of each field,
+and they enforce that the fields are actually used in the documented fashion.
+
+Keyed By
+........
+
+Several fields in the input items can be "keyed by" another value in the item.
+For example, a test description's chunks may be keyed by ``test-platform``.
+In the item, this looks like:
+
+.. code-block:: yaml
+
+    chunks:
+        by-test-platform:
+            linux64/debug: 12
+            linux64/opt: 8
+            default: 10
+
+This is a simple but powerful way to encode business rules in the items
+provided as input to the transforms, rather than expressing those rules in the
+transforms themselves.  If you are implementing a new business rule, prefer
+this mode where possible.  The structure is easily resolved to a single value
+using :func:`taskgraph.transform.base.get_keyed_by`.
+
+Organization
+-------------
+
+Task creation operates broadly in a few phases, with the interfaces of those
+stages defined by schemas.  The process begins with the raw data structures
+parsed from the YAML files in the kind configuration.  This data can processed
+by kind-specific transforms resulting, for test jobs, in a "test description".
+For non-test jobs, the next step is a "job description".  These transformations
+may also "duplicate" tasks, for example to implement chunking or several
+variations of the same task.
+
+In any case, shared transforms then convert this into a "task description",
+which the task-generation transforms then convert into a task definition
+suitable for ``queue.createTask``.
+
+Test Descriptions
+-----------------
+
+The transforms configured for test kinds proceed as follows, based on
+configuration in ``kind.yml``:
+
+ * The test description is validated to conform to the schema in
+   ``taskcluster/taskgraph/transforms/tests/test_description.py``.  This schema
+   is extensively documented and is a the primary reference for anyone
+   modifying tests.
+
+ * Kind-specific transformations are applied.  These may apply default
+   settings, split tests (e.g., one to run with feature X enabled, one with it
+   disabled), or apply across-the-board business rules such as "all desktop
+   debug test platforms should have a max-run-time of 5400s".
+
+ * Transformations generic to all tests are applied.  These apply policies
+   which apply to multiple kinds, e.g., for treeherder tiers.  This is also the
+   place where most values which differ based on platform are resolved, and
+   where chunked tests are split out into a test per chunk.
+
+ * The test is again validated against the same schema.  At this point it is
+   still a test description, just with defaults and policies applied, and
+   per-platform options resolved.  So transforms up to this point do not modify
+   the "shape" of the test description, and are still governed by the schema in
+   ``test_description.py``.
+
+ * The ``taskgraph.transforms.tests.make_task_description:transforms`` then
+   take the test description and create a *task* description.  This transform
+   embodies the specifics of how test runs work: invoking mozharness, various
+   worker options, and so on.
+
+ * Finally, the ``taskgraph.transforms.task:transforms``, described above
+   under "Task-Generation Transforms", are applied.
+
+Test dependencies are produced in the form of a dictionary mapping dependency
+name to task label.
+
+Job Descriptions
+----------------
+
+A job description says what to run in the task.  It is a combination of a
+``run`` section and all of the fields from a task description.  The run section
+has a ``using`` property that defines how this task should be run; for example,
+``mozharness`` to run a mozharness script, or ``mach`` to run a mach command.
+The remainder of the run section is specific to the run-using implementation.
+
+The effect of a job description is to say "run this thing on this worker".  The
+job description must contain enough information about the worker to identify
+the workerType and the implementation (docker-worker, generic-worker, etc.).
+Any other task-description information is passed along verbatim, although it is
+augmented by the run-using implementation.
+
+The run-using implementations are all located in
+``taskcluster/taskgraph/transforms/job``, along with the schemas for their
+implementations.  Those well-commented source files are the canonical
+documentation for what constitutes a job description, and should be considered
+part of the documentation.
+
+Task Descriptions
+-----------------
+
+Every kind needs to create tasks, and all of those tasks have some things in
+common.  They all run on one of a small set of worker implementations, each
+with their own idiosyncracies.  And they all report to TreeHerder in a similar
+way.
+
+The transforms in ``taskcluster/taskgraph/transforms/task.py`` implement
+this common functionality.  They expect a "task description", and produce a
+task definition.  The schema for a task description is defined at the top of
+``task.py``, with copious comments.  Go forth and read it now!
+
+In general, the task-description transforms handle functionality that is common
+to all Gecko tasks.  While the schema is the definitive reference, the
+functionality includes:
+
+* TreeHerder metadata
+
+* Build index routes
+
+* Information about the projects on which this task should run
+
+* Optimizations
+
+* Defaults for ``expires-after`` and and ``deadline-after``, based on project
+
+* Worker configuration
+
+The parts of the task description that are specific to a worker implementation
+are isolated in a ``task_description['worker']`` object which has an
+``implementation`` property naming the worker implementation.  Each worker
+implementation has its own section of the schema describing the fields it
+expects.  Thus the transforms that produce a task description must be aware of
+the worker implementation to be used, but need not be aware of the details of
+its payload format.
+
+The ``task.py`` file also contains a dictionary mapping treeherder groups to
+group names using an internal list of group names.  Feel free to add additional
+groups to this list as necessary.
+
+More Detail
+-----------
+
+The source files provide lots of additional detail, both in the code itself and
+in the comments and docstrings.  For the next level of detail beyond this file,
+consult the transform source under ``taskcluster/taskgraph/transforms``.