# Torch-MLIR Architecture

## Introduction

The Torch-MLIR project provides core infrastructure for bridging the PyTorch
ecosystem and the MLIR ecosystem. For example, Torch-MLIR enables PyTorch models
to be lowered to a few different MLIR dialects. Torch-MLIR does not attempt to
provide a production end-to-end flow for PyTorch programs by itself, but is a
useful component for constructing one.

## Overview

Torch-MLIR has two parts, which we call the "frontend" and "backend". These two
halves interface at an abstraction layer that we call the "backend contract",
which is a subset of the `torch` dialect with certain properties appealing for
backends to lower from.

![Torch-MLIR Architecture](Torch-MLIR_Architecture.png)

The frontend of Torch-MLIR is concerned with interfacing to PyTorch itself, and
then normalizing the program to the "backend contract". This part involves build
system complexity and exposure to PyTorch APIs to get the program into the MLIR
`torch` dialect. When we interface with TorchScript, we additionally have a
large amount of lowering and simplification to do within MLIR on the `torch`
dialect.

The "backend" of Torch-MLIR takes IR in the "backend contract" form and lowers
it to various target dialects of interest to the MLIR ecosystem (various
"backends"). In particular, right now we support lowering to:

- Linalg-on-Tensors (+ `arith`, `tensor`, etc.)
- [TOSA](https://mlir.llvm.org/docs/Dialects/TOSA/)
- [MHLO](https://github.com/tensorflow/mlir-hlo)

The terms "frontend" and "backend" are highly overloaded in any compiler
project, but in Torch-MLIR they usually carry the meanings above. Sometimes
"frontend" means something even further up the stack, such as something in
PyTorch itself; when there is ambiguity, we say "at the PyTorch level".
Similarly, "backend" can sometimes refer to something sitting below
Linalg-on-Tensors, TOSA, or MHLO.

## The `torch` dialect

See [include/torch-mlir/Dialect/Torch/IR](https://github.com/llvm/torch-mlir/tree/main/include/torch-mlir/Dialect/Torch/IR)

The central MLIR abstraction in the Torch-MLIR project is the `torch` dialect.
This dialect supports progressive lowering from the raw imported PyTorch
programs that various PyTorch integration points provide, all the way down to
the backend contract.

The `torch` dialect must be versatile enough to support being imported by any
program capture mechanism in PyTorch -- this could be TorchDynamo, `torch.fx`,
LazyTensorCore, TorchScript, `torch.jit.trace`, etc. Thankfully, PyTorch is
factored such that we can handle this with one core import path, which is
through the PyTorch
"[JIT IR](https://github.com/pytorch/pytorch/blob/78c8a0d75220bdd4955415b5f81509e005af4232/torch/csrc/jit/OVERVIEW.md)",
and lives in
[torch-mlir/python/torch_mlir/dialects/torch/importer/jit_ir](https://github.com/llvm/torch-mlir/tree/e322f6a8784009b37aa354abfa9a40a80f30877d/python/torch_mlir/dialects/torch/importer/jit_ir).
The JIT IR is a highly principled IR that faithfully models a Python subset (+
tensors, the PyTorch op registry, and a few other things). All the other PyTorch
program representations can eventually bottom out on the JIT IR via some path
provided by PyTorch. The `torch` dialect is almost entirely in 1:1
correspondence with the JIT IR -- this allows the importer to be extremely small
(the core is
[under 500 lines of code](https://github.com/llvm/torch-mlir/blob/e322f6a8784009b37aa354abfa9a40a80f30877d/python/torch_mlir/dialects/torch/importer/jit_ir/csrc/node_importer.cpp#L1)).
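
For illustration, a registry call in the JIT IR imports essentially 1:1 into
the `torch` dialect. A rough sketch (the exact types depend on what the capture
mechanism provides):

```mlir
// JIT IR:  %1 : Tensor = aten::tanh(%0)
// imports into the `torch` dialect roughly as:
%1 = torch.aten.tanh %0 : !torch.tensor -> !torch.tensor
```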

### Ops

See [TorchOps.td](https://github.com/llvm/torch-mlir/blob/114f48e96c578ee76a6f83b3aa4aa229a8d5b76e/include/torch-mlir/Dialect/Torch/IR/TorchOps.td#L1)

The ops in the `torch` dialect are almost entirely generated from the PyTorch
JIT IR operator registry via the script
[torch_ods_gen.py](https://github.com/llvm/torch-mlir/blob/e322f6a8784009b37aa354abfa9a40a80f30877d/python/torch_mlir/dialects/torch/importer/jit_ir/build_tools/torch_ods_gen.py#L1) (invoked via [update_torch_ods.sh](https://github.com/llvm/torch-mlir/blob/main/build_tools/update_torch_ods.sh)).
This script queries the registry and generates MLIR
[ODS](https://mlir.llvm.org/docs/OpDefinitions/) in
[GeneratedTorchOps.td](https://github.com/llvm/torch-mlir/blob/e322f6a8784009b37aa354abfa9a40a80f30877d/include/torch-mlir/Dialect/Torch/IR/GeneratedTorchOps.td#L1). We have a guide for [adding a new op end-to-end](https://github.com/llvm/torch-mlir/wiki/Torch-ops-E2E-implementation).

There are also some manually implemented ops in the following categories (see
[TorchOps.td](https://github.com/llvm/torch-mlir/blob/e322f6a8784009b37aa354abfa9a40a80f30877d/include/torch-mlir/Dialect/Torch/IR/TorchOps.td#L1)):

- Ops used for modeling PyTorch IValue object graphs (e.g. `torch.nn_module`,
  `torch.class_type`).
- `torch.global_slot` and related ops, which are used to model an incremental
  lowering of the IValue object graphs.
- Ops that are supported in the JIT interpreter directly, and so don't have a
  corresponding op in the registry (e.g. `torch.prim.If`,
  `torch.prim.ListConstruct`, `torch.constant.*`).
- `torch.operator`, which is used to represent ops from the registry which
  haven't been generated by `torch_ods_gen.py` (see the sketch below).
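
Such an op carries its registry name as a string rather than having dedicated
ODS. A hypothetical example (sketch; `aten.my_unsupported_op` is a made-up
name):

```mlir
%0 = torch.operator "aten.my_unsupported_op"(%arg0, %int1) : (!torch.tensor, !torch.int) -> !torch.tensor
```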

### Types

See [TorchTypes.td](https://github.com/llvm/torch-mlir/blob/e322f6a8784009b37aa354abfa9a40a80f30877d/include/torch-mlir/Dialect/Torch/IR/TorchTypes.td#L1)

The `torch` dialect has a complete set of types modeling the PyTorch type
system, which itself is a strongly typed subset of the Python type system (+
tensors). These types are almost all 1:1 with the corresponding
[PyTorch types](https://github.com/pytorch/pytorch/blob/c54d18dbc7bb2f9fdd83c5de529702e5a02295c3/aten/src/ATen/core/jit_type.h#L1).

The one exception where a significant amount of design work has been done in
Torch-MLIR is the handling of tensors. Torch-MLIR's tensor types allow
progressive lowering from raw imported IR, which may be missing shapes, dtypes,
and value semantics, into the backend contract, which provides those. Torch-MLIR
has two tensor types, `ValueTensorType` (`!torch.vtensor`) and
`NonValueTensorType` (`!torch.tensor`), sharing most of their definition in
[TorchTypes.td](https://github.com/llvm/torch-mlir/blob/e322f6a8784009b37aa354abfa9a40a80f30877d/include/torch-mlir/Dialect/Torch/IR/TorchTypes.td#L58).
`NonValueTensorType` models a `torch.Tensor` including mutation, aliasing,
etc., while `ValueTensorType` has value semantics. That is, `ValueTensorType`
is immutable and non-aliased. These types have a common C++ base class,
[`BaseTensorType`](https://github.com/llvm/torch-mlir/blob/e322f6a8784009b37aa354abfa9a40a80f30877d/include/torch-mlir/Dialect/Torch/IR/TorchTypes.h#L40),
which permits abstracting across them. Both `ValueTensorType` and
`NonValueTensorType` have an optional list of optional sizes and an optional
dtype.
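
For intuition, the printed forms encode those optional components roughly as
follows (a sketch; see TorchTypes.td for the authoritative syntax):

```mlir
!torch.tensor                 // unknown sizes and dtype, non-value semantics
!torch.vtensor<*,f32>         // unknown sizes, known dtype, value semantics
!torch.vtensor<[?,1024],f32>  // known rank, partially known sizes, known dtype
```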

## The "backend contract"

See [satisfiesBackendContract](https://github.com/llvm/torch-mlir/blob/114f48e96c578ee76a6f83b3aa4aa229a8d5b76e/lib/Dialect/Torch/Transforms/LowerToBackendContract.cpp#L151)

The backend contract is a normalized form of the `torch` dialect with a set of
properties that make it easy to lower into various forms such as
Linalg-on-Tensors, TOSA, MHLO, or other forms that we don't provide out of the
box. The primary guarantees that we provide Torch-MLIR's backends are:

- All tensors have been converted to value semantics.
- All tensors have at least a known number of dimensions (i.e. rank), and
  ideally also a precise size for each dimension.
- All tensors have a known dtype.
- Certain ops have been decomposed to make them easier to handle (this is
  configurable).

See the extensive comments in the function `satisfiesBackendContract` (and its
callees) in the `LowerToBackendContract` pass for an extended rationale for
these decisions, and a precise definition of the backend contract.
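
For example, a function in backend contract form might look like this (a
hand-written sketch illustrating the guarantees above):

```mlir
// Value-semantic tensors with known rank and dtype throughout.
func.func @forward(%arg0: !torch.vtensor<[?,3],f32>) -> !torch.vtensor<[?,3],f32> {
  %0 = torch.aten.tanh %arg0 : !torch.vtensor<[?,3],f32> -> !torch.vtensor<[?,3],f32>
  return %0 : !torch.vtensor<[?,3],f32>
}
```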

## The Frontends

Torch-MLIR provides two main frontends:

- LazyTensorCore - a frontend based around intercepting PyTorch dispatcher
  calls and creating a graph that is lazily evaluated on demand.
- TorchScript - a frontend based around importing TorchScript functions or
  modules. Such modules or functions can be obtained via `torch.jit.script`,
  `torch.jit.trace`, or a few other methods in the PyTorch ecosystem.

Internally, these share a lot of the core import code.

### LazyTensorCore

Docs: https://github.com/llvm/torch-mlir/blob/main/docs/ltc_backend.md

LazyTensorCore (LTC) is a program capture method provided by PyTorch that does
device-level tracing. This low-level interception point sits below gradient
calculations, and is thus a good choice for training flows. The downside of LTC
is that it depends on having the whole PyTorch runtime available, so it cannot
be used for ahead-of-time compilation or capturing standalone program artifacts.

From an implementation perspective, the JIT IR produced by LazyTensorCore has
already had a number of transformations performed on it; in particular, after
importing from JIT IR to MLIR, the backend contract is trivially satisfied. So
the Torch-MLIR implementation complexity for LazyTensorCore is restricted to
build system and PyTorch integration, rather than actual MLIR compiler passes.

### TorchScript (`torch.jit.script`)

[TorchScript](https://pytorch.org/docs/stable/jit.html) is a strict Python
subset which is modeled faithfully in the JIT IR. Additionally, TorchScript can
represent a full `torch.nn.Module` object graph (hierarchy). This results in a
significant amount of work for the frontend to lower it to the backend
contract:

- The `torch.nn.Module` hierarchy must be lowered to the backend contract,
  which does not allow any program state.
- The program must be converted to value semantics (functionalized).
- Shapes and dtypes must be inferred.
- Many "Python-isms" must be simplified away, such as list appends, string
  operations, etc.

Because TorchScript does not naturally provide shapes or dtypes, we usually
require the user to annotate the expected shapes and dtypes of the arguments,
which we then propagate throughout the program.

`torch.jit.trace` produces JIT IR with shapes and dtypes already, but no value
semantics. Users also often want to erase the shapes in the trace to allow
dynamic shapes. Additionally, the Python-level data structures and APIs are
very parallel between `torch.jit.script` and `torch.jit.trace`, so we consider
both of those the same from the perspective of the compiler's responsibilities.
Both are accessed via the `torch_mlir.compile` Python API.

### Modeling the `torch.nn.Module` object (`IValue`) hierarchy for TorchScript

PyTorch consistently models a subset of Python objects with its concept of
[`IValue`](https://github.com/pytorch/pytorch/blob/1ee9eb52b612f5fb4b63bbda832e44c8902edb64/aten/src/ATen/core/ivalue.h#L171)
(interpreter value). These are used throughout PyTorch to represent Python
values. When one `torch.jit.script`'s a `torch.nn.Module`, the result is
actually an `IValue` that represents the module, with a hierarchy of children
`IValue`'s. Strictly speaking, JIT IR `torch::jit::Graph`'s are only used to
represent the bodies of methods on the modules. So in addition to importing the
JIT IR, we also need to import the `IValue`'s. This happens inside [ivalue_importer.cpp](https://github.com/llvm/torch-mlir/blob/fde390c7669e29362b18388448ef2b188713383f/python/torch_mlir/dialects/torch/importer/jit_ir/csrc/ivalue_importer.cpp#L1).

Most of the IValue modeling can reuse `torch` dialect ops that already exist
otherwise, such as `torch.constant.int` to represent an int in the object graph.
However, special IR constructs are needed for modeling the `torch.nn.Module`'s
themselves.

An example is:

```mlir
torch.class_type @c {
  torch.attr "b" : !torch.bool
  torch.attr "i" : !torch.int
  torch.attr "f" : !torch.float
  torch.attr "t" : !torch.tensor
  torch.method "get_tensor", @get_tensor
}
func.func private @get_tensor(%arg0: !torch.nn.Module<"c">) -> !torch.tensor {
  %2 = torch.prim.GetAttr %arg0["t"] : !torch.nn.Module<"c"> -> !torch.tensor
  return %2 : !torch.tensor
}

%true = torch.constant.bool true
%int3 = torch.constant.int 3
%float4.250000e01 = torch.constant.float 4.250000e+01
%0 = torch.tensor.literal(dense<1.000000e+00> : tensor<1xf32>) : !torch.tensor
%1 = torch.nn_module {
  torch.slot "b", %true : !torch.bool
  torch.slot "i", %int3 : !torch.int
  torch.slot "f", %float4.250000e01 : !torch.float
  torch.slot "t", %0 : !torch.tensor
} : !torch.nn.Module<"c">
```

See the documentation for the ops for more information on the semantics of this
form.

### Lowering TorchScript to the backend contract

The `torchscript-module-to-torch-backend-pipeline` contains the set of
simplifications used to convert TorchScript to the backend contract. At a high
level, it consists of the following transformations:

1. GlobalizeObjectGraph: This takes the `IValue` object graph and converts it
   into a flat list of globals (see `torch.global_slot` and related ops).
1. LowerToBackendContract: This pass iteratively applies a simplification
   pipeline until the backend contract is reached. The simplification pipeline
   consists of:
   - Standard canonicalization.
   - Shape refinement (sketched below). See
     [shape_lib.md](https://github.com/llvm/torch-mlir/blob/main/docs/shape_lib.md)
     for details.
   - Dtype refinement. See `RefineTypes`.
   - Decomposing ops into more primitive ops. See `DecomposeComplexOps`.
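
As a rough illustration of shape and dtype refinement (a hypothetical
before/after; the real pipeline interleaves these steps repeatedly):

```mlir
// Before refinement: the result type carries no size/dtype information.
%1 = torch.aten.tanh %0 : !torch.vtensor<[2,3],f32> -> !torch.vtensor
// After refinement: the result type is inferred from the operand type.
%1 = torch.aten.tanh %0 : !torch.vtensor<[2,3],f32> -> !torch.vtensor<[2,3],f32>
```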

### Layering of the PyTorch Dependency

One of the core principles of our Torch-MLIR <-> PyTorch interop is that
anything that links against PyTorch must interact with MLIR through
[the Torch-MLIR C API](https://github.com/llvm/torch-mlir/tree/main/include/torch-mlir-c).
This bypasses a number of very complex dependency and shared library issues.

Additionally, we maintain the invariant that the core MLIR compiler code (in
`lib/` and `include/`) never has a build dependency on PyTorch itself. This
strict isolation avoids a number of complex dependency issues and ensures that
`torch-mlir-opt` and similar debugging tools always provide the excellent
development and debugging experience that MLIR developers expect. Sometimes,
certain highly stable enums and related logic must be shared with upstream
PyTorch, and for those we copy code from PyTorch into
[TorchUpstream.h](https://github.com/llvm/torch-mlir/blob/fde390c7669e29362b18388448ef2b188713383f/include/torch-mlir/Dialect/Torch/Utils/TorchUpstream.h#L13).

## The Backends

Torch-MLIR provides three built-in backends, which take the backend contract IR
and lower it to the requirements of each backend. The three backends are:

- [`linalg`](https://mlir.llvm.org/docs/Dialects/Linalg/) on tensors (+ `arith`,
  `tensor`, etc.)
- [TOSA](https://mlir.llvm.org/docs/Dialects/TOSA/)
- [MHLO](https://github.com/tensorflow/mlir-hlo)

### The Linalg Backend (Linalg-on-Tensors)

Code: https://github.com/llvm/torch-mlir/tree/main/lib/Conversion/TorchToLinalg

The Linalg-on-Tensors backend was the first backend that we added, and it is
still the most complete. It fully supports dynamic shapes (known number of
dimensions but arbitrary dynamic dimension sizes). Since linalg was originally
designed as a dialect for transformations, it can be too low-level for certain
consumers.
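
To give a flavor of that granularity, a unary op like `torch.aten.tanh` ends up
as a `linalg.generic` whose body computes the elementwise function (a
hand-written sketch, not actual converter output):

```mlir
#map = affine_map<(d0) -> (d0)>
func.func @tanh(%in: tensor<?xf32>, %init: tensor<?xf32>) -> tensor<?xf32> {
  // One parallel loop over the elements; the scalar body applies tanh.
  %out = linalg.generic
      {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
      ins(%in : tensor<?xf32>) outs(%init : tensor<?xf32>) {
  ^bb0(%a: f32, %b: f32):
    %t = math.tanh %a : f32
    linalg.yield %t : f32
  } -> tensor<?xf32>
  return %out : tensor<?xf32>
}
```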

### The TOSA Backend

Code: https://github.com/llvm/torch-mlir/tree/main/lib/Conversion/TorchToTosa

The TOSA backend was the second backend that we added. It remains preferred by
many users (especially "hardware" or "hardware-adjacent" folks). Some of its
characteristics are:

- It is tied to a [spec](https://www.mlplatform.org/tosa/tosa_spec.html) with a
  really clear "ISA-like" expository style that resonates with a lot of folks.
- The coarse-grained named-op approach is a good match for the many compilers
  that are designed that way.
- It has really good support for quantization / integer data types.
- It has clear versioning/stability guarantees on the op semantics.
- It is extremely solid with static shapes (and many of its users only care
  about static shapes, so that's fine).

### The MHLO Backend

Code: https://github.com/llvm/torch-mlir/tree/main/lib/Conversion/TorchToMhlo

The MHLO backend was the third backend that we added, and it offers a
reasonable blend of the benefits of the other two:

- It takes a coarse-grained named-op approach.
- It has a pretty clear spec for most of the ops (with a bit of mental
  translation and hoping that MHLO is the same as HLO):
  https://www.tensorflow.org/xla/operation_semantics
- It functionally supports dynamic shapes (though not as coherently and
  consistently as Linalg-on-Tensors, and the dynamic shape support falls
  outside the wonderful HLO docs above).
- It appears to be pretty tied to HLO (which is highly mature), so most of the
  op surface area doesn't change too much.
- It has a different set of principles than TOSA, which tend to make it more
  expressive at the cost of a larger abstraction gap from hardware. For
  example, TOSA limits (for highly considered reasons) the number of dimensions
  that certain operators can handle to 1D-4D, when from a purely algebraic
  perspective there isn't a good reason to not be more general. Similarly, more
  general forms of reduction and scatter also fall into MHLO nicely while
  TOSA's principles tend to bias it away from that.

### Backend Implementation

All the backends are implemented using the MLIR [Dialect Conversion
infrastructure](https://mlir.llvm.org/docs/DialectConversion/). This involves
converting the `torch` dialect types to other types, so we closely follow the
principles from the "Type Conversions the Not-So-Hard Way" talk
([slides](https://drive.google.com/file/d/1FVbzCXxZzS9LBLuvpPNLWJD-XDkt54ky/view?usp=sharing),
[recording](https://drive.google.com/file/d/1VfVajitgf8ZPnd-HRkJvaJiFLhBsluXN/view?usp=sharing)).
We follow the standard `{include,lib}/Conversion/TorchTo*` convention used in
MLIR for conversion passes.

For type conversion, we provide
[BackendTypeConversion.cpp](https://github.com/llvm/torch-mlir/blob/57681f794764a34c34e2be7f07f7dfbcafa683c1/lib/Dialect/TorchConversion/Transforms/BackendTypeConversion.cpp#L1)
and
[BackendTypeConversionPasses.cpp](https://github.com/llvm/torch-mlir/blob/57681f794764a34c34e2be7f07f7dfbcafa683c1/lib/Dialect/TorchConversion/Transforms/BackendTypeConversionPasses.cpp#L1)
which provide a default conversion from `torch` dialect types to the builtin
`tensor` type and scalar integer/float types. These are not the right choice for
all backends, but can be copied and adapted by backends. These files closely
follow the "Type Conversions the Not-So-Hard Way" talk.
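
Concretely, the default conversion works by inserting materializations between
the two type systems at conversion boundaries, along these lines (a sketch; see
the files above for the actual op names and patterns):

```mlir
// Cross from the `torch` type system into builtin types and back.
%builtin = torch_c.to_builtin_tensor %torch_value : !torch.vtensor<[2,3],f32> -> tensor<2x3xf32>
%torch_value_again = torch_c.from_builtin_tensor %builtin : tensor<2x3xf32> -> !torch.vtensor<[2,3],f32>
```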

## Testing

See
[development.md](https://github.com/llvm/torch-mlir/blob/9c8b96272057f4f8210de5842b6952228434cfa2/development.md#testing)
for more details on running tests.

Torch-MLIR has two types of tests:

1. End-to-end execution tests. These compile and run a program and check the
   result against the expected output from execution on native Torch. These use
   a homegrown testing framework (see
   [framework.py](https://github.com/llvm/torch-mlir/blob/7d4a0d0e2b65c7ce8de19993f3b10ad5344fe32b/python/torch_mlir_e2e_test/torchscript/framework.py#L6))
   and the test suite lives at `python/torch_mlir_e2e_test/test_suite`.

2. Compiler and Python API unit tests. These use LLVM's `lit` testing
   framework. For example, these might involve using `torch-mlir-opt` to run a
   pass and check the output with `FileCheck` (see the sketch below). `lit` is
   flexible enough to unit test various Python pieces, importers, and LTC this
   way as well.
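
A unit test of this kind might look roughly as follows (a hypothetical test,
assuming `torch.aten.mul.int` has a constant folder):

```mlir
// RUN: torch-mlir-opt %s -canonicalize | FileCheck %s

// CHECK-LABEL: func.func @fold_mul
// CHECK: %[[INT6:.*]] = torch.constant.int 6
// CHECK: return %[[INT6]]
func.func @fold_mul() -> !torch.int {
  %int2 = torch.constant.int 2
  %int3 = torch.constant.int 3
  %0 = torch.aten.mul.int %int2, %int3 : !torch.int, !torch.int -> !torch.int
  return %0 : !torch.int
}
```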

### Why so much end-to-end testing?

Torch-MLIR places a heavy emphasis on end-to-end testing for the following
reasons:

Reason 1: Even if a compiler pass produces the output IR that the author
expected, that output IR may not correctly implement the semantics of the op.
This is especially true for the complex, often poorly specified deep learning
operators that Torch-MLIR is mainly concerned with. It is critical to run these
against the source of truth to ensure a correct implementation.

Reason 2: There are many patterns in Torch-MLIR's backends that really just
expand one op into other ops without any real logic. When we started Torch-MLIR,
we were very religious about always having `.mlir` unit tests even for these
"macro expansion" patterns, but we found that these tests 1) never caught a bug
and 2) interfered with refactoring / caused spurious extra work (changing op
syntax, etc.). There is not much point to having a bunch of tests like this,
which are basically just rewriting the builder calls in a different syntax:

```
// MyPass.cpp
b.create<FooOp>(...)
b.create<BarOp>(...)

// test.mlir
// CHECK: foo
// CHECK: bar
```

Such a test is simply checking that the implementation of an op is the way it
is. There is no way to change the implementation while keeping the test
passing, so the test is fully redundant with the implementation.

Because of this, many Torch-MLIR patches adding support for new ops have no
`.mlir` unit tests, and only include end-to-end test(s). We generally make sure
that our end-to-end tests are as targeted as possible. As a result, when
debugging end-to-end test failures, the resulting reproducers (which our test
framework automatically produces for failures) are usually already fully
reduced test cases.

### Do's and Don'ts for unit vs. end-to-end testing

DO use an end-to-end test if you are implementing a new op or extending the
support for an existing op.

DO use a unit test if your lowering for an op has multiple cases / logic. This
also helps future maintainers of the lowering to see in one place all the
different edge cases of the op that you had to handle. (These can easily be
reduced out of the end-to-end tests you added.)

DON'T use a unit test if your lowering pattern could be described as a trivial
"macro expansion" of one op into another op or set of ops. That is, if you feel
like your unit test is just rewriting `b.create<...>(...)` into `CHECK: ...`,
then it is probably not a useful unit test.

DON'T add a unit test for trivial changes to RefineTypes.

With the exceptions above, all changes should include appropriate unit tests, as
is standard in the LLVM and MLIR community. This includes full coverage of all
canonicalizations, pretty printing, passes, errors, and diagnostics.

### The RefBackend (Reference Backend)

In order to run end-to-end tests, Torch-MLIR needs an end-to-end flow.
Thankfully, upstream MLIR has just enough pieces to precariously put one
together that is sufficient for testing.

The RefBackend consists of a few minor
[C++ passes](https://github.com/llvm/torch-mlir/blob/114f48e96c578ee76a6f83b3aa4aa229a8d5b76e/include/torch-mlir/RefBackend/Passes.td#L1)
filling in some corners missing upstream and
[Python glue logic](https://github.com/llvm/torch-mlir/blob/114f48e96c578ee76a6f83b3aa4aa229a8d5b76e/python/torch_mlir_e2e_test/linalg_on_tensors_backends/refbackend.py#L1)
to pull together upstream functionality into a working system.

The RefBackend accepts Linalg-on-Tensors as input. It mainly just bufferizes the
ops and lowers them to loops. Note that TOSA and MHLO support lowering to
Linalg-on-Tensors, so all our end-to-end testing bottoms out on the RefBackend.

The RefBackend is absolutely not suitable for any production use case. It leaks
memory, doesn't support any error handling, performs no optimizations, and
probably has a bunch of other horrible characteristics. We are patiently waiting
for the upstream MLIR community to produce a viable end-to-end flow with better
characteristics.