A few remain in examples/docs that will be naturally be updated in due
time.
This regresses the list support and the general direction of more widely
supported control flow, lists/dicts/globals that we were going for with
the TorchScript path. The idea is that we are deferring that work to
make torch-mlir a very clean standalone thing. We will reboot it,
probably using some of the tools of iree_pydm to make it simpler, and in
a more natural place (such as an iree-torch repo that depends on IREE and
torch-mlir to build a working PyTorch frontend solution for IREE -- it
was really weird that npcomp depended on IREE).
This plumbs through a vertical slice of support for lists.
The main chunk of new code here is AnnotateABIPass which captures the
program signature at the Torch backend contract layer, right before we
start `TorchConversion`. The `TorchConversion` lowering process is lossy
w.r.t. types, so it's necessary to do this for all targets in general.
Like using `!iree.list` directly, we use IREE's ABI annotation
representation for this, although there is nothing very IREE-specific
about it (see
https://github.com/google/iree/blob/main/docs/developers/design_docs/function_abi.md)
We change `ListLiteralModule_basic` to use `!torch.int` because IREE
doesn't support f64 yet (and we don't yet have a way for users to say
that they want `!torch.float` to lower as f32).
Recommended review order:
- AnnotateABIPass and tests
- Arg marshaling in npcomp_backend.py and `iree.py`
- Updates to `list_programs.py` / `xfail_sets.py`
- Moving DeleteDeadIREEListsPass to Backend/Common, so that backends
that don't support lists can use it. RefBackend uses that pass, for
example.
This converts a basic list op (torch.prim.ListConstruct) to the IREE
dialect.
```
def forward(self, x: float):
return [x, x]
```
turns into:
```
builtin.func @forward(%arg0: !torch.float) -> !torch.list<!torch.float> {
%0 = torch.prim.ListConstruct %arg0, %arg0 : (!torch.float, !torch.float) -> !torch.list<!torch.float>
return %0 : !torch.list<!torch.float>
}
```
which turns into:
```
builtin.func @forward(%arg0: f64) -> !iree.list<f64> {
%c1 = constant 1 : index
%c0 = constant 0 : index
%c2 = constant 2 : index
%0 = iree.list.create %c2 : !iree.list<f64>
iree.list.set %0[%c0], %arg0 : !iree.list<f64>, f64
iree.list.set %0[%c1], %arg0 : !iree.list<f64>, f64
return %0 : !iree.list<f64>
}
```
As part of doing this, I realized that it was time to formalize the IR
form that we reach right before running TorchTo{Linalg,Std,...}. We now
call it the "Torch backend contract". We then lower the "Torch backend
contract" to the "npcomp backend contract", which involves the new
TorchConversion (`torch_c`) dialect, which holds ops that need to
operate on both the npcomp backend types (e.g. builtin tensors, i1, IREE
list, etc.) and the `!torch` types.
This made more sense, as I realized that if I didn't factor out
`torch_c` then the Torch dialect would have a dependency on IREE
dialect (we previously didn't notice this was an issue because we only
depended on `builtin` types), which seemed wrong to me.
Recommended review order:
- TorchToIREE.cpp / `TorchToIREE/basic.mlir`
- Look at the new structure of createTorchScriptToNpcompBackendPipeline.
It now lives in TorchConversion/Transforms/Passes.cpp and cleanly
calls into `Torch::createTorchScriptToTorchBackendPipeline` for the
frontend lowering to the Torch backend contract.
- Mechanical change extracting
`torch_c.{to,from}_{i1,i64,f64,builtin_tensor,iree_list}` into a new
TorchConversion dialect, and a few passes specific to the lowering
from the Torch backend contract to the npcomp backend contract.
- Minor fixes to TorchToLinalg.cpp to use unconverted operands (now that
we convert lists as part of operand materialization, we need to use
the original operands). Also added test for AtenMaxPool2dOp and fixed
m_TorchConstantIntList.
- TmpDeleteDeadIREELists pass. Temporary pass for deleting dead IREE lists that
are created as part of operand materialization for conv/max pool/avg pool ops
in TorchToLinalg.
- Build adjustments for `.cpp.inc` dialect files.
- Renaming of `memref.dim` to `tensor.dim` for tensor case.
Minor changes:
- Renaming of `mlir::linalg::ReassociationIndices` to
`mlir::ReassociationIndices`.
- Adjust command line option parsing in npcomp-run-mlir.
This now gives [much nicer output](https://gist.github.com/silvasean/f048e0f37b04542dae6469b86802bb3e).
Embarrassingly, we previously couldn't even report failures for two
different tests, and weren't able to report on compilation failures
(besides just crashing).
- Move frontend lowering pipelines to c++ (this helps with reproducing
failures in npcomp-opt)
- Add debugging printouts when compilation fails on RefBackendTestConfig
The experience now when a test fails during MLIR lowering is now like this:
```
NPCOMP TorchScript Object Graph IR -> NPCOMP Backend IR lowering failed with the following diagnostics:
failed to legalize operation 'torch.global_slot'
Module does not conform to npcomp's backend contract. See dialect conversion legality information above.
Error can be reproduced with:
$ npcomp-opt -torchscript-to-npcomp-backend-pipeline /tmp/ResNet18Module.mlir
```
And when TorchScript->MLIR import fails it looks like this:
```
PyTorch TorchScript module -> NPCOMP Object Graph IR import failed with the following diagnostics:
unhandled prim operation: %18 : int = prim::min(%17) # /usr/local/google/home/silvasean/.local/lib/python3.9/site-packages/torch/nn/functional.py:4532:4
```
Also,
- Add `--filter=<regex>` to e2e test harness to filter tests.
- Add a few prim ops that were needed to import ResNet18
- Fix torch.prim.Loop.condition assemblyFormat (it previously would not
round-trip in the case of no loop-carried variables)
This pass verifies that a given module satisfies the contract that we
have for backends. This is phrased as an "allowlist", because we want to
keep this interface tight. Also, this gives much better diagnostics than
a backend randomly crashing or failing to compile would (though they
could still be improved).
This was especially painful because if we had
`tensor<?x!numpy.any_dtype>` slip through, at some point RefBackend
would convert it to a memref type and trip the "verify type invariants"
assertion which gives no location or anything and crashed the process,
which was very unpleasant.
We implement this with the dialect conversion framework, which works
reasonably well and was quick to put together and familiar, but is still
very "op oriented". We probably want to make this hand-rolled
eventually, especially the error reporting (the most useful kind of
error for a dialect conversion user is not necessarily the best for this
use case). Also, in production, these error will go to users, and need
to be surfaced carefully such as "the compiler needs a type annotation
on this function parameter" which in general requires some special
analysis, wordsmithing, and overall awareness of the e2e use case (such
as how much we can lean into certain source locations) to provide a
meaningful user-level diagnostic.
Also, add `inline` to the current frontend lowering pass pipeline to
allow slightly more complicated programs that otherwise would fail on
shape inference.