This converts a basic list op (torch.prim.ListConstruct) to the IREE
dialect.
```
def forward(self, x: float):
return [x, x]
```
turns into:
```
builtin.func @forward(%arg0: !torch.float) -> !torch.list<!torch.float> {
%0 = torch.prim.ListConstruct %arg0, %arg0 : (!torch.float, !torch.float) -> !torch.list<!torch.float>
return %0 : !torch.list<!torch.float>
}
```
which turns into:
```
builtin.func @forward(%arg0: f64) -> !iree.list<f64> {
%c1 = constant 1 : index
%c0 = constant 0 : index
%c2 = constant 2 : index
%0 = iree.list.create %c2 : !iree.list<f64>
iree.list.set %0[%c0], %arg0 : !iree.list<f64>, f64
iree.list.set %0[%c1], %arg0 : !iree.list<f64>, f64
return %0 : !iree.list<f64>
}
```
As part of doing this, I realized that it was time to formalize the IR
form that we reach right before running TorchTo{Linalg,Std,...}. We now
call it the "Torch backend contract". We then lower the "Torch backend
contract" to the "npcomp backend contract", which involves the new
TorchConversion (`torch_c`) dialect, which holds ops that need to
operate on both the npcomp backend types (e.g. builtin tensors, i1, IREE
list, etc.) and the `!torch` types.
This made more sense, as I realized that if I didn't factor out
`torch_c` then the Torch dialect would have a dependency on IREE
dialect (we previously didn't notice this was an issue because we only
depended on `builtin` types), which seemed wrong to me.
Recommended review order:
- TorchToIREE.cpp / `TorchToIREE/basic.mlir`
- Look at the new structure of createTorchScriptToNpcompBackendPipeline.
It now lives in TorchConversion/Transforms/Passes.cpp and cleanly
calls into `Torch::createTorchScriptToTorchBackendPipeline` for the
frontend lowering to the Torch backend contract.
- Mechanical change extracting
`torch_c.{to,from}_{i1,i64,f64,builtin_tensor,iree_list}` into a new
TorchConversion dialect, and a few passes specific to the lowering
from the Torch backend contract to the npcomp backend contract.
- Minor fixes to TorchToLinalg.cpp to use unconverted operands (now that
we convert lists as part of operand materialization, we need to use
the original operands). Also added test for AtenMaxPool2dOp and fixed
m_TorchConstantIntList.
- TmpDeleteDeadIREELists pass. Temporary pass for deleting dead IREE lists that
are created as part of operand materialization for conv/max pool/avg pool ops
in TorchToLinalg.
With the following changes the compilation can continue until
RefineTypes pass:
- Add operators without ODS into `torch_ods_gen.py`
- Add some new optional and list types in `TorchTypes.td`
- Add some folders for aten int type comparator ops
- Modify GlobalizeObjectGraph.cpp. For global slots that's not used,
dont check if an aliased value is stored in more than one of global
slots. This can work around a failure where the same tensor is stored
in multiple "version" slots which are not used.
We plan on using these dialects "natively" as part of the npcomp backend
contract, and provide feedback to evolve them in IREE. Roughly speaking,
we can consider these dialects as "what's missing from upstream that we
think belongs in the general abstraction layer that npcomp's backend
contract targets".
We integrate them by just copying the relevant directory from the IREE
source tree (with `build_tools/update_iree_dialects.sh`). This avoids
adding IREE as a submodule, which is way too heavyweight (including
IREE itself, another copy of LLVM, TensorFlow, ...) and would give the
false impression of a source dependency rather than the lightweight (and
eventually versioned/stabilized) IR-level compatibility that we strive
for.
Most of the change is in the reporting code to give error messages that
are useful, and adjusting TraceItem to be semantically correct w.r.t.
Python's modeling of return values.
This allows writing a test like `ListLiteralModule_basic` for list
functionality, which we will soon be hooking up to IREE.
The IR for that test currently gets this far:
```
builtin.func @forward(%arg0: f64) -> !torch.list<!torch.float> {
%0 = torch.from_f64 %arg0
%1 = torch.prim.ListConstruct %0, %0 : (!torch.float, !torch.float) -> !torch.list<!torch.float>
return %1 : !torch.list<!torch.float>
}
```
It should be sufficient to just add a conversion of
`torch.prim.ListConstruct` (+ relevant type conversion) to necessary
IREE primitives.
For lists of *tensors* (rather than scalar floats), it gets more
complicated, as we need to deal with changing their element type to
ValueTensorType first (by default, they will all be NonValueTensorType).
It seems that IREE might have a type we can lower into for non-value
tensors as well, TBD.
This includes the following changes to import MT model into MLIR. There
are still a lot of work to for actual compilation.
- Add `torch.dict<>`, `torch.any`, `torch.number` types
- Add `torch.prim.DictConstruct` op
- Fix `torch.prim.TupleConstruct` op assembly format to include resulting types
This takes the example from torchscript_resnet18_e2e.py and puts it into
a slightly cleaned up notebook form.
It's still a little rough around the edges. Areas for improvement:
- Installation / setup.
- API usability.
Also,
- Add `npcomp-backend-to-iree-frontend-pipeline` since we will be adding
more stuff there.
- Slight cleanups.
To use, do `ninja npcomp-lsp-server`, copy `build/bin/npcomp-lsp-server`
into your PATH somewhere, and then add
```
"mlir.server_path": "npcomp-lsp-server",
```
to your settings.json.
Also bump llvm-project to 2d9759c7902c5cbc9a7e3ab623321d5578d51687 to
bring in latest `mlir-lsp-server` changes.
- torch.aten.flatten.using_ints to linalg lowering
- torch.aten.max_pool2d to linalg lowering
- Support torch.aten.conv2d for more flexible dilation and strides values
The tests use the same (pure-Python) test framework as the
normal torchscript_e2e_test.sh, but the tests are added in
`build_tools/torchscript_e2e_heavydep_tests` instead of
`frontends/pytorch/e2e_testing/torchscript`. Any needed dependencies can
easily be configured in generate_serialized_tests.sh.
We add an initial machine translation model with a complex set of
dependencies to seed the curriculum there. I verified that this model
gets to the point of MLIR import (it fails there with a segfault due to
not being able to import the "Any" type).
This required moving a few files from the `torch_mlir` Python module
into multiple modules to isolate the code that depends on our C++
extensions (which now live in `torch_mlir` and
`torch_mlir_torchscript_e2e_test_configs`) from the pure Python code
(which now lives in `torch_mlir_torchscript`). This is an entirely
mechanical change, and lots of imports needed to be updated.
The dependency graph is:
```
torch_mlir_torchscript_e2e_test_configs
/ |
/ |
/ |
V V
torch_mlir_torchscript torch_mlir
```
The `torch_mlir_torchscript_e2e_test_configs` are then dependency-injected
into the `torch_mlir_torchscript` modules to successfully assemble a
working test harness (the code was already structured this way, but this
new file organization allows the isolation from C++ code to actually
happen). This isolation is critical to allowing the serialized programs
to be transported across PyTorch versions and for the test harness to be
used seamlessly to generate the heavydep tests.
Also:
- Extend `_Tracer` class to support nested property (submodule) accesses.
Recommended review order:
- "user-level" docs in README.md
- code in `build_tools/torchscript_e2e_heavydep_tests`.
- changes in `torch_mlir_torchscript/e2e_test/framework.py`
- misc mechanical changes.
These were legacy concepts that are now superceded by direct Torch to
linalg-on-tensors lowering. These were based on some very early thinking
related to the layering of frontends vs codegen, which is now obsolete
because:
- We expected a lot more centralization at the frontend (TCF) level. It
turns out that frontend needs really vary a lot, and there is no grand
unifying TCF dialect plausible. The additional layer isn't worth it.
- Linalg-on-tensors obsoletes the primary need for TCP. There are still
a few things not representable with linalg-on-tensors, but the support
is growing and the whole "not included in linalg-on-tensors" direction
needs to be rethought. Our TCP dialect didn't cover any of the
actually important things in this space (such as sort, FFT, top-k,
etc.).
See historical [slides](https://drive.google.com/file/d/1iljcpTQ5NPaMfGpoPDFml1XkYxjK_6A4/view) / [recording](https://drive.google.com/file/d/1jSPa8TwPKUt0WuLquGc8OgSUVYJHMvWZ/view)
for more details on the origin story here.
Their presence was confusing users too
[bug](https://github.com/llvm/mlir-npcomp/issues/248).
Also,
- Trim down npcomp-run-mlir testing. It was testing TCF to TCP
lowering for the most part. The essential stuff is retained and
rephrased with linalg-on-tensors. (we should probably rename it
"refback-run" or something, as it is just a way to invoke RefBackend)
- test/Python/Backend/RefJIT/simple_invoke_numpy.py is XFAIL'ed. Our
"anti-framework" direction seems to be the likely future path.
`Conv2dNoPaddingModule_basic` and `Conv2dWithPaddingModule_basic` start
failing because of results accuracy after changing conv_2d linalg ops
from tc ops to yaml ops.
* Adds a minimal setup.py for frontends/pytorch
* Makes npcomp-core export its headers and libraries
* Adds a script to build packages.
* Adds CI step to package and smoke test.
* Will need some more tweaks and coordination prior to deploying (version locking etc).
* Change aligned_alloc -> malloc. It can fail (and does on MacOS) and is a bit over-aggressive optimization for a reference backend.
* Fixed a fragile test that prints -0.0 on MacOS.
* Fail the test (not the framework) on failure to trace (Torch on MacOS is missing features).
* Fix .so -> .dylib for compiler runtime.
* Added additional *ToLLVM conversion patterns (they were disaggregated from standard).
* Misc renames.
* Spelling change on ConvNCHW op, and it now expects strides and dilations attributes.
- Build adjustments for `.cpp.inc` dialect files.
- Renaming of `memref.dim` to `tensor.dim` for tensor case.
Minor changes:
- Renaming of `mlir::linalg::ReassociationIndices` to
`mlir::ReassociationIndices`.
- Adjust command line option parsing in npcomp-run-mlir.
This includes IREE and RefBackend.
This includes a fixup to torchscript_e2e_test.sh for handling the
situation where PYTHONPATH was not already exported.
I'm seeing the following error:
```
CMake Error in frontends/pytorch/csrc/CMakeLists.txt:
Imported target "torch" includes non-existent path
"/usr/local/include/breakpad"
in its INTERFACE_INCLUDE_DIRECTORIES.
```
Reported upstream in: https://github.com/pytorch/pytorch/issues/60485
- Add support for "expected failures" in test reporting. The new error
reports look like
[this](https://gist.github.com/silvasean/6ffd95e1d55302b699673da201da210d).
- We will now be able to put these tests into CI, since the harness
understand which tests are expected to pass and fail.
- Refactor RefBackendTestConfig to NpcompBackendTestConfig which
supports both RefBackend and IREE.
- Add instructions for installing IREE dependencies (both from packages
and for local builds of IREE)
- Add `tools/torchscript_e2e_test.sh` for invoking the e2e test
harness (this makes invoking a bit easier, as it doesn't rely on a
loose Python invocation).
We plumb through e2e a fair number of interesting cases:
- unary, binary, ternary elementwise ops
- ops like `torch.aten.add.Tensor` that also take a scalar parameter
- static size-1 broadcasting
We allow the static size-1 broadcasting case, but emit a runtime error
in the case of dynamic size-1 broadcasting. This seems like a sweet spot
subset of things that can be lowered directly to linalg, while not being
overly constraining to users. This is consistent with what IREE is doing
for CHLO->Linalg lowering as well
([code](50bf7a87e4/iree/compiler/InputConversion/MHLO/BroadcastingToLinalgPatterns.cpp (L1))).
To test the static size-1 case, we added support for the
`torch.aten.unsqueeze` op and lowering for it through
`linalg.tensor_expand_shape`. This involved a generalization of
`MaximizeValueSemantics` able to handle it (the solution there also
works for `torch.aten.flatten.using_ints` which we need for ResNet
anyway)
Also, a few minor additional changes:
- Add `VerifyInvariantsBeforeBackendLowering` pass, which catches a
large class of errors before we get to backend lowering (now that we
are doing dialect conversion, the errors are way nicer if we just emit
them up front rather than in the guts of a random pattern).
- Minor change to RefBackend to allow `linalg.tensor_expand_shape`.
Recommended review order:
- e2e tests in elementwise.py
- `ConvertElementwiseOp` in TorchToLinalg.cpp + elementwise.mlir test
- `ConvertAtenUnsqueezeOp` in TorchToLinalg.cpp + unsqueeze.mlir test
- RefineTypes.cpp + tests
- MaximizeValueSemantics changes + test
- VerifyInvariantsBeforeBackendLowering pass + test
This adds a pattern to MaximizeValueSemantics which does a simple
abstract interpretation within a block, which handles simple cases of
`torch.overwrite_tensor`, enough to remove all the unnecessary uses of
non-value tensors in ResNet right now.
Before/after IR:
[gist](https://gist.github.com/silvasean/a3e1ef625b19dfc63579f73cd3b543b6)
Also,
- Split `torch.copy.tensor` into `torch.copy.to_tensor` and
`torch.copy.to_vtensor` which convert between value and non-value
semantic tensors. This is a much cleaner factorization as they have
very separate use cases and properties (e.g. different side effects)
- Remove the various canonicalization patterns they had, which were
confusing because they resulted in limited forms of maximizing value
semantics throughout the pipeline. We should structure our compilation
pipeline such that only MaximizeValueSemantics should be maximizing
value semantics.
- Adjust pass pipeline to only run MaximizeValueSemantics once.
- Make OverwriteTensorOp `$value` always be a value tensor and
`$overwritten` be a non-value tensor.
1. Added a simplified version of torch.aten.batch_norm which only handles
inference and assumes the weight, bias, running_mean, running_var are not
None.
2. Removed the primitive types check in verifyLinalgCompatibleTypes check
since now we have proper type converter to handle torch types conversion.
The checks for RankedTensorType is kept because the type converter
doesn't guarantee the converted builtin tensor type is ranked. A
separate verification pass to verify the invariant expected by later
passes will need to be added before those can be removed as well.