As [@ezyang suggested](https://github.com/pytorch/pytorch/issues/90276#issuecomment-1339791275),
use `torch._dynamo.optimizations.training.aot_autograd` instead of raw
`make_fx`. This is more future proof and gives us the backward pass and
functionalization. We don't currently get functionalization because of
https://github.com/pytorch/pytorch/issues/90759
This also incidentally fixes the source location handling, which makes
`lockstep_basic.py` give an accurate source location!
This was an experimental attempt at rolling out own op-by-op executor
with `__torch_dispatch__`, but it proved difficult to make it robust.
Op-by-op execution is very easy to implement robustly now with the
PyTorch 2.0 stack, so we don't need eager_mode.
Downstream users were using eager_mode to implement lockstep numerical
accuracy debuggers. We implemented the same functionality with
TorchDynamo in https://github.com/llvm/torch-mlir/pull/1681 so now there
is not much reason to continue maintaining it.
Thanks to TorchDynamo's great layering and design, this is only about
100 lines of code for a basic lockstep debugger.
This should allow us to deprecate eager_mode, since AFAIK the only
interesting use case that it was really supporting is for downstream users to
write lockstep debuggers.
NOTE: The exact reporting and interface here is subject to change. Please
try it out and provide feedback (or patches :) ).
- make_fx should not drop source locations: https://github.com/pytorch/pytorch/issues/90276
- Report tensors better (huge tensors should be summarized)
- Maybe don't abort, but just warn?
- Allow customizing atol/rtol.
- How best to print the failing node? And include surrounding graph
context?
For AoT deployments models often have multiple exported methods.
This patch enables something like this:
```
class TwoMethodsModule(torch.nn.Module):
def sin(self, x):
return torch.ops.aten.sin(x)
def cos(self, x):
return torch.ops.aten.cos(x)
example_args = torch_mlir.ExampleArgs()
example_args.add_method("sin", torch.ones(2, 3))
example_args.add_method("cos", torch.ones(2, 4))
print(torch_mlir.compile(TwoMethodsModule(), example_args))
```
In the
[long-term](https://github.com/llvm/torch-mlir/blob/main/docs/long_term_roadmap.md#tools-for-advanced-aot-deployments)
we will need to reconcile this with our story for stateful models and the
backend contract being purely functional. For now, this provides some basic
infra that seems harmless. Arguably, we could tighten up the backend contract
even more to only allow a single compiled function which would prohibit this or
require building out a layer above.
Fixes#1557
Allow customizing `backend_legal_ops` for "torch" output type, since we
don't know which backend will be used (it might be a custom backend).
We don't allow customizing the `backend_legal_ops` for the other output
types (Linalg, TOSA, MHLO) since those backends control their set of
legal ops directly.
Fixes#1418
* test: allow spaces in path to Python executable
On Windows, the path to the Python binary may contain spaces, so this
patch adds quotes around the path to the python executable.
Thanks to @sstamenova for suggesting the fix!
* python: remove header file that causes Windows build failures
Similar to https://reviews.llvm.org/D125284, we can safely remove this
header file without affecting the build on either Linux. It is
necessary to remove this header file on Windows builds since otherwise
it causes build errors.
* python: drop `TORCH_API` from function defined in Torch-MLIR
`TORCH_API` should apply to functions that are either exported by
libtorch.so or ones that are imported from libtorch.so by its downstream
consumers (like Torch-MLIR). Neither case applies to the
`importJitFunctionAsFuncOp()` function, since it is defined in
Torch-MLIR (and thus outside libtorch.so). This patch fixes the problem
by dropping `TORCH_API` from that function's declaration.
* python: make output of class anotations deterministic
The `class-annotator-repr.py` test checks for class annotations in a
specific order, but prior to this patch, the order was
non-deterministic, since the code iterated on an _unordered_ map.
This patch makes the iteration order deterministic through two changes:
1. using a sorted map
2. using the class qualified name instead of the address of the class in
memory
* test: use Python3_EXECUTABLE as interpreter path for consistency
This ensures that tests use the Python3 version that was detected using
CMake, instead of whichever python version that happens to be in the
PATH variable when invoking the test.
* test: fix RUN string
The parenthesis syntax does not run on Windows (the shell interprets the
`(` character as part of the path). Moreover, the ODR violation in the
comment no longer seems to apply.
* python: port parallel test framework to Windows
Since Windows does not support `fork` natively, Python's
`multiprocessing` module needs to use `spawn` on Windows. However, to
use `spawn`, the multiprocessing module serializes (or pickles) the
worker function and its arguments. Sadly, the multiprocessing module
(both the default one in Python and the one that is extended in PyTorch)
is unable to serialize lambda functions (see
https://stackoverflow.com/a/19985580) for detals.
Unfortunately, given how our tests are structured, we require that the
function under test is passed as an argument to another function, so we
cannot sidestep our use of lambda functions.
To resolve this problem, this patch makes use of the `multiprocess` and
`dill` Python modules, which together offers a multiprocessing mechanism
that can serialize lambda functions. The multiprocess module also
offers a process pool, which simplifies the code for our parallel
testing framework.
We use it for more than TorchScript testing now. This is a purely
mechanical change to adjust some file paths to remove "torchscript".
The most perceptible change here is that now e2e tests are run with
```
./tools/e2e_test.sh
instead of:
./tools/torchscript_e2e_test.sh
```
* Propagate device data names
* Address PR comment
* Add example usage
* Add test for device data names
* Make TorchMlirComputation fields protected
* Add lazy backend device data name unit tests
* Disable lazy backend tests if LTC is disabled
* Add comments
In some cases, users know that a traced graph is valid for a wider set
of shapes than they originally traced it with. Provide an option for
users to ignore the shapes in the traced graph when they know it is
legal.
Fixes#997
use_tracing=True was behaving unexpectedly because the handling of
single arguments was happening after the torch.jit.trace call.
This also fixes the check to specifically test for a torch.Tensor or
TensorPlaceholder so that both lists and tuples would be correctly
handled.
We do this by inroducing a TensorPlaceholder class, which can be used to
specify dynamic sizes. Internally, we canonicalize all example inputs
to TensorPlaceholder's.
This commit also adds some basic testing, which was missing before.
NB: `shouldnt_normalize2` and `shouldnt_normalize3` currently XPASS i.e., args *will* successfully normalize despite being incorrect due to an [upstream bug](https://github.com/pytorch/pytorch/issues/75342).
- Split out TOSA in the CI.
- Add summary of unexpected test outcomes. This works better when there
are many XFAIL'ing tests, as it only prints out the error_str on
FAIL, not on XFAIL. Example here:
https://gist.github.com/silvasean/c7886ec7b3d35c21563cb09f7c3407da
This commit (with approval from all contributors) dual licenses
the torch-mlir project under both the standard LLVM license and the
standard PyTorch license. This will facilitate moving code between
torch-mlir and the two upstream projects.
The standard file comment is now:
```
// This file is licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// Also available under a BSD-style license. See LICENSE.
```
See `LICENSE` in the project root for the terms of both licenses.
This leaves no real code outside torch-mlir.
This also renames the "npcomp backend contract" to "linalg on tensors
backend contract" as the name of the abstraction layer that RefBackend
(IREE too) accepts.
Fixes https://github.com/llvm/mlir-npcomp/issues/311
The key change is that TorchPlugin is folded into
`torch_mlir.dialects.torch.importer.jit_ir` (it imports the PyTorch
JIT's IR, so that's a good, scoped name for it).
The CMake option `-DTORCH_MLIR_ENABLE_JIT_IR_IMPORTER=OFF` disables it,
which allows building without a PyTorch native dependency.
After this change, there are now just two subdirectories in the
`python_packages` directory in our combined build:
- `npcomp_core` with all the npcomp stuff
- `torch_mlir` with all the `torch-mlir` stuff.
The combined `torch_mlir` build will be packaged for use by `pip`.
There isn't anything super useful for wider use in `npcomp_core` so for
now we aren't going to package that one.
It just contained the e2e testing framework. We now fold it into the
main project to reduce complexity.
- `frontends/pytorch/python/` -> `python/torch_support`
- `frontends/pytorch/e2e_testing -> e2e_testing`
- `frontends/pytorch/examples -> examples`
- `frontends/pytorch/test` -> `python/test`
- `torch_mlir_torchscript` python module -> `npcomp_torchscript`
- `torch_mlir_torchscript_e2e_test_configs` python module ->
`npcomp_torchscript_e2e_test_configs`
This also changes the license of a handful of files from the
"pytorch-style" license to the regular LLVM/npcomp license. The only
people who committed to those files were myself and Yi.