This commit adds decomposition support into the core aten operators
before importing the module from torch.
Also, this commit deals with the lifted tensor constants in
torch.export.export(). We don't want to add unnecessary placeholder
nodes in the graph (extra args in the block module), and should treat
them like the constants that they are. The unnecessary clone is also
removed for max efficiency.
Link to related RFC:
https://discourse.llvm.org/t/rfc-rename-torch-mlir-compile-apis-and-introduce-fx-based-analogs/76646
This commit updates the documentation, tests, CMake files, and API for
the proposed changes in the RFC. There is a new torch_mlir/fx.py for
user level APIs related to importing modules and a corresponding test
for this path can be found at test/python/fx_importer/basic_test.py.
---------
Co-authored-by: MaheshRavishankar <mravisha@amd.com>
Per the RFC and numerous conversations on Discord, this rebuilds the
torch-mlir CI and discontinues the infra and coupling to the binary
releases
(https://discourse.llvm.org/t/rfc-discontinuing-pytorch-1-binary-releases/76371).
I iterated on this to get latency back to about what it was with the old
(much larger and non-ephemeral) runners: About 4m - 4.5m for an
incremental change.
Behind the scenes changes:
* Uses a new runner pool operated by AMD. It is currently set to manual
scaling and has two runners (32-core, 64GiB RAM) while we get some
traction. We can either fiddle with some auto-scaling or use a schedule
to give it an increase during certain high traffic hours.
* Builds are now completely isolated and cannot have run-to-run
interference like we were getting before (i.e. lock file/permissions
stuff).
* The GHA runner is installed directly into a manylinux 2.28 container
with upgraded dev tools. This eliminates the need to do sub-invocations
of docker on Linux in order to run on the same OS that is used to build
wheels.
* While not using it now, this setup was cloned from another project
that posts the built artifacts to the job and fans out testing. Might be
useful here later.
* Uses a special git cache that lets us have ephemeral runners and still
check out the repo and deps (incl. llvm) in ~13s.
* Running in an Azure VM Scale Set.
In-repo changes:
* Disables (but does not yet delete):
* Old buildAndTest.yml jobs
* releaseSnapshotPackage.yml
* Adds a new `ci.yml` pipeline and scripts the steps in `build_tools/ci`
(by decomposing the existing `build_linux_packages.sh` for in-tree
builds and modularizing it a bit better).
* Test framework changes:
* Adds a `TORCH_MLIR_TEST_CONCURRENCY` env var that can be used to bound
the multiprocess concurrency. Ended up not using this in the final
version but is useful to have as a knob.
* Changes the default concurrency to `nproc * 0.8 + 1` vs `nproc * 1.1`.
We're running on systems with significantly less virtual memory and I
did a bit of fiddling to find a good tradeoff.
* Changed multiprocess mode to spawn instead of fork. Otherwise, I was
getting instability (as discussed on discord).
* Added MLIR configuration to disable multithreaded contexts globally
for the project. Constantly spawning `nproc * nproc` threads (more than
that actually) was OOM'ing.
* Added a test timeout of 5 minutes. If a multiprocess worker crashes,
the framework can get wedged indefinitely (and then will just be reaped
after multiple hours). We should fix this, but this at least keeps the
CI pool from wedging with stuck jobs.
Functional changes needing followup:
* No matter what I did, I couldn't get the LTC tests to work, and I'm
not 100% sure they were being run in the old setup as the scripts were a
bit twisty. I disabled them and left a comment.
* Dropped out-of-tree build variants. These were not providing much
signal and increase CI needs by 50%.
* Dropped MacOS and Windows builds. Now that we are "just a library" and
not building releases, there is less pressure to test these commit by
commit. Further, since we bump torch-mlir to known good commits on these
platforms, it has been a long time since either of these jobs have
provided much signal (and they take ~an hour+ to run). We can add them
back later post-submit if ever needed.
Changes made during upstreaming:
* Removed comments attributing some copied code back to torch-mlir
(since it is now repatriated).
* Re-organized imports.
* Inlined RefMapping/RefTracker and TypeSubclassMap from an external
utility module.
* Added FxImporter class comments.
* Updated stack trace extraction to be fail safe.
* Added an entry-point for `import_frozen_exported_program` which uses
the shiny new upstream `torch.export.export()` API (versus the
lower-level/older API that Turbine is presently using). This
necessitated a small FX rewrite to line external state management up
with current conventions.
* Adapted one of Turbine's importer tests to go with this initial
submission. Turbine unfortunately has a lot of more-integration-ey
tests, and I would like to extract those as more of unit tests of the
importer features and upstream them that way vs trying to copy directly.
For now, one overall test with the initial submission gets us moving.
I acknowledge that there are some code quality things that could be
improved in this submission: this was authored over the course of many
months (and often via some trial and error). I would like to keep it
relatively converged with the downstream for the next few steps while
getting the test suite upstreamed. And then it will be easier to take a
hygienic pass through the code.
Including co-authors for contributors in the git log of the original
repository.
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
Co-authored-by: Avinash Sharma <aviator1994@gmail.com>
Co-authored-by: Arham Khan <arhammkhan@gmail.com>
Co-authored-by: brucekimrokcmu <kwangkyk@alumni.cmu.edu>
Co-authored-by: saienduri <77521230+saienduri@users.noreply.github.com>
Simple Python console script to import an ONNX protobuf to the torch
dialect for additional processing.
For installed wheels, this can be used with something like:
```
torch-mlir-import-onnx test/python/onnx_importer/LeakyReLU.onnx
```
Or from a dev setup:
```
python -m torch_mlir.tools.import_onnx ...
```
This is part 1 of 2, which will also include upstreaming the FX
importer. I started with ONNX because it forces some project layout
updates and is more self contained/easier as a first step.
Deviating somewhat from the RFCs on project layout, I made the following
decisions:
* Locating the `onnx_importer.py` into `torch_mlir.extras` as Maks
already has opened up that namespace and it seemed to fit. Better to
have fewer things at that level.
* Setup the build so that the root project only contains MLIR Python and
pure Python deps (like the importers), but this can be augmented with
the `projects/` adding more depending on which features are enabled.
* The default build continues to build everything whereas in
`TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS=1` mode, it builds a
`torch-mlir-core` wheel with the pure contents only.
`onnx_importer.py` and `importer_smoke_test.py` are almost verbatim
copies from SHARK-Turbine. I made some minor local alterations to adapt
to paths and generalize the way they interact with the outer project. I
expect I can copy these back to Turbine verbatim from here. I also
updated the license boilerplate (they have the same license but slightly
different project norms for the headers) but retained the correct
copyright.
Other updates:
* Added the ONNX importer unit test (which also can generate test data)
in lit, conditioned on the availability of the Python `onnx` package. In
a followup once I know everything is stable, I'll add another env var
that the CI can set to always enable this so we know conclusively if
tests pass.
* Moved the ONNX conversion readme to `docs/`.
* Renamed CMake option `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS` ->
`TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS` and inverted the sense. Made the
JitIR importer and LTC options `cmake_dependent_options` for robustness.
This is a first step towards the structure we discussed here:
https://gist.github.com/stellaraccident/931b068aaf7fa56f34069426740ebf20
There are two primary goals:
1. Separate the core project (C++ dialects and conversions) from the
hard PyTorch dependencies. We move all such things into projects/pt1 as
a starting point since they are presently entangled with PT1-era APIs.
Additional work can be done to disentangle components from that
(specifically LTC is identified as likely ultimately living in a
`projects/ltc`).
2. Create space for native PyTorch2 Dynamo-based infra to be upstreamed
without needing to co-exist with the original TorchScript path.
Very little changes in this path with respect to build layering or
options. These can be updated in a followup without commingling
directory structure changes.
This also takes steps toward a couple of other layering enhancements:
* Removes the llvm-external-projects/torch-mlir-dialects sub-project,
collapsing it into the main tree.
* Audits and fixes up the core C++ build to account for issues found
while moving things. This is just an opportunistic pass through but
roughly ~halves the number of build actions for the project from the
high 4000's to the low 2000's.
It deviates from the discussed plan by having a `projects/` tree instead
of `compat/`. As I was thinking about it, this will better accommodate
the follow-on code movement.
Once things are roughly in place and the CI passing, followups will
focus on more in-situ fixes and cleanups.
* [custom op] Generalize shape library logic to work with dtypes
This commit generalizes the shape library logic, so that dtype rules
for ops can also be expressed using the same mechanism. In other
words, each op can now have a shape function and a dtype function
specified in Python that is imported during lowering to calculate the
shapes and dtypes throught a program. For more information about how
to specify a dtype function, see the updated
`docs/adding_a_shape_and_dtype_function.md`.
For those not familiar with how the shape library works, the file
`docs/calculations_lib.md` provides an overview.
This was an experimental attempt at rolling out own op-by-op executor
with `__torch_dispatch__`, but it proved difficult to make it robust.
Op-by-op execution is very easy to implement robustly now with the
PyTorch 2.0 stack, so we don't need eager_mode.
Downstream users were using eager_mode to implement lockstep numerical
accuracy debuggers. We implemented the same functionality with
TorchDynamo in https://github.com/llvm/torch-mlir/pull/1681 so now there
is not much reason to continue maintaining it.
This adds a basic e2e Config for TorchDynamo using
Linalg-on-Tensors/RefBackend.
But TorchDynamo is pretty orthogonal to
various other pieces, so it should compose nicely with variations like:
- Switching out all the backends (Linalg-on-Tensors, TOSA, MHLO)
- PyTorch functionalization and decompositions
- Taking the example inputs and compiling with all dynamic or all static
shapes without duplicating tests.
This adds it to the CI, but there are still a lot of XFAIL's.
This also adds a helper `from torch_mlir.dynamo import
make_simple_dynamo_backend` which simplifies some of the steps for
making a Torch-MLIR-based TorchDynamo backend. We include "simple" in
the name because we are going to be exploring various things next from
the long-term roadmap.
The next steps are:
- Burn down all the XFAIL's.
- Start working on the pieces from the [long-term roadmap](https://github.com/llvm/torch-mlir/blob/main/docs/long_term_roadmap.md).
- Add functionalization/decompositions into the TorchDynamo flow and
remove reliance on the current Torch-MLIR "frontend".
- Write a pure-Python direct FX->MLIR importer.
- Hook up the new PyTorch symbolic shape stuff.
- Explore PrimTorch decompositions for simplifying backends.
* ci: cache PyTorch source builds
This patch reduces the time spent in regular CI builds by caching
PyTorch source builds. Specifically, this patch:
1. Makes CI lookup the cache entry for the PyTorch commit hash in
pytorch-version.txt
2. If lookup was successful, CI fetches the previously-generated WHL
file into the build_tools/python/wheelhouse directory
3. CI sets the `TM_PYTORCH_INSTALL_WITHOUT_REBUILD` variable to `true`
4. The build_libtorch.sh script then uses the downloaded WHL file
instead of rebuilding PyTorch
* ci: warm up PyTorch source cache during daily RollPyTorch action
This patch makes the RollPyTorch action write the updated WHL file to
the cache, so that it can be later retrieved by CI that runs for each
PR. We deliberately add the caching step to the end of the action since
the RollPyTorch action never needs to read from the cache, although
executing this step earlier in the process should not cause problems
either.
* mac m1 cross compile
Add support for M1 cross compile
* Remove redundant ExecutionEngine
It is registered as part of RegisterEverything
* nuke non-universal zstd
disable LTC
* Replace CHECK_EQ with TORCH_CHECK_EQ
* Check value of TORCH_MLIR_USE_INSTALLED_PYTORCH during LTC build
* Update LTC XFAIL with NewZerosModule ops
* Explicitly blacklist _like ops
* Automatically blacklist new_/_like ops
* Prune away unused Python dependencies from LTC
* Add flag to disable LTC
* Autogen dummy _REFERENCE_LAZY_BACKEND library when LTC is disabled
* Implement compute_shape_var
* Removed Var tests from XFAIL Set
* XFAIL tests using _local_scalar_dense or index.Tensor
* Add StdDim tests to XFAIL set
* Autogen aten::cat
* Changed Example MLIR backend to Reference MLIR backend
* Moved reference_ltc_backend into csrc
* Merged sys_utils.h
* Renamed reference_ltc_backend to reference_lazy_backend
* Addressed review comments
* Update docs with new library name
* Removed _REFERENCE_LAZY_BACKEND from .gitignore
* Added reference_lazy_backend to the TorchMLIRPythonModules dependency list
Fixed typo in `ltc_examples.md`
Missed instance where `ltc_backend` was used instead of `lazy_backend`.
This enables building Pytorch from source in the CI.
The build should mostly hit the ccache.
Release builds will follow once we have some runtime on the CI.
In the interest of merging upstream LLVM quickly, a previous patch
(7f08169) updated the torch-mlir build to register all dialects and
passes through Python bindings. This patch limits the dialects and
passes to only those that are used in torch-mlir.
Key to this change are the removal of
`MLIRPythonExtension.RegisterEverything` and the introduction of a new
Python module (`_mlir_libs/_site_initialize_0.py`), where we register
the dialects and passes used by torch-mlir.
This patch makes some rudimentary changes to torch-mlir's use of MLIR
Python bindings to work with the most recent LLVM code. We can perhaps
do better by being more selective in what we link against, instead of
using `MLIRPythonExtension.RegisterEverything`.
Remove all the libtorch downloads. If the user sets
-DTORCH_MLIR_USE_INSTALLED_PYTORCH=OFF then just build from src.
Doesn't change developer workflow since we still default to local
PyTorch versions.
TEST: Build and verify all tests (except one xfail quant) pass on linux
On my local machine, `unzip` didn't exist (producing a "command not
found" error), but CMake ignored the error. Although the build did
succeed (because it found a previously-built version of libtorch), it
seems better to abort builds on such failures, so this patch checks the
return code of all external process invocations.
Along similar lines, this patch also updates the shell scripts in
`build_tools` to extensively use double-quoting to prevent unintentional
word splitting or globbing. Since some of the scripts execute `rm`
while using shell variables, this patch also adds the preamble `set -u`
to abort execution if an undefined variable is referenced, so that we
reduce the chances of executing `rm -rf /` if the path expression
happens to refer to an undefined variable.
Add an option to cache libtorch/ releases if you don't want to
download the latest. Add an option to enable source builds.
TESTS:
macOS: verify with / without cache downloads
verify source builds -- shared and static
Linux: Build Tests and Release builds
PyTorch allows new operators to be registered dynamically in modules.
Torch-mlir already makes it fairly straightforward to add support for
new operators, and this commit just extends that support to allow new
PyTorch ops to come from a external module.
This does *not* allow ops to be dynamically loaded into torch-mlir.
Torch-mlir must still be compiled with support built-in.
Add a `_torch_mlir_custom_op_example` subpackage to `torch_mlir` which
registers an demonstration op. It will not be imported by default when
importing torch_mlir. It's strictly for testing and documentation.
Adds an end-to-end test for the `torch_mlir_custom_op_example::identity` op.
With all these changes, we should now be actively testing PyTorch extension
support with all future patches.
This makes it much easier to convert models and hides all the
ClassAnnotator complexity.
This also adds a new example `torchscript_resnet18_all_output_types.py`
which shows the ResNet18 IR for all output types.
Also,
- This moves `run_pipeline_with_repro_report` to
`torch_mlir.compiler_utils`.
Effectively, this mode works by compiling op by op as the NN is eagerly executed by PyTorch. Entailed in that compilation is building a representation of the op that can be `torch.jit.script`ed, importing using `ModuleBuilder`, and then executing (e.g., with `RefBackendLinalgOnTensorsBackend`). This mode includes a fallback to conventional PyTorch if anything in the torch-mlir compilation process fails (e.g., unsupported op).
Currently, all e2e tests pass execpt for two that involve an upstream PyTorch bug (https://github.com/pytorch/pytorch/issues/74400).
High priority next steps:
1. A compile cache in order to speed up reruns of the same NN.
2. Integration with IREE (though not in this repo).
3. Integration with `torch.distributed`.
* Also adds a requirements.txt and updates docs to reference it versus stringy pip install.
* Adds doc with instructions on creating a wheel.
Fixes#370
This leaves no real code outside torch-mlir.
This also renames the "npcomp backend contract" to "linalg on tensors
backend contract" as the name of the abstraction layer that RefBackend
(IREE too) accepts.
Fixes https://github.com/llvm/mlir-npcomp/issues/311
The key change is that TorchPlugin is folded into
`torch_mlir.dialects.torch.importer.jit_ir` (it imports the PyTorch
JIT's IR, so that's a good, scoped name for it).
The CMake option `-DTORCH_MLIR_ENABLE_JIT_IR_IMPORTER=OFF` disables it,
which allows building without a PyTorch native dependency.
It just contained the e2e testing framework. We now fold it into the
main project to reduce complexity.
- `frontends/pytorch/python/` -> `python/torch_support`
- `frontends/pytorch/e2e_testing -> e2e_testing`
- `frontends/pytorch/examples -> examples`
- `frontends/pytorch/test` -> `python/test`
- `torch_mlir_torchscript` python module -> `npcomp_torchscript`
- `torch_mlir_torchscript_e2e_test_configs` python module ->
`npcomp_torchscript_e2e_test_configs`
This also changes the license of a handful of files from the
"pytorch-style" license to the regular LLVM/npcomp license. The only
people who committed to those files were myself and Yi.
This moves the bulk of the Python code (including the Torch interop)
from `frontends/pytorch` into `torch-mlir/TorchPlugin`. This also
required reconciling a bunch of other Python-related stuff, like the
`torch` dialects.
As I did this, it was simpler to just remove all the old numpy/basicpy
stuff because we were going to delete it anyway and it was faster than
debugging an intermediate state that would only last O(days) anyway.
torch-mlir has two top-level python packages (built into the
`python_packages` directory):
- `torch_mlir_dialects`: `torch` dialect Python bindings (does not
depend on PyTorch). This also involves building the aggregate CAPI for
`torch-mlir`.
- `torch_mlir`: bindings to the part of the code that links against
PyTorch (or C++ code that transitively does).
Additionally, there remain two more Python packages in npcomp (but
outside `torch-mlir`):
- `npcomp_torch`: Contains the e2e test framework and testing configs
that plug into RefBackend and IREE.
- `npcomp_core`: Contains the low-level interfaces to RefBackend and
IREE that `npcomp_torch` uses, along with its own
`MLIR_PYTHON_PACKAGE_PREFIX=npcomp.` aggregation of the core MLIR
python bindings. (all other functionality has been stripped out)
After all the basicpy/numpy deletions, the `npcomp` C++ code is now very
tiny. It basically just contains RefBackend and the `TorchConversion`
dialect/passes (e.g. `TorchToLinalg.cpp`).
Correspondingly, there are now 4 main testing targets paralleling the
Python layering (which is reflective of the deeper underlying dependency
structure)
- `check-torch-mlir`: checks the `torch-mlir` pure MLIR C++ code.
- `check-torch-mlir-plugin`: checks the code in `TorchPlugin` (e.g.
TorchScript import)
- `check-frontends-pytorch`: Checks the little code we have in
`frontends/pytorch` -- mainly things related to the e2e framework
itself.
- `check-npcomp`: Checks the pure MLIR C++ code inside npcomp.
There is a target `check-npcomp-all` that runs all of them.
The `torch-mlir/build_standalone.sh` script does a standalone build of
`torch-mlir`.
The e2e tests (`tools/torchscript_e2e_test.sh`) are working too.
The update_torch_ods script now lives in
`torch-mlir/build_tools/update_torch_ods.sh` and expects a standalone
build.
This change also required a fix upstream related to cross-shlib Python
dependencies, so we also update llvm-project to
8dca953dd39c0cd8c80decbeb38753f58a4de580 to get
https://reviews.llvm.org/D109776 (no other fixes were needed for the
integrate, thankfully).
This completes most of the large source code changes. Next will be
bringing the CI/packaging/examples back to life.