Discord Thread:
https://discord.com/channels/636084430946959380/1238330633328005243
## Context:
[This](https://github.com/llvm/torch-mlir/blob/main/python/torch_mlir/fx.py#L61)
was updated to support e2e tests for the TorchDynamo frontend in
Torch-MLIR, where we run FX decompositions and import the FX IR to
generate Torch dialect, followed by
`torch-function-to-torch-backend-pipeline`, skipping only the shape/type
refinement for now. However, we should be able to skip many of the torch
simplification passes, as depicted in the [frontend
roadmap](https://github.com/llvm/torch-mlir/blob/main/docs/images/roadmap_frontend.png).
Based on IREE's TorchDynamo
[pipeline](https://github.com/iree-org/iree/blob/main/compiler/plugins/input/Torch/InputConversion/Passes.cpp#L29),
the only two passes we seem to require are: `ReduceOpVariantsPass` and
`DecomposeComplexOpsPass`. This is inline with our findings as well
based on initial exploration.
This PR creates a dedicated frontend simplification pipeline for
TorchDynamo / FX Importer which calls only `ReduceOpVariantsPass` and
`DecomposeComplexOpsPass`. We rely on the e2e fx_importer tests to
ensure we're not regressing by removing many of the passes that were
historically needed for TorchScript.
One notable change here is that we do not call the
`LowerToBackendContractPass` anymore, which used to call
`TorchSimplificationPipeline` iteratively until VerifyBackendContract
was clean. Some of this was required for the shape/type refinement to
converge, which seems a non-issue for Dynamo frontend. Do we anticipate
this (the iterative invocation of TorchSimplificationPipeline followed
by VerifyBackendContract) to be worth retaining in the Dynamo frontend
pipeline? If so, I can make those changes, PLMK.
This lifts the core of the jit_ir_importer and ltc out of the pt1
project, making them peers to it. As a side-effect of this layering, now
the "MLIR bits" (dialects, etc) are not commingled with the various
parts of the pt1 project, allowing pt1 and ltc to overlay cleanly onto a
more fundamental "just MLIR" Python core. Prior to this, the Python
namespace was polluted to the point that this could not happen.
That "just MLIR" Python core will be introduced in a followup, which
will create the space to upstream the FX and ONNX pure Python importers.
This primary non-NFC change to the API is:
* `torch_mlir.dialects.torch.importer.jit_ir` ->
`torch_mlir.jit_ir_importer`.
The rest is source code layering so that we can make the pt1 project
optional without losing the other features.
Progress on #2546.
This is a first step towards the structure we discussed here:
https://gist.github.com/stellaraccident/931b068aaf7fa56f34069426740ebf20
There are two primary goals:
1. Separate the core project (C++ dialects and conversions) from the
hard PyTorch dependencies. We move all such things into projects/pt1 as
a starting point since they are presently entangled with PT1-era APIs.
Additional work can be done to disentangle components from that
(specifically LTC is identified as likely ultimately living in a
`projects/ltc`).
2. Create space for native PyTorch2 Dynamo-based infra to be upstreamed
without needing to co-exist with the original TorchScript path.
Very little changes in this path with respect to build layering or
options. These can be updated in a followup without commingling
directory structure changes.
This also takes steps toward a couple of other layering enhancements:
* Removes the llvm-external-projects/torch-mlir-dialects sub-project,
collapsing it into the main tree.
* Audits and fixes up the core C++ build to account for issues found
while moving things. This is just an opportunistic pass through but
roughly ~halves the number of build actions for the project from the
high 4000's to the low 2000's.
It deviates from the discussed plan by having a `projects/` tree instead
of `compat/`. As I was thinking about it, this will better accommodate
the follow-on code movement.
Once things are roughly in place and the CI passing, followups will
focus on more in-situ fixes and cleanups.
Gets both CI and Release builds integrated in one workflow.
Mount ccache and pip cache as required for fast iterative builds
Current Release docker builds still run with root perms, fix it
in the future to run as the same user.
There may be some corner cases left especially when switching
build types etc.
Docker build TEST plan:
tl;dr:
Build everythin: Releases (Python 3.8, 3.9, 3.10) and CIs.
TM_PACKAGES="torch-mlir out-of-tree in-tree"
2.57s user 2.49s system 0% cpu 30:33.11 total
Out of Tree + PyTorch binaries:
Fresh build (purged cache):
TM_PACKAGES="out-of-tree"
0.47s user 0.51s system 0% cpu 5:24.99 total
Incremental with ccache:
TM_PACKAGES="out-of-tree"
0.09s user 0.08s system 0% cpu 34.817 total
Out of Tree + PyTorch from source
Incremental
TM_PACKAGES="out-of-tree" TM_USE_PYTORCH_BINARY=OFF
1.58s user 1.81s system 2% cpu 1:59.61 total
In-Tree + PyTorch binaries:
Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree"
0.53s user 0.49s system 0% cpu 6:23.35 total
Fresh build/ but with prior ccache
TM_PACKAGES="in-tree"
0.45s user 0.66s system 0% cpu 3:57.47 total
Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree"
0.16s user 0.09s system 0% cpu 2:18.52 total
In-Tree + PyTorch from source
Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
2.03s user 2.28s system 0% cpu 11:11.86 total
Fresh build/ but with prior ccache
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
1.58s user 1.88s system 1% cpu 4:53.15 total
Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
1.09s user 1.10s system 1% cpu 3:29.84 total
Incremental without tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF TM_SKIP_TESTS=ON
1.52s user 1.42s system 3% cpu 1:15.82 total
In-tree+out-of-tree + Pytorch Binaries
TM_PACKAGES="out-of-tree in-tree"
0.25s user 0.18s system 0% cpu 3:01.91 total
To clear all artifacts:
rm -rf build build_oot llvm-build libtorch docker_venv
externals/pytorch/build
* Changed Example MLIR backend to Reference MLIR backend
* Moved reference_ltc_backend into csrc
* Merged sys_utils.h
* Renamed reference_ltc_backend to reference_lazy_backend
* Addressed review comments
* Update docs with new library name
* Removed _REFERENCE_LAZY_BACKEND from .gitignore
* Added reference_lazy_backend to the TorchMLIRPythonModules dependency list
Fixed typo in `ltc_examples.md`
Missed instance where `ltc_backend` was used instead of `lazy_backend`.
* Update native function definitions
* Add ops to support bert lowering
- Add empty_strided and as_strided
- Restore zeros_like to op blacklist (Without this, tensors will be unintentionally created with a CPU device rather than lazy)
- Check for composite implicit ops and add device data IR
- Also fix codegen for functionalization
* Add autogen to CMakeList
* Remove PyTorch submodule
* Reduced BERT model size
* Print Mark Step status in Torch MLIR LTC debug string
* Apply fixes to work with latest upstream/main
- Pass importOptions into getMlirTypeFromTorchType during NodeImporter::importNode
Without this, the tensor type created may have a mismatched type as ImportOptions may cause vtensor to be used instead of tensor
* Update shape inference functions
- Fixed compute_shape_native_batch_norm when mean and var are uninitialized
Previously, the number of shapes returned would be <3 if either mean or val was didn't exist. Instead, we now initialize them with a vector matching the number of channels.
- Implemented compute_shape_mul
- Fixed bug in reshape shape inference error message
* Get MLIR backend more consistent with TS backend
- Remove LazyNativeFunctions::_unsafe_view from autogen
- Blacklist ops to make JIT graph more like output of TS backend
- Print graph when SSA value has mismatch of types and results
- Remove normalize_index from LazyShapeInference
- Fix seeds for LTC example models
* Update and clean up shape inference functions
- Prune shape inference functions
- Add shape inference function for GenerateSlice
- Add shape inference function for GenerateCopy
Co-authored-by: Henry Tu <henry.tu@cerebras.net>
* Added JIT to MLIR lowering
Lowering to JIT is performed in a way similar to how it's done in the TS LTC backend. After a jit::Graph is constructed, it gets converted to a jit::Function, which is fed into the existing utility to generate an MlirModule in torch-mlir.
* Renamed `csrc/backend` to `csrc/base_lazy_backend`
Note that to enable folding of the code coming from an example
like the ConstantPad2dStaticModule e2e test, support for other
operations had to be added/improved:
- aten::neg.int
- aten::eq.float
- aten::eq.str
- prim::Uninitialized
* Also adds a requirements.txt and updates docs to reference it versus stringy pip install.
* Adds doc with instructions on creating a wheel.
Fixes#370
* Picks up Python configure changes (was pinned to a bad intermediate commit).
* Uses the new mlir_configure_python_dev_packages() to ensure CMake python is found consistently.
* Fixes the JIT importer to build as a MODULE vs SHARED (needed for linking to Python as a module, per config changes).
* Adds some notes to the README to help folks build a smaller set focused just on this project.
This takes the example from torchscript_resnet18_e2e.py and puts it into
a slightly cleaned up notebook form.
It's still a little rough around the edges. Areas for improvement:
- Installation / setup.
- API usability.
Also,
- Add `npcomp-backend-to-iree-frontend-pipeline` since we will be adding
more stuff there.
- Slight cleanups.
* Adds a minimal setup.py for frontends/pytorch
* Makes npcomp-core export its headers and libraries
* Adds a script to build packages.
* Adds CI step to package and smoke test.
* Will need some more tweaks and coordination prior to deploying (version locking etc).