* Replace CHECK_EQ with TORCH_CHECK_EQ
* Check value of TORCH_MLIR_USE_INSTALLED_PYTORCH during LTC build
* Update LTC XFAIL with NewZerosModule ops
* Explicitly blacklist _like ops
* Automatically blacklist new_/_like ops
* Prune away unused Python dependencies from LTC
* Add flag to disable LTC
* Autogen dummy _REFERENCE_LAZY_BACKEND library when LTC is disabled
* Implement compute_shape_var
* Removed Var tests from XFAIL Set
* XFAIL tests using _local_scalar_dense or index.Tensor
* Add StdDim tests to XFAIL set
* Autogen aten::cat
* Changed Example MLIR backend to Reference MLIR backend
* Moved reference_ltc_backend into csrc
* Merged sys_utils.h
* Renamed reference_ltc_backend to reference_lazy_backend
* Addressed review comments
* Update docs with new library name
* Removed _REFERENCE_LAZY_BACKEND from .gitignore
* Added reference_lazy_backend to the TorchMLIRPythonModules dependency list
Fixed typo in `ltc_examples.md`
Missed instance where `ltc_backend` was used instead of `lazy_backend`.
* Update native function definitions
* Add ops to support bert lowering
- Add empty_strided and as_strided
- Restore zeros_like to op blacklist (Without this, tensors will be unintentionally created with a CPU device rather than lazy)
- Check for composite implicit ops and add device data IR
- Also fix codegen for functionalization
* Add autogen to CMakeList
* Remove PyTorch submodule
* Reduced BERT model size
* Print Mark Step status in Torch MLIR LTC debug string
* Apply fixes to work with latest upstream/main
- Pass importOptions into getMlirTypeFromTorchType during NodeImporter::importNode
Without this, the tensor type created may have a mismatched type as ImportOptions may cause vtensor to be used instead of tensor
* Update shape inference functions
- Fixed compute_shape_native_batch_norm when mean and var are uninitialized
Previously, the number of shapes returned would be <3 if either mean or val was didn't exist. Instead, we now initialize them with a vector matching the number of channels.
- Implemented compute_shape_mul
- Fixed bug in reshape shape inference error message
* Get MLIR backend more consistent with TS backend
- Remove LazyNativeFunctions::_unsafe_view from autogen
- Blacklist ops to make JIT graph more like output of TS backend
- Print graph when SSA value has mismatch of types and results
- Remove normalize_index from LazyShapeInference
- Fix seeds for LTC example models
* Update and clean up shape inference functions
- Prune shape inference functions
- Add shape inference function for GenerateSlice
- Add shape inference function for GenerateCopy
Co-authored-by: Henry Tu <henry.tu@cerebras.net>
* Save InputOutputAliases to TorchMlirComputation
* Implement GetResultShape for TorchMlirLoweringContext
* Use optional return type for GetResultShape
* Remove support for aten::detach
With this op enabled, tensors were being copied, which resulted in incorrect aliasing.
* Add newline before printing I/O alias mapping
* Changed printout to use "Input param" as label instead of "Input"
* Remote shape inference function for aten::detach
* Moved implementation of SetUpAlias to MlirLoweringContext
As part of this change, TorchMlirComputation has been moved to the end of mlir_lowering_context.h so that it can access some new structs in TorchMlirLoweringContext
* Use updated PyTorch API
* Remove GetResultShape
Complements this upstream PyTorch PR: pytorch/pytorch#75828
This PR adds support for mapping input and output tensors which alias each other. (e.g. maps input weight tensor in parameter to the same tensor in output after a training iteration)
MLIR:
func @graph(%arg0: !torch.vtensor<[1,5],f32>, %arg1: !torch.vtensor<[1],si64>, ..., %arg6: !torch.vtensor<[10,5],f32>, %arg7: !torch.vtensor<[10],f32>, ...) {
...
return %arg0, %arg1, %17, %23, ... : !torch.vtensor<[1,5],f32>, !torch.vtensor<[1],si64>, !torch.vtensor<[10,5],f32>, !torch.vtensor<[10],f32>, ...
}
Input/Output Alias Mapping:
Output: 0 -> Input: 0
Output: 1 -> Input: 1
Output: 2 -> Input: 6
Output: 3 -> Input: 7
The aten::detach op has also been disabled in this PR to fix the issue of tensors not aliasing properly due to copying.
* Added JIT to MLIR lowering
Lowering to JIT is performed in a way similar to how it's done in the TS LTC backend. After a jit::Graph is constructed, it gets converted to a jit::Function, which is fed into the existing utility to generate an MlirModule in torch-mlir.
* Renamed `csrc/backend` to `csrc/base_lazy_backend`
This enables building Pytorch from source in the CI.
The build should mostly hit the ccache.
Release builds will follow once we have some runtime on the CI.
Remove all the libtorch downloads. If the user sets
-DTORCH_MLIR_USE_INSTALLED_PYTORCH=OFF then just build from src.
Doesn't change developer workflow since we still default to local
PyTorch versions.
TEST: Build and verify all tests (except one xfail quant) pass on linux
Found while trying to build torch-mlir on an AArch64 Linux VM, worth
a belts and braces to prevent such cases.
Change-Id: I89c6fccb62e666dbda0d9acac2d0ee43c2899e9b
On my local machine, `unzip` didn't exist (producing a "command not
found" error), but CMake ignored the error. Although the build did
succeed (because it found a previously-built version of libtorch), it
seems better to abort builds on such failures, so this patch checks the
return code of all external process invocations.
Along similar lines, this patch also updates the shell scripts in
`build_tools` to extensively use double-quoting to prevent unintentional
word splitting or globbing. Since some of the scripts execute `rm`
while using shell variables, this patch also adds the preamble `set -u`
to abort execution if an undefined variable is referenced, so that we
reduce the chances of executing `rm -rf /` if the path expression
happens to refer to an undefined variable.
Add an option to cache libtorch/ releases if you don't want to
download the latest. Add an option to enable source builds.
TESTS:
macOS: verify with / without cache downloads
verify source builds -- shared and static
Linux: Build Tests and Release builds
The MacOS builders are having linking trouble with the extension library.
Until it's fixed, all support for op extensions is disabled. It should be
easy to restore once the issue is resolved.
PyTorch allows new operators to be registered dynamically in modules.
Torch-mlir already makes it fairly straightforward to add support for
new operators, and this commit just extends that support to allow new
PyTorch ops to come from a external module.
This does *not* allow ops to be dynamically loaded into torch-mlir.
Torch-mlir must still be compiled with support built-in.
Add a `_torch_mlir_custom_op_example` subpackage to `torch_mlir` which
registers an demonstration op. It will not be imported by default when
importing torch_mlir. It's strictly for testing and documentation.
Adds an end-to-end test for the `torch_mlir_custom_op_example::identity` op.
With all these changes, we should now be actively testing PyTorch extension
support with all future patches.
1. With the help of `make_fx` we are able to get the full training graph
with weight updates.
2. NeuralNet_training passes. Bert_training passes after cherry-picking
https://github.com/llvm/torch-mlir/pull/844.
3. TODO: Remove the functorch's dependency after make_fx moves to
pytorch core.
* Add oneshot release snapshot for test/ondemand
Add some build scripts to test new release flow based on IREE.
Wont affect current builds, once this works well we can plumb it
in.
Build with manylinux docker
* Fixes a few issues found when debugging powderluv's setup.
* It is optional to link against Python3_LIBRARIES. Check that and don't do it if they don't exist for this config.
* Clean and auditwheel need to operate on sanitized package names. So "torch_mlir" vs "torch-mlir".
* Adds a pyproject.toml file that pins the build dependencies needed to detect both Torch and Python (the MLIR Python build was failing to detect because Numpy wasn't in the pip venv).
* Commented out auditwheel: These wheels are not PyPi compliant since they weak link to libtorch at runtime. However, they should be fine to deploy to users.
* Adds the --extra-index-url to the pip wheel command, allowing PyTorch to be found.
* Hack setup.py to remove the _mlir_libs dir before building. This keeps back-to-back versions from accumulating in the wheels for subsequent versions. IREE has a more principled way of doing this, but what I have here should work.
Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
This avoids issues where PyTorch version drift has made things
incompatible.
One caveat is that you will need to specify
`-f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
--pre` on the command line for pip to know where to find the nightly
packages (there is no way around this) -- this is easiest to do by
simultaneously passing `-r requirements.txt` on the pip command line.
I am investigating the breakage.
Also, fix "externals" rename in setup.py and some cases where we weren't
using `requirements.txt` consistently.
Also, fix a case where the packaging script would get confused due to
".." in the path name.
The `assemblyFormat` stuff (which generates unrolled, per-op C++ code)
was taking up a lot of compile time, and all the ops are essentially
printed with the same logic. So this PR makes them all call the same
helper function. This is done by using
`let hasCustomAssemblyFormat = 1` and then implementing `FooOp::parse`
and `FooOp::print`.
Additionally, the `Generated*Ops.td` files are all collapsed into just
`GeneratedTorchOps.td` (there is no reason to have the files separate,
since the files are very large anyway so one is always having to search
within them -- editors don't care that the file to search is now a bit
bigger :) ).
This reduces TorchOpsODSGenerated.cpp compile time (which is now
GeneratedTorchOps.cpp) from 39 to 31 seconds on my machine. This is
actually less than I expected, but this PR is an overall cleanup to the
code anyway. The next step will be to introduce (better) functionality
upstream for sharding the TorchOps.cpp.inc file, so that we can truly
parallelize the O(#ops) costs. This is also necessary, because after
this PR, TorchDialect.cpp is now the slowest file to compile, due to the
`addOperations<... all the ops ...>` call, which needs to be shareded
too.
This was an aspirational goal at an earlier stage in the project where
the focus was heavily on programs with state, control flow, and
lists/dicts. We will circle back to such programs likely 2022H2 at some
point, but for now, having this test doesn't add much, since basically
nothing works or is being worked on.
See the documentation in `docs/shape_lib.md` and
`docs/adding_a_shape_function.md` for an overview of the system.
This completely overhauls how we represent shape functions. In
particular, RefineTypes does not infer shapes anymore (only dtypes).
Shape functions are now written in (TorchScript'able) Python.
Recommended review order:
1. Read `docs/shape_lib.md` and `docs/adding_a_shape_function.md`.
1. Code and tests for ReifyShapeCalculations, DropShapeCalculations.
1. Code and tests for SimplifyShapeCalculations.
1. shape_lib_gen.py
1. Code and tests for new RefineTypes pass.
1. Random folders/canonicalizers in TorchOps.cpp and associated test in
`canonicalize.mlir`.
1. New ReadOnly trait inferred from the registry.
1. Any miscellaneous remaining stuff.
Example `-print-ir-after-all` for ElementwiseUnaryModule:
[IR lowering dump](https://gist.github.com/silvasean/e4dc8cbc8d00aac7819602e3cbd8e212).
Example `-print-ir-after-all` for ElementwiseBinaryModule:
[IR lowering dump](https://gist.github.com/silvasean/daf6860ecced732af3568af6b1899113).