Commit Graph

1651 Commits (8ba77ae2a519bc8db9f5c0ae9197471845b925cf)
 

Author SHA1 Message Date
Prashant Kumar 8ba77ae2a5 Yapf Format `refbackend.py`. 2022-12-15 21:19:52 +05:30
Prashant Kumar 564403e3a1 Add float16 support in the refbackend.
This will require https://reviews.llvm.org/D139121 patch to go through.
2022-12-15 21:19:52 +05:30
powderluv cd90c0aaf5
Update buildAndTest.yml (#1723) 2022-12-15 05:42:01 -08:00
Sean Silva af9e8a5e63 [torchdynamo] Move to aot_autograd instead of raw make_fx
As [@ezyang suggested](https://github.com/pytorch/pytorch/issues/90276#issuecomment-1339791275),
use `torch._dynamo.optimizations.training.aot_autograd` instead of raw
`make_fx`. This is more future-proof and gives us the backward pass and
functionalization. We don't currently get functionalization because of
https://github.com/pytorch/pytorch/issues/90759

This also incidentally fixes the source location handling, which makes
`lockstep_basic.py` give an accurate source location!
2022-12-15 01:55:50 -08:00
Ashay Rane 64f9a0e978
ci: print ccache statistics and configuration at end of CI run (#1719)
There appear to be two problems with the caching layer in our CI runs:
(a) the sizes of some of the caches have grown to multiples of the
300 MB limit and (b) caching on Windows seems to provide little to no
benefit.

To help understand the reasons for these problems, this patch adds a
line item to the list of steps run in CI to dump the ccache
configuration and statistics just prior to uploading the cache artifact.
2022-12-14 09:50:43 -06:00
Roll PyTorch Action a29f173a6b update PyTorch version to 2.0.0.dev20221214 2022-12-14 15:23:09 +00:00
Sean Silva b60da34f84 [cleanup] Fix a few more llvm::None -> std::nullopt 2022-12-14 05:59:49 -08:00
Sean Silva 8c3774bb2a
Minor fixes for development.md
- Mention the rotation doc
- Fix minor typos / broken link
2022-12-14 02:55:51 -08:00
Ashay Rane f63bb9f86c
build: update llvm tag to 3a020527 (#1717)
Summary of changes:

 - Replace `llvm::None` with `std::nullopt`, since the former is deprecated
   (https://reviews.llvm.org/D139763)

 - Use setter for symbol visibility instead of passing string attribute when
   creating FuncOp
2022-12-14 02:06:39 -06:00
Ahmed S. Taei b1f6832849
Add aten.slice.Tensor & aten.cat folders (#1691) 2022-12-13 13:02:47 -08:00
Ashay Rane 731c313231
ci: run `git pull` before committing pytorch version updates (#1716)
The RollPyTorch action often takes more than 1.5 hours to finish.
During this time, if another PR is merged, then the RollPyTorch action
needs to first pull the merged changes before committing the updates to
the PyTorch commit hash and version files.  This patch adds the required
`git pull` statement, without which, the subsequent `git push` statement
fails, causing the RollPyTorch action to fail as well.
2022-12-13 13:41:41 -06:00
Daniel Ellis 07a65961dd
Disable pypi publishing.
See https://github.com/llvm/torch-mlir/issues/1709
2022-12-13 11:45:41 -05:00
Ramiro Leal-Cavazos a710237437
[custom op] Generalize shape library logic to work with dtypes (#1594)
* [custom op] Generalize shape library logic to work with dtypes

This commit generalizes the shape library logic, so that dtype rules
for ops can also be expressed using the same mechanism. In other
words, each op can now have a shape function and a dtype function
specified in Python that is imported during lowering to calculate the
shapes and dtypes throughout a program. For more information about how
to specify a dtype function, see the updated
`docs/adding_a_shape_and_dtype_function.md`.

For those not familiar with how the shape library works, the file
`docs/calculations_lib.md` provides an overview.
2022-12-13 08:25:41 -08:00
Sean Silva 2acf7da63c [README] Small touch-ups, and mention PT2 2022-12-13 08:06:17 -08:00
Roll PyTorch Action 8d098dc8d5 update PyTorch version to 2.0.0.dev20221213 2022-12-13 14:52:27 +00:00
Chi_Liu 163d19cce6
[TOSA] Add aten.add/sub.Scalar/Tensor si64 type support (#1604) 2022-12-12 12:13:07 -08:00
Ramiro Leal-Cavazos 73bd32d06c
Make `getTensorRank` safer by changing return to `Optional<unsigned>` (#1707)
Currently `getTensorRank` returns -1 if it was unable to get the rank
of the tensor. However, not every use in the codebase was checking the
return value, and in some cases, the return value was cast to
unsigned, leading to some infinite loops when an unranked tensor reached
a decomposition.

This commit changes the return of `getTensorRank` to
`Optional<unsigned>` to make it clear to the user that the function
can fail.

This commit also changes a couple of for loops that iterate a vector
in reverse order that can potentially become infinite loops into
range-based for loops.
2022-12-12 08:56:28 -08:00
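The failure mode described in 73bd32d06c can be sketched in Python. This is only an illustration, not the torch-mlir code: `get_tensor_rank`, `as_unsigned32`, and `last_dim_index` are hypothetical names, and a 32-bit `unsigned` is assumed. It shows why a `-1` sentinel is dangerous once cast to unsigned, and how an optional return forces callers to handle the unranked case.

```python
from typing import Optional, Sequence


def get_tensor_rank(sizes: Optional[Sequence[int]]) -> Optional[int]:
    """Return the rank, or None when the tensor is unranked (hypothetical helper)."""
    return None if sizes is None else len(sizes)


def as_unsigned32(value: int) -> int:
    """Emulate a C++ cast of a signed int to a 32-bit unsigned (assumed width)."""
    return value & 0xFFFFFFFF


def last_dim_index(sizes: Optional[Sequence[int]]) -> Optional[int]:
    """Index of the last dimension, or None if there is no such dimension."""
    rank = get_tensor_rank(sizes)
    if rank is None or rank == 0:  # the optional return makes this check explicit
        return None
    return rank - 1
```

With the old convention, `as_unsigned32(-1)` is a huge positive number, so a loop bounded by that value would run far past the intended range.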
Ashay Rane 430737b820
[cleanup] fix naming of private variable according to the style guide (#1704) 2022-12-12 09:04:46 -06:00
Sean Silva a595942033 [cleanup] Use `"` instead of `'` for string literals
This is the more predominant style in the codebase. I'm sure there are
more in other parts of the codebase but it's hard to search/replace.
2022-12-12 02:40:09 -08:00
Vivek Khandelwal d4862ec611 [MLIR][TORCH] Add e2e support for aten.var_mean op
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
2022-12-12 15:46:54 +05:30
Vivek Khandelwal 143a8f378d build: manually update PyTorch version
Set PyTorch and TorchVision version to nightly release 2022-12-11.

Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
2022-12-12 15:46:54 +05:30
Vivek Khandelwal f783e19dcb Revert "[MLIR][TORCH] Fix mean and mean.dim op for large-sized inputs"
This reverts commit 55c7e66aa7.
2022-12-09 19:30:46 +05:30
Sean Silva 7731211d02 Remove eager_mode
This was an experimental attempt at rolling our own op-by-op executor
with `__torch_dispatch__`, but it proved difficult to make it robust.
Op-by-op execution is very easy to implement robustly now with the
PyTorch 2.0 stack, so we don't need eager_mode.

Downstream users were using eager_mode to implement lockstep numerical
accuracy debuggers. We implemented the same functionality with
TorchDynamo in https://github.com/llvm/torch-mlir/pull/1681 so now there
is not much reason to continue maintaining it.
2022-12-09 03:50:00 -08:00
Sambhav Jain 109c91ae9b
[CI] Verify bazel buildifier is run and changes committed (#1700)
Ensures the buildifier (linter for bazel build files) is run and changes are pushed.
2022-12-08 15:56:57 -08:00
Gleb Kazantaev 804f9f1f8f
Extended TorchMLIRLoweringContext with virtual CreateComputation method (#1699)
* Extended TorchMLIRLoweringContext with virtual CreateComputation method

* Fix device_data_cast return value
2022-12-08 15:57:07 -05:00
Sambhav Jain f8a2592905
[Bazel] Resolve circular dependency and add targets for conversion to MLProgram dialect (#1694)
A circular dependency was introduced in e7edcc62fd. 

Specifically, the `makeShapeLLVMCompatible` and `makeShapeTorchCompatible` utilities were being called from `lib/Dialect/Torch/IR/TorchTypes.cpp` and `lib/Dialect/Torch/IR/TorchOps.cpp` defined under the `:TorchMLIRTorchDialect` bazel target, leading it to take a dependency on `:TorchMLIRConversionUtils` which already depends on `:TorchMLIRTorchDialect`, hence creating a circular dependency.

This commit resolves the same by moving said utilities from `lib/Conversion/Utils/Utils.cpp` to `lib/Dialect/Torch/Utils/Utils.cpp`. Please LMK if there's a better way to fix this and I will update the code.

This commit also adds the required targets to support building the new conversions from Torch to ML Program dialect that was introduced in f416953600.

Bazel build GHA triggered manually to verify: https://github.com/sjain-stanford/torch-mlir/actions/runs/3645944517
2022-12-08 09:49:54 -08:00
Ramiro Leal-Cavazos a54b334578
Allow running DecomposeComplexOps more than once (#1671)
The current implementation of `DecomposeComplexOps` fails if an op
expected to be decomposed does not get decomposed in the first
iteration of the `createTorchSimplificationPipeline` in
`LowerToBackendContractPass`. However, some graphs require multiple
iterations of `createTorchSimplificationPipeline` to fully propagate
all statically knowable information, such as dtypes and shapes, to the
entire graph, sometimes resulting in the need to run
`DecomposeComplexOps` more than once.

This commit changes `DecomposeComplexOps` to use a greedy algorithm
for pattern application and moves the legalization check of ops to the
`LowerToBackendContractPass` to allow for the `DecomposeComplexOps` to
run more than once.
2022-12-08 09:26:38 -08:00
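The control flow described in a54b334578 can be sketched as a toy fixed-point loop in Python. Everything here is hypothetical (the op names, `toy_simplify`, and `lower_to_backend_contract` as a plain function); it only illustrates the idea of running the simplification pipeline repeatedly and checking legality once at the end instead of inside each decomposition run.

```python
from typing import Callable, List, Set


def lower_to_backend_contract(
    ops: List[str],
    simplify: Callable[[List[str]], List[str]],
    illegal: Set[str],
    max_iterations: int = 10,
) -> List[str]:
    """Run `simplify` until nothing changes, then check legality once."""
    for _ in range(max_iterations):
        new_ops = simplify(ops)
        if new_ops == ops:  # fixed point reached
            break
        ops = new_ops
    leftover = [op for op in ops if op in illegal]
    if leftover:
        raise ValueError(f"ops not decomposed: {leftover}")
    return ops


def toy_simplify(ops: List[str]) -> List[str]:
    """One pipeline iteration: refine unknown info, or decompose if it is known."""
    out: List[str] = []
    for op in ops:
        if op == "softmax?":   # dtype not yet known: refine it this iteration
            out.append("softmax")
        elif op == "softmax":  # dtype known: decomposition can now fire
            out.extend(["exp", "sum", "div"])
        else:
            out.append(op)
    return out
```

In this toy, `"softmax?"` needs two iterations: one to learn the dtype and one to decompose. Failing after the first iteration would reject a graph that is in fact lowerable.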
Sean Silva e8511840c3 [cleanup] Use a single function pipeline for TOSA->Linalg
This should run faster and is overall clearer.
2022-12-08 09:02:38 -08:00
Ramiro Leal-Cavazos 76190e8a3f
Remove unnecessary decompose-complex-ops tests (#1693)
This commit removes lit tests from the `decompose-complex-ops` that
are essentially testing a macro expansion, in accordance with
https://github.com/llvm/torch-mlir/blob/main/docs/architecture.md#dos-and-donts-for-unit-vs-end-to-end-testing .
2022-12-08 08:22:08 -08:00
Sean Silva 69171c246a [RefBackend] Add elementwise fusion and buffer deallocation
This gives some decent improvements to memory consumption and latency of
testing. I would have expected buffer-deallocation to actually make a
big difference to the final process RSS but it doesn't appear to. Also
running buffer-deallocation later in the pipeline results in
miscompiles. I didn't have the time or interest to dig in deeper, but
something is off.

(numbers below are taken from a single run, but I did do a few runs to make
sure that the variance wasn't that great)

- Linalg-on-Tensors shows memory consumption improvements and some slight speedups.
```
./tools/e2e_test.sh -s -v -c refbackend
fuse=0 dealloc=0
RSS: 3071.33 MB
real    3m58.204s
user    6m22.299s
sys     0m51.235s
fuse=1 dealloc=0
RSS: 2515.89 MB
real    3m34.797s
user    5m56.902s
sys     0m44.933s
fuse=1 dealloc=post-bufferize:
RSS: 2290.25 MB
real    3m42.242s
user    6m0.560s
sys     0m46.335s
```

- TOSA ResNet18 gets significantly faster and uses significantly less memory.
```
time ./tools/e2e_test.sh -s -v -c tosa -f ResNet18
fuse=0 dealloc=0
rss 1328.56 MB
real    0m50.303s
user    0m55.355s
sys     0m12.260s
fuse=1 dealloc=0
rss 859MB
real    0m30.454s
user    0m35.551s
sys     0m11.879s
fuse=1 dealloc=post-bufferize:
rss 851MB
real    0m30.313s
user    0m39.889s
sys     0m11.941s
```

Big thanks to Ramiro for the methodology here for measuring the RSS with
`psutil`:
https://gist.github.com/ramiro050/5b5c2501f7389c008d9029210772c3a8
2022-12-08 03:14:42 -08:00
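For readers who want a rough RSS number like the ones in 69171c246a, here is a minimal Python sketch. The linked gist uses `psutil`; this sketch swaps in the standard-library `resource` module instead (Unix only), so it is not the methodology from the gist and its numbers may differ. `peak_child_rss_mb` is a hypothetical helper name.

```python
import resource
import subprocess
import sys
from typing import List


def peak_child_rss_mb(cmd: List[str]) -> float:
    """Run cmd, then return the peak RSS (in MB) of the largest child so far.

    RUSAGE_CHILDREN covers every child this process has waited for, so the
    value is only specific to `cmd` if no larger child ran earlier.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```

A call such as `peak_child_rss_mb(["./tools/e2e_test.sh", "-s", "-v", "-c", "tosa", "-f", "ResNet18"])` would wrap the command shown in the commit message.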
Sean Silva 29c8823464 [e2e tests] Rename default config from "refbackend" to "linalg"
This more accurately reflects what it is. The previous name was
conflating the use of RefBackend (which `linalg`, `tosa`, and `mhlo`
configs all use) with the use of the linalg backend (e.g. TorchToLinalg).

This conflation was artificially giving the linalg backend a "privileged"
position, which we want to avoid. We still keep it as the default
backend, and it remains the most complete, but at least there's not
artificial boosting.
2022-12-08 01:34:46 -08:00
Ramiro Leal-Cavazos dd35488da5
build: update llvm tag to 798fa4b4 (#1684)
- Support for non-prefixed accessors has been removed. See:
  https://reviews.llvm.org/D136727
- Rename `operands` to `methodOperands` in `prim.CallMethod` since the
  name `operands` overlaps with a builtin method name. See:
  https://reviews.llvm.org/D136727
- Add passes in refbackend to lower memref.subview. See:
  https://reviews.llvm.org/D136377
- Replace `CopyToValueTensorOps` first in `RewriteViewLikeSubgraph` in
  maximize-value-semantics.

  The current implementation of the `RewriteViewLikeSubgraph` pass in
  maximize-value-semantics creates temporarily invalid IR. In
  particular, given a forward slice starting from a
  `CopyToNonValueTensorOp` and ending in `CopyToValueTensorOp`s, the
  pass first replaces all uses of the `CopyToNonValueTensorOp` with
  its operand, which results in all the `CopyToValueTensorOp` users
  having their operand have type `!torch.vtensor`, which is invalid.

  The correct way to do things is to first replace all the
  `CopyToValueTensorOp`s with their operand, and then replace all uses
  of the `CopyToNonValueTensorOp` with its operand.

  This only started failing now because the generated accessor
  `getOperand` for the `CopyToValueTensorOp` now returns a
  `TypedValue<NonValueTensorType>`, which has an assert checking that
  the value returned is of the expected type.
2022-12-07 12:20:41 -08:00
Sean Silva b1f9e09f85 [torchdynamo] Add ResNet18 example with TorchDynamo
This is a minor variation on our other resnet18 examples swapping in
TorchDynamo.

We replicate the refbackend_torchdynamo_backend out of the e2e test
config to avoid making that appear like a public API.

Also, some minor cleanups to TorchDynamoTestConfig.
2022-12-07 09:25:27 -08:00
Daniel Ellis 98d80a642a
Publish releases to PyPI after build 2022-12-07 10:01:55 -05:00
Sean Silva c956c39c86 [cleanup] Remove disabled e2e test
This test has been disabled for a long time, and since RefBackend is so slow
we don't want to add this unnecessarily. I believe it is covered by
downstream testing such as the Shark Tank.
2022-12-07 06:36:48 -08:00
Sean Silva d52359a891 [docs] Add info about special e2e testing cases. 2022-12-07 12:53:07 +01:00
Vivek Khandelwal 3e4bb2bd8e [MLIR][TORCH] Add E2E support for randn and randn.generator op
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
2022-12-06 22:41:24 +05:30
Sean Silva 485c18bb2f [torchdynamo] Add "lockstep" numerical accuracy debugger.
Thanks to TorchDynamo's great layering and design, this is only about
100 lines of code for a basic lockstep debugger.

This should allow us to deprecate eager_mode, since AFAIK the only
interesting use case that it was really supporting is for downstream users to
write lockstep debuggers.

NOTE: The exact reporting and interface here is subject to change. Please
try it out and provide feedback (or patches :) ).
- make_fx should not drop source locations: https://github.com/pytorch/pytorch/issues/90276
- Report tensors better (huge tensors should be summarized)
- Maybe don't abort, but just warn?
- Allow customizing atol/rtol.
- How best to print the failing node? And include surrounding graph
context?
2022-12-06 07:57:45 -08:00
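The idea behind a lockstep debugger as in 485c18bb2f can be sketched without TorchDynamo. This toy is an assumption-laden illustration, not the code from the commit: steps are plain float functions, `run_lockstep` is a hypothetical name, and the tolerance parameters simply mirror the "customizing atol/rtol" item in the list above.

```python
import math
from typing import Callable, Sequence

Step = Callable[[float], float]


def run_lockstep(
    reference: Sequence[Step],
    candidate: Sequence[Step],
    x: float,
    rtol: float = 1e-5,
    atol: float = 1e-8,
) -> float:
    """Run both pipelines step by step and stop at the first mismatch."""
    for index, (ref_step, cand_step) in enumerate(zip(reference, candidate)):
        expected = ref_step(x)
        actual = cand_step(x)
        if not math.isclose(actual, expected, rel_tol=rtol, abs_tol=atol):
            raise AssertionError(
                f"step {index}: expected {expected}, got {actual}"
            )
        x = expected  # feed the reference value forward so errors do not compound
    return x
```

Feeding the reference value forward means each step is checked in isolation, which pinpoints the first step that diverges rather than reporting accumulated error.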
Vivek Khandelwal ef39b9ebb4 build: manually update PyTorch version
Set PyTorch and TorchVision version to nightly release 2022-12-05.

Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
2022-12-05 22:44:32 +05:30
Roll PyTorch Action 6c5360e281 update PyTorch version to 1.14.0.dev20221204 2022-12-04 14:28:48 +00:00
Roll PyTorch Action 8baa9e42e7 update PyTorch version to 1.14.0.dev20221203 2022-12-03 14:37:17 +00:00
Roll PyTorch Action fcc670d785 update PyTorch version to 1.14.0.dev20221202 2022-12-02 14:50:28 +00:00
Vivek Khandelwal f416953600 [MLIR][TORCH] Add TorchConversionToMLProgram and MLProgramBufferize pass
This commit changes the `InsertRngGlobalsPass` to `TorchConversionToMLProgram`
pass. This commit also adds the `MLProgramBufferize` pass for the
bufferization of ml_program dialect ops to run on refbackend.

Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
2022-12-02 13:20:46 +05:30
Eric Kunze 3fc27cf6ca
Update LLVM Tag to 2c1fa734 (#1670)
Summary of changes:
 - Change ShapedType::kDynamicSize -> ShapedType::kDynamic
 - llvm::NoneType has been deprecated, change convertScalarToDtype to use llvm::None
2022-12-01 20:38:28 -08:00
Sean Silva 88db99946b [torchdynamo] Use decompositions to support a few ops 2022-12-01 11:25:20 -08:00
Ramiro Leal-Cavazos b4b92c990e
Replace LCG algorithm with squares64 algorithm in AtenUniformOp (#1633)
This commit replaces the LCG algorithm that was being used by the
`TorchToLinalg` lowering of `AtenUniformOp` to generate random numbers
with the `squares64` algorithm, for the LCG algorithm was producing
tensors that were highly correlated with one another.

Squares64 algorithm: https://arxiv.org/abs/2004.06278

Closes https://github.com/llvm/torch-mlir/issues/1608
2022-12-01 08:30:10 -08:00
Roll PyTorch Action e66bf7b8cb update PyTorch version to 1.14.0.dev20221201 2022-12-01 15:01:09 +00:00
Vivek Khandelwal e7edcc62fd build: update llvm tag to 147fe9de
Summary of changes:
- Replace call to `MemoryEffectOpInterface::hasNoEffect`
  with `isMemoryEffectFree`.
- Make fix for the dynamic dims, since
  `kDynamicSize` value changed to
  `std::numeric_limits<int64_t>::min()` from `-1` in llvm
- `makeShapeLLVMCompatible` and `makeShapeTorchCompatible`
  utilities convert shapes in order to remain consistent
  with the Torch and MLIR semantics.
- Update tags
  llvm: 147fe9de29dc13c14835127b35280c4d95c8e8ba
  mhlo: 1944b5fa6062ec4c065d726c9c5d64f1487ee8c5

Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
2022-12-01 13:36:50 +05:30
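The shape conversion mentioned in e7edcc62fd can be sketched in Python. The two function names below are Python-style analogues of the utilities named in the commit, not their actual C++ signatures, and the sketch assumes the Torch side keeps `-1` as its marker for an unknown dimension.

```python
from typing import List

TORCH_DYNAMIC = -1         # assumed Torch-side marker for an unknown dimension
MLIR_DYNAMIC = -(2 ** 63)  # std::numeric_limits<int64_t>::min()


def make_shape_llvm_compatible(shape: List[int]) -> List[int]:
    """Rewrite Torch-style dynamic dims into the new MLIR sentinel."""
    return [MLIR_DYNAMIC if d == TORCH_DYNAMIC else d for d in shape]


def make_shape_torch_compatible(shape: List[int]) -> List[int]:
    """Rewrite MLIR-style dynamic dims back into the Torch marker."""
    return [TORCH_DYNAMIC if d == MLIR_DYNAMIC else d for d in shape]
```

For example, `make_shape_llvm_compatible([2, -1, 4])` replaces only the middle entry, and converting back restores the original list.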
Abhishek Varma 47f67853ac [RefineTypes] Add Float16Type dtype knowledge support for trivial ops
-- This commit adds Float16Type dtype knowledge support for trivial ops.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2022-12-01 10:22:43 +05:30
Ramiro Leal-Cavazos 0983a7f93a
Fix modulus calculation in LCG algorithm of refbackend (#1658)
The current implementation sets the `nextSeed` value to `temp & 127`,
which is wrong. The last step of the LCG algorithm for the multiplier
and increment chosen should be `temp % 2^{64} = temp & (2^{64} - 1)`.
However, because we are dealing with i64 values, the modulus
operation happens automatically, so it is not needed.

See Donald Knuth's values for LCG here:
https://en.wikipedia.org/wiki/Linear_congruential_generator
2022-11-30 08:46:52 -08:00
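The effect of the wrong mask fixed in 0983a7f93a can be shown with a small Python sketch. This is not the refbackend code: the function names are hypothetical, and the constants are Knuth's MMIX values from the linked Wikipedia page. Python integers are unbounded, so the wraparound that i64 arithmetic gives for free is written as an explicit mask.

```python
from typing import Callable

# Knuth's MMIX LCG parameters (modulus 2**64).
MULTIPLIER = 6364136223846793005
INCREMENT = 1442695040888963407
MASK64 = (1 << 64) - 1  # x % 2**64 == x & MASK64


def lcg_next(seed: int) -> int:
    """One LCG step; the mask emulates 64-bit wraparound."""
    return (MULTIPLIER * seed + INCREMENT) & MASK64


def lcg_next_buggy(seed: int) -> int:
    """Same step, but masked with 127, so only 7 bits of state survive."""
    return (MULTIPLIER * seed + INCREMENT) & 127


def distinct_states(step: Callable[[int], int], seed: int, n: int) -> int:
    """Count the distinct states visited in n steps starting from seed."""
    seen = set()
    for _ in range(n):
        seed = step(seed)
        seen.add(seed)
    return len(seen)
```

With the `& 127` mask the generator can visit at most 128 states, so it must start repeating almost immediately, whereas the full 64-bit step keeps producing new states.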