The current decomposition for `aten.randn.generator` does not specify
the `dtype` argument of the empty tensors created to store the random
values. This leads to invalid IR when the output type of the `randn`
op is not the default PyTorch dtype.
-- In Python we have the concept of negative dimension indexing.
-- We would want to normalize such dimensions to be +ve and within the
expected range instead.
-- This commit takes care of a few remaining set of Ops and their
lowerings by applying `toPositiveDim` and `isValidDim` to the
extracted integer `dim` value.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
-- This commit adds e2e support for atend.sort op.
-- 1. Adds aten.sort op in torch dialect.
-- 2. Adds tm_tensor.sort op in TMTensor dialect.
-- 3. Adds lowering of aten.sort -> tm_tensor.sort.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
-- This commit adds e2e support for aten.randint by decomposing it into
an aten.randint.low by setting low=0.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
This commits adds the support for cases for index_put_op:
1.) where index is a 2-d tensor.
2.) where indices is a list of tensors and none, with exactly
2 non none tensors along the consecutive dimensions.
This commit also adds a utility to compute the broadcast shape
given the two input tensors.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit also adds the support for non-unit output padding in the
case of transposed convolution.
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
This commit adds the ability to specify extra abstract interpretation
functions in `torch_mlir.compile` to use during type refinement. This
allows users to easily add custom ops without having to interact with
MLIR or C++ directly.
The ops `aten.convolution_overrideable` and
`aten.convolution_backward_overrideable` are currently not e2e tested
in Torch-MLIR. Moreover, there is no way to add e2e tests for them
because the ops cannot be called using the CPU backend (this also
prevents adding tested dtype functions for these ops). Since these two
ops are not expected to ever appear in PyTorch traces obtained through
standard means (https://github.com/pytorch/pytorch/issues/97481),
Torch-MLIR should not have to worry about them.
There are several ops that have their shape function upstream and had
not been updated in Torch-MLIR to use the upstream version. This
commit updates those shape function. In addition, TODOs have been
added for shape functions that should be upstream but are not.
The original design for the dtype functions outlined in
https://github.com/llvm/torch-mlir/issues/1462 was unable to properly
handle ops that take optional tensors as an input when the optional
tensor has a value of None. By the time the op gets imported into
torch-mlir, if an optional value is None, all information about the
original type is lost from the op type signature, preventing
torch-mlir from knowing if a value of None was from an optional tensor
or not, which was crucial in the original design since each tensor
argument must be turned into two separate arguments for the dtype
function.
This commit changes the interface to dtype functions such that each
tensor turns into a tuple of two ints, the first representing the rank
of the tensor and the second the dtype of the tensor. Since now there
is a one-to-one correspondence between the operands of an op and the
operands of its dtype function, there is no ambiguity about which
operand of the op corresponds with which operand of the dtype
function.
To test the implementation, this commit defines dtype function for
convolution op, which takes one optional tensor as an argument.
* implemented ceil_mode== true support for lowering aten.max_pool2d to tosa
* add e2e test for lowering aten.max_pool2d to tosa with ceil_mode=true
---------
Co-authored-by: Lisa Liu <lingl@xilinx.com>
The current implementation of `getScalarValue` does not check that the
input to a `ValueTensorLiteralOp` is an i64 before extracting the
value, and it does not check that the result type of the
`PrimNumToTensorScalarOp` is also an i64. This leads to crashes or
invalid IR generated when the `input` is something other than an i64
tensor or `!torch.int`.
This commit addresses those issues. In addition, the function
`getScalarValue` is renamed to `getScalarIntValue` to make it clear
that it *only* extracts scalar integers.
Set PyTorch and TorchVision version to nightly release 2023-02-27.
This commit also adds the lowering for aten.add and aten.Float.Scalar op.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
Random tensors used in e2e tests should be created using the
`TestUtils` object passed to the registered test case to ensure that
the compiled module and the golden trace receive the same tensors as
input. This commit changes all the cases of `torch.rand` and
`torch.randn` to use the `TestUtils` instead.
This patch replaces all MHLO operations with their StableHLO
counterparts and adds a validation pass to ensure that no MHLO operations
remain before translating all Stablehlo operations to the MHLO dialect
for further lowering to the Linalg dialect.
This patch also updates all lit tests so that they refer to the
`convert-torch-to-stablehlo` pass and so that they check for StableHLO
operations.
Week of 01/30/2023:
Green LLVM commit: e31ee6417c33a6e2f0e8440b1a86d5365279ad68
Green MHLO commit: c2a6f4064d426567b9ef7b0d29d5ab86dc7b2b02 (branch greencommit/2023-01-30-e31ee641)
This commit replaces the `tanh` dtype function, which was being used
to test the implementation of dtype functions in
a710237437, with a dtype function for
`expm1`. The dtype function for `expm1` is identical to the `tanh`
one, so the same level of testing is maintained.
Currently, there are ops getting dtype information from the
`RefineTypes` pass and ops getting dtype information from the
`TorchDtypeRefinementPipeline`. Since each pass can only propagete
dtype information for the ops it knows how to handle, some models with
many ops handled in both passes require the two dtype propagation
passes to execute many times, reaching the iteration limit set in the
`LowerToBackendContractPass`. To temporarily avoid this issue while
the migration to `TorchDtypeRefinementPipeline` is finished, this
commit switches `tanh` to `expm1`, since the latter is used a lot less
in large models.
This reverts commit eaab9be207, since it
is causing the post-merge CI tests to fail, causing subsequent PRs to be
blocked. Specifically, the tests
`ElementwiseAtenLogicalAndOpPromoteBroadcastModule_basic` and
`ElementwiseAtenLogicalXorOpPromoteBroadcastModule_basic` fail because
the oracle does not match the computed result. This patch reverts the
commit to make the post-merge builds green again.
This commit adds support for passing to `torch_mlir.compile` the
result of running `torch.jit.trace` on a model by relaxing the
condition that checks if the model is already in JIT IR to allow any
`torch.jit.ScriptModule`.
Fixes https://github.com/llvm/torch-mlir/issues/1739
-- The dtype of the result of `aten.embedding` should match that of
the `weight` operand's (operand[0]) instead of hardcoding to f32.
-- This commit aims to provide a fix for the same.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
pytorch/pytorch@140a3139 reverted a change from yesterday, causing the
RollPyTorch action to break. This patch reverts the corresponding
change in the torch-mlir LTC code.
This patch also re-enables tests that were previously marked as XFAIL.
As [@ezyang suggested](https://github.com/pytorch/pytorch/issues/90276#issuecomment-1339791275),
use `torch._dynamo.optimizations.training.aot_autograd` instead of raw
`make_fx`. This is more future proof and gives us the backward pass and
functionalization. We don't currently get functionalization because of
https://github.com/pytorch/pytorch/issues/90759
This also incidentally fixes the source location handling, which makes
`lockstep_basic.py` give an accurate source location!
* [custom op] Generalize shape library logic to work with dtypes
This commit generalizes the shape library logic, so that dtype rules
for ops can also be expressed using the same mechanism. In other
words, each op can now have a shape function and a dtype function
specified in Python that is imported during lowering to calculate the
shapes and dtypes throught a program. For more information about how
to specify a dtype function, see the updated
`docs/adding_a_shape_and_dtype_function.md`.
For those not familiar with how the shape library works, the file
`docs/calculations_lib.md` provides an overview.
This was an experimental attempt at rolling out own op-by-op executor
with `__torch_dispatch__`, but it proved difficult to make it robust.
Op-by-op execution is very easy to implement robustly now with the
PyTorch 2.0 stack, so we don't need eager_mode.
Downstream users were using eager_mode to implement lockstep numerical
accuracy debuggers. We implemented the same functionality with
TorchDynamo in https://github.com/llvm/torch-mlir/pull/1681 so now there
is not much reason to continue maintaining it.
This gives some decent improvements to memory consumption and latency of
testing. I would have expected buffer-deallocation to actually make a
big difference to the final process RSS but it doesn't appear to. Also
running buffer-deallocation later in the pipeline results in
miscompiles. I didn't have the time or interest to dig in deeper, but
something is off.
(numbers below are taken from a single run, but I did do a few runs to make
sure that the variance wasn't that great)
- Linalg-on-Tensors shows memory consumption improvements and some slight speedups.
```
./tools/e2e_test.sh -s -v -c refbackend
fuse=0 dealloc=0
RSS: 3071.33 MB
real 3m58.204s
user 6m22.299s
sys 0m51.235s
fuse=1 dealloc=0
RSS: 2515.89 MB
real 3m34.797s
user 5m56.902s
sys 0m44.933s
fuse=1 dealloc=post-bufferize:
RSS: 2290.25 MB
real 3m42.242s
user 6m0.560s
sys 0m46.335s
```
- TOSA ResNet18 gets significantly faster and uses significantly less memory.
```
time ./tools/e2e_test.sh -s -v -c tosa -f ResNet18
fuse=0 dealloc=0
rss 1328.56 MB
real 0m50.303s
user 0m55.355s
sys 0m12.260s
fuse=1 dealloc=0
rss 859MB
real 0m30.454s
user 0m35.551s
sys 0m11.879s
fuse=1 dealloc=post-bufferize:
rss 851MB
real 0m30.313s
user 0m39.889s
sys 0m11.941s
```
Big thanks to Ramiro for the methodology here for measuring the RSS with
`psutil`:
https://gist.github.com/ramiro050/5b5c2501f7389c008d9029210772c3a8
- Support for non-prefixed accessors has been removed. See:
https://reviews.llvm.org/D136727
- Rename `operands` to `methodOperands` in `prim.CallMethod` since the
name `operands` overlaps with a builtin method name. See:
https://reviews.llvm.org/D136727
- Add passes in refbackend to lower memref.subview. See:
https://reviews.llvm.org/D136377
- Replace `CopyToValueTensorOps` first in `RewriteViewLikeSubgraph` in
maximize-value-semantics.
The current implementation of the `RewriteViewLikeSubgraph` pass in
maximize-value-semantics creates temporarily invalid IR. In
particular, given a forward slice starting from a
`CopyToNonValueTensorOp` and ending in `CopyToValueTensorOp`s, the
pass first replaces all uses of the `CopyToNonValueTensorOp` with
its operand, which results in all the `CopyToValueTensorOp` users
having their operand have type `!torch.vtensor`, which is invalid.
The correct way to do things is to first replace all the
`CopyToValueTensorOp`s with their operand, and then replace all uses
of the `CopyToNonValueTensorOp` with its operand.
This only started failing now because the generated accessor
`getOperand` for the `CopyToValueTensorOp` now returns a
`TypedValue<NonValueTensorType>`, which has an assert checking that
the value returned is of the expected type.
This is a minor variation on our other resnet18 examples swapping in
TorchDynamo.
We replicate the refbackend_torchdynamo_backend out of the e2e test
config to avoid making that appear like a public API.
Also, some minor cleanups to TorchDynamoTestConfig.
This test has been disabled a long time, and since RefBackend is so slow
we don't want to add this unnecessarily. I believe it is covered by
downstream testing such as the Shark Tank.
Thanks to TorchDynamo's great layering and design, this is only about
100 lines of code for a basic lockstep debugger.
This should allow us to deprecate eager_mode, since AFAIK the only
interesting use case that it was really supporting is for downstream users to
write lockstep debuggers.
NOTE: The exact reporting and interface here is subject to change. Please
try it out and provide feedback (or patches :) ).
- make_fx should not drop source locations: https://github.com/pytorch/pytorch/issues/90276
- Report tensors better (huge tensors should be summarized)
- Maybe don't abort, but just warn?
- Allow customizing atol/rtol.
- How best to print the failing node? And include surrounding graph
context?
This commit changes the `InsertRngGlobalsPass` to `TorchConversionToMLProgram`
pass. This commit also adds the `MLProgramBufferize` pass for the
bufferization of ml_program dialect ops to run on refbackend.
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
This commit replaces the LCG algorithm that was being used by the
`TorchToLinalg` lowering of `AtenUniformOp` to generate random numbers
with the `squares64` algorithm, for the LCG algorithm was producing
tensors that were highly correlated with one another.
Squares64 algorithm: https://arxiv.org/abs/2004.06278
Closes https://github.com/llvm/torch-mlir/issues/1608
The current implementation sets the `nextSeed` value to `temp & 127`,
which is wrong. The last step of the LCG algorithm for the multiplier
and increment chosen should be `temp % 2^{64} = temp & (1 <<
63)`. However, because we are dealing with i64 values, the modulus
operation happens automatically, so it is not needed.
See Donald Knuth's values for LCG here:
https://en.wikipedia.org/wiki/Linear_congruential_generator
There are a few e2e tests that take several very large tensors as
input, which leads to the e2e test suite leaking too much
memory. Running things locally resulted in a total memory usage of
12.5 GB when running the suite sequentially on the refbackend.
Many of the tests that take large tensors don't actually need
such large tensors to pass, and some that take several large tensors
as input are just doing the same thing multiple times. This commit
reduces the size of some of the tensors and removes repetitive parts
of tests to reduce the memory usage to a total of 3 GB.
`np.bool is bool` and will never be returned as a dtype of an
`np.ndarray`, so we don't need to handle it here.
```
>>> a = np.ndarray([1], dtype=bool)
>>> a.dtype.type is np.bool_
True
```
More info here:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
For reasons that I haven't yet fully tracked down, the TorchDynamo
TestConfig seems to result in tensors that cannot be pickled. They seem
to be holding some sort of weak handles to a `torch.fx.graph.Graph`.
Here is the object structure that leads to the unpickleable object:
```
(<function _rebuild_tensor_v2 at 0x7f56346d56c0>, <class 'torch.Tensor'>, ( 1.0...
{<object object at 0x7f557529e6b0>: <WeakKeyDictionary at 0x7f556a3efbb0>}
{'data': {<weakref at 0x7f5615372ed0; to 'PythonKeyTracer' at 0x7f556a3ee5c0>: _...
<class 'torch.fx.graph.Graph'>
<class 'torch._ops.OpOverloadPacket'>
TypeError("cannot pickle 'torch._C.FunctionSchema' object")
```
Upstream bug filed: https://github.com/pytorch/pytorch/issues/89626
This adds a basic e2e Config for TorchDynamo using
Linalg-on-Tensors/RefBackend.
But TorchDynamo is pretty orthogonal to
various other pieces, so it should compose nicely with variations like:
- Switching out all the backends (Linalg-on-Tensors, TOSA, MHLO)
- PyTorch functionalization and decompositions
- Taking the example inputs and compiling with all dynamic or all static
shapes without duplicating tests.
This adds it to the CI, but there are still a lot of XFAIL's.
This also adds a helper `from torch_mlir.dynamo import
make_simple_dynamo_backend` which simplifies some of the steps for
making a Torch-MLIR-based TorchDynamo backend. We include "simple" in
the name because we are going to be exploring various things next from
the long-term roadmap.
The next steps are:
- Burn down all the XFAIL's.
- Start working on the pieces from the [long-term roadmap](https://github.com/llvm/torch-mlir/blob/main/docs/long_term_roadmap.md).
- Add functionalization/decompositions into the TorchDynamo flow and
remove reliance on the current Torch-MLIR "frontend".
- Write a pure-Python direct FX->MLIR importer.
- Hook up the new PyTorch symbolic shape stuff.
- Explore PrimTorch decompositions for simplifying backends.
This commit fixes the aten.mean and aten.mean.dim op decomposition
for supporting large-sized inputs.
This commit also fixes the formatting for the file stats.py
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
The purpose of the test suite is to accelerate the development of the
compiler. However, we had various tests there that were not expected to
work, had no in-progress work being tested by the test, and nobody was
actively working on them. Having such tests in our test suite just adds
clutter and slows down development on the compiler.
-- aten.upsample_nearest2d.vec op is not present
owing to https://github.com/pytorch/pytorch/pull/85638
-- So this commit adds a lowering on aten.upsample_nearest2d.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
For AoT deployments models often have multiple exported methods.
This patch enables something like this:
```
class TwoMethodsModule(torch.nn.Module):
def sin(self, x):
return torch.ops.aten.sin(x)
def cos(self, x):
return torch.ops.aten.cos(x)
example_args = torch_mlir.ExampleArgs()
example_args.add_method("sin", torch.ones(2, 3))
example_args.add_method("cos", torch.ones(2, 4))
print(torch_mlir.compile(TwoMethodsModule(), example_args))
```
In the
[long-term](https://github.com/llvm/torch-mlir/blob/main/docs/long_term_roadmap.md#tools-for-advanced-aot-deployments)
we will need to reconcile this with our story for stateful models and the
backend contract being purely functional. For now, this provides some basic
infra that seems harmless. Arguably, we could tighten up the backend contract
even more to only allow a single compiled function which would prohibit this or
require building out a layer above.
Fixes#1557
Unless requested otherwise, PyTorch no longer installs most of the
header files under the caffe2 directory (see
https://github.com/pytorch/pytorch/pull/87986). This breaks our
importer code since we need to use the `MakeGuard()` function to execute
statements in the event of exceptions.
To fix this issue, this patch implements a rudimentary version of
PyTorch's ScopeGuard, where once the class variable goes out of scope,
it executes a predefined method.
This commit removes almost all of the valsem ops, since the value
semantics version of the ops now exist in PyTorch. The only op missing
is `aten.bernoulli_.float`. In addition, this commit also simplifies
the implementation of `aten.fill.Scalar` by moving it to the pattern
that converts elementwise ops.
* Add LazyGraphExecutor registration
* Update PyTorch version to 1.14.0.dev20221024
Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
* ci: cache PyTorch source builds
This patch reduces the time spent in regular CI builds by caching
PyTorch source builds. Specifically, this patch:
1. Makes CI lookup the cache entry for the PyTorch commit hash in
pytorch-version.txt
2. If lookup was successful, CI fetches the previously-generated WHL
file into the build_tools/python/wheelhouse directory
3. CI sets the `TM_PYTORCH_INSTALL_WITHOUT_REBUILD` variable to `true`
4. The build_libtorch.sh script then uses the downloaded WHL file
instead of rebuilding PyTorch
* ci: warm up PyTorch source cache during daily RollPyTorch action
This patch makes the RollPyTorch action write the updated WHL file to
the cache, so that it can be later retrieved by CI that runs for each
PR. We deliberately add the caching step to the end of the action since
the RollPyTorch action never needs to read from the cache, although
executing this step earlier in the process should not cause problems
either.
This commit makes the following changes needed to update bump LLVM:
- Replace `linalg.init_tensor` with `tensor.empty` (see:
https://reviews.llvm.org/D135129)
- Replace `NoSideEffect` with `Pure` (see
https://reviews.llvm.org/D135505)
- Replace `body` region accessor for `ReduceOp` and `ReduceWindowOp`
with `getBody`
- Fix incorrect use of `tosa::ReduceSumOp` in `AtenNativeLayerNormOp`
conversion pattern. The result type of `tosa::ReduceSumOp` must have
the same rank as the input type. (see:
https://www.mlplatform.org/tosa/tosa_spec.html#_reduce_sum)
Co-authored-by: Ashay Rane <ashay@users.noreply.github.com>
Co-authored-by: Ashay Rane <ashay@users.noreply.github.com>
This commit replaces test inputs that were being linearly transformed
by multiplying and adding/subtracting to the input tensor with inputs
that use the `low` and `high` keyword arguments instead.
We originally added these to help bring up more complex models with
heavier dependencies. However, over time it has become clear that these
models usually require more than just heavier dependencies -- they often
require a nontrivial amount of "one-off" code to extract the relevant
parts of the model and compile them. This is not a good fit for a
component in the core Torch-MLIR repo.
However, in the community, nod.ai has developed the ["Shark
Tank"](https://github.com/nod-ai/SHARK/tree/main/tank) which has all the
appropriate code to wrangle these models and organize them. We intend to
more heaviliy lean on that as a community and improve the symbiosis
there to serve the role that these heavydep tests were meant to play.
Allow customizing `backend_legal_ops` for "torch" output type, since we
don't know which backend will be used (it might be a custom backend).
We don't allow customizing the `backend_legal_ops` for the other output
types (Linalg, TOSA, MHLO) since those backends control their set of
legal ops directly.
Fixes#1418
-- This commit adds e2e support for `aten.Mish` op.
-- `aten.Mish` op is decomposed as following :-
Mish(x) = x * Tanh(Softplus(x))
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
* build: disable LTC again so that we can bump PyTorch version
When built using PyTorch's master branch, the LTC code has been failing
to build for a few days. As a result, the PyTorch version referenced by
Torch-MLIR is stalled to the one from October 4th.
In an effort to advance to PyTorch version, this patch disables LTC, and
a subsequent patch will advance the PyTorch version.
* update PyTorch version to 1.14.0.dev20221010
Also disables the `UpSampleNearest2dDynamicFactor_basic` e2e test, since
the (PyTorch) oracle differs from the computed value for both the
refbackend and the eager_mode backends.
This commit adds lowering of `aten.div.int` and `aten.bitwise_or.Tensor`
ops. Both these ops are required in order to support bloom_560m model.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
* Fix c10::prim::Constant conversion; Added CAPI for passes; Added passes to base lazy backend
* Update ivalue_importer to use ImportOptions; Added tests for non-value/value tensor types
* Added tests for scalar Constant import; Updated MB::importFunction to use ImportOptions
* Test updates
* Move back module variable name
* Remove RefineTypes from TorchMlirLoweringContext::Build()
* Rename pass; Remove passes from base lazy backend
* Rename pass to VerifyBackendContractPass
* Aligned cmd pass name; Fixed TorchConversion passes registration
* test: allow spaces in path to Python executable
On Windows, the path to the Python binary may contain spaces, so this
patch adds quotes around the path to the python executable.
Thanks to @sstamenova for suggesting the fix!
* python: remove header file that causes Windows build failures
Similar to https://reviews.llvm.org/D125284, we can safely remove this
header file without affecting the build on either Linux. It is
necessary to remove this header file on Windows builds since otherwise
it causes build errors.
* python: drop `TORCH_API` from function defined in Torch-MLIR
`TORCH_API` should apply to functions that are either exported by
libtorch.so or ones that are imported from libtorch.so by its downstream
consumers (like Torch-MLIR). Neither case applies to the
`importJitFunctionAsFuncOp()` function, since it is defined in
Torch-MLIR (and thus outside libtorch.so). This patch fixes the problem
by dropping `TORCH_API` from that function's declaration.
* python: make output of class anotations deterministic
The `class-annotator-repr.py` test checks for class annotations in a
specific order, but prior to this patch, the order was
non-deterministic, since the code iterated on an _unordered_ map.
This patch makes the iteration order deterministic through two changes:
1. using a sorted map
2. using the class qualified name instead of the address of the class in
memory
* test: use Python3_EXECUTABLE as interpreter path for consistency
This ensures that tests use the Python3 version that was detected using
CMake, instead of whichever python version that happens to be in the
PATH variable when invoking the test.
* test: fix RUN string
The parenthesis syntax does not run on Windows (the shell interprets the
`(` character as part of the path). Moreover, the ODR violation in the
comment no longer seems to apply.
* python: port parallel test framework to Windows
Since Windows does not support `fork` natively, Python's
`multiprocessing` module needs to use `spawn` on Windows. However, to
use `spawn`, the multiprocessing module serializes (or pickles) the
worker function and its arguments. Sadly, the multiprocessing module
(both the default one in Python and the one that is extended in PyTorch)
is unable to serialize lambda functions (see
https://stackoverflow.com/a/19985580) for detals.
Unfortunately, given how our tests are structured, we require that the
function under test is passed as an argument to another function, so we
cannot sidestep our use of lambda functions.
To resolve this problem, this patch makes use of the `multiprocess` and
`dill` Python modules, which together offers a multiprocessing mechanism
that can serialize lambda functions. The multiprocess module also
offers a process pool, which simplifies the code for our parallel
testing framework.
This commit adds support for TorchToTosa lowering of
`aten.broadcast_to` op for cases:
1.) When the rank of input and output tensor is equal.
2.) When the rank of input tensor is zero.
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
* Propagate parameter name to MLIR
* Add TorchMlirNode Constructor Hook
* Make func_op mutable
- Purpose of this is to allow modification of func_op by subclass
backend
* Clean up unnecessary changes
* Remove unnecessary attribute case
* Address PR comments
This adds a very long and obnoxious option to disable crashing tests.
The right fix here is to use the right multiprocessing techniques to
ensure that segfaulting tests can be XFAILed like normal tests, but we
currently don't know how to implement "catch a segfault" in Python
(patches or even just ideas welcome).
Motivated by #1361, where we ended up removing two tests from *all*
backends due to a failure in one backend, which is undesirable.
Strength the shape inference for aten.arange-like op by
1. registering aten.sub and aten.ceil.Scalar op and design folders for them.
2. register a new constant-like op: Torch::ConstantNumberOp and design canonicalizer for it.
As @oroppas identified, literal strings that are over 16,380 characters
cause the MSVC compiler to throw an error (C2026), eventually causing
the Windows build of Torch-MLIR to fail because the length of the
generated MLIR for the shape library crosses the allowed threshold.
This patch fixes the problem by making the Python script generate one
literal string per line to satisfy the MSVC compiler.
Thanks to @oroppas for the bulk of the effort required to resolve this!
Summary of changes:
- Updated emitAccessorPrefix since the default value has changed
(https://reviews.llvm.org/D133179)
- Updated RefineTypes pass since Lattice::isUninitialized() is removed
(https://reviews.llvm.org/D132800)
- Updated MHLO tag so that it builds with the updated LLVM tag
- Disabled two tests that cause segfaults in the TOSA backend (see Issue
#1361)
* Add aten.frobenius_norm.dim op and init its conversion pattern to linalg and MHLO,
* run symbolic-shape-optimization before hlo-legalize-to-linalg to fit more mhlo e2e tests.
Summary of changes:
- Update the dataflow analysis in RefineTypes.cpp
- Add tosa-to-arith pass after tosa-to-linalg pass, since
tosa-to-linalg (and canonicalizations) can produce tosa.const() ops
- Fixed warning about not making `matchAndRewrite` as override
This commit adds decomposition of `aten.linear` op. Due to limited
support at tosa backend in case of dynamic dimensions, this
decomposition is currently disabled for tosa backend.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
We use it for more than TorchScript testing now. This is a purely
mechanical change to adjust some file paths to remove "torchscript".
The most perceptible change here is that now e2e tests are run with
```
./tools/e2e_test.sh
instead of:
./tools/torchscript_e2e_test.sh
```
Change logic so that we never run the multiprocessing codepath with only
1 worker. That configuration was causing all subsequent tests to
spuriously fail if one test failed with a crash (this was easy to see
after sorting the tests). That configuration was the one used by the CI.
Also, sort tests to make output nicer.
Also, make verbose mode more verbose so that it is easy to see in `-s`
mode which test is crashing.
This commit adds a method to `TestUtils` that generates random integer
tensors with a similar interface to the `TestUtils.rand`. This commit
also replaces with `tu.randint` all test inputs generated with
`torch.randint`.
We were already hitting many cases where backends different in terms of
the legal ops that they wanted. This caused unnecessary coupling between
the backends. Examples:
- https://github.com/llvm/torch-mlir/pull/1161
- https://github.com/llvm/torch-mlir/pull/862
This PR centralizes all compilation to go through `torch_mlir.compile`
so that we can keep the logic centralized there. We should move these
lists closer to each backend. Especially cases like
https://github.com/llvm/torch-mlir/pull/862 where blocking a
decomposition is necessary to avoid a crash emphasize that the set of
decompositions is tightly coupled to the backend, and should be
"controlled by the backend" and not something arbitrarily tweakable.
Also:
- Fix a small bug in the way we passed through the backendLegalOps
option.
- Add better error messages in `torch_mlir.compile` for import errors.
I recently fixed the handling of the `dim` argument in
`sum_mean_dim` (59fccab857). Therefore,
the checks that the `dim` input is `None` or `[]` are no longer needed.
Bumps the shape library:
- Updates the function signature for aten.arange.start_step
- upstream_shape_functions.mean_dim -> upstream_shape_functions.sum_mean_dim
* Propagate device data names
* Address PR comment
* Add example usage
* Add test for device data names
* Make TorchMlirComputation fields protected
* Add lazy backend device data name unit tests
* Disable lazy backend tests if LTC is disabled
* Add comments
* mac m1 cross compile
Add support for M1 cross compile
* Remove redundant ExecutionEngine
It is registered as part of RegisterEverything
* nuke non-universal zstd
disable LTC
follow up #761:
This patch updates the `torch_mlir::convertTensorToMlirElementsAttr()`
method to enable the creation of tensors whose base type is Float16.
This patch also adds a test to validate the IR generation, and it
updates the test for importing tensors of various types.
PyTorch recently added support for `dim=None` in the `torch.var`
(5ca9b2b6fa)
and `torch.std`op (eb0e30e0bc).
This commit adds the corresponding support in torch-mlir.
Signed-Off By: Vivek Khandelwal<vivek@nod-labs.com>
In some cases, users know that a traced graph is valid for a wider set
of shapes than they originally traced it with. Provide an option for
users to ignore the shapes in the traced graph when they know it is
legal.
Fixes#997
* Replace CHECK_EQ with TORCH_CHECK_EQ
* Check value of TORCH_MLIR_USE_INSTALLED_PYTORCH during LTC build
* Update LTC XFAIL with NewZerosModule ops
* Explicitly blacklist _like ops
* Automatically blacklist new_/_like ops
* Prune away unused Python dependencies from LTC
* Add flag to disable LTC
* Autogen dummy _REFERENCE_LAZY_BACKEND library when LTC is disabled
* Implement compute_shape_var
* Removed Var tests from XFAIL Set
* XFAIL tests using _local_scalar_dense or index.Tensor
* Add StdDim tests to XFAIL set
* Autogen aten::cat