844 Commits (aca33f1742096e7e6cb3152be15140cf9f71e508)

Author SHA1 Message Date
zjgarvey 140cad5659
Add More Scalarize Shapes Patterns (#3810)
### new patterns:

1. Propagates `aten.broadcast_to` ops of a single value to an
`aten.full` op
2. Propagates arithmetic operations through a templated class which
associates some tensor arithmetic ops to their integer-scalar
counterparts. These are a major blocker right now, since some models
have a bunch of rank 0 arithmetic being done with tensor ops. See the
lit test for an interesting example that pads an input to the smallest
shape which will become divisible by twelve in `dim0`. If you think this
is convoluted, you haven't been staring at ONNX generated IR long
enough.
3. Adds a stronger folder for `aten.eq.int` to fold `size.int == 0` to
`false`. See the comment in that conversion pattern for more
justification as to why it is acceptable to make this assumption here.
This is another major blocker for models, since this lack of folding
propagates to lack of folding for subsequent `where.self` operations.
4. Add `AtenSqueezeDim` to the existing `FoldAtenSqueezeOpPattern`
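
The padding computation mentioned in item 2 can be pictured with plain integers. The following is only a sketch of the scalar arithmetic such a shape computation reduces to once the rank 0 tensor ops are scalarized; the function name `pad_to_multiple` is hypothetical and the actual lit test IR is not reproduced here.

```python
def pad_to_multiple(dim0: int, multiple: int = 12) -> int:
    """Return how many elements to append so that dim0 becomes divisible by `multiple`.

    Hypothetical helper for illustration; not taken from the patch.
    """
    # The outer modulo maps the "already divisible" case to 0 instead of `multiple`.
    return (multiple - dim0 % multiple) % multiple


if __name__ == "__main__":
    for n in (1, 11, 12, 13, 25):
        pad = pad_to_multiple(n)
        print(f"dim0={n:2d} pad={pad:2d} padded={n + pad}")
```

Written as tensor ops, each `%`, `-` and `+` above is a separate rank 0 tensor operation, which is why patterns mapping them to their integer-scalar counterparts matter.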

### other changes:
 
1. Add two new anchor ops: `AtenArangeStartStepOp` and
`Torch::RuntimeAssertOp`. I've checked all possible sources of the
runtime assert ops and it is always shape related. The Arange op only
takes int inputs, and these are all shape related. Adds a size check to
getting a list from literal ops.
2. Improved folders for int arithmetic ops to fold some common patterns.
3. Adds the ability to get some values from scalar-tensor ops to
getListFromTensor.
4. Further cleans up getListFromTensor for readability.

### points to scrutinize:

1. I made the choice to scalarize `div.Tensor` (int dtype result) to
`floordiv.int`. This is because our shape computations involving this
kind of arithmetic are never negative in practice, and we don't have a
"round towards zero" scalar int divide counterpart.
2. Anchoring on `RuntimeAssertOp` sounds really suspicious, and if
someone happens to add a runtime assert in the future that doesn't boil
down to shapes, then it would add to the worklist considerably. We might
be able to get around this by adding "NoMemoryEffect" to ops which are
"ReadOnly" so that the inputs for the runtime asserts get cse'd with
existing elements of the worklist before we even get to this pass.
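
To make the first point concrete, here is a small sketch of how floor division and round-towards-zero division differ. It assumes, as the note above implies, that the tensor op would round towards zero for integer results; the helper names are hypothetical.

```python
def floordiv(a: int, b: int) -> int:
    """Round the quotient towards negative infinity (Python's `//`)."""
    return a // b


def truncdiv(a: int, b: int) -> int:
    """Round the quotient towards zero (hypothetical helper for comparison)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


if __name__ == "__main__":
    for a, b in [(7, 2), (12, 4), (-7, 2), (7, -2)]:
        print(a, b, floordiv(a, b), truncdiv(a, b))
```

The two agree whenever both operands are non-negative, which is the case the scalarization relies on for shape arithmetic; they differ only when the exact quotient is negative and not an integer.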
2024-10-21 19:42:39 -05:00
zjgarvey a83e106f92
Rework Scalarize Shapes Pass (#3799)
This is a first step towards reworking the scalarize-shapes pass which
has been integral to our ONNX frontend path detangling shape
computations.

## Purpose:

1. Restrict the scope of the pass to only apply to op sequences which
are used to compute shapes.
2. Make the pass more efficient by applying patterns in an appropriate
order for scalarization propagation.
3. Report failed scalarization patterns for easier debugging (Not yet
implemented). I can't seem to find a good path for this right now to
capture the right diagnostics. I'd like to defer this addition to a
later patch so we can add some high-value patterns to this pass in the
meantime.

With these changes, some reworking of the conversions themselves will be
necessary.

1. The removal of the SqueezeDim fold pattern was an appropriate fix to
avoid folding a pattern that may be needed to propagate further. The
reversal of pattern application order uncovered this bug. The addition
of rank 0 item logic was added to replace the functionality needed from
the squeeze dim pattern.
2. Rework getListFromTensor to modify a `SmallVector<OpFoldResult>` to
allow processing value tensor literals without immediately materializing
the ints. This should factor out a significant portion of code that was
used in specific cases to handle constants.

## RFC 1:

Currently, we are going to add all prim list of int ops to the worklist.
Can anyone identify problems with uniformly anchoring on prim lists of
ints? E.g. Does there exist a Torch Op satisfying all of the following
conditions:

1. Accepts a list of constant ints, LIST, as an input
2. The role of LIST is **not** shape related. All the examples I can
think of are indeed shape related: padding ints passed to a pad op,
kernel size ints passed to a conv op, size ints passed to a view op,
etc.
3. The LIST is not gotten entirely from scalars already.

If there does not exist a torch op satisfying all three of those
conditions, I think it will be safe to "anchor" on prim lists of ints.

### Conclusion for RFC 1: 

I just scanned through the `GeneratedTorchOps.td` and `TorchOps.td` for
all references of `AnyTorchListOfTorchIntType` and verified this will
not be problematic to apply in any of those cases.

## RFC 2:

What should I use to report failed scalarization?

Like my dumb idea was just to walk back through the func op after
applying the passes and check if anything in the worklist is still a
tensor. If so, emit/log a warning. It certainly works, since you can
just look at the warnings and start debugging from the last printed
warning upwards, but there has to be a better way to handle this without
walking back through the func.func op.

### Conclusion for RFC 2:

I tried a few things without much success. The fundamental problem is
that identifying the cause of a failed scalarization could be myriad:

1. We could be missing a pattern for an op entirely: E.g., a pattern we
need is scalarizing rank0 arithmetic ops (e.g. AtenMulTensorOp ->
AtenMulIntOp).
2. We could fail a scalarization pattern because it should fold instead.
This is specifically the case for rank0 where.self ops. These ops MUST
fold, or we need to have custom lowering logic for the rank 0 case.
3. Walking through the func op a second time and emitting a warning for
ops that have tensor result types seems to give locations that are
inconsistent or hard to track in the converted IR. Doing this on IR that
doesn't apply any patterns seems to give decent information, but it's
still dramatically insufficient considering how complex these patterns
can get, and still takes manually reading IR to try and figure out what
is really blocking the simplification.

I'd like to skip out on fleshing out the error reporting for now and
come back to it after iterating a few times on the patterns.
2024-10-21 12:47:19 -05:00
Vivek Khandelwal fa4794dae2
[MLIR][TORCH] Add torch-onnx-to-torch-backend pipeline (#3801)
This commit adds the torch-onnx-to-torch-backend pipeline which
converts the Torch Onnx IR to Torch Backend IR.

This commit also moves the `ScalarizeShapes` pass from the
`torch-backend-to-linalg-on-tensors-backend-pipeline` to the
`torch-onnx-to-torch-backend` pipeline since the primary goal of
this pass is to scalarize the shapes in the IR coming from the
Onnx models.
2024-10-21 11:20:44 -05:00
Vivek Khandelwal 9c7067649b
build: manually update PyTorch version (#3727)
Set PyTorch and TorchVision version to nightly release 2024-10-15.

Tracker issue for the failing tests added to xfail_set in this PR.
Issue: https://github.com/llvm/torch-mlir/issues/3796
This commit disables the failing sparse tensor tests since they are not
maintained on a day-to-day basis and block the PyTorch roll update for now.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-10-18 13:32:14 +05:30
yyp0 dc7a1ff7d9
[Torch] add fold logic for some ops (#3794) 2024-10-16 16:00:58 +08:00
zjgarvey 1e431c6a90
Add AtenSliceTOp Canonicalization to SimplifyShapeCalculations pass (#3791)
Some ops were failing to infer the static component of partially dynamic
shapes, and the cause was a missing aten.slice.t pattern.

The lit test included here is an IR dump created before
DropAbstractInterpCalculations for an unflatten op that was failing to
infer shapes before the change.
2024-10-14 14:41:31 -05:00
Marius Brehler edd1bbec46
Integrate LLVM at llvm/llvm-project@c13f806 (#3789) 2024-10-14 15:00:45 +02:00
yyp0 b176939808
[Torch] support 1d aten tensor shape and dtype infer (#3776) 2024-10-12 17:51:15 +08:00
zjgarvey ab62f35373
Add more patterns to scalarize-shapes pass (#3781)
-Adds patterns for propagating shapes through AtenWhereSelf and
AtenEqTensor
-Adds fold pattern for a rank0 squeezeDim of a full op 
-Adds support for getting a list from a splat ValueTensorLiteralOp for
materializing scalar comparisons in where.self and eq.tensor

With a bit of hammering, these changes should unblock several IREE
inference failures.
2024-10-11 11:15:17 -05:00
yyp0 7b11dfc0ee
[Torch] support adaptive_max_pool1d when return_indices equals False (#3783) 2024-10-11 23:42:15 +08:00
Ian Wood 8787970afe
[Torch] Fold no-op reshape (#3769)
This was preventing dynamic dims in an ONNX model from being reified (causing the generation of `tensor.cast`s and preventing fusion in iree):

```mlir
%2 = torch.vtensor.literal(dense<[4, 256]> : tensor<2xsi64>) : !torch.vtensor<[2],si64>
%7 = torch.prim.ListConstruct %int2 : (!torch.int) -> !torch.list<int>
%8 = torch.aten.reshape %2, %7 : !torch.vtensor<[2],si64>, !torch.list<int> -> !torch.vtensor<[2],si64>
//... chain of foldable ops linking %2 to the `shape` operand of a `torch.aten.broadcast_to ... -> !torch.vtensor<[?,?],si64>`
```
2024-10-10 18:54:27 -07:00
zjgarvey 2665ed343b
adds a few common patterns to scalarize shapes pass (#3779)
This patch adds two things:

1. support for folding scalar patterns like [1]---squeeze--->[]
---unsqueeze--->[1].
2. a canonicalizer for aten.view that applies when we can statically or
dynamically (through the scalarized view shapes) infer that it is a
flatten or unflatten op in the last dim.

I'm not sure if this is the right place to be adding such a view
canonicalizer. Catastrophically, there is a decomposition from flatten
and unflatten into aten.view. Until this gets deleted (and it definitely
should be deleted), I felt like this would be an appropriate temporary
home. We run scalarize shapes after lowering to the backend contract
(i.e., decomposing), and scalarize shapes is required to be able to
infer dynamic dims coming from size int ops.
2024-10-10 10:16:45 -05:00
Stephen Baione d49eabb3fc
Add Op for `torch.aten.unfold` (#3772)
# Description

Implementation of the op for `torch.aten.unfold`: [TorchToLinalg Op
Support #347](https://github.com/nod-ai/SHARK-ModelDev/issues/849)

Documentation of op can be found here: [PyTorch
Docs](https://pytorch.org/docs/stable/generated/torch.Tensor.unfold.html)

For this op, we apply a sliding window of some `size` along a single
`dimension`, with `step` in between iterations.

`Declaration: aten::unfold(Tensor(a) self, int dimension, int size, int
step) -> Tensor(a)`

The resulting `unfolded` tensor modifies the shape of `dimension` to be
equal to the number of blocks that the sliding window extracts, with an
additional dimension of `size` appended (the number of cols of the
output tensor directly translates from the size of the sliding
window).

So if we had a tensor of rank 3 (A x B x C), with dimension = 1, size =
2 and step = 2:

    (A x B x C) |=> (A x ((B - size) // step + 1) x C x size)

After extracting the window from the input tensor, we insert the (1 x
size) slice into the output tensor. We can make this simpler by mapping
the output indices from the input indices, like they do in the official
implementation:

[PyTorch
Code](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/lowering.py#L1694)
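
As an illustration of the shape rule and the index mapping described above, here is a minimal pure-Python sketch. It does not reproduce the linalg lowering or the PyTorch implementation; `unfold_shape` and `unfold_1d` are hypothetical helper names.

```python
def unfold_shape(shape, dimension, size, step):
    """Output shape of unfold: `dimension` becomes the window count, `size` is appended."""
    num_windows = (shape[dimension] - size) // step + 1
    out = list(shape)
    out[dimension] = num_windows
    return tuple(out) + (size,)


def unfold_1d(values, size, step):
    """Unfold a 1-d list: output[w][i] reads input[w * step + i]."""
    num_windows = (len(values) - size) // step + 1
    return [[values[w * step + i] for i in range(size)] for w in range(num_windows)]


if __name__ == "__main__":
    print(unfold_shape((4, 6, 5), dimension=1, size=2, step=2))
    print(unfold_1d(list(range(7)), size=2, step=2))
```

Each output element is read from input index `w * step + i`, which is the sense in which the output indices determine the input indices.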
2024-10-08 21:10:43 +00:00
Prathamesh Tagore 617c1c76ce
[torch.bind_symbolic_shape] Fix verifier for shapeSymbol detection (#3751)
The op can be valid with no attached shape symbols if they are not
required by the corresponding affine map. Fix the verifier to consider
number of arguments for both.
2024-10-02 05:55:54 -07:00
yyp0 eb4e59e189
[Torch] support binary_cross_entropy_with_logits decomposition (#3741) 2024-09-29 17:41:20 +08:00
Xida Ren (Cedar) 9938abf25e
AtenCumprodOp (#3737) 2024-09-26 18:17:22 -04:00
yyp0 335cf5f6d0
[stablehlo] support aten_adaptive_max_pool1d lowering (#3728) 2024-09-26 11:42:38 +08:00
Xida Ren (Cedar) aa7e77ee64
Better errmsg upon getScalarTypeForType failure (#3734)
Instead of 
`Unhandled type in getScalarTypeForType`

You now get

Unhandled type in getScalarTypeForType: (type name)
Type properties:
  Is integer: yes
  Bit width: 
...


The root cause is https://github.com/llvm/torch-mlir/issues/3720, at
least for unsigned integer issues.
2024-09-25 16:32:26 +00:00
Vinayak Dev 67732883fa
[torch] Fix unsqueezed output shape in canonicalization of AtenUnflattenIntOp (#3730)
Fixes https://github.com/iree-org/iree/issues/18562.

During canonicalization pass on `AtenUnflattenIntOp`, if the second dim
was statically equal to one, we would create an `AtenAddIntOp` to add
one to the dimension obtained from `op.getDim()`. This, when passed into
`Torch::unsqueezeTensor()`, would make it get interpreted as
non-constant, which would lead to MLIR failing an assertion when
`UnsqueezeOp` would later get lowered into `ExpandShapeOp`, as the
output of the `UnsqueezeOp` would consist of only dynamic dims.

This patch fixes this behavior, by extracting the integer value from the
dim if it was constant, and then emitting a `ConstantIntOp` from
(dim+1). This creates an output with static shape.
2024-09-24 11:45:18 -05:00
zjgarvey d61986cfcf
Add Decomposition for `Aten_SafeSoftmaxOp` (#3708)
Co-authored-by: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-09-12 16:58:10 -05:00
yyp0 edf725ef42
[Torch] add AtenAsStridedOp in torch dialect (#3706) 2024-09-12 19:07:11 +08:00
Yuanqiang Liu 3f07077ff9
[Torch] enhance fold of aten.alias (#3705) 2024-09-12 17:04:57 +08:00
Branko Trifkovic 1c4b9d6a0e
Implement lowering of torch.aten.hstack (#3563) 2024-09-11 16:41:47 +05:30
Rob Suderman 6934ab81b0
Bump llvm/llvm-project@b6603e1bf1 (#3697)
Bump forward and refactor inline global slots to no longer track via
symlinks. This appears to make the tests pass until we manage to remove
torchscript work.
2024-09-10 08:57:15 -07:00
Srinath Avadhanula 0a788e0467
Decompose aten.fmod into aten.mul,sub,div etc. (#3689)
As titled, create a new decomposition for `aten.fmod.Tensor` to
`aten.div`, `aten.trunc`, `aten.mul` and `aten.sub`. Note that we only
use `aten.trunc` for floating point operations. This further gets
decomposed to `aten.where` etc. by other existing decompositions.
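
The identity behind this decomposition, for the floating point case, is `fmod(x, y) = x - trunc(x / y) * y`. The sketch below checks it on scalars against Python's `math.fmod`; it is an illustration only, not the decomposition code, and `fmod_decomposed` is a hypothetical name.

```python
import math


def fmod_decomposed(x: float, y: float) -> float:
    """fmod built from div, trunc, mul and sub (floating point case only)."""
    quotient = x / y                 # div
    truncated = math.trunc(quotient)  # trunc: round towards zero
    return x - truncated * y         # mul, then sub


if __name__ == "__main__":
    for x, y in [(7.5, 2.0), (-7.5, 2.0), (7.5, -2.0)]:
        print(x, y, fmod_decomposed(x, y), math.fmod(x, y))
```

The result takes the sign of `x`, matching `fmod` semantics. For very large quotients the rounded division can make this formula deviate from an exact `fmod`, so the check here sticks to small, exactly representable values.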

This decomposition now makes TOSA pass for a simple model with
`aten.fmod` while it makes `stablehlo` fail. For now, we disallow this
decomposition for `stablehlo`.

---------

Co-authored-by: Srinath Avadhanula <srinath.avadhanula@getcruise.com>
2024-09-09 09:00:11 -07:00
Branko Trifkovic 70d5730c87
[LINALG] Implement lowering of torch.aten.rot90 (#3551) 2024-09-06 10:36:17 +05:30
zjgarvey 295bf418a4
Add a canonicalization pattern for `aten.unflatten.int` (#3656)
Addresses an issue in <https://github.com/llvm/torch-mlir/issues/3651>
where some unflatten ops generated from onnx models weren't propagating
static shape information. It may be necessary to add further
optimizations for the more general case when some static information is
present in the unflatten (or possibly reshape/view) op's `sizes` list,
but not reflected in the output shape. These ops will only successfully
infer shapes if the `sizes` list is gotten from a list of constant ints
(with possibly one -1). A common example where this fails is when some
of the `sizes` are determined from `aten.size.int` ops on dynamic
tensors, and other `sizes` are known statically.

This PR includes:
- a canonicalizer for `aten.unflatten.int` which converts to
`aten.unsqueeze` when it is expanding one dim to two, and one of the new
dims is statically 1.
- an improvement to the folder for `aten.__or__.bool` which does not
rely on *both* operands being static.
2024-09-03 16:38:20 -07:00
Ze Zhang b3942ff984
Add canonicalize pattern for aten.mul.int and aten.floordiv.int (#3680)
This PR adds `floordiv` to the `PY_BUILTIN_TO_TORCH_OP`. For
`aten.mul.int` and `aten.floordiv.int` ops, we add new Canonicalization
Patterns as follows:

```
%1 = torch.aten.mul.int %input, %const-5
%2 = torch.aten.mul.int %1, %const-6
```

Will be replaced by

`torch.aten.mul.int %input, %const-30`


And 

```
%1 = torch.aten.mul.int %input, %const-5
%2 = torch.aten.floordiv.int %1, %const-5
```
Will directly return `%input`
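
Both rewrites can be sanity-checked on plain integers. The sketch below is not the canonicalization code; the function names are hypothetical, and the assumption that the shared constant is nonzero is mine, since the exact matching conditions are not spelled out in this message.

```python
def mul_then_mul(x: int, c1: int, c2: int) -> int:
    """Original form: two chained multiplications by constants."""
    return (x * c1) * c2


def mul_then_floordiv(x: int, c: int) -> int:
    """Original form: multiply by a constant, then floor-divide by the same constant."""
    return (x * c) // c


if __name__ == "__main__":
    for x in range(-20, 21):
        # (x * 5) * 6 can be replaced by x * 30.
        assert mul_then_mul(x, 5, 6) == x * 30
        # (x * 5) // 5 can be replaced by x (assuming the constant is nonzero).
        assert mul_then_floordiv(x, 5) == x
    print("both rewrites hold for the sampled inputs")
```

The second rewrite is exact because `x * c` is always a multiple of `c`, so floor division leaves no remainder regardless of sign.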


This PR also relaxes the `float` type constraint in TorchToTosa for the
`AtenRsubScalarOp` conversion.



To test:

`cmake --build build --target check-torch-mlir-all`
2024-09-03 09:13:59 -07:00
Vivek Khandelwal 567ed44fd0
[MLIR][TORCH] Add E2E support for aten.polar op (#3671)
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-09-03 10:51:03 +05:30
lingzhiz1998 5bc59ce1fa
[TorchToLinalg] Support lowering MaxPool3dWithIndices (#3652)
Support torch.MaxPool3dWithIndices lowering to linalg backend.
2024-08-27 14:14:25 -05:00
Xida Ren (Cedar) eb7bf78a9c
Add RestructureNonConstantAxes pass to address reduce op tests failing on non constant axes (#3600) 2024-08-26 14:06:06 -07:00
Rob Suderman f9766c89f6
[onnx] Handle `torch.aten` for inner product case (#3634)
The following case was failing to lower for einsum. This fixes up the
inner product issue.
2024-08-24 11:41:25 -07:00
Vivek Khandelwal fcc5f444cd
[MLIR][TORCH] Fix GroupNorm decomposition by adding shape info (#3658)
This commit adds the shape info for the tensors created during the
decomposition of GroupNorm op.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-08-22 21:20:40 +05:30
Vivek Khandelwal 0a86deb59a
build: manually update PyTorch version (#3627)
Set PyTorch and TorchVision version to nightly release 2024-08-18.
This commit also updates the `scaled_dot_product_attention` op. 
A new attribute `enable_gqa` has been added. As of now, only the
default value for the same is supported.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-08-19 12:03:56 +05:30
pkapris-syrmia 23ec5399e5
Implement lowering of aten.atleast_2d (#3546)
This operator is needed to implement aten.vstack, which will be
submitted in a subsequent PR
2024-08-14 18:52:31 +05:30
pkapris-syrmia 10fe5d08d1
Implement lowering for torch.aten.rad2deg (#3586) 2024-08-14 16:37:28 +05:30
Rob Suderman 9ab93436c4
[torch] Support diagonal `einsum.Diagonal` (#3618)
The einsum lowering was missing the behavior for duplicate indices in
the equation. This amounts to a diagonalization along duplicate pairs of
indices in the equation.
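
For intuition, a repeated index within one operand (for example `ii->i`) keeps only the elements whose two positions coincide, which is the diagonal. The following pure-Python sketch illustrates that semantics on nested lists; it is not the lowering itself and the function names are hypothetical.

```python
def einsum_ii_to_i(matrix):
    """Equivalent of einsum('ii->i'): take the diagonal of a square matrix."""
    return [matrix[i][i] for i in range(len(matrix))]


def einsum_ii_to_scalar(matrix):
    """Equivalent of einsum('ii->'): sum over the diagonal, i.e. the trace."""
    return sum(einsum_ii_to_i(matrix))


if __name__ == "__main__":
    m = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
    print(einsum_ii_to_i(m))
    print(einsum_ii_to_scalar(m))
```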
2024-08-13 09:38:43 -07:00
Yuanqiang Liu c5b3cf299a
[Torch] emit upsample_nearest1d/2d/vec, and add shape/dtype functions (#3629) 2024-08-13 19:14:24 +08:00
Felix Schneider 0314188dbe
[torch] Basic support for per-channel quantized graphs (#3623)
This patch adds basic support for lowering graphs with per-channel
quantization. Per-channel quantized ops have to be excluded from
`FuseQuantizedOps` for now but can be used in QDQ quantized form.

Using this patch, we're able to import and execute (on the linalg
backend) graphs with per-channel quantization applied using the "new"
PyTorch 2.0 Export Quantization.
2024-08-10 15:51:09 +02:00
Rob Suderman fd98476f77
[torch] Unpacking sometimes misses shape inference (#3609)
It is possible that the unpacked tensor does not match the same inferred
shapes. This is pretty common when ingesting from the `onnx` frontend.
2024-08-08 16:17:31 -07:00
Rob Suderman 59a4c6fda4
[onnx] Fix transposition code for `onnx.OneHot` (#3606)
The post onehot transposition code was unexercised. Fixed the test and
transformation to check use.
2024-08-07 18:20:26 -07:00
Chi_Liu a51b4e014a
[Torch] Disable 1-d quantized convolution (#3601)
To fix https://github.com/nod-ai/SHARK-Turbine/issues/253#issuecomment-2271815640
Prevent fusion for 1-d convolution ops and just do it as an f32 conv
since there isn't a linalg named op for quantized 1-d convolution yet.
This gets 24 ONNX eca* models passing in iree-compile.
2024-08-07 09:01:16 -07:00
Rob Suderman 7e7af67080
Avoid warnings-as-errors build failure (#3588)
Lambda needs a return value to avoid a build failure.
2024-08-02 12:27:31 -07:00
yyp0 22cd4441e7
[Torch] Add support for static uneven divisible AdaptiveAvgPool2d (#3566)
A static, unevenly divisible AdaptiveAvgPool2d is one where the input
size is not an integer multiple of the output size, yet the kernel and
stride sizes can still be fixed (not dynamic). The derivation of the
kernel and stride sizes is consistent with
torch/_decomp/decompositions.py:adaptive_avg_pool2d, as described
below:

1. Stride Size
First, derive the start index of each reduce operation from the output
size (`n`): `start_index = ([0, 1, ..., n - 1] * input_size)
// output_size`. For each index `k`, if `k * (input_size % output_size)
< output_size`, then the current and previous strides both equal
`input_size // output_size`. So if `(n-1) * (input_size %
output_size) < output_size`, the stride stays static throughout the
whole AdaptiveAvgPool2d computation, at `input_size // output_size`.

2. Kernel Size
torch/_decomp/decompositions.py:adaptive_avg_pool2d calculates a static
kernel size when the input/output sizes satisfy either of the two
conditions, `input_size % output_size == 0` or `output_size %
(input_size % output_size) == 0`. Here if `input_size % output_size ==
0`, then the kernel size equals `input_size // output_size`, otherwise
`input_size // output_size + 1.`
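
The derivation above can be checked numerically along one dimension. The sketch below uses the start index formula from this message; the end index formula, `ceil((k + 1) * input_size / output_size)`, is an assumption based on the usual adaptive pooling definition and is not stated above. The function names are hypothetical.

```python
def adaptive_windows(input_size: int, output_size: int):
    """Start (inclusive) and end (exclusive) index of each pooling window."""
    starts = [(k * input_size) // output_size for k in range(output_size)]
    # Ceil division written with integers only (assumed end-index definition).
    ends = [-(-((k + 1) * input_size) // output_size) for k in range(output_size)]
    return starts, ends


def static_kernel_and_stride(input_size: int, output_size: int):
    """Return (kernel, stride) if every window shares them, otherwise None."""
    starts, ends = adaptive_windows(input_size, output_size)
    kernels = {end - start for start, end in zip(starts, ends)}
    strides = {b - a for a, b in zip(starts, starts[1:])}
    if len(kernels) != 1 or len(strides) > 1:
        return None
    kernel = kernels.pop()
    # With a single output element there is no stride to observe.
    stride = strides.pop() if strides else input_size
    return kernel, stride


if __name__ == "__main__":
    for sizes in [(10, 5), (7, 3), (5, 3)]:
        print(sizes, static_kernel_and_stride(*sizes))
```

For `(7, 3)` the input is not a multiple of the output, yet every window has kernel `7 // 3 + 1` and stride `7 // 3`, which is the static uneven case. For `(5, 3)` the windows differ, so no static kernel and stride exist.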
2024-08-01 11:37:53 +08:00
yyp0 f49b9c14f1
[Torch] Add support for Aten__Or__BoolOp (#3574) 2024-07-31 17:23:53 +08:00
Ivan Butygin 8bd1b9751f
`max_unpool3d` linalg lowering (#3536)
An attempt at lowering `aten.max_unpool3d` to linalg.
There are known issues with this implementation (see comment in code).
2024-07-30 20:59:17 +03:00
Vivek Khandelwal b6e4725259
[ONNX] Add OnnxToTorch lowering for NonMaxSuppression op (#3501)
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-07-26 21:01:27 +05:30
yyp0 ea60d72489
[Torch] Add AtenMaskedFillTensorOp support (#3561) 2024-07-26 15:32:13 +08:00
Yuanqiang Liu 003b06dfa1
[Torch] enhance naryFolderHelper to support mixed dtypes (#3559)
* so that it can support cases like `i64 + f64 => f64`.
* also unify `aten.log`'s folder code to use `naryFolderHelper`.
2024-07-24 17:54:59 +08:00
Yuanqiang Liu aad1604046
[Torch] enhance fold of aten.squeeze.dim (#3558) 2024-07-24 14:13:48 +08:00