I've upstreamed the necessary quantized linalg Op with the
"channel-first" ordering used by torch
(https://github.com/llvm/llvm-project/pull/107740) for 2d convolution.
This patch changes the lowering for the quantized 2d case of
`aten.convolution` accordingly, which saves three transpositions per
convolution (input, weights, result) and therefore removes the
requirement to try to optimize these away in downstream passes.
### new patterns:
1. Propagates `aten.broadcast_to` ops of a single value to an
`aten.full` op
2. Propagates arithmetic operations through a templated class which
associates some tensor arithmetic ops to their integer-scalar
counterparts. These are a major blocker right now, since some models
have a bunch of rank 0 arithmetic being done with tensor ops. See the
lit test for an interesting example that pads an input to the smallest
shape which will become divisible by twelve in `dim0`. If you think this
is convoluted, you haven't been staring at ONNX generated IR long
enough.
3. Adds a stronger folder for `aten.eq.int` to fold `size.int == 0` to
`false`. See the comment in that conversion pattern for more
justification as to why it is acceptable to make this assumption here.
This is another major blocker for models, since this lack of folding
propagates to lack of folding for subsequent `where.self` operations.
4. Add `AtenSqueezeDim` to the existing `FoldAtenSqueezeOpPattern`
### other changes:
1. Add two new anchor ops: `AtenArangeStartStepOp` and
`Torch::RuntimeAssertOp`. I've checked all possible sources of the
runtime assert ops and it is always shape related. The Arange op only
takes int inputs, and these are all shape related. Adds a size check to
getting a list from literal ops.
2. Improved folders for int arithmetic ops to fold some common patterns.
3. adds the ability to get some values from scalar-tensor ops to
getListFromTensor.
4. further cleans up getListFromTensor for readability.
### points to scrutinize:
1. I made the choice to scalarize `div.Tensor` (int dtype result) to
`floordiv.int`. This is because our shape computations involving this
kind of arithmetic are never negative in practice, and we don't have a
"round towards zero" scalar int divide counterpart.
2. Anchoring on `RuntimeAssertOp` sounds really suspicious, and if
someone happens to add a runtime assert in the future that doesn't boil
down to shapes, then it would add to the worklist considerably. We might
be able to get around this by adding "NoMemoryEffect" to ops which are
"ReadOnly" so that the inputs for the runtime asserts get cse'd with
existing elements of the worklist before we even get to this pass.
This is a first step towards reworking the scalarize-shapes pass which
has been integral to our ONNX frontend path detangling shape
computations.
## Purpose:
1. Restrict the scope of the pass to only apply to op sequences which
are used to compute shapes.
2. Make the pass more efficient by applying patterns in an appropriate
order for scalarization propagation.
3. Report failed scalarization patterns for easier debugging (Not yet
implemented). I can't seem to find a good path for this right now to
capture the right diagnostics. I'd like to defer this addition to a
later patch so we can add some high-value patterns to this pass in the
meantime.
With these changes, some reworking of the conversions themselves will be
necessary.
1. The removal of the SqueezeDim fold pattern was an appropriate fix to
avoid folding a pattern that may be needed to propagate further. The
reversal of pattern application order uncovered this bug. The addition
of rank 0 item logic was added to replace the functionality needed from
the squeeze dim pattern.
2. Rework getListFromTensor to modify a `SmallVector<OpFoldResult>` to
allow processing value tensor literals without immediately materializing
the ints. This should factor out a significant portion of code that was
used in specific cases to handle constants.
## RFC 1:
Currently, we are going to add all prim list of int ops to the worklist.
Can anyone identify problems with uniformly anchoring on prim lists of
ints? E.g. Does there exist a Torch Op satisfying all of the following
conditions:
1. Accepts a list of constant ints, LIST, as an input
2. The role of LIST is **not** shape related. All the examples I can
think of are indeed shape related: padding ints passed to a pad op,
kernel size ints passed to a conv op, size ints passed to a view op,
etc.
4. The LIST is not gotten entirely from scalars already.
If there does not exist a torch op satisfying all three of those
conditions, I think it will be safe to "anchor" on prim lists of ints.
### Conclusion for RFC 1:
I just scanned through the `GeneratedTorchOps.td` and `TorchOps.td` for
all references of `AnyTorchListOfTorchIntType` and verified this will
not be problematic to apply in any of those cases.
## RFC 2:
What should I use to report failed scalarization?
Like my dumb idea was just to walk back through the func op after
applying the passes and check if anything in the worklist is still a
tensor. If so, emit/log a warning. It certainly works, since you can
just look at the warnings and start debugging from the last printed
warning upwards, but there has to be a better way to handle this without
walking back through the func.func op.
### Conclusion for RFC 2:
I tried a few things without much success. The fundamental problem is
that identifying the cause of a failed scalarization could be myriad:
1. We could be missing a pattern for an op entirely: E.g., a pattern we
need is scalarizing rank0 arithmetic ops (e.g. AtenMulTensorOp ->
AtenMulIntOp).
2. We could fail a scalarization pattern because it should fold instead.
This is specifically the case for rank0 where.self ops. These ops MUST
fold, or we need to have custom lowering logic for the rank 0 case.
3. Walking through the func op a second time and emiting a warning for
ops that have tensor result types seems to give locations that are
inconsistent or hard to track in the converted IR. Doing this on IR that
doesn't apply any patterns seems to give decent information, but it's
still dramatically insufficient considering how complex these patterns
can get, and still takes manually reading IR to try and figure out what
is really blocking the simplification.
I'd like to skip out on fleshing out the error reporting for now and
come back to it after iterating a few time on the patterns.
This commit adds the torch-onnx-to-torch-backend pipeline which
converts the Torch Onnx IR to Torch Backend IR.
This commit also moves the `ScalarizeShapes` pass from the
`torch-backend-to-linalg-on-tensors-backend-pipeline` to the
`torch-onnx-to-torch-backend` pipeline since the primary goal of
this pass is to scalarize the shapes in the IR coming from the
Onnx models.
Adds onnx ConvTranspose support for autopadding
(https://github.com/nod-ai/SHARK-ModelDev/issues/839).
- Adds support for attribute auto_pad="SAME_UPPER" or "SAME_LOWER" which
will automatically calculate padding of input based on output shape.
- Adds support, during auto-padding, for output_shape=[H,W] which
overrides the default output shape of input_shape[i]*stride[i] (for
spatial dimensions only).
- Adds lit test for auto-padding.
- Tests are added by https://github.com/nod-ai/SHARK-TestSuite/pull/370
NOTE: ConvTranspose still doesn't support asymmetric padding, therefore
multiple original onnx tests still won't pass.
Set PyTorch and TorchVision version to nightly release 2024-10-15.
Tracker issue for the failing tests added to xfail_set in this PR.
Issue: https://github.com/llvm/torch-mlir/issues/3796
This commit disables the failing sparse tensor tests since they are not
maintained on day-to-day basis and blocks the roll PyTorch update for now.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
- Add Torch to TOSA legalization for the following ops:
+ aten.empty.memory_format
+ aten.scatter.src
+ aten.slice_scatter
+ aten.diag_embed
- Update xfail_sets.py with new e2e results
- Update basic.mlir with new LIT tests
Change-Id: I817ecf207bcfcf97ca54f30c10c76c4f0f4145ae
Signed-off-by: Justin Ngo <justin.ngo@arm.com>
Torch-to-linalg pass fails for `EmbeddingBag` when the training only
specific properties of the operator are set to `true.` For instance,
this operator's `sparse` input/property is training-specific, and if the
value of this property is `true,` the existing lowering bails out.
However, we don't need to check for training-specific parameters and
bailout from the legalization since we don't care about these properties
during the eval/inference mode.
---------
Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
Some ops were failing to infer the static component of partially dynamic
shapes, and the cause was a missing aten.slice.t pattern.
The lit test included here is an IR dump created before
DropAbstractInterpCalculations for an unflatten op that was failing to
infer shapes before the change.
-Adds patterns for propagating shapes through AtenWhereSelf and
AtenEqTensor
-Adds fold pattern for a rank0 squeezeDim of a full op
-Adds support for getting a list from a splat ValueTensorLiteralOp for
materializing scalar comparisons in where.self and eq.tensor
With a bit of hammering, these changes should unblock several IREE
inference failures.
This was preventing dynamic dims in an ONNX model from being reified (causing the generation of `tensor.cast`s and preventing fusion in iree):
```mlir
%2 = torch.vtensor.literal(dense<[4, 256]> : tensor<2xsi64>) : !torch.vtensor<[2],si64>]
%7 = torch.prim.ListConstruct %int2 : (!torch.int) -> !torch.list<int>
%8 = torch.aten.reshape %2, %7 : !torch.vtensor<[2],si64>, !torch.list<int> -> !torch.vtensor<[2],si64>
//... chain of foldable ops linking %2 to the `shape` operand of a `torch.aten.broadcast_to ... -> !torch.vtensor<[?,?],si64>`
```
This patch adds two things:
1. support for folding scalar patterns like [1]---squeeze--->[]
---unsqueeze--->[1].
2. a canonicalizer for aten.view that applies when we can statically or
dynamically (through the scalarized view shapes) infer that it is a
flatten or unflatten op in the last dim.
I'm not sure if this is the right place to be adding such a view
canonicalizer. Catastrophically, there is a decomposition from flatten
and unflatten into aten.view. Until this gets deleted (and it definitely
should be deleted), I felt like this would be an appropriate temporary
home. We run scalarize shapes after lowering to the backend contract
(i.e., decomposing), and scalarize shapes is required to be able to
infer dynamic dims coming from size int ops.
# Description
Implementation of the op for `torch.aten.unfold`: [TorchToLinalg Op
Support #347](https://github.com/nod-ai/SHARK-ModelDev/issues/849)
Documentation of op can be found here: [PyTorch
Docs](https://pytorch.org/docs/stable/generated/torch.Tensor.unfold.html)
For this op, we apply a sliding window of some `size` along a single
`dimension`, with `step` in between iterations.
`Declaration: aten::unfold(Tensor(a) self, int dimension, int size, int
step) -> Tensor(a)`
The resulting `unfolded` tensor modifies the shape of `dimension` to be
equal to the number of blocks that the sliding windows extracts/inserts,
with an additional dimension of `size` appended (the number of cols of
the output tensor directly translates from the size of the sliding
window).
So if we had a tensor of rank 3 (A x B x C), with dimension = 1, size =
2 and step = 2:
(A x B x C) |=> (A x (B - size) // step + 1 x C x size)
After extracting the window from the input tensor, we insert the (1 x
size) slice into the output tensor. We can make this simpler by mapping
the output indices from the input indices, like they do in the official
implementation:
[PyTorch
Code](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/lowering.py#L1694)
- Support Bidirectional LSTM (utilising the forward LSTM layer with
flipped Inputs and Outputs)
- Support layout 1
- Support default cases for attr `clip` and `input_forget`
- Support returning partial outputs (1-3)
- fixes for alt_e2e_tests lstm tests (1,2,3)
This commit adds the support for the 1-d depthwise convolution as a
special case of 1-d group convolution.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
This commit adds the support for negative step values in
aten.slice.Tensor op. Although, PyTorch does not allow negative step
value for slice op but the Onnx.Slice op supports negative step value
which eventually lowers to torch.aten.slice.Tensor op. Hence, the
support is added for handling those kind of values during the
Torch->Linalg lowering of aten.slice.Tensor op.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
- Add Torch to TOSA lowering for aten.fill.Scalar/Tensor, aten.flip, and
aten.round
- Fix torchScalarToTosaTensor function to correctly convert Torch scalar
input to TOSA tensor
- Update xfail_sets.py with new e2e results
- Update basic.mlir with LIT tests for new ops
Change-Id: If1e42c2e582710dd8ad0465eed29806fbcdbde41
Signed-off-by: Justin Ngo <justin.ngo@arm.com>
- Add Torch to TOSA legalization for aten.index_select
- Fix createOneDimTfIndices function in TosaLegalizeCommon.cpp to
correctly convert Torch indices to TF-style indices, which is used in
convertGatherNdOp
- Update e2e tests in xfail_sets.py
- Update basic.mlir with new LIT test for aten.index_select
Signed-off-by: Justin Ngo <justin.ngo@arm.com>
Change-Id: I52519246183949353a3cf22f0a685fe3df8ec8ff
Signed-off-by: Justin Ngo <justin.ngo@arm.com>
Addresses ~200 onnx model compile failures in
<https://github.com/nod-ai/SHARK-TestSuite> related to
<https://github.com/iree-org/iree/issues/18631>.
This change simplifies the result of the generated broadcast op
substantially, but reduces the case coverage slightly.
The case which will become unsupported:
- trying to actually broadcast a dynamic dim that is secretly 1.
When does this case appear in practical scenarios?
- for a model where onnx shape inference cannot figure out that a dim
should be 1.
Why do I think we should not support this case for now?
1. For all models with dynamic dim expand ops, the previous path
uniformly generates uglier linalg IR (making it harder for IREE to fuse
properly with other ops).
2. For models failing shape inference castastrophically enough to fail
to see a dim is statically 1, we can try to apply constant folding in
the onnx model before importing.
Leaving this as a draft PR, since it may be more appropriate to fix the
compilation failure in IREE rather than torch-mlir.
### Example of broadcast required in previous path:
```mlir
%300 = linalg.generic {indexing_maps = [#map11], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} outs(%299 : tensor<?x12x?x?xi1>) {
^bb0(%out: i1):
%306 = linalg.index 0 : index
%307 = linalg.index 3 : index
%308 = arith.index_cast %285 : i64 to index
%309 = arith.cmpi eq, %308, %c1 : index
%310 = arith.select %309, %c0, %306 : index
%311 = arith.index_cast %286 : i64 to index
%312 = arith.cmpi eq, %311, %c1 : index
%313 = arith.select %312, %c0, %307 : index
%extracted_79 = tensor.extract %reshape_78[%310, %c0, %c0, %313] : tensor<?x1x1x?xi1>
linalg.yield %extracted_79 : i1
} -> tensor<?x12x?x?xi1>
```
### Example of broadcast with simplified shape list:
```mlir
%409 = linalg.generic {indexing_maps = [#map15, #map11], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%reshape_135 : tensor<?x1x1x?xi1>) outs(%408 : tensor<?x12x?x?xi1>) {
^bb0(%in: i1, %out: i1):
linalg.yield %in : i1
} -> tensor<?x12x?x?xi1>
```
The op can be valid with no attached shape symbols if they are not
required by the corresponding affine map. Fix the verifier to consider
number of arguments for both.
- Add lowering from Torch to TOSA for aten.diagonal
- Clean up some code
- Update xfail_sets.py with the new e2e results
- Update basic_mlir with the new op mlir test
Signed-off-by: Justin Ngo <justin.ngo@arm.com>
Change-Id: I99bed685455752d09ed96edd837c4dfbee152701
Signed-off-by: Justin Ngo <justin.ngo@arm.com>
Instead of
`Unhandled type in getScalarTypeForType`
You now get
Unhandled type in getScalarTypeForType: (type name)
Type properties:
Is integer: yes
Bit width:
...
The root cause is https://github.com/llvm/torch-mlir/issues/3720, at
least for unsigned integer issues.
Fixes https://github.com/iree-org/iree/issues/18562.
During canonicalization pass on `AtenUnflattenIntOp`, if the second dim
was statically equal to one, we would create an `AtenAddIntOp` to add
one to the dimension obtained from `op.getDim()`. This, when passed into
`Torch::unsqueezeTensor()`, would make it get interpreted as
non-constant, which would lead to MLIR failing an assertion when
`UnsqueezeOp` would later get lowered into `ExpandShapeOp`, as the
output of the `UnsqueezeOp` would consist of only dynamic dims.
This patch fixes this behavior, by extracting the integer value from the
dim if it was constant, and then emitting a `ConstantIntOp` from
(dim+1). This creates an output with static shape.
- When the signal tensor is real, onnx allows its shape to be
`[batch][length]` as well as `[batch][length][1]`.
- Onnx also allows to specify `frame_length` together with `window` (not
empty), given that it matches the window size.
- Adding checks on signal and result shapes.
Current version does not work for a mixture of dynamic and static shaped
batch dimensions. Rework to grab the correct dynamic shapes.
---------
Co-authored-by: dan <danimal197@gmail.com>
Previously, if the value was absent, this conversion was creating a
dense resource of value 0 with shape equal to the result shape, then
later re-extracting a splat value. This only works if the shape is
statically known, and even when the shape is known, this is completely
unnecessary since the value's shape should be `[1]` and not the result
shape.
This patch simply sets the `splatvalue` to a `torch.constant.float 0.0`
when the onnx op's `value` attr is absent, and adds `nullptr` checks to
the subsequent conditionals to avoid them in the case where an `attr` is
not given.
Addresses <https://github.com/nod-ai/SHARK-Turbine/issues/831>.
- Add Torch to TOSA legalization for the following reduction ops:
+ aten.min.dim
+ aten.min
+ aten.max
+ aten.prod
+ aten.prod.dim_int
+ aten.all.dim
- Add dtype casting support for reduce sum and prod ops
- Extend aten.max.dim legalization to a template to support aten.min.dim
legalization
- Update end-to-end tests sets in xfail_sets.py
Signed-off-by: Justin Ngo <justin.ngo@arm.com>
Change-Id: I854dd6c0c55e570c1fb7242f20c85cf64d6e7fe0
Signed-off-by: Justin Ngo <justin.ngo@arm.com>
Follow up cleanup for [this
PR](https://github.com/llvm/torch-mlir/pull/3689), which introduced a
decomposition for `aten.fmod.Tensor`. This means that the lowering for
this operator in linalg is no longer needed.
Thanks to @vivekkhandelwal1 for pointing this out.
---------
Co-authored-by: Srinath Avadhanula <srinath.avadhanula@getcruise.com>