### new patterns:
1. Propagates `aten.broadcast_to` ops of a single value to an
`aten.full` op
2. Propagates arithmetic operations through a templated class which
associates some tensor arithmetic ops to their integer-scalar
counterparts. These are a major blocker right now, since some models
have a bunch of rank 0 arithmetic being done with tensor ops. See the
lit test for an interesting example that pads an input to the smallest
shape which will become divisible by twelve in `dim0`. If you think this
is convoluted, you haven't been staring at ONNX generated IR long
enough.
3. Adds a stronger folder for `aten.eq.int` to fold `size.int == 0` to
`false`. See the comment in that conversion pattern for more
justification as to why it is acceptable to make this assumption here.
This is another major blocker for models, since this lack of folding
propagates to lack of folding for subsequent `where.self` operations.
4. Add `AtenSqueezeDim` to the existing `FoldAtenSqueezeOpPattern`
### other changes:
1. Add two new anchor ops: `AtenArangeStartStepOp` and
`Torch::RuntimeAssertOp`. I've checked all possible sources of the
runtime assert ops and it is always shape related. The Arange op only
takes int inputs, and these are all shape related. Adds a size check to
getting a list from literal ops.
2. Improved folders for int arithmetic ops to fold some common patterns.
3. adds the ability to get some values from scalar-tensor ops to
getListFromTensor.
4. further cleans up getListFromTensor for readability.
### points to scrutinize:
1. I made the choice to scalarize `div.Tensor` (int dtype result) to
`floordiv.int`. This is because our shape computations involving this
kind of arithmetic are never negative in practice, and we don't have a
"round towards zero" scalar int divide counterpart.
2. Anchoring on `RuntimeAssertOp` sounds really suspicious, and if
someone happens to add a runtime assert in the future that doesn't boil
down to shapes, then it would add to the worklist considerably. We might
be able to get around this by adding "NoMemoryEffect" to ops which are
"ReadOnly" so that the inputs for the runtime asserts get cse'd with
existing elements of the worklist before we even get to this pass.
This is a first step towards reworking the scalarize-shapes pass which
has been integral to our ONNX frontend path detangling shape
computations.
## Purpose:
1. Restrict the scope of the pass to only apply to op sequences which
are used to compute shapes.
2. Make the pass more efficient by applying patterns in an appropriate
order for scalarization propagation.
3. Report failed scalarization patterns for easier debugging (Not yet
implemented). I can't seem to find a good path for this right now to
capture the right diagnostics. I'd like to defer this addition to a
later patch so we can add some high-value patterns to this pass in the
meantime.
With these changes, some reworking of the conversions themselves will be
necessary.
1. The removal of the SqueezeDim fold pattern was an appropriate fix to
avoid folding a pattern that may be needed to propagate further. The
reversal of pattern application order uncovered this bug. The addition
of rank 0 item logic was added to replace the functionality needed from
the squeeze dim pattern.
2. Rework getListFromTensor to modify a `SmallVector<OpFoldResult>` to
allow processing value tensor literals without immediately materializing
the ints. This should factor out a significant portion of code that was
used in specific cases to handle constants.
## RFC 1:
Currently, we are going to add all prim list of int ops to the worklist.
Can anyone identify problems with uniformly anchoring on prim lists of
ints? E.g. Does there exist a Torch Op satisfying all of the following
conditions:
1. Accepts a list of constant ints, LIST, as an input
2. The role of LIST is **not** shape related. All the examples I can
think of are indeed shape related: padding ints passed to a pad op,
kernel size ints passed to a conv op, size ints passed to a view op,
etc.
4. The LIST is not gotten entirely from scalars already.
If there does not exist a torch op satisfying all three of those
conditions, I think it will be safe to "anchor" on prim lists of ints.
### Conclusion for RFC 1:
I just scanned through the `GeneratedTorchOps.td` and `TorchOps.td` for
all references of `AnyTorchListOfTorchIntType` and verified this will
not be problematic to apply in any of those cases.
## RFC 2:
What should I use to report failed scalarization?
Like my dumb idea was just to walk back through the func op after
applying the passes and check if anything in the worklist is still a
tensor. If so, emit/log a warning. It certainly works, since you can
just look at the warnings and start debugging from the last printed
warning upwards, but there has to be a better way to handle this without
walking back through the func.func op.
### Conclusion for RFC 2:
I tried a few things without much success. The fundamental problem is
that identifying the cause of a failed scalarization could be myriad:
1. We could be missing a pattern for an op entirely: E.g., a pattern we
need is scalarizing rank0 arithmetic ops (e.g. AtenMulTensorOp ->
AtenMulIntOp).
2. We could fail a scalarization pattern because it should fold instead.
This is specifically the case for rank0 where.self ops. These ops MUST
fold, or we need to have custom lowering logic for the rank 0 case.
3. Walking through the func op a second time and emiting a warning for
ops that have tensor result types seems to give locations that are
inconsistent or hard to track in the converted IR. Doing this on IR that
doesn't apply any patterns seems to give decent information, but it's
still dramatically insufficient considering how complex these patterns
can get, and still takes manually reading IR to try and figure out what
is really blocking the simplification.
I'd like to skip out on fleshing out the error reporting for now and
come back to it after iterating a few time on the patterns.
This commit adds the torch-onnx-to-torch-backend pipeline which
converts the Torch Onnx IR to Torch Backend IR.
This commit also moves the `ScalarizeShapes` pass from the
`torch-backend-to-linalg-on-tensors-backend-pipeline` to the
`torch-onnx-to-torch-backend` pipeline since the primary goal of
this pass is to scalarize the shapes in the IR coming from the
Onnx models.
Set PyTorch and TorchVision version to nightly release 2024-10-15.
Tracker issue for the failing tests added to xfail_set in this PR.
Issue: https://github.com/llvm/torch-mlir/issues/3796
This commit disables the failing sparse tensor tests since they are not
maintained on day-to-day basis and blocks the roll PyTorch update for now.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
Some ops were failing to infer the static component of partially dynamic
shapes, and the cause was a missing aten.slice.t pattern.
The lit test included here is an IR dump created before
DropAbstractInterpCalculations for an unflatten op that was failing to
infer shapes before the change.
-Adds patterns for propagating shapes through AtenWhereSelf and
AtenEqTensor
-Adds fold pattern for a rank0 squeezeDim of a full op
-Adds support for getting a list from a splat ValueTensorLiteralOp for
materializing scalar comparisons in where.self and eq.tensor
With a bit of hammering, these changes should unblock several IREE
inference failures.
This patch adds two things:
1. support for folding scalar patterns like [1]---squeeze--->[]
---unsqueeze--->[1].
2. a canonicalizer for aten.view that applies when we can statically or
dynamically (through the scalarized view shapes) infer that it is a
flatten or unflatten op in the last dim.
I'm not sure if this is the right place to be adding such a view
canonicalizer. Catastrophically, there is a decomposition from flatten
and unflatten into aten.view. Until this gets deleted (and it definitely
should be deleted), I felt like this would be an appropriate temporary
home. We run scalarize shapes after lowering to the backend contract
(i.e., decomposing), and scalarize shapes is required to be able to
infer dynamic dims coming from size int ops.
# Description
Implementation of the op for `torch.aten.unfold`: [TorchToLinalg Op
Support #347](https://github.com/nod-ai/SHARK-ModelDev/issues/849)
Documentation of op can be found here: [PyTorch
Docs](https://pytorch.org/docs/stable/generated/torch.Tensor.unfold.html)
For this op, we apply a sliding window of some `size` along a single
`dimension`, with `step` in between iterations.
`Declaration: aten::unfold(Tensor(a) self, int dimension, int size, int
step) -> Tensor(a)`
The resulting `unfolded` tensor modifies the shape of `dimension` to be
equal to the number of blocks that the sliding windows extracts/inserts,
with an additional dimension of `size` appended (the number of cols of
the output tensor directly translates from the size of the sliding
window).
So if we had a tensor of rank 3 (A x B x C), with dimension = 1, size =
2 and step = 2:
(A x B x C) |=> (A x (B - size) // step + 1 x C x size)
After extracting the window from the input tensor, we insert the (1 x
size) slice into the output tensor. We can make this simpler by mapping
the output indices from the input indices, like they do in the official
implementation:
[PyTorch
Code](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/lowering.py#L1694)
Bump forward and refactor inline global slots to no longer track via
symlinks. This appears to make the tests past until we manage to remove
torchscript work.
As titled, create a new decomposition for `aten.fmod.Tensor` to
`aten.div`, `aten.trunc`, `aten.mul` and `aten.sub`. Note that we only
use `aten.trunc` for floating point operations. This further gets
decomposed to `aten.where` etc. by other existing decompositions.
This decomposition now makes TOSA pass for a simple model with
`aten.fmod` while it makes `stablehlo` fail. For now, we disallow this
decomposition for `stablehlo`
---------
Co-authored-by: Srinath Avadhanula <srinath.avadhanula@getcruise.com>
This commit adds the shape info for the tensors created during the
decomposition of GroupNorm op.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
Set PyTorch and TorchVision version to nightly release 2024-08-18.
This commit also updates the `scaled_dot_product_attention` op.
A new attribute `enable_gqa` has been added. As of now, only the
default value for the same is supported.
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
The einsum lowering was missing the behavior for duplicate indices in
the equation. This amounts to a diagonalization along duplicate pairs of
indices in the equation.
This patch adds basic support for lowering graphs with per-channel
quantization. Per-channel quantized ops have to be excluded from
`FuseQuantizedOps` for now but can be used in QDQ quantized form.
Using this patch, we're able to import and execute (on the linalg
backend) graphs with per-channel quantization applied using the "new"
PyTorch 2.0 Export Quantization.
The static uneven divisible AdaptiveAvgPool2d means that although the
input size is not an integer multiple of ouput size, but the kernel and
stride size can also be fixed (not dynamic). The derivation logic of
kernel and stride size is consistent with
torch/_decomp/decomposations.py:adaptive_avg_pool2d as described in the
following:
1. Stride Size
Firstly , derive the start index in each reduce operation according to
the output size (`n`), `start_index = ([0, 1, ..., n - 1] * input_size)
// output_size`. For each index `k`, if `k * (input_size % output_size)
< output_size`, then the current and previous stride keeps the same as
`input_size // output_size`. So suppose `(n-1) * (input_size %
output_size) < output_size`, the stride in the whole AdaptiveAvgPool2d
process keeps static, as `input_size // output_size`.
2. Kernel Size
torch/_decomp/decomposations.py:adaptive_avg_pool2d calculates a static
kernel size when the input/output sizes satisfy either of the two
conditions, `input_size % output_size == 0` or `output_size %
(input_size % output_size) == 0`. Here if `input_size % output_size ==
0`, then the kernel size equals `input_size // output_size`, otherwise
`input_size // output_size + 1.`
This PR adds a conversion in the TorchOnnxToTorch pass for the ONNX
Multinomial operation. It also adds a TorchToLinalg lowering for the
`aten.Multinomial` op and does a light refactor of some repeated code
that generates random floating point numbers in
`TorchToLinalg/Random.cpp`.
This patch adds a few misc pad op related changes:
1. Addresses issue <https://github.com/llvm/torch-mlir/issues/3457>
2. Addresses issue <https://github.com/llvm/torch-mlir/issues/3442>
3. Fixes the padding order for asymmetrically padded onnx.Conv ops
4. Enables passing quantization through those onnx.Conv op pre-paddings
5. Modifies the torch-to-linalg lowering of AtenReplicationPad2d op to
enable support for input rank != 4
Unfortunately, even with all of these changes, the e2e tests for the
ReplicationPad2d still fail the onnx config, since the torch export
procedure for rearranging the pad order is complicated enough that the
padding ints end up not being able to fold back to constants.