
1061 Commits (main)

Author SHA1 Message Date
Giacomo Serafini 46a5772d92
[TorchToLinalg] Add `aten.fft_rfft` and lowering (#3857)
- Add `AtenFftRfftOp` to Torch dialect.
- Add conversion of `AtenFftRfftOp` to Linalg, using a `linalg.matmul`
per output component (real and imaginary). Computing the DFT is
_O(n^2)_.
- Add decomposition of `AtenFftRfftOp` into Torch-level ops (same
paradigm as above).
- Add unit and end-to-end tests.
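The math behind this lowering can be sketched in plain Python. A real input of length n is multiplied by a cosine matrix and a negated-sine matrix, each n x (n // 2 + 1), which gives the real and imaginary parts of the one-sided spectrum. Each output element sums over all n inputs, hence the _O(n^2)_ cost. This is an illustrative model only, not the Linalg or Torch-level code from the PR; the helper names `rfft_matrices`, `matvec` and `rfft` are hypothetical.

```python
import math

def rfft_matrices(n):
    """Build the real and imaginary DFT basis matrices, each n x (n // 2 + 1)."""
    m = n // 2 + 1
    re = [[math.cos(2 * math.pi * k * t / n) for k in range(m)] for t in range(n)]
    im = [[-math.sin(2 * math.pi * k * t / n) for k in range(m)] for t in range(n)]
    return re, im

def matvec(x, mat):
    """Row vector x (length n) times matrix mat (n x m), giving a length-m list."""
    return [sum(x[t] * mat[t][k] for t in range(len(x))) for k in range(len(mat[0]))]

def rfft(x):
    re, im = rfft_matrices(len(x))
    real = matvec(x, re)  # one matrix product for the real component
    imag = matvec(x, im)  # one matrix product for the imaginary component
    return [complex(r, i) for r, i in zip(real, imag)]
```

For example, `rfft([1.0, 2.0, 3.0, 4.0])` yields approximately `[10, -2+2j, -2]`, the same values `torch.fft.rfft` produces for that input.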
2024-11-27 10:24:36 -06:00
Giacomo Serafini 44985690a7
[Torch Dialect] Emit `torch.aten.mul.float_int`, add folder and conversion to Arith. (#3750)
Folder is required to simplify the shape calculation of
`torch.aten.__interpolate.size_list_scale_list`:

5eab669c4a/lib/Dialect/Torch/Transforms/AbstractInterpLibrary.cpp (L6900-L6907)

(I've re-run `build_tools/update_abstract_interp_lib.sh`)

---------

Co-authored-by: zjgarvey <47986913+zjgarvey@users.noreply.github.com>
2024-11-27 10:23:35 -06:00
jinchen c9ed993603
Support NMS op lowering (#3871)
TODO: support multiple batches and classes
2024-11-26 16:49:56 -08:00
jinchen 7452460aab
Support stash_type attribute for onnx.LayerNormalization (#3888)
Fixes https://github.com/nod-ai/SHARK-ModelDev/issues/888

If stash_type is different from input_dtype/result_dtype:
1. convert x dtype to stash_type
2. calculate mean and var in stash_type since x is in stash_type already
3. convert back to result_dtype before stage two calculation
4. convert mean_dtype and var_dtype if they are different from
stash_type
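Steps 1 to 3 can be modeled in plain Python under two stated assumptions: Python floats (binary64) stand in for a float64 `stash_type`, and a `struct` round trip stands in for a float32 result dtype. "Stage two" is taken here to be the scale-and-bias step, which is our reading of the ONNX LayerNormalization definition. Step 4 (converting the mean and variance outputs) is not modeled, and `to_f32` / `layer_norm_stash` are hypothetical helpers, not torch-mlir code.

```python
import math
import struct

def to_f32(v):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def layer_norm_stash(x, scale, bias, eps=1e-5):
    # Step 1: convert x to the stash type (binary64 Python floats here).
    xs = [float(v) for v in x]
    # Step 2: mean and variance are computed in the stash type.
    mean = sum(xs) / len(xs)
    var = sum((v - mean) ** 2 for v in xs) / len(xs)
    inv_std = 1.0 / math.sqrt(var + eps)
    normalized = [(v - mean) * inv_std for v in xs]
    # Step 3: convert back to the result type (binary32) before stage two.
    normalized = [to_f32(v) for v in normalized]
    # Stage two: scale and bias, rounding each intermediate to binary32.
    return [to_f32(to_f32(n * s) + b) for n, s, b in zip(normalized, scale, bias)]
```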

e2e test added in https://github.com/nod-ai/SHARK-TestSuite/pull/399
2024-11-26 16:47:32 -08:00
jinchen 0a85486375
Add TorchToArith rewrite pattern for AtenGtFloatOp (#3892) 2024-11-26 09:01:19 -08:00
Giacomo Serafini 1b8d7e094b
[Torch Dialect] Add `torch.aten.mul.int_float` (required to simplify shape calculation of `upsample_nearest2d`) (#3764)
As per title. See also
[PR](https://github.com/llvm/torch-mlir/pull/3750) for
`torch.aten.mul.float_int`.

---------

Co-authored-by: zjgarvey <47986913+zjgarvey@users.noreply.github.com>
2024-11-21 00:43:06 +08:00
yyp0 bdbc64a205
[TorchToStablehlo] support l1_loss, deg2rad, logit (#3865) 2024-11-18 11:25:00 +08:00
Justin Ngo 95f77817b9
[TOSA] Add reflection and replication pad lowering (#3874)
- Add Torch to TOSA legalization for the following ops:
  + aten.reflection_pad1d
  + aten.reflection_pad2d
  + aten.replication_pad2d
- Update xfail sets with new e2e results
- Add new LIT tests to basic.mlir


Change-Id: I1689d1778d8e472c3317aca1e2425ef8774a07fa

Signed-off-by: Justin Ngo <justin.ngo@arm.com>
2024-11-15 15:19:09 -08:00
Longsheng Mou 0a607a410d
[TorchToLinalg] Use `linalg.transpose` instead of `generic` in `permuteTensor` (#3872)
This PR changes the lowering to use `linalg.transpose` instead of
`linalg.generic` in `torch_to_linalg::permuteTensor`.
2024-11-15 17:13:14 +08:00
Vivek Khandelwal fe2f64919d
[ONNX] Remove kernel shape and weight shape equivalence check from Onnx.Conv lowering (#3869)
This commit removes the equivalence check for kernel shape and weight
shape from the Onnx.conv lowering since those checks seem to be of no
use (not sure why they were part of the lowering in the first place).

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-11-15 10:36:41 +05:30
zjgarvey 1201babb9f
[ONNX] rework some reduction op lowerings (#3870)
- Refactors more "onnx.ReduceXXX" patterns through a helper function.
- Fixes bug with iterating unconditionally on `output_dim == 1` during
`dimList` inference.

This change results in passes for the following 11 models:

crossvit_15_240
crossvit_15_dagger_240
crossvit_15_dagger_408
crossvit_18_240
crossvit_18_dagger_240
crossvit_18_dagger_408
crossvit_9_240
crossvit_9_dagger_240
crossvit_base_240
crossvit_small_240
crossvit_tiny_240

---------

Co-authored-by: Vinayak Dev <104419489+vinayakdsci@users.noreply.github.com>
2024-11-14 16:25:28 +00:00
Hanumanth 30c519369e
Support default padding case for tosa::AvgPool in the presence of count_include_pad (#3868)
Essentially, as part of my earlier
[change](7f9f99c6f8), I didn't consider the `padding` value while erroring out for
unsupported `count_include_pad` during `torch-to-tosa` lowering for
AvgPool2d. The fix captured in this change addresses this. Please see
[issue](https://github.com/llvm/torch-mlir/issues/3862) for more details
on this.

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
2024-11-12 13:48:20 -08:00
zjgarvey cd38ecf6c2
Add Scalarization Patterns for `AtenToDtypeOp`, `AtenNegOp`, `AtenRemainderTensorOp` (#3861)
1. adds a lowering for `aten.neg.int` and `aten.remainder.int` to arith.
2. adds a scalarization pattern for `aten.neg` and
`aten.remainder.Tensor` ops.
3. improves folding of `aten.mul.int`
4. adds a scalarization pattern for `aten.to.dtype` which relies on
scalar cast ops and basic C++ casting between `double` and `int64_t`.
5. improves rank-0 case handling for `FoldAtenSplatPattern`
6. fixes a bug with `aten.unflatten.int` decomposition incorrectly
generating a constant size int from a dynamic shape.
7. simplifies the dim list for `aten.unflatten.int` ops generated from
the `aten.view` canonicalization in scalarize shapes.

All of these changes were necessary to unblock
<https://github.com/iree-org/iree/issues/18899>.
2024-11-12 14:25:02 -06:00
aldesilv 889a836b3d
OnnxToTorch bicubic interpolation (#3802)
(https://github.com/nod-ai/SHARK-TestSuite/pull/391)
Repro (using SHARK TestSuite):
1. `python run.py --torchtolinalg -m cl-onnx-iree -t cubic_test`

---------

Co-authored-by: zjgarvey <zjgarvey@gmail.com>
2024-11-12 12:54:29 -06:00
Justin Ngo 8eb34dae78
[TOSA] Add promote type to unary ops and aten.cat lowering (#3860)
Change-Id: I2699bf9007723fe629edb1c524c10ef8142e0234

Signed-off-by: Justin Ngo <justin.ngo@arm.com>
2024-11-08 11:23:39 -08:00
Justin Ngo b6f04fa32b
[TOSA] Fix rsub; add clamp.Tensor, avg_pool1d, max_pool1d, prims.collapse (#3855)
- Fix aten.rsub.Scalar legalization with appropriate type casting
- Add legalization for aten.clamp.Tensor
- Resolve some unexpected test failures from PyTorch update by adding
legalization for the following ops:
  + aten.avg_pool1d
  + aten.max_pool1d
  + torch.prims.collapse
- Update xfail_sets with new e2e results
- Add new LIT tests to basic.mlir


Change-Id: I9762c7d36ca0b0f75ca68d0c71d7f5d5309a96ad

---------

Signed-off-by: Justin Ngo <justin.ngo@arm.com>
2024-11-07 14:09:43 -08:00
yyp0 7058f456b8
[Stablehlo] support aten.isfinite (#3850) 2024-11-07 16:52:39 +08:00
Ian Wood e88faf08ff
Create scatter op with unique indices (#3853)
For the op `index_put_`, if accumulate == false, the behavior is
undefined if the indices aren't unique
(https://pytorch.org/docs/stable/generated/torch.Tensor.index_put_.html).
So, when converting `AtenIndexPutHackedTwinOp` to a TMTensor scatter op,
mark the indices as unique when `accumulate == false`.

This should have no functional effect (unless users are relying on UB)
and assuming unique indices has the benefit of unlocking better
optimizations in further compiler stages.
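A small sequential model shows why uniqueness matters. With unique indices, the order in which updates are applied does not change the result, so later stages are free to reorder or parallelize them; with duplicate indices and `accumulate == false`, the result depends on which update lands last. The functions below are hypothetical illustrations, not the TMTensor scatter semantics verbatim.

```python
def index_put(target, indices, values, accumulate=False):
    """Sequential scatter: write (or add) values[j] at target[indices[j]]."""
    out = list(target)
    for i, v in zip(indices, values):
        out[i] = out[i] + v if accumulate else v
    return out

def index_put_reversed(target, indices, values, accumulate=False):
    """The same scatter, but applying the updates in the opposite order."""
    return index_put(target, indices[::-1], values[::-1], accumulate)
```

With unique indices `[0, 2]` both orders give `[5, 0, 7]`, while duplicate indices `[1, 1]` give `[0, 7, 0]` in one order and `[0, 5, 0]` in the other.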

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
2024-11-05 12:48:34 -08:00
Jiawei Wu b75d0e3f8b
[stablehlo] fix: enhance torch's index-like op lowering to stablehlo's gather/scatter (#3829)
In torch.index_put like ops, `values` is only required to be
broadcastable to `input[indices]`, rather than exact dimension match.
This patch fixes the problem by adding an additional
stablehlo.dynamic_broadcast_in_dim before creating the stablehlo.scatter op.
BTW, this patch also enhances the `getBroadcastResultShape` utility in the
hlo namespace.
2024-11-05 19:15:11 +08:00
Justin Ngo 4c1518d365
[TOSA] Add legalization for aten.as_strided (#3848)
- Add Torch to TOSA legalization for aten.as_strided op
- Update xfail_sets with the following:
  + New aten.as_strided results
  + Changes from this commit: 7f9f99c6f8
  + Failed tests from new PyTorch version update
- Add new LIT test to basic.mlir


Change-Id: I6f471ea116ca47f2bf9537b62950fce75a2c624f

Signed-off-by: Justin Ngo <justin.ngo@arm.com>
2024-11-04 09:57:59 -08:00
jinchen 6aa46967b6
Add tosa::getConstTensor with int8_t template (#3845)
Add tosa::getConstTensor with int8_t template used in
https://github.com/llvm/torch-mlir/pull/3827
2024-11-01 21:22:27 +00:00
zjgarvey 3104b66560
Fix Slice Folder OOB Crash and onnx.Shape lowering (#3843)
1. Clamps OOB start index to 0 in slice folder
2. Adds a more descriptive `emitError` in slice folder if the creation
of the `DenseElementsAttr` would fail due to a bad result shape.
3. Fixes the `onnx.Shape` lowering to default to `inputRank` for `end`
instead of `-1`. When `end==-1` the last element was missing when
slicing.
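Both fixes can be illustrated with Python list slicing on a shape. The `normalize_start` helper is a hypothetical sketch that assumes a negative start is first wrapped by the dimension size and then clamped to 0 if still out of bounds; the shape `[2, 3, 768]` is just an example value.

```python
def normalize_start(start: int, size: int) -> int:
    """Wrap a negative start index, then clamp anything still below 0 to 0."""
    if start < 0:
        start += size
    return max(start, 0)

shape = [2, 3, 768]
rank = len(shape)
# Using end = -1 silently drops the last dimension...
assert shape[0:-1] == [2, 3]
# ...whereas defaulting end to the input rank keeps every dimension.
assert shape[0:rank] == [2, 3, 768]
```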
2024-11-01 15:33:21 -05:00
jinchen 39d69db5ca
Cast static/dynamic shape for onnx.If branches to match result type (#3828) 2024-11-01 12:10:59 -07:00
zjgarvey a82ba1c422
[TorchToArith] add lowerings for some scalar bool binary ops (#3823)
Added lit tests since these scalar operations don't trace well through
the `fx_importer` route.

`XOR` and `NE` are equivalent binary operators, so `aten.ne.bool` is
lowered to `arith.xori`.
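The equivalence is easy to check exhaustively over the four boolean input pairs. `xori` below is a hypothetical Python stand-in for `arith.xori` on `i1` values.

```python
from itertools import product

def ne_bool(a: bool, b: bool) -> bool:
    """aten.ne.bool semantics: true when the operands differ."""
    return a != b

def xori(a: int, b: int) -> int:
    """Bitwise XOR on 1-bit integers, modeling arith.xori on i1."""
    return (a ^ b) & 1

for a, b in product([False, True], repeat=2):
    assert ne_bool(a, b) == bool(xori(int(a), int(b)))
```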
2024-11-01 10:40:20 -05:00
Xinyu Yang 3dbeda9082
[Stablehlo] fix template typo (#3842)
I think we should use template parameters. @yyp0 @qingyunqu
2024-11-01 21:10:38 +08:00
Hanumanth 7f9f99c6f8
Fix torchToTosa lowering for avgpool2d to handle unsupported parameters (#3822)
The existing TorchToTosa lowering logic for `torch.aten.avg_pool2d`
doesn't handle some unsupported properties well, leading to a silent
wrong answer (SWA) when we go through
`torch-backend-to-tosa-backend-pipeline`. For instance, with the
existing TOSA avgpool2d specification, we cannot represent
`count_include_pad` and `divisor_override`, so during TorchToTosa
lowering, we silently ignore these properties, which leads to SWA in some
cases. The fix captured in this change errors out for unsupported scenarios.

For details on `count_include_pad` and `divisor_override`, please see
the link below.

https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html

---------

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
2024-11-01 08:25:59 -04:00
jinchen 032a636c35
Fix onnx.If lowering with scalar condition tensor (#3846)
Fixes
https://github.com/nod-ai/SHARK-ModelDev/issues/696#issuecomment-2442016530
2024-10-31 20:34:50 -07:00
Rob Suderman 25738b8c19
[linalg] Broadcast batch for mask on sdpa lowering (#3824)
Attention often broadcasts a mask across the batch dimension, as masking
is usually performed the same way across attention heads. This change
optionally materializes that broadcast in the mask dimensions.
2024-10-31 17:59:24 -07:00
Rob Suderman 5aa323dd29
[linalg] Fix torch.aten.add of `torch.bool` (#3820)
Addition of bools saturates, which is equivalent to an `or` operator. Updated to
avoid some observed downstream failures.
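The difference between saturating and wrapping addition on bools can be shown in a few lines. Plain 1-bit integer addition wraps modulo 2, so `1 + 1` becomes `0` (which is XOR), whereas saturating at 1 reproduces logical OR, matching PyTorch's result for `True + True`. The helper names are hypothetical.

```python
from itertools import product

def add_i1_wrapping(a: int, b: int) -> int:
    """Plain 1-bit integer addition wraps modulo 2 (this is XOR, not OR)."""
    return (a + b) % 2

def add_bool_saturating(a: bool, b: bool) -> bool:
    """Saturating addition: clamp the integer sum to the largest bool value, 1."""
    return bool(min(int(a) + int(b), 1))

for a, b in product([False, True], repeat=2):
    assert add_bool_saturating(a, b) == (a or b)
```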
2024-10-31 17:37:25 -07:00
yyp0 9ce2a69703
[Torch] support AtenExp2Op (#3832)
- support AtenExp2Op by decomposing it to aten.pow.scalar
- refine stablehlo pow.scalar pow.Tensor_Scalar pow.Tensor_Tensor
lowering according to https://github.com/llvm/torch-mlir/pull/2983
- Close https://github.com/llvm/torch-mlir/pull/2983
2024-10-31 19:14:05 +08:00
Justin Ngo 4dd213b042
[TOSA] Expand Torch to TOSA legalization coverage (#3827)
- Add/Extend Torch to TOSA legalization for the following ops:
  + Add aten.threshold_backward
  + Fix aten.threshold
  + Re-implement aten.broadcast_to using tosa.reshape and tosa.tile
  + Add support for rank 0 index for aten.index_select
  + Fix aten.index_put.hacked_twin
  + Add aten.uniform
  + Add aten.logical_and
- Update xfail_sets.py with new e2e results
- Add LIT tests to basic.mlir for newly added ops


Change-Id: I8910564a049d18293284fe2e55e82bc1d2cf10e3

Signed-off-by: Justin Ngo <justin.ngo@arm.com>
2024-10-30 16:26:10 -07:00
Sayan Saha 2b01f8b7f3
[Tosa] : Add support for negative indices in index.tensor and index.Tensor_hacked_twin for TorchToTosa lowering. (#3790)
1. Negative indices for tensor indexing are handled by wrapping around
the index values by checking their values at run time. Without the fix,
there was a runtime error.
2. Added a lit test to lock down the behavior.
3. Updated the `xfails_set` for `fx_importer_tosa` config to lockdown
the behavior with e2e test as well.
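The wrap-around amounts to an elementwise select on the index values. The sketch below models it in Python, assuming indices lie in `[-dim_size, dim_size)`; `wrap_negative_indices` is a hypothetical helper, not the TOSA ops emitted by the lowering.

```python
def wrap_negative_indices(indices, dim_size):
    """Map each index in [-dim_size, dim_size) into [0, dim_size).

    Models a runtime select(idx < 0, idx + dim_size, idx): the index values
    are only known at run time, so the check cannot be folded away.
    """
    return [i + dim_size if i < 0 else i for i in indices]
```

For example, `wrap_negative_indices([-1, 0, 2, -3], 3)` gives `[2, 0, 2, 0]`, selecting the same elements that PyTorch's negative indexing would.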

"THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY."
2024-10-25 15:37:19 -07:00
Sriram Kumar d6feb2179c
Added support for Maxpool (Autopad) (#3774)
Added autopad and passed 3 tests:

test_maxpool_2d_precomputed_same_upper
test_maxpool_2d_same_lower
test_maxpool_2d_same_upper

Addresses: https://github.com/nod-ai/SHARK-ModelDev/issues/843

2 attributes yet to complete: storage_order, indices output
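For reference, the per-dimension padding for pooling with `SAME_UPPER`/`SAME_LOWER` can be sketched as follows, based on our reading of the ONNX MaxPool definition: the output size is `ceil(in / stride)`, the total padding makes the last window fit, and an odd extra element goes at the end for `SAME_UPPER` and at the start for `SAME_LOWER`. `same_pads` is a hypothetical helper, not the code added by this PR.

```python
def same_pads(in_size, kernel, stride=1, dilation=1, mode="SAME_UPPER"):
    """Return (pad_begin, pad_end) for one spatial dim under SAME auto_pad."""
    out_size = -(-in_size // stride)  # integer ceil(in_size / stride)
    effective_kernel = (kernel - 1) * dilation + 1
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    small, large = total // 2, total - total // 2
    # The odd extra element goes at the end for SAME_UPPER, at the start for SAME_LOWER.
    return (small, large) if mode == "SAME_UPPER" else (large, small)
```

For instance, a 32-wide input with kernel 2 and stride 1 gets `(0, 1)` under `SAME_UPPER` and `(1, 0)` under `SAME_LOWER`, while a 5-wide input with kernel 3 and stride 2 gets `(1, 1)`.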
2024-10-23 13:04:50 +00:00
Felix Schneider aca33f1742
[TorchToLinalg] Use Op with native channel order for quantized conv2d (#3807)
I've upstreamed the necessary quantized linalg Op with the
"channel-first" ordering used by torch
(https://github.com/llvm/llvm-project/pull/107740) for 2d convolution.

This patch changes the lowering for the quantized 2d case of
`aten.convolution` accordingly, which saves three transpositions per
convolution (input, weights, result) and therefore removes the
requirement to try to optimize these away in downstream passes.
2024-10-22 20:26:16 +02:00
David Tanner 02327af998
Adds onnx ConvTranspose support for autopadding. (#3797)
Adds onnx ConvTranspose support for autopadding
(https://github.com/nod-ai/SHARK-ModelDev/issues/839).

- Adds support for attribute auto_pad="SAME_UPPER" or "SAME_LOWER" which
will automatically calculate padding of input based on output shape.
- Adds support, during auto-padding, for output_shape=[H,W] which
overrides the default output shape of input_shape[i]*stride[i] (for
spatial dimensions only).
- Adds lit test for auto-padding.
- Tests are added by https://github.com/nod-ai/SHARK-TestSuite/pull/370


NOTE: ConvTranspose still doesn't support asymmetric padding, therefore
multiple original onnx tests still won't pass.
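The auto-padding computation for one spatial dimension can be sketched as below. The total-padding formula and the start/end split follow our reading of the ONNX ConvTranspose operator documentation, and the default output size `in_size * stride` mirrors the description above. Treat this as an unverified sketch: `conv_transpose_auto_pads` is hypothetical, and the split should be checked against the ONNX spec version in use.

```python
def conv_transpose_auto_pads(in_size, kernel, stride=1, dilation=1,
                             output_padding=0, output_size=None,
                             mode="SAME_UPPER"):
    """Return (pad_begin, pad_end) for one spatial dim of an auto-padded ConvTranspose."""
    if output_size is None:
        # Default target size when no output_shape attribute is given.
        output_size = in_size * stride
    effective_kernel = (kernel - 1) * dilation + 1
    total = stride * (in_size - 1) + output_padding + effective_kernel - output_size
    half = total // 2
    # SAME_UPPER puts the odd extra element at the end, SAME_LOWER at the start.
    return (half, total - half) if mode == "SAME_UPPER" else (total - half, half)
```

With `in_size=3`, `kernel=3`, `stride=2`, the default output size is 6 and the total padding is 1, giving `(0, 1)` for `SAME_UPPER`; passing `output_size=5` raises the total to 2, giving `(1, 1)`.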
2024-10-18 12:31:33 -05:00
Justin Ngo 45bb17ebfe
[TOSA] Add legalization for empty, scatter, slice_scatter, diag_embed (#3792)
- Add Torch to TOSA legalization for the following ops:
  + aten.empty.memory_format
  + aten.scatter.src
  + aten.slice_scatter
  + aten.diag_embed
- Update xfail_sets.py with new e2e results
- Update basic.mlir with new LIT tests


Change-Id: I817ecf207bcfcf97ca54f30c10c76c4f0f4145ae

Signed-off-by: Justin Ngo <justin.ngo@arm.com>
2024-10-15 08:38:02 -07:00
Hanumanth04 895f490cf5
Remove checking for training specific parameters in EmbeddingBag lowering (#3782)
Torch-to-linalg pass fails for `EmbeddingBag` when the training-specific
properties of the operator are set to `true`. For instance,
this operator's `sparse` input/property is training-specific, and if the
value of this property is `true`, the existing lowering bails out.
However, we don't need to check for training-specific parameters and
bail out from the legalization since we don't care about these properties
during the eval/inference mode.

---------

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
2024-10-15 09:37:26 -04:00
yyp0 d0041dc310
[stablehlo] support aten.view.dtype lowering (#3778) 2024-10-10 15:50:17 +08:00
Vivek Khandelwal 94f5410913
[LINALG] Add complex tensor support for `create[Zero|One]InitTensor` utility (#3777)
Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-10-09 16:15:08 +05:30
Stephen Baione d49eabb3fc
Add Op for `torch.aten.unfold` (#3772)
# Description

Implementation of the op for `torch.aten.unfold`: [TorchToLinalg Op
Support #347](https://github.com/nod-ai/SHARK-ModelDev/issues/849)

Documentation of op can be found here: [PyTorch
Docs](https://pytorch.org/docs/stable/generated/torch.Tensor.unfold.html)

For this op, we apply a sliding window of some `size` along a single
`dimension`, with `step` in between iterations.

`Declaration: aten::unfold(Tensor(a) self, int dimension, int size, int
step) -> Tensor(a)`

The resulting `unfolded` tensor modifies the shape of `dimension` to be
equal to the number of blocks that the sliding window extracts/inserts,
with an additional dimension of `size` appended (the number of cols of
the output tensor directly translates from the size of the sliding
window).

So if we had a tensor of rank 3 (A x B x C), with dimension = 1, size =
2 and step = 2:

    (A x B x C) |=> (A x ((B - size) // step + 1) x C x size)

After extracting the window from the input tensor, we insert the (1 x
size) slice into the output tensor. We can make this simpler by mapping
the output indices from the input indices, like they do in the official
implementation:

[PyTorch
Code](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/lowering.py#L1694)
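A minimal sketch of the semantics on a 1-D list, plus the output-shape rule, may help when reading the lowering. These helpers are hypothetical and only model `torch.Tensor.unfold`; they are not the TorchToLinalg implementation.

```python
def unfold_1d(xs, size, step):
    """Sliding windows of length `size`, `step` elements apart, over a 1-D list."""
    num_windows = (len(xs) - size) // step + 1
    return [xs[i * step : i * step + size] for i in range(num_windows)]

def unfold_shape(shape, dimension, size, step):
    """Dim `dimension` becomes the window count and `size` is appended at the end."""
    out = list(shape)
    out[dimension] = (shape[dimension] - size) // step + 1
    return out + [size]
```

For example, `unfold_1d([0, 1, 2, 3, 4], 2, 2)` gives `[[0, 1], [2, 3]]`, like `torch.arange(5).unfold(0, 2, 2)`, and `unfold_shape([4, 6, 5], 1, 2, 2)` gives `[4, 3, 5, 2]`.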
2024-10-08 21:10:43 +00:00
Phaneesh Barwaria 7830c00ca2
onnx.LSTM - bidirectional, layout attr (#3771)
- Support Bidirectional LSTM (utilising the forward LSTM layer with
flipped Inputs and Outputs)
- Support layout 1 
- Support default cases for attr `clip` and `input_forget`
- Support returning partial outputs (1-3)  
- fixes for alt_e2e_tests lstm tests (1,2,3)
2024-10-08 11:29:49 -07:00
jinchen 58489faf7f
torch.aten.squeeze.dim lowering with dynamic dims (#3749)
Address https://github.com/nod-ai/SHARK-ModelDev/issues/846

Assume the dynamic squeezed dim is 1.
2024-10-08 10:37:31 -07:00
Vivek Khandelwal 614fcdd153
[MLIR][TORCH] Add support for 1-d group convolution (#3770)
This commit adds support for the 1-d depthwise convolution as a
special case of 1-d group convolution.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-10-08 10:48:47 +05:30
Vivek Khandelwal f6721e5999
[MLIR][TORCH] Add support for negative step in aten.slice.Tensor op (#3763)
This commit adds support for negative step values in the
aten.slice.Tensor op. Although PyTorch does not allow a negative step
value for the slice op, the Onnx.Slice op supports negative step values,
and it eventually lowers to the torch.aten.slice.Tensor op. Hence,
support is added for handling such values during the
Torch->Linalg lowering of the aten.slice.Tensor op.
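The index arithmetic for a negative step can be modeled in Python, assuming `start` and `end` are already normalized and `end` is exclusive. The result length is `ceil((end - start) / step)`, clamped at 0, and element `i` is read from `start + i * step`. Both helpers are hypothetical illustrations, not the Linalg code.

```python
def slice_len(start, end, step):
    """Number of elements in start:end:step for normalized bounds (ceil division)."""
    return max(0, -((start - end) // step))

def slice_with_step(xs, start, end, step):
    """Gather xs[start + i * step] for each output position i."""
    return [xs[start + i * step] for i in range(slice_len(start, end, step))]
```

With `xs = list(range(10))`, `slice_with_step(xs, 8, 2, -2)` returns `[8, 6, 4]`, matching Python's `xs[8:2:-2]`.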

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-10-08 10:34:27 +05:30
Justin Ngo b08d08682f
[TOSA] Add legalization for fill, flip, and round (#3768)
- Add Torch to TOSA lowering for aten.fill.Scalar/Tensor, aten.flip, and
aten.round
- Fix torchScalarToTosaTensor function to correctly convert Torch scalar
input to TOSA tensor
- Update xfail_sets.py with new e2e results
- Update basic.mlir with LIT tests for new ops


Change-Id: If1e42c2e582710dd8ad0465eed29806fbcdbde41

Signed-off-by: Justin Ngo <justin.ngo@arm.com>
2024-10-07 10:28:26 -07:00
Chi_Liu f4840ed886
[ONNX] Fix onnx.ScatterElements with AtenScatterReduceTwoOp lowering to tm_tensor/linalg_ext dialect (#3754)
- To fix issue onnx.ScatterElements: https://github.com/nod-ai/SHARK-ModelDev/issues/823
- E2E test: https://github.com/nod-ai/SHARK-TestSuite/pull/363
2024-10-05 22:22:41 -07:00
Rob Suderman 53f7532e76
Revert "[TorchToLinalg] perform rank0 elementwise computations outside linalg generic ops (#3762)" (#3767)
Reverted due to downstream model changes. Will reland with fixes post
integration.

This reverts commit 6e8c7bed4b.
2024-10-04 14:48:02 -07:00
Justin Ngo e9ed4af9ce
[TOSA] Add legalization for aten.index_select (#3760)
- Add Torch to TOSA legalization for aten.index_select
- Fix createOneDimTfIndices function in TosaLegalizeCommon.cpp to
correctly convert Torch indices to TF-style indices, which is used in
convertGatherNdOp
- Update e2e tests in xfail_sets.py
- Update basic.mlir with new LIT test for aten.index_select

Signed-off-by: Justin Ngo <justin.ngo@arm.com>
Change-Id: I52519246183949353a3cf22f0a685fe3df8ec8ff
2024-10-04 12:24:22 -07:00
Rob Suderman 2374b9e02d
Bump to llvm/llvm-project@e813750354 (#3765)
Includes stablehlo bump
2024-10-04 12:08:35 -07:00
zjgarvey 6e8c7bed4b
[TorchToLinalg] perform rank0 elementwise computations outside linalg generic ops (#3762)
This is motivated by the fact that shapes are stored as tensors in ONNX,
and IREE tries to perform tensor arithmetic on the device. This causes
unnecessary dispatches, and makes it harder for the compiler to reason
about shapes.

Here is a small snippet of torch-IR that is typically seen coming from
ONNX models:

```mlir
module {
  func.func @main_graph(%arg0: !torch.vtensor<[?,?,768],f32>, %arg1: !torch.vtensor<[?,?,768],f32>) -> !torch.vtensor<[],si64> {
    %int0 = torch.constant.int 0
    %0 = torch.vtensor.literal(dense<0> : tensor<1xsi64>) : !torch.vtensor<[1],si64>
    %1 = torch.aten._shape_as_tensor %arg1 : !torch.vtensor<[?,?,768],f32> -> !torch.vtensor<[3],si64>
    %2 = torch.aten.index_select %1, %int0, %0 : !torch.vtensor<[3],si64>, !torch.int, !torch.vtensor<[1],si64> -> !torch.vtensor<[1],si64>
    %3 = torch.aten.squeeze.dim %2, %int0 : !torch.vtensor<[1],si64>, !torch.int -> !torch.vtensor<[],si64>
    %4 = torch.aten.item %3 : !torch.vtensor<[],si64> -> !torch.int
    %5 = torch.aten.eq.int %4, %int0 : !torch.int, !torch.int -> !torch.bool
    %6 = torch.aten.Int.bool %5 : !torch.bool -> !torch.int
    %7 = torch.aten.size.int %arg0, %int0 : !torch.vtensor<[?,?,768],f32>, !torch.int -> !torch.int
    %8 = torch.prim.NumToTensor.Scalar %6 : !torch.int -> !torch.vtensor<[],i1>
    %9 = torch.prim.NumToTensor.Scalar %7 : !torch.int -> !torch.vtensor<[],si64>
    %10 = torch.prim.NumToTensor.Scalar %4 : !torch.int -> !torch.vtensor<[],si64>
    %11 = torch.aten.where.self %8, %9, %10 : !torch.vtensor<[],i1>, !torch.vtensor<[],si64>, !torch.vtensor<[],si64> -> !torch.vtensor<[],si64>
    return %11 : !torch.vtensor<[],si64>
  }
}
```

Without the change in this PR, the result would be:

```mlir
#map = affine_map<() -> ()>
module {
  ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
  func.func @main_graph(%arg0: tensor<?x?x768xf32>, %arg1: tensor<?x?x768xf32>) -> tensor<i64> {
    %c0_i64 = arith.constant 0 : i64
    %c0 = arith.constant 0 : index
    %dim = tensor.dim %arg1, %c0 : tensor<?x?x768xf32>
    %0 = arith.index_cast %dim : index to i64
    %1 = tensor.empty() : tensor<1xi64>
    %collapsed = tensor.collapse_shape %1 [] : tensor<1xi64> into tensor<i64>
    %2 = linalg.fill ins(%0 : i64) outs(%collapsed : tensor<i64>) -> tensor<i64>
    %extracted = tensor.extract %2[] : tensor<i64>
    %3 = arith.cmpi eq, %extracted, %c0_i64 : i64
    %dim_0 = tensor.dim %arg0, %c0 : tensor<?x?x768xf32>
    %4 = arith.index_cast %dim_0 : index to i64
    %5 = tensor.empty() : tensor<i1>
    %6 = linalg.fill ins(%3 : i1) outs(%5 : tensor<i1>) -> tensor<i1>
    %7 = tensor.empty() : tensor<i64>
    %8 = linalg.fill ins(%4 : i64) outs(%7 : tensor<i64>) -> tensor<i64>
    %9 = linalg.fill ins(%extracted : i64) outs(%7 : tensor<i64>) -> tensor<i64>
    %10 = linalg.generic {indexing_maps = [#map, #map, #map, #map], iterator_types = []} ins(%6, %8, %9 : tensor<i1>, tensor<i64>, tensor<i64>) outs(%7 : tensor<i64>) {
    ^bb0(%in: i1, %in_1: i64, %in_2: i64, %out: i64):
      %11 = arith.select %in, %in_1, %in_2 : i64
      linalg.yield %11 : i64
    } -> tensor<i64>
    return %10 : tensor<i64>
  }
}
```

With the change in this PR, we would instead get:

```mlir
module {
  ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
  func.func @main_graph(%arg0: tensor<?x?x768xf32>, %arg1: tensor<?x?x768xf32>) -> tensor<i64> {
    %c0_i64 = arith.constant 0 : i64
    %c0 = arith.constant 0 : index
    %dim = tensor.dim %arg1, %c0 : tensor<?x?x768xf32>
    %0 = arith.index_cast %dim : index to i64
    %1 = tensor.empty() : tensor<1xi64>
    %collapsed = tensor.collapse_shape %1 [] : tensor<1xi64> into tensor<i64>
    %2 = linalg.fill ins(%0 : i64) outs(%collapsed : tensor<i64>) -> tensor<i64>
    %extracted = tensor.extract %2[] : tensor<i64>
    %3 = arith.cmpi eq, %extracted, %c0_i64 : i64
    %dim_0 = tensor.dim %arg0, %c0 : tensor<?x?x768xf32>
    %4 = arith.index_cast %dim_0 : index to i64
    %5 = arith.select %3, %4, %extracted : i64
    %6 = tensor.empty() : tensor<i64>
    %7 = linalg.fill ins(%5 : i64) outs(%6 : tensor<i64>) -> tensor<i64>
    return %7 : tensor<i64>
  }
}
```

Some related issues for context:
1. <https://github.com/iree-org/iree/issues/18677>
2. <https://github.com/iree-org/iree/issues/18631>
2024-10-04 11:27:00 -05:00