Commit Graph

79 Commits (revert-2869-AtenSlice-folder)

saienduri bfcf93ea21
Rename torch_mlir.compile APIs and introduce FX based analogs (#2842)
Link to related RFC:
https://discourse.llvm.org/t/rfc-rename-torch-mlir-compile-apis-and-introduce-fx-based-analogs/76646
This commit updates the documentation, tests, CMake files, and API for
the proposed changes in the RFC. There is a new torch_mlir/fx.py for
user level APIs related to importing modules and a corresponding test
for this path can be found at test/python/fx_importer/basic_test.py.
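For illustration, a minimal sketch of the new FX path (the entry-point name `export_and_import` is assumed here; see the test above for the authoritative usage):
```python
import torch
from torch_mlir import fx  # new user-level FX import API added by this change

class Mul(torch.nn.Module):
    def forward(self, x):
        return x * 2.0

# Hypothetical usage: trace the module through torch.export and import the FX graph into MLIR.
module = fx.export_and_import(Mul(), torch.ones(3, 4))
print(module)
```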

---------

Co-authored-by: MaheshRavishankar <mravisha@amd.com>
2024-02-06 19:07:59 -08:00
Xida Ren (Cedar) cc06391630
AtenSortOp Folder (#2864)
A chunk split off from:

https://github.com/llvm/torch-mlir/pull/2856
https://github.com/llvm/torch-mlir/pull/2860

---------

Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
2024-02-06 21:12:12 +00:00
Dave Liddell 1cb14f6879
Rob's atenTensor folder (#2867)
If a tensor is initialized by a list with a single constant integer,
this folder turns it into a torch.vtensor.literal

---------

Co-authored-by: Dave Liddell <dliddell@xilinx.com>
2024-02-05 17:10:42 -08:00
Rob Suderman 041a54ae0c
[torch] Supporting `torch.aten.mul.float` lowering to `arith` (#2833)
The simple scalar operation for multiplying floats was missing.
2024-02-05 16:23:04 -08:00
Xida Ren (Cedar) 24b8c8672a
[torch] Add folders for `torch.fill`, `torch.ones`, `torch.zeros` and `aten.getItem` (#2849)
So that the CumSum Op in OPT can get the constant that it requires to be lowered to TMTensor

---------

Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
2024-02-02 10:46:33 -08:00
Rob Suderman 34f6948533
[torch] Support `!countIncludePad` when unpadded for average pool (#2836)
We do not support average pool when `countIncludePad` is set to false.
However, if the input is unpadded then the setting of the boolean does not
matter. Extended the lowering by checking if padding is zero before rejecting
the lowering.
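A quick eager-mode illustration (not from the commit) of why the flag does not matter when padding is zero:
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)
# With padding=0 there are no padded cells to count, so both settings agree.
a = F.avg_pool2d(x, 3, stride=2, padding=0, count_include_pad=True)
b = F.avg_pool2d(x, 3, stride=2, padding=0, count_include_pad=False)
assert torch.allclose(a, b)

# With nonzero padding they differ, which remains the unsupported case.
c = F.avg_pool2d(x, 3, stride=2, padding=1, count_include_pad=True)
d = F.avg_pool2d(x, 3, stride=2, padding=1, count_include_pad=False)
assert not torch.allclose(c, d)
```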
2024-01-31 15:09:36 -08:00
Rob Suderman 0114a570e3
[torch] Support lowering `torch.item` to `tensor.extract` (#2835)
Extracting scalar values from tensors can be implemented via a lowering
to tensor.extract.
2024-01-31 15:09:12 -08:00
Ilija Kalinić 54ef18c556
Implement lowering of torch.aten.lerp.Scalar (#2773)
Closes nod-ai/SHARK-Turbine#356
2024-01-31 09:39:38 -08:00
Stella Laurenzo 7301aa80fd
Enable -Werror in lib/ and LTC. (#2841)
Required some massaging of LTC to make it warning clean, and I had to
manually disable some warnings on the generated source files (which we
don't control).

The project is warning clean now.

The `-Werror` flag is disabled by default as we can't control everywhere
people will try to build/install. The CI enables it via
-DTORCH_MLIR_ENABLE_WERROR_FLAG=ON.
2024-01-30 23:33:21 -08:00
Yuanqiang Liu d778950f45
[Torch Dialect] add fold pattern for aten.clone (#2804) 2024-01-31 09:43:21 +08:00
Rob Suderman 25a5a22cbd
[torch] Support `torch.convolution` quantized lowering to `linalg` (#2811)
Linalg has quantization-specific operations. We can lower to these
operations when the zero point and scale are known. This allows the
`convolution` to occur at lower bitwidths, improving the overall
performance.
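As a rough sketch (illustrative, not the lowering itself) of the arithmetic that makes this possible: a quantized value encodes `real = scale * (q - zero_point)`, so the bulk of the convolution can run on the integer values and the scales can be applied afterwards.
```python
import torch

scale, zero_point = 0.05, 3
x = torch.rand(4)

# Quantize to int8, then reconstruct the real value from the integer representation.
q = (torch.round(x / scale) + zero_point).to(torch.int8)
x_hat = scale * (q.float() - zero_point)

# The round-trip error is bounded by half a quantization step.
assert torch.all((x - x_hat).abs() <= scale / 2 + 1e-6)
```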
2024-01-30 13:46:47 -08:00
James Newling 1e882f5803
Additional information in error message (#2783)
See change in test for what the new message looks like.
2024-01-30 08:28:08 -08:00
Quinn Dawkins 494089d53d
Clang format refresh (#2812)
After noticing a number of commits with unrelated formatting changes, I
think clang-format was changed at some point and we are now picking up
unrelated diffs in commits. Doing a refresh can help avoid this.

The changes made here came from
```
find lib -iname *.h -o -iname *.cpp  | xargs clang-format -i --style=llvm
find include -iname *.h -o -iname *.cpp  | xargs clang-format -i --style=llvm
find projects -iname *.h -o -iname *.cpp  | xargs clang-format -i --style=llvm
```
2024-01-29 12:59:33 -05:00
Stella Laurenzo 77c14ab22b
[ci] Upgrade to new runners and disable unsupported jobs. (#2818)
Per the RFC and numerous conversations on Discord, this rebuilds the
torch-mlir CI and discontinues the infra and coupling to the binary
releases
(https://discourse.llvm.org/t/rfc-discontinuing-pytorch-1-binary-releases/76371).

I iterated on this to get latency back to about what it was with the old
(much larger and non-ephemeral) runners: About 4m - 4.5m for an
incremental change.

Behind the scenes changes:

* Uses a new runner pool operated by AMD. It is currently set to manual
scaling and has two runners (32-core, 64GiB RAM) while we get some
traction. We can either fiddle with some auto-scaling or use a schedule
to give it an increase during certain high traffic hours.
* Builds are now completely isolated and cannot have run-to-run
interference like we were getting before (i.e. lock file/permissions
stuff).
* The GHA runner is installed directly into a manylinux 2.28 container
with upgraded dev tools. This eliminates the need to do sub-invocations
of docker on Linux in order to run on the same OS that is used to build
wheels.
* While not using it now, this setup was cloned from another project
that posts the built artifacts to the job and fans out testing. Might be
useful here later.
* Uses a special git cache that lets us have ephemeral runners and still
check out the repo and deps (incl. llvm) in ~13s.
* Running in an Azure VM Scale Set.

In-repo changes:

* Disables (but does not yet delete):
  * Old buildAndTest.yml jobs
  * releaseSnapshotPackage.yml
* Adds a new `ci.yml` pipeline and scripts the steps in `build_tools/ci`
(by decomposing the existing `build_linux_packages.sh` for in-tree
builds and modularizing it a bit better).
* Test framework changes:
* Adds a `TORCH_MLIR_TEST_CONCURRENCY` env var that can be used to bound
the multiprocess concurrency. Ended up not using this in the final
version but is useful to have as a knob.
* Changes the default concurrency to `nproc * 0.8 + 1` vs `nproc * 1.1`.
We're running on systems with significantly less virtual memory and I
did a bit of fiddling to find a good tradeoff.
* Changed multiprocess mode to spawn instead of fork. Otherwise, I was
getting instability (as discussed on discord).
* Added MLIR configuration to disable multithreaded contexts globally
for the project. Constantly spawning `nproc * nproc` threads (more than
that actually) was OOM'ing.
* Added a test timeout of 5 minutes. If a multiprocess worker crashes,
the framework can get wedged indefinitely (and then will just be reaped
after multiple hours). We should fix this, but this at least keeps the
CI pool from wedging with stuck jobs.
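A minimal sketch (not the framework code itself) of the runner pattern described in the last few bullets: spawn-based workers, a concurrency default of `nproc * 0.8 + 1` overridable via `TORCH_MLIR_TEST_CONCURRENCY`, and a per-test timeout.
```python
import multiprocessing as mp
import os

def run_one_test(name):
    # Stand-in for running a single e2e test in an isolated worker process.
    return f"{name}: PASS"

if __name__ == "__main__":
    nproc = os.cpu_count() or 1
    default = int(nproc * 0.8) + 1
    workers = int(os.environ.get("TORCH_MLIR_TEST_CONCURRENCY", default))
    ctx = mp.get_context("spawn")  # spawn instead of fork, for stability
    with ctx.Pool(workers) as pool:
        pending = [pool.apply_async(run_one_test, (t,)) for t in ["test_a", "test_b"]]
        # A 5-minute timeout keeps a crashed worker from wedging the whole run.
        print([p.get(timeout=300) for p in pending])
```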

Functional changes needing followup:

* No matter what I did, I couldn't get the LTC tests to work, and I'm
not 100% sure they were being run in the old setup as the scripts were a
bit twisty. I disabled them and left a comment.
* Dropped out-of-tree build variants. These were not providing much
signal and increase CI needs by 50%.
* Dropped MacOS and Windows builds. Now that we are "just a library" and
not building releases, there is less pressure to test these commit by
commit. Further, since we bump torch-mlir to known good commits on these
platforms, it has been a long time since either of these jobs have
provided much signal (and they take ~an hour+ to run). We can add them
back later post-submit if ever needed.
2024-01-27 18:35:45 -08:00
Rob Suderman 2ef228328f
[torch] `torch.dequantize` for per channel tensors to` linalg` (#2769)
Support a lowering for dequantization for per channel tensors from
`torch` dialect to a linalg decomposition. Tested via a numerical
`torch` test.
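A hedged sketch of the per-channel arithmetic involved (illustrative names and values): each channel carries its own scale and zero point, broadcast along the channel axis.
```python
import torch

# Per-channel parameters for a (C, H, W) tensor quantized along channel axis 0.
q = torch.randint(-128, 127, (3, 4, 4), dtype=torch.int8)
scale = torch.tensor([0.1, 0.05, 0.2])
zero_point = torch.tensor([0.0, 2.0, -1.0])

# Dequantize: real[c, ...] = scale[c] * (q[c, ...] - zero_point[c]).
real = scale.view(-1, 1, 1) * (q.float() - zero_point.view(-1, 1, 1))
print(real.shape)  # torch.Size([3, 4, 4])
```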
2024-01-25 16:40:21 -08:00
Rob Suderman f6f890520b
[torch][quant] Quantized `torch.mm` for linalg with end-to-end test (#2750)
This includes custom op matching for decomposed operations and fusing
dequantization into dense operations. As a validation we compare
to the dequant+mm torch implementation.
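A sketch of the kind of check described, assuming per-tensor parameters (all names illustrative): the fused form applies the scales once, after the zero-point-adjusted product.
```python
import torch

a_q = torch.randint(-8, 8, (2, 3))
b_q = torch.randint(-8, 8, (3, 4))
a_scale, a_zp = 0.1, 1
b_scale, b_zp = 0.2, -2

# Reference: dequantize each operand, then an ordinary float matmul.
ref = (a_scale * (a_q.float() - a_zp)) @ (b_scale * (b_q.float() - b_zp))

# Fused form: matmul on zero-point-adjusted values, scales applied at the end
# (in the real lowering this inner product accumulates in integer arithmetic).
fused = a_scale * b_scale * ((a_q.float() - a_zp) @ (b_q.float() - b_zp))
assert torch.allclose(ref, fused)
```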
2024-01-24 14:02:50 -08:00
zjgarvey c531f5495b
AtenAdaptiveMaxPool2d Conversion to Linalg (#2779)
The logic here is very similar to the conversion for AdaptiveAvgPool1d
#2661 with a few modifications:

1. buffVal = -inf instead of 0
2. the main linalg generic op accumulates a max, instead of a sum, to
the first output tensor
3. avg pooling requires dividing the sum pool by the kernel width, which
we stored as an auxiliary tensor (kSizeTensor). Here, the auxiliary
tensor will be recording the indices. Strangely enough, the only
signature available for this function is to return indices, and it
appears that they must be computed whether the user desires them or not.
See
[pytorch/torch/nn/functional.py](https://github.com/pytorch/pytorch/blob/main/torch/nn/functional.py#L1174).

Before writing other adaptive pooling conversions, the logic of this
decomposition should be rolled into a helper function that will work for
both max and avg pooling ops. Even the auxiliary tensor should likely be
automated. This code was written in a slightly more tedious way than
strictly necessary (often using loops to fill SmallVectors up to rank-2,
which is only two in this case), in order to more easily facilitate the
transition to a helper function.
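For reference, the eager-mode behaviour described above, where indices are always produced alongside the pooled values (a quick illustration, not the conversion):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 7, 9)
# return_indices=True exposes the auxiliary index tensor the conversion records.
vals, idx = F.adaptive_max_pool2d(x, output_size=(3, 4), return_indices=True)
print(vals.shape, idx.shape)  # both torch.Size([1, 2, 3, 4])
```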
2024-01-24 09:09:56 -08:00
Xida Ren (Cedar) ccaac85788
implement aten.conv1d, aten.conv3d, and aten.conv_tbc (#2757)
`conv_tbc` is a convolution with [time, batch, channel] ordering, as opposed
to the default [batch, channel, time]. It is currently implemented by
transposing the input and output, but it may need its own implementation in
the future because the op is supposed to give a speedup. It is used by
fairseq (https://github.com/facebookresearch/fairseq/issues/172).

(in case you were wondering like me, this is different from transposed
convolution. Transposed convolution has fractional strides).
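A quick numerical check of the transposition approach (a sketch; shapes and tolerance are illustrative):
```python
import torch

time, batch, c_in, c_out, k = 8, 2, 3, 5, 3
x = torch.randn(time, batch, c_in)   # [time, batch, channel]
w = torch.randn(k, c_in, c_out)      # conv_tbc weight layout
b = torch.randn(c_out)

ref = torch.conv_tbc(x, w, b)

# Same computation via the default [batch, channel, time] layout.
y = torch.conv1d(x.permute(1, 2, 0), w.permute(2, 1, 0), b)
assert torch.allclose(ref, y.permute(2, 0, 1), atol=1e-5)
```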

---------

Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
Co-authored-by: Frederik Harwath <frederik.harwath@amd.com>
2024-01-23 21:30:03 -08:00
Franz Haniel b9806cfa38
[TorchToLinalg] Add lowering for torch.aten.diagonal (#2632) 2024-01-22 12:47:13 -05:00
John Wu 704cfdaf08
Add aten.pool_max3d support to torch-to-linalg (#2735)
Added verification logic to the abstract_interpreter_lib_gen.py

Also made some unit tests

Initially, I thought we could use `linalg::pooling_ndhwc_max` to implement
this. However, on a 5-dimensional tensor it pools over dimensions (2, 3, 4),
which is not what we want; we want pooling over dimensions (3, 4, 5).

To achieve this, we would need to lower our code using the `linalg`
dialect.


Turns out the pooling code in `linalg` looks like this.

```
func @max_pooling_ncdhw(%I: memref<?x?x?x?x?xf32>, %K: memref<3xindex>, %O: memref<?x?x?x?x?xf32>,
                        %strides: memref<3xindex>, %dilations: memref<3xindex>) {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %c3 = arith.constant 3 : index
    %c4 = arith.constant 4 : index
    %N = memref.dim %I, %c0 : memref<?x?x?x?x?xf32>
    %C = memref.dim %I, %c1 : memref<?x?x?x?x?xf32>
    %D = memref.dim %I, %c2 : memref<?x?x?x?x?xf32>
    %H = memref.dim %I, %c3 : memref<?x?x?x?x?xf32>
    %W = memref.dim %I, %c4 : memref<?x?x?x?x?xf32>

    %kernel_d = memref.load %K[%c0] : memref<3xindex>
    %kernel_h = memref.load %K[%c1] : memref<3xindex>
    %kernel_w = memref.load %K[%c2] : memref<3xindex>
    %stride_d = memref.load %strides[%c0] : memref<3xindex>
    %stride_h = memref.load %strides[%c1] : memref<3xindex>
    %stride_w = memref.load %strides[%c2] : memref<3xindex>
    %dilation_d = memref.load %dilations[%c0] : memref<3xindex>
    %dilation_h = memref.load %dilations[%c1] : memref<3xindex>
    %dilation_w = memref.load %dilations[%c2] : memref<3xindex>

    linalg.generic {
        indexing_maps = [
            affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d * %stride_d + kd * %dilation_d, h * %stride_h + kh * %dilation_h, w * %stride_w + kw * %dilation_w)>,  // Map for input tensor
            affine_map<(n, c, d, h, w, kd, kh, kw) -> (kd, kh, kw)>,                                              // Map for kernel tensor
            affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d, h, w)>                                            // Map for output tensor
        ],
        iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"],
        doc = "3D Max Pooling NCDHW with Strides, Dilations, and Kernel Size"
    } ins(%I, %K : memref<?x?x?x?x?xf32>, memref<3xindex>) outs(%O : memref<?x?x?x?x?xf32>) {
        ^bb0(%input_elem: f32, %kernel_elem: index, %output_elem: f32):
            %max_val = arith.maxf %input_elem, %output_elem : f32
            linalg.yield %max_val : f32
    }
    return
}

```

This was implemented based on its source code with the adjustments
mentioned above:

4ca1b5e094/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml (L5647)

Issues related to this can be found here

https://github.com/nod-ai/SHARK-Turbine/issues/324
2024-01-19 21:09:46 +05:30
Ilija Kalinić faa4517e83
Implement lowering of torch.aten.remainder.Tensor (#2763)
Closes nod-ai/SHARK-Turbine#349
2024-01-19 18:09:08 +05:30
Ze Zhang 77a03f2069
torch-to-tosa lowering support for AtenLinalgVectorNormOp (#2734)
This PR adds torch-to-tosa lowering support for AtenLinalgVectorNormOp.

e2e test:
python -m e2e_testing.main --config=tosa

LIT tests:
cmake --build build --target tools/torch-mlir/all

---------

Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
2024-01-18 12:32:23 -08:00
Sungsoon Cho a8538e1e3f
Decompose AtenNormalFunctionalOp into AtenRandn* and other arithmetic. (#2737) 2024-01-15 22:49:29 -08:00
lonely eagle f85e5c932b
[Torch Dialect] support aten.isneginf, aten.isposinf, aten.nan_to_num (#2743) 2024-01-16 14:29:34 +08:00
James Newling f78ec78ac8
Adjust bound check to be the same as PyTorch native (i.e. stricter) (#2755)
prims.expand expects the start and end dimensions to be strictly less
than the rank of the tensor.
2024-01-15 11:44:45 -08:00
lisaliu1 09421b1cf3
[TorchToLinalg] Add lowering for aten.replication_pad2d (#2715)
Co-authored-by: Lisa Liu <lingl@xilinx.com>
2024-01-15 14:02:27 -05:00
Rob Suderman dc37616d67
[torch][quant] Support quantize and dequantize for torch (#2731)
Handle both `torch.dequantize` and `torch.quantize_per_tensor` including
the op based quantization parameter tracking. This includes adding
`qint32` to torch types as it was missing during the initial type
inclusion.

For testing we only have `torch.int8` and `torch.float` types on
function boundaries as the `qint8` types require passing the scale
and zero point quantization information which is not supported yet.
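For reference, the eager-mode counterparts of the two ops handled here, showing the scale and zero point parameters mentioned above (a minimal example):
```python
import torch

x = torch.tensor([0.0, 0.5, 1.0, 1.5])
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
print(q.int_repr())         # the stored int8 values
print(torch.dequantize(q))  # scale * (int_repr - zero_point) recovers the floats
```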
2024-01-12 19:11:14 -08:00
Ze Zhang 670a99ae19
Handle torch.none type in tosa.clamp op (#2739)
This PR updates the torch-to-tosa conversion with following changes:

- Support torch.none as min/max input argument for tosa.clamp op
- Support negative value as start index for tosa.slice op
- Add tosa.logical_or lowering support
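Two quick eager-mode illustrations (not from the PR) of the first two cases: a `None` bound leaves that side unconstrained, and a negative slice start counts from the end.
```python
import torch

x = torch.tensor([-3.0, 0.0, 3.0])
print(torch.clamp(x, min=None, max=1.0))   # only the max bound is applied
print(torch.clamp(x, min=-1.0, max=None))  # only the min bound is applied

t = torch.arange(6)
print(t[-3:])  # negative start index, equivalent to t[3:]
```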

e2e test:
python -m e2e_testing.main --config=tosa

LIT tests:
cmake --build build --target tools/torch-mlir/all

---------

Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
2024-01-11 10:36:48 -08:00
Ilija Kalinić e1a86e480a
Implement lowering of torch.aten.logit (#2697)
Closes nod-ai/SHARK-Turbine#290
2024-01-11 20:25:42 +05:30
Frederik Harwath 0860c41ee2 Implement aten.reflection_pad2d lowering to linalg 2024-01-10 21:32:22 -10:00
zjgarvey 07d0645f64
[RFC] general support for Adaptive Pooling Ops (#2661)
Adaptive pooling ops can only be decomposed into their non-adaptive
counterparts in trivial cases.

For example, the current decomposition for AtenAdaptiveAvgPool1dOp in
DecomposeComplexOps.cpp supports outSize = inSize (i.e., do literally
nothing), and outSize = 1 (i.e., do a batched average).

The reason adaptive pooling ops are difficult to lower to linalg is that
they are not constantly strided. They are computed by taking an input
tensor of shape (N, C, Hin), and an output size Hout, and computing the
output tensor at position (n,c, h) in the following way:

1. compute st(h) = (h*Hin)//Hout
2. compute en(h) = 1 + ((h+1)*Hin -1)//Hout
3. apply a computation (max or avg) to the slice: INPUT[n, c,
st(h):en(h)]

The provided sample implementation (for ConvertAtenAdaptiveAvgPool1dOp)
uses tensor.extract to access the input tensor inside the payload of a
linalg generic op. This is likely an unattractive use of linalg generic
ops, which is why I am asking for some more targeted feedback on the
validity of this approach before attempting to support the many other
adaptive pooling ops.

Specifically:

- Is the performance of this implementation bad enough to warrant
targeting different dialects entirely? e.g. TMtensor/linalg ext/ etc.
- If the provided implementation is of acceptable performance to the
community, then is it permissible to remove the Adaptive pooling
decompositions from DecomposeComplexOps.cpp? Based on the current
structure of the -torch-decompose-complex-ops pass, it does not seem
possible to only decompose the adaptive ops in special cases (it seems
to get stuck in an infinite loop on a match failure). I would be happy
to instead incorporate the case logic into the conversion directly, and
remove the decompositions once they are rendered completely obsolete.

As long as this approach is acceptable, I can clean up the
implementation with some helper functions, and quickly add support for
each of the remaining Adaptive pooling ops.
2024-01-09 11:14:10 -08:00
Rob Suderman 985e7796a4
[linalg] Added `aten.clamp` support with integers to `torch-to-linalg` (#2718)
The lowering for `aten.clamp` did not support integer types. Added
support for integer types including a signed integer test.
2024-01-05 15:16:49 -08:00
Aart Bik aa7e95f7c8
[torch-mlir] remove trailing whitespace from e2e test files (#2727) 2024-01-04 14:09:12 -08:00
Aart Bik 3e9bacdb51
[torch-mlir] update e2e test class documentation (#2722)
The doc seems to be copied and pasted from the linalg-on-tensors class.
2024-01-03 16:10:50 -08:00
Vivek Khandelwal 690827fe52 build: manually update PyTorch version
Set PyTorch and TorchVision version to nightly release 2024-01-02.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-01-03 11:47:12 +05:30
kumardeepakamd 9adad9bc40
Add support for reflection_pad1d (#2706)
Adds a lowering to Linalg for reflection_pad1d. Based on ideas/code from draft PR
https://github.com/llvm/torch-mlir/pull/2693.

---------

Co-authored-by: Kumar Deepak <kumar@xilinx.com>
2024-01-02 14:05:11 -05:00
Xida Ren (Cedar) 6660a26594
lower torch.aten.isinf to linalg (#2638)
Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
2023-12-28 17:20:32 -08:00
Sungsoon Cho 8e389ff2ff
Implement lowering of torch.aten.exponential (#2680)
https://github.com/llvm/torch-mlir/issues/2646

Decompose aten.exponential() into: -log(1 - x)/lambda, where x is drawn from a uniform distribution (inverse-transform sampling).
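A quick sanity check of that formula (a sketch, not the decomposition code): the empirical mean of the samples approaches 1/lambda.
```python
import torch

lambd = 2.0
u = torch.rand(1_000_000)
samples = -torch.log(1.0 - u) / lambd  # inverse CDF of the exponential distribution
print(samples.mean())  # approximately 1 / lambd == 0.5
```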
2023-12-27 20:33:18 -08:00
Stella Laurenzo 1b40b6384e
[onnx] Add torch-mlir-import-onnx native port as an optional tool/library. (#2694)
As noted in the plan when this work started, we need to produce an ORT
EP plugin for a downstream project, and this will necessitate a C-based
ONNX importer (as opposed to the existing Python one). Because this
comes with dependencies that we do not want to impart on various
projects, this is optional in torch-mlir. It is also factored so that it
can be used as standalone sources in downstreams that need it. Since it
only depends on public C APIs on the MLIR side, this will make build
coupling a lot better (since a C++ dep is not needed on the compiler and
it is trivial to dynamically load).

Our original plan was just to maintain this fork off to the side in our
ORT plugin, but once work started, it seemed better to write it clean
and contribute it upstream for anyone to use. We expect that for non-ORT
use, the Python importer will have better ergonomics for most folks.

I will follow up with a test suite refactor so that we can drive the
Python or C importer.

This is a relatively mechanical port from Python to C, borrowing some
scaffolding from the old JitIR importer. It does attempt to lay some
groundwork for external data, which will need to be implemented on the
Python side as well.
2023-12-27 12:13:34 -08:00
Rik Huijzer 8328998172
Allow printing all IR in `torch_mlir.compile` (#2669)
This PR adds the `enable_ir_printing` option to `torch_mlir.compile`,
which can be used to print the IR for all intermediate passes.

When running the added test file via:
```shell
$ python test/python/compile.py 2> tiny.stderr
```
the file `tiny.stderr` is about 700 KB.
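A minimal usage sketch (module and input are placeholders):
```python
import torch
import torch_mlir

class Tiny(torch.nn.Module):
    def forward(self, x):
        return torch.tanh(x)

# Prints the IR after each intermediate pass to stderr.
torch_mlir.compile(Tiny(), torch.ones(3), output_type="linalg-on-tensors",
                   enable_ir_printing=True)
```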
2023-12-20 15:08:21 -06:00
Rob Suderman 11cc92d4ab
[onnx] Lowerings from `onnx.tan` (#2642)
Started work on the `tan` lowerings for ONNX to Torch. Uses `sin` and
`cos` to represent a `tan`.
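That is, the identity tan(x) = sin(x) / cos(x), e.g.:
```python
import torch

x = torch.linspace(-1.0, 1.0, 5)
assert torch.allclose(torch.tan(x), torch.sin(x) / torch.cos(x))
```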
2023-12-20 10:09:39 -08:00
Rob Suderman 61888690bb
[onnx] Add support for `onnx.sinh` (#2643)
Adds a lowering from `onnx.sinh` to `aten.sinh`. This includes adding
the `aten.sinh` operator.
2023-12-15 21:23:51 -08:00
Quinn Dawkins 030b0140d4
[TorchToLinalg] Lower aten.cat to tensor.concat (#2650)
This replaces the lowering of aten.cat with tensor.concat, allowing more
efficient handling of concatenations in downstream flows. The refbackend
populates concat decomposition patterns that can be used to recover the
previous lowering.
2023-12-15 15:45:32 -05:00
Sungsoon Cho 55e9401c5c
Implement lowering of aten.cosh op. (#2635) 2023-12-15 11:19:26 -08:00
JianzheXiao 6ddeb1a6ef
[torch] Add support for aten.selu (#2640)
Add `aten.selu` operation to `torch` dialect.
2023-12-13 20:28:08 -08:00
JianzheXiao 7cf52ae73f
[Torch Dialect]Add Support for AtenGroupNormOp and AtenNativeGroupNormOp (#2591)
Co-authored-by: LiuYuanqiang <liuyuanqiang.yqliu@bytedance.com>
2023-12-13 11:05:12 +08:00
Stella Laurenzo 74f7a0c9d6
Upstream the ONNX importer. (#2636)
This is part 1 of 2, which will also include upstreaming the FX
importer. I started with ONNX because it forces some project layout
updates and is more self contained/easier as a first step.

Deviating somewhat from the RFCs on project layout, I made the following
decisions:

* Locating the `onnx_importer.py` into `torch_mlir.extras` as Maks
already has opened up that namespace and it seemed to fit. Better to
have fewer things at that level.
* Setup the build so that the root project only contains MLIR Python and
pure Python deps (like the importers), but this can be augmented with
the `projects/` adding more depending on which features are enabled.
* The default build continues to build everything whereas in
`TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS=1` mode, it builds a
`torch-mlir-core` wheel with the pure contents only.

`onnx_importer.py` and `importer_smoke_test.py` are almost verbatim
copies from SHARK-Turbine. I made some minor local alterations to adapt
to paths and generalize the way they interact with the outer project. I
expect I can copy these back to Turbine verbatim from here. I also
updated the license boilerplate (they have the same license but slightly
different project norms for the headers) but retained the correct
copyright.

Other updates:

* Added the ONNX importer unit test (which also can generate test data)
in lit, conditioned on the availability of the Python `onnx` package. In
a followup once I know everything is stable, I'll add another env var
that the CI can set to always enable this so we know conclusively if
tests pass.
* Moved the ONNX conversion readme to `docs/`.
* Renamed CMake option `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS` ->
`TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS` and inverted the sense. Made the
JitIR importer and LTC options `cmake_dependent_options` for robustness.
2023-12-12 19:02:51 -08:00
Eric Kunze f67249d34f
Sort the TOSA passing test list (#2630)
For easier tracking of issues, sort the TOSA passing list. It is still
significantly smaller than the XFAIL list would be.

Resolves #2620, at least until the xfail list gets smaller than the
passing list.

Signed-off-by: Eric Kunze <eric.kunze@arm.com>
2023-12-12 14:22:25 -08:00
Frederik Harwath b656c674ee Implement e2e support for aten.acos op
This depends on a change in the LLVM core repository which adds acos
support to the MLIR Math dialect.
2023-12-12 10:52:02 +01:00
Sambhav Jain 7acabafd84
Remove folder from `AtenStackOp` for single element list inputs (#2626)
`AtenStackOp` defines this folder for list operand containing single
element:
```
OpFoldResult AtenStackOp::fold(FoldAdaptor adaptor) {
  auto list = getOperand(0).getDefiningOp<PrimListConstructOp>();
  if (!list || !list->hasOneUse() || list.getElements().size() != 1)
    return nullptr;
  return list.getElements()[0];
}
```
However, unlike `AtenCatOp`, `AtenStackOp` cannot be folded away for a
single-element list operand, because the result of a stack operation
contains an additional dimension (of size 1, like expand_shape).
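
A minimal illustration (not from the PR) of the shape difference that makes the fold invalid:
```python
import torch

t = torch.randn(10, 32)
print(torch.cat([t]).shape)    # torch.Size([10, 32])   -- folding to t is fine
print(torch.stack([t]).shape)  # torch.Size([1, 10, 32]) -- stack adds a dimension
```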

This PR removes the `AtenStackOp::fold` method, and adds an e2e test for
single element list input case, which fails on current `main` as
follows:
```
Unexpected outcome summary: (linalg)                                                                                                                                                                   
                                                                                                                                                                                                       
****** Failed tests - 1 tests                                                                                                                                                                          
    FAIL - "TensorsStackSingleElementListModule_basic"                                                                                                                                                 
        @ trace item #0 - call to "forward"                                                                                                                                                            
        @ output of call to "forward"                                                                                                                                                                  
        ERROR: shape (torch.Size([10, 32])) is not equal to golden shape (torch.Size([10, 1, 32]))     
```
Thanks Chris Lalau Keraly for the bug report.
2023-12-11 10:52:50 -08:00