183 Commits (99511cef82997fd9faf6c5d2be4b932a37ec0f96)

Author SHA1 Message Date
Rob Suderman 25a5a22cbd
[torch] Support `torch.convolution` quantized lowering to `linalg` (#2811)
Linalg has quantization-specific operations. We can lower to these
operations when there are known zero-point and scale operations. This
allows the `convolution` to occur at lower bitwidths, improving
overall performance.
2024-01-30 13:46:47 -08:00
Rob Suderman 2ef228328f
[torch] `torch.dequantize` for per channel tensors to `linalg` (#2769)
Support a lowering for dequantization for per channel tensors from
`torch` dialect to a linalg decomposition. Tested via a numerical
`torch` test.
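
For readers unfamiliar with per-channel dequantization, here is a minimal pure-Python sketch of the arithmetic. It assumes the channel axis is axis 0 of a 2-D tensor and that each channel carries its own scale and zero point; the function name is hypothetical and this is not the lowering itself.

```python
def dequantize_per_channel(q, scales, zero_points):
    """Dequantize a 2-D nested list whose rows are channels (axis 0 assumed)."""
    return [
        [(value - zp) * scale for value in row]
        for row, scale, zp in zip(q, scales, zero_points)
    ]


if __name__ == "__main__":
    print(dequantize_per_channel([[2, 4], [10, 12]], [0.5, 2.0], [0, 10]))
```

Each row uses its own `(scale, zero_point)` pair, which is the only difference from the per-tensor case.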
2024-01-25 16:40:21 -08:00
Rob Suderman f6f890520b
[torch][quant] Quantized `torch.mm` for linalg with end-to-end test (#2750)
This includes custom op matching for decomposed operations and fusing
dequantization into dense operations. As a validation we compare
to the dequant+mm torch implementation.
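
The idea of fusing dequantization into the matmul, and validating it against dequant+mm, can be sketched in plain Python. This is an illustration under the assumption of per-tensor scales and zero points; all function names are hypothetical and nothing here is the torch-mlir implementation.

```python
def matmul(a, b):
    """Naive matrix multiply on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dequant(q, scale, zero_point):
    return [[(v - zero_point) * scale for v in row] for row in q]


def reference_mm(qa, qb, sa, za, sb, zb):
    """Dequantize both operands first, then multiply in floating point."""
    return matmul(dequant(qa, sa, za), dequant(qb, sb, zb))


def fused_mm(qa, qb, sa, za, sb, zb):
    """Multiply zero-point-corrected integers, then apply the scales once."""
    acc = matmul(
        [[v - za for v in row] for row in qa],
        [[v - zb for v in row] for row in qb],
    )
    return [[v * (sa * sb) for v in row] for row in acc]
```

Both paths compute the same values because the scales factor out of the sum; the fused form keeps the accumulation in integers.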
2024-01-24 14:02:50 -08:00
Xida Ren (Cedar) ccaac85788
implement aten.conv1d, aten.conv3d, and aten.conv_tbc (#2757)
`aten.conv_tbc` is a convolution with [time, batch, channel] ordering, as
opposed to the default [batch, channel, time]. It is currently implemented
by transposing the input and output, but may need its own implementation
in the future because this op is supposed to give a speedup. It is used
by fairseq
(https://github.com/facebookresearch/fairseq/issues/172).

(In case you were wondering, as I was, this is different from transposed
convolution, which has fractional strides.)
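
A rough pure-Python sketch of the transpose-based approach follows. It assumes the weight is laid out as (kernel, in_channels, out_channels) and omits bias and padding; `permute3`, `conv1d`, and `conv_tbc` are hypothetical helper names for illustration only.

```python
def permute3(x, order):
    """Permute the axes of a 3-D nested list; new axis m is old axis order[m]."""
    shape = (len(x), len(x[0]), len(x[0][0]))
    new_shape = [shape[a] for a in order]
    out = [[[None] * new_shape[2] for _ in range(new_shape[1])]
           for _ in range(new_shape[0])]
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                src = (i, j, k)
                d = [src[a] for a in order]
                out[d[0]][d[1]][d[2]] = x[i][j][k]
    return out


def conv1d(x, w):
    """x: (batch, in, time); w: (out, in, kernel); stride 1, no padding."""
    batch, c_in, t = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    return [[[sum(x[b][c][s + d] * w[o][c][d]
                  for c in range(c_in) for d in range(k))
              for s in range(t - k + 1)]
             for o in range(c_out)]
            for b in range(batch)]


def conv_tbc(x_tbc, w_kio):
    """Transpose to the default layout, convolve, transpose back."""
    x_bct = permute3(x_tbc, (1, 2, 0))   # (T, B, C) -> (B, C, T)
    w_oik = permute3(w_kio, (2, 1, 0))   # (K, I, O) -> (O, I, K)
    y_bot = conv1d(x_bct, w_oik)         # (B, O, T')
    return permute3(y_bot, (2, 0, 1))    # (B, O, T') -> (T', B, O)
```

The two extra permutes are the cost the message refers to when it says a dedicated implementation may be needed later.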

---------

Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
Co-authored-by: Frederik Harwath <frederik.harwath@amd.com>
2024-01-23 21:30:03 -08:00
John Wu 704cfdaf08
Add aten.pool_max3d support to torch-to-linalg (#2735)
Added verification logic to the abstract_interpreter_lib_gen.py

Also made some unit tests

Initially, I thought we could use `linalg::pooling_ndhwc_max` to
implement this. However, on a 5-dimensional tensor it pools over
dimensions (2, 3, 4), which is not what we want. We want
pooling over dimensions (3, 4, 5).

To achieve this, we would need to lower our code using the `linalg`
dialect.


Turns out the pooling code in `linalg` looks like this.

```
func @max_pooling_ncdhw(%I: memref<?x?x?x?x?xf32>, %K: memref<3xindex>, %O: memref<?x?x?x?x?xf32>,
                        %strides: memref<3xindex>, %dilations: memref<3xindex>) {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %N = memref.dim %I, %c0 : memref<?x?x?x?x?xf32>
    %C = memref.dim %I, %c1 : memref<?x?x?x?x?xf32>
    %D = memref.dim %I, 2 : memref<?x?x?x?x?xf32>
    %H = memref.dim %I, 3 : memref<?x?x?x?x?xf32>
    %W = memref.dim %I, 4 : memref<?x?x?x?x?xf32>

    %kernel_d = memref.load %K[%c0] : memref<3xindex>
    %kernel_h = memref.load %K[%c1] : memref<3xindex>
    %kernel_w = memref.load %K[2] : memref<3xindex>
    %stride_d = memref.load %strides[%c0] : memref<3xindex>
    %stride_h = memref.load %strides[%c1] : memref<3xindex>
    %stride_w = memref.load %strides[2] : memref<3xindex>
    %dilation_d = memref.load %dilations[%c0] : memref<3xindex>
    %dilation_h = memref.load %dilations[%c1] : memref<3xindex>
    %dilation_w = memref.load %dilations[2] : memref<3xindex>

    linalg.generic {
        indexing_maps = [
            affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d * %stride_d + kd * %dilation_d, h * %stride_h + kh * %dilation_h, w * %stride_w + kw * %dilation_w)>,  // Map for input tensor
            affine_map<(n, c, d, h, w, kd, kh, kw) -> (kd, kh, kw)>,                                              // Map for kernel tensor
            affine_map<(n, c, d, h, w, kd, kh, kw) -> (n, c, d, h, w)>                                            // Map for output tensor
        ],
        iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"],
        doc = "3D Max Pooling NCDHW with Strides, Dilations, and Kernel Size"
    } ins(%I, %K : memref<?x?x?x?x?xf32>, memref<3xindex>) outs(%O : memref<?x?x?x?x?xf32>) {
        ^bb0(%input_elem: f32, %kernel_elem: index, %output_elem: f32):
            %max_val = arith.maxf %input_elem, %output_elem : f32
            linalg.yield %max_val : f32
    }
    return
}

```

This was implemented based on its source code with the adjustments
mentioned above:

4ca1b5e094/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml (L5647)

Issues related to this can be found here

https://github.com/nod-ai/SHARK-Turbine/issues/324
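
As a reference for the index arithmetic in the NCDHW pooling described in this commit, here is a minimal pure-Python sketch for a single (D, H, W) volume. It assumes no padding and is not the torch-mlir lowering; `max_pool3d` is a hypothetical helper that would be applied per (n, c) pair for a full NCDHW tensor.

```python
from itertools import product


def max_pool3d(vol, kernel, stride=(1, 1, 1), dilation=(1, 1, 1)):
    """Max-pool one (D, H, W) volume of nested lists; no padding."""
    in_shape = (len(vol), len(vol[0]), len(vol[0][0]))
    out_shape = [
        (n - dil * (k - 1) - 1) // s + 1
        for n, k, s, dil in zip(in_shape, kernel, stride, dilation)
    ]
    out = [[[float("-inf")] * out_shape[2] for _ in range(out_shape[1])]
           for _ in range(out_shape[0])]
    # Output positions play the role of the "parallel" iterators.
    for d, h, w in product(*map(range, out_shape)):
        # Kernel offsets play the role of the "reduction" iterators.
        for kd, kh, kw in product(*map(range, kernel)):
            z = d * stride[0] + kd * dilation[0]
            y = h * stride[1] + kh * dilation[1]
            x = w * stride[2] + kw * dilation[2]
            out[d][h][w] = max(out[d][h][w], vol[z][y][x])
    return out
```

The input index `d * stride + kd * dilation` mirrors the first indexing map in the snippet above.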
2024-01-19 21:09:46 +05:30
Ze Zhang 77a03f2069
torch-to-tosa lowering support for AtenLinalgVectorNormOp (#2734)
This PR adds torch-to-tosa lowering support for AtenLinalgVectorNormOp

e2e test:
python -m e2e_testing.main --config=tosa

LIT tests:
cmake --build build --target tools/torch-mlir/all

---------

Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
2024-01-18 12:32:23 -08:00
Sungsoon Cho a8538e1e3f
Decompose AtenNormalFunctionalOp into AtenRandn* and other arithmetic. (#2737)
2024-01-15 22:49:29 -08:00
lonely eagle f85e5c932b
[Torch Dialect] support aten.isneginf, aten.isposinf, aten.nan_to_num (#2743)
2024-01-16 14:29:34 +08:00
Rob Suderman dc37616d67
[torch][quant] Support quantize and dequantize for torch (#2731)
Handle both `torch.dequantize` and `torch.quantize_per_tensor` including
the op based quantization parameter tracking. This includes adding
`qint32` to torch types as it was missing during the initial type
inclusion.

For testing we only have `torch.int8` and `torch.float` types on
function boundaries as the `qint8` types require passing the scale
and zero point quantization information which is not supported yet.
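
For context, the per-tensor quantize/dequantize arithmetic can be sketched in a few lines of Python. This assumes an int8 range of [-128, 127] and round-half-to-even rounding; the function names are hypothetical and this is not the code added by the commit.

```python
def quantize_per_tensor(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers: round(v / scale) + zero_point, then clamp."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Map integers back to floats: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]
```

Values outside the representable range saturate at `qmin` or `qmax`, so the round trip is only exact for inputs that land on the quantization grid.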
2024-01-12 19:11:14 -08:00
Ze Zhang 670a99ae19
Handle torch.none type in tosa.clamp op (#2739)
This PR updates the torch-to-tosa conversion with the following changes:

- Support torch.none as min/max input argument for tosa.clamp op
- Support negative value as start index for tosa.slice op
- Add tosa.logical_or lowering support

e2e test:
python -m e2e_testing.main --config=tosa

LIT tests:
cmake --build build --target tools/torch-mlir/all

---------

Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
2024-01-11 10:36:48 -08:00
Ilija Kalinić e1a86e480a
Implement lowering of torch.aten.logit (#2697)
Closes nod-ai/SHARK-Turbine#290
2024-01-11 20:25:42 +05:30
zjgarvey 07d0645f64
[RFC] general support for Adaptive Pooling Ops (#2661)
Adaptive pooling ops can only be decomposed into their non-adaptive
counterparts in trivial cases.

For example, the current decomposition for AtenAdaptiveAvgPool1dOp in
DecomposeComplexOps.cpp supports outSize = inSize (i.e., do literally
nothing), and outSize = 1 (i.e., do a batched average).

The reason adaptive pooling ops are difficult to lower to linalg is that
they are not constantly strided. They are computed by taking an input
tensor of shape (N, C, Hin), and an output size Hout, and computing the
output tensor at position (n, c, h) in the following way:

1. compute st(h) = (h*Hin)//Hout
2. compute en(h) = 1 + ((h+1)*Hin -1)//Hout
3. apply a computation (max or avg) to the slice: INPUT[n, c,
st(h):en(h)]
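
The three steps above can be sketched for a single 1-D row in plain Python. This is only an illustration of the window arithmetic, with hypothetical function names, not the linalg lowering.

```python
def window_bounds(in_size, out_size):
    """Return the (st(h), en(h)) pair for every output position h."""
    return [
        ((h * in_size) // out_size, 1 + ((h + 1) * in_size - 1) // out_size)
        for h in range(out_size)
    ]


def adaptive_avg_pool1d(row, out_size):
    """Average each window INPUT[st(h):en(h)] of a 1-D list."""
    out = []
    for start, end in window_bounds(len(row), out_size):
        window = row[start:end]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    print(window_bounds(5, 3))
    print(adaptive_avg_pool1d([1, 2, 3, 4, 5], 3))
```

With Hin = 5 and Hout = 3 the windows have lengths 2, 3, and 2 and overlap, which is why no single stride and kernel size reproduces them.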

The provided sample implementation (for ConvertAtenAdaptiveAvgPool1dOp)
uses tensor.extract to access the input tensor inside the payload of a
linalg generic op. This is likely an unattractive use of linalg generic
ops, which is why I am asking for some more targeted feedback on the
validity of this approach before attempting to support the many other
adaptive pooling ops.

Specifically:

- Is the performance of this implementation bad enough to warrant
targeting different dialects entirely? e.g. TMtensor/linalg ext/ etc.
- If the provided implementation is of acceptable performance to the
community, then is it permissible to remove the Adaptive pooling
decompositions from DecomposeComplexOps.cpp? Based on the current
structure of the -torch-decompose-complex-ops pass, it does not seem
possible to only decompose the adaptive ops in special cases (it seems
to get stuck in an infinite loop on a match failure). I would be happy
to instead incorporate the case logic into the conversion directly, and
remove the decompositions once they are rendered completely obsolete.

As long as this approach is acceptable, I can clean up the
implementation with some helper functions, and quickly add support for
each of the remaining Adaptive pooling ops.
2024-01-09 11:14:10 -08:00
Vivek Khandelwal 690827fe52
build: manually update PyTorch version
Set PyTorch and TorchVision version to nightly release 2024-01-02.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2024-01-03 11:47:12 +05:30
Xida Ren (Cedar) 6660a26594
lower torch.aten.isinf to linalg (#2638)
Co-authored-by: Rob Suderman <rob.suderman@gmail.com>
2023-12-28 17:20:32 -08:00
Sungsoon Cho 8e389ff2ff
Implement lowering of torch.aten.exponential (#2680)
https://github.com/llvm/torch-mlir/issues/2646

Decompose aten.exponential() into: -log(1-x)/lambda
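
For reference, the standard inverse-transform recipe for exponential samples takes the natural log of 1 - u for a uniform u in [0, 1) and divides by the rate. The sketch below assumes that recipe; `sample_exponential` is a hypothetical name and the code is not taken from the decomposition.

```python
import math
import random


def sample_exponential(lam, rng=random):
    """Draw one Exponential(lam) sample via inverse-transform sampling."""
    u = rng.random()                 # uniform in [0, 1)
    return -math.log(1.0 - u) / lam  # 1 - u is in (0, 1], so log is finite
```

Averaging many samples should approach 1 / lam, which gives a quick sanity check.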
2023-12-27 20:33:18 -08:00
JianzheXiao 6ddeb1a6ef
[torch] Add support for aten.selu (#2640)
Add `aten.selu` operation to `torch` dialect.
2023-12-13 20:28:08 -08:00
JianzheXiao 7cf52ae73f
[Torch Dialect]Add Support for AtenGroupNormOp and AtenNativeGroupNormOp (#2591)
Co-authored-by: LiuYuanqiang <liuyuanqiang.yqliu@bytedance.com>
2023-12-13 11:05:12 +08:00
Eric Kunze f67249d34f
Sort the TOSA passing test list (#2630)
For easier tracking of issues, sort the TOSA passing list. It is still
significantly smaller than the XFAIL list would be.

Resolves #2620, at least until the xfail list gets smaller than the
passing list.

Signed-off-by: Eric Kunze <eric.kunze@arm.com>
2023-12-12 14:22:25 -08:00
JianzheXiao 96fcde4d77
[Torch Dialect] Support Einsum Op (#2230)
As title, support torch.aten.einsum op

Right now this only supports static shapes because of a known issue; the
fix is here: https://github.com/llvm/torch-mlir/pull/2154

Co-authored-by: Jiawei Wu <wujiawei.aml@bytedance.com>
2023-12-10 12:30:37 +08:00
Felix Schneider fb21a85874
[TorchToLinalg] Lower grouped conv2d to linalg Op with correct dimension ordering (#2623)
The linalg Op `linalg.conv_2d_ngchw_fgchw` had a bug where

1. Weights were accessed as G,F,C,H,W instead of as F,G,C,H,W
2. Output was accessed as N,F,G,H,W instead of as N,G,F,H,W

Now this has been fixed in
https://github.com/llvm/llvm-project/pull/73855 which broke the
torch-mlir lowering to that Op.

This patch switches lowering in torch-mlir to the newly introduced
`linalg.conv_2d_ngchw_gfchw` op which accesses weights in an order that
is compatible with PyTorch's memory layout.
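
To see why the G,F,C,H,W order matters, the sketch below compares row-major offsets. It assumes PyTorch stores grouped convolution weights as (G*F, C, H, W) with the output channels of each group contiguous; the helper names are hypothetical.

```python
from itertools import product


def flat_offset(index, shape):
    """Row-major offset of `index` in a dense array of `shape`."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def gfchw_is_plain_reshape(G, F, C, H, W):
    """Check that (G*F, C, H, W) and (G, F, C, H, W) address the same memory."""
    for g, f, c, h, w in product(range(G), range(F), range(C), range(H), range(W)):
        torch_off = flat_offset((g * F + f, c, h, w), (G * F, C, H, W))
        gfchw_off = flat_offset((g, f, c, h, w), (G, F, C, H, W))
        if torch_off != gfchw_off:
            return False
    return True
```

Under that assumption the G,F,C,H,W view is a plain reshape of the stored weights, whereas an F,G,C,H,W view of the same data would need a transpose.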

Fix https://github.com/llvm/torch-mlir/issues/2622
2023-12-08 14:18:23 +01:00
Stella Laurenzo 8252656b6d
Advance llvm-project and stablehlo. (#2619)
llvm-project: bbd2b08b95fe76bea138c1b03c1cd42ed3ee04df
stablehlo: ab709fe48de88c67717abfbd7ef17425eb95ddaf

These commits were chosen in order to account for an MLIR API break from
3dbac2c007
which required a patch to stablehlo. We integrate a bit beyond that
commit to deal with some revert/reapply cycles in the intervening range
which were discovered in another downstream.

Further, it requires adaptation to the stablehlo API breaks introduced
from https://github.com/openxla/stablehlo/pull/1872 which are along for
the ride.

Since some stablehlo builders were changed to directly take int64_t
array refs, also traced that up some call stacks to eliminate some
signed/unsigned mismatches that result.

Also adds a few TOSA tests to the passing set that seem to work now.
2023-12-07 23:13:42 -08:00
Frederik Harwath 6248216dca
Add aten.min.dim to linalg lowering (#2600)
2023-12-05 07:16:35 -08:00
Vivek Khandelwal 10b5432e7d
build: manually update PyTorch version
Set PyTorch and TorchVision version to nightly release 2023-12-04.

Signed-Off By: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2023-12-05 13:18:47 +05:30
James Newling 1b7d6f2af9
Improve decomposition of pixel_shuffle (support dynamic shapes) (#2590)
The aten.reshape ops in the decomposition are replaced with prims.collapse
and prims.split_dim ops, which avoids the cases where lowering reshape
from torch to linalg is not supported.

Essentially, by using the collapse and split_dim ops instead of the
reshape ops, we are not "losing" the information that the reshapes do not
arbitrarily mix dimensions, which makes lowering easy.

3 additional tests added: 
- fully dynamic, 
- dynamic only the spatial dimensions, 
- dynamic only in the non-spatial dimensions.
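
To make the split / permute / collapse structure concrete, here is a small pure-Python sketch of pixel shuffle on a single (C*r*r, H, W) sample. The index arithmetic stands in for the split_dim and collapse steps; it is an illustration with a hypothetical function name, not the decomposition code.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    c_out = channels // (r * r)
    out = [[[None] * (width * r) for _ in range(height * r)]
           for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                # Split: the channel index k is viewed as (c, i, j).
                k = (c * r + i) * r + j
                for h in range(height):
                    for w in range(width):
                        # Permute to (c, h, i, w, j), then collapse
                        # (h, i) -> h*r + i and (w, j) -> w*r + j.
                        out[c][h * r + i][w * r + j] = x[k][h][w]
    return out
```

Each split or collapse only touches adjacent dimensions, which is the property the message says a generic reshape would hide.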
2023-11-22 12:31:06 -08:00
James Newling 03e8f99730
Lowering to linalg of prims split_dim op (#2576)
Adds support for lowering the prims split_dim op.

Similar design to collapse op lowering in 
https://github.com/llvm/torch-mlir/pull/2572, with some 
small differences, because the split_dim op (in pytorch) is
view-changing whereas the collapse is not. The difference 
means that 

1) it must be registered in the function Torch::isViewLikeOp
2) it must be added to the "expected fail" set for the torch dynamo backend.
2023-11-21 07:56:09 -08:00
Yuanqiang Liu facbe5d96b
[Torch Dialect] support AtenArangeStartOutOp in ReduceOpVariants like… (#2563)
… AtenBernoulli_FloatOp

It fixes cases like: `%2110 = torch.aten.arange.start_out %int1,
%int1517, %int1, %2109 : !torch.int, !torch.int, !torch.int,
!torch.tensor -> !torch.tensor`.
`aten.arange.start_out` also doesn't have value semantics, which means
`%2110` is an alias for `%2109`.
So I decompose it to `aten.arange.start` + `torch.contents.overwrite`.
The more complex decomposition logic is there to handle cases like view and
dtype cast, which I added to the e2e tests.
2023-11-17 00:51:55 +08:00
James Newling e81282ae8f
Support for prims collapse op (lowering to linalg) (#2572)
Steps taken:
1) add generator code to torch_ods_gen.py, run update_torch_ods.sh
2) add (custom) shape and type inference generator code to
abstract_interp_lib_gen.py, run update_abstract_interp_lib.sh
3) Implement lowering to tensor.collapse_shape. Requires the `start` and
`end` values to be constant, else lowering fails
4) Update xfail_sets.py (append to LTC_XFAIL_SET) after running
/tools/e2e_test.sh --filter Collapse --verbose -c XX for all supported
backends (XX).

Motivation: 
- Supporting the collapse operation will be useful for lowering of
pixel_shuffle (see Issue #2559)
2023-11-15 08:34:38 -08:00
Yuanqiang Liu 3ab790c50a
[Torch Dialect] add canonicalize for aten.numel (#2562)
2023-11-11 12:16:53 +08:00
James Newling b6e551c7b8
Decomposition of aten.pixel_shuffle with static input shape (#2550)
For static tests (that is, when the shape is known), for example:

 ```
 @annotate_args([None, ([3, 18, 2, 2], torch.float32, True)])
 ```
 
The e2e test passes, but only if the replacement op's return type is set as
undefined (optional shape and type must be explicitly made unset);
otherwise there's an error about the function return type.
 
 For dynamic cases, for example if the above is replaced with 
 
  ```
 @annotate_args([None, ([-1, -1, -1, -1], torch.float32, True)])
 ```

There is a failure to lower to linalg from torch ("view op explicitly
labelled as illegal"). This seems to be because the support for lowering
from torch to linalg with dynamic shapes is limited.
2023-11-08 08:52:44 -05:00
JianzheXiao a42d4c18ff
[Torch Dialect]Support aten.cosine_similarity (#2364)
As title, add support for aten.cosine_similarity, support broadcast
inputA/inputB to the same shape
2023-11-08 15:28:30 +08:00
Jiawei Wu d5ee8ee73a
[Torch Dialect] emit aten.reshape_as op and add decomposition pattern. (#2553)
2023-11-05 11:38:36 +08:00
Yuanqiang Liu 0378da0abd
[Torch Dialect] support aten.isinf (#2544)
Also fix linalg lowering from `UEQ` to `OEQ`.
I will check the lowering of other comparisons later.
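
The difference between the two predicates shows up on NaN inputs. The sketch below models ordered and unordered equality in Python and assumes, purely for illustration, that isinf is expressed as `abs(x) == inf`; the helper names are hypothetical.

```python
import math


def ordered_eq(a, b):
    """OEQ: true only when neither operand is NaN and a == b."""
    return not (math.isnan(a) or math.isnan(b)) and a == b


def unordered_eq(a, b):
    """UEQ: true when either operand is NaN, or a == b."""
    return math.isnan(a) or math.isnan(b) or a == b


def isinf_with(eq, x):
    """Model isinf as a comparison of |x| against infinity."""
    return eq(abs(x), math.inf)
```

With the unordered predicate a NaN input would be reported as infinite, which is the kind of mismatch an ordered comparison avoids.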
2023-11-04 22:26:01 +08:00
Stella Laurenzo 6961f0a247
Re-organize project structure to separate PyTorch dependencies from core project. (#2542)
This is a first step towards the structure we discussed here:
https://gist.github.com/stellaraccident/931b068aaf7fa56f34069426740ebf20

There are two primary goals:

1. Separate the core project (C++ dialects and conversions) from the
hard PyTorch dependencies. We move all such things into projects/pt1 as
a starting point since they are presently entangled with PT1-era APIs.
Additional work can be done to disentangle components from that
(specifically LTC is identified as likely ultimately living in a
`projects/ltc`).
2. Create space for native PyTorch2 Dynamo-based infra to be upstreamed
without needing to co-exist with the original TorchScript path.

Very little changes in this path with respect to build layering or
options. These can be updated in a followup without commingling
directory structure changes.

This also takes steps toward a couple of other layering enhancements:

* Removes the llvm-external-projects/torch-mlir-dialects sub-project,
collapsing it into the main tree.
* Audits and fixes up the core C++ build to account for issues found
while moving things. This is just an opportunistic pass through but
roughly halves the number of build actions for the project from the
high 4000s to the low 2000s.

It deviates from the discussed plan by having a `projects/` tree instead
of `compat/`. As I was thinking about it, this will better accommodate
the follow-on code movement.

Once things are roughly in place and the CI passing, followups will
focus on more in-situ fixes and cleanups.
2023-11-02 19:45:55 -07:00