Handle both `torch.dequantize` and `torch.quantize_per_tensor` including
the op based quantization parameter tracking. This includes adding
`qint32` to torch types as it was missing during the initial type
inclusion.
For testing we only have `torch.int8` and `torch.float` types on
function boundaries as the `qint8` types require passing the scale
and zero point quantization information which is not supported yet.
This PR updates the torch-to-tosa conversion with following changes:
- Support torch.none as min/max input argument for tosa.clamp op
- Support negative value as start index for tosa.slice op
- Add tosa.logical_or lowering support
e2e test:
python -m e2e_testing.main --config=tosa
LIT tests:
cmake --build build --target tools/torch-mlir/all
---------
Co-authored-by: Ze Zhang <ze.zhang@getcruise.com>
Adaptive pooling ops can only be decomposed into their non-adaptive
counterparts in trivial cases.
For example, the current decomposition for AtenAdaptiveAvgPool1dOp in
DecomposeComplexOps.cpp supports outSize = inSize (i.e., do literally
nothing), and outSize = 1 (i.e., do a batched average).
The reason adaptive pooling ops are difficult to lower to linalg is that
they are not constantly strided. They are computed by taking an input
tensor of shape (N, C, Hin), and an output size Hout, and computing the
output tensor at position (n,c, h) in the following way:
1. compute st(h) = (h*Hin)//Hout
2. compute en(h) = 1 + ((h+1)*Hin -1)//Hout
3. apply a computation (max or avg) to the slice: INPUT[n, c,
st(h):en(h)]
The provided sample implementation (for ConvertAtenAdaptiveAvgPool1dOp)
uses tensor.extract to access the input tensor inside the payload of a
linalg generic op. This is likely an unattractive use of linalg generic
ops, which is why I am asking for some more targeted feedback on the
validity of this approach before attempting to support the many other
adaptive pooling ops.
Specifically:
- Is the performance of this implementation bad enough to warrant
targeting different dialects entirely? e.g. TMtensor/linalg ext/ etc.
- If the provided implementation is of acceptable performance to the
community, then is it permissable to remove the Adaptive pooling
decompositions from DecomposeComplexOps.cpp? Based on the current
structure of the -torch-decompose-complex-ops pass, it does not seem
possible to only decompose the adaptive ops in special cases (it seems
to get stuck in an infinite loop on a match failure). I would be happy
to instead incorporate the case logic into the conversion directly, and
remove the decompositions once they are rendered completely obsolete.
As long as this approach is acceptable, I can clean up the
implementation with some helper functions, and quickly add support for
each of the remaining Adaptive pooling ops.
Adds a lowering to Linalg for reflection_pad1d. Based on ideas/code from draft PR
https://github.com/llvm/torch-mlir/pull/2693.
---------
Co-authored-by: Kumar Deepak <kumar@xilinx.com>
As noted in the plan when this work started, we need to produce an ORT
EP plugin for a downstream project, and this will necessitate a C-based
ONNX importer (as opposed to the existing Python one). Because this
comes with dependencies that we do not want to impart on various
projects, this is optional in torch-mlir. It is also factored so that it
can be used as standalone sources in downstreams that need it. Since it
only depends on public C APIs on the MLIR side, this will make build
coupling a lot better (since a C++ dep is not needed on the compiler and
it is trivial to dynamically load).
Our original plan was just to maintain this fork off to the side in our
ORT plugin, but once work started, it seemed better to write it clean
and contribute it upstream for anyone to use. We expect that for non-ORT
use, the Python importer will have better ergonomics for most folks.
I will follow-up with a test suite refactor so that we can drive the
Python or C importer.
This is a relatively mechanical port from Python to C, borrowing some
scaffolding from the old JitIR importer. It does attempt to lay some
groundwork for external data, which will need to be implemented on the
Python side as well.
Changes made during upstreaming:
* Removed comments attributing some copied code back to torch-mlir
(since it is now repatriated).
* Re-organized imports.
* Inlined RefMapping/RefTracker and TypeSubclassMap from an external
utility module.
* Added FxImporter class comments.
* Updated stack trace extraction to be fail safe.
* Added an entry-point for `import_frozen_exported_program` which uses
the shiny new upstream `torch.export.export()` API (versus the
lower-level/older API that Turbine is presently using). This
necessitated a small FX rewrite to line external state management up
with current conventions.
* Adapted one of Turbine's importer tests to go with this initial
submission. Turbine unfortunately has a lot of more-integration-ey
tests, and I would like to extract those as more of unit tests of the
importer features and upstream them that way vs trying to copy directly.
For now, one overall test with the initial submission gets us moving.
I acknowledge that there are some code quality things that could be
improved in this submission: this was authored over the course of many
months (and often via some trial and error). I would like to keep it
relatively converged with the downstream for the next few steps while
getting the test suite upstreamed. And then it will be easier to take a
hygienic pass through the code.
Including co-authors for contributors in the git log of the original
repository.
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
Co-authored-by: Avinash Sharma <aviator1994@gmail.com>
Co-authored-by: Arham Khan <arhammkhan@gmail.com>
Co-authored-by: brucekimrokcmu <kwangkyk@alumni.cmu.edu>
Co-authored-by: saienduri <77521230+saienduri@users.noreply.github.com>
The expression for HardSigmoid in Onnx
(https://onnx.ai/onnx/operators/onnx__HardSigmoid.html): max(0, min(1,
alpha * x + beta))
is inherently different from HardSigmoid in Torch
(https://pytorch.org/docs/stable/generated/torch.nn.Hardsigmoid.html)
which is: if x < -3 -> 0
elif x > 3 -> 1
else x/6 + 1/2
That being said, it was just better to compute out the entire expression
when translating the Onnx expression to Torch mlir, which is done in
this PR. Some of the logic is shared from the files in
`DecomposeComplexOps`. Therefore, refactored some shared logic between
`DecomposeComplexOps` and `DefaultDomainGToP` and put it in a `Utils`
file.
- Going through the `#torch-mlir` channel on the `llvm` discord, I
realize that there are some useful commands that would be extremely
helpful in creating Onnx lowers to Torch MLIR. Seems a lot of people are
contributing to this. So, I thought it would be good to add this
information to the docs.
These tools helped streamlined the development of this PR:
https://github.com/llvm/torch-mlir/pull/2682
This PR adds the `enable_ir_printing` option to `torch_mlir.compile`,
which can be used to print the IR for all intermediate passes.
When running the added test file via:
```shell
$ python test/python/compile.py 2> tiny.stderr
```
the file `tiny.stderr` is about 700 KB.
Adding the `--progress` flag shows the same output as what `git clone`
would show. This is very nice for slow connections. Without it, the
command may run for many minutes without providing any indication that
it is still doing something.
For `--depth=1`, I think it should be safe as most people have new
enough git versions nowadays, but let's be safe and make it an optional
suggestion. I ran all the tests fine with `--depth=1`, but I don't know
whether things will keep working when the submodules get updated for
systems with old git versions.