Prior to this patch, the canonicalizers for `AtenSizeOp` and
`AtenSizeIntOp` succeeded only if the tensor operand's type information
included the size of the requested dimension(s). We can extend the set
of optimizable cases by propagating types across operations whose result
type matches the input tensor type.
Specifically, this patch enables the canonicalizers for `AtenSizeOp` and
`AtenSizeIntOp` to see past `tensor_static_info_cast`,
`copy.to_vtensor`, and `copy.to_tensor` ops until they reach the first
op whose result type contains size information for the requested
dimensions, with an upper bound of 6 parent lookups to avoid unbounded
compilation times. All other encountered ops cause the canonicalizer to
give up.
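The lookup described above can be sketched in Python as follows. This is
only an illustration: the `Value` class, the op-name strings, and
`find_static_size` are hypothetical stand-ins, not the torch-mlir C++
API, and the sketch assumes `dim` is a valid non-negative index.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical names; the real canonicalizer works on MLIR values in C++.
PASS_THROUGH_OPS = {"tensor_static_info_cast", "copy.to_vtensor", "copy.to_tensor"}
MAX_PARENT_LOOKUPS = 6


@dataclass
class Value:
    # Per-dimension sizes; None marks an unknown dimension or unknown rank.
    sizes: Optional[Sequence[Optional[int]]]
    # Name of the defining op, or None for a block argument.
    producer: Optional[str] = None
    # The single tensor operand of the defining op, if any.
    operand: Optional["Value"] = None


def find_static_size(value: Value, dim: int) -> Optional[int]:
    """Return the static size of `dim`, looking through pass-through ops."""
    current = value
    # Check the value itself, then at most MAX_PARENT_LOOKUPS parents.
    for _ in range(MAX_PARENT_LOOKUPS + 1):
        if current.sizes is not None and current.sizes[dim] is not None:
            return current.sizes[dim]
        # Any other op (or a block argument) makes the lookup give up.
        if current.producer not in PASS_THROUGH_OPS or current.operand is None:
            return None
        current = current.operand
    return None
```

The bound keeps the cost of each canonicalization attempt constant, at
the price of missing size information that sits deeper in the chain.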
This patch replaces the use of raw integers like 6, 4, etc. (that
represent PyTorch's scalar types) with named values from the ScalarType
enum (e.g. `ScalarType::Float`, `ScalarType::Long`, etc.) in code for
folding `prim.dtype` ops into numeric constants.
This patch isn't strictly a non-functional change: its use of
`Torch::getScalarTypeForType()` requires the input type to be one of
the supported types. Otherwise compilation will abort, whereas
previously compilation proceeded without folding the unsupported data
type into a numeric constant.
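As an illustration of the idea, here is a minimal Python sketch of
folding a dtype into its named scalar type code. The enum values follow
PyTorch's `ScalarType` numbering; the dtype strings and the
`fold_prim_dtype` helper are hypothetical and only mimic the behavior of
rejecting unsupported types.

```python
from enum import IntEnum


class ScalarType(IntEnum):
    """Subset of PyTorch's scalar type codes."""
    Byte = 0
    Char = 1
    Short = 2
    Int = 3
    Long = 4
    Half = 5
    Float = 6
    Double = 7
    Bool = 11
    BFloat16 = 15


# Hypothetical mapping from element type spellings to scalar types.
_SCALAR_TYPE_FOR_DTYPE = {
    "ui8": ScalarType.Byte,
    "si64": ScalarType.Long,
    "f32": ScalarType.Float,
    "f64": ScalarType.Double,
    "i1": ScalarType.Bool,
    "bf16": ScalarType.BFloat16,
}


def fold_prim_dtype(dtype: str) -> int:
    """Fold a dtype into its integer code, failing on unsupported types."""
    try:
        return int(_SCALAR_TYPE_FOR_DTYPE[dtype])
    except KeyError:
        # Mirrors the stricter behavior: unsupported types are an error
        # instead of being silently left unfolded.
        raise ValueError(f"unsupported dtype: {dtype}") from None
```

For example, `fold_prim_dtype("f32")` yields `ScalarType.Float` as an
integer without a bare `6` appearing at the call site.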
A prior patch (63538de2) that added support for bfloat16 type did not
add the canonicalization pattern to fold `torch.prim.dtype` operations
on bfloat16 tensors into the integer constant 15. This patch fixes the
problem.
This commit adds lowering of `aten.ge.float`, `aten.ge.float_int`,
`aten.ne.float_int`, `aten.gt.float_int` and `aten.ceil.float` ops.
This commit also fixes formatting for the file scalar.py and scalar_comparison.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit decomposes `aten.to.dtype_layout` op into `aten.to.dtype` op.
This commit also fixes the formatting for the file type_conversion.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
This commit adds lowering of `aten.ceil.float` op.
This commit also fixes formatting for the file scalar.py.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
The updated LLVM code includes a patch to create bfloat16 array
attributes, thus enabling a different patch to torch-mlir to flesh out
support for the bfloat16 type.
The issue was in the canonicalizer for torch.aten.ge.int -- in cases
where the operands were swapped, it would miscompile. This issue is
fixed and folding support generalized to `torch.aten.size.int < 0` as
well.
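One way to picture why operand order matters is the Python sketch
below. It is an assumption about the shape of the fix, not the actual
C++ canonicalizer: `NonNegative` is a hypothetical marker for a value
known only to be `>= 0` (such as a `torch.aten.size.int` result), and
`fold_ge` / `fold_lt` are illustrative names.

```python
from typing import Optional, Union


class NonNegative:
    """Marker for a value known only to be >= 0 (e.g. a tensor size)."""


Operand = Union[int, NonNegative]


def fold_ge(lhs: Operand, rhs: Operand) -> Optional[bool]:
    """Fold `lhs >= rhs` if the result is known, else return None."""
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs >= rhs
    # size >= c holds for every size when c <= 0.
    if isinstance(lhs, NonNegative) and isinstance(rhs, int) and rhs <= 0:
        return True
    # c >= size fails for every size only when c < 0; in particular
    # `0 >= size` is NOT foldable, since it depends on size == 0.
    if isinstance(lhs, int) and isinstance(rhs, NonNegative) and lhs < 0:
        return False
    return None


def fold_lt(lhs: Operand, rhs: Operand) -> Optional[bool]:
    """Fold `lhs < rhs` as the negation of `lhs >= rhs`."""
    ge = fold_ge(lhs, rhs)
    return None if ge is None else not ge
```

Treating `0 >= size` like `size >= 0` is the kind of swapped-operand
mistake that produces a miscompile.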
Fixes #716
A recent PyTorch commit made ConstantPad2d call a helper function
annotated with a `Union[int, float]` type. This commit adds minimal
support for representing and dealing with that.
https://github.com/pytorch/pytorch/pull/73287
Changes:
- Add support for `!torch.union<T1, T2, T3>`/`Torch::UnionType`,
along with the importer and CAPI code.
- Add support in isValidSubtype for union types.
- Add a canonicalizer for `torch.derefine` to help simplify some code
that derefines to a UnionType (this also fixes #664).
There is still more work to do for really supporting UnionType well,
such as canonicalizing UnionTypes so that they can be compared with
pointer equality.
The reified code to compute the shape of torch.aten.constant_pad_nd
uses negative indices when setting list elements. This was not
converted to a positive offset in one place in SimplifyShapeCalculations
which prevented computation of the static shape.
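For illustration, a minimal Python sketch of the index normalization
involved. The function names are hypothetical, and the padding rule
assumed here is the usual `constant_pad_nd` convention of
(before, after) pairs starting from the last dimension.

```python
from typing import List


def to_positive_index(index: int, length: int) -> int:
    """Map a Python-style negative index to its non-negative equivalent."""
    return index + length if index < 0 else index


def constant_pad_nd_shape(input_shape: List[int], pad: List[int]) -> List[int]:
    """Compute the padded shape, writing elements via normalized indices."""
    result = list(input_shape)
    rank = len(result)
    for i in range(len(pad) // 2):
        # The shape code naturally counts from the back (-1, -2, ...);
        # normalizing gives the static offset a simplifier can reason about.
        dim = to_positive_index(-(i + 1), rank)
        result[dim] = result[dim] + pad[2 * i] + pad[2 * i + 1]
    return result
```

For example, `constant_pad_nd_shape([1, 3, 4, 5], [1, 2, 3, 4])` pads
the last dimension by 1 + 2 and the one before it by 3 + 4.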
This commit adds support for type refinement when
`torch.tensor_static_info_cast`s are involved, even when there are
users of the casted tensor that don't allow type refinements.
Originally the canonicalization pattern for
`torch.tensor_static_info_cast` would check if all the users of the
casted tensor allowed type refinements before making any changes. This
means that if at least one of the users did not allow type
refinements, the pattern would fail. This becomes an issue when doing
shape calculations because the calculations need the shape information
of each input tensor to be available before the calculation can be
simplified.
The ODS-generated code included via the `TorchOps.cpp.inc` file takes a
very long time to compile. This PR isolates it into its own file so that
the build system can cache it.
This PR creates a new file `TorchOpsODSGenerated.cpp` just to include
the `TorchOps.cpp.inc` file. Doing so required moving to the "new" way
to define verifiers, since the static `verify` free functions in
TorchOps.cpp weren't accessible from the .inc file after it was moved to
`TorchOpsODSGenerated.cpp`.
On my machine, this drops the build time of TorchOps.cpp (such as when
iterating on a canonicalizer) from >40 seconds to <10 seconds.
10 seconds still isn't great, but at least it isn't "go get a
coffee" type of waiting.
See the documentation in `docs/shape_lib.md` and
`docs/adding_a_shape_function.md` for an overview of the system.
This completely overhauls how we represent shape functions. In
particular, RefineTypes does not infer shapes anymore (only dtypes).
Shape functions are now written in (TorchScript'able) Python.
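To give a feel for what such a shape function looks like, here is a
hedged sketch of a broadcasting rule written in plain, fully
type-annotated Python. The name `broadcast_shapes` is hypothetical and
is not taken from the shape library; see the docs above for the real
conventions.

```python
from typing import List


def broadcast_shapes(lhs: List[int], rhs: List[int]) -> List[int]:
    """Compute the broadcasted shape of two operand shapes."""
    rank = max(len(lhs), len(rhs))
    result: List[int] = []
    for i in range(rank):
        # Align both shapes at the trailing dimension; missing leading
        # dimensions behave like size 1.
        offset_lhs = i - (rank - len(lhs))
        offset_rhs = i - (rank - len(rhs))
        dim_lhs = lhs[offset_lhs] if offset_lhs >= 0 else 1
        dim_rhs = rhs[offset_rhs] if offset_rhs >= 0 else 1
        if dim_lhs != dim_rhs and dim_lhs != 1 and dim_rhs != 1:
            raise ValueError("shapes are not broadcastable")
        result.append(dim_rhs if dim_lhs == 1 else dim_lhs)
    return result
```

Because the function only uses lists of integers and simple control
flow, it stays within the subset of Python that can be scripted and
later simplified as IR.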
Recommended review order:
1. Read `docs/shape_lib.md` and `docs/adding_a_shape_function.md`.
1. Code and tests for ReifyShapeCalculations, DropShapeCalculations.
1. Code and tests for SimplifyShapeCalculations.
1. shape_lib_gen.py
1. Code and tests for new RefineTypes pass.
1. Random folders/canonicalizers in TorchOps.cpp and associated test in
`canonicalize.mlir`.
1. New ReadOnly trait inferred from the registry.
1. Any miscellaneous remaining stuff.
Example `-print-ir-after-all` for ElementwiseUnaryModule:
[IR lowering dump](https://gist.github.com/silvasean/e4dc8cbc8d00aac7819602e3cbd8e212).
Example `-print-ir-after-all` for ElementwiseBinaryModule:
[IR lowering dump](https://gist.github.com/silvasean/daf6860ecced732af3568af6b1899113).
- This commit adds lowering of `aten.Bool.Tensor` and
`aten.Float.Tensor` op as a part of `convert-torch-to-linalg` pass.
- It also adds support for returning bool types.
- It also fixes lowering of the `aten.Int.Tensor` op for non-zero rank
input tensors.
- If a scalar number is converted to a 0-d tensor and passed on to the
`aten.Float.Tensor` op, it folds to the scalar number.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
The added e2e maxpool testcase from #545 was not getting a static shape
due to an unfolded prim.If when RefineTypes was called. This was because
of unfolded torch.aten.__is__ and torch.prim.unchecked_cast operators
with torch.derefine operands.
Note that to enable folding of the code coming from an example
like the ConstantPad2dStaticModule e2e test, support for other
operations had to be added/improved:
- aten::neg.int
- aten::eq.float
- aten::eq.str
- prim::Uninitialized
- It folds `aten.to.dtype` when the input tensor type and result type
are exactly the same.
- It folds `aten.view` when the rank of both the input tensor type and
result type is 1.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds lowering of `aten.squeeze.dim` op into
`linalg.TensorCollapseShape` op. Here, the `dim`-th dimension of the
input tensor is assumed to be static, not dynamic.
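As a rough illustration of how a squeeze maps onto a collapse, the
Python sketch below computes reassociation groups for squeezing one
dimension. It is an assumption about one reasonable grouping, not the
lowering's actual code; `squeeze_dim_reassociation` is a hypothetical
name, and `None` stands for a dynamic dimension.

```python
from typing import List, Optional


def squeeze_dim_reassociation(
    shape: List[Optional[int]], dim: int
) -> List[List[int]]:
    """Group input dims so that collapsing them drops a size-1 `dim`."""
    rank = len(shape)
    if dim < 0:
        dim += rank
    if shape[dim] is None:
        raise ValueError("the dimension to squeeze must be static")
    if shape[dim] != 1:
        # Nothing to squeeze: every dimension stays in its own group.
        return [[i] for i in range(rank)]
    if rank == 1:
        # Collapsing to a rank-0 tensor uses no groups at all.
        return []
    groups: List[List[int]] = []
    for i in range(rank):
        if i == dim and dim > 0:
            groups[-1].append(i)  # fold the unit dim into its left neighbor
        elif dim == 0 and i == 1:
            groups[-1].append(i)  # leading unit dim absorbs its right neighbor
        else:
            groups.append([i])
    return groups
```

Whether a dimension has size 1 must be known at compile time to pick
the groups, which is why a dynamic `dim` is rejected here.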
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This commit adds lowering of `aten.Squeeze` op into
`linalg.TensorCollapseShape` op. The size 1 dynamic dimensions are not
handled as a part of this commit.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
This is to fold the common pattern from Bert inference like:
```
%111 = torch.prim.NumToTensor.Scalar %110 : !torch.int ->
!torch.vtensor<[],si64>
%112 = torch.aten.Int.Tensor %111 : !torch.vtensor<[],si64> ->
!torch.int
```
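The fold amounts to a small peephole rewrite, sketched here in Python.
The `Op` class and `fold_int_tensor` are hypothetical stand-ins for the
IR and the folder, shown only to make the pattern explicit.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Op:
    name: str
    # Either another op's result or the name of a plain SSA value.
    operand: Union["Op", str]


def fold_int_tensor(op: Op) -> Optional[Union[Op, str]]:
    """Fold Int.Tensor(NumToTensor.Scalar(x)) to x, else return None."""
    if op.name != "aten.Int.Tensor":
        return None
    producer = op.operand
    if isinstance(producer, Op) and producer.name == "prim.NumToTensor.Scalar":
        # The round trip scalar -> 0-d tensor -> scalar is the identity.
        return producer.operand
    return None
```

Applied to the snippet above, `%112` would be replaced by `%110`.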
This change is to unblock the work of some backprop ops returning more
than one tensor. We will need to think of a more scalable approach
in the future if more flexible return type combinations are needed.
- Added a DecomposeComplexOps pass to decompose complex torchOps.
- Refactored `visitAtenArgmaxOp` and `visitAtenAnyDimOp` to
`visitReductionAlongDimIntOp`.
- Moved some helper functions into
torch-mlir/Dialect/Torch/Utils/Utils.h to be shared by multiple files.
- Added support for f64 tensor as argument and return types.
This commit (with approval from all contributors) dual licenses
the torch-mlir project under both the standard LLVM license and the
standard PyTorch license. This will facilitate moving code between
torch-mlir and the two upstream projects.
The standard file comment is now:
```
// This file is licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// Also available under a BSD-style license. See LICENSE.
```
See `LICENSE` in the project root for the terms of both licenses.
This creates the `external/torch-mlir` directory as an
LLVM_EXTERNAL_PROJECTS-compatible project (analogous to
`iree-dialects`) and completes movement/rename of all pure MLIR C/C++
compiler code into there. The next step will be to move all the Python
code / code that links/includes PyTorch C++ code (which currently lives
in `frontends/pytorch`) into a subdirectory here.
I call this "earthmoving" because it is mostly mechanical changes and
renames. As a quick summary (we can change this down the road easily):
- C++ `mlir::NPCOMP::Torch -> mlir::torch::Torch`
- CAPI `npcompTorchListTypeGet -> torchMlirTorchListTypeGet`
- preprocessor `#ifndef NPCOMP_ -> #ifndef TORCHMLIR_`
- CMake `NPCOMPFoo -> TorchMLIRFoo`
The goal of this is to create a standalone project creating a center of
mass for entry into the MLIR ecosystem from PyTorch, suitable in scope
for eventual inclusion/ownership in PyTorch. The idea is that
`external/torch-mlir` will some day be pulled out into its own
repository, and then npcomp will simply pull it in as a submodule.
Layering-wise, what lives in `torch-mlir` lowers code from PyTorch
(currently TorchScript, but TorchFX or pytorch/xla-style tracing are
possible extensions) down to what we have been calling the "Torch
backend contract" which is cleaned up IR (inlining, simplification,
conversion to value tensors, ...) entirely in the `torch` dialect. This
is the branching off point for further lowering, of which npcomp takes
one opinion (outside `torch-mlir` of course!), namely the
`TorchConversion` dialect/transforms which lower to IR suitable for IREE
and other linalg-on-tensors based lower-level compilers.
Summary of changes:
- move `{include,lib,test}/Dialect/Torch` into `torch-mlir`
- move relevant parts of CAPI into `torch-mlir`.
- leave a few things related to the `torch-mlir` Python build commented
out, which should be resolved in a subsequent change.
This contains the following changes:
- Fix optional knowledge propagation. The initial knowledge should
always be NotNone for the operations we implemented.
- Add Folder for `prim.dtype`
- builder.getSymbolRefAttr is gone.
- OpAsmOpInterface's getAsmResultNames method needs explicit override
- a bunch of churn for builtin.func needing to be made explicit (and
sometimes implicit?)
- operation printers no longer need to print the operation name
themselves.
- snuck in beneficial trivial addition to TmpDeleteDeadIREEListsPass to
test a particular upstream change e2e with my local patchset.
- Add `!torch.optional` knowledge tracking
- Changes to improve type propagation for branches and terminators. See
examples in `refine-types-branch.mlir`
- Refactor to separate handling of different ops from `visitOperation`
- Add refine types for a few new ops