2021-02-18 03:28:51 +08:00
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
2021-09-30 00:03:40 +08:00
// Also available under a BSD-style license. See LICENSE.
2021-02-18 03:28:51 +08:00
//
//===----------------------------------------------------------------------===//
[torch-mlir earthmoving (1/N)] C/C++ code movement.
This creates the `external/torch-mlir` directory as an
LLVM_EXTERNAL_PROJECTS-compatible project (analogous to
`iree-dialects`) and completes movement/rename of all pure MLIR C/C++
compiler code into there. The next step will be to move all the Python
code / code that links/includes PyTorch C++ code (which currently lives
in `frontends/pytorch`) into a subdirectory here.
I call this "earthmoving" because it is mostly mechanical changes and
renames. As a quick summary (we can easily change this down the road):
- C++ `mlir::NPCOMP::Torch -> mlir::torch::Torch`
- CAPI `npcompTorchListTypeGet -> torchMlirTorchListTypeGet`
- preprocessor `#ifndef NPCOMP_ -> #ifndef TORCHMLIR_`
- CMake `NPCOMPFoo -> TorchMLIRFoo`
The goal is to create a standalone project that serves as a center of
mass for entry into the MLIR ecosystem from PyTorch, suitable in scope
for eventual inclusion/ownership in PyTorch. The idea is that
`external/torch-mlir` will some day be pulled out into its own
repository, and then npcomp will simply pull it in as a submodule.
Layering-wise, what lives in `torch-mlir` lowers code from PyTorch
(currently TorchScript, but TorchFX or pytorch/xla-style tracing are
possible extensions) down to what we have been calling the "Torch
backend contract", which is cleaned-up IR (inlining, simplification,
conversion to value tensors, ...) entirely in the `torch` dialect. This
is the branching-off point for further lowering, of which npcomp takes
one opinion (outside `torch-mlir` of course!), namely the
`TorchConversion` dialect/transforms which lower to IR suitable for IREE
and other linalg-on-tensors based lower-level compilers.
Summary of changes:
- move `{include,lib,test}/Dialect/Torch` into `torch-mlir`
- move relevant parts of CAPI into `torch-mlir`.
- leave a few things related to the `torch-mlir` Python build commented
out, which should be resolved in a subsequent change.
2021-09-10 03:24:10 +08:00
#include "torch-mlir/Dialect/Torch/Transforms/Passes.h"
Support multiple instances of a class in GlobalizeObjectGraph.
This happens in practice with e.g. ResNet from torchvision (multiple
instances of the same BatchNorm class).
The key observation is that for this program, and the expected set of
programs, we can convert the program to the same globalized form with a
bit more static analysis and effort to suitably monomorphize the
program. Though what we are doing here is fairly annoying to implement,
it saves any nontrivial later pass from having to do similar analyses
(or worse). E.g. shape inference would need to be object-graph aware,
mutation/lifetime analyses would have to be aware, etc. Additionally, it
would make us front-load what it means to have a !torch.nn.Module type
on an ABI boundary, which we are just not ready to handle.
I'm really, really hoping that in practice we can get away with
this, otherwise it's going to be really rough designing a representation
(and implementing everything to back it) that is convenient to transform
and gracefully scales from full object graph (in the most dynamic case)
down to a fixed set of global slots like we have here (in the most
static case, which we presume a lot of practical programs fall into).
This also involved introducing a
`torch-prepare-for-globalize-object-graph` pass that does a minimal set of
lowerings to simplify the IR into a more orthogonal and analyzable form,
and a `torch-globalize-pipeline` helper.
Recommended review order:
- updated documentation in Passes.td
- new tests in `globalize-object-graph-multiple-instances*.mlir`
- implementation of GlobalizeObjectGraph.cpp
- PrepareForGlobalizeObjectGraph.cpp + prepare-for-globalize-object-graph.mlir
- misc stuff like torch-globalize-pipeline pipeline definition.
With this, we can import, globalize, and inline resnet18 from
torchvision:
https://gist.github.com/silvasean/821586afc19b67d9fb72030b2e0adeb8
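To make the monomorphization idea concrete, here is a minimal standalone sketch. It assumes a hypothetical `Object` type and a dotted-path naming scheme for global slots; neither is taken from GlobalizeObjectGraph.cpp. What it illustrates is that two instances of the same class (e.g. two BatchNorm modules) each receive their own set of global slots, so later passes never have to reason about the object graph.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified object-graph node: an instance of some class with
// scalar attributes and named submodules. This is not torch-mlir's data model.
struct Object {
  std::string className;
  std::map<std::string, double> attributes;
  std::map<std::string, std::shared_ptr<Object>> children;
};

// Walks the object graph from `obj` and gives every attribute of every
// instance its own global slot, keyed by the dotted path from the root.
// Two instances of the same class therefore get distinct slots.
void globalize(const Object &obj, const std::string &prefix,
               std::map<std::string, double> &slots) {
  for (const auto &[name, value] : obj.attributes)
    slots[prefix + name] = value;
  for (const auto &[name, child] : obj.children)
    if (child) // Skip null edges in this toy model.
      globalize(*child, prefix + name + ".", slots);
}
```

Keying slots by instance path rather than by class is what lets distinct instances of one class coexist in a single globalized form.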
2021-03-10 12:33:21 +08:00
#include "mlir/Pass/PassManager.h"
2021-04-22 06:07:15 +08:00
#include "mlir/Transforms/Passes.h"
2021-02-18 03:28:51 +08:00
2021-09-10 03:24:10 +08:00
void mlir::torch::registerTorchPasses() {
2022-10-05 06:53:28 +08:00
  mlir::torch::registerPasses();
2021-04-27 05:22:50 +08:00
  mlir::PassPipelineRegistration<Torch::TorchLoweringPipelineOptions>(
2021-10-08 10:07:03 +08:00
      "torchscript-module-to-torch-backend-pipeline",
Add TorchToIREE and factor out TorchConversion dialect.
This converts a basic list op (torch.prim.ListConstruct) to the IREE
dialect.
```
def forward(self, x: float):
return [x, x]
```
turns into:
```
builtin.func @forward(%arg0: !torch.float) -> !torch.list<!torch.float> {
%0 = torch.prim.ListConstruct %arg0, %arg0 : (!torch.float, !torch.float) -> !torch.list<!torch.float>
return %0 : !torch.list<!torch.float>
}
```
which turns into:
```
builtin.func @forward(%arg0: f64) -> !iree.list<f64> {
%c1 = constant 1 : index
%c0 = constant 0 : index
%c2 = constant 2 : index
%0 = iree.list.create %c2 : !iree.list<f64>
iree.list.set %0[%c0], %arg0 : !iree.list<f64>, f64
iree.list.set %0[%c1], %arg0 : !iree.list<f64>, f64
return %0 : !iree.list<f64>
}
```
As part of doing this, I realized that it was time to formalize the IR
form that we reach right before running TorchTo{Linalg,Std,...}. We now
call it the "Torch backend contract". We then lower the "Torch backend
contract" to the "npcomp backend contract", which involves the new
TorchConversion (`torch_c`) dialect, which holds ops that need to
operate on both the npcomp backend types (e.g. builtin tensors, i1, IREE
list, etc.) and the `!torch` types.
This made more sense, as I realized that if I didn't factor out
`torch_c` then the Torch dialect would have a dependency on IREE
dialect (we previously didn't notice this was an issue because we only
depended on `builtin` types), which seemed wrong to me.
Recommended review order:
- TorchToIREE.cpp / `TorchToIREE/basic.mlir`
- Look at the new structure of createTorchScriptToNpcompBackendPipeline.
It now lives in TorchConversion/Transforms/Passes.cpp and cleanly
calls into `Torch::createTorchScriptToTorchBackendPipeline` for the
frontend lowering to the Torch backend contract.
- Mechanical change extracting
`torch_c.{to,from}_{i1,i64,f64,builtin_tensor,iree_list}` into a new
TorchConversion dialect, and a few passes specific to the lowering
from the Torch backend contract to the npcomp backend contract.
- Minor fixes to TorchToLinalg.cpp to use unconverted operands (now that
we convert lists as part of operand materialization, we need to use
the original operands). Also added test for AtenMaxPool2dOp and fixed
m_TorchConstantIntList.
- TmpDeleteDeadIREELists pass. Temporary pass for deleting dead IREE lists that
are created as part of operand materialization for conv/max pool/avg pool ops
in TorchToLinalg.
2021-08-12 05:40:08 +08:00
      "Pipeline lowering TorchScript object graph IR to Torch backend form.",
2021-10-08 10:07:03 +08:00
      mlir::torch::Torch::createTorchScriptModuleToTorchBackendPipeline);
[Pipeline] Use dedicated simplification pipeline for TorchDynamo frontend (#3376)
Discord Thread:
https://discord.com/channels/636084430946959380/1238330633328005243
## Context:
[This](https://github.com/llvm/torch-mlir/blob/main/python/torch_mlir/fx.py#L61)
was updated to support e2e tests for the TorchDynamo frontend in
Torch-MLIR, where we run FX decompositions and import the FX IR to
generate Torch dialect, followed by
`torch-function-to-torch-backend-pipeline`, skipping only the shape/type
refinement for now. However, we should be able to skip many of the torch
simplification passes, as depicted in the [frontend
roadmap](https://github.com/llvm/torch-mlir/blob/main/docs/images/roadmap_frontend.png).
Based on IREE's TorchDynamo
[pipeline](https://github.com/iree-org/iree/blob/main/compiler/plugins/input/Torch/InputConversion/Passes.cpp#L29),
the only two passes we seem to require are: `ReduceOpVariantsPass` and
`DecomposeComplexOpsPass`. This is in line with our findings as well
based on initial exploration.
This PR creates a dedicated frontend simplification pipeline for
TorchDynamo / FX Importer which calls only `ReduceOpVariantsPass` and
`DecomposeComplexOpsPass`. We rely on the e2e fx_importer tests to
ensure we're not regressing by removing many of the passes that were
historically needed for TorchScript.
One notable change here is that we do not call the
`LowerToBackendContractPass` anymore, which used to call
`TorchSimplificationPipeline` iteratively until VerifyBackendContract
was clean. Some of this was required for the shape/type refinement to
converge, which seems a non-issue for Dynamo frontend. Do we anticipate
this (the iterative invocation of TorchSimplificationPipeline followed
by VerifyBackendContract) to be worth retaining in the Dynamo frontend
pipeline? If so, I can make those changes, PLMK.
2024-05-22 20:23:18 +08:00
  mlir::PassPipelineRegistration<Torch::TorchLoweringPipelineOptions>(
      "torchdynamo-export-to-torch-backend-pipeline",
      "Pipeline lowering TorchDynamo exported graph IR to Torch backend form.",
      mlir::torch::Torch::createTorchDynamoExportToTorchBackendPipeline);
2021-04-27 05:22:50 +08:00
  mlir::PassPipelineRegistration<Torch::TorchLoweringPipelineOptions>(
2021-10-08 10:07:03 +08:00
      "torch-function-to-torch-backend-pipeline",
      "Pipeline lowering a Torch function to Torch backend form.",
      mlir::torch::Torch::createTorchFunctionToTorchBackendPipeline);
2022-03-10 08:44:22 +08:00
  mlir::PassPipelineRegistration<Torch::TorchLoweringPipelineOptions>(
2022-08-05 02:39:21 +08:00
      "torch-simplification-pipeline",
      "Pipeline simplifying computations in the program.",
      mlir::torch::Torch::createTorchSimplificationPipeline);
2023-03-25 10:50:01 +08:00
  mlir::PassPipelineRegistration<Torch::TorchLoweringPipelineOptions>(
2022-03-10 08:44:22 +08:00
      "torch-shape-refinement-pipeline", "Pipeline refining shapes of tensors.",
      mlir::torch::Torch::createTorchShapeRefinementPipeline);
2021-03-10 12:33:21 +08:00
}
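Each `mlir::PassPipelineRegistration` above associates a command-line pipeline name and a description with a callback that populates an `OpPassManager`. The sketch below models that idea in plain C++ so it can run standalone; `PipelineOptions`, `PassList`, `registerPipeline`, and `buildPipeline` are hypothetical stand-ins, not MLIR APIs.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: the options struct plays the role of
// TorchLoweringPipelineOptions, and a "pass manager" is just an ordered list
// of pass names.
struct PipelineOptions {
  bool decompose = true;
};
using PassList = std::vector<std::string>;
using PipelineBuilder =
    std::function<void(PassList &, const PipelineOptions &)>;

struct PipelineInfo {
  std::string description;
  PipelineBuilder builder;
};

// Registry keyed by the pipeline's command-line name.
std::map<std::string, PipelineInfo> &pipelineRegistry() {
  static std::map<std::string, PipelineInfo> registry;
  return registry;
}

// Registers a pipeline; returns false if the name is already taken.
bool registerPipeline(const std::string &name, std::string description,
                      PipelineBuilder builder) {
  return pipelineRegistry()
      .emplace(name, PipelineInfo{std::move(description), std::move(builder)})
      .second;
}

// Looks up a pipeline by name and appends its passes to `passes`.
bool buildPipeline(const std::string &name, const PipelineOptions &options,
                   PassList &passes) {
  auto it = pipelineRegistry().find(name);
  if (it == pipelineRegistry().end())
    return false;
  it->second.builder(passes, options);
  return true;
}
```

The design point mirrored here is that registration only stores a builder; the passes are materialized later, once per use, with whatever options the caller supplies.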
2021-10-08 10:07:03 +08:00
void mlir::torch::Torch::createTorchScriptModuleToTorchBackendPipeline(
2021-04-27 05:22:50 +08:00
    OpPassManager &pm, const TorchLoweringPipelineOptions &options) {
2021-04-30 06:13:21 +08:00
  // When we import TorchScript IR, we import their entire "compilation unit",
  // which can contain numerous functions unrelated to the current program,
  // which breaks torch-globalization-pipeline; for example, there can be
  // random functions referencing types that haven't been imported
  // as part of the root `torch.nn.Module` we imported. Those will
  // be unreferenced private functions which symbol-dce will clean up nicely.
  pm.addPass(createSymbolDCEPass());
  // Globalize the program. The rest of the compiler assumes a globalized
  // program, which makes all analyses and transforms significantly easier
  // to write.
2021-03-10 12:33:21 +08:00
  pm.addPass(createPrepareForGlobalizeObjectGraphPass());
  pm.addPass(createGlobalizeObjectGraphPass());
2021-04-30 06:13:21 +08:00
  // "lower" `torch.global_slot` ops by deleting them if unused, which we
  // currently require because we don't have a lowering path for backends to
  // handle them.
  // Torch usually inserts a few unused global slots so this ends up hitting
  // every single module even if it doesn't have any explicit slots.
  // TODO: Support global slots in backends.
  pm.addPass(createSymbolDCEPass());
  // Currently, our shape inference is not powerful enough to deal with
  // calls, so inline everything.
  // TODO: Improve shape inference.
  pm.addPass(createInlinerPass());
2021-04-22 06:07:15 +08:00
2021-10-08 10:07:03 +08:00
  createTorchFunctionToTorchBackendPipeline(pm, options);
2021-04-22 06:07:15 +08:00
}
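The two `createSymbolDCEPass()` calls above rely on one property: private symbols that nothing live references can be erased. A minimal sketch of that reachability idea follows, assuming a hypothetical `Module` map from symbol names to their visibility and the symbols they reference; it is not MLIR's SymbolDCE implementation.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified module: each symbol is public or private and lists
// the symbols it references (calls, global slot accesses, ...).
struct Symbol {
  bool isPublic = false;
  std::vector<std::string> uses;
};
using Module = std::map<std::string, Symbol>;

// Keeps every public symbol plus everything transitively referenced from one,
// and erases all other (private, unreferenced) symbols.
void symbolDCE(Module &module) {
  std::set<std::string> live;
  std::vector<std::string> worklist;
  for (const auto &[name, sym] : module)
    if (sym.isPublic)
      worklist.push_back(name);
  while (!worklist.empty()) {
    std::string name = worklist.back();
    worklist.pop_back();
    if (!live.insert(name).second)
      continue; // Already visited.
    auto it = module.find(name);
    if (it == module.end())
      continue; // Reference to a symbol not defined in this module.
    for (const std::string &use : it->second.uses)
      worklist.push_back(use);
  }
  for (auto it = module.begin(); it != module.end();) {
    if (live.count(it->first))
      ++it;
    else
      it = module.erase(it);
  }
}
```

For example, a public `forward` that uses a private `helper` keeps `helper` alive, while an unrelated private function from the imported compilation unit is erased, even if it refers to something that was never imported.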
2024-05-22 20:23:18 +08:00
void mlir::torch::Torch::createTorchDynamoExportToTorchBackendPipeline(
    OpPassManager &pm, const TorchLoweringPipelineOptions &options) {
  pm.addNestedPass<func::FuncOp>(
      createReduceOpVariantsPass(options.extraLibrary));
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  if (options.decompose) {
    pm.addNestedPass<func::FuncOp>(
        Torch::createDecomposeComplexOpsPass(options.backendLegalOps));
    pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  }
}
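To contrast this pipeline with the longer TorchScript path, the following sketch records, as strings, the passes that `createTorchDynamoExportToTorchBackendPipeline` adds for a given `decompose` setting. `DynamoPipelineOptions` and the pass labels are illustrative assumptions, not torch-mlir types or exact command-line flag names.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the one TorchLoweringPipelineOptions field that
// the TorchDynamo export pipeline consults.
struct DynamoPipelineOptions {
  bool decompose = true;
};

// Returns, in order, labels for the passes added by
// createTorchDynamoExportToTorchBackendPipeline (each one is nested on
// func::FuncOp in the real pipeline).
std::vector<std::string>
dynamoExportPasses(const DynamoPipelineOptions &options) {
  std::vector<std::string> passes = {"reduce-op-variants", "canonicalize"};
  if (options.decompose) {
    passes.push_back("decompose-complex-ops");
    passes.push_back("canonicalize");
  }
  return passes;
}
```

Unlike `createTorchFunctionToTorchBackendPipeline`, nothing here iterates toward a fixed point or refines shapes and dtypes.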
2021-10-08 10:07:03 +08:00
void mlir::torch::Torch::createTorchFunctionToTorchBackendPipeline(
2021-04-27 05:22:50 +08:00
    OpPassManager &pm, const TorchLoweringPipelineOptions &options) {
2021-09-29 08:25:06 +08:00
  // Incorporate user annotations and remove signature Python-isms.
  pm.addPass(createAdjustCallingConventionsPass());
2022-08-05 02:39:21 +08:00
  // Perform the bulk of lowering to the backend contract.
  // See the pass documentation for more information.
2022-08-18 07:23:52 +08:00
  pm.addPass(createLowerToBackendContractPass(
2024-05-10 02:44:36 +08:00
      options.maxIterations, options.decompose, options.shapeDtypeRefine,
      options.backendLegalOps, options.extraLibrary));
2022-08-05 02:39:21 +08:00
}
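`createLowerToBackendContractPass` receives `options.maxIterations`, and the comment on `createTorchSimplificationPipeline` explains that the simplification pipeline may be rerun until the backend contract holds. Here is a hedged, generic sketch of such a bounded fixed-point driver; `runToFixedPoint` is a hypothetical helper and does not reproduce the pass's actual control flow or diagnostics.

```cpp
#include <optional>

// Hypothetical bounded fixed-point driver: apply `simplify` to `state`
// repeatedly, checking `satisfiesContract` after each run, and give up after
// `maxIterations` runs. Returns the number of runs needed, or std::nullopt if
// the budget was exhausted.
template <typename State, typename SimplifyFn, typename CheckFn>
std::optional<int> runToFixedPoint(State &state, SimplifyFn simplify,
                                   CheckFn satisfiesContract,
                                   int maxIterations) {
  for (int i = 1; i <= maxIterations; ++i) {
    simplify(state);
    if (satisfiesContract(state))
      return i;
  }
  return std::nullopt;
}
```

With a state that needs three dependent rounds of simplification, the driver stops after the third run; a simplification that never makes progress exhausts the budget and reports failure instead of looping forever.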
2021-09-29 08:25:06 +08:00
2022-08-05 02:39:21 +08:00
// A simplification pipeline to establish the invariants of the backend
// contract (see `satisfiedBackendContract` in `LowerToBackendContract`).
//
// We structure this so that a single run of this pipeline is enough for
// most models, but it is possible for it to take multiple runs to fully
// clean things up when there are cyclic dependencies between certain
// simplifications, such as a decomposition relying on shape refinement which
// depends on another decomposition.
//
// Although technically this pipeline is an implementation detail of
// LowerToBackendContract, we expose it here to help debugging.
//
// LowerToBackendContract will run this pipeline as many times as necessary, but
// in general, it is costly to re-run this pipeline, since all the passes do
// O(module size) work. We want the number of iterations of this pipeline
// to be bounded by meaningful "always in practice small" program properties,
// such as loop nesting depth, number of sequentially dependent steps of
// constant global slots proving that other global slots are dead, etc.
//
// It is always possible to construct a pathological input that will
// exceed the maximum number of iterations. If we do find practical cases with
// O(module size) number of iterations of this simplification pipeline, then
// we may need to adjust the approach, such as to do some of the transformations
// together at finer granularity.
void mlir::torch::Torch::createTorchSimplificationPipeline(
    OpPassManager &pm, const TorchLoweringPipelineOptions &options) {
  // General cleanup.
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  // Inline global slots to expose a bunch of simplification opportunities
  // from constant hyperparameters, weights, etc.
  pm.addPass(createInlineGlobalSlotsPass());
  // Erase the module initializer if we have proven that all the global slots
  // are gone.
  pm.addPass(createEraseModuleInitializerPass());
  // Clean up again to avoid needing to go back around the fixed-point
  // iteration.
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
2023-03-29 02:07:47 +08:00
  pm.addNestedPass<func::FuncOp>(createRecomposeComplexOpsPass());
Significantly restructure torch/aten import design.
This is a really major and invasive restructuring of the way we get
torch operators (`torch::jit::Operator` / `c10::OperatorHandle`) into
MLIR. Please forgive the challenging review, but due to the sheer
invasiveness, it wasn't really practical to do it in sane smaller
pieces.
This fully replaces everything that was already working on the
TorchScript path (actually, more -- we added tanh support to
TorchToLinalg in order to delete the older code paths). Additionally,
I've kept the lights on for the acap path too, including what little e2e
stuff was working before (for expediency I made a few tiny compromises
along the way that will be easy to undo when we give that path proper
attention).
Overview of the new design:
- The torch operator `somens::someunqualname.someoverloadname` is
imported as `torch.somens.someunqualname.someoverloadname` (skip the
last dotted part if the overload name is empty), OR, if we don't have
such an op registered, it is imported as
`torch.operator "somens.someunqualname.someoverloadname" (...) : ...`.
- The addition of the "overload name" is a critical element here, as
the `(ns,unqual,overload)` triple is unique, which solves a lot of
problems we were having.
- This involves having separate MLIR ops for the `trailing_` and
`.out` variants and all the different overloads. This seemed
necessary, because the set of overloads is so wild and varied and
unstructured. The previous design was leaning into some underlying
structure that just isn't there -- the default situation is
the "random overload that we want to manage on the MLIR side",
rather than that being an exception. E.g. `aten::ne` (not-equal)
has 21 overloads, only 4 of which are c10 dispatcher ops (see
[gist](https://gist.github.com/silvasean/190ba918c550c956260e21254e1b8aa1)),
and the "out" variant is really called `.Tensor_out` instead of
`.out` as it frequently is for other ops.
- Rationale for all being in `torch` namespace: the set of operators
are so varied and unstructured that "dialect per namespace"
doesn't result in anything resembling the typical MLIR dialect
boundary expectations. We could maybe draw the boundary at
dispatcher ops vs non-dispatcher ops, but that doesn't seem to
really result in very much useful structure at this point in time.
- Note: within the torch operator registry, we effectively have a
mini-basicpy subdialect (already type-resolved), which is reasonably
structured.
- The existing Torch op interfaces are also removed -- now that we
track the overload name, we can losslessly find the original
operator.
- Instead of `ATenRecognizeKernelsPass`, we now have a
`ReduceOpVariantsPass` that keys off certain traits (and perhaps
eventually interfaces) to reduce variants of ops to a smaller set,
ideally operating on immutable tensors and using surrounding ops to
model the mutability/aliasing aspects.
- Note: `torch.ns.unqual.overload` ops allow both immutable and
mutable tensors (unlike the previous hard distinction in the common
case). This is a premonition for a future change that will introduce a
bona fide `!torch.tensor` type that will clean up a bunch of stuff.
- `TorchToLinalg` / `TorchToStd` supersede the existing
"ATen->TCF->TCP->Linalg" path.
- The new `torch_ods_gen.py` supersedes `torch_signature_ods_gen.py`.
It should look somewhat familiar, but the benefit of hindsight has
allowed a lot of simplifications.
The overall trend seems to be to make the `torch` dialect a nice layer
independent of anything else. It feels like, as a natural result of
various future changes, we will be removing the reliance on basicpy+numpy
dialects and have a nice self-contained type system too that properly
models the TorchScript type system (including proper subtyping,
mutable/immutable tensors, optional dtype, etc.).
Recommended review order:
- Start at some of the new import IR, e.g. in
`frontends/pytorch/test/node_import/prim.py`,
`frontends/pytorch/test/acap_export/test_export_add3.py`, and other
tests.
- `frontends/pytorch/python/torch_mlir_utils/codegen/torch_ods_gen.py`
and associated generated files:
- `include/npcomp/Dialect/Torch/IR/GeneratedAtenOps.td`
- `include/npcomp/Dialect/Torch/IR/GeneratedPrimOps.td`
- Inspect `ReduceOpVariants.cpp` / `reduce-op-variants.mlir` and the new
traits in `include/npcomp/Dialect/Torch/IR/TorchTraits.h`
- Various code changes in the import path in
`frontends/pytorch/csrc/builder`. Probably most interesting is the new
code in `torch_to_mlir_utils.cpp` that has the logic to create the
`torch.operator` ops or `torch.ns.unqual.overload` ops.
This is the [new ResNet IR](https://gist.github.com/silvasean/5407aafb710d07612b7b5b92eabecebe),
just to be able to look at a substantial sample of IR in the new style.
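The naming rule in the overview above can be sketched as a small function. This is an illustration under assumptions: `importOpName` and `registeredOps` are hypothetical, and the real importer in `torch_to_mlir_utils.cpp` builds MLIR operations rather than strings. Dropping the overload suffix in the generic `torch.operator` form when the overload name is empty is also an assumption.

```cpp
#include <set>
#include <string>

// Maps the torch operator `ns::unqual.overload` to `torch.ns.unqual.overload`
// (omitting `.overload` when the overload name is empty) if that op is in the
// hypothetical `registeredOps` set; otherwise returns the generic
// `torch.operator "ns.unqual.overload"` spelling.
std::string importOpName(const std::string &ns, const std::string &unqual,
                         const std::string &overload,
                         const std::set<std::string> &registeredOps) {
  std::string qualified = ns + "." + unqual;
  if (!overload.empty())
    qualified += "." + overload;
  std::string opName = "torch." + qualified;
  if (registeredOps.count(opName))
    return opName;
  return "torch.operator \"" + qualified + "\"";
}
```

Because the `(ns, unqual, overload)` triple is unique, the resulting name identifies exactly one torch operator in either form.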
2021-05-05 05:42:50 +08:00
  // Reduce variants of ops to a smaller set of primitives.
2023-03-25 10:50:01 +08:00
  pm.addNestedPass<func::FuncOp>(
      createReduceOpVariantsPass(options.extraLibrary));
2022-08-05 02:39:21 +08:00
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  // Remove dead global slots.
  pm.addPass(createSymbolDCEPass());
2022-03-10 08:44:22 +08:00
  // Convert the bulk of non-ABI-visible !torch.tensor values to !torch.vtensor.
2022-04-27 03:27:51 +08:00
  pm.addNestedPass<func::FuncOp>(Torch::createMaximizeValueSemanticsPass());
2022-08-05 02:39:21 +08:00
  // Update the return op to return value tensors.
2022-07-14 08:11:15 +08:00
  pm.addPass(Torch::createRefinePublicReturnPass());
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
2024-05-10 02:44:36 +08:00
  if (options.shapeDtypeRefine) {
    // Do shape and dtype refinement.
    // Shape refinement should be run before dtype refinement because Torch type
    // promotion rules actually depend on the shape of the operand.
    createTorchShapeRefinementPipeline(pm, options);
    createTorchDtypeRefinementPipeline(pm, options);
  }
2021-04-30 06:13:21 +08:00
  // Propagate to ABI return types the shape/dtype information discovered by
  // the previous pass. Doing this is ABI-compatible for our backends.
Introduce `!torch.tensor` / `!torch.vtensor` types.
This removes our reliance on the numpy dialect and avoids our off-label
use of the builtin tensor type for modeling unknown dtypes. The
`!torch.vtensor` (`ValueTensorType`) type is a value-semantic tensor.
The `!torch.tensor` (`NonValueTensorType`) type is a non-value-semantic
tensor. The new types look as follows syntactically:
```
// Least-static-information, non-value-semantic tensor.
!torch.tensor
// Explicit form of least-static-information variant.
!torch.tensor<*,unk>
// Least-static-information, value-semantic tensor.
!torch.vtensor
// Explicit form of least-static-information variant.
!torch.vtensor<*,unk>
// Fixed-set of allowable element types, with first-class support for
// Torch's frontend signedness semantics.
!torch.tensor<*,si32>
// First-class support for unknown dtypes.
!torch.tensor<[?,?,?],unk>
// Standard MLIR representation of `?` for unknown dimensions.
!torch.tensor<[?,2,?,4],unk>
// Statically shaped / dtyped example.
!torch.vtensor<[1,2,3,4],f32>
```
This required fairly significant changes throughout the compiler, but
overall it is a big cleanup. We now have a much clearer layering of "the
Torch frontend lowering" vs "lowering to std + linalg + etc.".
At the C++ level, there is `ValueTensorType`, `NonValueTensorType`.
We also have a helper `BaseTensorType` (kind of like ShapedType) which
interoperates with those two.
Included changes:
- New `torch.tensor(dense<0.0> : tensor<5xf32>) : !torch.tensor` op for
creating torch tensor literals in the frontend.
- Consistently use signedness for the types (except i1, which I didn't
touch -- we need to sort out the situation with !basicpy.BoolType
there anyway, so will be attending to that soon).
- Frontend can annotate whether an argument to the function has value
semantics. We currently require this, as our backend contract does not
currently allow us to even model the non-value-semantic case. Before,
the value-semantic assumption was randomly injected in the middle of
the pass pipeline.
- Move ArrayToTensor (now called MaximizeValueSemantics) and
RefinePublicReturn passes to torch dialect.
- The TorchToStd and TorchToLinalg passes are now type conversions from
`!torch.vtensor` to `tensor` and use the dialect conversion infra.
The overall conversion pipeline is set up following the best practices
of the "Type Conversions the Not-So-Hard Way" talk. This required
introducing `torch-func-builtin-tensorize` and
`torch-finalizing-builtin-tensorize` passes analogous to the upstream
bufferization passes with the corresponding names (mostly just
copypasta from there).
- Misc Torch-level canonicalizations -- we now cleanly layer the
lowering to std later in the pipeline, so we are gradually lessening
our reliance on random std constant folding before we get to that
point.
Recommended review order:
- New types in TorchTypes.td/TorchTypes.h/TorchDialect.cpp
- New ops in TorchOps.td / TorchOps.cpp
- Less important / more mechanical stuff
- Frontend changes.
- Pass changes/additions in `Torch/Transforms` and `Conversion/`
2021-05-21 08:07:18 +08:00
|
|
|
pm.addPass(Torch::createRefinePublicReturnPass());
pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
if (options.decompose) {
pm.addNestedPass<func::FuncOp>(
Torch::createDecomposeComplexOpsPass(options.backendLegalOps));
pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
}
}
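Shape refinement has to run before dtype refinement because Torch's type promotion depends on operand shape. The sketch below illustrates this with a deliberately simplified model of PyTorch's rule for two integer operands of the same category. A zero-dim tensor does not widen the dtype of a dimensioned tensor: an `int32` vector plus an `int64` zero-dim tensor yields `int32`, while an `int32` vector plus an `int64` vector yields `int64`.

`IntDtype`, `OperandInfo` and `promoteIntegers` are hypothetical names for this illustration only; they are not torch-mlir APIs. Other dtype categories and the remaining promotion rules are out of scope.

```cpp
#include <cassert>
#include <optional>

// Hypothetical dtype enum covering just two integer dtypes (value = bit width).
enum class IntDtype { Int32 = 32, Int64 = 64 };

// `rank` is std::nullopt when shape refinement has not determined it.
struct OperandInfo {
  IntDtype dtype;
  std::optional<int> rank;
};

// Simplified PyTorch-style promotion for two integer operands: a zero-dim
// tensor does not widen a dimensioned tensor's dtype. Returns std::nullopt
// whenever a rank is unknown (a real implementation could still answer when
// both dtypes match; this sketch stays conservative).
std::optional<IntDtype> promoteIntegers(const OperandInfo &a,
                                        const OperandInfo &b) {
  if (!a.rank || !b.rank)
    return std::nullopt;
  const bool aZeroDim = *a.rank == 0;
  const bool bZeroDim = *b.rank == 0;
  if (aZeroDim != bZeroDim)
    return aZeroDim ? b.dtype : a.dtype; // the dimensioned operand wins
  // Both zero-dim or both dimensioned: pick the wider integer type.
  return static_cast<int>(a.dtype) >= static_cast<int>(b.dtype) ? a.dtype
                                                                : b.dtype;
}
```

With an unknown rank the result dtype cannot be decided in general. This is why the pipeline refines shapes first and only then runs dtype refinement.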

static void createRefinementPipeline(
mlir::OpPassManager &pm,
llvm::function_ref<
std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>(llvm::StringRef)>
reifyCalculationsPass,
llvm::function_ref<
std::unique_ptr<mlir::OperationPass<mlir::func::FuncOp>>()>
simplifyCalculationsPass,
const mlir::torch::Torch::TorchLoweringPipelineOptions &options) {
// Reify the library functions for each op that is present in the library.
pm.addPass(reifyCalculationsPass(options.extraLibrary));

// Inline the library functions to enable analysis and transformation.
// TODO: Only inline library functions (this will currently inline
// everything).
pm.addPass(mlir::createInlinerPass());

// Now, try to simplify calculations. This is unfortunately an "optimize
// as hard as possible" kind of thing, so it's inherently somewhat brittle.
// The idea is to keep strengthening what we do here to support the
// library functions. We don't need to support arbitrary programs, thankfully.
pm.addNestedPass<mlir::func::FuncOp>(simplifyCalculationsPass());
// Run CSE, then see if we can simplify further.
pm.addNestedPass<mlir::func::FuncOp>(mlir::createCSEPass());
pm.addNestedPass<mlir::func::FuncOp>(simplifyCalculationsPass());

// Drop calculations, leaving behind the refined program.
pm.addNestedPass<mlir::func::FuncOp>(
mlir::torch::Torch::createDropAbstractInterpCalculationsPass());
}
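`createRefinementPipeline` fixes the order of the steps: reify, inline, simplify, CSE, simplify again, then drop the calculations. Callers plug in the reify/simplify pass factories. The sketch below models that structure with a toy pass manager that only records pass names.

`ToyPassManager`, `ToyOptions`, `buildRefinementPipeline`, `buildShapePipeline`, and the pass-name strings are hypothetical stand-ins, not the MLIR or torch-mlir API. `std::function` replaces `llvm::function_ref` so the sketch has no LLVM dependency.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for mlir::OpPassManager: it only records pass names.
struct ToyPassManager {
  std::vector<std::string> passes;
  void addPass(const std::string &name) { passes.push_back(name); }
};

// Hypothetical options; mirrors only the `extraLibrary` field used above.
struct ToyOptions {
  std::string extraLibrary;
};

using ReifyFactory = std::function<std::string(const std::string &)>;
using SimplifyFactory = std::function<std::string()>;

// Same step order as createRefinementPipeline; the caller chooses which
// reify/simplify "passes" are plugged in.
void buildRefinementPipeline(ToyPassManager &pm, const ReifyFactory &reify,
                             const SimplifyFactory &simplify,
                             const ToyOptions &options) {
  assert(reify && simplify && "both pass factories must be provided");
  pm.addPass(reify(options.extraLibrary)); // reify library functions
  pm.addPass("inline");                    // make them visible to analysis
  pm.addPass(simplify());                  // first simplification round
  pm.addPass("cse");                       // expose more redundancy
  pm.addPass(simplify());                  // simplify again after CSE
  pm.addPass("drop-abstract-interp-calculations");
}

// Shape instantiation of the generic builder; a dtype variant would pass a
// different pair of factories.
std::vector<std::string> buildShapePipeline(const ToyOptions &options) {
  ToyPassManager pm;
  buildRefinementPipeline(
      pm,
      [](const std::string &lib) {
        return lib.empty() ? std::string("reify-shape")
                           : "reify-shape{lib=" + lib + "}";
      },
      [] { return std::string("simplify-shape"); }, options);
  return pm.passes;
}
```

For example, `buildShapePipeline(ToyOptions{})` yields six entries, starting with `reify-shape` and ending with `drop-abstract-interp-calculations`. Parameterizing over the two factories keeps the shape and dtype pipelines structurally identical, which mirrors how `createTorchShapeRefinementPipeline` and `createTorchDtypeRefinementPipeline` differ only in the passes they hand to the shared builder.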

void mlir::torch::Torch::createTorchShapeRefinementPipeline(
OpPassManager &pm, const TorchLoweringPipelineOptions &options) {
createRefinementPipeline(pm, Torch::createReifyShapeCalculationsPass,
Torch::createSimplifyShapeCalculationsPass, options);
}

void mlir::torch::Torch::createTorchDtypeRefinementPipeline(
OpPassManager &pm, const TorchLoweringPipelineOptions &options) {
createRefinementPipeline(pm, Torch::createReifyDtypeCalculationsPass,
Torch::createSimplifyDtypeCalculationsPass, options);
}