torch-mlir/include/npcomp/Dialect/Torch/Transforms/Passes.td

//===-- Passes.td - Pass definition file -------------------*- tablegen -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef NPCOMP_TORCH_PASSES
#define NPCOMP_TORCH_PASSES

include "mlir/Pass/PassBase.td"

def GlobalizeObjectGraph : Pass<"torch-globalize-object-graph", "ModuleOp"> {
  let summary = "Converts TorchScript object graphs to a globalized form";
  let constructor = "mlir::NPCOMP::Torch::createGlobalizeObjectGraphPass()";
  let description = [{
    This pass converts a subset of possible TorchScript modules into a
    more restrictive lower-level form that strips away the need to be
    concerned with instances of !torch.nn.Module<...> type. Specifically,
    the object graph is flattened into a set of discrete globals
    (`torch.global_slot`) that hold the program state.

    The overarching goal is for a strict correspondence between the original
    `torch.nn.Module` (call it `root`) that the user `torch.jit.script`'ed, and
    the public interface of the resulting MLIR module. Specifically:
      - The call `root.encoder.forward(...)` in Python corresponds to invoking
        the `func @encoder.forward` on the resulting MLIR module.
      - The data member access `root.decoder.ids_to_strings_table` in Python
        corresponds to accessing the
        `torch.global_slot @decoder.ids_to_strings_table` on the resulting
        MLIR module.
    In effect, the entire MLIR module corresponds to an instance of the `root`
    object. This matches the intuitive behavior desired for deployment:
    When the MLIR module (or, more likely, a compiled artifact derived from it)
    is loaded in a deployed environment, it is equivalent to recreating the
    original `root` object.

    This pass performs a complete change of the externally visible calling
    convention of the MLIR module, from a graph of objects and methods to a
    fixed set of globals and functions. Additionally, method signatures are
    changed such that all arguments of !torch.nn.Module type are removed from
    public interfaces, since each is guaranteed to correspond to a unique
    instance and is thus redundant.

    Of course, only a subset of programs can be transformed, and this pass fails
    with an error if the conditions are violated.

    Specifically, the restrictions are:
    - There must be a unique torch.nn_module that is not the value of a slot
      of any other torch.nn_module
      - Rationale: Allows us to have a notion of a unique "root" op, which is
        used to define linkage. This also matches how TorchScript imports in
        practice (`torch.jit.script` imports a single root object).
    - Multiple instances of the same class type are allowed, as long as it is
      possible to monomorphize ("template instantiate") functions so that each
      argument of !torch.nn.Module type corresponds to a unique instance.
      In practice, this limitation is either 1) (fundamental) due to truly
      dynamic use of modules, such as `m1 if cond() else m2` in Python code,
      or 2) (incidental) due to imprecision of the static analysis that this
      pass uses to determine when a single instance is relevant. In general,
      this analysis is undecidable (equivalent to the halting problem), but we
      can aim to improve this pass so that practical patterns are all handled.
      - Rationale: The fundamental limitation "1)" guarantees that the
        program can be lowered to a fixed set of globals without indirection
        across globals. In the absence of this property, most compiler
        analyses/transformations are significantly curtailed (or require very
        sophisticated implementations). For the moment, this restriction
        is deemed a reasonable, pragmatic choice: it avoids front-loading the
        complexity of working with a representation that can faithfully
        model that kind of dynamic program.
        Additionally, it avoids front-loading the handling of programs which
        have !torch.nn.Module types at external calling convention boundaries.
    - All torch.nn_module's must be reachable by a unique path from the root
      - Rationale: Eliminates the possibility of a potentially exponential
        number of paths or, worse, an infinite number of paths when
        considering cyclic object graphs. Also, as of Feb 2021, TorchScript
        won't import into this form (it has a bug related to the identity of
        submodules).
    - Two slots cannot have initial values that alias each other.
      - Rationale: This makes the representation of initial values simpler.
        Also, as of Feb 2021, TorchScript won't import into this form except
        potentially for Tensors (it has a bug related to the identity of
        objects). And for tensors, the npcomp IValue importer only supports a
        very restricted form of aliasing anyway for other reasons. We are
        waiting for signals that more general handling of object aliasing is
        important enough to justify the effort.
  }];
}

def PrepareForGlobalizeObjectGraph
  : Pass<"torch-prepare-for-globalize-object-graph", "ModuleOp"> {
  let summary = "Lowering in preparation for globalizing";
  let constructor = "mlir::NPCOMP::Torch::createPrepareForGlobalizeObjectGraphPass()";
  let description = [{
    Establishes the invariants needed by the
    torch-globalize-object-graph transformation. Fails if that cannot be
    accomplished.

    Currently, this just involves ensuring a small set of patterns have been
    applied.
  }];
}

def AdjustCallingConventions
  : Pass<"torch-adjust-calling-conventions", "ModuleOp"> {
  let summary = "Adjust the calling conventions of functions";
  let constructor = "mlir::NPCOMP::Torch::createAdjustCallingConventionsPass()";
  let description = [{
    Adjusts the calling conventions of functions in the module, with the aim of
    preparing them for backends and further lowering passes. As this changes
    the module calling convention, it should be considered a legalization
    step towards reaching IR that is suitable for an appropriate backend.
    All transformations are context-free and suitable for documenting
    at the user level if needed to clarify the eventual calling convention
    of compiled artifacts.
    This is not an optimization.

    The transformations performed are:
    - `torch.type_bound` annotations are incorporated into the type of the
      function arguments, which should be `!numpy.ndarray<...>`'s.
    - Python-isms are rewritten to MLIR-isms
      - NoneType return is rewritten to the absence of a return value.
      - (Not implemented yet) Tuple return is rewritten to multiple return
        values
  }];
}

#endif // NPCOMP_TORCH_PASSES
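
The object-graph restrictions listed in the `GlobalizeObjectGraph` description (every module reachable by a unique path from a single root, and no two slots sharing an aliased initial value) can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: `Module`, `GlobalizeError`, and `globalize` are hypothetical names invented for this illustration, not npcomp or PyTorch APIs. The real pass operates on MLIR ops and also rewrites methods and monomorphizes functions, which this sketch ignores.

```python
"""Toy model of flattening an object graph into named globals (illustrative only)."""


class Module:
    """Hypothetical stand-in for a torch.nn_module: a bag of named slots."""

    def __init__(self, **slots):
        self.slots = dict(slots)


class GlobalizeError(Exception):
    """Raised when the object graph violates one of the restrictions."""


# Immutable scalars are exempt from the aliasing check in this toy model.
_PRIMITIVES = (bool, int, float, str, type(None))


def globalize(root):
    """Flatten `root` into {dotted_path: value}, one entry per non-module slot."""
    flat = {}
    module_paths = {id(root): "<root>"}  # module identity -> first path seen
    value_paths = {}                     # non-primitive value identity -> slot path

    def visit(module, prefix):
        for name, value in module.slots.items():
            path = f"{prefix}.{name}" if prefix else name
            if isinstance(value, Module):
                # A second path to the same module (including a cycle back to
                # an ancestor) violates the unique-path restriction.
                if id(value) in module_paths:
                    raise GlobalizeError(
                        f"module reachable via {module_paths[id(value)]!r} "
                        f"and {path!r}")
                module_paths[id(value)] = path
                visit(value, path)
            else:
                # Two slots may not alias the same initial value.
                if not isinstance(value, _PRIMITIVES):
                    if id(value) in value_paths:
                        raise GlobalizeError(
                            f"slots {value_paths[id(value)]!r} and {path!r} "
                            f"alias the same value")
                    value_paths[id(value)] = path
                flat[path] = value

    visit(root, "")
    return flat


# Usage: mirrors the `root.decoder.ids_to_strings_table` example in the
# description; `scale` is a made-up slot name.
table = ["<pad>", "hello"]
root = Module(
    encoder=Module(scale=2.0),
    decoder=Module(ids_to_strings_table=table),
)
print(sorted(globalize(root)))
# prints ['decoder.ids_to_strings_table', 'encoder.scale']
```

Keying the bookkeeping on object identity rather than on equality is deliberate in this sketch: the restrictions concern which instance a slot refers to, so two distinct but equal values are accepted while one shared instance is rejected.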