torch-mlir

Commit Graph

Author	SHA1	Message	Date
Yi Zhang	d6b9709fa5	Changes to refine types - Add `!torch.optional` knowledge tracking - Changes to improve type propagation for branches and terminators. See examples in `refine-types-branch.mlir` - Refator to separate handling of different ops from `visitOperation` - Add refine types for a few new ops	2021-08-27 11:42:00 -04:00
Yi Zhang	bc5eae41ca	Add more folders to fold away branches Added folders to a few binary computing ops, `TupleUnpack`, `__contains__.str` and `__getitem__.Dict_str`.	2021-08-26 17:37:49 -04:00
Sean Silva	cab8d922ec	Add TorchToIREE and factor out TorchConversion dialect. This converts a basic list op (torch.prim.ListConstruct) to the IREE dialect. ``` def forward(self, x: float): return [x, x] ``` turns into: ``` builtin.func @forward(%arg0: !torch.float) -> !torch.list<!torch.float> { %0 = torch.prim.ListConstruct %arg0, %arg0 : (!torch.float, !torch.float) -> !torch.list<!torch.float> return %0 : !torch.list<!torch.float> } ``` which turns into: ``` builtin.func @forward(%arg0: f64) -> !iree.list<f64> { %c1 = constant 1 : index %c0 = constant 0 : index %c2 = constant 2 : index %0 = iree.list.create %c2 : !iree.list<f64> iree.list.set %0[%c0], %arg0 : !iree.list<f64>, f64 iree.list.set %0[%c1], %arg0 : !iree.list<f64>, f64 return %0 : !iree.list<f64> } ``` As part of doing this, I realized that it was time to formalize the IR form that we reach right before running TorchTo{Linalg,Std,...}. We now call it the "Torch backend contract". We then lower the "Torch backend contract" to the "npcomp backend contract", which involves the new TorchConversion (`torch_c`) dialect, which holds ops that need to operate on both the npcomp backend types (e.g. builtin tensors, i1, IREE list, etc.) and the `!torch` types. This made more sense, as I realized that if I didn't factor out `torch_c` then the Torch dialect would have a dependency on IREE dialect (we previously didn't notice this was an issue because we only depended on `builtin` types), which seemed wrong to me. Recommended review order: - TorchToIREE.cpp / `TorchToIREE/basic.mlir` - Look at the new structure of createTorchScriptToNpcompBackendPipeline. It now lives in TorchConversion/Transforms/Passes.cpp and cleanly calls into `Torch::createTorchScriptToTorchBackendPipeline` for the frontend lowering to the Torch backend contract. - Mechanical change extracting `torch_c.{to,from}_{i1,i64,f64,builtin_tensor,iree_list}` into a new TorchConversion dialect, and a few passes specific to the lowering from the Torch backend contract to the npcomp backend contract. - Minor fixes to TorchToLinalg.cpp to use unconverted operands (now that we convert lists as part of operand materialization, we need to use the original operands). Also added test for AtenMaxPool2dOp and fixed m_TorchConstantIntList. - TmpDeleteDeadIREELists pass. Temporary pass for deleting dead IREE lists that are created as part of operand materialization for conv/max pool/avg pool ops in TorchToLinalg.	2021-08-16 15:01:58 -07:00
Yi Zhang	85ff8b692b	Fix compilation errors from MT model With the following changes the compilation can continue until RefineTypes pass: - Add operators without ODS into `torch_ods_gen.py` - Add some new optional and list types in `TorchTypes.td` - Add some folders for aten int type comparator ops - Modify GlobalizeObjectGraph.cpp. For global slots that's not used, dont check if an aliased value is stored in more than one of global slots. This can work around a failure where the same tensor is stored in multiple "version" slots which are not used.	2021-08-16 16:37:23 -04:00
Yi Zhang	0342b73bf1	Add torch.aten.flatten.using_ints and aten.MaxPool2d linalg lowering - torch.aten.flatten.using_ints to linalg lowering - torch.aten.max_pool2d to linalg lowering - Support torch.aten.conv2d for more flexible dilation and strides values	2021-08-04 12:00:43 -04:00
Sean Silva	f168cacd6d	Remove TCF and TCP. These were legacy concepts that are now superceded by direct Torch to linalg-on-tensors lowering. These were based on some very early thinking related to the layering of frontends vs codegen, which is now obsolete because: - We expected a lot more centralization at the frontend (TCF) level. It turns out that frontend needs really vary a lot, and there is no grand unifying TCF dialect plausible. The additional layer isn't worth it. - Linalg-on-tensors obsoletes the primary need for TCP. There are still a few things not representable with linalg-on-tensors, but the support is growing and the whole "not included in linalg-on-tensors" direction needs to be rethought. Our TCP dialect didn't cover any of the actually important things in this space (such as sort, FFT, top-k, etc.). See historical [slides](https://drive.google.com/file/d/1iljcpTQ5NPaMfGpoPDFml1XkYxjK_6A4/view) / [recording](https://drive.google.com/file/d/1jSPa8TwPKUt0WuLquGc8OgSUVYJHMvWZ/view) for more details on the origin story here. Their presence was confusing users too [bug](https://github.com/llvm/mlir-npcomp/issues/248). Also, - Trim down npcomp-run-mlir testing. It was testing TCF to TCP lowering for the most part. The essential stuff is retained and rephrased with linalg-on-tensors. (we should probably rename it "refback-run" or something, as it is just a way to invoke RefBackend) - test/Python/Backend/RefJIT/simple_invoke_numpy.py is XFAIL'ed. Our "anti-framework" direction seems to be the likely future path.	2021-08-02 12:08:39 -07:00
Stella Laurenzo	cd44a35177	Bump llvm-project to 5b2e7f50a6798fd9b9c79d9d62fdebcd9e78525b. (#260 )	2021-07-29 12:26:54 -07:00
Sean Silva	83b5b5456d	Bump llvm-project to da289a174fc6617c7be37be2947480510fd4f02a - Build adjustments for `.cpp.inc` dialect files. - Renaming of `memref.dim` to `tensor.dim` for tensor case. Minor changes: - Renaming of `mlir::linalg::ReassociationIndices` to `mlir::ReassociationIndices`. - Adjust command line option parsing in npcomp-run-mlir.	2021-07-07 13:57:29 -07:00
Sean Silva	79928cd2dd	Generalize support for elementwise ops. We plumb through e2e a fair number of interesting cases: - unary, binary, ternary elementwise ops - ops like `torch.aten.add.Tensor` that also take a scalar parameter - static size-1 broadcasting We allow the static size-1 broadcasting case, but emit a runtime error in the case of dynamic size-1 broadcasting. This seems like a sweet spot subset of things that can be lowered directly to linalg, while not being overly constraining to users. This is consistent with what IREE is doing for CHLO->Linalg lowering as well ([code](`50bf7a87e4/iree/compiler/InputConversion/MHLO/BroadcastingToLinalgPatterns.cpp (L1)`)). To test the static size-1 case, we added support for the `torch.aten.unsqueeze` op and lowering for it through `linalg.tensor_expand_shape`. This involved a generalization of `MaximizeValueSemantics` able to handle it (the solution there also works for `torch.aten.flatten.using_ints` which we need for ResNet anyway) Also, a few minor additional changes: - Add `VerifyInvariantsBeforeBackendLowering` pass, which catches a large class of errors before we get to backend lowering (now that we are doing dialect conversion, the errors are way nicer if we just emit them up front rather than in the guts of a random pattern). - Minor change to RefBackend to allow `linalg.tensor_expand_shape`. Recommended review order: - e2e tests in elementwise.py - `ConvertElementwiseOp` in TorchToLinalg.cpp + elementwise.mlir test - `ConvertAtenUnsqueezeOp` in TorchToLinalg.cpp + unsqueeze.mlir test - RefineTypes.cpp + tests - MaximizeValueSemantics changes + test - VerifyInvariantsBeforeBackendLowering pass + test	2021-06-28 13:28:38 -07:00
Sean Silva	145d4ae23c	Bump llvm-project to a37cf17834d39411ed1d669098b428f8374c5b45 Changes: - Change to operand ordering of `linalg.fill`.	2021-06-23 10:03:29 -07:00
Sean Silva	90c6c64fd6	Make torch.constant.float print a little nicer. This printing is chosen to be similar to how MLIR prints the values by default.	2021-06-23 08:07:45 -07:00
Sean Silva	60a947b4a7	Add CastOpInterface to torch.prim.unchecked_cast. This allows it to fold away in trivial cases.	2021-06-23 08:07:45 -07:00
Yi Zhang	5ad144c4fe	More folding for aten.gt.int, aten.ne.int and Aten__Getitem__TOp. - Fold more for aten.gt.int, aten.ne.int and Aten__Getitem__TOp - Some format cleaning up	2021-06-23 08:06:37 -07:00
Sean Silva	79aade33da	Make MaximizeValueSemantics a bit smarter. This adds a pattern to MaximizeValueSemantics which does a simple abstract interpretation within a block, which handles simple cases of `torch.overwrite_tensor`, enough to remove all the unnecessary uses of non-value tensors in ResNet right now. Before/after IR: [gist](https://gist.github.com/silvasean/a3e1ef625b19dfc63579f73cd3b543b6) Also, - Split `torch.copy.tensor` into `torch.copy.to_tensor` and `torch.copy.to_vtensor` which convert between value and non-value semantic tensors. This is a much cleaner factorization as they have very separate use cases and properties (e.g. different side effects) - Remove the various canonicalization patterns they had, which were confusing because they resulted in limited forms of maximizing value semantics throughout the pipeline. We should structure our compilation pipeline such that only MaximizeValueSemantics should be maximizing value semantics. - Adjust pass pipeline to only run MaximizeValueSemantics once. - Make OverwriteTensorOp `$value` always be a value tensor and `$overwritten` be a non-value tensor.	2021-06-22 16:48:57 -07:00
Sean Silva	78d2cc0818	Make `torch.copy.tensor` canonicalization a bit smarter. This removes most of the trivial cases that MaximizeValueSemantics needs to handle, making it easier to see the nontrivial cases.	2021-06-17 18:11:58 -07:00
Sean Silva	333e07a74e	Add `torch.vtensor.literal` op. This op is much better behaved than the `torch.tensor.literal` op (which is the new name of the `torch.tensor` op). In particular `torch.tensor.literal`: - always has a maximally refined type. - always has value semantics. - can be constant folded / CSE'd. ReduceOpVariants is changed to perform the transformation from `torch.tensor.literal` to `torch.vtensor.literal` (which in general involves static information casts and copies. This new op also allowed tightening up `torch.tensor.literal` to only accept NonValueTensorType (instead of any tensor type). This new ".literal" name is more descriptive. It was getting too confusing seeing an op called just `torch.tensor` (we originally called it that because that's the name of the similar function in the Torch Python API, but it just doesn't fit here).	2021-06-17 14:37:04 -07:00
Sean Silva	4a0eb44d17	Add a !torch.float type. This removes the dependence of the `torch` dialect on the low-level builtin types. Now the `torch` dialect is a standalone layer, suitable for targeting from higher-level Python abstractions without any premature lowering to primitive types.	2021-06-17 09:24:18 -07:00
Sean Silva	f49ebf1690	Add `!torch.int` type. This replaces the ad-hoc use of `i64` throughout the Torch layer, and helps to keep it crystal clear the distinction between `!torch.int` (which is modeling the Python `int` type) and the various types that serve as dtypes of tensors, which are a totally different type universe. Changes: - `!torch.int` type and C bindings. - Change `torch.constant.int` parser to not need the `: i64` at the end. - `m_TorchConstantInt` matcher to aid with matching constants. - BackendTypeConversion changes for `!torch.int` -> `i64` type conversion. - Refactor finalizing patterns in FinalizingBackendTypeConversionPass (they were getting very repetitive). - Mechanical rewriting of `!torch.int` to `i64` in all the tests, and `AnyTorchIntType` to `Torch_IntType` in the `.td` files.	2021-06-17 07:28:23 -07:00
Sean Silva	224afb186e	Add folders for torch.aten.gt.int / torch.aten.ne.int This fixes a "regression" on ResNet where we weren't folding away all the control flow. For now, our policy is to "optimize hard enough" to make that control flow go away, because we don't yet have a way to lower to the backend the stuff guarded by the control flow (RaiseException, string operations, etc.). It remains to be seen how much optimization we decide to do at this level in the fullness of time -- the torch op set is not particularly well-designed (at least not idiomatically for MLIR) for general optimization. Ideally, with really good backend support for various features, all the heavy optimization will happen at that layer on `std` ops and `scf` control flow. But I have a suspicion we might end up needing more optimization earlier in the pipeline.	2021-06-16 14:04:31 -07:00
Sean Silva	8860b5c55d	Add `torch.prim.If` This removes the use of `scf.if`, which required laundering back and forth between `i1` and `!torch.bool` in the frontend. We will eventually lower this op to `scf.if`, but this results in a cleaner IR and layering at the frontend.	2021-06-16 14:04:31 -07:00
Sean Silva	784156a998	Add `!torch.bool` type. This finishes removing the dependence on the basicpy dialect! Changes: - Add `!torch.bool` type and replace use of `!basicpy.BoolType` in Torch-related code. - Rename BuiltinTensorize to BackendTypeConversion since now it handles bool conversions (and, when we add !torch.int and !torch.float, it will handle those as well), and generalize the related utilities (I also moved them to Torch/Transforms since they aren't really part of Torch/IR). - Add `torch.to_i1` and `torch.from_i1` ops for materializations - [cleanup] Reorganize `torch.constant.*` ops in TorchOps.td - Remove dependency of `torch` dialect on `basicpy` dialect and also `std` dialect. For `std`, we use some call related ops, but the `torch` dialect itself never produces them (we have passes that do though). This is fairly mechanical. Recommended review order: - New stuff in Torch/IR - New BuiltinTypeConversion files. - Mechnical fixups elsewhere.	2021-06-16 13:22:00 -07:00
Sean Silva	3ccf6002af	Add `torch.constant.int` and `torch.constant.float`. - This removes reliance on basicpy.numeric_constant. - Also, add OpAsmOpInterface to the `torch.constant.none` and `torch.constant.str` ops.	2021-06-15 15:29:42 -07:00
Sean Silva	2e850ecb72	Add !torch.str type. - Remove dependence on `!basicpy.BytesType`. - Add `torch.constant.str "s"` analogous to `torch.constant.none`.	2021-06-15 10:10:59 -07:00
Sean Silva	92ee0fa98f	Add `!torch.tuple<T1, T2>` type. This further eliminates the need for the `basicpy` dependency. This required adding `torch.prim.TupleConstruct` to replace `basicpy.build_tuple`.	2021-06-15 08:15:22 -07:00
Sean Silva	db282fd1b4	Introduce native `!torch.none` type. - Add `torch.constant.none` op to construct it (naming is chosen to be analogous to Torch's representation of a prim::Constant with NoneType, rather than using the "singleton" terminology of Basicpy).	2021-06-14 13:30:58 -07:00
Sean Silva	0b6516c7cc	Bump llvm-project to cbd0054b9eb17ec48f0702e3828209646c8f5ebd Changes: - MLIR_BINDINGS_PYTHON_ENABLED -> MLIR_ENABLE_BINDINGS_PYTHON - canonicalizer constant insertion order - EDSC is gone now	2021-06-10 16:26:45 -07:00
Yi Zhang	e0ff5248fb	Add TorchList type and prim::ListConstruct #218	2021-06-10 14:31:35 -07:00
Sean Silva	370e3270ab	Introduce `!torch.tensor` / `!torch.vtensor` types. This removes our reliance on the numpy dialect and avoids our off-label use of the builtin tnesor type for modeling unknown dtypes. The `!torch.vtensor` (`ValueTensorType`) type is a value-semantic tensor. The `!torch.tensor` (`NonValueTensorType`) type is a non-value-semantic tensor. The new types look as follows syntactically: ``` // Least-static-information, non-value-semantic tensor. !torch.tensor // Explicit form of least-static-information variant. !torch.tensor<,unk> // Least-static-information, value-semantic tensor. !torch.vtensor // Explicit form of least-static-information variant. !torch.vtensor<,unk> // Fixed-set of allowable element types, with first-class support for // Torch's frontend signedness semantics. !torch.tensor<*,si32> // First-class support for unknown dtypes. !torch.tensor<[?,?,?],unk> // Standard MLIR representation of `?` for unknown dimensions. !torch.tensor<[?,2,?,4],unk> // Statically shaped / dtyped example. !torch.vtensor<[1,2,3,4],f32> ``` This required fairly significant changes throughout the compiler, but overall it is a big cleanup. We now have a much clearer layering of "the Torch frontend lowering" vs "lowering to std + linalg + etc.". At the C++ level, there is `ValueTensorType`, `NonValueTensorType`. We also have a helper `BaseTensorType` (kind of like ShapedType) which interoperates with those two. Included changes: - New `torch.tensor(dense<0.0> : tensor<5xf32>) : !torch.tensor` op for creating torch tensor literals in the frontend. - Consistently use signedness for the types (except i1 which I didn't touch -- we need to sort out the situation with !basicpy.BoolType there anyway so will be attending to that soon) - Frontend can annotate whether an argument to the function has value semantics. We currently require this, as our backend contract does not currently allow us to even model the non-value-semantic case. Before, the value-semantic assumption was randomly injected in the middle of the pass pipeline. - Move ArrayToTensor (now called MaximizeValueSemantics) and RefinePublicReturn passes to torch dialect. - The TorchToStd and TorchToLinalg passes are now type conversions from `!torch.vtensor` to `tensor` and use the dialect conversion infra. The overall conversion pipeline is set up following the best practices of the "Type Conversions the Not-So-Hard Way" talk. This required introducing `torch-func-builtin-tensorize` and `torch-finalizing-builtin-tensorize` passes analogous to the upstream bufferization passes with the corresponding names (mostly just copypasta from there). - Misc Torch-level canonicalizations -- we now cleanly layer the lowering to std later in the pipeline, so we are gradually lessening our reliance on random std constant folding before we get to that point. Recommended review order: - New types in TorchTypes.td/TorchTypes.h/TorchDialect.cpp - New ops in TorchOps.td / TorchOps.cpp - Less important / more mechanical stuff - Frontend changes. - Pass changes/additions in `Torch/Transforms` and `Conversion/`	2021-06-10 10:56:48 -07:00
Sean Silva	d66e8fe1f8	Get simple quantized model importing. This is enough to import the program and get it through the compilation pipeline. It of course fails at the VerifyBackendContract pass since there is a lot missing, but the final IR for a simple quantized MLP is looking pretty decent already: [IR](https://gist.github.com/silvasean/f76bccd76e9b193d396cfb2f9a11f54d) Main changes: - Add support for importing torch quantized tensors, including `torch.per_tensor_affine.create` op and `!torch.qint8` element type. - Add support for importing `LinearPackedParamsBase` (basically a weight + optional bias, but requires `torch.linear_params.create` op + `!torch.LinearParams` type to model it). This was less painful than I expected, as it has the necessary methods to opaquely unpack itself. I factored things so it should be easy to extend to other custom classes like `ConvPackedParamsBase`. - Add minimal boilerplate for importing `quantized::*` ops, with `quantized::linear` being a motivating example. - Add e2e test with simple quantized MLP (courtesy of @phoenix-meadowlark). This is somewhat of an abuse of `!numpy.ndarray` / `tensor`, as really the proper semantics of `!torch.qint8` dtype on a Torch tensor is "check the quantizer object of the tensor for side data (scale/offset, possibly per-channel) that defines the full semantics of the tensor". We don't have any such notion of "side data" for `!numpy.ndarray` / `tensor`, let alone anything that would have the associated behavior of keying off the dtype to determine if the side data is present. This will be fixed by a proper `!torch.tensor` type.	2021-05-20 11:28:20 -07:00
Sean Silva	2efda323ff	Significantly restructure torch/aten import design. This is a really major and invasive restructuring of the way we get torch operators (`torch::jit::Operator` / `c10::OperatorHandle`) into MLIR. Please forgive the challenging review, but due to the sheer invasiveness, it wasn't really practical do do it in sane smaller pieces. This fully replaces everything that was already working on the TorchScript path (actually, more -- we added tanh support to TorchToLinalg in order to delete the older code paths). Additionally, I've kept the lights on for the acap path too, including what little e2e stuff was working before (for expediency I made a few tiny compromises along the way that will be easy to undo when we give that path proper attention). Overview of the new design: - The torch operator `somens::someunqualname.someoverloadname` is imported as `torch.somens.someunqualname.someoverloadname` (skip the last dotted part if the overload name is empty), OR, if we don't have such an op registered, it is imported as `torch.operator "somens.someunqualname.someoverloadname" (...) : ...`. - The addition of the "overload name" is a critical element here, as the `(ns,unqual,overload)` triple is unique, which solves a lot of problems we were having. - This involves having separate MLIR ops for the `trailing_` and `.out` variants and all the different overloads. This seemed necessary, because the set of overloads is so wild and varied and unstructured. The previous design was leaning into some underlying structure that just isn't there -- the default situation is the "random overload that we want to manage on the MLIR side", rather than that being an exception. E.g. `aten::ne` (not-equal) has 21 overloads, only 4 of which are c10 dispatcher ops see [gist](https://gist.github.com/silvasean/190ba918c550c956260e21254e1b8aa1), and the "out" variant is really called `.Tensor_out` instead of `.out` as it frequently is for other ops. - Rationale for all being in `torch` namespace: the set of operators are so varied and unstructured that "dialect per namespace" doesn't result in anything resembling the typical MLIR dialect boundary expectations. We could maybe draw the boundary at dispatcher ops vs non-dispatcher ops, but that doesn't seem to really result in very much useful structure at this point in time. - Note: within the torch operator registry, we effectively have a mini-basicpy subdialect (already type-resolved), which is reasonably structured. - The existing Torch op interfaces are also removed -- now that we track the overload name, we can losslessly find the original operator. - Instead of `ATenRecognizeKernelsPass`, we now have a `ReduceOpVariantsPass` that keys off certain traits (and perhaps eventually interfaces) to reduce variants of ops to a smaller set, ideally operating on immutable tensors and using surrounding ops to model the mutability/aliasing aspects. - Note: `torch.ns.unqual.overload` ops allow both immutable and mutable tensors (unlike the previous hard distinction in the common case). This is a premonition for a future change that will introduce a bona fide `!torch.tensor` type that will clean up a bunch of stuff. - `TorchToLinalg` / `TorchToStd` supercede the existing "ATen->TCF->TCP->Linalg" path. - The new `torch_ods_gen.py` supercedes `torch_signature_ods_gen.py`. It should look somewhat familiar, but the benefit of hindsight has allowed a lot of simplifications. The overall trend seems to be to make the `torch` dialect a nice layer independent of anything else. It feels like as a natural result of various future changes we will be removing the reliance on basicpy+numpy dialects and have a nice self-contained type system too that properly models the TorchScript type system (including proper subtyping, mutable/immutable tensors, optional dtype, etc.). Recommended review order: - Start at some of the new import IR, e.g. in `frontends/pytorch/test/node_import/prim.py`, `frontends/pytorch/test/acap_export/test_export_add3.py`, and other tests. - `frontends/pytorch/python/torch_mlir_utils/codegen/torch_ods_gen.py` and associated generated files: - `include/npcomp/Dialect/Torch/IR/GeneratedAtenOps.td` - `include/npcomp/Dialect/Torch/IR/GeneratedPrimOps.td` - Inspect `ReduceOpVariants.cpp` / `reduce-op-variants.mlir` and the new traits in `include/npcomp/Dialect/Torch/IR/TorchTraits.h` - Various code changes in the import path in `frontends/pytorch/csrc/builder`. Probably most interesting is the new code in `torch_to_mlir_utils.cpp` that has the logic to create the `torch.operator` ops or `torch.ns.unqual.overload` ops. This is the [new ResNet IR](https://gist.github.com/silvasean/5407aafb710d07612b7b5b92eabecebe), just to be able to look at a substantial sample of IR in the new style.	2021-05-19 13:37:39 -07:00
Sean Silva	133bdf4b31	[cleanup] Add materializer for basicpy.singleton This allows the canonicalizer to coalesce it like other constants.	2021-05-03 09:54:44 -07:00
Sean Silva	3d08c83580	Add flatten op recognition + shape refinement. This op has complex aliasing semantics, so it is kept mutable for now. With this, we reduce ResNet18 to a single BB with all aten operators having rank + dtype: https://gist.github.com/silvasean/2fcb1c6e4d4ae27461204a43ae9c5031	2021-05-03 09:54:44 -07:00
Sean Silva	122cae2ee3	Add aten::len.t, aten::size, and aten::gt.int primitive ops Also add some canonicalizations that finally reduce ResNet down to a single block.	2021-04-30 10:57:02 -07:00
Sean Silva	ec6d06aa86	Add some more ResNet ops. - aten::relu_, aten::max_pool2d, aten::adaptive_avg_pool2d, aten::batch_norm, aten::conv2d No aten-to-linalg conversion for the latter ones, as they are fairly substantial. At this point, I'm trying to get shape inference and stuff working for them and the IR cleaned up.	2021-04-30 10:57:02 -07:00
Sean Silva	9257457d8a	Add AllowsTypeRefinement trait and use it to improve RefineTypes This trait lets us model the semantics of various aten/torch/numpy ops that are insensitive to type refinements. This replaces hardcoded/inconsistent checks for this property. To show usage of this new trait, we fix up some old uses, and improve RefineTypes to be smarter about rewriting with this trait.	2021-04-30 10:57:02 -07:00
Sean Silva	1c832604d2	Remove old aten-to-std / ATenLowering pass. It was confusing now that we have `convert-aten-to-std`.	2021-04-30 10:57:02 -07:00
Sean Silva	55c3cc6624	Add recognition/folder/lowering for aten::__is__, aten::ne.int, and aten::dim Interestingly, TorchScript has its own op (`torch::jit::Operator`) registry separate from the dispatcher (it is a superset of the dispatcher). This is where the "prim" ops and some "aten" ops (that should probably be renamed to "prim") live. In particular, `aten::__is__` is in that latter category of "aten but really prim". This registry is also the source of truth for what the TorchScript interpreter calls into when it executes. The bulk of the "not part of the dispatcher" ops live in `09feb5f579/torch/csrc/jit/runtime/register_prim_ops.cpp (L82)` And the registry itself lives in: `09feb5f579/torch/csrc/jit/runtime/operator.cpp (L196)` This fold further reduces the IR of ResNet by folding away some more not-taken branches. These not-taken branches in ResNet require first-class handling of the list type which we don't yet have on any backend.	2021-04-30 10:57:02 -07:00
Sean Silva	7eb36b4ae7	Constant fold through basicpy.bool_cast. This is the start of a push to getting ResNet running. This involves throwing in the towel on an O0 pipelinie for now. See note in the code. We keep an options struct with `optimize` flag, but it default to true for now.	2021-04-30 10:57:02 -07:00
River Riddle	4678a7fedd	Refactor RefineTypes to use the upstream ForwardDataFlowAnalysis engine This removes the need for defining all of the custom propagation logic, and also adds support for propagating value knowledge across branches, through regions, and across calls.	2021-04-27 13:17:56 -07:00
Sean Silva	179105ca3e	Add basic MLP's to the e2e curriculum. These tests pass on the reference backend. - Add aten.linear op + shape xfer function + ATen->Linalg lowering. - Note: this needs to be more automated, and needs to cover more cases. - Current not implemented caveats: - size-1 broadcasting for bias vector (either static-size-1 or ? case) - higher-rank aten.linear ops (not produced by torch.nn.Linear though) - type promotion (still don't even know the exact rules here) - Add folder for torch.derefine op. Now the inliner can clean it up as it inlines. (call boundaries are a main place we need to insert torch.derefine) This is brittle -- the other important case is control flow which will need to be handled via an extension to RefineTypes.cpp (as will more robust call handling). River has an in-flight patch to update it to the new dataflow framework so I didn't want to do anything intrusive here. - Also adjust torch.derefine syntax to use the keyword `to` instead of `->`, as most type-only, cast-like ops do.	2021-04-27 12:18:54 -07:00
Sean Silva	9ba77c6e13	Add InlineGlobalSlots pass. This inlines global slots if possible. This allows them to participate in folding, canonicalization, shape inference, etc. Example use cases: - inlining weights and biases that are readonly during inference - inlining the "training" bool to allow stuff to fold away For training use cases (especially internal training loop), we will need something smarter to get good performance. That would look like an "SSA formation" which promotes the global slots to tensors in the program, flushing them back to the slots at the minimal number of necessary places. We might want to let backends do that transformation though. This also interacts with shape inference (type bounds on the slots to even lower them to backends in the first place).	2021-04-27 12:18:54 -07:00
Sean Silva	b1c49ae648	Move GlobalizeObjectGraph tests to their own directory	2021-04-27 12:18:54 -07:00
Sean Silva	f5dfa02523	Add `aten.mm` to linalg lowering. This is our first op with error semantics, and stresses the system. There are a few design notes of special interest: - RefineTypes.cpp's note about shape inference in the presence of code that dynamically produces and error, and it is provable statically. - ATenToLinalg.cpp's notes about future automation of the ATen->linalg path. - The notes in Passes.td about using low-tech `std.assert` ops instead of `shape.assuming`. Note: Doesn't work on IREE yet due to the `std.assert` op (needs to be lowered to `vm.fail` on the IREE side).	2021-04-16 12:03:31 -07:00
Sean Silva	927546b3c5	Add RefinePublicReturn pass. This pass allows shape information to be propagated to return types, which is nontrivial and cannot be cleanly put anywhere else as it changes the public ABI, which is a concern that we want to keep concentrated in one place.	2021-04-07 11:06:34 -07:00
Sean Silva	1e357ae680	Add simple type refinement pass. Currently implemented as a simple intraprocedural dataflow analysis over a standard ShapedType lattice (hasRank, sizes, and elementType). It currently hardcodes a few key pieces of information: - shape transfer functions - whether it is legal to update the operand type of an op This needs to be made pluggable obviously and the core propagation logic moved somewhere agnostic.	2021-04-07 11:06:34 -07:00
Sean Silva	6431b0f11f	Add primitive ArrayToTensor (numpy-array-to-tensor) pass. The current implementation is just sufficient to do a unary aten.tanh from the e2e spike, and just applies some local rewrite patterns. I've sketched out the more full explanation of where this pass eventually need to go in the pass docs. Adding this required adding `numpy.tensor_static_info_cast`, which is the tensor analog of `numpy.static_info_cast`. This op encapsulates the same numpy-specific "no runtime code" casting semantics, in particular the interpretation of `!numpy.any_dtype`. The `numpy.tensor_static_info_cast` I see in practice now are "information erasing" and will be removed by a later pass that exploits the fact that aten ops are agnostic to the static info in the operand types (so substituting a type with more static info is fine). Side note: we need to do dtype and rank inference before aten->tcf (which will eventually mostly be aten->linalg+guards), because each aten op is idiosyncratically overloaded based on dtype and rank. Without copying that idiosyncratic overloading into lower layers (layering violation), we cannot really lower it to anything until we do that.	2021-04-05 17:56:35 -07:00
Sean Silva	30356c41c8	Add torch-adjust-calling-conventions pass. This pass incorporates torch.type_bound info and also removes NoneType returns (eventually it will rewrite tuple types too, but can't yet because !basicpy.TupleType doesn't track element types). Recommend looking at adjust-calling-conventions.mlir first to see what it is doing, and holding your nose for the implementation of the pass. I decided to implement this with the conversion framework, because it gives us some goodies for type conversion -- mainly avoiding large amounts of tricky RAUW dances. Unfortunately, the conversion framework isn't a perfect fit for a couple reasons: - the incorporation of torch.type_bound is a context-sensitive rewrite (requires looking at the arg attr, not just the type). - NoneType conversion is 1->0, which requires some special handling - (not implemented yet) 1->N tuple type conversions require special handling. It's a little bit scary, but on balance doing it the other way would have its own downsides.	2021-04-05 17:56:35 -07:00
Sean Silva	e749074bae	Basic infra for annotate shapes and dtypes on arguments. These allow users to annotate a known "type bound" on the argument, which can seed shape/dtype inference. We don't rewrite the function types as part of the import process (it will happen in a yet-to-be-written pass) because: 1. We would need to interprocedurally rewrite all calls to keep the IR consistent. Currently, we have a place after GlobalizeObjectGraph but before we convert to tensors where this is convenient to do. Ideally, we would do this on the object graph representation. 1. We don't necessarily know that adjusting the function type is a legal calling convention change. The pass will have blessed knowledge (by the pass pipeline author) that adjusting the argument type based on the type bound is safe (which it frequently is). 2. Note that in principle, a type bound could be a fairly general thing (such as maximum sizes of dimensions, unions of multiple concrete types, etc.). The pass will in principle have logic to interpret the type bounds and to determine a suitable "best" (and legal) argument type.	2021-04-01 18:40:03 -07:00
Sean Silva	99178a167d	Bump llvm-project to 0524a09cc7e1a0797982feacf505825231efbee7 - renames of OwningRewritePatternList -> RewritePatternSet - also `insert` to `add` - RewritePatternSet holds a context now - memref dialect split from std	2021-03-23 14:29:05 -07:00
Bryce Arden	4591884d06	[refbackrt] Scalar arg support * Adds f32 scalar argument support across the ABI boundary. * Adds support for passing input type / shape information across the ABI boundary * Adds support for parsing / creating input FloatAttr's in `npcomp-run-mlir`	2021-03-23 13:16:44 -07:00
Sean Silva	703428eff4	Add support for "trailing_" and "out" variants of various ops. We already had the `promoteTrailingOutTensor` flag, but weren't using it. A inplaceVariantKernelName flag needed to be added. This change is a little dissatisfying, as the conversions done by the RecognizeKernelsPass are currently non-orthogonal. In particular, `kDropResultAndAliasArg0` probably won't work as intended if mixed with these (we probably need to promote kDropResultAndAliasArg0 to not be an arg-level thing anyway, as we have done with promoteTrailingOutTensor). This involved adding a new op `numpy.overwrite_array`. ``` numpy.overwrite_array %arg2 overwrites %arg0 : tensor<2x3xf32>, !numpy.ndarray<[2,3]:f32> ``` This models the destructive update behavior. Note that in the above op, we cannot simply RAUW %arg0 with a suitably conveted %arg2 (for example, %arg0 might have uses that are not dominated by %arg2, or might have an alias relation with some other array in the program). In general, we need a pass analogous to "SSA-formation" which knows how to see through these to uncover an underlying tensor program. Also, add tanh_out_e2e.py/div_inplace_e2e.py and fix some bitrot in refjit.py which is my running example I'm trying to get working.	2021-03-19 10:34:50 -07:00
Aaron Arthurs	4fd9b4afb5	Import ATen conv2d conversion and test (#180 ) * Import ATen conv2d conversion and test This is a first attempt at expanding ATen-to-TCF conversion for the conv2d operator. Eventually, this will come in use when lowering a high-level conv-based model.	2021-03-12 17:21:16 -08:00
Sean Silva	58c7030104	Support multiple instances of a class in GlobalizeObjectGraph. This happens in practice with e.g. ResNet from torchvision (multiple instances of the same BatchNorm class). The key observation is that for this program, and the expected set of programs, we can convert the program to the same globalized form with a bit more static analysis and effort to suitably monomorphize the program. Though what we are doing here is fairly annoying to implement, it saves any nontrivial later pass from having to do similar analyses (or worse). E.g. shape inference would need to be object-graph aware, mutation/lifetime analyses would have to be aware, etc. Additionally, it would make us front-load what it means to have a !torch.nn.Module type on an ABI boundary, which we are just not ready to handle. I'm really, really hoping that in practice we can get away with this, otherwise it's going to be really rough designing a representation (and implementing everything to back it) that is convenient to transform and gracefully scales from full object graph (in the most dynamic case) down to a fixed set of global slots like we have here (in the most static case, which we presume a lot of practical programs fall into). This also involved introducing a `torch-prepare-for-globalize-object-graph` pass that does a minimal set of lowerings to simplify the IR into a more orthogonal and analyzable form, and a `torch-globalize-pipeline` helper. Recommended review order: - updated documentation in Passes.td - new tests in `globalize-object-graph-multiple-instances*.mlir` - implementation of GlobalizeObjectGraph.cpp - PrepareForGlobalizeObjectGraph.cpp + prepare-for-globalize-object-graph.mlir - misc stuff like torch-globalize-pipeline pipeline definition. With this, we can import, globalize, and inline resnet18 from torchvision: https://gist.github.com/silvasean/821586afc19b67d9fb72030b2e0adeb8	2021-03-11 19:21:07 -08:00
Sean Silva	c837dbb077	Properly import the entire torch::jit::CompilationUnit This primarily unlocks proper handling of free functions (that is, functions that are not methods of any torch.nn.Module). Recommended review order: - `ivalue_importer.cpp` + `ivalue_import/functions*.py` - `GlobalizeObjectGraph.cpp` + test case - misc other stuff The `torch::jit::CompilationUnit` is basically a backing store or "context" holding all the possible functions in the program. The previous code was not explicitly accessing this data structure, since it just imported the `torch::jit::Function`'s that it saw attached to methods. Subtly, any time a TorchScript module called into a free function, the free function gets incorporated into the torch::jit::CompilationUnit, but doesn't show up anywhere when dumping the module, except in the curious pattern: ``` %5 : Function = prim::Constant[name="adaptive_avg_pool2d"]() %6 : Tensor = prim::CallFunction(%5, %input.1, %4) ``` That is, calls are indirect calls, and are accessed via `prim::Constant` materializing a function object. Even stranger, the `name` attribute here doesn't really even tell the full story -- it doesn't correspond to anything. It turns out that the c10::FunctionType itself actually holds a pointer to the `torch::jit::Function` in the compilation unit directly (so there is actually no indirection in prim::CallMethod, because any two values of the same FunctionType call the same function!). E.g. when converting the IR to bytecode, the "name" is ignored [code link](`1d6bd15790/torch/csrc/jit/runtime/interpreter.cpp (L937)`). We do import `prim::CallFunction` as a `std.call_indirect` though because it's more braindead to do it that way (it gets canonicalized to a direct call easily).	2021-03-01 12:08:01 -08:00
Sean Silva	79a3f639bf	Give torch.global_slot an initializer region. This is a much simpler representation than the ad-hoc initializer function we had before. It is also less general, but given the rationale in Passes.td it seems like the right tradeoff right now. We can probably carry this representation for quite a while, and when we can't, it likely means that TorchScript has fixed their object identity bug and we probably need to just upgrade to a more general object graph modeling (more general than GlobalizeObjectGraph). In particular, we don't want to deal with defining and carrying around this initializer function concept until we need it. For example, if we want to constant-fold the global slots into uses, this is a much better representation, and it plays better with symbol-dce (the initializer function counts as a "use" of the symbol). (the alternative would have been to write a pass that converts the initializer function to this form when possible, but I realized that lots of information had been lost which made that fairly annoying -- it was all self-inflicted anyway, so best to just go to the source (GlobalizeObjectGraph) before the information is lost) Now symbol-dce works nicely (no more "training" bools) ``` pt_util ~/tmp/classifier.pt --import --exported-name forward \ \| npcomp-opt -torch-globalize-object-graph -inline -symbol-dce ``` IR: https://gist.github.com/silvasean/8abe63d70d24e29d6db9170ccc8d512b	2021-02-26 16:24:19 -08:00
Sean Silva	a375ccf9da	Add ability to annotate TorchScript classes. The first use case is to annotate certain program constructs as either exported or private. In this commit we plumb it down to GlobalizeObjectGraph which makes use of this information. Recommended review order: 1. class_annotator.h/.cpp + `test/module_import/annotations/*` - New abstractions to communicate with Python code and annotate. 2. IR changes in TorchOps.td - Adding "private" attribute to various things. 3. ivalue_import.cpp changes - Module + ClassAnnotator = annotated IR 4. GlobalizeObjectGraph.cpp + tests - use new "private" attributes to create "private" IR. - also, tweak some of the op deleting mechanics, which was triggering some memory errors / assertions With this, we can run the classifier through and inline it as follows: ``` frontends/pytorch/utils/pt_util.py --import --exported-name forward ~/tmp/classifier.pt \ \| npcomp-opt -torch-globalize-object-graph -inline ``` IR: https://gist.github.com/silvasean/32dcad9f6270557f412094a77cecdd69	2021-02-25 11:28:34 -08:00
Sean Silva	1b769f7841	Extend GlobalizeObjectGraph to handle torch.prim.GetAttr returning NnModuleType This happens in practice. With this, we can globalize slots for the non-trivial classifier layer obtained from https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb This also adds support for tuple return types, which were needed by that model.	2021-02-19 10:23:25 -08:00
Sean Silva	158c5c484d	Implement GlobalizeObjectGraph transformation. This required restructuring of how we model TorchScript on import. The main difference is that now we split out a `torch.class_type` that holds methods and declarations of the types of each slot. This is more consistent with TorchScript (our previous representation was "denormalized"). Recommended reading order: 1. check out the description of `torch.class_type` in `TorchOps.td` and look at `test/Dialect/Torch/ops.mlir` and `frontends/pytorch/test/module_import/` to familiarize with the new representation. - Just look at the new IR. The diff between the old names and new names is confusing. 2. check out `test/Dialect/Torch/globalize-object-graph*.mlir` and read along with the pass description in `include/npcomp/Dialect/Torch/Transforms/Passes.td` 3. Read the code in `GlobalizeObjectGraph.cpp` and miscellaneous changes in `ivalue_importer.cpp`, `TorchOps.cpp`, etc.	2021-02-18 18:18:47 -08:00
Aaron J Arthurs	c0e14da888	Fix TensorFromElementsOp reference	2021-01-28 12:01:35 -08:00
Aaron J Arthurs	fc650c9447	Import TCP pad	2021-01-28 12:01:35 -08:00
Sean Silva	689b40c7a6	Add initial TorchScript module importer It turns out that this was easiest to structure as a general IValue importer, since torch module are just one of the possible IValue's. We import the IValue object graph in a braindead fashion into basicpy ops and a new `torch.nn_module` op that is used to model the attributes/methods of a torch::jit::Module IValue. See `Torch/ops.mlir` for an example, and also check out the .py import tests in `frontends/pytorch/test/module_import`. As part of this change, a few housekeeping tasks: - extract some helpers from graph_importer.cpp - more helpers around the C API - misc touchups	2021-01-28 11:55:17 -08:00
Sean Silva	3f4161635c	Bump llvm-project to be7352c00d51f4358db3a23ed6a077f7cb48eafd - TensorFromElementsOp -> tensor::FromElementsOp - `cmpi "eq", ...` -> `cmpi eq, ...`. Same for `cmpf` - syntax change for private func ops - some changes to the python bindings	2021-01-21 11:16:55 -08:00
Aaron Arthurs	85898aaf10	Add TCF convolutional op with bias addition (#137 )	2020-12-15 12:53:12 -08:00
Sean Silva	46aa6d0a24	[RefBackend] Fix leaks related to ABI boundaries. Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks except for those due to buffers created internally to the codegenned code itself (up next I'll add the buffer deallocation pass to fix those). The main change is that instead of attempting to pass `refbackrt::Tensor` to the codegenned function directly, we make all the ABI types be UnrankedMemRef which gets passed awkwardly (but workably) as a `{size_t rank, void ptrToDescriptor}` on the ABI. The reason why refbackrt::Tensor wasn't workable is that is that MLIR doesn't really have a way to deal with the lifetime of unranked memref descriptors that happen inside the function, which is inevitably what would happen in the old code that would emit runtime calls to `refbackrt.to_memref/refbackrt.from_memref` to convert back and forth to `refbackrt::Tensor` inside the codegenned code. So, instead of the `refbackrt.to_memref/refbackrt.from_memref` with no real sound basis for valid lifetime management, we now have a lovely piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely seems to be sound. We rely on the codegenned code having these properties, which it seems to have: - it won't free memref descriptors or their backing buffer for arguments of UnrankedMemRef type. - it will allocate a separate memref descriptor for each result UnrankedMemRef (which is ensured by having a separate memref_cast for each) - we can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers) to avoid double-freeing in the case of aliasing of the backing buffer (including backing buffers for arguments feeding into results) - to catch the case of statically allocated data (which we need to avoid passing to `free`) , check if the `allocatedPtr` is (no joke) equal to `0xDEADBEEF`, because there is otherwise no way to distinguish statically allocated from malloc'ed data... (std.global_memref lowering to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`, presumably mainly as a debugging thing) Even with all this, we still* need to (internally to refbackrt::invoke) make copies of all inputs/outputs! And the details of how the LLVM-level ABI gets laid out for e.g. function arguments/returns is still super tricky. This really highlights how deficient memref is as the general runtime type for our use case. It's stewing in my mind how best to improve the situation. My general gut feeling is that IREE's abstractions for this are "right", but I need to think more how to distill those aspects of IREE's design in a "reference" way for RefBackend. Some implementation notes: - In terms of how this is implemented, this did catch a bug in our ABI wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to work before through some combination of npcomprt::Tensor being passed as a single pointer + probably me infinite-monkey-ing it until it worked) - This actually removes 2 out of the 3 compiler runtime functions (the only one left is "abort_if". (most of the memref descriptor code moved from CopmilerRuntime.cpp to Runtime.cpp) - this also means deleting `refbackrt.from_memref` and `refbackrt.to_memref`	2020-11-25 13:09:58 -08:00
Stella Laurenzo	3937dd14cb	Add basicpy.numeric_constant op. * Going through TODOs on the PyTorch side, this is a big cause of them (not being able to have constants for signed/unsigned). * Added complex while in here since we're at the phase where it is better to just have things complete than partially done.	2020-11-24 16:44:40 -08:00
Sean Silva	5227d52c26	[RefBackend] Use std.global_memref instead of homegrown thing This vastly simplifies our code, allowing deleting multiple ops, simplifying multiple passes, and removing a whole pass. Now `refback` dialect is down to one op (refback.alloc_memref, which simplifies allocations to just take a shape instead of individual extents).	2020-11-13 18:43:50 -08:00
Sean Silva	1c7c362e29	[TCP] Replace tcp.matmul with linalg.matmul. This involved adding a `tcp.splatted` op to splat a dynamically sized init tensor. See rationale in TCPOps.td docs. One interesting observation is that when lowering tcf.matmul to linalg.matmul, we need to both 1) create the error checks and 2) calculate a shape transfer function to create the init tensors. Previously, 2) was deferred to bufferizing tcp.matmul later. I'm not sure if this is a conflation of concerns or not. For now, it's not a big burden.	2020-11-10 18:58:28 -08:00
Sean Silva	0427aacb0b	[TCP] Replace elementwise ops with std elementwise ops.	2020-11-10 18:58:28 -08:00
Stella Laurenzo	6c702b149f	Add a number of kernels and new patterns. * convolution, convolution_backward, _log_softmax, _log_softmax_backward_data, nll_loss_forward, nll_loss_backward, nll_loss2d_forward, nll_loss2d_backward, copy_ * Extends the recognition logic and metadata for handling inplace transformations, optional tensors, ints, lists and dropped args. * The kernel_calls generated by test_conv_nllloss_grads.py now convert to ATen. * The result almost comes out as a pure tensor program with the exception of the copy_ op, which I will do some followup work to deal with. * More progress on #97	2020-11-04 14:36:59 -08:00
Sean Silva	0761df9f58	Bump llvm-project to 72ddd559b8aafef402091f8e192e025022e4ebef - Fixup to OpBuilderDAG - Update for affine map naming	2020-10-30 18:12:41 -07:00
Aaron J Arthurs	29c715b6b1	Add TCP mul test	2020-10-30 15:11:52 -07:00
Stella Laurenzo	c08935a418	Rewrite ATen ODS code generator to be based on new op registry and new signature recognition system. * Deletes prior code generator from previous attempt (moved some of it into this one). * Renames old generated tablegen source to "Legacy". * Generates ODS and import rules for most binary and unary arithmetic ops. * Removes old generated ops and integration tests that were testing details of the prior setup.	2020-10-28 10:37:37 -07:00
Stella Laurenzo	510f226df2	Expose signature metadata to ops and implement ATenRecognizeKernelsPass pass. * Two op interfaces, one for querying instance metadata and one for getting static data needed to construct an op from a generic form. * For torch.generic_kernel ops, metadata is splatted in during capture from Torch (it comes from the op registry, which will work for either device capture or graph import). * Moved the 'add' out of the generated set so I can experiment on it. It implements the TorchBuildableKernelOpInterface interface which provides its metadata. * The ATenRecognizeKernelsPass pass generically lowers from a torch.generic_kernel to recognized ops that implement the TorchBuildableKernelOpInterface, handling the various types of transformations that we allow at this stage.	2020-10-26 20:31:45 -07:00
Stella Laurenzo	029815152e	Add remaining pieces to capture full example models. * Adds Basicpy List, Tuple, Dict types and plumbs through C API. * Started debugging the issues around aten::conv2d capture, but a PyTorch bug is suspected. * Was able to manually verify that the basic conv2d forward test captures correctly with a workaround. * Need to resolve some printing issues upstream and move these tests to an integration test target (they take ~seconds to run).	2020-10-19 22:16:59 -07:00
Sean Silva	06a8ba6900	[RefBackend] Use more idiomatic bufferize pattern for TCP. The time has come for BypassShapes/LowerShapedResultsToMemref to go away :( For the reference backend, being consistent with upstream conventions is the name of the game now. This is a step down in a number of ways, e.g. test clarity and separation of concerns. But it is fewer files and fewer tests, and does address the "TODO: This is really fragile". It also eliminates two more ops from the refback dialect (sadly, they are the shaped_results/yield that we were getting kind of fond of, but alas).	2020-10-15 20:15:53 -07:00
Sean Silva	b6bdc8cc4f	[RefBackend] Use upstream BufferizeTypeConverter Now that it has grown source/target materialization capabilities (spelled with ops tensor_load/tensor_to_memref), we can use it. We can also now delete refback.memref_to_tensor/refback.tensor_to_memref. This is also a first step to reducing the downstream functionality needed in the refback dialect.	2020-10-15 15:58:51 -07:00
Sean Silva	7edb5f3641	[RefBackend] Rename RefBackend dialect to Refback I now realize that VerboseCamelCase is not the best choice for dialect directory/file names and C++ identifiers (take e.g. "Linalg", "Basicpy", etc. as prior art here; not LinearAlgebra or BasicPython). If I had to name the convention it seems to be "Shortword" (or of course just acronym dialects like LLVM, SCF, etc.). This rename also has the side benefit of differentiating RefBackend directories, which now refer to the actual backend itself, from Refback/Refbackrt, which are the dialects which happen to be used by that backend.	2020-10-08 09:07:00 -07:00
Sean Silva	bf99a82832	[RefBackend] Rename Npcomprt dialect to Refbackrt.	2020-10-08 09:07:00 -07:00
Sean Silva	5017430dc7	[RefBackend] Split out RefBackend (refback) dialect from TCP. This is the first in a patch series that is refactoring the constellation of things variously called or associated with "E2E", "RefE2E", "npcomprt", and "TCP" into a more cleanly layered result. Concretely, this first patch fixes the fact that TCP was basically acting like a dumping ground needed by the reference backend. This splits it out, which is fairly mechanical, but touches a lot of lines of code (basically replacing `tcp` with `refback` and `TCP` with `RefBackend). Now, the RefBackend dialect is that dumping ground, which is slighly better, as it starts allowing TCP to become a nice clean middle layer that is not related per se to the reference backend. The previous name RefE2E or "reference e2e flow" was super confusing. Now that we are seeing more clearly where the "backend" distinction lies, the [RefBackend] commit tag is born :)	2020-10-07 10:29:48 -07:00
Stella Laurenzo	3d74337be0	Add a torch.kernel_call op and associated predicates.	2020-09-29 15:10:38 -07:00
Stella Laurenzo	2c9ca79c89	Add boilerplate for Torch dialect.	2020-09-28 15:26:17 -07:00
Sean Silva	f9b37c55b7	[RefE2E] Add support for unary ops exp and tanh This is fairly mechanical.	2020-09-24 18:41:30 -07:00
Sean Silva	c69e9fabc5	[RefE2E] Add support for "max". This cleans up the lowering pipeline to easily allow extending to multiple binary ops. It looks fairly repetitive at multiple levels, but I don't want to prematurely generalize. I think that in principle we could derive a large swatch of TCF + TCP from a single linalg-style specification. Another direction is to use an OpInterface (something like "buildLinalgGenericBody"). I'm keeping my eye on it. In a subsequent commit, I'll mechanically add a set of binary ops modeled off of the std arithmetic ops.	2020-09-22 18:38:32 -07:00
Sean Silva	276f5b80ea	[RefE2E] Add assemblyFormat for TCF and TCP ops and tidy up.	2020-09-18 15:03:53 -07:00
Sean Silva	d8675f8ad2	[RefE2E] Add support for matmul. I'm pretty happy with how this turned out. It looks pretty much like it should -- one change at each layer. This particular op bottoms out on linalg which takes care of the rest. - Add tcf.matmul - Add tcp.matmul - Add TCF->TCP lowering - Add tcp.matmul shape transfer function (BypassShapes.cpp) - Add tcp.matmul -> linalg.matmul lowering (LowerShapedResultsToMemref.cpp) - Add support to LowerShapeConstraints for lowering the new shape.cstr_require This matmul op is pretty limited in its capabilities. There is no batching and no multidimensional contraction. Certainly more design work will be needed to find the right abstractions that aren't too general but also help to canonicalize many cases from frontends. This is mainly to show that adding a new op needn't be very "scary" once we have the e2e infra in place. Also, - this clears out some exploratory cruft from the TCF dialect now that this is starting to become real.	2020-09-18 11:31:01 -07:00
Sean Silva	75f57b461e	Totally rework RefE2E tensor to memref flow. (#42 ) This now gets the overall "RefE2E" compilation stack to a point that I'm fairly happy with. We simplify it by mostly embracing the "descriptor" view of the world. The overall flow is best understood by reading through the createE2ELoweringPipeline function in lib/E2E/E2E.cpp That function creates a pass pipeline that lowers from "TCF" (which is ~numpy level of abstraction) down to LLVM IR. A brief high-level summary of what happens there: 1. TCF to TCP conversion. This involves reifying error handling in the form of shape constraints. See test/Conversion/TCFToTCP/basic.mlir 2. Lowering shape constraints. This converts shape constraints into eager error-handling code. See test/E2E/lower-shape-constraints.mlir This pass will soon go upstream. Because this lowers to std.assert, some later passes like LowerToNpcomprtABI and LowerToLLVM are updated to properly plumb this through e2e. See test/npcomp-run-mlir/invalid-broadcast.mlir for an execution test that properly aborts in case of an error. 3. Lowering tensors to memrefs. This is done via a series of passes rather than an single mega conversion. Unlike the previous code that mixed in the npcomprt ABI stuff here, it's now a very clean "pure memref" conversion. See test/E2E/lower-*-to-memref.mlir and lib/E2E/TensorToMemref/ Most of the changes are concentrated here. 4. As part of the above, we use the upstream ConvertShapeToStandard for lowering shapes. 5. We lower linalg to loops and lower loops to CFG using upstream passes. 6. Rewrite the "ABI" boundaries of the program to npcomprt data structures (LowerToNpcomprtABI). This mainly affects ABI boundaries and how global tensor constants are represented. One of the major improvements in this commit is that now it's a very clean rewrite that just replaces memrefs on ABI boundaries with !npcomprt.tensor (before there was a get_extent function that is not needed). See test/E2E/lower-to-npcomprt-abi.mlir 7. Lower to LLVM with upstream mlir patterns + some patterns for the npcomprt lowerings. One aspect here that is still a remnant of a non-descriptor-based tensor to memref flow is the BypassShapes + LowerShapedResultsToMemref. BypassShapes wraps the "tensor compute" ops in a tcp.shaped_results (basically a "tie_shape" kind of op), and then LowerShapedResultsToMemref uses those annotations to allocate output buffers while lowering the "tensor compute ops". Note that there are very few "tensor compute" ops currently supported (tcp.add + tcp.broadcast_to), so we just hardcode them in both passes. Realistically, I expect this to go away as we fully embrace the descriptor-based approach for simplicity, so don't look too deep into it.	2020-09-16 17:31:40 -07:00
stephenneuendorffer	bb668e6e26	Add ATen Dialect (#16 ) This patch adds a dialect intended to be used as a frontend dialect to facilitate lowering from "A Tensor Library" in torch/pytorch. This patch includes several passes that are useful in conjuction with the dialect: --aten-layer-name: Generates layer names for each operation, which are not present in the original pytorch. --aten-to-std: Lower the ATen dialect into standard dialect function calls. --return-elimination-pass: convert functions (primarily the toplevel function) to pass return values by reference. This simplifies pytorch integration. --aten-op-report: generate a textual report about the model --liveness-report Future patches will implement actual integration with the pytorch jit to intercept and generates MLIR in this dialect, then lower the resulting MLIR into function calls through aten-layer-name -> aten-to-std -> return-elimination -> std-to-llvm. The result would then jitted using the LLVM jit, linked against a runtime library which makes calls back into pytorch to implement all the layers. Co-authored-by: Jeff Fifield <jeff.fifield@xilinx.com> Co-authored-by: Jeff Fifield <jeff.fifield@xilinx.com>	2020-08-12 19:28:04 -07:00
Sean Silva	e228aa4b11	npcomprt: add support for constants - create tcp.global + tcp.get_global_memref - create npcomprt.global + npcomprt.get_global - LLVM lowering for new npcomprt ops - Runtime: - GlobalDescriptor struct emitted by LLVM lowering - implement __npcomp_compiler_rt_get_global Also, - cleanly isolate all runtime data structure definitions shared by the compiler and runtime into lib/runtime/CompilerDataStructures.h	2020-07-10 17:31:24 -07:00
Stella Laurenzo	efbcf0aa44	Add NumpyPublicFunctionsToTensor pass. * Rewrites public function signatures to operate on tensors (vs ndarray). * Most of our backends presume immutable tensors at public function boundaries.	2020-07-08 22:51:54 -07:00
Sean Silva	b4f0cea8fa	Rework e2e flow to use new "npcomprt" This ~totally reworks the existing "runtime" stuff to be more principled and usable, such as from Python. It's still not fully production-quality, mainly in the department of memory management (e.g. it currently leaks memory; we need to figure out "who frees memrefs" + the analysis and transformation needed to do that (maybe use upstream buffer allocation pass?)). The user API is in include/npcomp/runtime/UserAPI.h, though include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper. The stuff under {include,lib}/runtime is totally firewalled from the compiler and tiny (<6kB, though no attention has gone into optimizing that size). For example, we don't link in libSupport into the runtime, instead having our own bare bones replacements for basics like ArrayRef (the JITRuntime helps with bridging that gap, since it can depend on all common LLVM utilities). The overall features of npcomprt is that it exposes a module that with multiple function entry points. Each function has arguments and results that are tensor-valued, and npcomprt::Tensor is the runtime type that is used to interact with that (and a npcomprt::Ref<T> reference-counting wrapper is provided to wrap npcomprt::Tensor in the common case). From an implementation perspective, an npcomprt module at the LLVM/object/binary level exposes a single module descriptor struct that has pointers to other metadata (currently just a list of function metadata descriptors). All interactions with the npcomp runtime are keyed off of that module descriptor, including function lookups and dispatching. This is done to dodge platform ABI issues and also allow enough reflection to e.g. verify provided arguments. Most of the compiler-side work here was in LowerToNpcomprtABI and LowerToLLVM. Also, - Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting annoying to type the underscores/caps. - misc improvements to bash_helpers.sh	2020-07-08 19:36:19 -07:00
Stella Laurenzo	5aa2f0f9f6	Add a trivial copy elision canonicalization on ndarray->tensor. * This elides the very common code the compiler adds for chaining otherwise tensor-related numpy ops together. * More aggressive canonicalizations would require more advanced analysis.	2020-07-05 18:09:43 -07:00
Stella Laurenzo	fae15ec5e7	Allow the ndarray type to carry a shape.	2020-07-05 17:34:03 -07:00
Stella Laurenzo	046751254f	Refactor old tracing tests and remove deprecated ops. * Old doctests to run under lit. * Old custom filecheck tests -> pytest directory (under lit). * Rename some old ufunc ops in the tracer.	2020-06-29 16:19:03 -07:00
Stella Laurenzo	b2708e4687	Add test case for !numpy.ndarray.	2020-06-28 17:41:21 -07:00
Stella Laurenzo	7bd5733d38	Add "template function" ops and importer code. * This starts to lay down the infra for reasoning about calls * Adds the importer code to generate IR for function calls of compiler recognized static functions.	2020-06-26 18:36:36 -07:00
Stella Laurenzo	529873d13c	Wire up IREE compilation and runtime in a new backend test. * Adds python bindings for invoking flow, HAL, and VM lowering pipelines. * Adds pythong bindings for translating to VM module flatbuffer. * Adds a new backend_test/iree directory and configure lit to find the IREE python rt bindings. * Open code a simple_invoke.py that exercises the whole pipeline (need real APIs for a lot of this). * Fails when invoking the function because I never implemented argument marshaling for scalars :( * Plenty of stuff to do tomorrow.	2020-06-19 00:30:34 -07:00
Sean Silva	e8b1a07ef4	Initial NpcompRt (npcomp_rt) dialect boilerplate.	2020-06-01 19:07:53 -07:00
Sean Silva	1b48d0d80b	Remove the present tcp.island. The idea was half-baked and after some deep thought felt like a solution looking for a problem. What we had here (and is removed in this patch) just wasn't pulling its weight. I cannot think of anything we would want to do with tcp.island as it is removed here beyond just sinking and merging them within a basic block, such that the witness argument is kind of pointless (only matters for hoisting). TCP compute ops like tcp.add and tcp.broadcast_to have the strong invariant of "pure or undefined behavior", which means they are always safe to sink. The island concept as removed here conferred no benefit. Also, I'll note that "islands" are a trick you can only play once in a system (unless they strictly nest). I have some early-stage thoughs on having an island concept that helps with modeling tensor shapes robustly which seems promising (the island would serve a similar role as tie_shape).	2020-05-14 15:19:37 -07:00
Sean Silva	e29aef855b	Initial TCF/TCP E2E seed. Very much WIP. This is enough to get tcf.add down to approximately the "linalg.generic on buffers" level of abstraction. (but there are nuances)	2020-05-08 20:20:41 -07:00
Stella Laurenzo	bc5ef81d68	Add basicpy.SlotObject type and ops to create/index into it. * This is intended to provide low-level modeling for built-in objects. * It is now possible to trace slice tuples (which are tuples of NoneType\|EllipsisType\|SlotObjectType<slice, ...>).	2020-05-05 18:16:01 -07:00
Stella Laurenzo	d3632af675	Add !numpy.any_dtype dialect type.	2020-04-29 18:20:42 -07:00
Stella Laurenzo	b4425fe1d2	Add numpy.ufunc_call op.	2020-04-29 17:49:56 -07:00
Stella Laurenzo	e845db8a20	Add builtin_ufunc and generic_ufunc ops.	2020-04-28 23:51:54 -07:00
Stella Laurenzo	953ef89a30	Add npcomp-opt and lit runner.	2020-04-26 17:55:15 -07:00

... 2 3 4 5 6

254 Commits (061a97c3f2492dcade81b9077866688b3c5c6c1d)