* convolution, convolution_backward, _log_softmax, _log_softmax_backward_data, nll_loss_forward, nll_loss_backward, nll_loss2d_forward, nll_loss2d_backward, copy_
* Extends the recognition logic and metadata for handling inplace transformations, optional tensors, ints, lists and dropped args.
* The kernel_calls generated by test_conv_nllloss_grads.py now convert to ATen.
* The result *almost* comes out as a pure tensor program with the exception of the copy_ op, which I will do some followup work to deal with.
* More progress on #97
Two changes:
- no more "verifyPasses" constructor arg for PassManager
- OpPassManager defaults to requiring explicit "nest" calls when created
via the C++ API. The behavior upstream for mlir-opt still obeys the
"implicit" mode, so I just slapped that onto all our pass managers.
I pinged https://reviews.llvm.org/D90671 to get a signal for whether we
are expected to migrate to explicit mode. If so, I'll do that too later.
* Enables -gsplit-dwarf for both LLVM and NPCOMP, reducing the occurrence of the ~GB scale binaries.
* CMake shared linking seems incompatible with this, so shared objects are still "too big" but there are few of them.
* Reduces disk thrash on clean/install of everything.
Now, the only bufferization we have left is lowering tensor constants to
memref, which will hopefully proceed soon after Rahul's new
std.global_memref lands + the lowering to LLVM IR. Then I'll port
LowerConstantTensorsToMemref to upstream and we'll be 100% upstream
bufferization, except for our local TCP dialect (which will probably go
away and be replaced by std elementwise + linalg named ops on tensors :)
).
The current code was inserting all build_list ops
after the last constant op since it was assuming that all
elements being passed in were constants.
This patch replaces that patch with a new function that
inserts the build_list ops before the terminator.
Also modifies test_export_conv2d_fwd.py since its output
no longer matches.
TEST: Added test_export_cat.py which is the code in #102
* This is sufficient to capture the forward and backward pass and gradients of a convolutional model with an nllloss.
* As with the forward conv, the backward conv is a special case wrapped in an enigma on the PyTorch side. There aren't many like it, so special casing is just what we do.
* When I traced this, I found that the copy_ op is not yet boxing compatible so I had to map it manually. If there are many more like this, I'll probably do something a bit more clever to reduce duplication.
* This exposes new signature patterns that will need to be handled by the ATen lowering. Will take care of that next: It will be nice to have an e2e of a non-trivial case with full gradients.
* Fixes#97.
* None's out Device? args.
* Emits bool tensors if needed.
* Adds some stderr tracing to better see what is going on.
* Test case that exercises NLLLoss.
* This test case emits something for backward calculations but there are some issues still to be worked out, so that part is left out of the test case.
* Progress on #97
* Deletes prior code generator from previous attempt (moved some of it into this one).
* Renames old generated tablegen source to "Legacy".
* Generates ODS and import rules for most binary and unary arithmetic ops.
* Removes old generated ops and integration tests that were testing details of the prior setup.
Register the following for the multiply op:
- tcf.mul
- tcp.mul
- TCP->TCP lowering
- Shape transfer, broadcasted multiplicands
- Lower to standard `MulFOp` op
* Two op interfaces, one for querying instance metadata and one for getting static data needed to construct an op from a generic form.
* For torch.generic_kernel ops, metadata is splatted in during capture from Torch (it comes from the op registry, which will work for either device capture or graph import).
* Moved the 'add' out of the generated set so I can experiment on it. It implements the TorchBuildableKernelOpInterface interface which provides its metadata.
* The ATenRecognizeKernelsPass pass generically lowers from a torch.generic_kernel to recognized ops that implement the TorchBuildableKernelOpInterface, handling the various types of transformations that we allow at this stage.
Fix linker error:
lib/Python/libNPCOMPPythonCommon.a(MlirInit.cpp.o): in function `mlir::npcomp::python::npcompMlirInitialize()':
mlir-npcomp/build/../lib/Python/MlirInit.cpp:46: undefined reference to `npcompInitializeLLVMCodegen'
* Enables the conv2d fwd test and ResA (which are both small).
* Deletes resnet18 and vgg, which both run but generate output that crashes FileCheck and lit (or at least makes them take an eternity).
* Adds Basicpy List, Tuple, Dict types and plumbs through C API.
* Started debugging the issues around aten::conv2d capture, but a PyTorch bug is suspected.
* Was able to manually verify that the basic conv2d forward test captures correctly with a workaround.
* Need to resolve some printing issues upstream and move these tests to an integration test target (they take ~seconds to run).
* Now gets far enough to capture batch_norm.
* Has some issues still with in-place ops.
* Can materialize constants.
* Includes an upgrade to PyTorch nightly, which has important bug fixes for fallback and boxed kernel dispatch.
* Fixes#78, #79, #80.
* Will do more testing in a follow-up once further bugs are fixed that facilitate getting at the other features.
The time has come for BypassShapes/LowerShapedResultsToMemref to go away :(
For the reference backend, being consistent with upstream conventions is
the name of the game now.
This is a step down in a number of ways, e.g. test clarity and
separation of concerns. But it is fewer files and fewer tests, and
*does* address the "TODO: This is really fragile". It also eliminates two
more ops from the refback dialect (sadly, they are the
shaped_results/yield that we were getting kind of fond of, but alas).
Otherwise, on my machine MLIR somehow gets configured with Python 2,
which is not supported.
The `probe_python` stuff is copied from cmake_configure.sh
Now that it has grown source/target materialization capabilities
(spelled with ops tensor_load/tensor_to_memref), we can use it. We can
also now delete refback.memref_to_tensor/refback.tensor_to_memref.
This is also a first step to reducing the downstream functionality
needed in the refback dialect.
* Adds a trampoline/loader 'torch_mlir' module.
* Plumbs through the MLIR python Context and Module creation, interoping with the MLIR Python API (resolves TODO on creating with own context and accessing the module being built).
* Inter-module Python API interop is still a bit rough but workable via the capsule mechanism. Can be evolved later.
* Exports the frontends/pytorch python sources to the project python/ build directory.
* Requires D89294 to land.
* Also adds two lit tests to verify that all of our extensions load without fireworks, which is a good indication that the shared library deps are sane.
* Bumps llvm-project version to use D89167.
Now the reference backend is cleanly accepts "TCP"+scalar ops.
We introduce tcf-refback-lowering-pipeline which also does TCF->TCP
conversion for convenience until we have a "target interface".
* Need to have a dag of shared library deps in order to interop across python extensions (as presented in ODM).
* Introduced add_npcomp_library and friends to mirror the MLIR setup.
* Adds a libNPCOMP.so shared library.
* Redirects tools and extensions to link against libNPCOMP.so (instead of static libs).
* Moves all libraries to lib/, all binaries to bin/ and all python extensions to python/. The invariant is that the rpaths are setup to have a one level directory structure.
* Reworks the _torch_mlir extension to build like the others (still need to come up with a consolidated rule to do this instead of open coded).
* Includes an upstream version bump to pick up needed changes.
Sizes with dynamic linking (stripped, release, asserts enabled):
libNPCOMP.so: 43M (includes much of the underlying LLVM codegen deps)
libMLIR.so: 31M
_npcomp.so: 1.6M (python extension)
_torch_mlir.so: 670K (python extension)
npcomp-capi-ir-test: 6.3K
npcomp-opt: 351K
npcomp-run-mlir: 461K
mnist-playground: 530K
Still more can be done to normalize and optimize but this gets us structurally to the starting point.
I now realize that VerboseCamelCase is not the best choice for dialect
directory/file names and C++ identifiers (take e.g. "Linalg", "Basicpy",
etc. as prior art here; not LinearAlgebra or BasicPython). If I had to
name the convention it seems to be "Shortword" (or of course just
acronym dialects like LLVM, SCF, etc.).
This rename also has the side benefit of differentiating RefBackend
directories, which now refer to the actual backend itself, from
Refback/Refbackrt, which are the dialects which happen to be used by
that backend.
Other than the dialect definitions (which will live in standard Dialect/
subdirectory), the goal here is to keep RefBackend-related code nested
in {include/npcomp,lib,test}/RefBackend.