* We're building libLLVM.so anyway. Saves a lot of time/space to link tools against it.
* MLIR tools do not yet respect this (but it doesn't seem to hurt).
* Incorporates a dep on the new MLIRPublicAPI shared library.
* More work is needed to further separate npcomp between public API and impl libraries, but amalgamating them will hold until then.
* Conversions are very simple, suporting mul, maximum and add (alpha=1 only).
* Example added with pass pipeline needed to run.
* Much missing off of the golden path but sufficient for such simple cases.
* convolution, convolution_backward, _log_softmax, _log_softmax_backward_data, nll_loss_forward, nll_loss_backward, nll_loss2d_forward, nll_loss2d_backward, copy_
* Extends the recognition logic and metadata for handling inplace transformations, optional tensors, ints, lists and dropped args.
* The kernel_calls generated by test_conv_nllloss_grads.py now convert to ATen.
* The result *almost* comes out as a pure tensor program with the exception of the copy_ op, which I will do some followup work to deal with.
* More progress on #97
Two changes:
- no more "verifyPasses" constructor arg for PassManager
- OpPassManager defaults to requiring explicit "nest" calls when created
via the C++ API. The behavior upstream for mlir-opt still obeys the
"implicit" mode, so I just slapped that onto all our pass managers.
I pinged https://reviews.llvm.org/D90671 to get a signal for whether we
are expected to migrate to explicit mode. If so, I'll do that too later.
* Enables -gsplit-dwarf for both LLVM and NPCOMP, reducing the occurrence of the ~GB scale binaries.
* CMake shared linking seems incompatible with this, so shared objects are still "too big" but there are few of them.
* Reduces disk thrash on clean/install of everything.
Now, the only bufferization we have left is lowering tensor constants to
memref, which will hopefully proceed soon after Rahul's new
std.global_memref lands + the lowering to LLVM IR. Then I'll port
LowerConstantTensorsToMemref to upstream and we'll be 100% upstream
bufferization, except for our local TCP dialect (which will probably go
away and be replaced by std elementwise + linalg named ops on tensors :)
).
The current code was inserting all build_list ops
after the last constant op since it was assuming that all
elements being passed in were constants.
This patch replaces that patch with a new function that
inserts the build_list ops before the terminator.
Also modifies test_export_conv2d_fwd.py since its output
no longer matches.
TEST: Added test_export_cat.py which is the code in #102
* This is sufficient to capture the forward and backward pass and gradients of a convolutional model with an nllloss.
* As with the forward conv, the backward conv is a special case wrapped in an enigma on the PyTorch side. There aren't many like it, so special casing is just what we do.
* When I traced this, I found that the copy_ op is not yet boxing compatible so I had to map it manually. If there are many more like this, I'll probably do something a bit more clever to reduce duplication.
* This exposes new signature patterns that will need to be handled by the ATen lowering. Will take care of that next: It will be nice to have an e2e of a non-trivial case with full gradients.
* Fixes#97.
* None's out Device? args.
* Emits bool tensors if needed.
* Adds some stderr tracing to better see what is going on.
* Test case that exercises NLLLoss.
* This test case emits something for backward calculations but there are some issues still to be worked out, so that part is left out of the test case.
* Progress on #97
* Deletes prior code generator from previous attempt (moved some of it into this one).
* Renames old generated tablegen source to "Legacy".
* Generates ODS and import rules for most binary and unary arithmetic ops.
* Removes old generated ops and integration tests that were testing details of the prior setup.
Register the following for the multiply op:
- tcf.mul
- tcp.mul
- TCP->TCP lowering
- Shape transfer, broadcasted multiplicands
- Lower to standard `MulFOp` op
* Two op interfaces, one for querying instance metadata and one for getting static data needed to construct an op from a generic form.
* For torch.generic_kernel ops, metadata is splatted in during capture from Torch (it comes from the op registry, which will work for either device capture or graph import).
* Moved the 'add' out of the generated set so I can experiment on it. It implements the TorchBuildableKernelOpInterface interface which provides its metadata.
* The ATenRecognizeKernelsPass pass generically lowers from a torch.generic_kernel to recognized ops that implement the TorchBuildableKernelOpInterface, handling the various types of transformations that we allow at this stage.
Fix linker error:
lib/Python/libNPCOMPPythonCommon.a(MlirInit.cpp.o): in function `mlir::npcomp::python::npcompMlirInitialize()':
mlir-npcomp/build/../lib/Python/MlirInit.cpp:46: undefined reference to `npcompInitializeLLVMCodegen'
* Enables the conv2d fwd test and ResA (which are both small).
* Deletes resnet18 and vgg, which both run but generate output that crashes FileCheck and lit (or at least makes them take an eternity).
* Adds Basicpy List, Tuple, Dict types and plumbs through C API.
* Started debugging the issues around aten::conv2d capture, but a PyTorch bug is suspected.
* Was able to manually verify that the basic conv2d forward test captures correctly with a workaround.
* Need to resolve some printing issues upstream and move these tests to an integration test target (they take ~seconds to run).
* Now gets far enough to capture batch_norm.
* Has some issues still with in-place ops.
* Can materialize constants.
* Includes an upgrade to PyTorch nightly, which has important bug fixes for fallback and boxed kernel dispatch.
* Fixes#78, #79, #80.
* Will do more testing in a follow-up once further bugs are fixed that facilitate getting at the other features.
The time has come for BypassShapes/LowerShapedResultsToMemref to go away :(
For the reference backend, being consistent with upstream conventions is
the name of the game now.
This is a step down in a number of ways, e.g. test clarity and
separation of concerns. But it is fewer files and fewer tests, and
*does* address the "TODO: This is really fragile". It also eliminates two
more ops from the refback dialect (sadly, they are the
shaped_results/yield that we were getting kind of fond of, but alas).
Otherwise, on my machine MLIR somehow gets configured with Python 2,
which is not supported.
The `probe_python` stuff is copied from cmake_configure.sh
Now that it has grown source/target materialization capabilities
(spelled with ops tensor_load/tensor_to_memref), we can use it. We can
also now delete refback.memref_to_tensor/refback.tensor_to_memref.
This is also a first step to reducing the downstream functionality
needed in the refback dialect.
* Adds a trampoline/loader 'torch_mlir' module.
* Plumbs through the MLIR python Context and Module creation, interoping with the MLIR Python API (resolves TODO on creating with own context and accessing the module being built).
* Inter-module Python API interop is still a bit rough but workable via the capsule mechanism. Can be evolved later.
* Exports the frontends/pytorch python sources to the project python/ build directory.
* Requires D89294 to land.
* Also adds two lit tests to verify that all of our extensions load without fireworks, which is a good indication that the shared library deps are sane.
* Bumps llvm-project version to use D89167.
Now the reference backend is cleanly accepts "TCP"+scalar ops.
We introduce tcf-refback-lowering-pipeline which also does TCF->TCP
conversion for convenience until we have a "target interface".
* Need to have a dag of shared library deps in order to interop across python extensions (as presented in ODM).
* Introduced add_npcomp_library and friends to mirror the MLIR setup.
* Adds a libNPCOMP.so shared library.
* Redirects tools and extensions to link against libNPCOMP.so (instead of static libs).
* Moves all libraries to lib/, all binaries to bin/ and all python extensions to python/. The invariant is that the rpaths are setup to have a one level directory structure.
* Reworks the _torch_mlir extension to build like the others (still need to come up with a consolidated rule to do this instead of open coded).
* Includes an upstream version bump to pick up needed changes.
Sizes with dynamic linking (stripped, release, asserts enabled):
libNPCOMP.so: 43M (includes much of the underlying LLVM codegen deps)
libMLIR.so: 31M
_npcomp.so: 1.6M (python extension)
_torch_mlir.so: 670K (python extension)
npcomp-capi-ir-test: 6.3K
npcomp-opt: 351K
npcomp-run-mlir: 461K
mnist-playground: 530K
Still more can be done to normalize and optimize but this gets us structurally to the starting point.