Although `refCount` is initialized as `std::atomic<int> refCount{0};` in
the definition of Tensor, our tail-allocating malloc never runs a
constructor, so the initializer was ignored. The resulting bogus
refcount values led to leaks.
Caught with LeakSanitizer. I also added an assertion that the refcount
is non-negative to begin with, which should catch this bug fairly
consistently in the future (assuming the garbage refcount is negative
half the time).
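
A minimal sketch of the failure mode and one way to guard against it,
not the actual runtime code. Only `Tensor` and `refCount` come from the
description above; `rank`, `extents`, `create`, `incRef`, and `decRef`
are hypothetical names chosen for illustration. The sketch assumes the
fix is to construct the header with placement new inside the malloc'd
block so the default member initializer actually runs.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the runtime Tensor: a fixed header followed by
// `rank` extents stored in the same malloc'd block (tail allocation).
struct Tensor {
  // This initializer only takes effect if a constructor runs. A bare
  // malloc + cast leaves refCount holding whatever bytes were in memory.
  std::atomic<int> refCount{0};
  std::int32_t rank = 0;

  // The extents live directly after the header.
  std::int32_t *extents() {
    return reinterpret_cast<std::int32_t *>(this + 1);
  }

  // Allocate header + tail in one block, then construct the header in place.
  static Tensor *create(std::int32_t rank) {
    void *mem = std::malloc(sizeof(Tensor) + rank * sizeof(std::int32_t));
    if (!mem)
      return nullptr;
    Tensor *t = new (mem) Tensor(); // runs the default member initializers
    t->rank = rank;
    for (std::int32_t i = 0; i < rank; ++i)
      t->extents()[i] = 0;
    return t;
  }

  void incRef() {
    // Garbage from an unconstructed header trips this whenever it happens
    // to be negative.
    assert(refCount.load() >= 0 && "refcount must be non-negative");
    refCount.fetch_add(1);
  }

  // Returns true if this call released the last reference and freed the block.
  bool decRef() {
    int prev = refCount.fetch_sub(1);
    assert(prev >= 1 && "decRef on a tensor with no references");
    if (prev == 1) {
      this->~Tensor();
      std::free(this);
      return true;
    }
    return false;
  }
};
```

With this shape, `Tensor::create(3)` yields a tensor whose `refCount`
is 0, and a matched `incRef()`/`decRef()` pair frees the block. The
assertion is a probabilistic check only: it fires when the
uninitialized value happens to be negative.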
This vastly simplifies our code, letting us delete multiple ops,
simplify multiple passes, and remove a whole pass.
The `refback` dialect is now down to one op (`refback.alloc_memref`,
which simplifies allocations to just take a shape instead of individual
extents).
* Need to have a DAG of shared library deps in order to interop across Python extensions (as presented in ODM).
* Introduced add_npcomp_library and friends to mirror the MLIR setup.
* Adds a libNPCOMP.so shared library.
* Redirects tools and extensions to link against libNPCOMP.so (instead of static libs).
* Moves all libraries to lib/, all binaries to bin/ and all Python extensions to python/. The invariant is that the rpaths are set up to have a one-level directory structure.
* Reworks the _torch_mlir extension to build like the others (still need to come up with a consolidated rule to do this instead of open-coding it).
* Includes an upstream version bump to pick up needed changes.
Sizes with dynamic linking (stripped, release, asserts enabled):
libNPCOMP.so: 43M (includes much of the underlying LLVM codegen deps)
libMLIR.so: 31M
_npcomp.so: 1.6M (python extension)
_torch_mlir.so: 670K (python extension)
npcomp-capi-ir-test: 6.3K
npcomp-opt: 351K
npcomp-run-mlir: 461K
mnist-playground: 530K
More can still be done to normalize and optimize, but this gets us structurally to the starting point.
Other than the dialect definitions (which will live in the standard
Dialect/ subdirectory), the goal here is to keep RefBackend-related code
nested in {include/npcomp,lib,test}/RefBackend.