* llvm-project: b5924a8e27536d19dd5c4d302db29fb6163d5faa
* mhlo: 848ca244d20f045b7921da55a98a04d95ef94f0e
* Multiple breakages that need to be fixed.
Fixes:
* Refactor dialect registration
* Remove all kindof methods (Casting functionality has been added upstream and is implicitly
available, see https://llvm.discourse.group/t/removing-kinds-from-attributes-and-types/1547.)
* Update dialect registration to comply with https://reviews.llvm.org/D85495.
* Remove type kinds and update some changed dialect signatures.
* Upgrade ATen dialect to match upstream needs.
* Move dialect registration to tablegen.
* Register the ListType in tablegen.
* Change dialect initialization signature.
* Use TypeSwitch in MlirIr location printer.
* Remove global registry depends from npcomp-opt.
* Change LowerToLLVM to pass an MLIRContext vs an LLVMDialect for type creation.
* Remove dep on MLIREDSCInterface that is removed upstream.
* Thread through the DialectRegistry for opt and python-like tools.
* Modernize pass registration (This was forced because the GEN_PASS_REGISTRATION code now generates inline functions vs literal pass registration statements)
Co-authored-by: Marius Brehler <marius.brehler@iml.fraunhofer.de>
This patch adds a dialect intended to be used as a frontend dialect
to facilitate lowering from "A Tensor Library" in torch/pytorch.
This patch includes several passes that are useful in conjuction with the
dialect:
--aten-layer-name: Generates layer names for each operation, which are not
present in the original pytorch.
--aten-to-std: Lower the ATen dialect into standard dialect function calls.
--return-elimination-pass: convert functions (primarily the toplevel function)
to pass return values by reference. This simplifies pytorch integration.
--aten-op-report: generate a textual report about the model
--liveness-report
Future patches will implement actual integration with the pytorch jit to
intercept and generates MLIR in this dialect, then lower the resulting MLIR
into function calls through aten-layer-name -> aten-to-std ->
return-elimination -> std-to-llvm. The result would then jitted using the LLVM
jit, linked against a runtime library which makes calls back into pytorch to
implement all the layers.
Co-authored-by: Jeff Fifield <jeff.fifield@xilinx.com>
Co-authored-by: Jeff Fifield <jeff.fifield@xilinx.com>
Mostly this is CMake cleanup. Several library dependencies are missing, which
is often revealed with shared library builds. Also, it's generally bad to
link directly against LLVM libraries because it fails when using
LLVM_LINK_LLVM_DYLIB. MLIR will pull in libLLVM.so, and there will be
duplicate linkage with the the explicit libraries. There may need to be more
refactoring here.
* Since the manylinux images do not hard-link against python libs (resolving them at runtime), the module must be built without resolving undefined references.
* For some reason, builds on this platform are stricter, exposing dependency ordering issues.
* The CMake bits to build the extension are still somewhat of a mess. I have better versions both upstream and in IREE and will be reconciling. For now, don't look too closely.
* Primarily, the upstream shape dialect now uses tensor<?xindex> for non-erroring, immediate shape calculations (and will return this for shape_of of a tensor or memref).
* In addition, upstream passes do not yet exist for fully lowering to standard ops, so the passes here need to be extended to handle this new convention.
* This should be seen as an intermediate state, necessary to integrate a new LLVM version and needs more work and cleanup for generality.
* There is a good deal of awkwardness in these conversions. The hope is that additional upstream work will yield better defined conversion paths once out of this intermediate state.
My main interest in this is that tweaking the default of this flag is a
quick way to check for miscompiling canonicalizations / op definitions
not annotated properly (e.g. marked NoSideEffect when in fact it is not
safe to do so).
* Enables e2e test.
* With what I've learned in upstream about test directory layout, I can consolidate most of the separate directories we have for these things. Will do that in a followup.
* Not pleased with the LLVM global initialization depends but serviceable for now.
This required making module descriptors hold a FuncDescriptor* instead
of a pointer to array of FuncDescriptors as it previously did, which is
innocuous (just requires an llvm.bitcast after the llvm.mlir.addressof).
* Rewrites public function signatures to operate on tensors (vs ndarray).
* Most of our backends presume immutable tensors at public function boundaries.
Also, the previous code had a special case for deleting this op when it
had no uses. This is subsumed by the change in this commit since now
shape.const_shape is properly lowered.
With this change, the included test case with multiple serially
dependent ops works!
This specific issue was related to the scalar argument to that
function. We needed to compute a broadcast of a scalar shape (which is a
shape.const_shape) with another shape.
This ~totally reworks the existing "runtime" stuff to be more
principled and usable, such as from Python. It's still not fully
production-quality, mainly in the department of memory management (e.g.
it currently leaks memory; we need to figure out "who frees memrefs" +
the analysis and transformation needed to do that (maybe use upstream
buffer allocation pass?)).
The user API is in include/npcomp/runtime/UserAPI.h, though
include/npcomp/JITRuntime/JITModule.h is a friendlier wrapper.
The stuff under {include,lib}/runtime is totally firewalled from the
compiler and tiny (<6kB, though no attention has gone into optimizing
that size). For example, we don't link in libSupport into the runtime,
instead having our own bare bones replacements for basics like ArrayRef
(the JITRuntime helps with bridging that gap, since it *can* depend on
all common LLVM utilities).
The overall features of npcomprt is that it exposes a module that
with multiple function entry points. Each function has arguments and
results that are tensor-valued, and npcomprt::Tensor is the runtime type
that is used to interact with that (and a npcomprt::Ref<T>
reference-counting wrapper is provided to wrap npcomprt::Tensor in the
common case).
From an implementation perspective, an npcomprt module at the
LLVM/object/binary level exposes a single module descriptor struct that
has pointers to other metadata (currently just a list of function
metadata descriptors). All interactions with the npcomp runtime are
keyed off of that module descriptor, including function lookups and
dispatching. This is done to dodge platform ABI issues and also allow
enough reflection to e.g. verify provided arguments.
Most of the compiler-side work here was in LowerToNpcomprtABI and
LowerToLLVM.
Also,
- Rename npcomp_rt/NpcompRt to npcomprt/Npcomprt; it was getting
annoying to type the underscores/caps.
- misc improvements to bash_helpers.sh
* This elides the very common code the compiler adds for chaining otherwise tensor-related numpy ops together.
* More aggressive canonicalizations would require more advanced analysis.
* Preserving shape across the copy ops makes more thing shaped by default.
* Inference of ndarray types will now preserve the shape when specializing the dtype.
* Correctly infers the unknown dtypes that emit as part of compilation for the simple ufunc case.
* Significant more testing needs to be done on the details now that the pass is minimally functional.
* The actual pass itself is still too hacky/not general, but the underlying analysis is further along.
* Adds an op interface for adding CPA constraints.
* Adds a type conversion hook for handling built-in types (that we can't have adopt our interface).
* Converts tensor<> to object(!Tensor, [e:<type>]) just like NdArray.
* Implement a few numpy ops far enough to do dtype inference for simple sequences.