torch-mlir/test/npcomp-run-mlir/constant-add-scalar.mlir


// RUN: npcomp-run-mlir %s \
// RUN: -invoke constant_add_scalar \
// RUN: -arg-value="dense<3.0> : tensor<f32>" \
// RUN: -shared-libs=%npcomp_runtime_shlib 2>&1 \
// RUN: | FileCheck %s
// CHECK: output #0: dense<4.000000e+00> : tensor<f32>
func @constant_add_scalar(%arg0: tensor<f32>) -> tensor<f32> {
  %0 = constant dense<1.0> : tensor<f32>
  %1 = tcf.add %arg0, %0 : (tensor<f32>, tensor<f32>) -> tensor<f32>
  return %1 : tensor<f32>
}

[RefBackend] Fix leaks related to ABI boundaries.

Best as I can tell (e.g. from LeakSanitizer), this fixes all the leaks except for those due to buffers created internally to the codegenned code itself (up next I'll add the buffer deallocation pass to fix those).

The main change is that instead of attempting to pass `refbackrt::Tensor` to the codegenned function directly, we make all the ABI types be UnrankedMemRef, which gets passed awkwardly (but workably) as a `{size_t rank, void *ptrToDescriptor}` pair on the ABI. The reason refbackrt::Tensor wasn't workable is that MLIR doesn't really have a way to deal with the lifetime of unranked memref descriptors created inside a function, which is inevitably what would happen with the old code that emitted runtime calls to `refbackrt.to_memref`/`refbackrt.from_memref` to convert back and forth to `refbackrt::Tensor` inside the codegenned code.

So, instead of `refbackrt.to_memref`/`refbackrt.from_memref` with no real sound basis for valid lifetime management, we now have a lovely piece of code in `refbackrt::invoke` in `Runtime.cpp` that just barely seems to be sound. We rely on the codegenned code having these properties, which it seems to have:

- It won't free memref descriptors or their backing buffers for arguments of UnrankedMemRef type.
- It will allocate a separate memref descriptor for each result UnrankedMemRef (which is ensured by having a separate memref_cast for each).
- We can sniff the `allocatedPtr`'s (i.e. the backing buffer pointers) to avoid double-freeing in the case of aliasing of the backing buffer (including backing buffers for arguments feeding into results).
- To catch the case of statically allocated data (which we need to avoid passing to `free`), check whether the `allocatedPtr` is (no joke) equal to `0xDEADBEEF`, because there is otherwise no way to distinguish statically allocated from malloc'ed data. (std.global_memref lowering to LLVM by happenstance sets the allocatedPtr equal to `0xDEADBEEF`, presumably mainly as a debugging aid.)

Even with all this, we *still* need to (internally to refbackrt::invoke) make copies of all inputs/outputs! And the details of how the LLVM-level ABI gets laid out for e.g. function arguments/returns are still super tricky.

This really highlights how deficient memref is as the general runtime type for our use case. It's stewing in my mind how best to improve the situation. My general gut feeling is that IREE's abstractions for this are "right", but I need to think more about how to distill those aspects of IREE's design in a "reference" way for RefBackend.

Some implementation notes:

- This did catch a bug in our ABI wrapper functions in LowerToLLVM.cpp, which I had to fix (it happened to work before through some combination of npcomprt::Tensor being passed as a single pointer + probably me infinite-monkey-ing it until it worked).
- This actually removes 2 out of the 3 compiler runtime functions (the only one left is "abort_if"). Most of the memref descriptor code moved from CompilerRuntime.cpp to Runtime.cpp.
- This also means deleting `refbackrt.from_memref` and `refbackrt.to_memref`.

2020-11-25 09:18:57 +08:00