#!/bin/bash
# Updates auto-generated ODS files for the `torch` dialect.
#
# Environment variables:
# TORCH_MLIR_EXT_MODULES: comma-separated list of Python module names
# which register custom PyTorch operators upon being imported.
# TORCH_MLIR_EXT_PYTHONPATH: colon-separated list of paths necessary
# for importing PyTorch extensions specified in TORCH_MLIR_EXT_MODULES.
# For more information on supporting custom operators, see:
# ${TORCH_MLIR}/python/torch_mlir/_torch_mlir_custom_op_example/README.md
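#
# Example invocation (a sketch only; `my_custom_ops` and every path below are
# hypothetical placeholders for your own extension module and build tree):
#   TORCH_MLIR_BUILD_DIR=/path/to/torch-mlir/build \
#   TORCH_MLIR_EXT_PYTHONPATH=/path/to/my_ext/python \
#   TORCH_MLIR_EXT_MODULES=my_custom_ops \
#   /path/to/this/script
# TORCH_MLIR_BUILD_DIR is optional; it defaults to "$(dirname "$0")/../build".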

set -euo pipefail

src_dir="$(realpath "$(dirname "$0")"/..)"
build_dir="$(realpath "${TORCH_MLIR_BUILD_DIR:-$src_dir/build}")"
torch_ir_include_dir="${src_dir}/include/torch-mlir/Dialect/Torch/IR"

in_tree_pkg_dir="${build_dir}/tools/torch-mlir/python_packages"
out_of_tree_pkg_dir="${build_dir}/python_packages"

if [[ ! -d "${in_tree_pkg_dir}" && ! -d "${out_of_tree_pkg_dir}" ]]; then
  echo "Couldn't find in-tree or out-of-tree build, exiting."
  exit 1
fi

# Prefer the more recently modified package directory. `a -nt b` is true if
# `a` is newer than `b`, or if `a` exists and `b` does not, so the check also
# works when only one of the two directories exists.
if [[ "${in_tree_pkg_dir}" -nt "${out_of_tree_pkg_dir}" ]]; then
  python_packages_dir="${in_tree_pkg_dir}"
else
  python_packages_dir="${out_of_tree_pkg_dir}"
fi

TORCH_MLIR_EXT_PYTHONPATH="${TORCH_MLIR_EXT_PYTHONPATH:-""}"
pypath="${python_packages_dir}/torch_mlir"
if [[ -n "${TORCH_MLIR_EXT_PYTHONPATH}" ]]; then
  pypath="${pypath}:${TORCH_MLIR_EXT_PYTHONPATH}"
fi
TORCH_MLIR_EXT_MODULES="${TORCH_MLIR_EXT_MODULES:-""}"
ext_module="${ext_module:-""}"
if [[ -n "${TORCH_MLIR_EXT_MODULES}" ]]; then
  ext_module="${TORCH_MLIR_EXT_MODULES}"
fi

PYTHONPATH="${pypath}" python \
  -m torch_mlir.dialects.torch.importer.jit_ir.build_tools.torch_ods_gen \
  --torch_ir_include_dir="${torch_ir_include_dir}" \
  --pytorch_op_extensions="${ext_module}" \
  --debug_registry_dump="${torch_ir_include_dir}/JITOperatorRegistryDump.txt"