#!/bin/bash
# Updates auto-generated ODS files for the `torch` dialect.
#
# Environment variables:
#   TORCH_MLIR_EXT_MODULES: comma-separated list of python module names
#     which register custom PyTorch operators upon being imported.
#   TORCH_MLIR_EXT_PYTHONPATH: colon-separated list of paths necessary
#     for importing PyTorch extensions specified in TORCH_MLIR_EXT_MODULES.
#
# For more information on supporting custom operators, see:
#   ${TORCH_MLIR}/python/torch_mlir/_torch_mlir_custom_op_example/README.md
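#
# Example invocation (the extension module name and path here are
# hypothetical, for illustration only):
#   TORCH_MLIR_EXT_PYTHONPATH="/path/to/my_extension" \
#   TORCH_MLIR_EXT_MODULES="my_extension" \
#   ./update_torch_ods.sh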
set -euo pipefail
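# Resolve the repository root, assuming this script lives one directory
# below it (e.g. in build_tools/).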
src_dir="$(realpath "$(dirname "$0")"/..)"
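# TORCH_MLIR_BUILD_DIR may point at a custom build tree; it defaults to
# <repo-root>/build. For example (hypothetical path):
#   TORCH_MLIR_BUILD_DIR="$HOME/builds/torch-mlir" ./update_torch_ods.sh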
build_dir="$(realpath "${TORCH_MLIR_BUILD_DIR:-$src_dir/build}")"
torch_ir_include_dir="${src_dir}/include/torch-mlir/Dialect/Torch/IR"
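# Two build layouts are probed below: an in-tree build nests the python
# packages under LLVM's tools/ directory, while an out-of-tree build
# places them at the top of the build tree.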
in_tree_pkg_dir="${build_dir}/tools/torch-mlir/python_packages"
out_of_tree_pkg_dir="${build_dir}/python_packages"
if [[ ! -d "${in_tree_pkg_dir}" && ! -d "${out_of_tree_pkg_dir}" ]]; then
  echo "Couldn't find in-tree or out-of-tree build, exiting." >&2
  exit 1
fi
# The `-nt` check works even if one of the two directories is missing.
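# In bash, `dir1 -nt dir2` is true when dir1's modification time is newer
# than dir2's, or when dir1 exists and dir2 does not, so the more recently
# built package tree wins.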
if [[ "${in_tree_pkg_dir}" -nt "${out_of_tree_pkg_dir}" ]]; then
  python_packages_dir="${in_tree_pkg_dir}"
else
  python_packages_dir="${out_of_tree_pkg_dir}"
fi
TORCH_MLIR_EXT_PYTHONPATH="${TORCH_MLIR_EXT_PYTHONPATH:-""}"
pypath="${python_packages_dir}/torch_mlir"
if [[ -n "${TORCH_MLIR_EXT_PYTHONPATH}" ]]; then
  pypath="${pypath}:${TORCH_MLIR_EXT_PYTHONPATH}"
fi
TORCH_MLIR_EXT_MODULES="${TORCH_MLIR_EXT_MODULES:-""}"
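# Default `ext_module` to empty (or to a value already present in the
# caller's environment) so the expansion below is defined under `set -u`.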
ext_module="${ext_module:-""}"
if [[ -n "${TORCH_MLIR_EXT_MODULES}" ]]; then
  ext_module="${TORCH_MLIR_EXT_MODULES}"
fi
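# Regenerate the ODS file and the operator registry dump via the JIT IR
# importer's torch_ods_gen tool.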
set +u
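# PYTHONPATH may be unset in the caller's environment; `set -u` is relaxed
# above so that the PYTHONPATH expansion below cannot abort the script.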
PYTHONPATH="${PYTHONPATH}:${pypath}" python \
  -m torch_mlir.dialects.torch.importer.jit_ir.build_tools.torch_ods_gen \
  --torch_ir_include_dir="${torch_ir_include_dir}" \
  --pytorch_op_extensions="${ext_module}" \
|
Reduce compilation time for TorchOps.cpp.inc
The `assemblyFormat` stuff (which generates unrolled, per-op C++ code)
was taking up a lot of compile time, and all the ops are essentially
printed with the same logic. So this PR makes them all call the same
helper function. This is done by using
`let hasCustomAssemblyFormat = 1` and then implementing `FooOp::parse`
and `FooOp::print`.
Additionally, the `Generated*Ops.td` files are all collapsed into just
`GeneratedTorchOps.td` (there is no reason to have the files separate,
since the files are very large anyway so one is always having to search
within them -- editors don't care that the file to search is now a bit
bigger :) ).
This reduces TorchOpsODSGenerated.cpp compile time (which is now
GeneratedTorchOps.cpp) from 39 to 31 seconds on my machine. This is
actually less than I expected, but this PR is an overall cleanup to the
code anyway. The next step will be to introduce (better) functionality
upstream for sharding the TorchOps.cpp.inc file, so that we can truly
parallelize the O(#ops) costs. This is also necessary, because after
this PR, TorchDialect.cpp is now the slowest file to compile, due to the
`addOperations<... all the ops ...>` call, which needs to be shareded
too.
2022-03-19 05:04:47 +08:00
|
|
|
--debug_registry_dump="${torch_ir_include_dir}/JITOperatorRegistryDump.txt"
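# On success, the generator rewrites GeneratedTorchOps.td under
# ${torch_ir_include_dir} and leaves the registry dump above as a
# debugging aid.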