torch-mlir/python/CMakeLists.txt

120 lines
3.9 KiB
CMake
Raw Permalink Normal View History

Upstream the ONNX importer. (#2636) This is part 1 of 2, which will also include upstreaming the FX importer. I started with ONNX because it forces some project layout updates and is more self contained/easier as a first step. Deviating somewhat from the RFCs on project layout, I made the following decisions: * Locating the `onnx_importer.py` into `torch_mlir.extras` as Maks already has opened up that namespace and it seemed to fit. Better to have fewer things at that level. * Setup the build so that the root project only contains MLIR Python and pure Python deps (like the importers), but this can be augmented with the `projects/` adding more depending on which features are enabled. * The default build continues to build everything whereas in `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS=1` mode, it builds a `torch-mlir-core` wheel with the pure contents only. `onnx_importer.py` and `importer_smoke_test.py` are almost verbatim copies from SHARK-Turbine. I made some minor local alterations to adapt to paths and generalize the way they interact with the outer project. I expect I can copy these back to Turbine verbatim from here. I also updated the license boilerplate (they have the same license but slightly different project norms for the headers) but retained the correct copyright. Other updates: * Added the ONNX importer unit test (which also can generate test data) in lit, conditioned on the availability of the Python `onnx` package. In a followup once I know everything is stable, I'll add another env var that the CI can set to always enable this so we know conclusively if tests pass. * Moved the ONNX conversion readme to `docs/`. * Renamed CMake option `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS` -> `TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS` and inverted the sense. Made the JitIR importer and LTC options `cmake_dependent_options` for robustness.
2023-12-13 11:02:51 +08:00
# Disables generation of "version soname" (i.e. libFoo.so.<version>), which
# causes pure duplication as part of Python wheels.
set(CMAKE_PLATFORM_NO_VERSIONED_SONAME ON)
# The directory at which the Python import tree begins.
# See documentation for `declare_mlir_python_sources`'s ROOT_DIR
# argument.
set(TORCH_MLIR_PYTHON_ROOT_DIR "${CMAKE_CURRENT_SOURCE_DIR}/torch_mlir")
# We vendor our own MLIR instance in the `torch_mlir` namespace.
add_compile_definitions("MLIR_PYTHON_PACKAGE_PREFIX=torch_mlir.")
################################################################################
# Sources
################################################################################
declare_mlir_python_sources(TorchMLIRPythonSources)
declare_mlir_python_sources(TorchMLIRPythonExtensions)
declare_mlir_python_sources(TorchMLIRPythonSources.Dialects
ROOT_DIR "${TORCH_MLIR_PYTHON_ROOT_DIR}"
ADD_TO_PARENT TorchMLIRPythonSources
)
declare_mlir_dialect_python_bindings(
ADD_TO_PARENT TorchMLIRPythonSources.Dialects
ROOT_DIR "${TORCH_MLIR_PYTHON_ROOT_DIR}"
TD_FILE dialects/TorchBinding.td
SOURCES dialects/torch/__init__.py
DIALECT_NAME torch
)
declare_mlir_python_sources(TorchMLIRPythonSources.Importers
ROOT_DIR "${TORCH_MLIR_PYTHON_ROOT_DIR}"
ADD_TO_PARENT TorchMLIRPythonSources
SOURCES
[fx] Upstream the turbine FxImporter to torch-mlir. (#2681) Changes made during upstreaming: * Removed comments attributing some copied code back to torch-mlir (since it is now repatriated). * Re-organized imports. * Inlined RefMapping/RefTracker and TypeSubclassMap from an external utility module. * Added FxImporter class comments. * Updated stack trace extraction to be fail safe. * Added an entry-point for `import_frozen_exported_program` which uses the shiny new upstream `torch.export.export()` API (versus the lower-level/older API that Turbine is presently using). This necessitated a small FX rewrite to line external state management up with current conventions. * Adapted one of Turbine's importer tests to go with this initial submission. Turbine unfortunately has a lot of more-integration-ey tests, and I would like to extract those as more of unit tests of the importer features and upstream them that way vs trying to copy directly. For now, one overall test with the initial submission gets us moving. I acknowledge that there are some code quality things that could be improved in this submission: this was authored over the course of many months (and often via some trial and error). I would like to keep it relatively converged with the downstream for the next few steps while getting the test suite upstreamed. And then it will be easier to take a hygienic pass through the code. Including co-authors for contributors in the git log of the original repository. Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com> Co-authored-by: Avinash Sharma <aviator1994@gmail.com> Co-authored-by: Arham Khan <arhammkhan@gmail.com> Co-authored-by: brucekimrokcmu <kwangkyk@alumni.cmu.edu> Co-authored-by: saienduri <77521230+saienduri@users.noreply.github.com>
2023-12-22 00:40:10 +08:00
extras/fx_importer.py
Upstream the ONNX importer. (#2636) This is part 1 of 2, which will also include upstreaming the FX importer. I started with ONNX because it forces some project layout updates and is more self contained/easier as a first step. Deviating somewhat from the RFCs on project layout, I made the following decisions: * Locating the `onnx_importer.py` into `torch_mlir.extras` as Maks already has opened up that namespace and it seemed to fit. Better to have fewer things at that level. * Setup the build so that the root project only contains MLIR Python and pure Python deps (like the importers), but this can be augmented with the `projects/` adding more depending on which features are enabled. * The default build continues to build everything whereas in `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS=1` mode, it builds a `torch-mlir-core` wheel with the pure contents only. `onnx_importer.py` and `importer_smoke_test.py` are almost verbatim copies from SHARK-Turbine. I made some minor local alterations to adapt to paths and generalize the way they interact with the outer project. I expect I can copy these back to Turbine verbatim from here. I also updated the license boilerplate (they have the same license but slightly different project norms for the headers) but retained the correct copyright. Other updates: * Added the ONNX importer unit test (which also can generate test data) in lit, conditioned on the availability of the Python `onnx` package. In a followup once I know everything is stable, I'll add another env var that the CI can set to always enable this so we know conclusively if tests pass. * Moved the ONNX conversion readme to `docs/`. * Renamed CMake option `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS` -> `TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS` and inverted the sense. Made the JitIR importer and LTC options `cmake_dependent_options` for robustness.
2023-12-13 11:02:51 +08:00
extras/onnx_importer.py
)
declare_mlir_python_sources(TorchMLIRPythonSources.PublicAPI
ROOT_DIR "${TORCH_MLIR_PYTHON_ROOT_DIR}"
ADD_TO_PARENT TorchMLIRPythonSources
SOURCES
compiler_utils.py
fx.py
extras/fx_decomp_util.py
)
declare_mlir_python_sources(TorchMLIRPythonSources.Tools
ROOT_DIR "${TORCH_MLIR_PYTHON_ROOT_DIR}"
ADD_TO_PARENT TorchMLIRPythonSources
SOURCES
tools/import_onnx/__main__.py
)
[ci] Upgrade to new runners and disable unsupported jobs. (#2818) Per the RFC and numerous conversations on Discord, this rebuilds the torch-mlir CI and discontinues the infra and coupling to the binary releases (https://discourse.llvm.org/t/rfc-discontinuing-pytorch-1-binary-releases/76371). I iterated on this to get latency back to about what it was with the old (much larger and non-ephemeral) runners: About 4m - 4.5m for an incremental change. Behind the scenes changes: * Uses a new runner pool operated by AMD. It is currently set to manual scaling and has two runners (32-core, 64GiB RAM) while we get some traction. We can either fiddle with some auto-scaling or use a schedule to give it an increase during certain high traffic hours. * Builds are now completely isolated and cannot have run-to-run interference like we were getting before (i.e. lock file/permissions stuff). * The GHA runner is installed directly into a manylinux 2.28 container with upgraded dev tools. This eliminates the need to do sub-invocations of docker on Linux in order to run on the same OS that is used to build wheels. * While not using it now, this setup was cloned from another project that posts the built artifacts to the job and fans out testing. Might be useful here later. * Uses a special git cache that lets us have ephemeral runners and still check out the repo and deps (incl. llvm) in ~13s. * Running in an Azure VM Scale Set. In-repo changes: * Disables (but does not yet delete): * Old buildAndTest.yml jobs * releaseSnapshotPackage.yml * Adds a new `ci.yml` pipeline and scripts the steps in `build_tools/ci` (by decomposing the existing `build_linux_packages.sh` for in-tree builds and modularizing it a bit better). * Test framework changes: * Adds a `TORCH_MLIR_TEST_CONCURRENCY` env var that can be used to bound the multiprocess concurrency. Ended up not using this in the final version but is useful to have as a knob. * Changes the default concurrency to `nproc * 0.8 + 1` vs `nproc * 1.1`. We're running on systems with significantly less virtual memory and I did a bit of fiddling to find a good tradeoff. * Changed multiprocess mode to spawn instead of fork. Otherwise, I was getting instability (as discussed on discord). * Added MLIR configuration to disable multithreaded contexts globally for the project. Constantly spawning `nproc * nproc` threads (more than that actually) was OOM'ing. * Added a test timeout of 5 minutes. If a multiprocess worker crashes, the framework can get wedged indefinitely (and then will just be reaped after multiple hours). We should fix this, but this at least keeps the CI pool from wedging with stuck jobs. Functional changes needing followup: * No matter what I did, I couldn't get the LTC tests to work, and I'm not 100% sure they were being run in the old setup as the scripts were a bit twisty. I disabled them and left a comment. * Dropped out-of-tree build variants. These were not providing much signal and increase CI needs by 50%. * Dropped MacOS and Windows builds. Now that we are "just a library" and not building releases, there is less pressure to test these commit by commit. Further, since we bump torch-mlir to known good commits on these platforms, it has been a long time since either of these jobs have provided much signal (and they take ~an hour+ to run). We can add them back later post-submit if ever needed.
2024-01-28 10:35:45 +08:00
declare_mlir_python_sources(TorchMLIRSiteInitialize
ROOT_DIR "${TORCH_MLIR_PYTHON_ROOT_DIR}"
ADD_TO_PARENT TorchMLIRPythonSources
SOURCES
_mlir_libs/_site_initialize_0.py
)
Upstream the ONNX importer. (#2636) This is part 1 of 2, which will also include upstreaming the FX importer. I started with ONNX because it forces some project layout updates and is more self contained/easier as a first step. Deviating somewhat from the RFCs on project layout, I made the following decisions: * Locating the `onnx_importer.py` into `torch_mlir.extras` as Maks already has opened up that namespace and it seemed to fit. Better to have fewer things at that level. * Setup the build so that the root project only contains MLIR Python and pure Python deps (like the importers), but this can be augmented with the `projects/` adding more depending on which features are enabled. * The default build continues to build everything whereas in `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS=1` mode, it builds a `torch-mlir-core` wheel with the pure contents only. `onnx_importer.py` and `importer_smoke_test.py` are almost verbatim copies from SHARK-Turbine. I made some minor local alterations to adapt to paths and generalize the way they interact with the outer project. I expect I can copy these back to Turbine verbatim from here. I also updated the license boilerplate (they have the same license but slightly different project norms for the headers) but retained the correct copyright. Other updates: * Added the ONNX importer unit test (which also can generate test data) in lit, conditioned on the availability of the Python `onnx` package. In a followup once I know everything is stable, I'll add another env var that the CI can set to always enable this so we know conclusively if tests pass. * Moved the ONNX conversion readme to `docs/`. * Renamed CMake option `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS` -> `TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS` and inverted the sense. Made the JitIR importer and LTC options `cmake_dependent_options` for robustness.
2023-12-13 11:02:51 +08:00
################################################################################
# Extensions
################################################################################
declare_mlir_python_extension(TorchMLIRPythonExtensions.Main
MODULE_NAME _torchMlir
ADD_TO_PARENT TorchMLIRPythonExtensions
SOURCES
TorchMLIRModule.cpp
EMBED_CAPI_LINK_LIBS
TorchMLIRCAPI
PRIVATE_LINK_LIBS
LLVMSupport
)
################################################################################
# Generate packages and shared library
# Downstreams typically will not use these, but they are useful for local
# testing.
################################################################################
set(_source_components
# TODO: Core is now implicitly building/registering all dialects, increasing
# build burden by ~5x. Make it stop.
# TODO: Reduce dependencies. We need ExecutionEngine and a bunch of passes
# for the reference backend, but logically they can be separate. But seemingly
# the only way to handle that is to create a separate mlir python package
# tree, which seems excessive.
MLIRPythonSources
MLIRPythonExtension.Core
MLIRPythonExtension.RegisterEverything
TorchMLIRPythonSources
TorchMLIRPythonExtensions
[ci] Upgrade to new runners and disable unsupported jobs. (#2818) Per the RFC and numerous conversations on Discord, this rebuilds the torch-mlir CI and discontinues the infra and coupling to the binary releases (https://discourse.llvm.org/t/rfc-discontinuing-pytorch-1-binary-releases/76371). I iterated on this to get latency back to about what it was with the old (much larger and non-ephemeral) runners: About 4m - 4.5m for an incremental change. Behind the scenes changes: * Uses a new runner pool operated by AMD. It is currently set to manual scaling and has two runners (32-core, 64GiB RAM) while we get some traction. We can either fiddle with some auto-scaling or use a schedule to give it an increase during certain high traffic hours. * Builds are now completely isolated and cannot have run-to-run interference like we were getting before (i.e. lock file/permissions stuff). * The GHA runner is installed directly into a manylinux 2.28 container with upgraded dev tools. This eliminates the need to do sub-invocations of docker on Linux in order to run on the same OS that is used to build wheels. * While not using it now, this setup was cloned from another project that posts the built artifacts to the job and fans out testing. Might be useful here later. * Uses a special git cache that lets us have ephemeral runners and still check out the repo and deps (incl. llvm) in ~13s. * Running in an Azure VM Scale Set. In-repo changes: * Disables (but does not yet delete): * Old buildAndTest.yml jobs * releaseSnapshotPackage.yml * Adds a new `ci.yml` pipeline and scripts the steps in `build_tools/ci` (by decomposing the existing `build_linux_packages.sh` for in-tree builds and modularizing it a bit better). * Test framework changes: * Adds a `TORCH_MLIR_TEST_CONCURRENCY` env var that can be used to bound the multiprocess concurrency. Ended up not using this in the final version but is useful to have as a knob. * Changes the default concurrency to `nproc * 0.8 + 1` vs `nproc * 1.1`. We're running on systems with significantly less virtual memory and I did a bit of fiddling to find a good tradeoff. * Changed multiprocess mode to spawn instead of fork. Otherwise, I was getting instability (as discussed on discord). * Added MLIR configuration to disable multithreaded contexts globally for the project. Constantly spawning `nproc * nproc` threads (more than that actually) was OOM'ing. * Added a test timeout of 5 minutes. If a multiprocess worker crashes, the framework can get wedged indefinitely (and then will just be reaped after multiple hours). We should fix this, but this at least keeps the CI pool from wedging with stuck jobs. Functional changes needing followup: * No matter what I did, I couldn't get the LTC tests to work, and I'm not 100% sure they were being run in the old setup as the scripts were a bit twisty. I disabled them and left a comment. * Dropped out-of-tree build variants. These were not providing much signal and increase CI needs by 50%. * Dropped MacOS and Windows builds. Now that we are "just a library" and not building releases, there is less pressure to test these commit by commit. Further, since we bump torch-mlir to known good commits on these platforms, it has been a long time since either of these jobs have provided much signal (and they take ~an hour+ to run). We can add them back later post-submit if ever needed.
2024-01-28 10:35:45 +08:00
TorchMLIRSiteInitialize
Upstream the ONNX importer. (#2636) This is part 1 of 2, which will also include upstreaming the FX importer. I started with ONNX because it forces some project layout updates and is more self contained/easier as a first step. Deviating somewhat from the RFCs on project layout, I made the following decisions: * Locating the `onnx_importer.py` into `torch_mlir.extras` as Maks already has opened up that namespace and it seemed to fit. Better to have fewer things at that level. * Setup the build so that the root project only contains MLIR Python and pure Python deps (like the importers), but this can be augmented with the `projects/` adding more depending on which features are enabled. * The default build continues to build everything whereas in `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS=1` mode, it builds a `torch-mlir-core` wheel with the pure contents only. `onnx_importer.py` and `importer_smoke_test.py` are almost verbatim copies from SHARK-Turbine. I made some minor local alterations to adapt to paths and generalize the way they interact with the outer project. I expect I can copy these back to Turbine verbatim from here. I also updated the license boilerplate (they have the same license but slightly different project norms for the headers) but retained the correct copyright. Other updates: * Added the ONNX importer unit test (which also can generate test data) in lit, conditioned on the availability of the Python `onnx` package. In a followup once I know everything is stable, I'll add another env var that the CI can set to always enable this so we know conclusively if tests pass. * Moved the ONNX conversion readme to `docs/`. * Renamed CMake option `TORCH_MLIR_ENABLE_ONLY_MLIR_PYTHON_BINDINGS` -> `TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS` and inverted the sense. Made the JitIR importer and LTC options `cmake_dependent_options` for robustness.
2023-12-13 11:02:51 +08:00
# Sources related to optional Torch extension dependent features. Typically
# empty unless if project features are enabled.
TorchMLIRPythonTorchExtensionsSources
)
add_mlir_python_common_capi_library(TorchMLIRAggregateCAPI
INSTALL_COMPONENT TorchMLIRPythonModules
INSTALL_DESTINATION python_packages/torch_mlir/torch_mlir/_mlir_libs
OUTPUT_DIRECTORY "${TORCH_MLIR_PYTHON_PACKAGES_DIR}/torch_mlir/torch_mlir/_mlir_libs"
RELATIVE_INSTALL_ROOT ".."
DECLARED_SOURCES ${_source_components}
)
add_mlir_python_modules(TorchMLIRPythonModules
ROOT_PREFIX "${TORCH_MLIR_PYTHON_PACKAGES_DIR}/torch_mlir/torch_mlir"
INSTALL_PREFIX "python_packages/torch_mlir/torch_mlir"
DECLARED_SOURCES ${_source_components}
COMMON_CAPI_LINK_LIBS
TorchMLIRAggregateCAPI
)