2021-09-30 00:03:40 +08:00
# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
# Also available under a BSD-style license. See LICENSE.
"""
# End-to-end testing framework for TorchScript.

For the purposes of this framework, "end to end" means the first "end" is
a `torch.nn.Module`, and the second "end" is execution.

## Architecture

A program for this testing framework is considered to be a `torch.nn.Module`,
which has a public interface consisting of its methods and instance attributes.

A test in the framework consists conceptually of a list of calls into
the methods of a module (TODO: extend to instance attributes). It is expected
that the outputs match between the program run on a backend (controlled by
a TestConfig) and a golden trace obtained by running on native Torch (without
compiling or TorchScript'ing).
"""

import abc
from typing import Any, Callable, List, NamedTuple, Optional, TypeVar, Union, Dict
Miscellaneous fixes for Windows builds (#1376)
* test: allow spaces in path to Python executable
On Windows, the path to the Python binary may contain spaces, so this
patch adds quotes around the path to the python executable.
Thanks to @sstamenova for suggesting the fix!
* python: remove header file that causes Windows build failures
Similar to https://reviews.llvm.org/D125284, we can safely remove this
header file without affecting the build on either Linux. It is
necessary to remove this header file on Windows builds since otherwise
it causes build errors.
* python: drop `TORCH_API` from function defined in Torch-MLIR
`TORCH_API` should apply to functions that are either exported by
libtorch.so or ones that are imported from libtorch.so by its downstream
consumers (like Torch-MLIR). Neither case applies to the
`importJitFunctionAsFuncOp()` function, since it is defined in
Torch-MLIR (and thus outside libtorch.so). This patch fixes the problem
by dropping `TORCH_API` from that function's declaration.
* python: make output of class anotations deterministic
The `class-annotator-repr.py` test checks for class annotations in a
specific order, but prior to this patch, the order was
non-deterministic, since the code iterated on an _unordered_ map.
This patch makes the iteration order deterministic through two changes:
1. using a sorted map
2. using the class qualified name instead of the address of the class in
memory
* test: use Python3_EXECUTABLE as interpreter path for consistency
This ensures that tests use the Python3 version that was detected using
CMake, instead of whichever python version that happens to be in the
PATH variable when invoking the test.
* test: fix RUN string
The parenthesis syntax does not run on Windows (the shell interprets the
`(` character as part of the path). Moreover, the ODR violation in the
comment no longer seems to apply.
* python: port parallel test framework to Windows
Since Windows does not support `fork` natively, Python's
`multiprocessing` module needs to use `spawn` on Windows. However, to
use `spawn`, the multiprocessing module serializes (or pickles) the
worker function and its arguments. Sadly, the multiprocessing module
(both the default one in Python and the one that is extended in PyTorch)
is unable to serialize lambda functions (see
https://stackoverflow.com/a/19985580) for detals.
Unfortunately, given how our tests are structured, we require that the
function under test is passed as an argument to another function, so we
cannot sidestep our use of lambda functions.
To resolve this problem, this patch makes use of the `multiprocess` and
`dill` Python modules, which together offers a multiprocessing mechanism
that can serialize lambda functions. The multiprocess module also
offers a process pool, which simplifies the code for our parallel
testing framework.
2022-09-30 01:07:43 +08:00
from itertools import repeat

import sys
import traceback

import torch
import multiprocess as mp

TorchScriptValue = Union[int, float, List['TorchScriptValue'],
                         Dict['TorchScriptValue',
                              'TorchScriptValue'], torch.Tensor]


class TraceItem(NamedTuple):
    # The externally visible symbol name that is called.
    # For example `"forward"` or `"submodule.forward"`.
    symbol: str
    # The inputs to the call.
    inputs: List[TorchScriptValue]
    # The output from the call.
    # In Python, there is only one output from a function. It might be a tuple
    # in case of "multiple results".
    # Sometimes this field is treated as golden outputs from a test.
    # Sometimes this field is treated as ignored, such as the input trace
    # provided to `TestConfig.run`.
    output: TorchScriptValue


# A trace of invocations to the program.
# This is an ordered sequence of external invocations to a program's
# public boundary.
Trace = List[TraceItem]


# Clone all the tensor values.
def clone_torch_script_value(v: TorchScriptValue):
    if isinstance(v, torch.Tensor):
        return v.clone()
    if isinstance(v, tuple):
        return tuple(clone_torch_script_value(field) for field in v)
    if isinstance(v, list):
        return [clone_torch_script_value(item) for item in v]
    if isinstance(v, dict):
        return {
            clone_torch_script_value(key): clone_torch_script_value(val)
            for key, val in v.items()
        }
    if isinstance(v, float) or isinstance(v, int) or isinstance(v, str):
        return v
    assert False, "unhandled cloning of TorchScriptValue value type"
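
The recursive clone above can be exercised without PyTorch. The following is a minimal sketch, assuming a hypothetical `Box` class that stands in for `torch.Tensor` (only its `clone()` method matters) and a hypothetical `clone_value` helper that mirrors the structure of `clone_torch_script_value`. It illustrates that the copy is unaffected by a later in-place change to the original.

```python
from typing import Any


class Box:
    """Hypothetical mutable stand-in for a tensor; not part of the framework."""

    def __init__(self, data):
        self.data = list(data)

    def clone(self) -> "Box":
        return Box(self.data)


def clone_value(v: Any) -> Any:
    """Recursively copy nested containers, cloning every Box leaf."""
    if isinstance(v, Box):
        return v.clone()
    if isinstance(v, tuple):
        return tuple(clone_value(field) for field in v)
    if isinstance(v, list):
        return [clone_value(item) for item in v]
    if isinstance(v, dict):
        return {clone_value(key): clone_value(val) for key, val in v.items()}
    if isinstance(v, (float, int, str)):
        return v
    raise TypeError(f"unhandled value type: {type(v).__name__}")


original = {"xs": [Box([1, 2]), 3.0], "pair": (Box([4]), "tag")}
snapshot = clone_value(original)

# Mutate the original in place; the snapshot keeps the old contents.
original["xs"][0].data[0] = 99
print(snapshot["xs"][0].data)
```

The sketch raises `TypeError` for unknown types instead of using `assert False`; that is a choice made for the illustration, not a description of the framework's behavior.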


# This clone helper works around issues with output tensors when the
# multiprocessing module is used to run tests. The error happens for tests like
# ContiguousModule_basic, where the output tensor aliases an input tensor.
# When the output tensor is not cloned, the testing trace is modified for an
# unknown reason when it is passed through shared memory (for example, through
# a synchronized queue).
# TODO: Figure out the root cause of the failure and fix properly.
def clone_trace(trace: Trace) -> Trace:
    return [
        TraceItem(symbol=item.symbol,
                  inputs=item.inputs,
                  output=clone_torch_script_value(item.output))
        for item in trace
    ]


# A type shared between the result of `TestConfig.compile` and the input
# to `TestConfig.run`. Each backend will likely have a different definition of
# this type.
CompiledArtifact = TypeVar('CompiledArtifact')


class TestConfig(abc.ABC):
    """The interface implemented by backends to run tests.

    The testing framework expects to be able to call `compile` to compile
    a torch.nn.Module, and then pass the compiled artifact to `run` to run it.

    Note that the definition of "compiled artifact" here is quite loose, and
    this interface allows for many different use cases besides simple testing.

    For example, this interface can be overridden to be a "data collector"
    to gather information across all the test cases. For instance,
    a compiler backend could override "compile" to just return some IR at a
    useful intermediate abstraction level (rather than the final compiled
    artifact), and then have "run" save this intermediate IR + the trace as
    input to some lower-level software stack's testing format.

    The set of TestConfig's is expected to be pluggable and provided by
    users to suit their own needs. We provide a few configs out of the box
    in the `configs` submodule of this package, but those are intended
    to be for basic inspiration and enough for our own testing.
    Backends to torch-mlir will likely have more elaborate TestConfig's, such
    as `compile` being "compile for such-and-such DSP with these vectorization
    cost model flags" and `run` being "connect to Android phone with
    device ID 1234 and upload a program to run on its DSP core, and also set
    power throttling settings to 'performance'".

    That is also why this class is not called "backend", as it
    encapsulates potentially many specific details of the test configuration
    process as well. There isn't a general way to disentangle test configuration
    from the compile/run process specific to a logical backend, since each
    backend (compiler backend and runtime target) will have an arbitrarily
    wild and wonderful set of possible configurations that we cannot predict.
    """
    # This is not a frontend-lowered module, to allow various testing at the PyTorch level.
    # We can have a helper class LinalgOnTensorsBackendTestConfig which does that.
    @abc.abstractmethod
    def compile(self, program: torch.nn.Module) -> CompiledArtifact:
        """Compile the provided torch.nn.Module into a compiled artifact"""
        pass

    # Any should match result of `compile`.

    @abc.abstractmethod
    def run(self, artifact: CompiledArtifact, trace: Trace) -> Trace:
        """Run the compiled artifact produced by `compile`.

        The backend should load the compiled artifact and call the
        symbol names listed in `trace` with their respective inputs (the outputs
        of `trace` should be ignored). A new identical trace with outputs
        populated should be returned.

        This method should assume that `artifact` is being shared with
        multiple parallel invocations of `run`, and so it should not be mutated.
        This property is typically trivially satisfied for a true
        "compiled artifact", but some backends don't directly involve a
        compiled artifact per se (like a backend for which `CompiledArtifact` is
        `torch.nn.Module` and `run` just invokes the torch.nn.Module itself).

        Args:
            artifact: a compiled artifact produced by `compile`.
            trace: The external invocations to stimulate the module.
        Returns:
            A trace with outputs recorded according to the results of running
            on this backend.
        """
        pass
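
To make the `compile`/`run` contract concrete, here is a minimal sketch that does not depend on PyTorch. `MiniTraceItem`, `PassthroughConfig`, and `Doubler` are hypothetical names introduced only for illustration; `compile` returns the program unchanged and `run` resolves each dotted symbol with `getattr`, which is one plausible way to write a native reference config rather than a description of any config shipped with the framework.

```python
from typing import Any, List, NamedTuple


class MiniTraceItem(NamedTuple):
    """Hypothetical, torch-free analogue of TraceItem."""
    symbol: str
    inputs: List[Any]
    output: Any


class PassthroughConfig:
    """Hypothetical config: the "compiled artifact" is the program itself."""

    def compile(self, program: Any) -> Any:
        return program

    def run(self, artifact: Any, trace: List[MiniTraceItem]) -> List[MiniTraceItem]:
        result = []
        for item in trace:
            # Resolve a dotted symbol such as "submodule.forward".
            target = artifact
            for part in item.symbol.split("."):
                target = getattr(target, part)
            # The outputs in the incoming trace are ignored and recomputed.
            output = target(*item.inputs)
            result.append(MiniTraceItem(item.symbol, item.inputs, output))
        return result


class Doubler:
    """Hypothetical program standing in for a torch.nn.Module."""

    def forward(self, x):
        return 2 * x


config = PassthroughConfig()
artifact = config.compile(Doubler())
golden = [MiniTraceItem("forward", [21], None)]
observed = config.run(artifact, golden)
print(observed[0].output)
```

Note that `run` builds a new list and never mutates `artifact` or the incoming trace, in line with the sharing requirement stated in the docstring.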


# Utilities for common testing trace generation.
# Also, resets the random seed for reproducibility.
# TODO: If generating in parallel, how to have manual_seed be local?
class TestUtils:
    """Utilities for executing a test.

    Test cases are provided an instance of this class to make test cases
    more succinct.

    For reproducibility, this class also resets the random seed.
    TODO: Figure out how to scope the seed reset to just a test case
    (such as when running tests in parallel)
    """

    def __init__(self):
        torch.manual_seed(0)

    # TODO: Add zeros/ones/etc. as convenient.
    def rand(self, *sizes, low=0.0, high=1.0):
        return torch.empty(sizes).uniform_(low, high)

    def randint(self, *sizes, low=0, high=10):
        return torch.randint(low, high, sizes)

    def nans(self, *sizes):
        vals = torch.empty(sizes)
        vals[...] = torch.nan
        return vals


class Test(NamedTuple):
    """A description of a test as produced by the test frontend.
    """
    # Stable name for error reporting.
    #
    # This name's stability is also useful for backends, which want to
    # generate their own lower-level test suites based on this framework.
    #
    # It is expected that those backends will need additional
    # metadata to describe their test configurations, so having a unique
    # key to keep that information associated is important.
    unique_name: str
    # A callable which produces the module under test.
    # This is a callable to allow lazily creating the module.
    program_factory: Callable[[], torch.nn.Module]
    # A callable which provides external stimuli to the module.
    # The first parameter is a torch.nn.Module (or a `_Tracer` wrapping that
    # module, actually).
    # The second parameter is a `TestUtils` instance for convenience.
    program_invoker: Callable[[Any, TestUtils], None]


class TestResult(NamedTuple):
    # Stable unique name for error reporting and test suite configuration.
    #
    # Tests frequently need some additional data (such as expected pass/fail
    # status, desired test configurations, etc.), and this gives a key to
    # associate that data with. This avoids extending this class arbitrarily
    # for every possible requirement from the test framework.
    #
    # This name is also useful for backends that are generating their own
    # lower-level test suites from this framework for the same reasons, though
    # those reasons are stronger because we cannot simply extend this
    # class.
    unique_name: str  # Should match Test.unique_name for corresponding test.
    # If compilation failed, a string describing the failure.
    # If this is not None, then the `trace` and `golden_trace` fields are None,
    # and vice-versa.
    compilation_error: Optional[str]
    # If runtime failed, a string describing the failure.
    # If this is not None, then the `trace` and `golden_trace` fields are None,
    # and vice-versa.
    runtime_error: Optional[str]
    # The trace produced by the backend.
    trace: Optional[Trace]
    # The golden trace which `trace` is expected to match.
    golden_trace: Optional[Trace]


class _Tracer:
    """Wrapper around a `torch.nn.Module` that records calls into it.

    The inputs and outputs of each call are recorded in a Trace. Recursive
    property accesses are also traced.
    """

    def __init__(self, wrapped, property_base_path: List[str], trace: Trace):
        self.__wrapped__ = wrapped
        self.__trace__ = trace
        self.__property_base_path__ = property_base_path

    def __call__(self, *args, **kwargs):
        # Clone the inputs to capture the original tensor values. This is
        # needed because in-place mutation might happen to the input tensors.
        inputs = [clone_torch_script_value(arg) for arg in args]
        output = self.__wrapped__(*args, **kwargs)
        self.__trace__.append(
            TraceItem(symbol=".".join(self.__property_base_path__),
                      inputs=inputs,
                      output=output))
        return output

    def __getattr__(self, name):
        return _Tracer(getattr(self.__wrapped__, name),
                       self.__property_base_path__ + [name], self.__trace__)
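
The way dotted symbol names arise from chained attribute access can be seen in a small torch-free sketch. `MiniTracer`, `Inner`, and `Outer` are hypothetical names used only for illustration; unlike `_Tracer`, this simplified version records plain tuples and does not clone inputs.

```python
from typing import Any, List, Tuple


class MiniTracer:
    """Hypothetical, simplified analogue of `_Tracer` for plain objects."""

    def __init__(self, wrapped: Any, path: List[str],
                 trace: List[Tuple[str, tuple, Any]]):
        self._wrapped = wrapped
        self._path = path
        self._trace = trace

    def __call__(self, *args):
        output = self._wrapped(*args)
        # Record the dotted path, the positional inputs, and the output.
        self._trace.append((".".join(self._path), args, output))
        return output

    def __getattr__(self, name: str) -> "MiniTracer":
        # Only reached for names not found on the tracer itself, so every
        # attribute of the wrapped object comes back wrapped with a longer path.
        return MiniTracer(getattr(self._wrapped, name),
                          self._path + [name], self._trace)


class Inner:
    def forward(self, x):
        return x + 1


class Outer:
    def __init__(self):
        self.submodule = Inner()

    def forward(self, x):
        return x * 10


trace: list = []
tracer = MiniTracer(Outer(), [], trace)
tracer.forward(2)
tracer.submodule.forward(2)
print([symbol for symbol, _, _ in trace])
```

All wrappers share the same `trace` list, so calls made through any nested wrapper are appended in invocation order.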


def generate_golden_trace(test: Test) -> Trace:
    """Generate a trace with the original program.

    If the original program is deterministic, then the produced trace is
    suitable as a golden trace to compare against.
    """
    trace = []
    tracer = _Tracer(test.program_factory(), [], trace)
    test.program_invoker(tracer, TestUtils())
    return trace


def compile_and_run_test(test: Test, config: TestConfig, verbose=False) -> Any:
    try:
        golden_trace = generate_golden_trace(test)
        if verbose:
            print(f"Compiling {test.unique_name}...", file=sys.stderr)
        compiled = config.compile(test.program_factory())
    except Exception as e:
        return TestResult(unique_name=test.unique_name,
                          compilation_error="".join(
                              traceback.format_exception(
                                  type(e), e, e.__traceback__)),
                          runtime_error=None,
                          trace=None,
                          golden_trace=None)
    try:
        if verbose:
            print(f"Running {test.unique_name}...", file=sys.stderr)
        trace = config.run(compiled, golden_trace)
    except Exception as e:
        return TestResult(unique_name=test.unique_name,
                          compilation_error=None,
                          runtime_error="".join(
                              traceback.format_exception(
                                  type(e), e, e.__traceback__)),
                          trace=None,
                          golden_trace=None)
    return TestResult(unique_name=test.unique_name,
                      compilation_error=None,
                      runtime_error=None,
                      trace=clone_trace(trace),
                      golden_trace=golden_trace)
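
The error-capture pattern used in `compile_and_run_test` (turning an exception into a formatted string instead of letting it propagate) can be isolated as follows. This is a minimal sketch; the `capture` helper is hypothetical and exists only for this illustration.

```python
import traceback
from typing import Any, Callable, Optional, Tuple


def capture(fn: Callable[[], Any]) -> Tuple[Any, Optional[str]]:
    """Hypothetical helper: return (value, None) or (None, formatted traceback)."""
    try:
        return fn(), None
    except Exception as e:
        # Same formatting call as in compile_and_run_test.
        return None, "".join(
            traceback.format_exception(type(e), e, e.__traceback__))


value, error = capture(lambda: 1 / 0)
ok_value, ok_error = capture(lambda: 7)
print(error is not None, ok_value)
```

Storing the traceback as a string keeps the result picklable, which matters when results are sent back from worker processes.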


def run_tests(tests: List[Test], config: TestConfig, sequential=False, verbose=False) -> List[TestResult]:
    """Invoke the given `Test`'s with the provided `TestConfig`."""
    num_processes = min(int(mp.cpu_count() * 1.1), len(tests))
    # TODO: We've noticed that on certain 2-core machines, parallelizing the
    # tests makes the llvm backend legacy pass manager 20x slower than using a
    # single process. Need to investigate the root cause eventually. This is a
    # hack to work around this issue.
    # Also our multiprocessing implementation is not the most efficient, so
    # the benefit at core count 2 is probably not worth it anyway.
    if mp.cpu_count() == 2:
        num_processes = 1

    # Sort the tests to make output nicer.
    tests = list(sorted(tests, key=lambda t: t.unique_name))

    # TODO: If num_processes == 1, then run without any of the multiprocessing
    # machinery. In theory it should work, but any crash in the testing process
    # seems to cause a cascade of failures resulting in undecipherable error
    # messages.
    if num_processes == 1 or sequential:
        return [compile_and_run_test(test, config, verbose) for test in tests]

    # This is needed because autograd does not support crossing process
    # boundaries.
    torch.autograd.set_grad_enabled(False)

    pool = mp.Pool(num_processes)
    # Forward `verbose` so the parallel path matches the sequential one.
    arg_list = zip(tests, repeat(config), repeat(verbose))
    handles = pool.starmap_async(compile_and_run_test, arg_list)
    results = handles.get()

    tests_with_results = {result.unique_name for result in results}
    all_tests = {test.unique_name for test in tests}
    # For processes that crashed due to a compile-time or runtime error,
    # the error outputs are printed out all together, but no TestResult is
    # produced for the crashed process.
    # TODO: Find a clean way to capture the output from a crashed process and
    # create a more detailed runtime_error for those tests.
    aborted_tests = all_tests - tests_with_results
    aborted_tests_results = [
        TestResult(
            unique_name=aborted_test_name,
            compilation_error=None,
            runtime_error=
            "Testing process terminated. Either the compiler crashed or the compiled code crashed at runtime.\n",
            trace=None,
            golden_trace=None) for aborted_test_name in aborted_tests
    ]
    results.extend(aborted_tests_results)
    results.sort(key=lambda result: result.unique_name)
    return results
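
The fan-out and back-fill steps of `run_tests` can be sketched with the standard library alone. This is an illustration under stated assumptions: it swaps in `multiprocessing.pool.ThreadPool` for the `multiprocess` process pool so that it runs anywhere without pickling concerns, and `fake_compile_and_run`, `run_all`, and the placeholder message are hypothetical.

```python
from itertools import repeat
from multiprocessing.pool import ThreadPool
from typing import List, Tuple


def fake_compile_and_run(name: str, config: str) -> Tuple[str, str]:
    """Hypothetical worker: returns (test name, outcome description)."""
    return name, f"ran with {config}"


def run_all(names: List[str], config: str,
            num_workers: int = 2) -> List[Tuple[str, str]]:
    # Pair every test with the same config, as zip(tests, repeat(config)) does.
    arg_list = zip(names, repeat(config))
    with ThreadPool(num_workers) as pool:
        results = pool.starmap_async(fake_compile_and_run, arg_list).get()

    # Back-fill a placeholder for any test that produced no result.
    missing = set(names) - {name for name, _ in results}
    results.extend((name, "no result produced") for name in missing)
    results.sort(key=lambda result: result[0])
    return results


print(run_all(["b_test", "a_test"], "native"))
```

A thread pool shares memory with the caller, so it does not reproduce the process-isolation or serialization behavior of the real process pool; only the argument pairing, the set difference, and the final sort carry over.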