Commit 30da6576 authored by Jan Hönig

Merge branch 'RemoveOpenCL' into 'master'

Removed OpenCL

See merge request pycodegen/pystencils!278
parents 0ed1a87b 9afc38bb
Pipeline #35861 passed with stages in 24 minutes and 13 seconds
......@@ -4,3 +4,4 @@
### Removed
* LLVM backend because it was not used much and not well integrated into pystencils.
* OpenCL backend because it was not used much and not well integrated into pystencils.
......@@ -53,7 +53,6 @@ Without `[interactive]` you get a minimal version with very little dependencies.
All options:
- `gpu`: use this if an NVIDIA GPU is available and CUDA is installed
- `opencl`: basic OpenCL support (experimental)
- `alltrafos`: pulls in additional dependencies for loop simplification, e.g. libisl
- `bench_db`: functionality to store benchmark results in object databases
- `interactive`: installs dependencies to work in Jupyter including image I/O, plotting etc.
......
%% Cell type:code id: tags:
``` python
from pystencils.session import *
```
%% Cell type:markdown id: tags:
# Tutorial 02: Basic Kernel generation with *pystencils*
Now that you have an [overview of pystencils](01_tutorial_getting_started.ipynb),
this tutorial shows in more detail how to formulate, optimize and run stencil kernels.
## 1) Kernel Definition
### a) Defining kernels with assignment lists and the `kernel` decorator
*pystencils* takes a symbolic formulation of the kernel. This can be either a single `Assignment` or a sequence of `Assignment`s that follow a set of restrictions.
Let's first create a kernel that consists of multiple assignments:
%% Cell type:code id: tags:
``` python
src_arr = np.zeros([20, 30])
dst_arr = np.zeros_like(src_arr)
dst, src = ps.fields(dst=dst_arr, src=src_arr)
```
%% Cell type:code id: tags:
``` python
grad_x, grad_y = sp.symbols("grad_x, grad_y")
symbolic_description = [
ps.Assignment(grad_x, (src[1, 0] - src[-1, 0]) / 2),
ps.Assignment(grad_y, (src[0, 1] - src[0, -1]) / 2),
ps.Assignment(dst[0, 0], grad_x + grad_y),
]
kernel = ps.create_kernel(symbolic_description)
symbolic_description
```
%%%% Output: execute_result
$$\left [ grad_{x} \leftarrow \frac{{{src}_{E}}}{2} - \frac{{{src}_{W}}}{2}, \quad grad_{y} \leftarrow \frac{{{src}_{N}}}{2} - \frac{{{src}_{S}}}{2}, \quad {{dst}_{C}} \leftarrow grad_{x} + grad_{y}\right ]$$
$\displaystyle \left[ grad_{x} \leftarrow \frac{{src}_{(1,0)}}{2} - \frac{{src}_{(-1,0)}}{2}, \ grad_{y} \leftarrow \frac{{src}_{(0,1)}}{2} - \frac{{src}_{(0,-1)}}{2}, \ {dst}_{(0,0)} \leftarrow grad_{x} + grad_{y}\right]$
[gradₓ := src_E/2 - src_W/2, grad_y := src_N/2 - src_S/2, dst_C := gradₓ + grad_y]
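To make concrete what the three assignments compute, here is a minimal pure-Python sketch of the equivalent loop nest. It is not code generated by *pystencils*: the helper `gradient_sum_reference` is hypothetical, it works on its own plain lists indexed as `values[x][y]`, and it skips the outermost layer of cells on every border because every access reaches at most one cell away.

```python
def gradient_sum_reference(src_values):
    """Hypothetical reference: dst = grad_x + grad_y with central differences.

    `src_values` is a list of rows indexed as src_values[x][y]; the outermost
    layer of cells is skipped so that every neighbor access stays inside the array.
    """
    nx, ny = len(src_values), len(src_values[0])
    dst_values = [[0.0] * ny for _ in range(nx)]
    for x in range(1, nx - 1):
        for y in range(1, ny - 1):
            grad_x = (src_values[x + 1][y] - src_values[x - 1][y]) / 2  # src[1, 0], src[-1, 0]
            grad_y = (src_values[x][y + 1] - src_values[x][y - 1]) / 2  # src[0, 1], src[0, -1]
            dst_values[x][y] = grad_x + grad_y
    return dst_values


src_values = [[x + 2 * y for y in range(5)] for x in range(4)]
dst_values = gradient_sum_reference(src_values)
print(dst_values[1][1])  # 3.0, since grad_x = 1 and grad_y = 2 for this linear field
```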
%% Cell type:markdown id: tags:
We created subexpressions, using standard sympy symbols on the left-hand side, to split the kernel into multiple assignments. Defining a kernel using a list of `Assignment`s is quite tedious and hard to read.
To simplify the formulation of a kernel, *pystencils* offers the `kernel` decorator, which transforms a normal Python function with `@=` assignments into an assignment list that can be passed to `create_kernel`.
%% Cell type:code id: tags:
``` python
@ps.kernel
def symbolic_description_using_function():
    grad_x @= (src[1, 0] - src[-1, 0]) / 2
    grad_y @= (src[0, 1] - src[0, -1]) / 2
    dst[0, 0] @= grad_x + grad_y
symbolic_description_using_function
```
%%%% Output: execute_result
$$\left [ grad_{x} \leftarrow \frac{{{src}_{E}}}{2} - \frac{{{src}_{W}}}{2}, \quad grad_{y} \leftarrow \frac{{{src}_{N}}}{2} - \frac{{{src}_{S}}}{2}, \quad {{dst}_{C}} \leftarrow grad_{x} + grad_{y}\right ]$$
$\displaystyle \left[ grad_{x} \leftarrow \frac{{src}_{(1,0)}}{2} - \frac{{src}_{(-1,0)}}{2}, \ grad_{y} \leftarrow \frac{{src}_{(0,1)}}{2} - \frac{{src}_{(0,-1)}}{2}, \ {dst}_{(0,0)} \leftarrow grad_{x} + grad_{y}\right]$
[gradₓ := src_E/2 - src_W/2, grad_y := src_N/2 - src_S/2, dst_C := gradₓ + grad_y]
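To give an idea of how a decorator can recognize `@=` statements, here is a hypothetical sketch based on Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It only collects the left- and right-hand sides as source strings; it is not the implementation of `ps.kernel`, which returns real `Assignment` objects.

```python
import ast
import textwrap


def collect_at_assignments(source):
    """Return (lhs, rhs) source strings for each `lhs @= rhs` statement in a function body."""
    function_def = ast.parse(textwrap.dedent(source)).body[0]
    pairs = []
    for statement in function_def.body:
        # `x @= y` parses as an augmented assignment with the matrix-multiplication operator
        if isinstance(statement, ast.AugAssign) and isinstance(statement.op, ast.MatMult):
            pairs.append((ast.unparse(statement.target), ast.unparse(statement.value)))
    return pairs


example_source = '''
def symbolic_description_using_function():
    grad_x @= (src[1, 0] - src[-1, 0]) / 2
    grad_y @= (src[0, 1] - src[0, -1]) / 2
    dst[0, 0] @= grad_x + grad_y
'''

for lhs, rhs in collect_at_assignments(example_source):
    print(lhs, "<-", rhs)
```

The source is only parsed, never executed, so `src` and `dst` do not need to exist for the sketch to run.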
%% Cell type:markdown id: tags:
The decorated function can contain any Python code; only the `@=` operator and the ternary inline `if-else` operator have a different meaning.
### b) Ternary 'if' with `Piecewise`
The ternary operator maps to `sympy.Piecewise` functions, which can be used to introduce branching into the kernel. Piecewise defined functions must give a value for every input, i.e. there must be an 'otherwise' clause at the end, indicated by the condition `True`. Piecewise objects are standard sympy terms that can be integrated into bigger expressions:
%% Cell type:code id: tags:
``` python
sp.Piecewise((1.0, src[0,1] > 0), (0.0, True)) + src[1, 0]
```
%%%% Output: execute_result
$${{src}_{E}} + \begin{cases} 1.0 & \text{for}\: {{src}_{N}} > 0 \\0.0 & \text{otherwise} \end{cases}$$
$\displaystyle {src}_{(1,0)} + \begin{cases} 1.0 & \text{for}\: {src}_{(0,1)} > 0 \\0.0 & \text{otherwise} \end{cases}$
src_E + {1.0 for src_N > 0; 0.0 otherwise}
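The first-match semantics of a piecewise definition can be sketched in plain Python. The helper `make_piecewise` is hypothetical and does not use sympy: each condition is a predicate of the input, the first piece whose condition holds determines the value, and, following the rule above, construction fails unless the last piece uses the literal condition `True`.

```python
def make_piecewise(*pieces):
    """Build f(x) from (value, condition) pairs; each condition is a predicate or the literal True."""
    if not pieces or pieces[-1][1] is not True:
        raise ValueError("the last piece must have the condition True (the 'otherwise' clause)")

    def evaluate(x):
        for value, condition in pieces:
            # the first piece whose condition holds determines the result
            if condition is True or condition(x):
                return value(x) if callable(value) else value

    return evaluate


indicator = make_piecewise((1.0, lambda s: s > 0), (0.0, True))
src_E = 2.0
print(src_E + indicator(0.5), src_E + indicator(-0.5))  # 3.0 2.0
```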
%% Cell type:markdown id: tags:
Piecewise objects are created by the `kernel` decorator for ternary if-else statements.
%% Cell type:code id: tags:
``` python
@ps.kernel
def kernel_with_piecewise():
    grad_x @= (src[1, 0] - src[-1, 0]) / 2 if src[-1, 0] > 0 else 0.0
kernel_with_piecewise
```
%%%% Output: execute_result
$$\left [ grad_{x} \leftarrow \begin{cases} \frac{{{src}_{E}}}{2} - \frac{{{src}_{W}}}{2} & \text{for}\: {{src}_{W}} > 0 \\0.0 & \text{otherwise} \end{cases}\right ]$$
$\displaystyle \left[ grad_{x} \leftarrow \begin{cases} \frac{{src}_{(1,0)}}{2} - \frac{{src}_{(-1,0)}}{2} & \text{for}\: {src}_{(-1,0)} > 0 \\0.0 & \text{otherwise} \end{cases}\right]$
[gradₓ := {src_E/2 - src_W/2 for src_W > 0; 0.0 otherwise}]
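How a ternary expression can be turned into a `Piecewise` call is sketched below with an `ast.NodeTransformer` (Python 3.9+). This is an assumption-based illustration of the idea, not the code inside `ps.kernel`: it only rewrites source text and never imports sympy.

```python
import ast


class TernaryToPiecewise(ast.NodeTransformer):
    """Rewrite `body if test else orelse` into `Piecewise((body, test), (orelse, True))`."""

    def visit_IfExp(self, node):
        self.generic_visit(node)  # rewrite nested ternaries first
        return ast.Call(
            func=ast.Name(id="Piecewise", ctx=ast.Load()),
            args=[
                ast.Tuple(elts=[node.body, node.test], ctx=ast.Load()),
                # the 'otherwise' clause is always the condition True
                ast.Tuple(elts=[node.orelse, ast.Constant(value=True)], ctx=ast.Load()),
            ],
            keywords=[],
        )


def ternary_to_piecewise(expression):
    tree = TernaryToPiecewise().visit(ast.parse(expression, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))


print(ternary_to_piecewise("(src_E - src_W) / 2 if src_W > 0 else 0.0"))
```

Nested ternaries (`a if c1 else b if c2 else d`) become nested `Piecewise` calls in this sketch.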
%% Cell type:markdown id: tags:
### c) Assignment level optimizations using `AssignmentCollection`
When the kernels get larger and more complex, it is helpful to organize the list of assignments in a more structured way. The `AssignmentCollection` offers optimizing transformations on a list of assignments. It holds two assignment lists, one for subexpressions and one for the main assignments. Main assignments are typically those that write to an array.
%% Cell type:code id: tags:
``` python
@ps.kernel
def somewhat_longer_dummy_kernel(s):
    s.a @= src[0, 1] + src[-1, 0]
    s.b @= 2 * src[1, 0] + src[0, -1]
    s.c @= src[0, 1] + 2 * src[1, 0] + src[-1, 0] + src[0, -1] - src[0,0]
    dst[0, 0] @= s.a + s.b + s.c
ac = ps.AssignmentCollection(main_assignments=somewhat_longer_dummy_kernel[-1:],
                             subexpressions=somewhat_longer_dummy_kernel[:-1])
ac
```
%%%% Output: execute_result
Equation Collection for dst_C
AssignmentCollection: dst_C, <- f(src_W, src_S, src_N, src_C, src_E)
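The split into subexpressions and main assignments, and the `f(...)` part of the output above (which appears to list the symbols the collection reads but never defines), can be mimicked with a small hypothetical dataclass. It works on source strings named after the compass notation of the outputs (`src_N` for `src[0, 1]`, and so on) and is not the `AssignmentCollection` API.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class MiniAssignmentCollection:
    """Hypothetical stand-in: assignments are (lhs, rhs) pairs of source strings."""
    main_assignments: list
    subexpressions: list = field(default_factory=list)

    def all_assignments(self):
        # subexpressions come first, because main assignments may use them
        return self.subexpressions + self.main_assignments

    def bound_symbols(self):
        return {lhs for lhs, _ in self.all_assignments()}

    def free_symbols(self):
        used = set()
        for _, rhs in self.all_assignments():
            tree = ast.parse(rhs, mode="eval")
            used |= {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
        return used - self.bound_symbols()


ac_sketch = MiniAssignmentCollection(
    main_assignments=[("dst_C", "a + b + c")],
    subexpressions=[("a", "src_N + src_W"),
                    ("b", "2 * src_E + src_S"),
                    ("c", "src_N + 2 * src_E + src_W + src_S - src_C")],
)
print(sorted(ac_sketch.free_symbols()))  # ['src_C', 'src_E', 'src_N', 'src_S', 'src_W']
```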
%% Cell type:code id: tags:
``` python
ac.operation_count
```
%%%% Output: execute_result
{'adds': 8, 'muls': 2, 'divs': 0}
{'adds': 8,
'muls': 2,
'divs': 0,
'sqrts': 0,
'fast_sqrts': 0,
'fast_inv_sqrts': 0,
'fast_div': 0}
%% Cell type:markdown id: tags:
The `pystencils.simp` submodule offers several functions to optimize a collection of assignments.
It also offers functionality to group optimizations into strategies and evaluate them.
In this example, we reduce the number of operations by reusing existing subexpressions, which removes two floating point additions and one multiplication. For more information about assignment collections and simplifications see the [demo notebook](demo_assignment_collection.ipynb).
%% Cell type:code id: tags:
``` python
opt_ac = ps.simp.subexpression_substitution_in_existing_subexpressions(ac)
opt_ac
```
%%%% Output: execute_result
Equation Collection for dst_C
AssignmentCollection: dst_C, <- f(src_W, src_S, src_N, src_C, src_E)
%% Cell type:code id: tags:
``` python
opt_ac.operation_count
```
%%%% Output: execute_result
{'adds': 6, 'muls': 1, 'divs': 0}
{'adds': 6,
'muls': 1,
'divs': 0,
'sqrts': 0,
'fast_sqrts': 0,
'fast_inv_sqrts': 0,
'fast_div': 0}
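The reduction from 8 to 6 additions and from 2 to 1 multiplication can be reproduced with a rough operation counter over the same expressions written as Python source, where `a`, `b` and `c` are the subexpressions of the kernel above. `count_operations` is a hypothetical helper, not the counting routine of *pystencils*; like the output above, it counts subtractions as additions.

```python
import ast


def count_operations(expressions):
    """Count binary operators in Python expression strings (subtractions count as additions)."""
    counts = {"adds": 0, "muls": 0, "divs": 0}
    for expression in expressions:
        for node in ast.walk(ast.parse(expression, mode="eval")):
            if not isinstance(node, ast.BinOp):
                continue
            if isinstance(node.op, (ast.Add, ast.Sub)):
                counts["adds"] += 1
            elif isinstance(node.op, ast.Mult):
                counts["muls"] += 1
            elif isinstance(node.op, ast.Div):
                counts["divs"] += 1
    return counts


before = ["src_N + src_W",                              # a
          "2 * src_E + src_S",                          # b
          "src_N + 2 * src_E + src_W + src_S - src_C",  # c
          "a + b + c"]                                  # dst_C
# after the substitution, c reuses the existing subexpressions a and b
after = before[:2] + ["a + b - src_C", before[3]]

print(count_operations(before))  # {'adds': 8, 'muls': 2, 'divs': 0}
print(count_operations(after))   # {'adds': 6, 'muls': 1, 'divs': 0}
```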
%% Cell type:markdown id: tags:
### d) Ghost layers and iteration region
When creating a kernel with neighbor accesses, *pystencils* automatically restricts the iteration region, such that all accesses are safe.
%% Cell type:code id: tags:
``` python
kernel = ps.create_kernel(ps.Assignment(dst[0,0], src[2, 0] + src[-1, 0]))
ps.show_code(kernel)
```
%%%% Output: display_data
%%%% Output: execute_result
%%%% Output: display_data
FUNC_PREFIX void kernel(double * RESTRICT fd_dst, double * RESTRICT const fd_src)
{
for (int ctr_0 = 2; ctr_0 < 18; ctr_0 += 1)
{
double * RESTRICT fd_dst_C = 30*ctr_0 + fd_dst;
double * RESTRICT const fd_src_2E = 30*ctr_0 + fd_src + 60;
double * RESTRICT const fd_src_W = 30*ctr_0 + fd_src - 30;
for (int ctr_1 = 2; ctr_1 < 28; ctr_1 += 1)
{
fd_dst_C[ctr_1] = fd_src_2E[ctr_1] + fd_src_W[ctr_1];
}
}
}
%% Cell type:markdown id: tags:
When no additional ghost layer information is given, *pystencils* looks at all neighboring field accesses and introduces the required number of ghost layers **for all directions**. In the example above, the largest neighbor access was ``src[2, 0]``, so in theory we would need 2 ghost layers only at the end of the x coordinate (and 1 at its start, because of ``src[-1, 0]``).
By default, *pystencils* therefore introduces 2 ghost layers at all borders of the domain. The next cell shows how to change this behavior. Be careful with manual ghost layer specification: wrong values may lead to segmentation faults.
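The rule above can be sketched in two steps: derive the minimal per-direction ghost layers from the access offsets, then turn a ghost layer specification into loop ranges. Both helpers are hypothetical. The uniform default mirrors the description above and reproduces the loop bounds of the generated code for the 20x30 arrays (`ctr_0` from 2 to 18, `ctr_1` from 2 to 28).

```python
def minimal_ghost_layers(offsets, dim):
    """Per-axis (lower, upper) ghost layers needed so that every offset stays in bounds."""
    return [(max(0, -min(offset[axis] for offset in offsets)),
             max(0, max(offset[axis] for offset in offsets)))
            for axis in range(dim)]


def iteration_ranges(shape, ghost_layers):
    """Loop range per axis, skipping `lower` cells at the start and `upper` cells at the end."""
    return [range(lower, size - upper) for size, (lower, upper) in zip(shape, ghost_layers)]


offsets = [(2, 0), (-1, 0)]                   # src[2, 0] and src[-1, 0]
tight = minimal_ghost_layers(offsets, dim=2)  # [(1, 2), (0, 0)]
width = max(abs(c) for offset in offsets for c in offset)
default = [(width, width)] * 2                # 2 ghost layers at every border

print(iteration_ranges((20, 30), default))  # [range(2, 18), range(2, 28)]
print(iteration_ranges((20, 30), tight))    # [range(1, 18), range(0, 30)]
```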
%% Cell type:code id: tags:
``` python
gl_spec = [(0, 2),  # 0 ghost layers at the left, 2 at the right border (src[-1, 0] then reads out of bounds!)
           (1, 0)]  # 1 ghost layer at the lower y, none at the upper y coordinate
kernel = ps.create_kernel(ps.Assignment(dst[0,0], src[2, 0] + src[-1, 0]), ghost_layers=gl_spec)
ps.show_code(kernel)
```
%%%% Output: display_data
%%%% Output: execute_result
%%%% Output: display_data
FUNC_PREFIX void kernel(double * RESTRICT fd_dst, double * RESTRICT const fd_src)
{
for (int ctr_0 = 0; ctr_0 < 18; ctr_0 += 1)
{
double * RESTRICT fd_dst_C = 30*ctr_0 + fd_dst;
double * RESTRICT const fd_src_2E = 30*ctr_0 + fd_src + 60;
double * RESTRICT const fd_src_W = 30*ctr_0 + fd_src - 30;
for (int ctr_1 = 1; ctr_1 < 30; ctr_1 += 1)
{
fd_dst_C[ctr_1] = fd_src_2E[ctr_1] + fd_src_W[ctr_1];
}
}
}
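A simple bounds check makes the risk of manual specifications concrete. The hypothetical helper below lists the access offsets that would leave an array of the given shape. Applied to `gl_spec` from the cell above, it flags `src[-1, 0]`: with 0 ghost layers at the lower x border, the first iteration (`ctr_0 = 0`) reads `fd_src - 30`, one row before the array.

```python
def out_of_bounds_offsets(shape, ghost_layers, offsets):
    """Return the access offsets that leave an array of `shape` for the given
    per-axis (lower, upper) ghost layer specification."""
    bad = []
    for offset in offsets:
        for size, (lower, upper), shift in zip(shape, ghost_layers, offset):
            first, last = lower, size - upper - 1  # first and last iterated index
            if first + shift < 0 or last + shift > size - 1:
                bad.append(offset)
                break
    return bad


accesses = [(2, 0), (-1, 0)]  # src[2, 0] and src[-1, 0]
print(out_of_bounds_offsets((20, 30), [(0, 2), (1, 0)], accesses))  # [(-1, 0)]
print(out_of_bounds_offsets((20, 30), [(1, 2), (0, 0)], accesses))  # []
```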
%% Cell type:markdown id: tags:
## 2) Restrictions
### a) Independence Restriction
*pystencils* only works for kernels where each array element can be updated independently from all other elements. This restriction ensures that the kernels can be easily parallelized and also be run on the GPU. Trying to define a kernel whose result depends on the iteration order leads to a `ValueError`.
%% Cell type:code id: tags:
``` python
invalid_description = [
    ps.Assignment(dst[1, 0], src[1, 0] + src[-1, 0]),
    ps.Assignment(dst[0, 0], src[1, 0] - src[-1, 0]),
]
try:
    invalid_kernel = ps.create_kernel(invalid_description)
    assert False, "Should never be executed"
except ValueError as e:
    print(e)
```
%%%% Output: stream
Field dst is written at two different locations
%% Cell type:markdown id: tags:
The independence restriction makes sure that the kernel can be safely parallelized by checking the following conditions: If a field is modified inside the kernel, it may only be modified at a single spatial position. In that case the field may also only be read at this position. Fields that are not modified may be read at multiple neighboring positions.
Specifically, this rule allows for in-place updates that don't access neighbors.
%% Cell type:code id: tags:
``` python
valid_kernel = ps.create_kernel(ps.Assignment(src[0,0], 2*src[0,0] + 42))
```
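The two conditions can be sketched as a small check on `(field, offset)` pairs. This is a simplified, hypothetical version for fields with a single value per cell, not the check *pystencils* actually runs; only its first error message imitates the output shown above.

```python
def check_independence(assignments):
    """Simplified loop-independence check for fields with one value per cell.

    Each assignment is (write, reads): write is a (field, offset) pair and
    reads is a list of (field, offset) pairs.
    """
    write_offsets = {}
    for (field_name, offset), _ in assignments:
        write_offsets.setdefault(field_name, set()).add(offset)
    # condition 1: a modified field is written at a single position only
    for field_name, offsets in write_offsets.items():
        if len(offsets) > 1:
            raise ValueError(f"Field {field_name} is written at two different locations")
    # condition 2: a modified field is read only at the position where it is written
    for _, reads in assignments:
        for field_name, offset in reads:
            if field_name in write_offsets and offset not in write_offsets[field_name]:
                raise ValueError(f"Field {field_name} is read at {offset} but written elsewhere")


invalid = [(("dst", (1, 0)), [("src", (1, 0)), ("src", (-1, 0))]),
           (("dst", (0, 0)), [("src", (1, 0)), ("src", (-1, 0))])]
try:
    check_independence(invalid)
except ValueError as e:
    print(e)  # Field dst is written at two different locations

check_independence([(("src", (0, 0)), [("src", (0, 0))])])  # in-place update: accepted
```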
%% Cell type:markdown id: tags:
If a field stores multiple values per cell, as in the next example, this restriction only applies to accesses with the same index.
%% Cell type:code id: tags:
``` python
v = ps.fields("v(2): double[2D]")
valid_kernel = ps.create_kernel([ps.Assignment(v[0,0](1), 2*v[0,0](1) + 42),
ps.Assignment(v[0,1](0), 2*v[1,0](0) + 42)])
```
%% Cell type:markdown id: tags:
### b) Static Single Assignment Form
All assignments that don't write to a field must be in SSA form:
1. Each sympy symbol may only occur once as a left-hand side (fields can be written multiple times).
2. A symbol has to be defined before it is used. If it is never defined, it is introduced as a function parameter.
The next cell demonstrates the first SSA restriction:
%% Cell type:code id: tags:
``` python
@ps.kernel
def not_allowed():
    a, b = sp.symbols("a b")
    a @= src[0, 0]
    b @= a + 3
    a @= src[-1, 0]
    dst[0, 0] @= a + b
try:
    ps.create_kernel(not_allowed)
    assert False
except ValueError as e:
    print(e)
```
%%%% Output: stream
Assignments not in SSA form, multiple assignments to a
%% Cell type:markdown id: tags:
However, for left-hand sides that are `Field.Access`es, this is allowed:
%% Cell type:code id: tags:
``` python
@ps.kernel
def allowed():
    dst[0, 0] @= src[0, 1] + src[1, 0]
    dst[0, 0] @= 2 * dst[0, 0]
ps.create_kernel(allowed)
```
%%%% Output: execute_result
KernelFunction kernel([<double * RESTRICT fd_dst>, <double * RESTRICT const fd_src>])
KernelFunction kernel([_data_dst, _data_src])
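Both SSA rules, and the exception for field writes, can be sketched as a single pass over the assignments. `check_ssa` is a hypothetical helper: plain symbols are strings, field writes are `(field, offset)` tuples, and only plain symbols are listed on the right-hand side. It returns the symbols that rule 2 would turn into kernel parameters; the first error message imitates the output shown above.

```python
def check_ssa(assignments):
    """Check the SSA rules on (lhs, rhs_symbols) pairs and return the implied parameters.

    A string lhs is a plain symbol, a tuple lhs such as ("dst", (0, 0)) is a field
    write. rhs_symbols lists only plain symbols; field reads are left out.
    """
    symbol_lhs = {lhs for lhs, _ in assignments if isinstance(lhs, str)}
    defined, parameters = set(), set()
    for lhs, rhs_symbols in assignments:
        for symbol in rhs_symbols:
            if symbol not in defined:
                if symbol in symbol_lhs:  # rule 2: defined later, but used here
                    raise ValueError(f"Symbol {symbol} is used before it is defined")
                parameters.add(symbol)  # never defined: becomes a kernel parameter
        if isinstance(lhs, str):
            if lhs in defined:  # rule 1: a second assignment to the same symbol
                raise ValueError(f"Assignments not in SSA form, multiple assignments to {lhs}")
            defined.add(lhs)
        # field writes (tuples) may repeat, as in the `allowed` kernel
    return parameters


not_allowed_sketch = [("a", []), ("b", ["a"]), ("a", []), (("dst", (0, 0)), ["a", "b"])]
try:
    check_ssa(not_allowed_sketch)
except ValueError as e:
    print(e)  # Assignments not in SSA form, multiple assignments to a

allowed_sketch = [(("dst", (0, 0)), []), (("dst", (0, 0)), [])]
print(check_ssa(allowed_sketch))                                      # set()
print(check_ssa([("half", ["omega"]), (("dst", (0, 0)), ["half"])]))  # {'omega'}
```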
......
......@@ -47,7 +47,7 @@ def generate_c(ast_node: Node,
    Args:
        ast_node: ast representation of kernel
        signature_only: generate signature without function body
        dialect: `Backend`: 'C', 'CUDA' or 'OPENCL'
        dialect: `Backend`: 'C' or 'CUDA'
        custom_backend: use own custom printer for code generation
        with_globals: enable usage of global variables
    Returns:
......@@ -71,9 +71,6 @@ def generate_c(ast_node: Node,
    elif dialect == Backend.CUDA:
        from pystencils.backends.cuda_backend import CudaBackend
        printer = CudaBackend(signature_only=signature_only)
    elif dialect == Backend.OPENCL:
        from pystencils.backends.opencl_backend import OpenClBackend
        printer = OpenClBackend(signature_only=signature_only)
    else:
        raise ValueError(f'Unknown {dialect=}')
    code = printer(ast_node)
......
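With the OpenCL branch removed, the visible part of the `if`/`elif` chain in `generate_c` only knows C and CUDA, so any other dialect ends in the `else` branch. The hypothetical sketch below models that dispatch with a stand-in enum and a lookup table. `Dialect` and the string `"CBackend"` are assumptions for illustration; the error message follows the `f'Unknown {dialect=}'` line above (Python 3.8+ for the `=` specifier).

```python
from enum import Enum


class Dialect(Enum):
    """Hypothetical stand-in for the Backend enum used by generate_c."""
    C = "c"
    CUDA = "cuda"
    OPENCL = "opencl"  # kept only to show what happens to a dialect without a printer


# Only C and CUDA keep a printer after the change ("CBackend" is an assumed name).
PRINTERS = {Dialect.C: "CBackend", Dialect.CUDA: "CudaBackend"}


def select_printer(dialect):
    if dialect not in PRINTERS:
        raise ValueError(f'Unknown {dialect=}')
    return PRINTERS[dialect]


print(select_printer(Dialect.CUDA))  # CudaBackend
try:
    select_printer(Dialect.OPENCL)
except ValueError as e:
    print(e)  # Unknown dialect=<Dialect.OPENCL: 'opencl'>
```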
acos
acosh
acospi
asin
asinh
asinpi
atan
atan2
atanh
atanpi
atan2pi
cbrt
ceil
copysign
cos
cosh
cospi
erfc
erf
exp
exp2
exp10
expm1
fabs
fdim
floor
fma
fmax
fmax
fmin45
fmin
fmod
fract
frexp
hypot
ilogb
ldexp
lgamma
lgamma_r
log
log2
log10
log1p
logb
mad
maxmag
minmag
modf
nextafter
pow
pown
powr
remquo
intn
remquo
rint
rootn
rootn
round
rsqrt
sin
sincos
sinh
sinpi
sqrt
tan
tanh
tanpi
tgamma
trunc
half_cos
half_divide
half_exp
half_exp2
half_exp10
half_log
half_log2
half_log10
half_powr
half_recip
half_rsqrt
half_sin
half_sqrt
half_tan
native_cos
native_divide
native_exp
native_exp2
native_exp10
native_log
native_log2
native_log10
native_powr
native_recip
native_rsqrt
native_sin
native_sqrt
native_tan
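The list above is read by `opencl_backend.py` with `{l.strip(): l.strip() for l in lines if l}`. Because `readlines()` keeps the trailing newline, a blank line (`"\n"`) is truthy and would add an empty-string key; filtering on the stripped line avoids that. Duplicate entries in the list (`fmax`, `remquo`, `rootn`) simply collapse into one dictionary key. A minimal sketch of the loading step, with the hypothetical helper name `load_known_functions`:

```python
def load_known_functions(lines):
    """Map every non-blank, stripped line to itself (duplicates collapse into one key)."""
    return {line.strip(): line.strip() for line in lines if line.strip()}


sample = ["cos\n", "\n", "fmax\n", "fmax\n"]
print(load_known_functions(sample))                         # {'cos': 'cos', 'fmax': 'fmax'}
print({line.strip(): line.strip() for line in sample if line})  # {'cos': 'cos', '': '', 'fmax': 'fmax'}
```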
from os.path import dirname, join

import pystencils.data_types
from pystencils.astnodes import Node
from pystencils.backends.cbackend import CustomSympyPrinter, generate_c
from pystencils.backends.cuda_backend import CudaBackend, CudaSympyPrinter
from pystencils.enums import Backend
from pystencils.fast_approximation import fast_division, fast_inv_sqrt, fast_sqrt

with open(join(dirname(__file__), 'opencl1.1_known_functions.txt')) as f:
    lines = f.readlines()
OPENCL_KNOWN_FUNCTIONS = {l.strip(): l.strip() for l in lines if l}


def generate_opencl(ast_node: Node, signature_only: bool = False, custom_backend=None, with_globals=True) -> str:
    """Prints an abstract syntax tree node (made for `Target` 'GPU') as OpenCL code. # TODO Backend instead of Target?
    Args:
        ast_node: ast representation of kernel
        signature_only: generate signature without function body
        custom_backend: use own custom printer for code generation
        with_globals: enable usage of global variables
    Returns:
        OpenCL code for the ast node and its descendants
    """
    return generate_c(ast_node, signature_only, dialect=Backend.OPENCL,
                      custom_backend=custom_backend, with_globals=with_globals)


class OpenClBackend(CudaBackend):

    def __init__(self,
                 sympy_printer=None,
                 signature_only=False):
        if not sympy_printer:
            sympy_printer = OpenClSympyPrinter()
        super().__init__(sympy_printer, signature_only)
        self._dialect = Backend.OPENCL

    def _print_Type(self, node):
        code = super()._print_Type(node)
        if isinstance(node, pystencils.data_types.PointerType):
            return "__global " + code
        else:
            return code

    def _print_ThreadBlockSynchronization(self, node):
        raise NotImplementedError()

    def _print_TextureDeclaration(self, node):
        raise NotImplementedError()


class OpenClSympyPrinter(CudaSympyPrinter):
    language = "OpenCL"

    DIMENSION_MAPPING = {
        'x': '0',
        'y': '1',
        'z': '2'
    }
    INDEXING_FUNCTION_MAPPING = {
        'blockIdx': 'get_group_id',
        'threadIdx': 'get_local_id',
        'blockDim': 'get_local_size',
        'gridDim': 'get_global_size'
    }

    def __init__(self):
        CustomSympyPrinter.__init__(self)
        self.known_functions = OPENCL_KNOWN_FUNCTIONS

    def _print_Type(self, node):
        code = super()._print_Type(node)
        if isinstance(node, pystencils.data_types.PointerType):
            return "__global " + code
        else:
            return code

    def _print_ThreadIndexingSymbol(self, node):
        symbol_name: str = node.name
        function_name, dimension = tuple(symbol_name.split("."))
        dimension = self.DIMENSION_MAPPING[dimension]
        function_name = self.INDEXING_FUNCTION_MAPPING[function_name]
        return f"(int64_t) {function_name}({dimension})"

    def _print_TextureAccess(self, node):
        raise NotImplementedError()
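For reference, the CUDA-to-OpenCL index translation performed by `_print_ThreadIndexingSymbol` can be reproduced as a standalone function. One deliberate difference from the deleted table: CUDA's `gridDim` counts blocks, which corresponds to OpenCL's `get_num_groups`, whereas `get_global_size` returns the total number of work-items in a dimension. The sketch therefore maps `gridDim` to `get_num_groups`. The function name `opencl_index_expression` is hypothetical.

```python
DIMENSION_MAPPING = {"x": "0", "y": "1", "z": "2"}
INDEXING_FUNCTION_MAPPING = {
    "blockIdx": "get_group_id",
    "threadIdx": "get_local_id",
    "blockDim": "get_local_size",
    "gridDim": "get_num_groups",  # differs from the deleted table, see the note above
}


def opencl_index_expression(cuda_symbol_name):
    """Translate a CUDA indexing symbol such as 'threadIdx.y' into an OpenCL call."""
    function_name, dimension = cuda_symbol_name.split(".")
    return f"(int64_t) {INDEXING_FUNCTION_MAPPING[function_name]}({DIMENSION_MAPPING[dimension]})"


print(opencl_index_expression("threadIdx.y"))  # (int64_t) get_local_id(1)
print(opencl_index_expression("gridDim.z"))    # (int64_t) get_num_groups(2)
```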