Commit b7319af9 authored by Lukas Werner

Added initial end user documentation

parent cf8de2af
## Prerequisites
* HPC account at the RRZE
* SSH key pair for authentication: public key registered with the RRZE, private key kept for later use
## GitLab configuration
### Setting up authentication
To ensure that a given repository is entitled to run CI jobs as a given user, an authentication strategy based on SSH keys is employed.
Most CI configuration happens in the `.gitlab-ci.yml` file.
However, to make the private SSH key available in the pipeline without exposing it in `.gitlab-ci.yml`, a "secret" CI variable needs to be set.
To do so, navigate to `Settings > CI/CD > Variables` on your repository's GitLab page.
Click `Add variable`, enter "AUTH_KEY" as `Key` and your SSH private key as `Value`, and confirm the dialog.
A second secret variable, your HPC account name, has to be added as well.
Click `Add variable` again, this time entering "AUTH_USER" as `Key` and your HPC username as `Value`, and confirm the dialog.
Together with the public key deposited on the cluster, this ensures proper authentication.
### Customizing `.gitlab-ci.yml`
In this file, we configure options for the SLURM submission on the test cluster.
An example configuration can be found in `example.gitlab-ci.yml`.
SLURM options can be set either globally in the `variables` section or on a per-job basis.
Per-job variables override global variables of the same name.
```yaml
variables:
  SLURM_NODELIST: "phinally"
  SLURM_TIMELIMIT: "30"
# ...
benchmark-broadep2:
  variables:
    SLURM_NODELIST: "broadep2"  # uses broadep2 instead of phinally for this benchmark
    SLURM_TIMELIMIT: "10"       # limit time to 10 instead of 30 minutes
```
This configuration already suffices to run the CI jobs on the node `phinally` (except for `benchmark-broadep2`, which overrides the node).
To pick the node your job runs on, set `SLURM_NODELIST` to the node's hostname.
`SLURM_NODELIST` can only hold a single entry, as using multiple nodes at once is not possible on the test cluster.
A list of available nodes with their descriptions can be found in the [test cluster documentation](https://hpc.fau.de/systems-services/systems-documentation-instructions/clusters/test-cluster/).
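Since each job can target only one node, running the same benchmark on several nodes means defining one job per node. A minimal sketch using GitLab's `parallel:matrix` keyword to generate those jobs, assuming the runner treats matrix variables like ordinary job variables; the job name `benchmark` and the script `./run_benchmark.sh` are hypothetical:

```yaml
benchmark:                    # hypothetical job name
  parallel:
    matrix:
      # one job is generated per value, each with a single-entry SLURM_NODELIST
      - SLURM_NODELIST: ["phinally", "broadep2"]
  script:
    - ./run_benchmark.sh      # hypothetical benchmark script
```

Each generated job still satisfies the single-node restriction, because every job receives exactly one hostname.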
A few restrictions apply to the SLURM options:
* The SLURM partition (i.e. `SLURM_PARTITION`) is hardcoded to "work".
* The number of nodes (`SLURM_NODES`) is hardcoded to 1, since only individual nodes can be used on the test cluster.
* The time limit of a single job (`SLURM_TIMELIMIT`) is capped at 120 minutes, which is also the default.
* The default node (i.e. if no `SLURM_NODELIST` is given) is "phinally".
* To enable LIKWID (optional), add `SLURM_CONSTRAINT: "hwperf"` as a variable.
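The following sketch combines these restrictions in a single job: it sets no partition or node count (both are hardcoded), stays below the 120-minute cap, and requests the `hwperf` constraint for LIKWID. The job name `benchmark-likwid` and the script `./run_benchmark.sh` are hypothetical:

```yaml
benchmark-likwid:               # hypothetical job name
  variables:
    SLURM_NODELIST: "broadep2"  # a single hostname only
    SLURM_TIMELIMIT: "60"       # minutes; must not exceed 120
    SLURM_CONSTRAINT: "hwperf"  # enables LIKWID on the allocated node
  script:
    - ./run_benchmark.sh        # hypothetical benchmark script
```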
In fact, almost all other options of the [`salloc` command](https://slurm.schedmd.com/salloc.html) used to submit your job can be customized.
To do so, take the option name, e.g. `--mail-user`, strip the leading dashes, convert it to uppercase, replace the remaining dashes with underscores, and prepend `SLURM_`; this yields e.g. `SLURM_MAIL_USER`.
Use this string as the variable name and set the variable value as desired.
```yaml
variables:
  SLURM_MAIL_USER: your@email.address
  # ...
```
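Applying the same naming rule to another `salloc` option, `--mail-type` becomes `SLURM_MAIL_TYPE`. A minimal sketch, assuming you only want mail when a job ends or fails; the address is a placeholder:

```yaml
variables:
  SLURM_MAIL_USER: your@email.address  # from --mail-user
  SLURM_MAIL_TYPE: "END,FAIL"          # from --mail-type: notify on job end or failure
```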
To disable SLURM submission for an individual job, add `NO_SLURM_SUBMIT: 1` to its variables.
## Notes
A directory named `gitlab-runner` will be created in your `$WORK` directory.
It contains the build and execution directories and files for your CI jobs.
Your CI job may fail if the node is occupied by other jobs for more than 24 hours.
In that case, simply restart the CI job.
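Instead of restarting by hand, GitLab's `retry` keyword can re-run a failed job automatically. A sketch, assuming the 24-hour failure surfaces as an ordinary job failure; note that this also retries genuine benchmark failures:

```yaml
benchmark-broadep2:
  retry: 2                      # re-run up to two more times on failure (GitLab's maximum)
  variables:
    SLURM_NODELIST: "broadep2"
```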