NEC - basic usage
Note
A PDF version of the materials contained herein (including SOL) is also available.
Info
This document aims to provide basic information on how to use the NEC SX-Aurora Tsubasa system available at ICM UW computational facility. The contents herein are based on a number of documents, as referenced in the text, to provide a concise quick start guide and suggest further reading material for the ICM users.
To use the Tsubasa installation, users must first access the login node at hpc.icm.edu.pl through SSH and then establish a further connection to the Rysy cluster:
ssh username@hpc.icm.edu.pl
ssh rysy
Alternatively, the -J command line option can be passed to the OpenSSH client to specify a jump host (here the hpc login node) through which the connection to Rysy will be established (issue the man ssh command for details).
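For example, the two-step connection above can be collapsed into a single command – a sketch assuming the same username is used on both hosts (adjust as needed):

ssh -J username@hpc.icm.edu.pl username@rysy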
The system runs Slurm Workload Manager for job scheduling and Environment Modules to manage software. The single compute node (PBaran) of the ve partition can be used interactively – as shown below – or as a batch job (see further in the text).
srun -A GRANT_ID -p ve --gres=ve:1 --pty bash -l
Once the interactive shell session has started, the environmental variable $VE_NODE_NUMBER is automatically set to control which VE card is to be used by the user programs. This variable can be read and set manually with the echo and export commands, respectively. The software used to operate the VEs – including binaries, libraries, header files, etc. – is installed in the /opt/nec/ve directory. Its effective use requires modification of environmental variables such as $PATH, $LD_LIBRARY_PATH and others, which can be done conveniently with the source command:
source /opt/nec/ve/mpi/2.2.0/bin/necmpivars.sh
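Coming back to the VE selection, the variable mentioned above can be inspected and, if required, changed by hand – a minimal sketch (the card number 1 is purely illustrative):

echo $VE_NODE_NUMBER      # show which VE card user programs will run on
export VE_NODE_NUMBER=1   # select a different card, e.g. card 1 (illustrative)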
Sourcing the necmpivars.sh script makes various VE tools accessible within the user environment. This includes the NEC compilers for the C, C++, and Fortran languages, which can be invoked by ncc, nc++, and nfort, respectively, or by their respective MPI wrappers: mpincc, mpinc++, and mpinfort. Please note that several compiler versions are currently installed and it might be necessary to include a version number in your command, e.g. ncc-2.5.1. The general usage is consistent with the GNU GCC: <compiler> <options> <source file>. The table below lists several standard options for the NEC compilers – see the documentation for details.
| Option | Description |
|---|---|
| -c | create an object file |
| -o | set the output file name |
| -I/path/to/include | add a header search path |
| -L/path/to/lib | add a library search path |
| -g | generate debugger symbols |
| -Wall | enable syntax warnings |
| -Werror | treat warnings as errors |
| -O[0-4] | set the optimisation level |
| -ftrace | use the profiler |
| -proginf | enable execution analysis |
| -report-all | report diagnostics |
| -traceback | provide traceback information |
| -fdiag-vector=[0-3] | level of detail for vector diagnostics |
The last four of them are used for performance analysis and allow for efficient software development. Some of these, apart from being used as command line options at compile time, also rely on dedicated environmental variables that need to be set at runtime. For a full list of performance-related options, variables, as well as their output description, see PROGINF/FTRACE User’s Guide and the compiler-specific documentation.
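As an illustration, a C source file could be compiled with optimisation, warnings, and some of the analysis options from the table roughly as follows – a sketch, where program.c is a placeholder name and the runtime variables controlling the reports are those described in the PROGINF/FTRACE User’s Guide:

ncc -O2 -Wall -report-all -proginf -o program program.c   # compile with diagnostics and execution analysis enabled
./program                                                 # run on the VE card indicated by $VE_NODE_NUMBER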
The binaries can be run directly by specifying the path or by using the VE loader program (ve_exec) – a few examples, including parallel execution, are listed below:
./program
ve_exec ./program
mpirun ./program
mpirun -v -np 2 -ve 0-1 ./program # enables the use of VE cards 0 and 1
For a full list of options available for mpirun, see the corresponding manual page or issue the mpirun -h command.
Full documentation for SX-Aurora Tsubasa, its hardware and software components, is available at the NEC website. An accessible introduction to using Tsubasa is also provided on the blog.
Another, non-interactive mode of operation is the batch mode, which requires a script to be submitted to Slurm. An example job script is shown below.
#!/bin/bash -l
#SBATCH -J name
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH --mem 1000
#SBATCH --time=1:00:00
#SBATCH -A <Grant ID>
#SBATCH -p ve
#SBATCH --gres=ve:1
#SBATCH --output=out
./program
It specifies the name of the job (-J), the requested number of nodes (-N), tasks per node (--ntasks-per-node), memory (--mem; here in megabytes), wall time limit (--time), grant ID (-A), partition (-p), generic resources (--gres), output file (--output), and the actual commands to be executed once the resources are granted. See the Slurm documentation for an extensive list of available options.
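For an MPI program, the same template can additionally load the VE MPI environment and launch the binary through mpirun – a sketch assuming two VE cards are requested (--gres=ve:2) and the necmpivars.sh path shown earlier:

#!/bin/bash -l
#SBATCH -J name
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH --mem 1000
#SBATCH --time=1:00:00
#SBATCH -A <Grant ID>
#SBATCH -p ve
#SBATCH --gres=ve:2
#SBATCH --output=out
source /opt/nec/ve/mpi/2.2.0/bin/necmpivars.sh   # set up the VE MPI environment
mpirun -np 2 -ve 0-1 ./program                   # run two MPI ranks on VE cards 0 and 1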
Below are a few basic example commands used to work with job scripts: submitting the job (sbatch), which returns the ID number assigned to it by the queuing system; listing the user’s jobs along with their status (squeue); listing the details of a specified job (scontrol); and cancelling execution of a job (scancel). Consult the documentation for more.
sbatch job.sl # submits the job
squeue -u $USER # lists the user’s current jobs
scontrol show job <ID> # lists the details of the job specified by given <ID>
scancel <ID> # cancels the job with given <ID>
Since there’s no dedicated filesystem to be used for calculations on the Rysy cluster, in contrast to other ICM systems, the jobs should be run from within the $HOME directory. The ve partition (PBaran compute node) is intended for jobs utilizing VE cards, and as such it should not be used for intensive CPU-consuming tasks.