On Cosmos, there are 48 (2 x 24) cores on each node, with 5.3 GB memory per core.
ThinLinc login node: cosmos-dt.lunarc.lu.se
On Aurora, there are 20 cores on each node, with 62 GB memory in total (3.1 GB/core).
On Abisko, cores instead come in multiples of 6. NB: You cannot run with fewer than 6 cores; if you run single-processor jobs, you will still be charged for 6 cores.
On Kebnekaise, there are 28 cores on each node. In the large-memory queue, however, cores instead come in batches of 4 x 18, with 42 GB memory per core (3072 GB in total).
The disk space on the nodes is very limited. On the largemem nodes, it is nominally around 391 GB in total, but in practice it seems to be only about 200 GB. This is shared between all 28 cores without any enforcement, so the amount you can actually use varies.
On Tetralith, there are 32 (2 x 16) cores on each node.
On thin nodes, each core has 3 GB memory, and the local disk is 210 GB in total.
On fat nodes, each core has 12 GB memory, and the local disk is 874 GB in total.
There are also GPU nodes with a 2 TB disk.
For compiling Fortran, use the buildenv-gcc module, but only when compiling, not when running.
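A minimal sketch of how this could look (the gfortran call and the file name myprog.f90 are hypothetical; only the buildenv-gcc module is taken from above):
module add buildenv-gcc            # only while compiling
gfortran -O2 -o myprog myprog.f90  # hypothetical source file
module purge                       # unload again before running or submitting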
On Dardel, there are 128 cores (2 x 64) on each node.
Thin nodes have 2 GB memory per core (256 GB in total).
There are also large, huge and giant nodes with 4, 8 and 16 GB per core.
There are several partitions. To get less than a full node, use
#SBATCH -p shared
The maximum length of a job is 24 h, except in the long partition, where it is 168 h (but then you need to use a full node).
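A minimal sketch of a job header for the shared partition (the core count, time limit and project placeholder are hypothetical examples, not recommendations):
#!/bin/sh
#SBATCH -A <project>
#SBATCH -p shared
#SBATCH -n 4
#SBATCH -t 24:00:00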
You log in by
kinit -f <user_name>@NADA.KTH.SE
ssh <user_name>@dardel.pdc.kth.se
If you have problems with similar jobs taking very different amounts of time, try using
#SBATCH --exclusive
Then you get a node of your own (and you are charged for all the cores, i.e. 16 on Alarik).
snicquota - show your disk quota on Lunarc
quota - the same at HPC2N
kinit - renew your Kerberos ticket (you enter your password) at HPC2N
Update Pocket Pass:
https://lunarc-documentation.readthedocs.io/en/latest/authenticator_howto/#checking-the-validity-of-your-token
Note that you log in with your Lunarc user name and password, as when you log in to Aurora.
Go to:
https://phenix3.lunarc.lu.se/selfservice/authenticate/unpwotp
Click on Tokens.
Click on More.
Click on Activate - it says success, but the token does not change from Expired.
I tried Activate Phenix Pocket Pass, but then a new token was installed.
After one day, some of them worked; I do not know which.
sbatch - submit a job
squeue - see queued jobs
scancel - kill a job
scontrol - get lots of information about a job
scontrol show jobid -dd <jobid>
$SNIC_TMP - your folder on the temporary disk on the remote node
$SLURM_SUBMIT_DIR - the directory from which you submitted the job
scontrol show jobid 3216884 - gives the estimated time for the job to start
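For example (the script name job.sh is hypothetical; the job id is the one from above):
sbatch job.sh                      # submit; prints the job id
squeue -u <user_name>              # list your jobs in the queue
scontrol show jobid -dd 3216884    # detailed information about the job
scancel 3216884                    # kill it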
Template sbatch file on Lunarc
#!/bin/sh
#SBATCH -n 1
#SBATCH -t 168:00:00
module add intel
export AMBERHOME=/lunarc/nobackup/projects/bio/Amber12
export TURBODIR=/lunarc/nobackup/projects/bio/TURBO/Turbo6.5
export CNS_SOLVE=/sw/pkg/bio/CNS/cns_solve_1.21
PATH=$AMBERHOME/exe:$PATH
PATH=$TURBODIR/scripts:$TURBODIR/bin/x86_64-unknown-linux-gnu:$PATH
PATH=$PATH:/sw/pkg/bio/Bin/Gfortran:/sw/pkg/bio/Bin:$HOME/Bin
export PATH
cd $SNIC_TMP
/bin/rm -r *
cp -p $SLURM_SUBMIT_DIR/* .
jobex -backup -ri -c 800
cp -pu energy $SLURM_SUBMIT_DIR
The same, but with a memory request:
#!/bin/sh
#SBATCH -n 1
#SBATCH -t 168:00:00
#SBATCH --mem-per-cpu 3900
module add intel
export AMBERHOME=/lunarc/nobackup/projects/bio/Amber12
export TURBODIR=/lunarc/nobackup/projects/bio/TURBO/Turbo6.5
export CNS_SOLVE=/sw/pkg/bio/CNS/cns_solve_1.21
PATH=$AMBERHOME/exe:$PATH
PATH=$TURBODIR/scripts:$TURBODIR/bin/x86_64-unknown-linux-gnu:$PATH
PATH=$PATH:/sw/pkg/bio/Bin/Gfortran:/sw/pkg/bio/Bin:$HOME/Bin
export PATH
cd $SNIC_TMP
/bin/rm -r *
cp -p $SLURM_SUBMIT_DIR/* .
for x in 1 2 3 4 5 6 7 8 9 10 ; do
  ln -fs coord-c"$x" coord
  kdg restart
  ridft > $SLURM_SUBMIT_DIR/logd"$x"
  cp -p out.ccf $SLURM_SUBMIT_DIR/out"$x".ccf
done
cp -pu energy $SLURM_SUBMIT_DIR
Sbatch header for GPU jobs:
#!/bin/sh
#SBATCH -t 168:00:00
#SBATCH -p largemem
#SBATCH -A SNIC2016-34-18
#SBATCH --gres=gpu:k80:x
with x = 1, 2, or 4.
For x = 2, you are charged for 14 cores; for x = 4, 28 cores.
#SBATCH --gres=gpu:k80:x,mps also enables the Nvidia Multi-Process Service (MPS).
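For example, with x = 2 (so that, as noted above, you are charged for 14 cores):
#SBATCH --gres=gpu:k80:2
or, with MPS enabled:
#SBATCH --gres=gpu:k80:2,mps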
module add <module> - add a module
module spider <module> - find out how to load a certain module with its dependencies
module purge - remove all loaded modules
module help - get some help
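For example, using the intel module from the templates above:
module spider intel   # see available versions and how to load them
module add intel      # load it
module list           # check what is currently loaded
module purge          # unload everything again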
If a job is terminated prematurely, for example if it exceeds the requested walltime, the files on the local disk (in $SNIC_TMP) will be lost. Files that would still be useful can be listed in a special file, $SNIC_TMP/pbs_save_files. File names are assumed to be relative to $SNIC_TMP and should be separated by spaces or listed on separate lines. These files will be copied to $PBS_O_WORKDIR regardless of whether the job ends as planned or is deleted, unless there is a problem with the disk or the node itself. For parallel jobs, only files on the master node will be copied. Note that this feature is unique to Lunarc.
On Alarik, the corresponding file should instead be called $SNIC_TMP/slurm_save_files.
(MUL 12/9-12)
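As a sketch, inside a job script on Alarik (after cd $SNIC_TMP) you could write, for instance for the energy and out.ccf files used in the templates above:
echo "energy out.ccf" > $SNIC_TMP/slurm_save_files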
Some of you have noticed that the queuing system on Aurora sometimes, seemingly at random, restarts running jobs from the beginning.
I just talked to Magnus about this, and he said that you can avoid this behaviour by setting in the sbatch file
#SBATCH --no-requeue
Then, the job will instead die (he said that it is caused by communication problems with a certain node) and you have to restart it by hand, but sometimes, especially with long MD simulations, this is strongly preferred.
UR 28/10-16
Information on CPU usage:
Alarik: projinfo -y (last year); if projinfo (last month) goes over the allocation, you get low priority.
Platon: projinfo (but only for the last month)
checkjob
Add (on your local machine)
emacs*font: 7x14
to ~/.Xdefaults and execute
xrdb -merge ~/.Xdefaults
Valera 23/10-14
Interactive jobs on Akka
salloc -n 1 -t 1:00:00
(one node and 1 hour)
When it replies that a node is granted, you can start jobs on that node with srun:
srun <command>
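A minimal sketch (my_program is a hypothetical executable):
salloc -n 1 -t 1:00:00
srun ./my_program    # runs on the granted node
exit                 # give the allocation back when done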
How to avoid using a one-time password every time you log in to Aurora:
Add to ~/.ssh/config:
host *
  ControlMaster auto
  ControlPath ~/.ssh/ssh_mux_%h_%p_%r
Then ssh aurora; use your password and one-time password.
Do not close that shell; open another konsole and do ssh or scp from there.
VV 15/11-17
Platon:
qsub - submit a job
qstat - see queued jobs
qdel - kill a job
$PBS_O_LOCAL - your folder on the node's local (temporary) disk
$PBS_O_WORKDIR - the directory from which you submitted the job
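Typical usage (the script name job.sh is hypothetical):
qsub job.sh           # submit the job script
qstat -u <user_name>  # list your queued jobs
qdel <jobid>          # kill a job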
On Alarik, there are 16 cores on each node (8 on each processor).
Template qsub file on Platon
#!/bin/sh
#PBS -l nodes=1
#PBS -l walltime=70:00:00
#PBS -j oe
. use_modules
#module add intel/11.1
module add intel
#module add openmpi/1.4.1/intel/11.1
export AMBERHOME=/sw/pkg/bio/Amber10
export TURBOMOLE_SYSNAME=x86_64-unknown-linux-gnu
export TURBODIR=/sw/pkg/bio/TURBO/Turbo6.5
export CNS_SOLVE=/sw/pkg/bio/CNS/cns_solve_1.21
PATH=$AMBERHOME/exe:$PATH
PATH=$TURBODIR/scripts:$TURBODIR/bin/x86_64-unknown-linux-gnu:$PATH
PATH=$PATH:/sw/pkg/bio/Bin/Gfortran:/sw/pkg/bio/Bin:$HOME/Bin
export PATH
cd $PBS_O_LOCAL
#/bin/rm -r *
cp -p $PBS_O_WORKDIR/* .
jobex -backup -ri -c 800
cp -pu * $PBS_O_WORKDIR
Set up a new user at the computer centres
New user: Go to https://supr.naiss.se and register as a new person.
Ulf: Log in to SUPR and add the user to the project.
New user: Fill in the form (in SUPR) and send it in with a copy of your passport.
New user: Go into SUPR and apply for an account at both Lunarc and the other computer centres.