improve tutorials with fixes, module docs, troubleshooting, and exercise verification (b3a68e1f) · Commits · DAIC / docs-experimental

content/en/tutorials/apptainer/index.md

+128 −1

Original line number	Diff line number	Diff line
		@@ -30,7 +30,7 @@ Containerization packages your software, libraries, and dependencies into a sing
		On DAIC specifically, users often encounter issues with limited home directory space or Windows-based `/tudelft.net` mounts (see [Storage](/docs/system/storage)), which can complicate the use of `conda/mamba` and/or `pip`. Containers offer a solution by encapsulating all software and dependencies in a self-contained environment. You can, for instance, store containers on `staff-umbrella` with all required dependencies, including those installed via `pip`, and run them reliably and reproducibly without being limited by home directory size or mount compatibility.

		## Containerization on DAIC: Apptainer
		DAIC supports [Apptainer](https://apptainer.org/docs/user/main/introduction.html) (previously Apptainer), an open-source container platform, designed to run on High-performance computing environments. Apptainer runs container images securely on shared clusters and allows you to use Docker images directly, without needing Docker itself.
		DAIC supports [Apptainer](https://apptainer.org/docs/user/main/introduction.html) (formerly known as Singularity), an open-source container platform designed for high-performance computing environments. Apptainer runs container images securely on shared clusters and allows you to use Docker images directly, without needing Docker itself.

		A typical Apptainer workflow revolves around three key components:

		@@ -584,6 +584,16 @@ Pull the `python:3.11-slim` image from Docker Hub and explore it:
		4. List the contents of `/usr/local/lib/python3.11/`
		5. Exit the container

		{{% alert title="Check your work" color="info" %}}
		After pulling, you should have `python_3.11-slim.sif`. Inside the container:
		```shell-session
		Apptainer> python --version
		Python 3.11.x
		Apptainer> ls /usr/local/lib/python3.11/
		... site-packages ...
		```
		{{% /alert %}}

		### Exercise 2: Run a command in a container

		Using the Python image from Exercise 1:
		@@ -592,6 +602,19 @@ Using the Python image from Exercise 1:
		2. Use `apptainer exec` to run the script inside the container
		3. Try running it with the `-C` flag - what happens to your script?

		{{% alert title="Check your work" color="info" %}}
		Without `-C`:
		```shell-session
		$ apptainer exec python_3.11-slim.sif python hello.py
		Hello from Apptainer!
		```
		With `-C`, you get an error because the container can't see your files:
		```shell-session
		$ apptainer exec -C python_3.11-slim.sif python hello.py
		python: can't open file 'hello.py': [Errno 2] No such file or directory
		```
		{{% /alert %}}

		### Exercise 3: Build a custom image

		Create a definition file for a container with your favorite tools:
		@@ -601,6 +624,17 @@ Create a definition file for a container with your favorite tools:
		3. Add a `%runscript` that displays a welcome message
		4. Build the image and test it with `apptainer run`

		{{% alert title="Check your work" color="info" %}}
		After building:
		```shell-session
		$ apptainer run mytools.sif
		Welcome to my custom container!
		$ apptainer exec mytools.sif which curl jq
		/usr/bin/curl
		/usr/bin/jq
		```
		{{% /alert %}}
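
One possible definition file for this exercise is sketched below. The file name `mytools.def`, the base image `ubuntu:24.04`, and the choice of `curl` and `jq` are assumptions inferred from the expected output above, not requirements of the exercise.

```shell
# Write a minimal Apptainer definition file.
# File name, base image, and package list are illustrative assumptions.
cat > mytools.def <<'EOF'
Bootstrap: docker
From: ubuntu:24.04

%post
    # Install the tools that "Check your work" looks for
    apt-get update
    apt-get install -y --no-install-recommends curl jq ca-certificates
    rm -rf /var/lib/apt/lists/*

%runscript
    echo "Welcome to my custom container!"
EOF
```

With the file in place, `apptainer build mytools.sif mytools.def` should produce the image used in the check above.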

		### Exercise 4: GPU container on DAIC

		Test GPU access with a prebuilt image:
		@@ -610,6 +644,16 @@ Test GPU access with a prebuilt image:
		3. Run a Python command that checks `torch.cuda.is_available()`
		4. Verify the GPU is detected with `nvidia-smi` inside the container

		{{% alert title="Check your work" color="info" %}}
		```shell-session
		$ srun apptainer exec --nv pytorch.sif python -c "import torch; print(torch.cuda.is_available())"
		True
		$ srun apptainer exec --nv pytorch.sif nvidia-smi
		... (GPU info displayed) ...
		```
		If you see `False`, check that you used `--nv` and requested a GPU with `--gres=gpu:1`.
		{{% /alert %}}

		### Exercise 5: Bind mounts

		Practice data isolation:
		@@ -619,6 +663,89 @@ Practice data isolation:
		3. Inside the container, verify you can access the test file but not your home directory
		4. Try mounting the directory as read-only with `--mount`

		{{% alert title="Check your work" color="info" %}}
		```shell-session
		$ mkdir testdir && echo "test" > testdir/data.txt
		$ apptainer shell -C --bind testdir:/mnt ubuntu_latest.sif
		Apptainer> cat /mnt/data.txt
		test
		Apptainer> ls /home/$USER
		ls: cannot access '/home/...': No such file or directory
		```
		With read-only mount, writing fails:
		```shell-session
		$ apptainer shell -C --mount type=bind,source=testdir,destination=/mnt,ro ubuntu_latest.sif
		Apptainer> echo "new" >> /mnt/data.txt
		bash: /mnt/data.txt: Read-only file system
		```
		{{% /alert %}}

		---

		## Troubleshooting

		### Build fails with "no space left on device"

		By default, Apptainer keeps its cache in your home directory (`~/.apptainer/cache`) and writes temporary build files to `/tmp`. Since `/home` on DAIC is limited to 5 MB, builds often fail.

		Solution: Point the cache and temporary directories at a location with more space before building:

		```shell-session
		$ export APPTAINER_CACHEDIR=/tudelft.net/staff-umbrella/<project>/apptainer/cache
		$ export APPTAINER_TMPDIR=/tudelft.net/staff-umbrella/<project>/apptainer/tmp
		$ mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
		```

		Add these to your `~/.bashrc` to make them permanent.
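
The setup above can be wrapped in a small function for `~/.bashrc`. This is only a sketch: the function name `use_apptainer_dirs` and the layout (`cache` and `tmp` under one base directory) are assumptions, and the base path is whatever project directory you have access to.

```shell
# Point Apptainer's cache and temporary directories at a base directory
# with enough free space, creating them if they do not exist yet.
# (Hypothetical helper; not part of Apptainer itself.)
use_apptainer_dirs() {
    local base="$1"
    if [ -z "$base" ]; then
        echo "usage: use_apptainer_dirs <base-directory>" >&2
        return 1
    fi
    export APPTAINER_CACHEDIR="$base/cache"
    export APPTAINER_TMPDIR="$base/tmp"
    mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
}
```

For example, `use_apptainer_dirs /tudelft.net/staff-umbrella/<project>/apptainer` reproduces the three commands shown earlier.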

		### GPU not visible inside container

		Your container runs but `torch.cuda.is_available()` returns `False` or `nvidia-smi` fails.

		Possible causes and solutions:

		1. Missing `--nv` flag: Always pass `--nv` to enable GPU access:
		```shell-session
		$ apptainer exec --nv myimage.sif python -c "import torch; print(torch.cuda.is_available())"
		```

		2. Not running on a GPU node: Check that you requested a GPU and are using `srun`:
		```shell-session
		$ salloc --gres=gpu:1 ...
		$ srun apptainer exec --nv myimage.sif nvidia-smi
		```

		3. CUDA version mismatch: The container's CUDA version must be compatible with the host driver. Check the host driver version:
		```shell-session
		$ nvidia-smi | grep "Driver Version"
		```
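
The host-side checks in the list above can be combined into one helper. This is a rough sketch, assuming Slurm exports `SLURM_JOB_ID` inside a job and that `nvidia-smi` is only on the `PATH` of GPU nodes; the function name `check_gpu_host` is hypothetical.

```shell
# Print hints for the common host-side causes of a missing GPU.
# Returns 0 if no obvious problem is found, 1 otherwise.
check_gpu_host() {
    local status=0
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "hint: not inside a Slurm job; request a GPU and use srun"
        status=1
    fi
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Print only the driver version (one line per visible GPU)
        nvidia-smi --query-gpu=driver_version --format=csv,noheader
    else
        echo "hint: nvidia-smi not found; this is probably not a GPU node"
        status=1
    fi
    return "$status"
}
```

Run it on the node where the container will run, for example with `srun`, before launching the container with `--nv`.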

		### Cache filling up disk space

		Apptainer caches pulled images and build layers. This can consume significant space over time.

		Solution: Periodically clean the cache:

		```shell-session
		$ apptainer cache clean
		```

		To see cache usage:

		```shell-session
		$ apptainer cache list
		```

		### Container can't access my files

		By default, Apptainer mounts your home directory and current working directory. With `-C` (`--containall`), neither is mounted, so the container cannot see your files.

		Solution: Explicitly bind the directories you need:

		```shell-session
		$ apptainer exec -C --bind /tudelft.net/staff-umbrella/myproject:/data myimage.sif ls /data
		```
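
When several directories are needed, `--bind` also accepts a comma-separated list. The helper below is a sketch that builds such a list; the name `bind_spec` and the convention of mounting each directory under `/data/<name>` are assumptions for illustration.

```shell
# Build a comma-separated --bind specification that mounts each given
# host directory at /data/<basename> inside the container.
# (Hypothetical helper; the /data/<name> layout is an assumption.)
bind_spec() {
    local spec="" dir name
    for dir in "$@"; do
        name=$(basename "$dir")
        # Add a comma only between entries, not before the first one
        spec="${spec:+$spec,}$dir:/data/$name"
    done
    printf '%s\n' "$spec"
}
```

It could be used as `apptainer exec -C --bind "$(bind_spec /path/to/project /path/to/datasets)" myimage.sif ls /data`.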

		---

		## Summary

content/en/tutorials/bash/index.md

+44 −3

		@@ -95,8 +95,8 @@ $ cd ~
		$ pwd
		```

		{{% alert title="Question" color="info" %}}
		What did you see in `/tudelft.net/staff-umbrella`? These are project directories - you'll have access to at least one for your work.
		{{% alert title="Check your work" color="info" %}}
		You should see project directories when listing `/tudelft.net/staff-umbrella`. After `cd ~` and `pwd`, you should see your home directory path (e.g., `/home/netid01`).
		{{% /alert %}}

		## Part 2: Understanding DAIC storage
		@@ -202,6 +202,18 @@ $ echo "Author: $(whoami)" >> nlp-project/README.md
		$ cat nlp-project/README.md
		```

		{{% alert title="Check your work" color="info" %}}
		`ls nlp-project` should show:
		```
		data notebooks outputs src
		```
		`cat nlp-project/README.md` should show:
		```
		# NLP Project
		Author: <your-netid>
		```
		{{% /alert %}}

		## Part 4: Working with files

		Let's create some actual code to work with.
		@@ -309,6 +321,14 @@ $ ls src
		evaluate.py train.py
		```

		{{% alert title="Check your work" color="info" %}}
		`ls src` should show both files:
		```
		evaluate.py train.py
		```
		If `train.py` is missing, you may have forgotten to copy before moving.
		{{% /alert %}}

		## Part 5: Viewing and editing files

		### Viewing file contents
		@@ -418,6 +438,10 @@ $ grep -l "import" src/*.py # Just show filenames
		$ find . -type d -name "data"
		```

		{{% alert title="Check your work" color="info" %}}
		The `find . -mtime -1` command should list files you recently created. The `grep -n` command shows line numbers where "print" appears. The directory search should show `./data` (and any other data directories you created).
		{{% /alert %}}

		## Part 7: Transferring files

		You'll often need to move data between your local computer and DAIC.
		@@ -471,6 +495,14 @@ $ cat ~/linuxhome/test.txt
		test data
		```

		{{% alert title="Check your work" color="info" %}}
		After the `scp` command, you should see:
		```
		test.txt 100% 10 0.0KB/s 00:00
		```
		On DAIC, `cat ~/linuxhome/test.txt` should display "test data".
		{{% /alert %}}

		## Part 8: Automating with scripts

		When you find yourself typing the same commands repeatedly, it's time to write a script.
		@@ -600,6 +632,15 @@ $ chmod +x cleanup_logs.sh
		$ ./cleanup_logs.sh logs/
		```

		{{% alert title="Check your work" color="info" %}}
		Verify the script is executable:
		```shell-session
		$ ls -l cleanup_logs.sh
		-rwxr-xr-x 1 netid01 netid01 ... cleanup_logs.sh
		```
		The `x` in the permissions confirms it's executable. When run, it prints "Cleaning logs in logs/" and "Done!" (plus any files it removes).
		{{% /alert %}}

		## Part 9: Useful shortcuts and tips

		### Tab completion
		@@ -684,4 +725,4 @@ Now that you're comfortable with the command line:

		## Quick reference

		See the [Bash Cheatsheet](/reference/bash-cheatsheet/) for a compact command reference.
		For more advanced shell customization, see [Shell Setup](/quickstart/shell-setup/).

content/en/tutorials/slurm/index.md

+70 −2

		@@ -309,7 +309,9 @@ srun python train.py
		echo "End time: $(date)"
		```

		### Understanding module load
		### Understanding the module system

		DAIC uses an environment modules system to manage software. Instead of having every version of every library available at once (which would cause conflicts), software is organized into modules that you load when needed.

		The `module` commands set up your software environment:

		@@ -319,7 +321,23 @@ module load 2025/gpu # Load the 2025 GPU software stack
		module load cuda/12.9 # Load CUDA 12.9
		```

		Different software requires different modules. Use `module avail` to see what's available and `module load <name>` to load them.
		Why use modules?

		- Version selection: Run `module load python/3.11` today, `module load python/3.12` tomorrow
		- Avoid conflicts: Different projects can use different library versions
		- Clean environment: `module purge` gives you a fresh start

		Common module commands:

		| Command | Purpose |
		|---------|---------|
		| `module avail` | List all available modules |
		| `module avail cuda` | List modules matching "cuda" |
		| `module list` | Show currently loaded modules |
		| `module load <name>` | Load a module |
		| `module purge` | Unload all modules |

		For a complete guide, see [Loading Software](/howto/loading-software/).
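
In a job script, these commands are often combined into a short prologue so that every run starts from the same environment. The function below is a minimal sketch; the name `load_modules` is hypothetical, and the module names you pass must exist on the system.

```shell
# Start from a clean environment, load the given modules in order,
# and list what ended up loaded so it is recorded in the job log.
# (Hypothetical helper built only on the module commands listed above.)
load_modules() {
    local m
    module purge || return 1
    for m in "$@"; do
        module load "$m" || { echo "failed to load $m" >&2; return 1; }
    done
    module list
}
```

With the modules from the example earlier in this section, the call would be `load_modules 2025/gpu cuda/12.9`.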

		### Submit and monitor

		@@ -761,18 +779,68 @@ Try these on your own to solidify your understanding:
		### Exercise 1: Basic job submission
		Create and submit a job that prints your username, hostname, and current date. Check the output.

		{{% alert title="Check your work" color="info" %}}
		Your output file should contain something like:
		```
		netid01
		gpu15.ethernet.tudhpc
		Fri Mar 20 10:30:00 CET 2026
		```
		The hostname should be a compute node (not `daic01`).
		{{% /alert %}}
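
A minimal batch script for this exercise might look like the sketch below. The file name, job name, and time limit are assumptions, and your account may need additional `#SBATCH` options (such as an account or partition) that are not shown here.

```shell
# Write a small batch script that prints user, host, and date.
# File name and #SBATCH values are illustrative assumptions.
cat > basic_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=basic
#SBATCH --time=00:05:00
#SBATCH --output=basic_%j.out

whoami     # username
hostname   # node the job ran on
date       # current date and time
EOF
```

Submit it with `sbatch basic_job.sh`; the `%j` in the output pattern is replaced by the job ID.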

		### Exercise 2: GPU job
		Modify the basic job to request a GPU. Add `nvidia-smi` to verify the GPU is available.

		{{% alert title="Check your work" color="info" %}}
		Your output should include `nvidia-smi` output showing a GPU:
		```
		+-----------------------------------------------------------------------------+
		| NVIDIA-SMI ... Driver Version: ... CUDA Version: ... |
		|-------------------------------+----------------------+----------------------+
		| GPU Name ...
		```
		If you see "NVIDIA-SMI has failed", check that you requested a GPU with `--gres=gpu:1`.
		{{% /alert %}}

		### Exercise 3: Resource tuning
		Submit a job, then use `seff` to check its efficiency. Was your resource request appropriate?

		{{% alert title="Check your work" color="info" %}}
		Run `seff <jobid>` after your job completes. Good efficiency looks like:
		```
		CPU Efficiency: 70-95%
		Memory Efficiency: 50-90%
		```
		If either value is below 50%, lower the corresponding request (CPUs or memory) next time.
		{{% /alert %}}

		### Exercise 4: Job array
		Create a job array that runs 5 tasks. Each task should print its array task ID.

		{{% alert title="Check your work" color="info" %}}
		You should see 5 output files (e.g., `job_12345_1.out` through `job_12345_5.out`). Each should contain its task ID:
		```shell-session
		$ cat job_*_1.out
		Task ID: 1
		$ cat job_*_5.out
		Task ID: 5
		```
		{{% /alert %}}
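
A job array matching the file names above could be written as in the following sketch. The file name `array_job.sh`, the job name, and the time limit are assumptions; `%A` and `%a` in the output pattern stand for the array's job ID and the task index.

```shell
# Write a batch script for a 5-task job array.
# File name and #SBATCH values are illustrative assumptions.
cat > array_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --array=1-5
#SBATCH --time=00:05:00
#SBATCH --output=job_%A_%a.out

# Slurm sets SLURM_ARRAY_TASK_ID per task; "unset" is printed outside Slurm.
echo "Task ID: ${SLURM_ARRAY_TASK_ID:-unset}"
EOF
```

Submitting it once with `sbatch array_job.sh` queues all five tasks.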

		### Exercise 5: Dependencies
		Submit two jobs where the second depends on the first completing successfully.

		{{% alert title="Check your work" color="info" %}}
		After submitting both jobs, `squeue -u $USER` should show:
		```
		JOBID PARTITION NAME USER ST REASON
		12346 all second netid01 PD (Dependency)
		12345 all first netid01 R
		```
		The second job shows `(Dependency)` while waiting. After the first completes, the second starts automatically.
		{{% /alert %}}
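
One way to chain the two submissions is sketched below, using `sbatch --parsable` to capture the first job's ID and `--dependency=afterok:<id>` for the second. The function name `submit_chain` and the script names in the usage note are hypothetical.

```shell
# Submit two batch scripts so that the second starts only after the
# first finishes successfully. (Hypothetical helper around sbatch.)
submit_chain() {
    local first_script="$1" second_script="$2" first_id
    # --parsable prints the job ID, optionally followed by ";cluster"
    first_id=$(sbatch --parsable "$first_script") || return 1
    first_id="${first_id%%;*}"
    sbatch --dependency="afterok:${first_id}" "$second_script"
}
```

For example, `submit_chain first.sh second.sh` submits both jobs, and `squeue -u $USER` should then show the second one pending with reason `(Dependency)`.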

		## Next steps

		- [Apptainer Tutorial](/tutorials/apptainer/) - Package your environment in containers

content/en/tutorials/vim/index.md

+36 −0

		@@ -765,21 +765,57 @@ Practice these tasks to build muscle memory:
		### Exercise 1: Basic editing
		Create a new file, add three lines of text, save and quit. Then reopen it and verify your changes.

		{{% alert title="Check your work" color="info" %}}
		After `:wq`, verify with:
		```shell-session
		$ cat myfile.txt
		line one
		line two
		line three
		```
		If the file is empty, you may have quit without saving (`:q!` instead of `:wq`).
		{{% /alert %}}

		### Exercise 2: Navigation
		Open a Python file and practice: go to end (`G`), go to beginning (`gg`), jump by words (`w`, `b`), go to specific line (`10G`).

		{{% alert title="Check your work" color="info" %}}
		Check your position with `:set number` to show line numbers. After `G`, you should be on the last line. After `gg`, you should be on line 1. After `10G`, you should be on line 10.
		{{% /alert %}}

		### Exercise 3: Delete and undo
		Open a file, delete a line (`dd`), undo (`u`), delete a word (`dw`), undo again.

		{{% alert title="Check your work" color="info" %}}
		After each `u`, the deleted content should reappear. If undo doesn't work, make sure you're in Normal mode (press `Esc` first).
		{{% /alert %}}

		### Exercise 4: Copy and paste
		Copy a line (`yy`), move to a new location, paste it (`p`). Then try with multiple lines using `V`.

		{{% alert title="Check your work" color="info" %}}
		After `yy` and `p`, you should see the same line duplicated. With `V`, select multiple lines (they highlight), then `y` to copy and `p` to paste them elsewhere.
		{{% /alert %}}

		### Exercise 5: Search and replace
		Open a file and search for a word (`/word`). Then replace all occurrences of one word with another (`:%s/old/new/g`).

		{{% alert title="Check your work" color="info" %}}
		After `/word` and pressing Enter, the cursor jumps to the first match. Press `n` to see subsequent matches. After `:%s/old/new/g`, Vim reports how many substitutions were made (e.g., "5 substitutions on 3 lines").
		{{% /alert %}}

		### Exercise 6: Real task
		Edit a SLURM batch script: change the time limit, add a new `#SBATCH` directive, and save.

		{{% alert title="Check your work" color="info" %}}
		After saving, verify your changes:
		```shell-session
		$ grep -E "time|gres" submit.sh
		#SBATCH --time=4:00:00
		#SBATCH --gres=gpu:1
		```
		{{% /alert %}}

		## Keep learning

		- Run `vimtutor` for a 30-minute interactive tutorial