Diffstat (limited to 'topics/systems/linux/adding-nvidia-drivers-penguin2.gmi')
-rw-r--r-- topics/systems/linux/adding-nvidia-drivers-penguin2.gmi | 74
1 files changed, 74 insertions, 0 deletions
diff --git a/topics/systems/linux/adding-nvidia-drivers-penguin2.gmi b/topics/systems/linux/adding-nvidia-drivers-penguin2.gmi
new file mode 100644
index 0000000..81e721f
--- /dev/null
+++ b/topics/systems/linux/adding-nvidia-drivers-penguin2.gmi
@@ -0,0 +1,74 @@
+# GPU Graphics Driver Set-Up
+
+Tux02 has the Tesla K80 (GK210GL) GPU.  For machine learning, we want the official proprietary NVIDIA drivers.
+
+## Installation
+
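+* In Debian 12 the NVIDIA driver packages live in the "non-free" component, while firmware moved to the new "non-free-firmware" component.  Enable both (plus "contrib") by adding the following to "/etc/apt/sources.list", then run "sudo apt update":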
+
+```
+deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
+```
+
+* Make sure the correct kernel headers are installed:
+
+```
+sudo apt install linux-headers-$(uname -r)
+```
+
+* Install "nvidia-tesla-470-driver"⁰.  NVIDIA's programmable "Tesla" devices, used primarily for simulations and large-scale calculations, need their own driver packages, separate from those for the consumer-grade GeForce GPUs aimed at desktop and gaming use¹.  Purge any previously installed NVIDIA packages first:
+
+```
+sudo apt purge 'nvidia-*'
+sudo apt install nvidia-tesla-470-driver
+```
+
+* Blacklist nouveau, since it conflicts with NVIDIA's driver, then regenerate the initramfs with "sudo update-initramfs -u":
+
+```
+echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
+echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf
+```
+
+* Reboot, then test the NVIDIA drivers:
+
+```
+sudo reboot
+nvidia-smi  # run once the machine is back up
+
+# Optional: only needed if you want the CUDA toolkit
+sudo apt install nvidia-cuda-dev nvidia-cuda-toolkit
+```
+
+## Issues
+
+Holding off on the reboot until I check in with the rest of the team about a raspi-firmware initramfs hook that fails:
+
+```
+update-initramfs: Generating /boot/initrd.img-6.1.0-9-amd64
+raspi-firmware: missing /boot/firmware, did you forget to mount it?
+run-parts: /etc/initramfs/post-update.d//z50-raspi-firmware exited with return code 1
+dpkg: error processing package initramfs-tools (--configure):
+ installed initramfs-tools package post-installation script subprocess returned error exit status 1
+Processing triggers for libgdk-pixbuf-2.0-0:amd64 (2.42.10+dfsg-1+deb12u1) ...
+Errors were encountered while processing:
+ initramfs-tools
+```
+
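+Resolved this by purging the raspi-firmware package, which targets Raspberry Pi hardware and is not needed on this amd64 machine, and then finishing the interrupted configuration: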
+
+```
+sudo apt purge raspi-firmware
+
+# Configure all packages that are installed but not yet fully configured
+sudo dpkg --configure -a
+
+# Update initramfs since we updated our drivers
+sudo update-initramfs -u
+```
+
+## References
+
+=> https://us.download.nvidia.com/XFree86/Linux-x86_64/470.129.06/README/supportedchips.html ⁰ NVIDIA 470.129.06 Supported Chipsets.
+=> https://wiki.debian.org/NvidiaGraphicsDrivers#Tesla_Drivers ¹ Debian Tesla Drivers.
+=> https://wiki.debian.org/NvidiaGraphicsDrivers/Configuration ² NVIDIA Proprietary Driver: Configuration.