CS nVidia
Source for the nVidia puppet module can be found in gitlab: http://coresrv03.cs.virginia.edu:8081/puppet/puppet-csnvidia (scroll to bottom of page for README.md)
The csnvidia module manages the installation of nvidia drivers on CentOS servers.
Puppet Concepts
For a host to have the csnvidia module applied, we must create a host manifest on the puppet server (if one doesn't already exist) and include csnvidia in the manifest.
Host manifests for the production environment are located on coresrv04 in the directory /etc/puppetlabs/code/environments/production/manifests. The contents of this directory are tracked in a git repository in gitlab.
From the manifests directory you will see a listing of files named after servers or groups of servers:
[root@coresrv04 manifests]# pwd
/etc/puppetlabs/code/environments/production/manifests
[root@coresrv04 manifests]# ls
ai.pp         bigdata.pp    granger.pp     qcuda.pp                     samba.pp
andromeda.pp  bigvision.pp  labsrv.pp      Readme-Default-Linux.pp.txt  site.pp
applecake.pp  deep.pp       lynx.pp        Readme.md                    trillian.pp
automata.pp   doppio.pp     prediction.pp  recognition.pp               vision.pp
[root@coresrv04 manifests]#
You will find two files with more information on host manifests: Readme.md and Readme-Default-Linux.pp.txt. Readme-Default-Linux.pp.txt contains the current default manifest for Linux servers.
Applying the Module
Log into coresrv04, cd into the host manifests directory, and create a file for the host(s) you need to manage:
[root@coresrv04 ~]# cd /etc/puppetlabs/code/environments/production/manifests
[root@coresrv04 manifests]# emacs -nw hostname.pp
For a single host, write the following to the file:
node hostname {
  include csnfs
  include cslinux
  include csnvidia
}
For a group of hosts that share the same base name followed by trailing numbers, use the following entry, which matches node names with a regex:
node /hostname[0-9][0-9]/ {
  include csnfs
  include cslinux
  include csnvidia
}
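A regex like the one above is not anchored, so it can match anywhere inside a node name. The shell sketch below previews which names a pattern would catch. It uses grep -E as a stand-in for Puppet's regex matching, which is only an approximation but agrees for simple patterns like this one. The host names (gpusrv01 and so on) and the helper matches_node_pattern are made up for illustration.

```shell
#!/bin/sh
# Preview which node names a pattern would match.
# grep -E is an approximation of Puppet's regex matching (assumption).
# All host names below are hypothetical.

matches_node_pattern() {
    # $1 = pattern, $2 = node name; exit status 0 means "matches"
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

names="gpusrv01 gpusrv7 gpusrv123 oldgpusrv01 gpusrv01.example.org"

# Unanchored, as written in the manifest above
for name in $names; do
    if matches_node_pattern 'gpusrv[0-9][0-9]' "$name"; then
        echo "unanchored: $name matches"
    fi
done

# Anchored: exactly two digits, optionally followed by a domain part
for name in $names; do
    if matches_node_pattern '^gpusrv[0-9][0-9]($|\.)' "$name"; then
        echo "anchored:   $name matches"
    fi
done
```

If stray matches such as oldgpusrv01 or gpusrv123 would be a problem in your environment, anchoring the pattern in the manifest is one way to rule them out.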
After adding the host manifest, the csnvidia class will be applied the next time the target host runs the puppet agent. You can either wait for the server to do this automatically, or log into the server and run puppet agent -t from a root shell.
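When running the agent by hand, note that -t (--test) enables detailed exit codes, so a non-zero exit status does not always mean failure. The sketch below maps the documented status values to short messages. The function name describe_agent_status is hypothetical, and the puppet call itself is shown only as a comment because it needs a root shell on the target host.

```shell
#!/bin/sh
# Translate the exit status of `puppet agent -t` into a short message.
# With detailed exit codes, 2 means "changes were applied", not an error.

describe_agent_status() {
    case "$1" in
        0) echo "no changes needed" ;;
        2) echo "changes applied" ;;
        4) echo "some resources failed" ;;
        6) echo "changes applied, some resources failed" ;;
        *) echo "run failed or did not start" ;;
    esac
}

# Usage on the target host, from a root shell (not run here):
#   puppet agent -t
#   describe_agent_status "$?"

# Stand-alone demonstration with a fixed status value:
describe_agent_status 2
```

On the first run after adding csnvidia, a status of 2 is the expected result, since the class has just made changes.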
Final Steps
The first time a server has the csnvidia class applied, it will need to be rebooted once before the drivers will work. This is due to the default nouveau kernel module conflicting with the nvidia kernel module. You will not be able to load the nvidia modules until the nouveau module is unloaded.
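One way to check whether the reboot is still pending is to look at which of the two modules is currently loaded. The sketch below reads lsmod-style output on standard input so it can be tried anywhere; the function name gpu_module_state and the sample module listing are illustrative, not taken from a real server.

```shell
#!/bin/sh
# Report which GPU kernel module appears in lsmod-style input.
# Reads the listing on stdin so it can be tested without a GPU host.

gpu_module_state() {
    awk '
        $1 == "nouveau" { nouveau = 1 }
        $1 == "nvidia"  { nvidia = 1 }
        END {
            if (nouveau)     print "nouveau loaded, nvidia cannot load yet"
            else if (nvidia) print "nvidia loaded"
            else             print "no GPU module loaded"
        }'
}

# On the server itself:
#   lsmod | gpu_module_state

# Stand-alone demonstration with made-up lsmod output:
printf 'Module Size Used by\nnouveau 1869689 0\n' | gpu_module_state
```

If nouveau still shows up after the class has been applied, the server has most likely not been rebooted yet.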
Once you are done creating the manifest, add the file to git and push your changes to gitlab:
[root@coresrv04 manifests]# git add ./applecake.pp
[root@coresrv04 manifests]# git commit -m "Host entry for applecake -ktm5j"
[root@coresrv04 manifests]# git push
Everything up-to-date
Git may print messages that you do not need to act on. If you run the commands above and receive scary-looking output, it is not necessarily an error; for example, "Everything up-to-date" only means there was nothing new to push.