CS nVidia

Source for the nVidia puppet module can be found in gitlab: http://coresrv03.cs.virginia.edu:8081/puppet/puppet-csnvidia (scroll to bottom of page for README.md)

The csnvidia module will manage the installation of nvidia drivers on CentOS servers.

Puppet Concepts

For a host to have the csnvidia module applied, we must create a host manifest on the puppet server (if one doesn't already exist) and include csnvidia from that manifest.

Host manifests for the production environment are located on coresrv04 in the directory /etc/puppetlabs/code/environments/production/manifests. Contents of this directory are tracked via a git repository in gitlab located here.

From the manifests directory you will see a listing of files named after servers or groups of servers:

[root@coresrv04 manifests]# pwd
/etc/puppetlabs/code/environments/production/manifests
[root@coresrv04 manifests]# ls
ai.pp         bigdata.pp    granger.pp     qcuda.pp                     samba.pp
andromeda.pp  bigvision.pp  labsrv.pp      Readme-Default-Linux.pp.txt  site.pp
applecake.pp  deep.pp       lynx.pp        Readme.md                    trillian.pp
automata.pp   doppio.pp     prediction.pp  recognition.pp               vision.pp
[root@coresrv04 manifests]# 

You will find two files with more information on host manifests, Readme.md and Readme-Default-Linux.pp.txt. Readme-Default-Linux.pp.txt contains the current default manifest for Linux servers.

Applying the Module

Log into coresrv04, cd into the host manifests directory, and create a file for the host(s) you need to manage:

[root@coresrv04 ~]# cd /etc/puppetlabs/code/environments/production/manifests
[root@coresrv04 manifests]# emacs -nw hostname.pp

For a single host, write the following file:

node hostname {
  include csnfs
  include cslinux
  include csnvidia
}

For a group of hosts that share the same name followed by trailing digits, use the following entry, which matches node names with a regex:

node /hostname[0-9][0-9]/ {
  include csnfs
  include cslinux
  include csnvidia
}
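
To get a feel for which node names a pattern like the one above will pick up, it can be tried against a few candidate names from a shell first. This is only a rough sketch: matches_node_pattern is a hypothetical helper name, the sample host names are made up, and grep -E stands in for Puppet's own regex engine, on the assumption that Puppet treats the node pattern as unanchored (so it can match anywhere in the node name).

```shell
# Sketch: print the candidate node names that an unanchored pattern matches.
# matches_node_pattern is a hypothetical helper, not part of Puppet.
# grep -E is used as an approximation of Puppet's regex matching.
matches_node_pattern() {
  pattern="$1"
  shift
  for name in "$@"; do
    # -q: no output, the exit status alone tells us whether the name matched
    if printf '%s\n' "$name" | grep -Eq "$pattern"; then
      echo "$name"
    fi
  done
}

# Example (sample names are illustrative only):
#   matches_node_pattern 'hostname[0-9][0-9]' hostname01 hostname7 hostname01.cs.virginia.edu
```

Under that assumption, a name with a single trailing digit is not matched, while a fully qualified name or a name with more than two digits is. Adding ^ and $ to the pattern would be one way to tighten it if that matters for your hosts.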

After adding the host manifest, the csnvidia class will be applied the next time the target host runs the puppet agent. You can either wait for the host to do this automatically, or log into it and run puppet agent -t from a root shell.
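
Before the agent picks up a new manifest, it may be worth checking the file for syntax errors on coresrv04. A minimal sketch, assuming the puppet command is on root's PATH there: check_manifest is a hypothetical wrapper name, while puppet parser validate is the stock Puppet subcommand for syntax checking.

```shell
# Sketch: syntax-check a host manifest before the agent applies it.
# check_manifest is a hypothetical helper; it only wraps `puppet parser validate`.
check_manifest() {
  manifest="$1"
  if puppet parser validate "$manifest"; then
    echo "OK: $manifest"
  else
    echo "FAILED: $manifest" >&2
    return 1
  fi
}

# Example, run from the manifests directory on coresrv04:
#   check_manifest hostname.pp
```

This only catches syntax problems. Whether the csnvidia class applies cleanly is still decided by the agent run on the target host.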

Final Steps

The first time a server has the csnvidia class applied, it will need to be rebooted once before the drivers will work. This is due to the default nouveau kernel module conflicting with the nvidia kernel module. You will not be able to load the nvidia modules until the nouveau module is unloaded.
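
To see whether a host still needs that reboot, you can look at which GPU kernel modules are currently loaded. The following is a small sketch rather than part of the csnvidia module: gpu_modules is a hypothetical helper that filters lsmod output, and it assumes the nvidia driver's module names all begin with "nvidia".

```shell
# Sketch: list loaded GPU kernel modules from lsmod-style input on stdin.
# gpu_modules is a hypothetical helper name.
gpu_modules() {
  # NR > 1 skips the "Module  Size  Used by" header line.
  # Prints the module name if it is nouveau or starts with "nvidia" (assumed naming).
  awk 'NR > 1 && ($1 == "nouveau" || $1 ~ /^nvidia/) { print $1 }'
}

# Example, run on the target host:
#   lsmod | gpu_modules
```

If nouveau still appears in the output, the host most likely has not been rebooted since csnvidia was first applied.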

Once you are done creating the manifest, add the file to git and push your changes to gitlab:

[root@coresrv04 manifests]# git add ./applecake.pp
[root@coresrv04 manifests]# git commit -m "Host entry for applecake -ktm5j"
[root@coresrv04 manifests]# git push
Everything up-to-date

Git may print output that you do not need to act on. If you run the commands above and receive scary-looking output, it is not necessarily an error.

staff-wiki/puppet_csnvidia.txt · Last modified: 2023/08/29 19:56 by 127.0.0.1