==== Computing Resources ====
The CS Dept. deploys general purpose compute and GPU compute servers, as well as specialized GPU servers. We also have database, [[nx_lab|NX/Nomachine (graphical desktop)]], and [[windows_server|Windows servers]]. This section describes the general use and research use servers (see other sections for database and other server information).

Per CS Dept. policy, CS Dept. servers that are not interactive login servers are placed into the SLURM job scheduling queues and are available for general use. Also, per that policy, if a user in a research group that originally purchased the hardware requires exclusive use of that hardware, they can be given a reservation for that exclusive use for a specified time. Otherwise, the systems are open for use by anyone with a CS account. This policy was approved by the CS Dept. Computing Committee, which is composed of CS faculty.

This policy allows servers to be used when the project group is not using them. Instead of sitting idle and consuming power and cooling, these systems can be put to use by other Dept. users.
  
=== portal load balanced servers ===
The ''%%portal%%'' nodes are general purpose servers into which anyone can log in. They are available for general use. They are also the "jump off" point for off Grounds connections to the CS network. The servers are in a load balanced cluster, and are accessed through //ssh// to ''%%portal.cs.virginia.edu%%''.
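
For example, to reach the cluster from off Grounds (''%%mst3k%%'' is simply a placeholder for your own CS account name):

<code bash>
# Connect to the load balanced portal cluster; you land on one of portal01 through portal04
ssh mst3k@portal.cs.virginia.edu
</code>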
  
Use these servers to code, compile, test, etc. However, these are //not// meant for long running processes that will tie up resources. Computationally expensive processes should be run on the other servers listed below.
  
**All GPU, Memory, CPU, etc. counts are per node**.

^ Hostname ^ Memory (GB) ^ CPU Type ^ CPUs ^ Cores/CPU ^ Threads/Core ^ Total Cores ^
| portal[01-04] | 132 | Intel | 1 | 8 | 2 | 16 |
  
=== GPU servers ===
The gpusrv* servers are general purpose GPU servers into which anyone can log in (via ''%%ssh%%''). They are intended for code development, testing, and short computations. Long running computations are discouraged, and are better suited to one of the GPU servers controlled by the job scheduler (see the next section).
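
For example, after logging in you can check whether the GPUs are already busy before starting a test run; this sketch assumes the usual ''%%.cs.virginia.edu%%'' hostnames and the Nvidia driver's ''%%nvidia-smi%%'' utility:

<code bash>
# Log into one of the interactive GPU servers
ssh mst3k@gpusrv01.cs.virginia.edu

# List the GPUs, their memory use, and any running compute processes
nvidia-smi
</code>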
  
**All GPU, Memory, CPU, etc. counts are per node**.
  
^ Hostname ^ Memory (GB) ^ CPU Type ^ CPUs ^ Cores/CPU ^ Threads/Core ^ Total Cores ^ GPUs ^ GPU Type ^
| gpusrv[01-08] | 256 | Intel | 2 | 10 | 2 | 20 | 4 | Nvidia RTX 2080Ti |
  
=== Nodes controlled by the SLURM Job Scheduler ===
  
//See our main article on [[compute_slurm|Slurm]] for more information.//
  
These servers are available by submitting a job through the SLURM job scheduler.
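
As a minimal sketch of what a batch job looks like (the job name, output file, and resource requests below are purely illustrative; see the [[compute_slurm|Slurm]] article for the options our cluster supports):

<code bash>
#!/bin/bash
#SBATCH --job-name=my_test          # illustrative job name
#SBATCH --output=my_test-%j.out     # %j expands to the SLURM job ID
#SBATCH --ntasks=1                  # one task...
#SBATCH --cpus-per-task=4           # ...with four CPU cores
#SBATCH --mem=16G                   # memory for the whole job
#SBATCH --time=01:00:00             # wall clock limit (1 hour)

# Commands below run on the node SLURM allocates
hostname
</code>

Submit the script with ''%%sbatch%%'' and check its progress with ''%%squeue%%''.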

**All GPU, Memory, CPU, etc. counts are per node**.

^ Hostname ^ Memory (GB) ^ CPU Type ^ CPUs ^ Cores/CPU ^ Threads/Core ^ Total Cores ^ GPUs ^ GPU Type ^
| affogato[01-15] | 128 | Intel | 2 | 8 | 2 | 32 | 0 | |
| affogato[11-15] | 128 | Intel | 2 | 8 | 2 | 32 | 4 | Nvidia GTX1080Ti |
| ai[01-06] | 64 | Intel | 2 | 8 | 2 | 32 | 4 | Nvidia GTX1080Ti |
| cheetah01 | 256 | AMD | 2 | 8 | 2 | 32 | 4 | Nvidia A100 |
| cheetah[02-03] | 1024 (2) | Intel | 2 | 18 | 2 | 72 | 2 | Nvidia RTX 2080Ti |
| cortado[01-10] | 512 | Intel | 2 | 12 | 2 | 48 | 0 | |
| falcon[1-10] | 128 | Intel | 2 | 6 | 2 | 24 | 0 | |
| hermes[1-4] | 256 | AMD | 4 | 16 | 1 | 64 | 0 | |
| lynx[01-04] | 64 | Intel | 4 | 8 | 2 | 32 | 4 | Nvidia GTX1080Ti |
| lynx[05-07] | 64 | Intel | 4 | 8 | 2 | 32 | 4 | Nvidia P100 |
| lynx[08-09] | 64 | Intel | 4 | 8 | 2 | 32 | 3 | ATI FirePro W9100 |
| lynx[10] | 64 | Intel | 4 | 8 | 2 | 32 | 0 | Altera FPGA |
| lynx[11-12] | 64 | Intel | 4 | 8 | 2 | 32 | 0 | |
| nibbler[1-4] | 64 | Intel | 2 | 10 | 2 | 20 | 0 | |
| optane01 | 512 (1) | Intel | 2 | 16 | 2 | 64 | 0 | |
| ristretto[01-04] | 128 | Intel | 2 | 6 | 1 | 12 | 8 | Nvidia GTX1080Ti |
| slurm[1-5] | 512 | Intel | 2 | 12 | 2 | 24 | 0 | |
| trillian[1-3] | 256 | AMD | 4 | 8 | 2 | 64 | 0 | |

(1) Intel Optane memory

(2) In addition to 1TB of DDR4 RAM, these servers also house a 900GB Optane NVMe SSD and a 1.6TB regular NVMe SSD drive.
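
The live state of these nodes and queues can be checked with SLURM's own tools, for example:

<code bash>
# List the queues (partitions) and the state of the nodes in each
sinfo

# Limit the listing to a single queue, e.g. the gpu queue
sinfo --partition=gpu
</code>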

=== Job Scheduler Queues ===

//See our [[compute_slurm#partitions|main article on Slurm]] for more information on using queues ("partitions").//

^ Queue ^ Nodes ^
| main | cortado[01-10], falcon[1-10], granger[1-8], hermes[1-4], lynx[10-12], nibbler[1-4], optane01, slurm[1-5], trillian[1-3] |
| gpu | affogato[11-15], ai[01-06], cheetah[01-03], lynx[01-09], ristretto[01-04] |
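
For example, a job is directed to a particular queue with SLURM's ''%%--partition%%'' option; the GPU request below is only a sketch using the generic ''%%--gres%%'' syntax:

<code bash>
# Start an interactive shell on a node in the main queue
srun --partition=main --pty bash -i

# Start an interactive shell with one GPU allocated in the gpu queue
srun --partition=gpu --gres=gpu:1 --pty bash -i
</code>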
  
=== Group specific computing resources ===
Several groups in CS deploy servers that are used exclusively by that group. Approximately 40 servers are deployed in this fashion, ranging from traditional CPU servers to specialized servers containing GPU accelerators.
  
  