==== Computing Resources ====
The CS Dept. deploys general purpose compute and GPU compute servers, as well as specialized GPU servers. We also have database servers, [[nx_lab|NX/Nomachine (graphical Linux desktop)]] servers, and a [[windows_server|Windows desktop server]]. This section describes the general use and research use servers (see the other sections for database and other server information).
  
=== portal load balanced servers ===
The //portal// servers are general purpose servers into which anyone can log in. They are available for general use, and they are also the "jump off" point for off Grounds connections to the CS network. The servers form a load balanced cluster and are accessed through //ssh// to ''%%portal.cs.virginia.edu%%''. See: [[compute_portal|The portal cluster]]
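
For example, a minimal connection from off Grounds (the user ID ''%%mst3k%%'' below is a placeholder; substitute your own CS account):

<code bash>
# ssh to the cluster name; the load balancer assigns one of the portal nodes
ssh mst3k@portal.cs.virginia.edu
</code>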
  
Use these servers to code, compile, test, etc. However, they are //not// meant for long running processes that tie up resources. Computationally expensive processes should be run on the other servers listed below.
=== GPU servers ===
**All GPU, Memory, CPU, etc. counts are per node**.
  
^ Hostname ^ Memory (GB) ^ CPU Type ^ CPUs ^ Cores/CPU ^ Threads/Core ^ Total Threads ^ GPUs ^ GPU Type ^ GPU RAM (GB) ^
| gpusrv[01-08] | 256 | Intel | | 10 | 2 | 20 | 4 | Nvidia RTX 2080Ti | 11 |
| gpusrv[09-16] | 512 | Intel | 2 | 10 | 2 | 40 | 4 | Nvidia RTX 4000 | 8 |
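
Once logged into one of these nodes, ''nvidia-smi'' will show the GPUs listed above. A quick check (assuming the nodes carry the usual ''%%cs.virginia.edu%%'' suffix):

<code bash>
# log into an interactive GPU server, then list its GPUs and their memory
ssh gpusrv01.cs.virginia.edu
nvidia-smi
</code>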
  
=== Nodes controlled by the SLURM Job Scheduler ===
//See our main article on [[compute_slurm|Slurm]] for more information.//
  
These servers are available by submitting a job through the SLURM job scheduler. They are not available for direct logins via ''ssh''; however, users can log in to them without a job script using the ''srun -i'' direct login command. See the SLURM section for more information.
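
For example, two common patterns (the partition name comes from the queue table below; the memory request is an illustrative assumption):

<code bash>
# run a single command on a scheduled node
srun -p main --mem=4G hostname

# start an interactive login shell on a scheduled node
srun -p main --pty bash -i -l
</code>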
Per CS Dept. policy, servers are placed into the SLURM job scheduling queues and are available for general use. Also per that policy, if a user in a research group that originally purchased the hardware requires exclusive use of that hardware, they can be given a reservation for exclusive use for a specified time. Otherwise, the systems are open for use by anyone with a CS account. This policy was approved by the CS Dept. Computing Committee, which is composed of CS faculty.

This policy allows servers to be used when the owning project group is not using them: instead of sitting idle while still consuming power and cooling, these systems can benefit other Dept. users.
  
**All GPU, Memory, CPU, etc. counts are per node**.
    
^ Hostname ^ Mem (GB) ^ CPU ^ #CPUs ^ Cores/CPU ^ Threads/Core ^ Total Threads ^ GPUs ^ GPU Type ^ GPU RAM (GB) ^
| adriatic[01-06]  | 1024    | Intel | 2 | 8  | 2 | 32 | 4 | Nvidia RTX 4000   | 8  |
| affogato[01-15]  | 128     | Intel | 2 | 8  | 2 | 32 | 0 | | |
| affogato[11-15]  | 128     | Intel | 2 | 8  | 2 | 32 | 4 | Nvidia GTX1080Ti  | 11 |
| ai[01-06]        | 64      | Intel | 2 | 8  | 2 | 32 | 4 | Nvidia GTX1080Ti  | 11 |
| cheetah01        | 256     | AMD   | 2 | 8  | 2 | 32 | 4 | Nvidia A100       | 40 |
| cheetah[02-03]   | 1024(2) | Intel | 2 | 18 | 2 | 72 | 2 | Nvidia RTX 2080Ti | 11 |
| cortado[01-10]   | 512     | Intel | 2 | 12 | 2 | 48 | 0 | | |
| doppio[01-05]    | 128     | Intel | 2 | 16 | 2 | 64 | 0 | | |
| falcon[1-10]     | 128     | Intel | 2 | 6  | 2 | 24 | 0 | | |
| hermes[1-4]      | 256     | AMD   | 4 | 16 | 1 | 64 | 0 | | |
| lotus            | 256     | Intel | 2 | 20 | 2 | 80 | 8 | Nvidia RTX 6000   | 24 |
| lynx[01-04]      | 64      | Intel | 4 | 8  | 2 | 32 | 4 | Nvidia GTX1080Ti  | 11 |
| lynx[05-07]      | 64      | Intel | 4 | 8  | 2 | 32 | 4 | Nvidia P100       | 16 |
| lynx[08-09]      | 64      | Intel | 4 | 8  | 2 | 32 | 3 | ATI FirePro W9100 | 32 |
| lynx[10]         | 64      | Intel | 4 | 8  | 2 | 32 | 0 | Altera FPGA       | |
| lynx[11-12]      | 64      | Intel | 4 | 8  | 2 | 32 | 0 | | |
| nibbler[1-4]     | 64      | Intel | 2 | 10 | 2 | 20 | 0 | | |
| optane01         | 512(1)  | Intel | 2 | 16 | 2 | 64 | 0 | | |
| ristretto[01-04] | 128     | Intel | 2 | 6  | 1 | 12 | 8 | Nvidia GTX1080Ti  | 11 |
| slurm[1-5](3)    | 512     | Intel | 2 | 12 | 2 | 24 | 0 | | |
| trillian[1-3]    | 256     | AMD   | 4 | 8  | 2 | 64 | 0 | | |
  
(1) Intel Optane memory
  
(2) In addition to 1TB of DDR4 RAM, these servers also house a 900GB Optane NVMe SSD and a 1.6TB regular NVMe SSD

(3) All users have ''sudo'' privileges to execute ''/usr/bin/docker'' on these nodes
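
For example, on one of the ''slurm[1-5]'' nodes (the container image is only illustrative):

<code bash>
# the sudo grant covers /usr/bin/docker itself, so invoke it by its full path
sudo /usr/bin/docker run --rm -it ubuntu bash
</code>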
  
=== Job Scheduler Queues ===
  
^ Queue ^ Nodes ^
| main | cortado[01-10], doppio[01-05], falcon[1-10], granger[1-8], hermes[1-4], lynx[10-12], optane01, slurm[1-5], trillian[1-3] |
| gpu  | adriatic[01-06], affogato[11-15], ai[01-06], cheetah[01-03], lotus, lynx[01-09], ristretto[01-04] |
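
A queue (partition, in SLURM terms) is selected when submitting a job; a sketch, where the script name and GPU count are assumptions:

<code bash>
# submit a batch script to the gpu queue, requesting one GPU
sbatch -p gpu --gres=gpu:1 my_job.sh
</code>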
  
=== Group specific computing resources ===
Several groups in CS deploy servers that are used exclusively by that group. Approximately 40 servers are deployed in this fashion, ranging from traditional CPU servers to specialized servers containing GPU accelerators.
  
  