==== Computing Resources ====
The CS Dept. deploys general purpose compute servers and GPU compute servers, as well as specialized GPU servers. We also have database servers and [[nx_lab|NX/NoMachine]] remote desktop servers.

=== portal load balanced servers ===
The //portal// servers are general purpose servers that anyone with a CS account can log into, and they are available for general use. They are also the "jump off" point for off-Grounds connections to the CS network. The servers form a load balanced cluster and are accessed through //ssh// to ''portal''.

Use these servers to code, compile, test, and so on. However, they are //not// meant for long running processes that consume excessive resources.

//(all GPU, Memory, CPU, etc. counts are per node)//.
^ Hostname ^ Memory (GB) ^ CPU Type ^ CPUs ^ Cores/CPU ^ Threads/Core ^ Total Threads ^ OS ^
| portal[01-10] | 132-256 | Intel | 1 | 8 | 2 | 16 | Ubuntu |
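
The portal nodes are meant for light, interactive work, so a quick remote command is a reasonable way to check on them. The sketch below runs ''uptime'' over ''ssh'' from Python; the hostname ''portal.example.edu'' is a placeholder (substitute the real portal hostname), and it assumes your SSH login to portal already works.

<code python>
import subprocess

# Placeholder hostname -- replace with the actual portal hostname.
PORTAL = "portal.example.edu"

# Run a short, lightweight command remotely (a load average check),
# the kind of interactive use the portal nodes are intended for.
result = subprocess.run(
    ["ssh", PORTAL, "uptime"],
    capture_output=True,
    text=True,
    timeout=30,
)
print(result.stdout.strip())
</code>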
+ | |||
+ | === GPU servers === | ||
+ | The gpusrv* servers are general purpose servers that contain GPUs into which anyone can login (via ' | ||
+ | |||
+ | //(all GPU, Memory, CPU, etc. counts are per node)//. | ||
+ | |||
+ | ^ Hostname ^ Memory (GB) ^ CPU Type ^ CPUs ^ Cores/CPU ^ Threads/ | ||
+ | | gpusrv[01-08] | 256 | Intel | 1 | 10 | 2 | 20 | 4 | Nvidia RTX 2080Ti | 11 | | ||
+ | | gpusrv[09-16] | 512 | Intel | 2 | 10 | 2 | 40 | 4 | Nvidia RTX 4000 | 8 | | ||
+ | | gpusrv[17-18] | 128 | Intel | 1 | 10 | 2 | 20 | 4 | Nvidia RTX 2080Ti | 11 | | ||
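
Before starting GPU work on a gpusrv node, it can help to confirm the GPUs are visible from your environment. The check below is a minimal sketch that assumes PyTorch is installed in your own environment (for example a personal virtualenv or conda environment); it is not a department-provided script.

<code python>
import torch

# List the CUDA devices this process can see, with their memory sizes.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.1f} GiB")
else:
    print("No CUDA devices visible")
</code>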
+ | |||
+ | === Docker servers === | ||
+ | The docker servers allow users to instantiate a docker container without the need for super-user (root) privileges. Once a user uses ' | ||
+ | |||
+ | //(all GPU, Memory, CPU, etc. counts are per node)//. | ||
+ | |||
^ Hostname ^ Memory (GB) ^ CPU Type ^ CPUs ^ Cores/CPU ^ Threads/Core ^ Total Threads ^
| docker01 | 256 | Intel | 2 | 10 | 2 | 40 |
| docker02 | 64 | Intel | 1 | 4 | 2 | 8 |
| docker03 | 8 | Intel | 1 | 2 | 2 | 4 |
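
As a minimal illustration of the workflow on a docker host, the sketch below launches a throwaway container from Python by calling the ''docker'' command line. The image name ''ubuntu:22.04'' is just an example, and this assumes the ''docker'' CLI is available to your account on that host.

<code python>
import subprocess

# Start a short-lived container and remove it when it exits (--rm).
# "ubuntu:22.04" is only an example image; use whatever image you need.
completed = subprocess.run(
    ["docker", "run", "--rm", "ubuntu:22.04", "echo", "hello from a container"],
    capture_output=True,
    text=True,
    check=True,
)
print(completed.stdout.strip())
</code>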
+ | |||
+ | === Nodes controlled by the SLURM Job Scheduler === | ||
+ | |||
+ | //See our main article on [[compute_slurm|Slurm]] for more information.// | ||
+ | |||
+ | These servers are available by submitting a job through the SLURM job scheduler. They are not available for direct logins via ' | ||
+ | |||
+ | Per CS Dept. policy, servers are placed into the SLURM job scheduling queues and are available for general use. Also, per that policy, if a user in a research group that originally purchased the hardware requires exclusive use of that hardware, they can be given a reservation for that exclusive use for a specified time. Otherwise, the systems are open for use by anyone with a CS account. This policy was approved by the CS Dept. Computing Committee comprised of CS Faculty. | ||
+ | |||
+ | This policy allows servers to be used when the project group is not using them. So instead of sitting idle and consuming power and cooling, other Dept. users can benefit from the use of these systems. | ||
+ | |||
+ | //(all GPU, Memory, CPU, etc. counts are per node)//. | ||
+ | //This list may not be up to date. Use the ' | ||
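
Because the node list changes over time, it is often easier to ask SLURM directly. The sketch below calls ''sinfo'' with a node-oriented output format from Python; it assumes it is run somewhere the SLURM client commands are installed (for example, a login node such as portal, which is an assumption rather than something stated here).

<code python>
import subprocess

# Node-oriented listing: node name, partition, CPUs, memory (MB),
# and generic resources (GRES), which is where GPUs are reported.
output = subprocess.check_output(
    ["sinfo", "-N", "-o", "%N %P %c %m %G"],
    text=True,
)
print(output)
</code>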
+ | |||
+ | ^ Hostname | ||
+ | | adriatic[01-06] | 1024 | Intel | 2 | 8 | 2 | 32 | 4 | Nvidia RTX 4000 | 8 | Ubuntu Server 22.04 | | ||
+ | | affogato[01-10] | 128 | Intel | 2 | 8 | 2 | 32 | 0 | ||
+ | | affogato[11-15] | 128 | Intel | 2 | 8 | 2 | 32 | 4 | Nvidia GTX1080Ti | ||
+ | | ai[01-06] | ||
+ | | ai[07-10] | ||
+ | | cheetah01 | ||
+ | | cheetah[02-03] | ||
+ | | cheetah04 | ||
+ | | cortado[01-10] | ||
+ | | doppio[01-05] | ||
+ | | epona | 64 | Intel | 1 | 4 | 2 | 8 | 0 | ||
+ | | heartpiece | ||
+ | | hydro | 256 | Intel | 2 | 16 | 2 | 64 | 0 | ||
+ | | jaguar01 | ||
+ | | jaguar02 | ||
+ | | jaguar03 | ||
+ | | jaguar04 | ||
+ | | jaguar05 | ||
+ | | jaguar06 | ||
+ | | lotus | 256 | Intel | 2 | 20 | 2 | 80 | 8 | Nvidia RTX 6000 | 24 | Ubuntu Server 22.04 | | ||
+ | | lynx[01-04] | ||
+ | | lynx[05-07] | ||
+ | | lynx[08-09] | ||
+ | | lynx10 | ||
+ | | lynx[11, | ||
+ | | optane01 | ||
+ | | panther01 | ||
+ | | pegasusboots | ||
+ | | ristretto[01-04]| 128 | Intel | 2 | 6 | 1 | 12 | 8 | Nvidia GTX1080Ti | ||
+ | | sds[01-02] | ||
+ | | slurm[1-5](3) | ||
+ | | titanx[01-03] | ||
+ | | titanx[04-06] | ||
+ | |||
+ | (1) 512GB Intel Optane memory, 512 DDR4 memory | ||
+ | |||
+ | (2) In addition to 1TB of DDR4 RAM, these servers also house a 900GB Optane NVMe SSD and a 1.6TB NVMe regular SSD drive | ||
+ | |||
+ | (3) All users have ' | ||
+ | |||
+ | (4) Scaled to 96GB by pairing GPUs with NVLink technology that provides a maximum bi-directional bandwidth of 112GB/s between the paired A40s | ||
+ | |||
+ | (5) Scaled to 320GB by aggregating all four GPUs with NVLink technology | ||
+ | |||
+ | === Job Scheduler Queues === | ||
+ | |||
+ | //See our [[compute_slurm# | ||
+ | |||
+ | The main partition contains nodes without GPUs, the gpu partition contains nodes with gpus, the nolom partition has nodes with no gpus and no time limit on jobs, and the gnolim partition has nodes with GPUs but no time limit on jobs. The time limit on jobs in the main and gpu partitions is shown using the ' | ||
+ | |||
+ | ^ Queue ^ Nodes ^ | ||
+ | | main | cortado[01-10], | ||
+ | | gpu | adriatic[01-06], | ||
+ | | nolim | doppio[01-05], | ||
+ | | gnolim | ai[01-10], titanx[01-06] | | ||
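
As a minimal sketch of how a job lands on one of these partitions, the snippet below writes a small batch script and submits it with ''sbatch'' from Python. The partition name ''gpu'' comes from the table above, but the resource requests (''--gres=gpu:1'', time, memory) are only illustrative; check the main [[compute_slurm|Slurm]] article for the options the department expects.

<code python>
import subprocess
from pathlib import Path

# A tiny batch script: request one GPU on the "gpu" partition and
# report which node the job ran on. The values are illustrative only.
job = """#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
#SBATCH --mem=8G
hostname
nvidia-smi
"""

script = Path("demo_job.sh")
script.write_text(job)

# sbatch prints the assigned job ID on success.
print(subprocess.check_output(["sbatch", str(script)], text=True).strip())
</code>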
+ | |||
+ | === Group specific computing resources === | ||
+ | Several groups in CS deploy servers that are used exclusively by that group. Approximately 40 servers are deployed in this fashion, ranging from traditional CPU servers to specialized servers containing GPU accelerators. | ||
+ | |||
=== Docker Server ===

//See our main article on [[compute_docker|Docker]] for more information.//

The Docker server ''docker01'' (listed in the table above) lets users run their own Docker containers without needing root privileges.
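
For users who prefer to drive containers from Python rather than the shell, the sketch below uses the Docker SDK for Python (the ''docker'' package), which you would install yourself (e.g., ''pip install --user docker''); whether it is preinstalled on the docker hosts is not stated here.

<code python>
import docker

# Connect to the local Docker daemon on the docker host.
client = docker.from_env()

# Run a throwaway container and capture its output.
# "python:3.11-slim" is only an example image.
output = client.containers.run(
    "python:3.11-slim",
    ["python", "-c", "print('hello from a container')"],
    remove=True,
)
print(output.decode().strip())
</code>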
+ | |||
+ | |||
+ | |||
+ | |||