| author    | Pjotr Prins | 2023-12-16 17:04:22 -0600 |
|-----------|-------------|---------------------------|
| committer | Pjotr Prins | 2023-12-16 17:04:22 -0600 |
| commit    | fd75152f90b9af1b0c3ee296f518a968f9aedd0b (patch) | |
| tree      | 5ba8618c794722beafef48f614667d3653ce2a19 /general/help | |
| parent    | a93fa0de4647060f4ba3d87b8b61412b04b05340 (diff) | |
| download  | gn-docs-fd75152f90b9af1b0c3ee296f518a968f9aedd0b.tar.gz | |
Updated facilities
Diffstat (limited to 'general/help')
-rw-r--r-- | general/help/facilities.md | 32 |
1 file changed, 16 insertions, 16 deletions
diff --git a/general/help/facilities.md b/general/help/facilities.md
index 5b75a7e..eabb5c6 100644
--- a/general/help/facilities.md
+++ b/general/help/facilities.md
@@ -1,16 +1,16 @@
 # Equipment
 
-The core [GeneNetwork team](https://github.com/genenetwork/) and [Pangenome team](https://github.com/pangenome) maintains modern Linux servers and storage systems for genetic, genomic, pangenome, pangenetics and phenome analyses.
+The core [GeneNetwork team](https://github.com/genenetwork/) and [Pangenome team](https://github.com/pangenome) at UTHSC maintains modern Linux servers and storage systems for genetic, genomic, pangenome, pangenetics and phenome analyses.
 Machines are located in in the main UTHSC machine room of the Lamar Alexander Building at UTHSC (Memphis TN campus).
-The team has access to this space for upgrades and hardware maintenance.
-We use remote racadm and/or ipmi to all machines out-of-band.
+We have access to this space for upgrades and hardware maintenance.
+We use remote racadm and/or ipmi to all machines for out-of-band maintenance.
 Issues and work packages are tracked through our 'tissue' [tracker board](https://issues.genenetwork.org/) and we use git repositories for documentation, issue tracking and planning (mostly public and some private repos available on request). We also run [continuous integration](https://ci.genenetwork.org/) and [continuous deployment](https://cd.genenetwork.org/) services online (CI and CD).
 
-The computing facility has four computer racks dedicated to GeneNetwork-related work.
+The computing facility has four computer racks dedicated to GeneNetwork-related work and pangenomics.
 Each rack has a mix of Dell PowerEdge servers (from a few older low-end R610s, R6515, and two R7425 AMD Epyc 64-core 256GB RAM systems - tux01 and tux02 - running the GeneNetwork web services).
-We also support several more experimental systems, including a 40-core R7425 system with 196 GB RAM and 2x NVIDIA V100 GPU (tux03), and one Penguin Computing Relion 2600GT systems (Penguin2) with NVIDIA Tesla K80 GPU used for software development and to serve outside-facing less secure R/shiny and Python services that run in isolated containers. Effectively, we have three outward facing servers that are fully used by the GeneNetwork team with a total of 64+64+40+28 = 196 real cores.
-In 2023 we added upgrades to tux01 and tux02 -- tux04 and tux05 resp. --- using the latest AMD Genoa EPYC processors adding a total of 96 real CPU cores running at 4GHz. These two machines have 768Gb RAM each.
+We also support several experimental systems, including a 40-core R7425 system with 196 GB RAM and 2x NVIDIA V100 GPU (tux03), and one Penguin Computing Relion 2600GT systems (Penguin2) with NVIDIA Tesla K80 GPU used for software development and to serve outside-facing less secure R/shiny and Python services that run in isolated containers. Effectively, we have three outward facing servers that are fully utilized by the GeneNetwork team with a total of 64+64+40+28 = 196 real cores.
+In 2023 we added two machines to upgrade from tux01 and tux02 -- named tux04 and tux05 resp. --- that have the latest AMD Genoa EPYC processors adding a total of 96 real CPU cores running at 4GHz. These two machines have 768Gb RAM each.
 
 ## Octopus HPC cluster
 
@@ -20,10 +20,10 @@ All machines have large SSD storage (~10TB) driving 100+ TB shared network stora
 All Octopus nodes run Debian and GNU Guix and use Slurm for batch submission.
 We run lizardfs for distributed network file storage and we run the common workflow language (CWL) and Docker containers. The racks have dedicated 10Gbs high-speed Cisco switches and firewalls that are maintained by UTHSC IT staff.
-This heavily used cluster, however, is almost self-managed by its users and was featured on the GNU Guix High Performance Computing [2020](https://hpc.guix.info/blog/2021/02/guix-hpc-activity-report-2020/) and [2022](https://hpc.guix.info/blog/2023/02/guix-hpc-activity-report-2022/) activity reports!
-In 2023 we added 4 new AMD Genoa machines processors adding a total of 192 real CPU cores running at 4GHz. These machines also have 768Gb RAM.
+This heavily used cluster, notably, is almost self-managed by its users and features on the GNU Guix High Performance Computing [2020](https://hpc.guix.info/blog/2021/02/guix-hpc-activity-report-2020/) and [2022](https://hpc.guix.info/blog/2023/02/guix-hpc-activity-report-2022/) activity reports!
+In 2023 we added 4 new AMD Genoa machines processors adding a total of 192 real CPU cores running at 4GHz. These machines also have 768Gb RAM each.
 
-The total number of cores for Octopus is now 456 real CPU cores.
+The total number of cores for Octopus has doubled to a total of 456 real CPU cores and the Lizardfs SSD distributed network storage is getting close to 200TB with fiber optic interconnect.
 
 <table border="0" style="width:95%">
 <tr>
@@ -35,16 +35,16 @@ The total number of cores for Octopus is now 456 real CPU cores.
 
 ## Lambda server
 
-Since August 2023, for large language models (LLMs) and AI, we have a 128 real core Lambda server with 1TB RAM, 40TB of nvme storage AND 8x NVIDIA RTX6000: a total of approx. 144,000 compute cores.
+Additionally, since August 2023, we run a 128 real core Lambda server with 1TB RAM, 40TB nvme storage AND 8x NVIDIA RTX6000: a total of approx. 144,000 compute cores for large language models (LLMs) and AI.
 
 ## Backups
 
-We run three Synology servers with a total of 300TB of storage.
-We also have an off-site fallback server and encrypted backups in the Amazon cloud for the main web-service databases and files.
+For backups we run three Synology servers with a total of 300TB of storage.
+On demand we also deploy an off-site fallback server and encrypted backups in the Amazon cloud for the main web-service databases and files.
 
 ## Specials
 
-We also run some 'specials' including an ARM-based NVIDIA Jetson and a
+We run some 'specials' including an ARM-based NVIDIA Jetson and a
 RISC-V [PolarFire SOC](https://www.cnx-software.com/2020/07/20/polarfire-soc-icicle-64-bit-risc-v-and-fpga-development-board-runs-linux-or-freebsd/). We
@@ -52,12 +52,12 @@ have also two RISC-V
 [SiFive](https://www.sifive.com/blog/the-heart-of-risc-v-development-is-unmatched)
 computers for development purposes.
 
-Additionally, together with Chris Batten of Cornell and Michael Taylor of the University of Washington, Erik Garrison and Pjotr Prins are UTHSC PIs responsible for leading the NSF-funded [RISC-V supercomputer for pangenomics](https://news.cornell.edu/stories/2021/11/5m-grant-will-tackle-pangenomics-computing-challenge). This supercomputer will come online in 2025.
+Additionally, together with Chris Batten of Cornell and Michael Taylor of the University of Washington, Erik Garrison and Pjotr Prins are UTHSC PIs responsible for leading the NSF-funded [RISC-V supercomputer for pangenomics](https://news.cornell.edu/stories/2021/11/5m-grant-will-tackle-pangenomics-computing-challenge). This RISC-V supercomputer 'in a rack' will come online in 2025.
 
 ## ISAAC access
 
 In addition to above hardware the GeneNetwork team has batch submission access to the HIPAA complient cluster computing resource at the ISAAC computing facility operated by the UT Joint Institute for Computational Sciences in a secure setup at the DOE Oak Ridge National Laboratory (ORNL) and on the UT Knoxville campus.
-We have a 10 Gbit connection from the machine room at UTHSC to data transfer nodes at ISAAC. ISAAC has been upgraded in the past year (see [ISAAC system overview](https://oit.utk.edu/hpsc/available-resources/)) and has over 6 PB of high-performance Lustre DDN storage and contains about 18,000 cores with some large RAM nodes and 19 GPU nodes.
+We have a 10 Gbit connection from the machine room at UTHSC to data transfer nodes at ISAAC. ISAAC has been upgraded in the past year (see [ISAAC system overview](https://oit.utk.edu/hpsc/available-resources/)) and has over 7 PB of high-performance Lustre DDN storage and contains over 20,000 cores with some large RAM nodes and 29 GPU nodes.
 Drs. Prins, Garrison, Colonna, Chen, Ashbrook and other team members use ISAAC systems to analyze genomic and genetic data sets. Note that we can not use ISAAC and storage facilities for public-facing web services because of stringent security requirements. ISAAC however, can be highly useful for precomputed genomics and genetics results using standardized pipelines.
@@ -69,4 +69,4 @@ All current tools are maintained on [https://gitlab.com/genenetwork/guix-bioinfo
 
 ## Cloud computing
 
-In addition the the "bare metal" described above we increasingly use cloud services for running VMs for teaching and fallbacks, as well as for storing data, including backups. Also we depend on cloud services for GPT-type work.
+In addition the the "bare metal" described above we increasingly use cloud services for running VMs for teaching and fallbacks, as well as for storing data, including backups.
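
The facilities text in this diff notes that all machines are reachable out-of-band via racadm and/or ipmi. As a rough illustration only (not part of the commit or the repository), the sketch below polls chassis power state over IPMI using the standard `ipmitool` CLI; the BMC hostnames, user name, and environment variables are hypothetical placeholders, since the real out-of-band addresses and credentials are site-specific.

```python
import os
import subprocess

# Hypothetical BMC hostnames -- the real out-of-band addresses are site-specific.
BMC_HOSTS = ["tux04-bmc.example.org", "tux05-bmc.example.org"]
IPMI_USER = os.environ.get("IPMI_USER", "admin")  # assumption: credentials come from the environment
IPMI_PASS = os.environ.get("IPMI_PASS", "")

def power_status(host: str) -> str:
    """Query chassis power state over the IPMI lanplus interface."""
    result = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host,
         "-U", IPMI_USER, "-P", IPMI_PASS,
         "chassis", "power", "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()  # e.g. "Chassis Power is on"

if __name__ == "__main__":
    for host in BMC_HOSTS:
        try:
            print(f"{host}: {power_status(host)}")
        except subprocess.CalledProcessError as err:
            print(f"{host}: query failed ({err.stderr.strip()})")
```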
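The Octopus section also states that the nodes run Debian and GNU Guix and use Slurm for batch submission. Purely as a hedged sketch of that workflow (not taken from the repository), the snippet below submits a small job with `sbatch --wrap`; the job name, partition name, and resource sizes are illustrative assumptions rather than documented cluster settings.

```python
import subprocess

# Illustrative resource request; the partition name "octopus" is an assumption,
# not a documented Slurm partition on the cluster.
cmd = [
    "sbatch",
    "--job-name=pangenome-demo",
    "--partition=octopus",
    "--cpus-per-task=8",
    "--mem=32G",
    "--time=02:00:00",
    "--wrap=echo 'hello from an Octopus node'",
]

# On success sbatch prints something like "Submitted batch job 12345".
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout.strip())
```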