QCDOC Computing at BNL
Brookhaven National Laboratory (BNL) currently hosts two large QCDOC machines:
one for the RBRC community and the other for the US Lattice Gauge Theory community.
In addition, there are four Air Cooled Crates (ACC) with Single Motherboard (64-node)
partitions available for testing and debugging, and two Single Slot Back Planes
(SSBP) used by BNL technicians.
Each QCDOC machine consists of 12288 processing nodes (ASICs) hosted
in twelve water-cooled racks (1024 nodes each), with a peak performance
of 10 Tflops. The ASICs, designed by our collaboration
and built by IBM, are interconnected in a six-dimensional, low-latency mesh
network with the topology of a torus. Each ASIC has 4 MB of embedded DRAM
and 128 MB of external DRAM, and currently runs at 400 MHz.
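As a quick consistency check, the figures quoted above can be multiplied out. The sketch below is only illustrative: the value of two floating-point operations per clock cycle per node is an assumption (it is not stated on this page), chosen because it reproduces the quoted peak of roughly 10 Tflops.

```python
# Figures taken from the text above.
RACKS = 12
NODES_PER_RACK = 1024
CLOCK_HZ = 400e6            # 400 MHz
EDRAM_MB_PER_NODE = 4
EXT_DRAM_MB_PER_NODE = 128

# ASSUMPTION (not stated on this page): each node completes
# 2 floating-point operations per clock cycle at peak.
FLOPS_PER_CYCLE = 2

nodes = RACKS * NODES_PER_RACK
peak_tflops = nodes * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12
total_edram_gb = nodes * EDRAM_MB_PER_NODE / 1024
total_ext_dram_gb = nodes * EXT_DRAM_MB_PER_NODE / 1024

print(f"nodes per machine:     {nodes}")
print(f"peak performance:      {peak_tflops:.2f} Tflops")
print(f"total embedded DRAM:   {total_edram_gb:.0f} GB")
print(f"total external DRAM:   {total_ext_dram_gb:.0f} GB")
```

Under that assumption the product comes out just under 10 Tflops per machine, consistent with the rounded figure in the text.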
More information about this architecture can be found on the
QCDOC architecture and publication web pages.
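To illustrate what a six-dimensional torus topology implies for nearest-neighbor connectivity, here is a minimal sketch. The function name `torus_neighbors` and the 4x4x4x4x4x4 shape are hypothetical; the actual extents of a machine partition are not given on this page.

```python
def torus_neighbors(coord, shape):
    """Return the nearest neighbors of `coord` on a periodic mesh (torus).

    Each dimension contributes two neighbors (one step back, one step
    forward), and indices wrap around at the edges.
    """
    neighbors = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (coord[axis] + step) % size  # periodic wrap-around
            neighbors.append(tuple(n))
    return neighbors


# Hypothetical partition shape, used for illustration only.
shape = (4, 4, 4, 4, 4, 4)
origin_neighbors = torus_neighbors((0, 0, 0, 0, 0, 0), shape)
print(len(origin_neighbors))
```

In six dimensions every node has twelve nearest neighbors, and a node on the "edge" of the mesh is linked to the node on the opposite edge, which is what distinguishes a torus from an open mesh.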
Front-End Hosts and Remote Access
The front-end node of each QCDOC (qcdochosta for the RBRC community, qcdochostb
for the US LQCD community) provides the physical connection to the machine
partitions via multiple network interfaces. Users cross-compile their codes and
manage the machine partitions on the front-end node.
The front-end hosts can be accessed remotely via ssh gateways (ssh.qcdoc.bnl.gov).
Since the QCDOC machines reside in a network enclave, even users within BNL
need to go through the ssh gateways (ssh.qcdoc.bnl.local).
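The two-hop access path described above can be sketched as a small helper that assembles an `ssh` command line. This is only an illustration under several assumptions: a local OpenSSH client recent enough to support the `-J` (jump host) option, a gateway that permits such forwarding, and a front-end host name that resolves from the gateway. The function name and the user name `alice` are hypothetical; the site may instead require logging in to the gateway interactively first.

```python
def gateway_ssh_command(user, front_end, inside_bnl=False):
    """Build an ssh command that reaches a front-end host via the gateway.

    The gateway names come from the text above; everything else
    (user name, use of -J) is an assumption for illustration.
    """
    gateway = "ssh.qcdoc.bnl.local" if inside_bnl else "ssh.qcdoc.bnl.gov"
    # -J is OpenSSH's jump-host option (available in OpenSSH 7.3 and later).
    return ["ssh", "-J", f"{user}@{gateway}", f"{user}@{front_end}"]


# Hypothetical user connecting to the US LQCD front end from outside BNL.
cmd = gateway_ssh_command("alice", "qcdochostb")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run` if desired; the sketch only builds the command so that it can be inspected before use.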
Available File Systems
The "host" filesystem is globally shared by all processing nodes.
It is usually provided by the front-end host (500 GB), but it may also be provided
by Linux NAS servers.
The parallel file system (pfs) is used for high I/O throughput from the processing nodes.
It is provided by Linux NAS servers: one per machine rack for the RBRC machine,
two per rack for the US LQCD machine. It is similar to scratch disks on cluster
processing nodes. Each QCDOC node writes to a unique directory on the pfs systems.
All pfs systems are NFS-mounted on the corresponding front-end host.