QBATCH: Interactive Jobs

How to run an Interactive PBS job

Some ACC single MBd partitions are reserved for Interactive PBS queues, which can be used for testing and debugging jobs.
To run an Interactive job, the user does not need to copy any of the batch startup scripts (such as qbatch.pbs) that Batch jobs require. Instead, it is up to the user to start the necessary qdaemon and qcsh processes and to issue the appropriate qcsh commands (such as qinit and qpartition_connect). A detailed description of the steps needed to run a QCDOC job can be found in the User Guide.

Interactive queues have a one hour walltime limit.

To submit an Interactive PBS job, pass the -I option to the qsub command and select the ACCI queue (all capitals):

 qsub -I -q ACCI
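The submission step can be sketched as a small helper that only prints the command line, so it can be checked before it is run by hand. The function name interactive_qsub_cmd is hypothetical. The -l walltime=HH:MM:SS resource request is a standard PBS qsub option, but whether the ACCI queue accepts an explicit request shorter than its one hour limit is an assumption.

```shell
#!/bin/sh
# Sketch: build (but do not run) the interactive qsub command line.
# interactive_qsub_cmd is a hypothetical helper, not a site-provided tool.
interactive_qsub_cmd() {
    queue="${1:-ACCI}"      # queue name; ACCI is the interactive queue
    walltime="${2:-}"       # optional HH:MM:SS request (assumed to be accepted)
    cmd="qsub -I -q $queue"
    if [ -n "$walltime" ]; then
        cmd="$cmd -l walltime=$walltime"
    fi
    printf '%s\n' "$cmd"
}

# Print the command lines so they can be checked before being run by hand.
interactive_qsub_cmd
interactive_qsub_cmd ACCI 00:30:00
```

Printing rather than executing keeps the sketch harmless on a host where qsub is not installed.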
 
After the job has started you should allocate the machine partition that has been assigned to your job by running pallocate:
 $> /usr/local/bin/pallocate
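If the partition name is needed in a later command, it could be picked out of the pallocate message rather than retyped. The sketch below assumes the message has the form shown in the example session on this page ("Machine Partition acc7/slot3 is now allocated.") and that pallocate writes it to standard output. The function name extract_partition is hypothetical.

```shell
#!/bin/sh
# Sketch: read pallocate output on stdin and print the allocated partition.
# Assumes the message format "Machine Partition <name> is now allocated."
extract_partition() {
    sed -n 's/^ *Machine Partition \([^ ]*\) is now allocated\..*$/\1/p' | head -n 1
}

# Demonstration on a sample line copied from the example session.
printf ' Machine Partition acc7/slot3 is now allocated. \n' | extract_partition

# Possible use inside an interactive job (not run here):
#   partition=$(/usr/local/bin/pallocate | extract_partition)
```

If the message format differs on your host, the function prints nothing, which is easy to test for before the value is used.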
 
After the partition has been allocated, you may have to source the appropriate QOS setup script (depending on your shell init scripts) and, if needed, set the QMACHINE environment variable. The latest QOS version is advertised in the motd file and is currently at /qcdoc/sfw/qos/v2.6.0/pro.
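One way to tell whether the setup script still needs to be sourced is to check that the QOS tools are reachable. The sketch below assumes that sourcing the setup script puts qdaemon and qcsh on PATH. The helper names have_commands and set_qmachine_default are hypothetical, and acc7/slot3 stands in for whatever partition pallocate reported.

```shell
#!/bin/sh
# Return 0 only if every command named in the arguments is on PATH.
have_commands() {
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || return 1
    done
    return 0
}

# Export QMACHINE with the given value only if it is not already set.
set_qmachine_default() {
    if [ -z "${QMACHINE:-}" ]; then
        QMACHINE="$1"
        export QMACHINE
    fi
}

if have_commands qdaemon qcsh; then
    echo "QOS tools found on PATH"
else
    echo "QOS tools not found: source the QOS setup script first"
fi

set_qmachine_default acc7/slot3   # partition name reported by pallocate
echo "QMACHINE=$QMACHINE"
```

An existing QMACHINE value is left untouched, so a setting made in your shell init scripts takes precedence.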
You can now run qsession:
 $> qsession acc7/slot3
 
or start the qdaemon and the qcsh processes:
 $> qdaemon -d -m acc7/slot3 &
 $> qcsh
 
Starting the qdaemon and qcsh processes, or simply running qsession, can also be done in a separate shell window.

You should now have a qcsh prompt ready to accept commands. For a detailed description of the steps needed to start up a machine partition, see the User Guide.

Example of running an Interactive job

-bash:stratos@qcdochostb:~> qsub -I -q ACCI 
qsub: waiting for job 5523.qcdochostb.qcdoc.bnl.gov to start
qsub: job 5523.qcdochostb.qcdoc.bnl.gov ready

mom_close_poll entered

-bash:stratos@qcdochostb:~> source $CRE_HOME/bin/setup.sh  

-bash:stratos@qcdochostb:~> /qcdoc/local/etc/pallocate  
 Machine Partition acc7/slot3 is now allocated. 

 You should now source the QOS setup script (if you haven't already done so) 
 and (if needed) set the QMACHINE env. variable to acc7/slot3.

 You may then start the qdaemon process (in the background): qdaemon -d -m acc7/slot3 & 
 and the qcsh process: qcsh

 OR simply run qsession:  qsession acc7/slot3
 (and you may even do all these on another shell) 

-bash:stratos@qcdochostb:~> qdaemon -d -m acc7/slot3 &  
[1] 2322536
-bash:stratos@qcdochostb:~>Initialising Qdaemon
Detaching from terminal and backgrounding

-bash:stratos@qcdochostb:~> qcsh  

(qcdochostb:/home/stratos:QCSH)% qinit acc7/slot3 
qhelper on socket 5 
..created system socket for commands.. ..authorization complete. 
Child
Exec /qcdoc/sfw/qos/devel/v2.6.0-CJ/aix5.2f/qhost/bin/qhelper 5

(qcdochostb:/home/stratos:QCSH)% qpartition_connect -p 0 
QD:TextParser::doqpartition_connect.M> partition connect
QD:TextParser::doqpartition_op.M> doqpartition_op
QD:TextParser::doqpartition_op.M> got client
QD:TextParser::doqpartition_op.M> trying to parse
QD:TextParser::parse_partitionID.M> Operating on partition 0
QD:TextParser::doqpartition_op.M> Calling PartitionMgr
QD:PartitionManager::process().M> PartitionManager thread:....
QD:PartitionManager::process().M> Connect request
QD:PartitionManager::qpartition_connect.M> Connecting for execute
QD:PartitionManager::qpartition_connect.M> Reserved partition 0 for client--1
QD:PartitionManager::qpartition_connect.M> Connection Established
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr 

(qcdochostb:/home/stratos:QCSH)% qreset_sys 
QD:Partition::PrintState.M> Run kernel is running
QD:Partition::PrintState.M> Serial communications are NOT up
QD:Partition::PrintState.M> Application axes are NOT mapped
RBC:Boot0::doResetSys.M> Sending Update IP to 64 nodes
RBC:Boot0::doResetSys.M> Sending Reset SYS to 64 nodes
RBC:Boot0::doResetSys.M> Ignore the following error messages
    ...........  
    ...........  
    ...........  
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr 

(qcdochostb:/home/stratos:QCSH)% qreset_boot 
QD:Partition::PrintState.E> No idea what is running
QD:Partition::PrintState.M> Serial communications are NOT up
QD:Partition::PrintState.M> Application axes are NOT mapped
RBC:Boot0::EstablishComms.M> Successfully contacted all nodes
    ...........  
    ...........  
    ...........  
QD:Partition::qreset_boot.M> Success
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr 

(qcdochostb:/home/stratos:QCSH)% qdiscover 
QD:Partition::PrintState.M> Run kernel is running
QD:Partition::PrintState.M> Serial communications are NOT up
QD:Partition::PrintState.M> Application axes are NOT mapped
QD:Partition::qscu_train.> Setting up SCU hardware
QD:Partition::qscu_train.> Starting training process
QD:Partition::qscu_train.> Completing training process
QD:Partition::qscu_train.> Switching to idle bytes
QD:Partition::qscu_train.> Releasing Dircom units
    ...........  
    ...........  
    ...........  
QD:Partition::DiscoverTopology.M> QCDOC is ready
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr 

(qcdochostb:/home/stratos:QCSH)%  qrun $QOS/quser/usertest/usertest.x 
QD:Partition::PrintState.M> Run kernel is running
QD:Partition::PrintState.M> Serial communications are up
QD:Partition::PrintState.M> Application axes are NOT mapped
    ...........  
    ...........  
    ...........  
QD:RkerMgr::ThreadMain.M> Exiting server thread 738
QD:RkerMgr::ApplicationFinish.M> Retrieving machine status
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr Normal exit

(qcdochostb:/home/stratos:QCSH)% qdetach 
disconnected

(qcdochostb:/home/stratos:QCSH)% exit 
exit
[1]+  Done                    qdaemon -d -m acc7/slot3

-bash:stratos@qcdochostb:~> exit 
logout

qsub: job 5523.qcdochostb.qcdoc.bnl.gov completed
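Each qcsh command in the session above ends with QDaemonReturn, QKerReturn and AppReturn lines. When the output of a session has been saved to a file, those lines can be scanned automatically. The sketch below assumes that a value of 0 indicates success, as the example suggests. The function name check_returns is hypothetical.

```shell
#!/bin/sh
# Sketch: read saved qcsh output on stdin. Succeed only if at least one
# status line is present and every QDaemonReturn, QKerReturn and AppReturn
# line reports 0 (assumed here to mean success).
check_returns() {
    awk '
        $1 == "QDaemonReturn" || $1 == "QKerReturn" || $1 == "AppReturn" {
            seen++
            if ($2 != 0) { bad++; print "nonzero: " $0 }
        }
        END { exit (bad > 0 || seen == 0) ? 1 : 0 }
    '
}

# Demonstration on status lines copied from the example session.
if printf 'QDaemonReturn 0\nQKerReturn 0\nAppReturn 0\nExitStr Normal exit\n' | check_returns
then
    echo "all return codes are zero"
else
    echo "at least one return code is nonzero or missing"
fi
```

Input with no status lines at all is treated as a failure, so an empty or truncated log is not mistaken for a clean run.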
