QBATCH: Interactive Jobs
How to run an Interactive PBS job
Some ACC single MBd partitions have been reserved for Interactive PBS queues
that can be used for testing and debugging jobs.
To run an Interactive job the user does not need to copy any of the batch
startup scripts (like qbatch.pbs) required by Batch jobs.
It is up to the user to start the necessary qdaemon and qcsh processes
and issue the appropriate qcsh commands (such as qinit and
qpartition_connect). A detailed description of the steps
needed to run a QCDOC job can be found in the
User Guide.
Interactive queues have a one hour walltime limit.
To submit an Interactive PBS job, pass the -I argument to the qsub command
and select the ACCI queue (all capitals):
qsub -I -q ACCI
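Because Interactive queues are capped at one hour, it can be useful to request an explicit, shorter walltime. The following is a minimal sketch, assuming the PBS installation accepts the standard `-l walltime=HH:MM:SS` resource request, which this page does not confirm. The helper name `acci_qsub_cmd` is hypothetical, and it only prints the command so that it can be inspected before being run.

```shell
# Hypothetical helper: build (but do not run) the qsub command for the ACCI queue.
# Assumption: the standard PBS resource request "-l walltime=HH:MM:SS" is accepted.
acci_qsub_cmd() {
    minutes="${1:-60}"
    # Interactive queues have a one hour walltime limit, so clamp to 1..60 minutes.
    if [ "$minutes" -gt 60 ]; then minutes=60; fi
    if [ "$minutes" -lt 1 ]; then minutes=1; fi
    if [ "$minutes" -eq 60 ]; then
        walltime="01:00:00"
    else
        walltime=$(printf '00:%02d:00' "$minutes")
    fi
    printf 'qsub -I -q ACCI -l walltime=%s\n' "$walltime"
}

# Print the command for a 30 minute interactive session.
acci_qsub_cmd 30
```

Pass the number of minutes as a plain integer (for example 5, not 05); values above 60 are reduced to the one hour limit.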
After the job has started you should allocate the machine partition that has been assigned
to your job by running pallocate:
$> /usr/local/bin/pallocate
After the partition has been allocated, depending on your shell init scripts, you may have
to source the appropriate QOS setup script and (if needed) set the QMACHINE environment variable.
The latest QOS version is advertised in the motd file and is currently at:
/qcdoc/sfw/qos/v2.6.0/pro.
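The QMACHINE step can be sketched as follows, assuming a POSIX-compatible shell. The helper name `ensure_qmachine` is hypothetical, and the partition name must be the one reported by pallocate (acc7/slot3 is only the example used on this page). Sourcing the QOS setup script is deliberately left out, since its exact file name is not given here.

```shell
# Hypothetical helper: set QMACHINE only when the init scripts have not already done so.
ensure_qmachine() {
    partition="$1"
    if [ -z "$partition" ]; then
        echo "usage: ensure_qmachine <partition>, e.g. acc7/slot3" >&2
        return 1
    fi
    if [ -z "${QMACHINE:-}" ]; then
        QMACHINE="$partition"
        export QMACHINE
    elif [ "$QMACHINE" != "$partition" ]; then
        # Warn rather than silently overwrite a value set elsewhere.
        echo "warning: QMACHINE is $QMACHINE, but the allocated partition is $partition" >&2
    fi
    echo "QMACHINE=$QMACHINE"
}

# Example call with the partition name used on this page.
ensure_qmachine acc7/slot3
```

The function never overwrites an existing QMACHINE value; it only warns, so a stale setting from an earlier session is easy to spot.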
You can now run qsession:
$> qsession acc7/slot3
or start the qdaemon and the qcsh processes:
$> qdaemon -d -m acc7/slot3 &
$> qcsh
Starting the qdaemon and qcsh processes (or simply running qsession) can also be done in a separate
shell window.
You should now have a qcsh prompt ready to accept commands. For a detailed
description of the steps needed to start up a machine partition, see the
User Guide.
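The two startup routes described above can be summarised in a small helper. This is a sketch only: the function name `start_qcsh_cmds` is hypothetical, and it prints the commands given on this page rather than executing them, so it can be tried on any machine.

```shell
# Hypothetical helper: print the commands for either startup route.
#   mode "session" -> qsession
#   mode "manual"  -> qdaemon in the background, then qcsh
start_qcsh_cmds() {
    partition="$1"
    mode="${2:-session}"
    case "$mode" in
        session)
            echo "qsession $partition"
            ;;
        manual)
            echo "qdaemon -d -m $partition &"
            echo "qcsh"
            ;;
        *)
            echo "unknown mode: $mode (use session or manual)" >&2
            return 1
            ;;
    esac
}

# Show the manual route for the example partition.
start_qcsh_cmds acc7/slot3 manual
```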
Example of running an Interactive job
-bash:stratos@qcdochostb:~> qsub -I -q ACCI
qsub: waiting for job 5523.qcdochostb.qcdoc.bnl.gov to start
qsub: job 5523.qcdochostb.qcdoc.bnl.gov ready
mom_close_poll entered
-bash:stratos@qcdochostb:~> source $CRE_HOME/bin/setup.sh
-bash:stratos@qcdochostb:~> /qcdoc/local/etc/pallocate
Machine Partition acc7/slot3 is now allocated.
You should now source the QOS setup script (if you haven't already done so)
and (if needed) set the QMACHINE env. variable to acc7/slot3.
You may then start the qdaemon process (in the background): qdaemon -d -m acc7/slot3 &
and the qcsh process: qcsh
OR simply run qsession: qsession acc7/slot3
(and you may even do all these on another shell)
-bash:stratos@qcdochostb:~> qdaemon -d -m acc7/slot3 &
[1] 2322536
-bash:stratos@qcdochostb:~>Initialising Qdaemon
Detaching from terminal and backgrounding
-bash:stratos@qcdochostb:~> qcsh
(qcdochostb:/home/stratos:QCSH)% qinit acc7/slot3
qhelper on socket 5
..created system socket for commands.. ..authorization complete.
Child
Exec /qcdoc/sfw/qos/devel/v2.6.0-CJ/aix5.2f/qhost/bin/qhelper 5
(qcdochostb:/home/stratos:QCSH)% qpartition_connect -p 0
QD:TextParser::doqpartition_connect.M> partition connect
QD:TextParser::doqpartition_op.M> doqpartition_op
QD:TextParser::doqpartition_op.M> got client
QD:TextParser::doqpartition_op.M> trying to parse
QD:TextParser::parse_partitionID.M> Operating on partition 0
QD:TextParser::doqpartition_op.M> Calling PartitionMgr
QD:PartitionManager::process().M> PartitionManager thread:....
QD:PartitionManager::process().M> Connect request
QD:PartitionManager::qpartition_connect.M> Connecting for execute
QD:PartitionManager::qpartition_connect.M> Reserved partition 0 for client--1
QD:PartitionManager::qpartition_connect.M> Connection Established
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr
(qcdochostb:/home/stratos:QCSH)% qreset_sys
QD:Partition::PrintState.M> Run kernel is running
QD:Partition::PrintState.M> Serial communications are NOT up
QD:Partition::PrintState.M> Application axes are NOT mapped
RBC:Boot0::doResetSys.M> Sending Update IP to 64 nodes
RBC:Boot0::doResetSys.M> Sending Reset SYS to 64 nodes
RBC:Boot0::doResetSys.M> Ignore the following error messages
...........
...........
...........
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr
(qcdochostb:/home/stratos:QCSH)% qreset_boot
QD:Partition::PrintState.E> No idea what is running
QD:Partition::PrintState.M> Serial communications are NOT up
QD:Partition::PrintState.M> Application axes are NOT mapped
RBC:Boot0::EstablishComms.M> Successfully contacted all nodes
...........
...........
...........
QD:Partition::qreset_boot.M> Success
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr
(qcdochostb:/home/stratos:QCSH)% qdiscover
QD:Partition::PrintState.M> Run kernel is running
QD:Partition::PrintState.M> Serial communications are NOT up
QD:Partition::PrintState.M> Application axes are NOT mapped
QD:Partition::qscu_train.> Setting up SCU hardware
QD:Partition::qscu_train.> Starting training process
QD:Partition::qscu_train.> Completing training process
QD:Partition::qscu_train.> Switching to idle bytes
QD:Partition::qscu_train.> Releasing Dircom units
...........
...........
...........
QD:Partition::DiscoverTopology.M> QCDOC is ready
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr
(qcdochostb:/home/stratos:QCSH)% qrun $QOS/quser/usertest/usertest.x
QD:Partition::PrintState.M> Run kernel is running
QD:Partition::PrintState.M> Serial communications are up
QD:Partition::PrintState.M> Application axes are NOT mapped
...........
...........
...........
QD:RkerMgr::ThreadMain.M> Exiting server thread 738
QD:RkerMgr::ApplicationFinish.M> Retrieving machine status
QDaemonReturn 0
QKerReturn 0
AppReturn 0
ExitStr Normal exit
(qcdochostb:/home/stratos:QCSH)% qdetach
disconnected
(qcdochostb:/home/stratos:QCSH)% exit
exit
[1]+ Done qdaemon -d -m acc7/slot3
-bash:stratos@qcdochostb:~> exit
logout
qsub: job 5525.qcdochostb.qcdoc.bnl.gov completed
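Each qcsh command in the transcript above ends with QDaemonReturn, QKerReturn and AppReturn lines. If a session is saved to a file, those lines can be checked automatically. The following is a minimal sketch, assuming the return lines always have the "Name value" form shown above; the helper name `check_qcsh_returns` is hypothetical and is not part of QOS.

```shell
# Hypothetical helper: scan a saved qcsh transcript and report non-zero return codes.
# Assumption: return lines look like "QDaemonReturn 0", "QKerReturn 0", "AppReturn 0".
# Exit status: 0 if all codes are zero, 1 if any is non-zero, 2 if none were found.
check_qcsh_returns() {
    awk '
        NF >= 2 && ($1 == "QDaemonReturn" || $1 == "QKerReturn" || $1 == "AppReturn") {
            seen++
            if ($2 != 0) { bad++; printf "line %d: %s %s\n", NR, $1, $2 }
        }
        END {
            if (seen == 0) { print "no return codes found"; exit 2 }
            if (bad > 0) exit 1
            print "all " seen " return codes are 0"
        }
    ' "$@"
}

# Example: check three return lines supplied on standard input.
printf 'QDaemonReturn 0\nQKerReturn 0\nAppReturn 0\n' | check_qcsh_returns
```

The function reads standard input when no file is given, or a transcript file passed as an argument (for example `check_qcsh_returns session.log`, where the file name is only an illustration).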