This page explains how to submit and manage batch jobs.
About HTCondor
HTCondor is the batch system of our CMS Tier-3. You can use HTCondor commands to submit jobs, check job results, and check server status.
Important Information for HTCondor
You cannot access the /xrootd or /xrootd_user directories on the worker nodes (WNs). You should use the xrootd protocol directly.
Inside the KISTI network, you can use the private endpoint. Its prefix is root://cms-xrdr.private.lo:2094
From outside, you need the public address to access KISTI storage: root://cms-xrdr.sdfarm.kr:1094/~~.
This path requires a user proxy certificate signed for the CMS VO (see the voms-proxy-init command).
You need to copy your proxy certificate to the WN's /tmp directory or set up the working directory ($_CONDOR_SCRATCH_DIR) as the certificate directory.
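For example, a minimal sketch of the proxy workflow (the storage path in the last command is hypothetical; substitute your own file):

voms-proxy-init --voms cms              # create a CMS VO proxy; by default it is written to /tmp/x509up_u$(id -u)
cp /tmp/x509up_u$(id -u) x509up         # copy the proxy so it can be transferred with the job
# inside the job script, point to the transferred proxy, e.g.:
#   export X509_USER_PROXY=$_CONDOR_SCRATCH_DIR/x509up
xrdcp root://cms-xrdr.private.lo:2094//store/user/someuser/sample.root .   # path after the port is hypothetical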
Important HTCondor Commands
condor_status
The condor_status command is used to check machine status. It is generally run without any options.
[geonmo@ui20 geonmo]$ condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@cms-gpu01.sdfarm.kr LINUX X86_64 Unclaimed Idle 0.000 386684 2+21:24:51
slot1@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Unclaimed Idle 0.000 193349 0+16:29:10
slot2@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Unclaimed Idle 0.000 87842 2+18:49:40
slot2_1@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Claimed Busy 0.010 2944 0+14:22:49
slot2_9@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Claimed Busy 1.000 2944 2+19:09:24
slot2_10@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Claimed Busy 1.000 2944 2+19:06:09
Job slots can be divided dynamically depending on the characteristics of the requested job. KISTI CMS Tier-3 provides three dynamic slots per machine, with CPU resources allocated in a 2:1:1 ratio.
A CMS job submitted without any resource options requests 1 core and 2933 MB of RAM by default. If you want a larger slot, you can modify the resource options in your submit file, as shown below. However, slot matching is harder for large jobs, so expect longer wait times.
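For example, a submit-file snippet requesting a larger slot (the values here are illustrative):

request_cpus   = 2
request_memory = 5866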
condor_q
condor_q shows information about your queued jobs. For more detailed information on a submitted job, you can use the -l (-long) option.
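For example (the job ID here is the one used in the analysis example below):

condor_q                  # summary of your queued jobs
condor_q -l 3671768.0     # print the full ClassAd of a single job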
If you want to inspect a job's resource request information and slot matching, use the condor_q -better-analyze command. If the job is held, you can also check the hold reason there.
[geonmo@ui20 geonmo]$ condor_q -better-analyze 3671768.0
-- Schedd: ui10.sdfarm.kr : <134.75.124.121:9618?...
The Requirements expression for job 3671768.000 is
((HasSingularity == true)) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
(TARGET.Memory >= RequestMemory) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))
Job 3671768.000 defines the following attributes:
DiskUsage = 2
FileSystemDomain = "sdfarm.kr"
RequestDisk = DiskUsage
RequestMemory = 2930
The Requirements expression for job 3671768.000 reduces to these conditions:
Slots
Step Matched Condition
----- -------- ---------
[0] 211 HasSingularity == true
[1] 211 TARGET.Arch == "X86_64"
[3] 211 TARGET.OpSys == "LINUX"
[5] 211 TARGET.Disk >= RequestDisk
[7] 211 TARGET.Memory >= RequestMemory
[9] 211 TARGET.FileSystemDomain == MY.FileSystemDomain
3671768.000: Job is running.
Last successful match: Wed May 20 21:41:45 2020
3671768.000: Run analysis summary ignoring user priority. Of 88 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match and are already running your jobs
0 match but are serving other users
88 are able to run your job
-- Schedd: ui20.sdfarm.kr : <134.75.124.127:9618?...
condor_history
This command shows information about finished jobs. Its usage is the same as that of condor_q.
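For example (the job ID is the one from the condor_q example above; -limit caps the number of records returned):

condor_history 3671768.0          # details of one finished job
condor_history -limit 10 geonmo   # show up to 10 of user geonmo's finished jobs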
condor_submit
Submit your job using a job description submit file (.sub or .jds). A basic submit file has the following form.
filename : template.jds
#### Job batch name / Your jobs will be displayed under this name.
JobBatchName = condor_status_check
#### Executable: the main program file.
# Usually a binary is used, but you can also use a bash script
# that sets up the environment and runs your program.
executable = test.sh
#### In most cases, the "vanilla" universe is used for a normal job.
# Use the java universe for Java applications and the docker universe for Docker applications.
# Each universe provides the application's environment and extra ClassAds.
universe = vanilla
#### Arguments for the application.
# e.g. test.sh 13
# $(Process) is the job's process number.
# FYI: JobId = $(Cluster).$(Process)
arguments = $(Process)
### Copy the submitter's OS environment to the job.
# This cannot be reproduced perfectly; please check the WN's environment settings.
getenv = True
### Enable sending and receiving files.
# If this feature is disabled, the executable is not transferred to the WN,
# and output files are not retrieved either.
# Set this to NO when running from a shared directory without retrieving output files.
should_transfer_files = YES
### Used together with the keyword above.
when_to_transfer_output = ON_EXIT
### Requirements settings.
# Here, the job is restricted to run on the machine equal to the Hostname variable.
# Use this only when you need to target machines with specific conditions.
requirements = (Machine =?= "$(Hostname)")
### Extra tags used only by CMS (optional).
# Tag is the program name; JobType is either MC or Analysis.
+Tag = "condor_check v1.22"
+JobType = "Analysis"
### Files that store stdout and stderr of the run.
output = job_$(Hostname).out
error = job_$(Hostname).err
### Log of the job submission; effectively the log on the submit machine.
log = job.log
### Input files to send and output files to receive.
transfer_input_files = input_sandbox.tar.gz
transfer_output_files = result.root
### If output file names collide between jobs, remap them to unique names.
# Here, each result.root is saved with the Hostname variable appended.
transfer_output_remaps = "result.root = result_$(Hostname).root"
### Resource request settings.
#request_cpus = 1
#request_GPUs = 0
#request_memory = 2933
#request_disk = 1
### Email notification settings.
#notification = Error
#notify_user = cmst3-support@kisti.re.kr
### Group account information (at KISTI this is added via the condor_submit alias; see below).
### Queue statement: submit one job per Hostname listed in test.txt.
#queue 13
queue 1 Hostname from test.txt
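The queue statement above submits one job for each Hostname read from test.txt. A hypothetical test.txt (the second host name is illustrative):

cms-t3-wn3001.sdfarm.kr
cms-t3-wn3002.sdfarm.kr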
To submit a job described in a JDS file, run the command below.
condor_submit template.jds
Currently, there are several differences between KISTI GSDC Tier-3 and a generic HTCondor environment regarding condor_submit. To use the integrated farm cluster, CMS users must submit their JDS with accounting_group = "group_cms". The default bash environment defines condor_submit aliases that add this automatically. However, if you write a bash script directly, please add the following yourself.
alias condor_submit='condor_submit -append accounting_group="group_cms"'
alias condor_submit6='condor_submit -append accounting_group="group_cms" -append "+SingularityImage=\"/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el6:latest\"" -append "+SingularityBindCVMFS=True" -append "+SingularityBind=\"/cvmfs,/cms,/share,/cms_scratch\""'
alias condor_submit7='condor_submit -append accounting_group="group_cms" -append "+SingularityImage=\"/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el7:latest\"" -append "+SingularityBindCVMFS=True" -append "+SingularityBind=\"/cvmfs,/cms,/share,/cms_scratch\""'
If you modify these aliases slightly, you can submit jobs to an Ubuntu Linux environment.
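A sketch of such a modification, assuming an Ubuntu image is published under the same cvmfs tree (verify the exact image path on the UI before using it):

alias condor_submit_ubuntu='condor_submit -append accounting_group="group_cms" -append "+SingularityImage=\"/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-ubuntu-18.04:latest\"" -append "+SingularityBindCVMFS=True" -append "+SingularityBind=\"/cvmfs,/cms,/share,/cms_scratch\""'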