This page describes how to submit and manage batch jobs.
About HTCondor
HTCondor is the batch system of our CMS Tier-3. You can use HTCondor commands to submit jobs, check job results, and check server status.
Important Information for HTCondor
You cannot access the /xrootd or /xrootd_user directories on a worker node (WN). You should use the xrootd protocol directly.
You can use the private network inside the KISTI server farm. Its prefix is root://cms-xrdr.private.lo:2094
From outside, you need the public address to access KISTI storage. Its address is root://cms-xrdr.sdfarm.kr:1094/~~.
This path requires a user proxy certificate signed for VOCMS (please see the voms-proxy-init command).
You need to copy your proxy certificate to the WN's /tmp directory or set up the working directory ($_CONDOR_SCRATCH_DIR) as the certificate directory.
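As a minimal sketch of the access rules above, the following shell fragment builds the two redirector prefixes and shows how a file would be staged via the protocol. The file path /xrootd/store/user/geonmo/input.root is a hypothetical placeholder; the voms-proxy-init and xrdcp calls are commented out because they require grid and xrootd client tools.

```shell
# Private redirector prefix (works from inside the KISTI network):
XRD_PRIVATE="root://cms-xrdr.private.lo:2094"

# Public address (from outside; requires a VOMS proxy for CMS):
XRD_PUBLIC="root://cms-xrdr.sdfarm.kr:1094"

# For outside access, first create a proxy (requires grid tools):
# voms-proxy-init --voms cms

# On a WN, /xrootd is not mounted, so copy through the protocol instead
# (requires the xrootd client; the path is a hypothetical example):
# xrdcp "${XRD_PRIVATE}//xrootd/store/user/geonmo/input.root" .

# Show the full URL that xrdcp would use:
echo "${XRD_PRIVATE}//xrootd/store/user/geonmo/input.root"
```

The same path can be read from outside by swapping in ${XRD_PUBLIC} after creating a VOMS proxy.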
Important HTCondor Commands
condor_status
The condor_status command is used to check machine status. It is generally used without options.
[geonmo@ui20 geonmo]$ condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@cms-gpu01.sdfarm.kr LINUX X86_64 Unclaimed Idle 0.000 386684 2+21:24:51
slot1@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Unclaimed Idle 0.000 193349 0+16:29:10
slot2@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Unclaimed Idle 0.000 87842 2+18:49:40
slot2_1@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Claimed Busy 0.010 2944 0+14:22:49
slot2_9@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Claimed Busy 1.000 2944 2+19:09:24
slot2_10@cms-t3-wn3001.sdfarm.kr LINUX X86_64 Claimed Busy 1.000 2944 2+19:06:09
Job slots can be divided dynamically depending on the characteristics of the requested job. KISTI CMS Tier-3 prepares three dynamic slots on each machine, with CPU resources allocated in a 2:1:1 ratio.
A CMS job without any resource option requests 1 core and 2933 MB of RAM. If you want more resources, you can modify the resource options in your submit file. However, slot matching is harder for large jobs, so they wait longer.
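As an illustrative submit-file fragment (the values are examples, not site defaults), requesting more than the default 1 core / 2933 MB could look like this:

```
# Request 2 cores and 4 GB of RAM instead of the defaults.
# Larger requests match fewer slots and may wait longer.
request_cpus   = 2
request_memory = 4096
```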
condor_q
condor_q checks information about queued jobs. For more detailed information on a submitted job, use the -l (-long) option.
If you want to see a job's resource-request information, use the condor_q -better-analyze command. If the job is held, you can also check the hold reason.
[geonmo@ui20 geonmo]$ condor_q -better-analyze 3671768.0

-- Schedd: ui20.sdfarm.kr : <134.75.124.127:9618?...
The Requirements expression for job 3671768.000 is

    ((HasSingularity == true)) && (TARGET.Arch == "X86_64") &&
    (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
    (TARGET.Memory >= RequestMemory) &&
    ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))

Job 3671768.000 defines the following attributes:

    DiskUsage = 2
    FileSystemDomain = "sdfarm.kr"
    RequestDisk = DiskUsage
    RequestMemory = 2930

The Requirements expression for job 3671768.000 reduces to these conditions:

         Slots
    Step    Matched  Condition
    -----  --------  ---------
    [0]         211  HasSingularity == true
    [1]         211  TARGET.Arch == "X86_64"
    [3]         211  TARGET.OpSys == "LINUX"
    [5]         211  TARGET.Disk >= RequestDisk
    [7]         211  TARGET.Memory >= RequestMemory
    [9]         211  TARGET.FileSystemDomain == MY.FileSystemDomain

3671768.000: Job is running.
Last successful match: Wed May 20 21:41:45 2020

3671768.000: Run analysis summary ignoring user priority. Of 88 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
     88 are able to run your job
condor_history
This command checks information about finished jobs. Its usage is the same as that of condor_q.
condor_submit
Submit your job using a Job Description Submit file (.sub or .jds). The basic form of a submit file is as follows.
filename : template.jds
#### Job batch name / Your jobs will be displayed under this name.
JobBatchName = condor_status_check

#### Executable main program file
# Generally, a binary file is used.
# However, you can use a bash script to set up the environment and run.
executable = test.sh

#### In most cases, the "vanilla" universe is used for a normal job.
# Java universe for Java applications, docker universe for Docker applications.
# Each universe provides the application's environment and extra ClassAds.
universe = vanilla

#### Arguments for the application.
# e.g.) test.sh 13
# $(Process) is the job's process ID.
# FYI) JobId = $(Cluster).$(Process)
arguments = $(Process)

### Sync OS environment variables.
# However, this is not perfect. Please check the WN's environment settings.
getenv = True

### Enable the feature to send and receive files.
# If this feature is off, the executable file is not transferred to the WN,
# and the result files are not retrieved either.
# Set this to NO when running from a shared directory without retrieving
# the result files.
should_transfer_files = YES
### Used together with the keyword above.
when_to_transfer_output = ON_EXIT

### Requirements
# Here, the job is set to use the machine matching $(Hostname).
# Use this to restrict the job to specific machines.
requirements = (Machine =?= "$(Hostname)")

### Extra tags used only in CMS (optional).
# Tag is the program name; JobType is either MC or Analysis.
+Tag = "condor_check v1.22"
+JobType = "Analysis"

### Files for standard output and error of the run.
output = job_$(Hostname).out
error = job_$(Hostname).err
### Log of the job submission; this can be seen as the submit machine's log.
log = job.log

### Input files to send and output files to retrieve.
transfer_input_files = input_sandbox.tar.gz
transfer_output_files = result.root
### If result files would have the same name across jobs, remap them to
# unique names. Here, result.root is saved with the Hostname variable added.
transfer_output_remaps = "result.root = result_$(Hostname).root"

### Resource requests
#request_cpus = 1
#request_GPUs = 0
#request_memory = 2933
#request_disk = 1

### E-mail notification
#notification = Error
#notify_user = cmst3-support@kisti.re.kr

### Group account information
#accounting_group = "group_cms"

#queue 13
queue 1 Hostname from test.txt
When submitting a job using the JDS file, run it with the command below.
condor_submit template.jds
Currently, there are several differences between KISTI GSDC Tier-3 and a standard HTCondor environment regarding condor_submit. To use the integrated farm cluster, CMS users must create a JDS containing accounting_group = "group_cms". We set this up in the default bash environment through a condor_submit alias. However, if you run condor_submit from your own bash script directly, please add this setting yourself.
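If you bypass the default alias, the required line in your JDS is simply the setting quoted above:

```
# Required for CMS users on the integrated farm cluster:
accounting_group = "group_cms"
```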