6 debugging applications, 7 monitoring node activity, 8 tuning applications – HP XC System 3.x Software User Manual


5.4 Submitting a Batch Job or Job Script.....................................................................................................53
5.5 Submitting a Job from a Host Other Than an HP XC Host......................................................................55
5.6 Running Preexecution Programs..........................................................................................................56

6 Debugging Applications.............................................................................................57

6.1 Debugging Serial Applications.............................................................................................................57
6.2 Debugging Parallel Applications..........................................................................................................57

6.2.1 Debugging with TotalView..........................................................................................................57

6.2.1.1 SSH and TotalView..............................................................................................................58
6.2.1.2 Setting Up TotalView...........................................................................................................58
6.2.1.3 Using TotalView with SLURM..............................................................................................58
6.2.1.4 Using TotalView with LSF-HPC...........................................................................................59
6.2.1.5 Setting TotalView Preferences..............................................................................................59
6.2.1.6 Debugging an Application...................................................................................................59
6.2.1.7 Debugging Running Applications........................................................................................60
6.2.1.8 Exiting TotalView................................................................................................................61

7 Monitoring Node Activity............................................................................................63

7.1 Installing the Node Activity Monitoring Software.................................................................................63
7.2 Using the xcxclus Utility to Monitor Nodes...........................................................................................63
7.3 Plotting the Data from the xcxclus Datafiles..........................................................................................65
7.4 Using the xcxperf Utility to Display Node Performance.........................................................................66
7.5 Plotting the Node Performance Data....................................................................................................67
7.6 Running Performance Health Tests.......................................................................................................68

8 Tuning Applications.....................................................................................................73

8.1 Using the Intel Trace Collector and Intel Trace Analyzer........................................................................73

8.1.1 Building a Program – Intel Trace Collector and HP-MPI...............................................................73
8.1.2 Running a Program – Intel Trace Collector and HP-MPI.................................................................74

8.2 The Intel Trace Collector and Analyzer with HP-MPI on HP XC.............................................................75

8.2.1 Installation Kit............................................................................................................................75
8.2.2 HP-MPI and the Intel Trace Collector............................................................................................75

8.3 Visualizing Data – Intel Trace Analyzer and HP-MPI.............................................................................77

9 Using SLURM................................................................................................................79

9.1 Introduction to SLURM.......................................................................................................................79
9.2 SLURM Utilities..................................................................................................................................79
9.3 Launching Jobs with the srun Command..............................................................................................79

9.3.1 The srun Roles and Modes...........................................................................................................80

9.3.1.1 The srun Roles....................................................................................................................80
9.3.1.2 The srun Modes..................................................................................................................80

9.3.2 Using the srun Command with HP-MPI.......................................................................................80
9.3.3 Using the srun Command with LSF-HPC......................................................................................80

9.4 Monitoring Jobs with the squeue Command.........................................................................................80
9.5 Terminating Jobs with the scancel Command........................................................................................81
9.6 Getting System Information with the sinfo Command...........................................................................81
9.7 Job Accounting...................................................................................................................................81
9.8 Fault Tolerance...................................................................................................................................82
9.9 Security..............................................................................................................................................82

10 Using LSF-HPC............................................................................................................83

10.1 Information for LSF-HPC...................................................................................................................83
10.2 Overview of LSF-HPC Integrated with SLURM...................................................................................84
10.3 Differences Between LSF-HPC and LSF-HPC Integrated with SLURM..................................................85
10.4 Job Terminology................................................................................................................................86

Table of Contents