Frequently Asked Questions about Computing at the Center for Theoretical Biological Physics.
Q: How do I apply for a computing account on the supercomputers at Rice?
A: You may use your NetID to apply for accounts on the supercomputers at Rice. You may apply for a regular user account (for most students and research staff) or a guest account (for external collaborators and guests not directly affiliated with Rice University).
Remember to provide exactly the required information, especially your account sponsor's information. For most CTBP users, the sponsor is our Project Manager Colleen Morimoto: Email: firstname.lastname@example.org, Phone: 713-348-8202.
Before you apply for an account, you may wonder what kind of supercomputer is best for your computing needs. You can learn about the Rice supercomputers by looking through our site at Center for Research Computing.
After you submit your application, your sponsor will review it and either approve or reject it. If it is approved, the supercomputer system administrator will notify you when your account is ready. To apply for accounts please visit the CRC Accounts Application.
Q: What can I do if I have a computing question or problem?
A: You may submit a ticket to ask for help at this site: Request Help. When submitting a ticket, please describe your problem as specifically as possible, e.g. what the problem is (program compiling, job submitting, computing error, etc.), which supercomputer it occurred on, how it happened, where the input files are, and what the error message says.
You may also ask for help from Dr. Xiaoqin Huang: Room 1061, BRC building, Email: email@example.com, Phone: 713-348-8868.
For questions about your desktop/laptop, you may submit a ticket through the Rice OIT Help Desk.
Other Suggestions: You may also search for information online related to your particular program or question, e.g. for questions using GROMACS, this site is very helpful: GROMACS Mailing List.
Looking for possible clues or solutions in multiple ways will help you become more familiar with the supercomputer and the programs you are using, and so resolve your questions more efficiently.
Q: What computing resources are available for us at Rice University?
A: For research-related computing needs, whether High Performance Computing (HPC), High Throughput Computing (HTC), data storage, or virtual machines, the Center for Research Computing runs and maintains a collection of shared computing facilities and services that are available to all Rice-affiliated researchers. The documentation of these supercomputers is available at Rice Computing Resources.
There are currently three supercomputers at Rice University:
NOTS: The details of NOTS are here: NOTS Information. This supercomputer is the biggest and fastest so far at Rice and is continuing to expand. It was initially built with Hewlett Packard Enterprise (HPE) hardware and later extended with Dell PowerEdge boxes, all with Intel chips. It is now composed of 136 nodes with Intel Ivy Bridge CPUs (Intel Xeon CPU E5-2650 v2 at 2.60 GHz), 28 nodes with Intel Broadwell CPUs (Intel Xeon CPU E5-2650 v4 at 2.20 GHz), and 60 nodes with Intel Skylake CPUs (Intel Xeon Gold 6126 CPU at 2.60 GHz). The Ivy Bridge nodes have no high-speed interconnect and can take only single-node jobs (16 CPUs), but the Broadwell and Skylake nodes (24 CPUs per node) are connected by an Omni-Path high-speed interconnect, so they can take jobs requesting multiple nodes. In addition, 2 Intel Ivy Bridge nodes (bc8u27n1.nots.rice.edu, bc8u27n2.nots.rice.edu) on NOTS are each equipped with 4 NVIDIA Tesla K80 GPUs, which help GPGPU applications run much faster.
DAVinCI: The details of DAVinCI are here: DAVinCI Information. This system has 2400 processor cores in 192 Intel Westmere nodes (12 CPUs/node, at 2.83 GHz clock speed with 48 GB RAM/node) and 6 Intel Sandy Bridge nodes (16 CPUs/node, at 2.2 GHz clock speed with 128 GB RAM/node). All of the nodes are connected via QDR InfiniBand (40 Gb/sec). Among the Westmere nodes, 16 nodes are equipped with NVIDIA Fermi GPGPUs (M2650). Both parallel and serial jobs can be submitted through SLURM scripts.
PowerOmics: The details of PowerOmics are here: PowerOmics Information. This computer is different from the other two: it is equipped with 6 IBM POWER8 S822L nodes. Each node has 24 IBM POWER8 processors, and each processor supports 8 threads. The clock speed is 3.02 GHz. In total, it has 144 POWER8 CPUs and 1152 threads, supporting HPC (MPI) jobs, HTC (OpenMP) jobs, and hybrid MPI+OpenMP jobs. 2 nodes of this computer have 1 TB RAM for large-memory computing.
In addition, CTBP purchased condo nodes on both DAVinCI (Intel Sandy Bridge CPUs, with queues such as ctbp-onuchic, ctbp-common, ctbp-wolynes) and NOTS (all three kinds of Intel CPUs described above, with queues such as ctbp-onuchic, ctbp-common).
Q: What should I do after my accounts are open?
A: Please consider or ask yourself the following:
- How much of the quota for the $HOME, $PROJECTS, $WORK, and/or $SCRATCH directories is needed? Use commands such as "quota -s" or, on DAVinCI only, "mmlsquota --block-size 1G -g $(id -gn)" to see the quota of a specific directory, and remember not to use more than that size. Otherwise, you will not be able to write any files to these directories.
- Are my files backed up? Usually, the data in your home directory are backed up by default, but data in other directories may not be, e.g. data in scratch are subject to the purge policy. Make sure all your data are backed up properly and regularly.
- What kinds of programs/applications are already installed as modules? Check the system information for your supercomputer, e.g. with the command "module avail" and/or "module spider", to see what programs/software are already compiled and commonly shared. You may load a program by typing "module load program-name", so you can directly use the executable file of the loaded program. If you need additional programs or a new version of your specific program, you may ask our CRC staff (submit a ticket) to compile it for you, or install it yourself under your $HOME directory.
- How do I submit a job? Each of the supercomputers uses a similar job scheduler and a similar job submission script. For tutorials on how to write and submit job scripts, please visit the "Getting Started" information of each supercomputer.
►Tips for writing and submitting job scripts:
- Set wall_clock_limit properly, e.g. 30 minutes for interactive testing, or 8, 24, or 72 hours for longer runs (72 hours applies to the queue "serial_long" on DAVinCI)
- Set the group name, e.g. cms16; check which group you belong to first
- Determine the number of processors to request
- Determine the queue in which to run your jobs
- Check the script before submitting. Start with the fewest arguments, and only those you are confident about, in order to avoid script syntax errors.
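The tips above can be collected into a small helper that writes the job script for you. The following is a minimal sketch in Python, assuming the SLURM scheduler that this page mentions for DAVinCI. The mapping of the group name to "--account" and of the queue to "--partition" is an assumption to check against the "Getting Started" pages, and the job name, program name, and input file are hypothetical.

```python
import re


def render_job_script(job_name, partition, account, ntasks, time_limit, command):
    """Return the text of a SLURM batch script for the given resources."""
    # Keep the wall-clock limit in the plain HH:MM:SS form to avoid typos.
    if not re.fullmatch(r"\d{1,3}:[0-5]\d:[0-5]\d", time_limit):
        raise ValueError(f"time limit should be HH:MM:SS, got {time_limit!r}")
    if ntasks < 1:
        raise ValueError("request at least one task")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",  # the queue to run in
        f"#SBATCH --account={account}",      # the group name (assumed mapping)
        f"#SBATCH --ntasks={ntasks}",        # number of requested processes
        f"#SBATCH --time={time_limit}",      # wall-clock limit
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical 30-minute test job; replace every value with your own.
script = render_job_script(
    job_name="test_run",
    partition="ctbp-common",
    account="cms16",
    ntasks=16,
    time_limit="00:30:00",
    command="srun ./my_program input.dat",
)
print(script)
```

Save the printed text to a file and read it through once before submitting. The checks in the helper catch only the most common mistakes, not every possible script error.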
Q: How do I transfer data/files from/to Rice supercomputers?
A: There are two methods for data/file transfer:
- Transferring data/files between different directories within the same platform, or between platforms within the same local network. One can use the common Linux commands "cp", "scp", "ftp", and "sftp" to transfer data/files. The disadvantage is that the transfer is slow and is easily interrupted when the network is congested.
- Transferring data/files from a remote host or a host not in the same local network. Data Transfer Nodes (DTNs) are now commonly set up for this purpose and are already available on the Rice supercomputers; the details are at this site: CRC DTNs. The Globus package greatly facilitates remote data/file transfers. To use this tool, first download the package and install it on a local machine, for example, a Mac laptop. During the installation, a local subdirectory is assigned for incoming data/files. On the Globus website, log in with your Rice NetID and password, then set up two endpoints under the "Manage Data" pulldown. One endpoint is the local subdirectory on your own machine, and the other endpoint is the DTN at the Rice supercomputers. Click "Transfer Files" under the "Manage Data" pulldown to start transferring data/files from /dascratch/yourNetID or /work/cms16/yourNetID to the local machine.
Q: What is the Rice BOX for CTBP?
A: The Rice BOX storage is for the long-term saving of research data/files. Rice has a long-term contract with the BOX cloud storage, which provides Rice affiliates with long-term storage of unlimited size for research data/files. To get started, read through the documentation (Knowledge of Rice BOX).
In order to meet the NSF funding requirement regarding data management, we use the Rice BOX as the long-term storage for our research data/files, and have already set up a sub-directory named "CTBP". All the data/files of a finished research project or published paper have to be backed up in the Rice BOX. This is an established local policy that every user at CTBP should follow: you are required to back up all your research data before you graduate or move to your next employer. To back up your research data/files, you will be invited to join the CTBP subdirectory at the Rice BOX right after you join the CTBP family.
Q: How do I compile a particular program for my research projects?
A: As coding and compiling are important for computing, the process below should be followed:
- Get the source code with the version that meets your calculation needs. You can also write or modify your own codes.
- After obtaining the right version of the code, read the documentation to understand the structures (i.e. folders and sub-directories) of the program and the developer-offered instructions.
- Set the proper compilers, libraries, and flags.
- Write a correct compiling script, and run your script.
- Check the warnings and errors, to make sure the executable is generated correctly. Be aware that each set of compilers has its own set of libraries and its own set of flag options. In order to avoid compiling errors, it is best to be consistent and use one set of compilers and the related libraries.
- Find the system information and determine if it is a 64-bit or 32-bit processor. To find the system information, type the command "cat /proc/cpuinfo". A 64-bit processor can represent 2^64 distinct values, including memory addresses, whereas a 32-bit processor can represent 2^32. This means it can address over four billion times as much memory as a 32-bit processor. A 64-bit processor can usually run a 32-bit program/software, but a 32-bit processor cannot run a program/software built for a 64-bit architecture. Locally, all Rice supercomputers have 64-bit processors.
- Select the proper set of compilers, using "module avail", "module spider" and/or "which c++", etc. Each supercomputer has a different set of compilers. For example, NOTS has different versions of both the GNU compilers (gcc, g++, gfortran, mpicc, mpic++, mpif77/f90) and the Intel compilers (icc, icpc, ifort, mpicc, mpic++, mpif77/f90). Usually, the newer the compiler version, the better, and the Intel compilers often give better performance (i.e. make your executable run faster).
It is best to use one set of compilers consistently throughout the whole compiling process, and to ensure that all compiling flags are compatible with one another. For more detailed instructions on compiling programs, visit the "Getting Started" documentation of each supercomputer.
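The 64-bit versus 32-bit comparison in the list above is simple arithmetic and can be checked directly. The sketch below computes the two address-space sizes and their ratio, and also reports the pointer width of the Python interpreter it runs under. Note that this width describes how the interpreter was built, which on the Rice systems is assumed to match the 64-bit hardware.

```python
import platform
import struct

# Number of distinct values (and therefore addresses) for each word size.
values_32 = 2 ** 32
values_64 = 2 ** 64

# How many times larger the 64-bit address space is than the 32-bit one.
ratio = values_64 // values_32


def pointer_bits():
    """Width in bits of a native pointer for this Python build."""
    return struct.calcsize("P") * 8


print(f"2^32  = {values_32}")
print(f"2^64  = {values_64}")
print(f"ratio = {ratio}")
print(f"machine: {platform.machine()}, pointer width: {pointer_bits()} bits")
```

The ratio is itself 2^32, a little over four billion.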
Q: How do I find and link application libraries?
A: Each supercomputer has a set of application libraries/tools installed, mostly under the path /opt/apps/, e.g. on NOTS under /opt/apps/software, and you can find them with the command "module spider". The application libraries include BOOST, FFTW, HDF5, NETCDF, PYTHON3, etc. If a particular library is missing, you may ask for help to build it, or try to build it yourself. To use a library, pass the "-I" and/or "-L" flags to the compiler, e.g. "-I/opt/apps/software/MPI/GCC/7.3.0/OpenMPI/3.1.2/FFTW/3.3.8/include" to find the FFTW header files, and "-L/opt/apps/software/MPI/GCC/7.3.0/OpenMPI/3.1.2/FFTW/3.3.8/lib" to find the FFTW library files. The library itself is then named with a "-l" flag, e.g. "-lfftw3". Correctly linking the libraries is critical to successfully compiling a particular program.
Q: How do I know that my program/executable runs correctly and efficiently?
A: After the program is compiled, use a very typical case to do a test run, and check that the results are reasonable and correct. A very typical case means an example with the fewest parameters in your input file, all of which you are confident about, that can be finished in a short time, e.g. half an hour. In order to run your program more efficiently, you are encouraged to do benchmarking, especially for parallelized programs, e.g. use different numbers of CPUs and compare the time needed to produce a certain amount of output data, or how much data can be generated in a certain amount of time.
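One common way to read such benchmark results is to turn the measured times into speedup and parallel efficiency. The following is a minimal sketch, and the timings in it are hypothetical numbers chosen only for illustration. Speedup is the time of the smallest run divided by the time of the larger run, and efficiency is the speedup divided by the increase in CPU count.

```python
def scaling_table(timings):
    """Map CPU count -> (speedup, efficiency) relative to the smallest run."""
    base_cpus = min(timings)
    base_time = timings[base_cpus]
    table = {}
    for cpus in sorted(timings):
        speedup = base_time / timings[cpus]
        efficiency = speedup / (cpus / base_cpus)
        table[cpus] = (speedup, efficiency)
    return table


# Hypothetical wall-clock times in seconds for the same test case.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 20.0}

for cpus, (speedup, efficiency) in scaling_table(timings).items():
    print(f"{cpus:3d} CPUs  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

When the efficiency drops sharply from one CPU count to the next, requesting more CPUs mostly lengthens the queue wait without shortening the run by much.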
Q: How do I identify possible errors and debug a program?
A: It is usually not easy to identify the source of an error in a program. For parallel jobs, the signals listed here are typical. The explanations are possible causes, not certain ones, and the signal numbers are those used on Linux x86 systems.
- Signal 6 (SIGABRT): the job died immediately after it was submitted and has no or very little content in the output file. One possible reason is that the executable image is too large to load.
- Signal 7 (SIGBUS): the job terminated unexpectedly with a message "killed with signal 7", which indicates that the program experienced an unhandled alignment error. This error can occur when an improperly memory-aligned data value is accessed.
- Signal 9 (SIGKILL): the job was killed from outside the program, possibly because it ran past its allotted time or exceeded a resource limit and was killed by the scheduler.
- Signal 11 (SIGSEGV): a segmentation fault. In a C/C++ program, this typically indicates that a pointer pointed to a location in memory outside of the program space, or to an area within the program that it should not access.
- Signal 10: rare on Linux. On some UNIX systems this is the number of the "bus error" signal (on Linux x86 the bus error is signal 7, and signal 10 is SIGUSR1). A bus error possibly comes from incorrect assembly instructions being sent to the CPU. It could also happen when an executable is built for the wrong architecture, e.g. using a 64-bit compiler for a 32-bit platform.
- Signal 13 (SIGPIPE): possibly a pipe failure, that is, one process is trying to write to another process but there is no process to receive the data. Exit all processes and restart the program run, to see whether it helps.
To debug a program, you need to put the "-g" flag in your compiling options or script, so that the compiler includes the debugging information. After that, one way to debug is manually, i.e. read through the code piece by piece, and print out intermediate output to see where the program goes wrong or stops, then fix the error by modifying/re-writing the code. Another way is to use tools, e.g.
- gdb (at /usr/bin/gdb) to debug your executable compiled with GNU compilers;
- VTune to analyze your executable compiled with Intel compilers and libraries;
- valgrind (download from valgrind.org/ and install) to detect possible memory leaks.
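Signal numbers differ slightly between platforms, so it can help to look up the name on the machine where the job actually ran. The sketch below uses Python's standard signal module for the lookup. The second helper relies on the common shell convention that a process killed by signal n is reported with exit status 128 + n; whether the scheduler reports status the same way is an assumption to check in its documentation.

```python
import signal


def signal_name(number):
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"


def signal_from_exit_status(status):
    """Recover the signal number from a shell-style exit status (128 + n)."""
    return status - 128 if status > 128 else None


# The signal numbers discussed above; names are looked up at run time.
for number in (6, 7, 9, 10, 11, 13):
    print(number, signal_name(number))
```

For example, an exit status of 137 corresponds to signal 9, which is consistent with a job that was killed after exceeding its time limit.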
Q: What is MPI?
A: MPI, the Message Passing Interface, is a library standard designed specifically for parallel computing. It provides the means of communication between processes, moving data from the address space of one process to that of another through cooperative operations on each process. Here is a good tutorial of MPI basics: MPI tutorial. MPI libraries are installed on all the Rice supercomputers; use "module avail" and/or "module spider" to see which versions of MPI are available (e.g. "module spider OpenMPI/3.1.2"). Please read through the "Getting Started" information of each Rice supercomputer: Rice Computing Resources.
To parallelize a code/program, follow these six steps:
- Include the MPI header file, e.g. include "mpif.h" in Fortran or #include "mpi.h" in C/C++;
- Start MPI, i.e. "MPI_Init (&argc, &argv)";
- Determine the number of MPI tasks and the rank (ID) of the current task, e.g. "MPI_Comm_size (MPI_COMM_WORLD, &nprocs)"; "MPI_Comm_rank(MPI_COMM_WORLD, &myid)";
- Send data to the computing processes with MPI_Bcast or MPI_Send, e.g. "MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD)"; where "&n" is the starting address, "1" is the number of entries, "MPI_INT" is the data type, "0" is the rank of the broadcast root, and "MPI_COMM_WORLD" is the communicator. Each MPI_Send must be matched by an MPI_Recv;
- Receive data at each process with "MPI_Recv", or collect data from all the processes with "MPI_Reduce" after computing, e.g. "MPI_Reduce (&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD)"; where "&mypi" is the address of the data sent from each process, "&pi" is the address that receives the result, "1" is the number of data items to collect, "MPI_DOUBLE" is the data type, "MPI_SUM" sums up the data from all the processes, "0" is the rank of the root where the summed data go, and "MPI_COMM_WORLD" is the communicator;
- Stop MPI after the calculation is finished, i.e. "MPI_Finalize()".
The key points are:
- All the MPI tasks have to call "MPI_Init" and "MPI_Finalize", and each of these two functions can be called only once in the whole code/program. No MPI communication calls are allowed outside the region between "MPI_Init" and "MPI_Finalize". This is true for every parallelized program, no matter how big it is or how many subroutines it has;
- MPI functions of data sending, receiving and/or collecting (e.g. MPI_Send, MPI_Recv, MPI_Reduce) can be used as many times as needed and can be scattered everywhere inside the code/program.
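The arithmetic behind the MPI_Bcast and MPI_Reduce examples in the steps above can be checked without an MPI installation. The sketch below is plain serial Python, not MPI code. It loops over hypothetical ranks, lets each one compute the partial sum it would hold in "mypi", and then adds the partial sums the way MPI_Reduce with MPI_SUM would. It assumes the familiar midpoint-rule estimate of pi (integrating 4/(1+x^2) from 0 to 1), which the variable names "mypi" and "pi" suggest but which this page does not spell out.

```python
import math


def partial_pi(rank, nprocs, n):
    """Partial midpoint-rule sum that one rank would hold in mypi."""
    h = 1.0 / n
    total = 0.0
    # Each rank takes intervals rank, rank + nprocs, rank + 2 * nprocs, ...
    for i in range(rank, n, nprocs):
        x = h * (i + 0.5)
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(nprocs, n):
    """Emulate the reduction step: add the partial results of all ranks."""
    # In a real MPI program every rank would learn n through MPI_Bcast,
    # and MPI_Reduce with MPI_SUM would deliver this sum to rank 0.
    return sum(partial_pi(rank, nprocs, n) for rank in range(nprocs))


pi_estimate = estimate_pi(nprocs=4, n=1000)
print(f"estimate {pi_estimate:.10f}, error {abs(pi_estimate - math.pi):.2e}")
```

The estimate does not depend on the number of ranks beyond rounding error, which makes this a convenient check when testing a real MPI version with different numbers of tasks.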
To learn more about MPI, Rice has offered these courses for beginners to advanced users: "COMP 322", "COMP 422", and "COMP 522": MPI Courses.
Q: What is GPGPU?
A: GPGPU, General Purpose computing on Graphics Processing Units, is a methodology for High-Performance Computing (HPC) problems that are highly data parallel and throughput intensive. Highly data parallel means that all processors can simultaneously operate on different data elements, and throughput intensive means that the algorithm processes a very large number of data elements, so that many of them are operated on in parallel. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. The CUDA platform is accessible to software developers through CUDA-accelerated libraries, compiler directives, and extensions to programming languages, e.g. C, C++ (CUDA C/C++, i.e. the nvcc compiler), and Fortran (PGI CUDA Fortran). The book "Programming Massively Parallel Processors, second edition: A Hands-on Approach" (by David B. Kirk and Wen-mei W. Hwu from UIUC) explains the theory and concepts of CUDA parallel computing well. The book "CUDA by Example: An Introduction to General-Purpose GPU Programming" (by Jason Sanders and Edward Kandrot) is very good for practicing CUDA programming. The NVIDIA website has a lot of information about GPU calculations. GPUs are also available on both NOTS (K80) and DAVinCI (M2650).
GPGPU support is now commonly implemented in many programs/software packages. Here is the list: GPU-enabled Applications. For example, GPU-enabled GROMACS; AMBER Project; LAMMPS with GPU; GPU-accelerated NAMD and VMD, etc.
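To make "data parallel" concrete, the sketch below emulates in plain serial Python how a CUDA kernel assigns one output element to each thread. It is not GPU code and gains no speed. The example operation (a * x + y on each element) and the function names are chosen here for illustration, while the index formula mirrors the usual CUDA expression blockIdx.x * blockDim.x + threadIdx.x.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Work of one emulated GPU thread: a single output element."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(x):                          # guard threads past the end
        out[i] = a * x[i] + y[i]


def launch(a, x, y, block_dim=4):
    """Run every thread of the emulated grid, one after another."""
    out = [0.0] * len(x)
    grid_dim = (len(x) + block_dim - 1) // block_dim  # blocks needed, rounded up
    # On a GPU these iterations would run in parallel, since no element
    # depends on any other element.
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
print(launch(2.0, x, y))
```

Because each element is computed independently, the order in which the threads run does not matter, which is the property that lets a GPU process many elements at once.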