An Introduction to Parallel Programming

Language: English

Pages: 392

ISBN: 0123742609

Format: PDF / Kindle (mobi) / ePub


Author Peter Pacheco uses a tutorial approach to show students how to develop effective parallel programs with MPI, Pthreads, and OpenMP. The first undergraduate text to directly address compiling and running parallel programs on the new multi-core and cluster architectures, An Introduction to Parallel Programming explains how to design, debug, and evaluate the performance of distributed and shared-memory programs. User-friendly exercises teach students how to compile, run, and modify example programs.

Key features:

  • Takes a tutorial approach, starting with small programming examples and building progressively to more challenging examples
  • Focuses on designing, debugging and evaluating the performance of distributed and shared-memory programs
  • Explains how to develop parallel programs using MPI, Pthreads, and OpenMP programming models

    …cost of the partial tour. Therefore, rather than just using an array for the tour data structure and recomputing these values, we use a struct with three members: the array storing the cities, the number of cities, and the cost of the partial tour. To improve the readability and the performance of the code, we can use preprocessor macros to access the members of the struct. However, since macros can be a nightmare to debug, it’s a good idea to write “accessor” functions for use during initial development.
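
    A minimal sketch, in C, of the kind of struct and accessors this passage describes; the member and macro names here are illustrative rather than the book's exact identifiers:

        typedef struct {
            int*  cities;   /* cities[0..count-1]: cities visited so far */
            int   count;    /* number of cities in the partial tour      */
            float cost;     /* total cost of the partial tour            */
        } tour_struct;
        typedef tour_struct* tour_t;

        /* Macro "accessors": readable and fast, but painful to debug */
        #define Tour_count(tour)  ((tour)->count)
        #define Tour_cost(tour)   ((tour)->cost)
        #define Last_city(tour)   ((tour)->cities[(tour)->count - 1])

        /* Equivalent accessor functions, easier to step through in a
           debugger during initial development */
        static int   Get_count(tour_t tour) { return tour->count; }
        static float Get_cost(tour_t tour)  { return tour->cost;  }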

    …functions such as Feasible and Add_city need to access the adjacency matrix representing the digraph, so all the threads will need to access the digraph. However, since these are only read accesses, this won’t result in a race condition or contention among the threads; a sketch of such lock-free read access follows this excerpt.

    Program 6.7  Pseudocode for a Pthreads implementation of a statically parallelized solution to TSP

    There are only four potential differences between this pseudocode and the pseudocode we used for the second iterative serial implementation:
      • The use…
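
    A sketch, in C, of how Feasible might read the shared digraph without any locking. The row-major adjacency-matrix layout, the tour type, the Visited helper, and the convention that cost 0 means "no edge" are assumptions for illustration, not the book's Program 6.7:

        /* Minimal tour type (see the struct sketch above) */
        typedef struct { int* cities; int count; float cost; } tour_struct;
        typedef tour_struct* tour_t;

        /* Shared and read-only after initialization, so every thread can
           read the digraph concurrently without locks or contention.    */
        static int  n;         /* number of cities                        */
        static int* digraph;   /* n x n adjacency matrix, row-major       */

        #define Cost(c1, c2) (digraph[(c1)*n + (c2)])

        /* Has this city already been visited on the partial tour? */
        static int Visited(tour_t tour, int city) {
            for (int i = 0; i < tour->count; i++)
                if (tour->cities[i] == city) return 1;
            return 0;
        }

        /* A city is feasible if it hasn't been visited and is reachable
           from the last city on the tour; only read accesses to shared
           data occur here, so there is no race condition.               */
        static int Feasible(tour_t tour, int city) {
            int last = tour->cities[tour->count - 1];
            return !Visited(tour, city) && Cost(last, city) > 0;
        }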

    …and they’re equally spaced. Its syntax is

        int MPI_Type_vector(
              int           count        /* in  */,
              int           blocklength  /* in  */,
              int           stride       /* in  */,
              MPI_Datatype  old_mpi_t    /* in  */,
              MPI_Datatype* new_mpi_t_p  /* out */);

    For example, if we had an array x of 18 doubles and we wanted to build a type corresponding to the elements in positions 0, 1, 6, 7, 12, 13, we could call

        MPI_Type_vector(3, 2, 6, MPI_DOUBLE, &vect_mpi_t);

    since the type consists of 3 blocks, each of which has 2 elements, and the starts of successive blocks are 6 elements apart.
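
    A small, compilable sketch of the call just described; committing, freeing, and the send shown in the comment (including the destination rank) are illustrative additions, not part of the excerpt:

        #include <mpi.h>

        int main(int argc, char* argv[]) {
            double       x[18];
            MPI_Datatype vect_mpi_t;

            MPI_Init(&argc, &argv);
            for (int i = 0; i < 18; i++) x[i] = i;

            /* 3 blocks, 2 contiguous doubles per block, block starts 6
               elements apart: the new type selects x[0], x[1], x[6],
               x[7], x[12], and x[13].                                  */
            MPI_Type_vector(3, 2, 6, MPI_DOUBLE, &vect_mpi_t);
            MPI_Type_commit(&vect_mpi_t);

            /* A single communication can now move just those six values,
               e.g. MPI_Send(x, 1, vect_mpi_t, 1, 0, MPI_COMM_WORLD);    */

            MPI_Type_free(&vect_mpi_t);
            MPI_Finalize();
            return 0;
        }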

    If we compare the performance of the version that uses busy-waiting with the version that uses mutexes, we don’t see much difference in the overall run-time when the programs are run with fewer threads than cores. This shouldn’t be surprising: each thread only enters the critical section once, so unless the critical section is very long, or the Pthreads functions are very slow, we wouldn’t expect the threads to be delayed very much by waiting to enter the critical section.
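
    A sketch contrasting the two approaches to protecting the critical section; the function and variable names are illustrative, not the book's exact code:

        #include <pthread.h>

        static double sum = 0.0;               /* shared accumulator        */
        static volatile long flag = 0;         /* whose turn (busy-waiting) */
        static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

        /* Busy-waiting: thread my_rank spins until it's its turn, adds its
           contribution, then passes the turn on. Spinning burns CPU time,
           which matters once there are more threads than cores.           */
        void Add_busy_wait(long my_rank, long thread_count, double my_contrib) {
            while (flag != my_rank)
                ;                               /* spin */
            sum += my_contrib;                  /* critical section */
            flag = (flag + 1) % thread_count;
        }

        /* Mutex: entry order is unspecified, but a waiting thread is
           descheduled by the system instead of spinning.               */
        void Add_mutex(double my_contrib) {
            pthread_mutex_lock(&mutex);
            sum += my_contrib;                  /* critical section */
            pthread_mutex_unlock(&mutex);
        }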

    …coherence can have a dramatic effect on the performance of shared-memory systems. To illustrate this, recall our Pthreads matrix-vector multiplication example: The main thread initialized an m × n matrix A and an n-dimensional vector x. Each thread was responsible for computing m/t components of the product vector y = Ax. (As usual, t is the number of threads.) The data structures representing A, x, y, m, and n were all shared. For ease of reference, we reproduce the code in Program 4.13.
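
    A sketch of the kind of thread function the excerpt refers to; the shared variables and the block partition of y follow the passage, while the exact identifiers are assumptions rather than the book's Program 4.13:

        #include <pthread.h>

        /* Shared data, initialized by the main thread */
        static int     m, n, thread_count;
        static double *A, *x, *y;              /* A is m x n, row-major */

        /* Each thread computes m/thread_count consecutive components of
           y = A*x. A and x are only read, and each thread writes a
           disjoint block of y, so no locking is needed; cache-coherence
           traffic on the lines holding y can still hurt performance.    */
        void* Pth_mat_vect(void* rank) {
            long my_rank  = (long) rank;
            int  local_m  = m / thread_count;
            int  my_first = my_rank * local_m;
            int  my_last  = my_first + local_m;   /* exclusive */

            for (int i = my_first; i < my_last; i++) {
                y[i] = 0.0;
                for (int j = 0; j < n; j++)
                    y[i] += A[i*n + j] * x[j];
            }
            return NULL;
        }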
