Introduction

The role of a profiling tool is to associate computational bottlenecks that arise during program execution with easily identifiable segments of the underlying source code. The usefulness of a profiling tool depends upon the ease with which users can employ this information to alleviate the bottlenecks identified within their programs.

The success of profiling tools for sequential languages rests predominantly on three criteria, which together form the platform on which such tools are built. The first criterion is 'what' is measured; typically this might be the percentage of execution time spent in each part of the program. The second criterion is 'where' in the code these costs should be attributed; costs may be associated with functions or libraries, for example. The third criterion is 'how-to-use' the profiling information to optimise programs in a quantifiable, portable and universal way; for example, problematic portions of code may be rewritten using an algorithm with improved asymptotic complexity.

Profiling parallel programs differs from profiling sequential programs in that parallel programs are executed on a number of processors. Consequently, each part of the code may be associated with up to p costs, where p is the number of processors. The major challenge for the developers of profiling tools for parallel languages is to identify and expose the relationship (imbalance) of computational costs amongst processors, and subsequently to express this relationship in terms of the three criteria outlined above. Unfortunately, within a parallel framework, there is a multiplicity of interacting issues that make these criteria significantly more obscure and complex:

What-to-cost: In parallel programming there are at least two kinds of cost that can cause bottlenecks within programs: computation and communication. These costs should not be decoupled and profiled independently, as it is of paramount importance that the interaction between the two is identified and exposed to the user. The motivation is to ensure that when a program is optimised with respect to one of these costs, it is not to the detriment of the other.
Where-to-cost: Costing communication can be problematic because 'related' communication costs on different processors may be caused by up to p different (and interacting) parts of a program. For example, in message-passing systems, there exist p distinct and independently interacting 'costable' parts of code. Profiling tools designed for such systems may therefore overwhelm the user with vast amounts of indigestible information unless careful attention is paid to the design. One graphical system that suffers from this problem is upshot.
How-to-use: Most parallel algorithms written today are built upon programming models that have no usable cost model. Therefore, when profile information is used to remove bottlenecks within programs, care has to be taken that the optimisations are not specifically tailored to a particular machine or architecture. As in the sequential setting, portable optimisation can only be achieved by improving the overall structure of algorithms in a quantifiable, portable and universal way; without a pragmatic cost model this cannot be realised.

In this paper it is demonstrated that parallel programs written using the disciplined approach of the BSP model are amenable to the three profiling criteria stated above. The development of a BSP profiling tool is documented. The work motivates the notion of computation and communication balance as the metric by which programs are optimised. It is shown that when imbalance is minimised, significant improvements in the algorithmic complexity of parallel algorithms usually follow. This approach provides the foundation upon which portable and architecture-independent optimisation can be achieved.
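To make the notion of balance concrete, the following is a minimal sketch based on the standard textbook formulation of the BSP cost of a superstep, w + h·g + l, where w is the largest local computation, h the largest number of words sent or received by any processor, g the communication cost per word and l the barrier synchronisation cost. It is not taken from the profiling tool described in this paper: the names `Superstep`, `superstep_cost` and `imbalance`, and the choice of maximum over mean as the imbalance measure, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Superstep:
    """Hypothetical per-processor measurements for one superstep."""
    work: Sequence[float]   # local computation cost on each processor
    h_out: Sequence[int]    # words sent by each processor
    h_in: Sequence[int]     # words received by each processor


def h_relation(step: Superstep) -> int:
    """Largest number of words entering or leaving any single processor."""
    return max(max(o, i) for o, i in zip(step.h_out, step.h_in))


def superstep_cost(step: Superstep, g: float, l: float) -> float:
    """Textbook BSP cost of one superstep: w + h*g + l."""
    return max(step.work) + g * h_relation(step) + l


def imbalance(values: Sequence[float]) -> float:
    """Assumed imbalance measure: maximum over mean (1.0 means perfectly balanced)."""
    mean = sum(values) / len(values)
    return max(values) / mean if mean > 0 else 1.0


# Same total work (800 units) on p = 4 processors, distributed two ways.
skewed = Superstep(work=[100, 100, 100, 500], h_out=[10] * 4, h_in=[10] * 4)
balanced = Superstep(work=[200, 200, 200, 200], h_out=[10] * 4, h_in=[10] * 4)

for name, step in (("skewed", skewed), ("balanced", balanced)):
    print(name, superstep_cost(step, g=2.0, l=50.0), imbalance(step.work))
# skewed 570.0 2.5
# balanced 270.0 1.0
```

Because the cost of a superstep is determined by the most heavily loaded processor, redistributing the same total work evenly reduces the predicted cost in this example from 570 to 270 without reference to any particular machine beyond the parameters g and l.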


Jonathan Hill

Last updated: June 14th 1997