In the introduction it was stated that care has to be taken when optimising programs based on profile information. A problem with most commercially available profiling tools is that profile data is often error-prone, especially when wall-clock time is used as a cost metric. One of the important features of BSP is that the size of h-relations directly influences the cost of communication. Therefore, instead of using actual communication time as a cost metric, which may involve errors, the predicted cost of communication, hg + l, is used. This is error-free as the value of h, which is not affected by the choice of the underlying machine or architecture, is accurately recorded at runtime. Therefore, our hypothesis is encapsulated in the notion that the imbalance in maximum and average h-relation can be effectively used as the metric by which BSP programs are optimised and optimal architecture independent parallel algorithms can be developed. The hypothesis is reinforced by both the BSP cost analysis formulae and experimental results.
The cost of the two broadcasting algorithms support this hypothesis. It can be clearly seen from the nodes in Figure 3 labelled one_stage_bcast and two_stage_broadcast, that the two-stage broadcast is superior to the one stage, as it is revealed by the accumulated values for the computation, communication and idle costs. For example, the performance on a 16 processor Cray T3E gives an improvement of: