NAME
bsp_put, bsp_hpput - deposit data into a remote processes
memory
C SYNOPSIS
#include "bsp.h"
void bsp_put(int pid,const void *src,void *dst,int offset,int nbytes);
void bsp_hpput(int pid,const void *src,void *dst,int offset,int nbytes);
FORTRAN SYNOPSIS
SUBROUTINE bspput(pid,src,dst,offset,nbytes)
INTEGER, intent(IN) :: pid,offset,nbytes
<TYPE>, intent(IN) :: src
<TYPE>, intent(OUT) :: dst
SUBROUTINE bsphpput(pid,src,dst,offset,nbytes)
INTEGER, intent(IN) :: pid,offset,nbytes
<TYPE>, intent(IN) :: src
<TYPE>, intent(OUT) :: dst
DESCRIPTION
The aim of bsp_put(3) and bsp_hpput(3) is to provide an
operation akin memcpy(3C) available in the Unix <string.h>
library. Both operations copy a specified number of bytes,
from a byte addressed data structure in the local memory of
one process into contiguous memory locations in the local
memory of another process. The distinguishing factor between
these operations is provided by the buffering choice.
The semantics buffered on source, buffered on destination
(see drma(3)) is used for bsp_put(3) communications. While
the semantics is clean and safety is maximised, puts may
unduly tax the memory resources of a system. Consequently,
bsplib(3) also provides a high performance put operation
bsp_hpput(3) whose semantics is {unbuffered on source,
unbuffered on destination . The use of this operation
requires care as correct data delivery is only guaranteed
if: (1) no communications alter the source area; (2) no sub-
sequent local computations alter the source area; (3) no
other communications alter the destination area; and (4) no
computation on the remote process alters the destination
area during the entire superstep. The main advantage of this
operation is its economical use of memory. It is therefore
particularly useful for applications which repeatedly
transfer large data sets.
EXAMPLES
1) The reverse function shown below highlights the
interaction between registration and put communica-
tions. This example defines a simple collective commun-
ication operation, in which all processes have to call
the function within the same superstep. The result of
the function on process i will be the value of the
parameter x from process bsp_nprocs()-i-1.
int reverse(int x) {
bsp_push_reg(&x,sizeof(int));
bsp_sync();
bsp_put(bsp_nprocs()-bsp_pid()-1,&x,&x,0,sizeof(int));
bsp_sync();
bsp_pop_reg(&x);
return x;
}
The function would be defined in Fortran as:
INTEGER FUNCTION reverse(x)
INCLUDE 'fbsp.h'
INTEGER x
CALL bsppushreg(x,BSPINT)
CALL bspsync()
CALL bspput(bspnprocs()-bsppid()-1,x,x,0,BSPINT)
CALL bspsync()
CALL bsppopreg(x)
reverse=x
END
By the end of the first superstep, identified by the
first bsp_sync(3), all the processes will have
registered the parameter x as being available for
remote access by any subsequent drma(3) operation. Dur-
ing the second superstep, each process transfers its
local copy of the variable x into a remote copy on pro-
cess bsp_nprocs()-bsp_pid()-1 . Although communications
occur to and from the same variable within the same
superstep, the algorithm does not suffer from problems
of concurrent assignment because of the buffered on
source, buffered on destination semantics of
bsp_put(3). This buffering ensures conflict-free com-
munication between the outgoing communication from x,
and any incoming transfers from remote processes. The
popregister at the end of the function reinstates the
registration properties that were active on entry to
the function at the next bsp_sync(3) encountered during
execution.
As can be seen from the Fortran code above, bsplib(3)
defines a collection of constant parameters that
specify the size, in bytes, of the various Fortran
data-types. BSPWORD, BSPINT, BSPLOGICAL, BSPREAL,
BSPDOUBLE, and BSPCOMPLEX can be used by including the
file fbsp.h within a compilation unit. The relationship
between the sizes of each of these types follows that
of Fortran 77. i.e.,
BSPWORD = BSPINT = BSPLOGICAL = BSPREAL
2*BSPWORD = BSPDOUBLE = BSPCOMPLEX
2) The procedure put_array shown below has a semantics
assignment:
forall i in {0,..n-1} xs[xs[i]] := xs[i]
Conceptually, the algorithm manipulates a global array
xs of n elements that are distributed among the
processes. The role of bsplib(3) is to provide the
infrastructure for the user to take care of the data
distribution, and any implied communication necessary
to manipulate parts of the data structure that are on a
remote process. Therefore, if the user distributes the
global array in a block-wise manner (i.e., process zero
gets elements 0 to n/(p-1), process one gets n/p to
(2n)/(p-1), etc.) with each process owning an n/p chunk
of elements, then the bsplib(3) communications neces-
sary to perform the concurrent assignment are shown
below.
void put_array(int *xs, int n) {
int i,pid,local_idx,n_over_p= n/bsp_nprocs();
if ((n % bsp_nprocs()) != 0)
bsp_abort("{put_array} n=%d not divisible by p=%d",
n,bsp_nprocs());
bsp_push_reg(xs,n_over_p*sizeof(int));
bsp_sync();
for(i=0;i<n_over_p;i++) {
pid = xs[i]/n_over_p;
local_idx = xs[i]%n_over_p;
bsp_put(pid,&xs[i],xs,local_idx*sizeof(int),sizeof(int));
}
bsp_sync();
bsp_pop_reg(xs);
}
Similarly, the function can be defined in Fortran as:
SUBROUTINE putarray(xs,n)
INCLUDE 'fbsp.h'
INTEGER xs(*),n
INTEGER i,pid,localidx,noverp
noverp=n/bspnprocs()
IF (MOD(n,bspnprocs()) .NE. 0) THEN
CALL bspabort('N not divisible by p')
END IF
CALL bsppushreg(xs,noverp*BSPINT)
CALL bspsync()
DO i=1,noverp
pid = xs(i)/noverp
localidx = MOD(xs(i),noverp)
CALL bspput(pid,xs(i),xs,localidx*BSPINT,BSPINT)
END DO
CALL bspsync()
CALL bsppopreg(xs)
END
The procedure highlights the use of bsp_abort(3) and
the offset parameter in bsp_put(3). Each process's
local section of the array xs is registered in the
first superstep. Next, n/p puts are performed, in which
the global numbering used in the distributed array
(i.e., indices in the range 0 through to n-1), are con-
verted into pairs of process identifier and local
numbering in the range 0 to n/(p-1). Once the conver-
sion from the global scheme to process-id/local index
has been performed, elements of the array can be
transferred into the correct index on a remote process.
It should be noted that if the value of the variable
pid is the same as bsp_pid(3), then a local assignment
(i.e., memory copy) will occur at the end of the super-
step.
3) Consider a function cyclic_shift executed on each pro-
cess, that takes an integer x as its argument, and
returns the value of x on its left neighbouring pro-
cess.
int cyclic_shift(int x) {
bsp_push_reg(&x,sizeof(int));
bsp_sync();
bsp_put( (bsp_pid() + 1) % bsp_nprocs(),
&x,
&x,0,
sizeof(int));
bsp_sync();
bsp_pop_reg(&x);
return x;
}
4) An alternative definition of cyclic shift that uses a
high performance put.
int cyclic_shift(int x) {
int result;
bsp_push_reg(&result,sizeof(int));
bsp_sync();
bsp_hpput( (bsp_pid() + 1) % bsp_nprocs(),
&x,
&result,0,
sizeof(int));
bsp_sync();
bsp_pop_reg(&result);
return result;
}
5) Consider a function bsp_allsums that calculates the
running sums of p values stored on p processors. i.e.,
if x_i is stored on process i, then the result on each
processor is x_0 + .. x_i.
#include "bsp.h"
#include <stdio.h>
int bsp_allsums(int x) {
int i, left, right;
bsp_push_reg(&left,sizeof(int));
bsp_sync();
right=x;
for(i=1;i<bsp_nprocs();i*=2) {
if (bsp_pid()+i < bsp_nprocs())
bsp_put(bsp_pid()+i,&right,&left,0,sizeof(int));
bsp_sync();
if (bsp_pid()>=i) right=left+right;
}
bsp_pop_reg(&left);
return right;
}
void main() {
int y;
bsp_begin(bsp_nprocs());
y = bsp_pid()+1;
printf("y=%d sums=%d\n",y,bsp_allsums(y));
bsp_end();
}
A compilation, and an example run on four processors is
shown below:
pippin> bspcc allsums.c
pippin> ./a.out
y=4 sums=10
y=2 sums=3
y=1 sums=1
y=3 sums=6
SEE ALSO
drma(3), bsp_push_reg(3), bsp_pop_reg(3), bsp_get(3),
bsp_hpget(3)
``BSPlib: The BSP Programming Library'' Jonathan M. D. Hill,
Bill McColl, Dan C. Stefanescu, Mark W. Goudreau, Kevin
Lang, Satish B. Rao, , Torsten Suel, Thanasis Tsantilas, and
Rob Bisseling. Parallel Computing, to appear 1998. See
http://www.bsp-worldwide.org for more details.
NOTES
i The destination memory area used in a put has to be
registered. It is an error to communicate into a data
structure that has not been registered.
ii The source of a put does not have to be registered.
iii If the destination memory area dst is registered with
size x, then it is a bounds error to perform the com-
munication bsp_put(pid,src,dst},o,n) if o+n>x.
iv A communication of zero bytes does nothing.
v A process can communicate into its own memory if pid =
bsp_pid(). However, for bsp_put(3), due to the buffered
at destination semantics, the memory copy only takes
effect at the end of the superstep.
vi The process numbering and offset parameter start from
zero, even for the FORTRAN bindings of the operations.
BUGS
Problems and bug reports should be mailed to bsplib-
bugs@comlab.ox.ac.uk
AUTHORS
The Oxford BSP Toolset implementation of BSPlib was written
by Jonathan.Hill@comlab.ox.ac.uk
http://www.comlab.ox.ac.uk/oucl/people/jonathan.hill.html
Man(1) output converted with
man2html