Stencil computations arise in a wide range of applications in the computational sciences. This paper focuses on stencil computations arising in a biomedical simulation. Compute-intensive biomedical simulations are an attractive application for hardware accelerators such as the Cell Broadband Engine Architecture (CBEA) and graphics processing units (GPUs). Because stencil computations have low arithmetic intensity and the compute hardware has limited bandwidth, the achieved performance is usually only a fraction of peak performance. We detail an implementation of parallel stencil computations on the CBEA and on GPUs that improves performance by exploiting temporal locality, and we report the performance improvements over CPU implementations.
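
To make the idea of exploiting temporal locality concrete, the following is a minimal sketch, not the CBEA or GPU implementation described in this paper. It assumes a 1D three-point averaging stencil with fixed boundary values, and it uses Python only for brevity. The names `step`, `naive`, and `time_blocked` are hypothetical. The naive version streams the whole array through memory once per time step. The blocked version loads each tile once together with a halo of `steps` cells on each side, advances the tile through all time steps locally, and writes back only the cells that the stale halo has not reached.

```python
def step(u):
    """One Jacobi sweep of a 3-point averaging stencil; end values stay fixed."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def naive(u, steps):
    """Apply `steps` sweeps, touching the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def time_blocked(u, steps, tile=8):
    """Overlapped tiling (an assumed illustrative scheme).

    Each tile is loaded once with a halo of `steps` cells per side and is
    advanced through all time steps before the next tile is touched.
    """
    n = len(u)
    out = u[:]
    for start in range(1, n - 1, tile):
        end = min(start + tile, n - 1)   # tile owns interior cells [start, end)
        lo = max(start - steps, 0)       # halo, clipped at the domain boundary
        hi = min(end + steps, n)
        local = u[lo:hi]
        for _ in range(steps):
            local = step(local)
        # Stale halo values move inward one cell per step, so after `steps`
        # sweeps the cells owned by the tile are still exact.
        out[start:end] = local[start - lo:end - lo]
    return out


if __name__ == "__main__":
    u0 = [float((i * i) % 7) for i in range(40)]
    a = naive(u0, 3)
    b = time_blocked(u0, 3, tile=8)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

The sketch trades redundant computation in the halo regions for fewer passes over main memory, which is the general trade-off when a stencil is bandwidth bound; the tile size and number of fused time steps here are arbitrary illustrative values.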