The BLAS (Basic Linear Algebra Subprograms) are a library of subroutines designed to provide efficient implementations of commonly used linear algebra operations, such as dot products, matrix-vector multiplies, and matrix-matrix multiplies. The naming convention is similar to that of other libraries: the first letter indicates precision, and the rest gives a hint (maybe) of what the routine does, e.g. SAXPY, DGEMM.
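To make the naming convention concrete, here is a small sketch that splits a routine name into its precision letter and the remaining operation code. The S/D/C/Z prefixes are the standard BLAS precision letters. The short operation table and the function name `describe_blas_name` are hypothetical and only cover the routines mentioned in these notes.

```python
# Standard BLAS precision prefixes (first letter of the routine name).
PRECISION = {
    "S": "single-precision real",
    "D": "double-precision real",
    "C": "single-precision complex",
    "Z": "double-precision complex",
}

# Illustrative (not exhaustive) table of operation codes.
OPERATIONS = {
    "AXPY": "y = alpha*x + y (vector update)",
    "DOT": "dot product",
    "GEMV": "general matrix-vector multiply",
    "GEMM": "general matrix-matrix multiply",
}


def describe_blas_name(name: str) -> tuple[str, str]:
    """Hypothetical helper: split a BLAS name into (precision, operation)."""
    name = name.upper()
    precision = PRECISION.get(name[0])
    if precision is None:
        raise ValueError(f"unknown precision prefix in {name!r}")
    op = name[1:]
    # Fall back to a generic message for codes not in the small table.
    return precision, OPERATIONS.get(op, f"unknown operation {op!r}")
```

For example, `describe_blas_name("DGEMM")` gives the pair "double-precision real" and "general matrix-matrix multiply".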
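The BLAS are divided into three levels: Level 1 (vector-vector), Level 2 (matrix-vector), and Level 3 (matrix-matrix). The biggest speed-up is possible at Level 3, because a matrix-matrix multiply performs O(N^3) floating-point operations on only O(N^2) data, so each value loaded from memory can be reused many times.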
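The reuse argument is what blocked (tiled) matrix multiplication exploits: the product is computed one small tile at a time, so a tile of A and a tile of B are used for many multiply-adds while they sit in fast memory. The sketch below shows only the loop structure; `blocked_matmul` and the tile size `block` are hypothetical names, and in pure Python there is no cache speed-up to observe, since that benefit comes from compiled Level 3 code.

```python
def blocked_matmul(A, B, block=2):
    """Hypothetical sketch: C = A*B computed tile by tile.

    A is M x K and B is K x N, both given as lists of lists of numbers.
    Each (i0, j0, p0) step multiplies one block x block tile of A by one
    tile of B and accumulates the result into one tile of C.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, block):
        for j0 in range(0, N, block):
            for p0 in range(0, K, block):
                # Inner loops touch only the current tiles of A, B and C.
                for i in range(i0, min(i0 + block, M)):
                    for p in range(p0, min(p0 + block, K)):
                        a = A[i][p]  # reused across the whole row tile of B
                        for j in range(j0, min(j0 + block, N)):
                            C[i][j] += a * B[p][j]
    return C
```

The result does not depend on the tile size; only the order in which the multiply-adds are performed changes.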
Examples:
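Level 1: vector-vector operations, e.g. SAXPY (y = a*x + y)
Level 2: matrix-vector operations, e.g. SGEMV (y = a*A*x + b*y)
Level 3: matrix-matrix operations, e.g. SGEMM (C = a*A*B + b*C)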
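One routine from each level can be written as plain reference loops, which makes the operation and flop counts easy to see. This is a minimal sketch of the mathematical operations only, not the BLAS calling sequence: the real routines also take vector increments, leading dimensions and transpose flags, which are omitted here. Python floats are double precision, so the S prefixes are nominal.

```python
def saxpy(alpha, x, y):
    """Level 1: y := alpha*x + y, updated in place. 2N flops."""
    for i in range(len(x)):
        y[i] += alpha * x[i]
    return y


def sgemv(alpha, A, x, beta, y):
    """Level 2: y := alpha*A*x + beta*y for an M x N matrix A. About 2MN flops."""
    for i in range(len(A)):
        acc = 0.0
        for j in range(len(x)):
            acc += A[i][j] * x[j]
        y[i] = alpha * acc + beta * y[i]
    return y


def sgemm(alpha, A, B, beta, C):
    """Level 3: C := alpha*A*B + beta*C; A is M x K, B is K x N. About 2MNK flops."""
    M, K, N = len(A), len(B), len(B[0])
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for p in range(K):
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]
    return C
```

Each level adds one loop around the previous one, which is why the work grows from O(N) to O(N^2) to O(N^3).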
Roughly, Level 1 can give about 20 Mflops, Level 2 about 30, and Level 3 about 60, on 1997-98 generation chips, IF THE PROBLEM SIZE IS BIG ENOUGH.
How efficient is the BLAS?
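Routine (level)    Load/store      Float ops    Refs/ops
SAXPY (level 1)    3N              2N           3/2
SGEMV (level 2)    MN+N+2M         2MN          1/2
SGEMM (level 3)    2MN+MK+KN       2MNK         2/N

Refs/ops is memory references (loads plus stores) per floating-point operation. The SGEMV value is the limit for large M and N; the SGEMM value assumes M = N = K.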
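The refs/ops column follows directly from dividing the load/store count by the flop count. The sketch below evaluates those ratios exactly with `fractions.Fraction` for square sizes; the function names are hypothetical and the formulas are the ones in the table.

```python
from fractions import Fraction


# Loads/stores divided by flops, using the counts from the table.
def saxpy_ratio(N):
    return Fraction(3 * N, 2 * N)


def sgemv_ratio(M, N):
    return Fraction(M * N + N + 2 * M, 2 * M * N)


def sgemm_ratio(M, N, K):
    return Fraction(2 * M * N + M * K + K * N, 2 * M * N * K)


# Square problems: the SGEMM ratio keeps shrinking as 2/N,
# while SAXPY stays at 3/2 and SGEMV approaches 1/2.
for n in (10, 100, 1000):
    print(n, float(saxpy_ratio(n)), float(sgemv_ratio(n, n)), float(sgemm_ratio(n, n, n)))
```

Only Level 3 gets cheaper in memory traffic per flop as the problem grows, which is why it benefits most from a large problem size.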