The BLAS (Basic Linear Algebra Subprograms) are high-quality "building block" routines for performing basic vector and matrix operations. Level 1 BLAS do vector-vector operations, Level 2 BLAS do matrix-vector operations, and Level 3 BLAS do matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they are commonly used in the development of high-quality linear algebra software. Machine-specific optimized BLAS libraries are available for a variety of computer architectures. These optimized BLAS libraries are provided by the computer vendor or by an independent software vendor. See, among other sources, the CSEP e-book (http://csep1.phy.ornl.gov/la/la.html) and the Templates book (http://www.netlib.org/linalg/html_templates/Templates.html).
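To make the three levels concrete, here is a minimal pure-Python sketch of one representative operation per level. The function names echo the conventional BLAS routine names (axpy, gemv, gemm), but the signatures are simplified assumptions for illustration: the real routines take scaling factors, strides, and transpose flags, and an optimized library would never be written this way.

```python
# Reference versions of one operation per BLAS level.
# Simplified, hypothetical signatures; NOT the real BLAS calling interface.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(A, x, y):
    """Level 2 (matrix-vector): return A*x + y, with A a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) + yi
            for row, yi in zip(A, y)]

def gemm(A, B, C):
    """Level 3 (matrix-matrix): return A*B + C, all given as lists of rows."""
    cols_of_B = list(zip(*B))  # transpose B once so columns are easy to walk
    return [[sum(a * b for a, b in zip(row, col)) + cij
             for col, cij in zip(cols_of_B, c_row)]
            for row, c_row in zip(A, C)]
```

For instance, axpy(2.0, [1.0, 2.0], [3.0, 4.0]) evaluates 2*[1, 2] + [3, 4], one multiply and one add per vector element.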
Alternatively, a user can download ATLAS (Automatically Tuned Linear Algebra Software) to automatically generate an optimized BLAS library for their architecture. ATLAS is an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. The production of such software for machines ranging from desktop workstations to embedded processors can be a tedious and time-consuming task. ATLAS has been designed to automate much of this process. In effect, ATLAS performs an array of basic computations designed to explore the architecture of the machine, and generates optimized kernels that constitute the heart of the BLAS.
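The idea of exploring the machine empirically can be illustrated with a toy sketch: time several variants of a kernel and keep the fastest. The code below is not how ATLAS is implemented. ATLAS tunes compiled kernels over many more parameters, and interpreted Python will not expose cache effects faithfully. The names blocked_matmul and pick_block_size, the matrix size, and the candidate block sizes are all hypothetical choices made for this illustration.

```python
import time

def blocked_matmul(A, B, n, bs):
    """Multiply two n-by-n matrices (lists of rows) using bs-by-bs blocks."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Update one block of C from one block of A and one of B.
                for i in range(ii, min(ii + bs, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + bs, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + bs, n)):
                            Ci[j] += a * Bk[j]
    return C

def pick_block_size(n=48, candidates=(4, 8, 16, 48)):
    """Time each candidate block size on this machine; return the fastest."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for bs in candidates:
        start = time.perf_counter()
        blocked_matmul(A, B, n, bs)
        timings[bs] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings
```

Calling pick_block_size() returns the winning block size together with the measured timings. The winner depends on the machine and can vary from run to run, which is exactly why the search is performed empirically on the target architecture.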
There is a version of BLAS for parallel architectures, and a version for sparse matrix computations.
To get a sense of how much work each level performs, let us look at the efficiency of a few routines.
It is instructive to classify the BLAS according to the number of floating-point operations (flops) and the number of memory references required for a basic linear algebra operation.
The parameter q is the ratio of flops to memory references. A larger q means more useful work is done per unit of time spent moving data. Roughly speaking, the higher the level of the BLAS, the larger q.
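As a worked illustration of q, the sketch below uses the commonly quoted operation counts for one routine per level, with vectors of length n and n-by-n matrices. These counts are an assumption brought in for this example, since the text above does not list them: axpy (y = alpha*x + y) needs 2n flops and 3n + 1 memory references, gemv (y = A*x + y) needs 2n^2 flops and n^2 + 3n references, and gemm (C = A*B + C) needs 2n^3 flops and 4n^2 references. The function name blas_intensity is hypothetical.

```python
def blas_intensity(n):
    """Return {routine: (flops, memory_refs, q)} for problem size n."""
    counts = {
        # axpy: read x, y, and alpha; write y
        "axpy (Level 1)": (2 * n, 3 * n + 1),
        # gemv: read A, x, and y; write y
        "gemv (Level 2)": (2 * n**2, n**2 + 3 * n),
        # gemm: read A, B, and C; write C
        "gemm (Level 3)": (2 * n**3, 4 * n**2),
    }
    return {name: (flops, refs, flops / refs)
            for name, (flops, refs) in counts.items()}

for name, (flops, refs, q) in blas_intensity(1000).items():
    print(f"{name}: flops={flops}, memory refs={refs}, q={q:.2f}")
```

Under these counts q tends to 2/3 for axpy and to 2 for gemv as n grows, while for gemm q = n/2 grows with the problem size. Only the Level 3 routine can perform arbitrarily more arithmetic per memory reference.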
Thus, higher-level BLAS are preferred when constructing algorithms that must implement vector and matrix operations efficiently on modern computers.