Title: Recent improvements on the kernel operations used in iterative methods

Abstract: This talk focuses on solving sparse linear systems of equations with iterative methods. The total time of an iterative method is simply the number of iterations multiplied by the (mean) time of an iteration. To optimize the time to solution, the work has traditionally been split. Mathematicians work on the iterative methods and their preconditioners to reduce the number of iterations, while computer scientists try to reduce the (mean) time of an iteration. This talk presents three recent approaches to the latter. They address the two main costs of an iteration: (a) the matrix-vector product and (b) the orthogonalization scheme.

First, we consider the technique of 'blocking' the matrix so that the sparse matrix-vector product can use BLAS2 operations. A new strategy is given for automatically selecting the block size of the optimal implementation of a given blocked matrix-vector product (see [1]). It involves a setup phase that probes machine characteristics and a run-time phase in which the stored characteristics are combined with a measure of the actual sparse matrix to find the optimal kernel implementation. We present a performance model that is shown to be accurate over a large range of matrices.

Then I explain how to relax the precision of the matrix-vector product as an iterative method converges without compromising the final accuracy (see [2]). The benefit is that the less accurate the matrix-vector products are, the faster they are to compute.

Finally, a new theoretical result in error analysis is given (see [3]). Through a detailed rounding error analysis in floating-point arithmetic, we have been able to quantify theoretically the quality of the classical Gram-Schmidt algorithm (a very fast orthogonalization scheme).
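To make the speed/accuracy trade-off of classical Gram-Schmidt (CGS) concrete, here is a minimal pure-Python sketch. It is not the analysis of [3]. It simply measures the loss of orthogonality, taken here as the largest entry of |QᵀQ − I| (our choice of norm), for CGS and for modified Gram-Schmidt (MGS). MGS is a standard, more robust alternative that the abstract does not name. The ill-conditioned test matrix, a Hilbert matrix, and all function names below are illustrative assumptions.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def classical_gs(cols):
    """Classical Gram-Schmidt: every coefficient r_ij = q_i . a_j uses the
    ORIGINAL column a_j, so all of them could be computed as one
    matrix-vector product (a BLAS2 operation)."""
    qs = []
    for a in cols:
        coeffs = [dot(q, a) for q in qs]      # all projections of the original a_j
        v = list(a)
        for r, q in zip(coeffs, qs):
            v = [vi - r * qi for vi, qi in zip(v, q)]
        nv = norm(v)
        qs.append([vi / nv for vi in v])
    return qs


def modified_gs(cols):
    """Modified Gram-Schmidt: each coefficient uses the partially
    orthogonalized vector v, which forces a sequence of dot products (BLAS1)."""
    qs = []
    for a in cols:
        v = list(a)
        for q in qs:
            r = dot(q, v)                     # projection of the UPDATED vector
            v = [vi - r * qi for vi, qi in zip(v, q)]
        nv = norm(v)
        qs.append([vi / nv for vi in v])
    return qs


def orthogonality_loss(qs):
    """Largest entry of |Q^T Q - I| (hypothetical metric chosen for this sketch)."""
    k = len(qs)
    return max(
        abs(dot(qs[i], qs[j]) - (1.0 if i == j else 0.0))
        for i in range(k)
        for j in range(k)
    )


def hilbert_columns(n):
    """Columns of the n x n Hilbert matrix, a classic ill-conditioned test case."""
    return [[1.0 / (i + j + 1) for i in range(n)] for j in range(n)]


if __name__ == "__main__":
    A = hilbert_columns(8)
    print("CGS loss:", orthogonality_loss(classical_gs(A)))
    print("MGS loss:", orthogonality_loss(modified_gs(A)))
```

On an ill-conditioned matrix such as this one, CGS typically loses orthogonality much faster than MGS as the condition number grows. On a well-conditioned matrix, both stay close to machine precision. This is the kind of behavior that makes it attractive to pick the cheaper CGS when only moderate final accuracy is requested.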
Quantifying the quality of classical Gram-Schmidt enables us to switch the orthogonalization scheme (fast or not) depending on the requested final accuracy (high precision or not).

References:
[1] Alfredo Buttari, Victor Eijkhout, Julien Langou, and Salvatore Filippone. Performance optimization and modeling of blocked sparse kernels. Technical Report UT-CS-04-543, 2004.
[2] Luc Giraud, Serge Gratton, and Julien Langou. A note on relaxed and flexible GMRES. Technical Report CERFACS TR-PA-04-41, 2004.
[3] Luc Giraud, Julien Langou, Miroslav Rozloznik, and Jasper van den Eshof. Rounding error analysis of the classical Gram-Schmidt orthogonalization process. Technical Report CERFACS TR-PA-04-77, 2004.