`flops`

Subject: Flop Count Missing in Matlab R12
From: Stephen Vavasis
Date: Thu, 30 Nov 2000 15:13:46 -0500

I installed Matlab 6.0 (R12) last week and was dismayed to find that the 'flops' function is gone. I have used this function many times in my published papers to compare two algorithms in computational tests. Probably many of you have also used 'flops' in your own research. I'm an editor of several NA journals and have seen 'flops' used in papers I have handled.

I propose that numerical analysts should lobby Mathworks to reinstate the 'flops' function in a future version of Matlab. Do you agree? I'm not sure how to proceed. Perhaps you can send me email if you agree or disagree. Or maybe you can make a followup posting to NA Digest if you have some thoughts about 'flops'. Clearly there is a connection between our community and Mathworks, considering that Cleve Moler is the editor of this newsletter! Maybe we can use that connection to our advantage!

Steve Vavasis (vavasis@cs.cornell.edu)

[Cleve's response:]

You don't need to lobby us, Steve. You've already got our vote for resurrecting FLOPS as soon as we can. The trouble is, LAPACK and, especially, optimized BLAS have no provision for keeping track of counts of floating point operations. Unfortunately, it would require more extensive software modifications than we are able to make. There might also be some degradation of performance.

Jack Dongarra and his colleagues at the University of Tennessee ATLAS project tell us that they can get flop counts through access to privileged hardware instructions available on some modern chips, but I'm not sure yet how that works.

It would also be feasible to provide rough flop estimates that would have approximately the right coefficient of n^3 for matrix operations, but we don't think that's good enough. A number of people were concerned that the traditional FLOPS function didn't count every single operation, including those in system-level libraries outside of our control.
By the way, I mentioned the FLOPS difficulties briefly in my MATLAB News and Notes column about LAPACK (http://www.mathworks.com/company/newsletter/clevescorner/winter2000.cleve.shtml).

Cleve
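Cleve's "rough estimate" alternative can be illustrated with a small sketch: instead of counting operations as they execute, each high-level matrix operation is simply charged its textbook leading-order flop count. Everything below (the names, the table, the operations chosen) is illustrative only, not anything MATLAB actually shipped:

```python
# A hedged sketch of the "rough estimate" idea: charge each matrix
# operation its textbook flop count instead of counting executed ops.
# Names here are hypothetical, for illustration only.
NOMINAL_FLOPS = {
    "matmul":   lambda n: 2 * n**3,        # C = A*B
    "lu":       lambda n: 2 * n**3 // 3,   # leading term of LU
    "cholesky": lambda n: n**3 // 3,       # leading term of Cholesky
}

def estimate_flops(op, n):
    """Nominal flop count for an n-by-n operation, ignoring lower-order
    terms and whatever the underlying library actually does."""
    return NOMINAL_FLOPS[op](n)

print(estimate_flops("matmul", 100))   # 2000000
```

Such an estimate has "approximately the right coefficient of n^3," which is exactly why Cleve notes it is not good enough for people who expected 'flops' to count every operation.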

Subject: Re: Flop Count Missing in Matlab R12
From: Tim Davis
Date: Tue, 05 Dec 2000 10:53:38 -0500

To count flops, we first need to know what they are. What is a flop?

LAPACK is not the only place where the question "what is a flop?" is relevant. Sparse matrix codes are another. Multifrontal and supernodal factorization algorithms store L and U (and intermediate submatrices, for the multifrontal method) as a set of dense submatrices. It's more efficient that way, since the dense BLAS can be used within the dense submatrices. It is often better to explicitly store some of the numerical zeros, so that one ends up with fewer frontal matrices or supernodes.

So what happens when I compute zero times zero plus zero? Is that a flop (or two flops)? I computed it, so one could argue that it counts. But it was useless, so one could argue that it shouldn't count. Computing it allowed me to use more BLAS-3, so I get a faster algorithm that happens to do some useless flops. How do I compare the "mflop rate" of two algorithms that make different decisions on what flops to perform and which of those to include in the "flop count"?

A somewhat better measure would be to compare the two algorithms based on an external count. For example, the "true" flop count for sparse LU factorization can be computed in Matlab from the pattern of L and U as:

   [L,U,P] = lu (A) ;
   Lnz = full (sum (spones (L))) - 1 ;    % off-diagonal nz in cols of L
   Unz = full (sum (spones (U')))' - 1 ;  % off-diagonal nz in rows of U
   flops = 2*Lnz*Unz + sum (Lnz) ;

The same can be done on the LU factors found by any other factorization code. This does count a few spurious flops: the computation a_ij + l_ik*u_kj is always counted as two flops, even if a_ij is initially zero.

However, even with this "better" measure, the algorithm that does more flops can be much faster.
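For readers outside Matlab, Tim's pattern-based count is easy to reproduce. Here is a minimal Python/NumPy sketch of the same formula, using a hand-rolled LU without pivoting (the helper names are mine; the example matrix is diagonally dominant, so skipping pivoting is safe):

```python
import numpy as np

def lu_nopivot(A):
    """Plain LU without pivoting (adequate for this diagonally
    dominant example); returns unit-lower L and upper U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        L[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k:] -= np.outer(L[k+1:, k], A[k, k:])
    return L, np.triu(A)

def true_lu_flops(L, U):
    """Tim Davis's 'true' count from the patterns of L and U:
    2*Lnz.Unz + sum(Lnz), where Lnz[k] counts off-diagonal nonzeros
    in column k of L and Unz[k] those in row k of U."""
    Lnz = (np.tril(L, -1) != 0).sum(axis=0)   # per column of L
    Unz = (np.triu(U, 1) != 0).sum(axis=1)    # per row of U
    return int(2 * Lnz @ Unz + Lnz.sum())

A = np.array([[2., 1., 1.],
              [1., 2., 1.],
              [1., 1., 2.]])
L, U = lu_nopivot(A)
print(true_lu_flops(L, U))   # 13
```

For a fully dense 3-by-3 matrix this gives 13, which matches the exact dense LU count 2n^3/3 - n^2/2 - n/6 at n = 3; for genuinely sparse factors the pattern-based count is correspondingly smaller.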
You're better off picking the algorithm with the smallest memory requirements (which is not always the smallest nnz(L+U)) and/or the fastest run time. So my vote is to either leave out the flop count, or at most return a reasonable agreed-upon estimate (like the "true" flop count for LU, above) that is somewhat independent of algorithmic details. Matrix multiply, for example, should report 2*n^3, as Cleve states in his Winter 2000 newsletter, even though "better" methods with fewer flops (Strassen's method) are available.

Tim Davis
University of Florida
davis@cise.ufl.edu
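The convention Tim proposes, charging matrix multiply 2*n^3 flops no matter how the product is actually computed, is what makes "nominal mflop rates" comparable across implementations. A small illustration (NumPy's matmul stands in here for any implementation, Strassen included):

```python
import time
import numpy as np

n = 300
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
C = A @ B
elapsed = time.perf_counter() - t0

# Agreed-upon count: 2*n^3, regardless of the algorithm actually used.
nominal_flops = 2 * n**3
print(f"{nominal_flops / elapsed / 1e6:.1f} nominal Mflop/s")
```

An implementation that performs fewer real operations (e.g. Strassen's method) simply shows up as a higher nominal rate, which is the point: the count stays fixed while the run time varies.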

Subject: Re: Flop Count Missing in Matlab R12
From: Jack Dongarra
Date: Tue, 05 Dec 2000 15:29:45 -0500

This is to clarify some points in the discussion last week by Steve Vavasis and Cleve Moler in the NA Digest about counting floating point operations. We have a project at the University of Tennessee called PAPI (http://icl.cs.utk.edu/papi/). The Performance API (PAPI) project specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors.

For years, collecting performance data on application programs has been an imprecise art. The user has had to rely on timers with poor resolution or granularity, imprecise empirical information on the number of operations performed in the program in question, vague information on the effects of the memory hierarchy, and so on. Today, hardware counters exist on every major processor platform. These counters can provide application developers valuable information about the performance of critical parts of the application and point to ways of improving performance. The current problem facing users and tool developers is that access to these counters is often poorly documented, unstable, or unavailable to the user-level program.

The focus of the PAPI project is to provide an easy-to-use, common set of interfaces that will gain access to these performance counters on all major processor platforms, thereby providing application developers the information they need to tune their software on different platforms. The goal is to make it easy for users to gain access to the counters to aid in performance analysis, modeling, and tuning.

For more details on PAPI, see http://www.netlib.org/utk/people/JackDongarra/PAPERS/papi-sc2000.pdf

Jack
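The timer-granularity problem Jack mentions is easy to see even without hardware counters. A tiny illustration (Python's clock metadata stands in here; PAPI itself is a C library, and its counter interface is not shown):

```python
import time

# Compare the advertised resolution of an ordinary wall-clock timer
# with a dedicated performance counter on this platform.
for name in ("time", "perf_counter"):
    info = time.get_clock_info(name)
    kind = "monotonic" if info.monotonic else "wall-clock"
    print(f"{name}: resolution {info.resolution} s ({kind})")
```

On many platforms the wall clock ticks far more coarsely than the performance counter, which is exactly why timing short kernels with it is "an imprecise art" and why direct hardware counts are attractive.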