Decreasing the Miss Rate and Eliminating the Performance Penalty of a Data Filter Cache
While data filter caches (DFCs) have been shown to be effective at reducing data access energy, they have not been adopted in processors due to the performance penalty caused by high DFC miss rates. In this paper, we present a design that both decreases the DFC miss rate and completely eliminates the DFC performance penalty, even for a level-one data cache (L1 DC) with a single-cycle access time. First, we show that a DFC that lazily fills each word in a DFC line from an L1 DC only when the word is referenced is more energy efficient than one that eagerly fills the entire DFC line. For a 512B DFC, we are able to eliminate loads of words into the DFC that are never referenced before being evicted, which occurred for about 75% of the words in 32B lines. Second, we demonstrate that a lazily word-filled DFC line can effectively share and pack data words from multiple L1 DC lines to lower the DFC miss rate. For a 512B DFC, we completely avoid accessing the L1 DC for loads about 23% of the time and avoid a fully associative L1 DC access for loads 50% of the time, where the DFC only requires about 2.5% of the size of the L1 DC. Finally, we present a method that completely eliminates the DFC performance penalty by speculatively performing DFC tag checks early and only accessing DFC data when a hit is guaranteed. For a 512B DFC, we improve data access energy usage for the DTLB and L1 DC by 33% with no performance degradation.
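To make the contrast between eager line fill and lazy word fill concrete, the following is a minimal toy model, not the design evaluated in the paper. It assumes a direct-mapped 512B DFC with 32B lines and 4B words (the organization and word size are assumptions for illustration), and the function name `simulate` and the address trace are hypothetical. It only counts word hits and words copied from the L1 DC; it does not model energy, line sharing, or the early tag check.

```python
LINE_BYTES = 32                               # DFC line size from the abstract
WORD_BYTES = 4                                # assumed word size
DFC_BYTES = 512                               # DFC capacity from the abstract
NUM_LINES = DFC_BYTES // LINE_BYTES           # 16 lines
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES     # 8 words per line


def simulate(addresses, lazy):
    """Return (word_hits, words_filled) for a toy direct-mapped DFC.

    lazy=False: a line miss copies all words of the line from the L1 DC.
    lazy=True:  only the referenced word is copied, tracked by per-word
                valid bits; other words are filled on their first reference.
    """
    tags = [None] * NUM_LINES
    valid = [[False] * WORDS_PER_LINE for _ in range(NUM_LINES)]
    hits = 0
    filled = 0
    for addr in addresses:
        line_addr = addr // LINE_BYTES
        index = line_addr % NUM_LINES
        word = (addr % LINE_BYTES) // WORD_BYTES
        if tags[index] != line_addr:
            # Line miss: replace the resident line and clear its valid bits.
            tags[index] = line_addr
            valid[index] = [False] * WORDS_PER_LINE
            if not lazy:
                # Eager fill: bring in the whole line, referenced or not.
                valid[index] = [True] * WORDS_PER_LINE
                filled += WORDS_PER_LINE
                continue
        if valid[index][word]:
            hits += 1
        else:
            # Lazy fill: copy just this word from the L1 DC.
            valid[index][word] = True
            filled += 1
    return hits, filled


if __name__ == "__main__":
    # Hypothetical trace: addresses 0 and 512 map to the same DFC line index.
    trace = [0, 0, 4, 512, 0]
    print("eager:", simulate(trace, lazy=False))
    print("lazy: ", simulate(trace, lazy=True))
```

On this small trace the eager model fills 24 words to serve 5 references, while the lazy model fills 4, at the cost of one extra word miss. Whether that trade is favorable in practice depends on the relative energy of DFC fills and L1 DC accesses, which this sketch does not capture.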
David Whalley received his PhD in CS from the University of Virginia in 1990. He is the E.P. Miles professor in the Computer Science Department at Florida State University, a Distinguished Member of the ACM, an IEEE Fellow, and a Fulbright Distinguished Chair Award recipient. His research interests include low-level compiler optimizations, tools for supporting the development and maintenance of compilers, computer architecture, and embedded systems. He has developed compiler optimizations and architectural features to improve performance, decrease code size, and reduce energy usage. More information about his background and research can be found on his home page, http://www.cs.fsu.edu/~whalley.