• English
    • français
    • Deutsch
    • español
    • português (Brasil)
    • Bahasa Indonesia
    • русский
    • العربية
    • 中文
  • English 
    • English
    • français
    • Deutsch
    • español
    • português (Brasil)
    • Bahasa Indonesia
    • русский
    • العربية
    • 中文
  • Login
View Item 
  •   Home
  • OAI Data Pool
  • OAI Harvested Content
  • View Item
  •   Home
  • OAI Data Pool
  • OAI Harvested Content
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Browse

All of the LibraryCommunitiesPublication DateTitlesSubjectsAuthorsThis CollectionPublication DateTitlesSubjectsAuthorsProfilesView

My Account

Login

The Library

AboutNew SubmissionSubmission GuideSearch GuideRepository PolicyContact

Statistics

Most Popular ItemsStatistics by CountryMost Popular Authors

Efficient Cache Organization For Application Specific And General Purpose Processors

  • CSV
  • RefMan
  • EndNote
  • BibTex
  • RefWorks
Author(s)
Rajan, Kaushik
Contributor(s)
Govindarajan, R
Keywords
Internet Cache Organization (Computer Science)
Internet Microprocessors
Processor Architectures
Packet Forwarding Engines (Routers)
General Purpose Processors
Heterogeneously Segmented Cache Architecture
Shephepherd Cache
Cache Architecture
Computer Science

Full record
Show full item record
URI
http://hdl.handle.net/20.500.12424/2581247
Online Access
http://hdl.handle.net/2005/838
Abstract
The performance gap between processor and memory continues to remain a major performance bottleneck in both application specific and general purpose processors. This thesis strives to ease the above bottleneck by exploiting the characteristics of the application domain to improve the cache organization for two distinct processor architectures: (1) application specific processors for packet forwarding, (2) general purpose processors. Packet forwarding algorithms make use of a trie data structure to determine the forwarding route. We observe that the locality characteristics of the nodes at various levels of such a trie are different. Nodes that are closer to the root node, especially those that are immediate children of the root node (level-one nodes), exhibit higher temporal locality than nodes lower down the trie. Based on this observation we propose a novel Heterogeneously Segmented Cache Architecture (HSCA) that uses separate caches for level-one and lower-level nodes, each with carefully chosen sizes. We also propose a new replacement policy to enhance the performance of HSCA. Performance evaluation indicates that HSCA results in up to 32% reduction in average memory access time over a unified cache that shares the same cache space among all levels of the trie. HSCA also outperforms a previously proposed results cache. The use of a large root branching factor in a forwarding trie forcefully introduces a large number of nodes at level-one. Among these, only nodes that cover prefixes from the routing table are useful while the rest, are superfluous. We find that as many as 75% of the level-one nodes are superfluous. This leads to a skewed distribution of useful nodes among the cache sets of the level-one nodes cache. We propose a novel two-level mapping framework that achieves a better nodes to cache set mapping and hence incurs fewer conflict misses. Two-level mapping first aggregates nodes into Initial Partitions (IPs) using lower order bits and then remaps them from IPs into Refined Partitions (RPs), that form sets, based on some higher order bits. It provides flexibility in placement by allowing each IP to choose a different remap function. We propose three schemes conforming to the framework. A speedup in average memory access time of as much as 16% is gained over HSCA. In general purpose processor architectures, the design objectives of caches at various levels of the hierarchy are different. To ensure low access latencies, L1 caches are small and have low associativities, making them more susceptible to conflict misses. The extent of conflict misses incurred is governed by the placement function and the memory access patterns exhibited by the program. We propose a mechanism to learn the access characteristics of the program at runtime by analyzing the repetitive phases of program. We then make use of the two-level mapping framework to dynamically adapt the placement function. Further, we elegantly incorporate two-level mapping into the cache organization without increasing the cache access latency. Performance evaluation reveals that the proposed adaptive placement mechanism eliminates 32—36% of misses on average over a range of cache sizes. To prevent expensive off-chip accesses, L2 caches are larger and have higher associativities. Hence, the replacement policy plays a significant role in determining L2 cache performance. Further, as the inherent temporal locality in memory accesses is filtered out by the L1 cache, an L2 cache using the widely prevalent LRU replacement policy incurs significantly higher misses than the optimal replacement policy (OPT). We propose to bridge this gap through a novel replacement strategy that mimics the replacement decisions of OPT. The L2 cache is logically divided into two components, a Shepherd Cache (SC) with a simple FIFO replacement and a Main Cache (MC) with an emulation of optimal replacement. The SC plays the dual role of caching lines and shepherding the replacement decisions close to optimal for MC. Our proposed organization can cover 40% of the gap between LRU and OPT, resulting in 7% overall speedup.
Date
2010-08-25
Type
Thesis
Identifier
oai:etd.ncsi.iisc.ernet.in:2005/838
http://hdl.handle.net/2005/838
Collections
OAI Harvested Content

entitlement

 
DSpace software (copyright © 2002 - 2022)  DuraSpace
Quick Guide | Contact Us
Open Repository is a service operated by 
Atmire NV
 

Export search results

The export option will allow you to export the current search results of the entered query to a file. Different formats are available for download. To export the items, click on the button corresponding with the preferred download format.

By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.

To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export. The amount of items that can be exported at once is similarly restricted as the full export.

After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.