Year: 2022 Volume: 30 Issue: 5 Page Range: 1758 - 1772 Text Language: English DOI: 10.55730/1300-0632.3903 Index Date: 08-12-2022

Using a static naming approach to implement remote scope promotion

Abstract:
GPUs employ simple coherence mechanisms and require explicit use of costly synchronization operations to maintain data integrity. Local-scoped synchronization can lower the performance penalty of synchronization when data is shared only within a subgroup of threads. Unfortunately, asymmetric sharing, an important dynamic sharing pattern, requires global-scoped synchronization because remote sharers may also access the data. Remote Scope Promotion (RSP) was introduced to exploit local-scoped synchronization for regular accesses while applying scope promotion to occasional remote accesses. The first implementation of RSP takes a simple approach that performs costly cache operations on all L1 data caches to implement scope promotion; as a result, it performs poorly on large-scale GPU systems. We present nRSP, which uses a static naming mechanism to identify the regularly accessing agent in asymmetric sharing and therefore avoids applying costly coherence actions to every L1 data cache during scope promotion. We evaluate nRSP with the timing-detailed Gem5-APU simulator, modeling a GPU system with 128 Compute Units, and show that nRSP greatly lowers remote synchronization overhead and considerably improves performance. On average, nRSP provides a speedup of around 28% on the 128-Compute-Unit GPU device.
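The cost argument above can be illustrated with a toy model that only counts L1 coherence actions triggered by promoted (remote-scope) synchronizations in a work-stealing trace. This is a sketch under stated assumptions, not the paper's protocol. The names `OWNER_OF_QUEUE`, `broadcast_targets`, `named_targets` and `count_l1_actions` are hypothetical. It assumes queue q is regularly accessed by compute unit q, that owner accesses use local scope and trigger no cross-CU cache actions, and that a stealing CU's own L1 is touched in both schemes. No timing is modeled, so the sketch says nothing about the reported 28% speedup.

```python
from collections import Counter
from typing import Callable, Iterable, Set, Tuple

NUM_CUS = 128

# Hypothetical static naming table: queue q is regularly accessed
# (pushed/popped with local-scoped synchronization) by compute unit q.
OWNER_OF_QUEUE = {q: q for q in range(NUM_CUS)}

TargetFn = Callable[[int, int, int], Set[int]]


def broadcast_targets(requester: int, owner: int, num_cus: int) -> Set[int]:
    """Model of the first RSP implementation: a promoted (remote-scope)
    synchronization applies coherence actions to every L1 data cache."""
    return set(range(num_cus))


def named_targets(requester: int, owner: int, num_cus: int) -> Set[int]:
    """Model of static naming (assumption): only the named owner's L1 and
    the requester's own L1 receive coherence actions."""
    return {owner, requester}


def count_l1_actions(trace: Iterable[Tuple[str, int, int]],
                     targets: TargetFn,
                     num_cus: int = NUM_CUS) -> Counter:
    """Count coherence actions per L1 cache for a trace of
    (kind, cu, queue) events, where kind is "local" or "remote"."""
    actions: Counter = Counter()
    for kind, cu, queue in trace:
        if kind == "local":
            # Owner accesses use local scope: no cross-CU cache actions.
            continue
        owner = OWNER_OF_QUEUE[queue]
        for target in targets(cu, owner, num_cus):
            actions[target] += 1
    return actions


if __name__ == "__main__":
    trace = [
        ("local", 17, 17),   # CU 17 pops from its own queue
        ("remote", 5, 17),   # CU 5 steals from queue 17
        ("remote", 40, 3),   # CU 40 steals from queue 3
        ("local", 3, 3),     # CU 3 pushes to its own queue
        ("remote", 5, 99),   # CU 5 steals from queue 99
    ]
    naive = count_l1_actions(trace, broadcast_targets)
    named = count_l1_actions(trace, named_targets)
    print("broadcast:", sum(naive.values()))  # 3 steals x 128 caches = 384
    print("named:", sum(named.values()))      # 3 steals x 2 caches = 6
```

In this model the broadcast cost per steal grows linearly with the number of Compute Units, while the named cost stays constant. This mirrors the abstract's explanation of why the first RSP implementation scales poorly. Real costs also depend on how many dirty lines each flush writes back, which a simple count ignores.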
Keywords: Asymmetric synchronization, GPUs, remote scope promotion, work-stealing

Document Type: Article  Article Type: Research Article  Access Type: Open Access
APA Yilmazer, A. (2022). Using a static naming approach to implement remote scope promotion. Turkish Journal of Electrical Engineering and Computer Sciences, 30(5), 1758-1772. 10.55730/1300-0632.3903
Chicago Yilmazer, Ayse. "Using a static naming approach to implement remote scope promotion." Turkish Journal of Electrical Engineering and Computer Sciences 30, no. 5 (2022): 1758-1772. 10.55730/1300-0632.3903
MLA Yilmazer, Ayse. "Using a static naming approach to implement remote scope promotion." Turkish Journal of Electrical Engineering and Computer Sciences, vol. 30, no. 5, 2022, pp. 1758-1772. 10.55730/1300-0632.3903
AMA Yilmazer A. Using a static naming approach to implement remote scope promotion. Turkish Journal of Electrical Engineering and Computer Sciences. 2022;30(5):1758-1772. 10.55730/1300-0632.3903
Vancouver Yilmazer A. Using a static naming approach to implement remote scope promotion. Turkish Journal of Electrical Engineering and Computer Sciences. 2022;30(5):1758-1772. 10.55730/1300-0632.3903
IEEE A. Yilmazer, "Using a static naming approach to implement remote scope promotion," Turkish Journal of Electrical Engineering and Computer Sciences, vol. 30, no. 5, pp. 1758-1772, 2022. 10.55730/1300-0632.3903
ISNAD Yilmazer, Ayse. "Using a static naming approach to implement remote scope promotion". Turkish Journal of Electrical Engineering and Computer Sciences 30/5 (2022), 1758-1772. https://doi.org/10.55730/1300-0632.3903