UM E-Theses Collection (澳門大學電子學位論文庫)


Modeling the performance of many-core programs on GPUs with advanced features

English Abstract

Many-core graphics processing units (GPUs) are increasingly used as accelerators in scientific and parallel computing, but optimizing the performance of general-purpose GPU (GPGPU) programs remains a major challenge for developers. This paper presents a new analytical performance model that simulates all the main features of advanced GPU micro-architectures and evaluates the performance impact of those features for specific GPGPU programs. The proposed model can be used to estimate overall execution time and to find potential bottlenecks in GPGPU applications. Developers can analyze the major GPU features because the latency of every GPU component can be calculated using a tool based on the proposed model. Two latency relationships are provided to integrate overlapping component latencies. The latencies of global memory, shared memory, and cache accesses are also analyzed in the proposed model. Two approaches are provided to calculate the number of global memory transactions: one (the pattern method) is very simple and efficient, and the other (the traversal method) can be used in all cases. Moreover, two corresponding approaches are designed to estimate the number of shared memory accesses (bank conflicts). In addition, a configurable trace-driven simulation is proposed for GPU caches so as to estimate the cache miss ratios and the latency
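As a rough illustration of the traversal idea named in the abstract (not the thesis's actual implementation), the sketch below counts global memory transactions for a single warp by visiting each thread's byte address and collecting the distinct aligned memory segments touched. The 128-byte segment size, the 32-thread warp, the 4-byte access width, and the function name `count_transactions` are all assumptions chosen for illustration.

```python
def count_transactions(addresses, access_size=4, segment_size=128):
    """Count distinct aligned segments touched by one warp's accesses.

    addresses: byte address accessed by each thread in the warp.
    access_size and segment_size are illustrative assumptions, not
    values taken from the thesis.
    """
    segments = set()
    for addr in addresses:
        # An access may straddle a segment boundary, so cover both ends.
        first = addr // segment_size
        last = (addr + access_size - 1) // segment_size
        segments.update(range(first, last + 1))
    return len(segments)


WARP_SIZE = 32  # assumed warp width

# Unit-stride 4-byte accesses: the whole warp falls in one 128-byte segment.
coalesced = [4 * t for t in range(WARP_SIZE)]
# Stride of 32 floats (128 bytes): every thread lands in its own segment.
strided = [4 * 32 * t for t in range(WARP_SIZE)]

print(count_transactions(coalesced))  # 1
print(count_transactions(strided))    # 32
```

Visiting every address is slower than recognizing a fixed access pattern, but it makes no assumption about the stride, which is consistent with the abstract's remark that the traversal method can be used in all cases.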

Issue date



Pei, Mo Mo


Faculty of Science and Technology




Software Engineering -- Department of Computer and Information Science

Rendering (Computer graphics)

Graphics processing units -- Programming

Computer graphics.

Real-time data processing

Image processing -- Digital techniques


Xu, Qi Wen

1/F Zone C