Achieving high-performance GPU kernels requires optimizing algorithm implementations to the targeted GPU architecture. It is of utmost importance to fully use the compute and memory hierarchy, as well as available specialised hardware... (read more)
PDF