GalliumCompute

Accelerated OpenCL using Gallium3D

Summary

The goal of the project is to address the architectural issues standing in the way of an open accelerated implementation of OpenCL, placing as much of the burden as possible on the common Gallium3D infrastructure, and, at the same time, to get most of the relevant low-level changes done in the Nouveau driver. The project was carried out by Francisco Jerez as part of the EVoC program, building on earlier work in the same direction by Zack Rusin and Denis Steckelmacher.

Proposed schedule

Driver-specific changes in the Nouveau driver (17 Oct 2011 - 21 Nov 2011)

At the end of this period the Nouveau driver will be able to run compute grids of arbitrary machine code reliably on the hardware in question.

  1. Implement compute kernel set-up and execution on the nv50 platform, using the "compute" object of the graphics engine. I can only test it on cards with the "0x85c0" variant of this object (nvA3 and up), but I don't rule out adding support for earlier or newer generations if I find volunteers willing to send traces and test patches on different hardware.
  2. Extend the nv50 compiler to support the peculiarities of compute programs, including the defined TGSI language extensions.
  3. Extend the Nouveau DRM interface to sidestep the memory addressing limitations of compute shaders on that hardware generation.

Extend the TGSI IR and the Gallium API for GPGPU (21 Nov 2011 - 26 Dec 2011)

At the end of this period all the mentioned Gallium API and TGSI language changes will be ready and the code will be usable as a sort of non-standard computing library.

  1. Entry points for binding a TGSI compute shader and executing the contained compute kernels.
  2. Entry points for defining the input parameters of a compute program, the grid domain and thread group layout, and the required heap size for the various address spaces.
  3. Entry points for the mapping of arbitrary buffer objects into the CS addressable global memory space.
  4. Extend the TGSI language with instructions for random memory access within the usual address spaces present in compute programs, and make provisions for texture write-back.
  5. Expose some compute-specific execution parameters, like grid layout and thread coordinates, to the compute program using TGSI system values.
  6. Extend the TGSI language with cross-thread synchronization primitives, barriers and atomic operations.
  7. Entry points for assigning buffer objects, textures and samplers to a set of binding points defined by the shader - not necessarily the compute shader, in the spirit of Direct3D 11.
  8. The main design principles would be those of the OpenCL API, but the idiosyncrasies of other similar APIs like DirectCompute, AMD FireStream and CUDA would be taken into account (e.g. the buffer management peculiarities of the latter make it difficult to write a clean implementation of it in terms of OpenCL; the proposed API should address this problem).
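
As a rough illustration of how the entry points in items 1 to 3 and the system values in item 5 could fit together, here is a minimal CPU-side sketch. Every name in it (`sketch_context`, `sketch_bind_compute_state`, `sketch_set_global_binding`, `sketch_launch_grid`, `sketch_sysvals`) is hypothetical and chosen for this example only; it is not the Gallium interface, and the "kernel" is a plain C function standing in for a TGSI compute program. The grid is executed sequentially, which is enough to show how a thread derives its global index from the group and thread coordinates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL names throughout: this is not the real Gallium interface. */

/* Values a kernel would read through TGSI system values. */
struct sketch_sysvals {
    uint32_t grid_size[3];   /* thread groups per dimension */
    uint32_t block_size[3];  /* threads per group, per dimension */
    uint32_t block_id[3];    /* coordinates of the current group */
    uint32_t thread_id[3];   /* coordinates of the thread in its group */
};

typedef void (*sketch_kernel)(const struct sketch_sysvals *sv,
                              void *global_mem, const void *input);

struct sketch_context {
    sketch_kernel kernel;  /* currently bound compute program */
    void *global_mem;      /* buffer mapped into the global address space */
};

static void sketch_bind_compute_state(struct sketch_context *ctx, sketch_kernel k)
{
    ctx->kernel = k;
}

static void sketch_set_global_binding(struct sketch_context *ctx, void *buf)
{
    ctx->global_mem = buf;
}

/* Split a flat index into x/y/z coordinates, x varying fastest. */
static void sketch_unflatten(uint32_t idx, const uint32_t dims[3], uint32_t out[3])
{
    out[0] = idx % dims[0];
    out[1] = (idx / dims[0]) % dims[1];
    out[2] = idx / (dims[0] * dims[1]);
}

/* Run every thread of the grid, one after another, on the CPU. */
static void sketch_launch_grid(struct sketch_context *ctx,
                               const uint32_t block[3], const uint32_t grid[3],
                               const void *input)
{
    struct sketch_sysvals sv;
    uint32_t n_blocks = grid[0] * grid[1] * grid[2];
    uint32_t n_threads = block[0] * block[1] * block[2];

    assert(ctx->kernel != NULL);
    for (int i = 0; i < 3; i++) {
        sv.grid_size[i] = grid[i];
        sv.block_size[i] = block[i];
    }
    for (uint32_t b = 0; b < n_blocks; b++) {
        sketch_unflatten(b, grid, sv.block_id);
        for (uint32_t t = 0; t < n_threads; t++) {
            sketch_unflatten(t, block, sv.thread_id);
            ctx->kernel(&sv, ctx->global_mem, input);
        }
    }
}

/* Example kernel: each thread writes its global x index times a factor. */
static void sketch_scale_kernel(const struct sketch_sysvals *sv,
                                void *global_mem, const void *input)
{
    uint32_t gid = sv->block_id[0] * sv->block_size[0] + sv->thread_id[0];
    uint32_t factor = *(const uint32_t *)input;

    ((uint32_t *)global_mem)[gid] = gid * factor;
}

/* Usage: 2 groups of 4 threads fill an 8-element buffer. */
static void sketch_demo(uint32_t out[8])
{
    struct sketch_context ctx = { NULL, NULL };
    const uint32_t block[3] = { 4, 1, 1 };
    const uint32_t grid[3] = { 2, 1, 1 };
    const uint32_t factor = 3;

    sketch_bind_compute_state(&ctx, sketch_scale_kernel);
    sketch_set_global_binding(&ctx, out);
    sketch_launch_grid(&ctx, block, grid, &factor);
}
```

Passing the input parameters as an opaque blob and the buffer as a separate global binding mirrors the split between items 2 and 3 above; a real driver would of course dispatch the grid to the hardware rather than loop over it.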

Reshape the clover library into a Gallium state tracker (26 Dec 2011 - 23 Jan 2012)

At the end of this period all the mentioned OpenCL APIs will be functional (modulo bugs and lack of documentation to be addressed in the next point), assuming the library is provided with TGSI bytecode as input instead of C source code.

  1. Implement context, queue, buffer, texture and sampler management on top of Gallium3D.
  2. Implement accelerated memory and image transfer operations on top of Gallium3D.
  3. Implement compute kernel execution and parameter passing via Gallium3D.
  4. Implement OpenCL events in terms of Gallium fences, and describe any synchronization requirements the pipe driver must meet for multi-context and multi-device synchronization to work correctly.
  5. Implement enumeration of and binding to the available physical hardware devices.
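
To make item 4 a little more concrete, the following sketch shows one way an event's execution status could be derived from a fence. The `sketch_fence` and `sketch_event` types and the helper functions are hypothetical stand-ins invented for this example, not clover or Gallium code; only the numeric status values match those the OpenCL specification assigns to `CL_COMPLETE`, `CL_RUNNING`, `CL_SUBMITTED` and `CL_QUEUED`. The sketch assumes a fence can only report "signalled or not", so it never reports the running state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Execution status values, numerically equal to the OpenCL constants. */
enum {
    SKETCH_CL_COMPLETE  = 0,
    SKETCH_CL_RUNNING   = 1,  /* unused: a plain fence cannot report it */
    SKETCH_CL_SUBMITTED = 2,
    SKETCH_CL_QUEUED    = 3
};

/* HYPOTHETICAL stand-in for a driver fence object. */
struct sketch_fence {
    bool signalled;
};

/* HYPOTHETICAL event: wraps a fence once its command reaches the driver. */
struct sketch_event {
    struct sketch_fence *fence;      /* NULL until the command is submitted */
    const struct sketch_event *dep;  /* optional event this one waits on */
};

/* Derive the OpenCL-style status from the state of the fence. */
static int sketch_event_status(const struct sketch_event *ev)
{
    if (ev->fence == NULL)
        return SKETCH_CL_QUEUED;
    return ev->fence->signalled ? SKETCH_CL_COMPLETE : SKETCH_CL_SUBMITTED;
}

/* An event may be submitted once its dependency (if any) has completed. */
static bool sketch_event_ready(const struct sketch_event *ev)
{
    return ev->dep == NULL ||
           sketch_event_status(ev->dep) == SKETCH_CL_COMPLETE;
}

/* Attach the fence returned by the (imaginary) driver on submission. */
static void sketch_event_submit(struct sketch_event *ev, struct sketch_fence *f)
{
    assert(sketch_event_ready(ev));
    ev->fence = f;
}
```

In this model an event `b` that depends on event `a` stays queued until the fence behind `a` signals. Waiting on the host for a dependency is the simplest possible policy and is used here only to keep the example short.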

Misc. changes, final clean-up and documentation (23 Jan 2012 - 6 Feb 2012)

Work done so far

The device-independent part of this project (i.e. the OpenCL state tracker and the remaining Gallium support changes) has already been included in Mesa master. Most of the driver-specific work done so far can be found in a git repository: https://github.com/curro/mesa/commits/master

Driver-specific changes in the Nouveau driver (17 Oct 2011 - 21 Nov 2011)

Extend the TGSI intermediate representation and the Gallium API to allow for general-purpose computing (21 Nov 2011 - 26 Dec 2011)

Reshape the clover library into a Gallium state tracker (26 Dec 2011 - 23 Jan 2012)