PciReworkProposal

PCI Subsystem Rework for Xorg v.next Proposal

This wiki entry details a proposal for reworking the PCI handling code in Xorg for the 7.1 release.

Background

It is fairly common knowledge in the X.org developer community that the PCI handling code in Xorg 7.0 (and earlier) is a big, ugly mess. The code is complex, understood by very few developers, and does things that only the kernel should do. In fact, most of the existing code originates from a time before most kernels implemented the required functionality. Since that time, kernels have greatly expanded the functionality provided to user space for probing and accessing PCI devices.

The PCI bus has also changed (e.g., multiple PCI domains, complex PCI-PCI bridges, AGP, and PCI-Express), and the code that probes and accesses PCI devices has had to change with it. Unfortunately, these changes tend to be platform specific. Certain features, such as multiple domains, are only supported on certain platforms by X.org. These features tend to be supported universally by the kernels on those platforms.

Rather than duplicating the efforts of kernel developers, X.org needs to use the interfaces provided by the kernel as much as possible. It is currently unclear whether X.org still needs to support any platforms with kernels that do not export this functionality to user mode.

Required Functionality

There are seven broad pieces of functionality that X.org needs in order to work with devices on the PCI bus. These are:

  1. Getting a list of all devices matching some criteria. The criteria are typically either the device class (e.g., find all devices of the class "display" or "multimedia") or the device vendor.
  2. Reading the device's expansion ROM.
  3. Accessing the device's IO ports.
  4. Accessing the device's memory regions (BARs).
  5. Reading the device's capabilities (e.g., determining whether the device is AGP).
  6. Power management.
  7. VgaArbiter. Ultimately this should be in the kernel, but it currently is not. Devices that decode legacy VGA IO and MEM need to be identified, and their IO/MEM enable bits need to be toggled (along with the VGA forwarding enables on any bridges on the path to the devices) to prevent multiple devices from decoding legacy accesses. Drivers need to disable legacy decoding completely on their hardware, which most modern cards can do, and a driver must be able to inform the arbiter of that fact so that the card can be taken out of the picture. This must be done carefully, since bad things will happen if the card generates an interrupt while the arbiter has disabled MEM decoding on it. The arbiter needs to either forbid cards from using interrupts while they are set to decode legacy space (and thus can be disabled at any time) or provide a driver callback for disabling IRQ emission on a given card when the arbiter disables it. IO and MEM cannot be treated separately, since there is only one VGA forward bit on PCI-to-PCI bridges.

In the best case scenario, nearly all of this functionality is trivially provided by Linux's sysfs interface. A certain amount of this functionality is also provided by the libpci.a library in the pciutils package. The missing functionality and license issues (libpci.a is GPL) prevent use of libpci.a from being a viable option.

Proposed Implementation

The current proposal is to write a new library that implements the required functionality in a generic way. The existing code for accessing PCI devices would then be removed, in its entirety, from the X-server and the drivers. At that point the X-server and drivers would be ported to the new interface. The remainder of this section describes the interface to the proposed new library.

The interface consists of a set of initialization / cleanup routines and a single primary data type. The overall structure is intentionally similar to that of libpci.a, but there are some significant differences. The initialization / cleanup routines are roughly analogous to pci_access. pci_device is roughly analogous to pci_dev.

Initialization / Cleanup

Access to the PCI system is obtained by calling pci_system_init. This function returns either zero on success or an errno value on failure. It initializes global data that is private to the library.

When access to the PCI system is no longer needed, pci_system_cleanup is called. This destroys all of the internal data used by the library and all of the structures created by the library for the application. That is, all pci_device, pci_device_iterator, pci_agp_info, and similar structures are destroyed by calling pci_system_cleanup.

Device Iteration

Lists of PCI devices are obtained by creating a pci_device_iterator structure. This structure is created by calling either pci_slot_match_iterator_create or pci_id_match_iterator_create.

struct pci_slot_match {
    /*
     * Device slot matching controls
     *
     * Control the search based on the domain, bus, slot, and function of
     * the device.  Setting any of these fields to PCI_MATCH_ANY will cause
     * the field to not be used in the comparison.
     */
    uint32_t    domain;
    uint32_t    bus;
    uint32_t    dev;
    uint32_t    func;

    intptr_t    match_data;
};

struct pci_device_iterator *pci_slot_match_iterator_create(const struct pci_slot_match *match);

struct pci_id_match {
    /*
     * Device / vendor matching controls
     *
     * Control the search based on the device, vendor, subdevice, or subvendor
     * IDs.  Setting any of these fields to PCI_MATCH_ANY will cause the
     * field to not be used in the comparison.
     */
    uint32_t    vendor_id;
    uint32_t    device_id;
    uint32_t    subvendor_id;
    uint32_t    subdevice_id;

    /*
     * Device class matching controls
     */
    uint32_t    device_class;
    uint32_t    device_class_mask;

    intptr_t    match_data;
};

struct pci_device_iterator *pci_id_match_iterator_create(const struct pci_id_match *match);

This allows devices to be iterated by bus location, by vendor, by class, or by function. These interfaces roughly match similar interfaces available within the Linux kernel.

If the match parameter to either function is NULL, all devices will be matched.
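
The matching rules described above can be sketched as a simple predicate. This is an illustrative reading, not the library's code: the value chosen for PCI_MATCH_ANY, the helper names field_matches and id_match_accepts, and the masked comparison used for the class fields (a 24-bit class code compared under device_class_mask) are all assumptions, since the proposal does not spell them out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed wildcard value; the proposal names PCI_MATCH_ANY but does not
 * define it. */
#define PCI_MATCH_ANY (~0U)

struct pci_id_match {
    uint32_t    vendor_id;
    uint32_t    device_id;
    uint32_t    subvendor_id;
    uint32_t    subdevice_id;
    uint32_t    device_class;
    uint32_t    device_class_mask;
    intptr_t    match_data;
};

/* Hypothetical helper: a field matches if it is the wildcard or equal. */
static bool field_matches(uint32_t want, uint32_t have)
{
    return want == PCI_MATCH_ANY || want == have;
}

/* Hypothetical helper: would a device with these IDs and this class code
 * be selected by the given match structure? */
static bool id_match_accepts(const struct pci_id_match *m,
                             uint16_t vendor, uint16_t device,
                             uint16_t subvendor, uint16_t subdevice,
                             uint32_t device_class)
{
    if (m == NULL)
        return true;    /* a NULL match selects every device */

    return field_matches(m->vendor_id, vendor)
        && field_matches(m->device_id, device)
        && field_matches(m->subvendor_id, subvendor)
        && field_matches(m->subdevice_id, subdevice)
        /* Assumed semantics: compare only the bits selected by the mask. */
        && (device_class & m->device_class_mask) == m->device_class;
}
```

Under these assumptions, a match with every ID field set to PCI_MATCH_ANY, device_class set to 0x030000, and device_class_mask set to 0xff0000 would select all devices whose base class is "display", regardless of vendor.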

Devices are iterated by calling pci_device_next with the pci_device_iterator. After the last device has been returned, the next call to pci_device_next will return NULL.

struct pci_device *pci_device_next(struct pci_device_iterator *iter);

When an iterator will not be used any further, it must be destroyed using pci_iterator_destroy.

void pci_iterator_destroy(struct pci_device_iterator *iter);
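
The create / next / destroy sequence above might be used as in the following sketch. So that it compiles without the proposed library, the three library functions are replaced by toy stand-ins that walk a small in-memory table; those stand-ins, the trimmed-down pci_device, the sample ID values, and the count_display_devices helper are all hypothetical. Only the calling pattern in count_display_devices is meant to mirror the text.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Trimmed-down pci_device holding only the fields this sketch touches. */
struct pci_device {
    uint16_t    vendor_id;
    uint16_t    device_id;
    uint32_t    device_class;
};

struct pci_id_match;    /* opaque here; the sketch only ever passes NULL */

/* ---- Toy stand-ins for the library (NOT the real implementation) ---- */

/* Arbitrary sample devices; the class codes are what matter here. */
static struct pci_device toy_bus[] = {
    { 0x1111, 0x0001, 0x060000 },   /* bridge */
    { 0x2222, 0x0002, 0x030000 },   /* display controller */
    { 0x3333, 0x0003, 0x020000 },   /* network controller */
};

struct pci_device_iterator {
    size_t next;    /* index of the next device to hand out */
};

static struct pci_device_iterator *
pci_id_match_iterator_create(const struct pci_id_match *match)
{
    (void) match;   /* the toy backend only supports "match everything" */
    return calloc(1, sizeof(struct pci_device_iterator));
}

static struct pci_device *pci_device_next(struct pci_device_iterator *iter)
{
    const size_t count = sizeof(toy_bus) / sizeof(toy_bus[0]);

    if (iter == NULL || iter->next >= count)
        return NULL;    /* past the last device */
    return &toy_bus[iter->next++];
}

static void pci_iterator_destroy(struct pci_device_iterator *iter)
{
    free(iter);
}

/* ---- Caller-side pattern described in the text ---- */

static unsigned count_display_devices(void)
{
    struct pci_device_iterator *iter = pci_id_match_iterator_create(NULL);
    struct pci_device *dev;
    unsigned found = 0;

    if (iter == NULL)
        return 0;

    /* pci_device_next returns NULL once every device has been returned. */
    while ((dev = pci_device_next(iter)) != NULL) {
        if ((dev->device_class >> 16) == 0x03)  /* base class 0x03: display */
            found++;
    }

    pci_iterator_destroy(iter);
    return found;
}
```

A real caller would normally pass a populated pci_id_match or pci_slot_match instead of filtering inside the loop; the filter is done by hand here only because the toy backend ignores its match argument.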

pci_device

The pci_device structure contains all of the expected fields and is very similar to libpci.a's pci_dev structure. Some fields that are important to X (e.g., subvendor_id) have been added, and some fields that are unnecessary (e.g., rom_base_addr) have been removed.

struct pci_mem_region {
    void * memory;
    pciaddr_t bus_addr;
    pciaddr_t base_addr;
    pciaddr_t size;
};

struct pci_device {
    uint16_t    domain;
    uint8_t     bus;
    uint8_t     dev;
    uint8_t     func;

    uint16_t    vendor_id;
    uint16_t    device_id;
    uint16_t    subvendor_id;
    uint16_t    subdevice_id;

    uint32_t    device_class;

    struct pci_mem_region regions[6];

    pciaddr_t   rom_size;

    int irq;

    void * user_data;
};

Once a pointer to a device has been obtained, memory regions of the device can be mapped via pci_device_map_region. Mapped regions can be unmapped with pci_device_unmap_region. Once a region is mapped, it can be accessed via the pci_mem_region::memory pointer.

int pci_device_map_region( struct pci_device * dev, unsigned region,
    int write_enable );

int pci_device_unmap_region( struct pci_device * dev, unsigned region );

ISSUE: Should special routines for reading / writing MMIO regions, à la xf86WriteMmio8, be added?
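
To make the open issue concrete, here is one way such accessors could look if they were layered on top of pci_mem_region::memory. Nothing here is part of the proposal: the region_read32 / region_write32 / region_read8 / region_write8 names are hypothetical, pciaddr_t is assumed to be a 64-bit integer, and the sketch deliberately ignores byte order and memory barriers, which a real implementation would probably need to handle per platform.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pciaddr_t;     /* assumed width; not defined in the proposal */

struct pci_mem_region {
    void     *memory;           /* valid only after the region is mapped */
    pciaddr_t bus_addr;
    pciaddr_t base_addr;
    pciaddr_t size;
};

/* Hypothetical helper: address of a byte offset inside a mapped region. */
static inline volatile void *region_ptr(const struct pci_mem_region *r,
                                        size_t offset)
{
    return (volatile uint8_t *) r->memory + offset;
}

/* volatile keeps the compiler from caching or eliding register accesses. */
static inline uint8_t region_read8(const struct pci_mem_region *r,
                                   size_t offset)
{
    return *(volatile uint8_t *) region_ptr(r, offset);
}

static inline void region_write8(const struct pci_mem_region *r,
                                 size_t offset, uint8_t value)
{
    *(volatile uint8_t *) region_ptr(r, offset) = value;
}

/* 32-bit accessors; offset is expected to be 4-byte aligned. */
static inline uint32_t region_read32(const struct pci_mem_region *r,
                                     size_t offset)
{
    return *(volatile uint32_t *) region_ptr(r, offset);
}

static inline void region_write32(const struct pci_mem_region *r,
                                  size_t offset, uint32_t value)
{
    *(volatile uint32_t *) region_ptr(r, offset) = value;
}
```

A driver would call these with &dev->regions[n] after a successful pci_device_map_region. Whether the library should provide them itself is exactly the question the issue raises.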

The device's expansion ROM is treated specially. Rather than mapping the ROM and reading it, a special function, pci_device_read_rom, is provided. The supplied buffer must be at least pci_device::rom_size bytes.

int pci_device_read_rom( struct pci_device * dev, void * buffer );
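
Once a buffer has been filled by pci_device_read_rom, a caller may want to sanity-check it before use. The sketch below is not part of the proposal; it relies on the standard PCI expansion ROM layout (a 0x55 0xAA signature at offset 0 and, at offset 0x18, a 16-bit little-endian pointer to a data structure beginning with "PCIR"). The function name rom_image_looks_valid is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: does this buffer start with a plausible PCI
 * expansion ROM image? */
static bool rom_image_looks_valid(const uint8_t *rom, size_t size)
{
    size_t pcir;

    /* Need at least the header bytes up to the data-structure pointer. */
    if (rom == NULL || size < 0x1a)
        return false;

    /* Every expansion ROM image begins with the bytes 0x55 0xAA. */
    if (rom[0] != 0x55 || rom[1] != 0xaa)
        return false;

    /* Offset 0x18 holds a little-endian pointer to the PCI data structure. */
    pcir = (size_t) rom[0x18] | ((size_t) rom[0x19] << 8);
    if (pcir > size - 4)        /* size >= 0x1a here, so no underflow */
        return false;

    /* The PCI data structure begins with the ASCII signature "PCIR". */
    return memcmp(rom + pcir, "PCIR", 4) == 0;
}
```

In a real caller the buffer would be allocated with at least pci_device::rom_size bytes, as the text requires, and that same size would be passed to the check.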

Device configuration and capability data can be accessed via traditional, libpci.a-style read and write routines.

int pci_device_cfg_read_u8   ( struct pci_device * dev, unsigned offset, uint8_t * val );
int pci_device_cfg_read_u16  ( struct pci_device * dev, unsigned offset, uint16_t * val );
int pci_device_cfg_read_u32  ( struct pci_device * dev, unsigned offset, uint32_t * val );
int pci_device_cfg_read_block( struct pci_device * dev, unsigned offset, void * val, unsigned length );
int pci_device_cfg_write_u8   ( struct pci_device * dev, unsigned offset, uint8_t val );
int pci_device_cfg_write_u16  ( struct pci_device * dev, unsigned offset, uint16_t val );
int pci_device_cfg_write_u32  ( struct pci_device * dev, unsigned offset, uint32_t val );
int pci_device_cfg_write_block( struct pci_device * dev, unsigned offset, const void * val,
    unsigned length );
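
As an illustration of what a caller might do with configuration data, the sketch below walks the standard PCI capability list in a 256-byte snapshot of configuration space, such as one that pci_device_cfg_read_block could fill in. The register offsets follow the PCI specification (status register at 0x06, capabilities pointer at 0x34, AGP capability ID 0x02); the macro names and the find_capability helper are hypothetical and not part of the proposed interface.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_STATUS       0x06   /* status register (low byte) */
#define PCI_STATUS_CAP_LIST  0x10   /* status bit 4: capability list present */
#define PCI_CFG_CAP_PTR      0x34   /* offset of the first capability */
#define PCI_CAP_ID_AGP       0x02   /* capability ID assigned to AGP */

/* Hypothetical helper: return the config-space offset of the capability
 * with the given ID, or 0 if the device does not advertise it. */
static unsigned find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
    unsigned offset;
    int guard = 48;     /* bound the walk in case the list loops */

    if ((cfg[PCI_CFG_STATUS] & PCI_STATUS_CAP_LIST) == 0)
        return 0;

    /* The low two bits of each pointer are reserved and must be masked. */
    offset = cfg[PCI_CFG_CAP_PTR] & 0xfc;

    /* Capabilities live above the 64-byte standard header. */
    while (offset >= 0x40 && guard-- > 0) {
        if (cfg[offset] == cap_id)          /* byte 0: capability ID */
            return offset;
        offset = cfg[offset + 1] & 0xfc;    /* byte 1: next pointer */
    }

    return 0;
}
```

The same walk could be written with individual pci_device_cfg_read_u8 calls instead of a snapshot; the snapshot form is used here only to keep the sketch self-contained.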

In addition, specific routines and data types exist for common capabilities that are important to X. The pci_device_get_agp_info function parses the device's configuration header and returns a fully populated pci_agp_info structure. If the device does not have an AGP capability entry, NULL is returned.

struct pci_agp_info {
    unsigned    config_offset;

    uint8_t     major_version;
    uint8_t     minor_version;

    uint8_t    rates;

    uint8_t    fast_writes:1;
    uint8_t    addr64:1;
    uint8_t    htrans:1;
    uint8_t    gart64:1;
    uint8_t    coherent:1;
    uint8_t    sideband:1;
    uint8_t    isochronus:1;

    uint8_t    async_req_size;
    uint8_t    calibration_cycle_timing;
    uint8_t    max_requests;
};

const struct pci_agp_info * pci_device_get_agp_info( struct pci_device * dev );
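
The following sketch suggests how a few of these fields might be derived from raw configuration bytes. It is a guess at the decoding, not the proposed implementation: it assumes the usual AGP capability layout (version byte at capability offset + 2, status register at offset + 4, with the rate bits in bits 2:0, fast writes in bit 4, and sideband addressing in bit 9), and the agp_summary structure and decode_agp function are hypothetical. The remaining pci_agp_info fields are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of pci_agp_info, limited to the fields decoded here. */
struct agp_summary {
    uint8_t major_version;
    uint8_t minor_version;
    uint8_t rates;          /* raw rate bits; meaning depends on AGP mode */
    bool    fast_writes;
    bool    sideband;
};

/* Assemble a 32-bit value from four little-endian bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}

/* Hypothetical helper: decode the AGP capability found at cap_offset in a
 * snapshot of the device's configuration space. */
static struct agp_summary decode_agp(const uint8_t *cfg, unsigned cap_offset)
{
    struct agp_summary out;
    const uint8_t  version = cfg[cap_offset + 2];
    const uint32_t status  = read_le32(cfg + cap_offset + 4);

    out.major_version = version >> 4;       /* high nibble */
    out.minor_version = version & 0x0f;     /* low nibble */
    out.rates         = status & 0x07;      /* status bits 2:0 */
    out.fast_writes   = (status & 0x010) != 0;  /* status bit 4 */
    out.sideband      = (status & 0x200) != 0;  /* status bit 9 */
    return out;
}
```

The value of cap_offset would come from walking the device's capability list; pci_agp_info::config_offset presumably records the same offset.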

In the future, similar routines may be added for other common device capabilities (e.g., power management, PCI-Express, etc.).

Status

Core X-server Status

The core X-server portion of the PCI-rework is, essentially, finished. It lives in the pci-rework branch.

libpciaccess Status

OS       Status              Point of Contact
Linux    Working with sysfs  idr
FreeBSD  Working (7.x+)      anholt
NetBSD   ?                   -
OpenBSD  Working             herrb
Solaris  Working             edward.shu@sun.com
AIX      ?                   -

Driver Status

Driver         Status             Point of Contact
apm            Not ported         -
ark            Not ported         -
ast            Not ported         -
ati/ati        Bugzilla           fufutos
ati/atimisc    Bugzilla           fufutos
ati/r128       Not ported         -
ati/radeon     Not ported         -
chips          Not ported         -
cirrus         Not ported         -
cyrix          Not ported         -
dummy          Not ported         -
fbdev          Trunk              idr
glide          Not ported         -
glint          Not ported         -
i128           Not ported         -
i740           Not ported         -
impact         Not ported         -
imstt          Not ported         -
intel          Not ported         -
mga            Trunk              idr
neomagic       Not ported         -
newport        Not ported         -
nsc            Not ported         -
nv             Not ported         -
rendition      Trunk              idr
s3             Not ported         -
s3virge        Not ported         -
savage         pci-rework branch  idr
siliconmotion  Not ported         -
sis            Not ported         -
sisusb         Not ported         -
sunbw2         Not ported         -
suncg14        Not ported         -
suncg3         Not ported         -
suncg6         Not ported         -
sunffb         Not ported         -
sunleo         Not ported         -
suntcx         Not ported         -
tdfx           In progress        idr
tga            Not ported         -
trident        Not ported         -
tseng          Not ported         -
v4l            Not ported         -
vesa           Trunk              idr
vga            Not ported         -
via            Not ported         -
vmware         Not ported         -
voodoo         Not ported         -
wsfb           Not ported         -
xgi            Not ported         -
xgixp          Not ported         -

Status key: