www.WebConneXXion.com | It's all about RAID
With the launch of Intel's 900-series chipsets and the recent return of SLI to
the video card scene, PCI Express has finally arrived on the PC enthusiast scene
in a big way. PCI Express-enabled motherboards are going to start becoming more
and more common, and with the new bus's increasing ubiquity will come the
inevitable confusion that accompanies the rise of any new technology, especially
one as complex and feature-rich as PCI Express. In this article, we'll take a
detailed look at the features of PCI Express, including what it is, what it isn't, and
how it improves on the venerable interconnect scheme that we've all come to know
and curse: PCI.
Basic PC system architecture
No doubt most Ars readers are
familiar with the basic layout of a PC system, but it's worthwhile to do a brief
recap in order to set the stage for the discussion that follows.
Logically, an average PCI system
is laid out in something like the following manner:
Figure 1: PCI system layout
The core logic chipset acts as a
switch or router, and routes I/O traffic among the different devices that make
up the system.
In reality, the core logic
chipset is split into two parts: the northbridge and the southbridge (or I/O
bridge). This split is there for a couple of reasons, the most important of
which is the fact that there are three types of devices that naturally work very
closely together, and so they need to have faster access to each other: the CPU,
the main memory, and the video card. In a modern system, the video card's GPU is
functionally a second (or third) CPU, so it needs to share privileged access to
main memory with the CPU(s). As a result, these three devices are all clustered
together off of the northbridge.
The northbridge is tied to a
secondary bridge, the southbridge, which routes traffic from the different I/O
devices on the system: the hard drives, USB ports, Ethernet ports, etc. The
traffic from these devices is routed through the southbridge to the northbridge
and then on to the CPU and/or memory.
Figure 2: northbridge and southbridge
As is evident from the diagram
above, the PCI bus is attached to the southbridge. This bus is usually the
oldest, slowest bus in a modern system, and is the one most in need of an
upgrade.
For now, the main thing
that you should take away from the previous diagram is that the modern PC is a
motley collection of specialized buses of different protocols and bandwidth
capabilities. This mix of specialized buses designed to attach different types
of hardware directly to the southbridge is something of a continuously evolving
hack that has been gradually and collectively engineered by the PC industry as
it tries to get around the limitations of the aging PCI bus. Because the PCI bus
can't really cut it for things like Serial ATA, Firewire, etc., the trend has
been to attach interfaces for both internal and external I/O directly to the
southbridge. So today's southbridge is sort of the Swiss Army Knife of I/O
switches, and thanks to
Moore's Curves it has
been able to keep adding functionality in the form of new interfaces that keep
bandwidth-hungry devices from starving on the PCI bus.
In an ideal world, there would be
one primary type of bus and one bus protocol that connects all of these
different I/O devices (including the video card/GPU) to the CPU and main
memory. Of course, this "one bus to rule them all" ideal is never, ever going to
happen in the real world. It won't happen with PCI Express, and it won't happen
with Infiniband (although it technically could happen with Infiniband if
we threw away all of today's PC hardware and started over from scratch with a
round of natively Infiniband-compliant devices).
Still, even though the utopian
ideal of one bus and one bus protocol for every device will never be achieved,
there has to be a way to bring some order to the chaos. Luckily for us, that way has
finally arrived in the form of PCI Express (a.k.a. PCIe).
With Intel's recent launch of its
900-series chipsets and NVIDIA and ATI's announcements of PCI Express-compatible
cards, PCIe will shortly begin cropping up in consumer systems. This article
will give you the lowdown on what you can expect from the bus technology that
will dominate the personal computer for the coming decade.
Note: A few of the more
server-specific features of PCI Express are not covered in this article. These
include hot plugging and hot swapping, as well as reliability-oriented features
like packet retries and such.
A primer on PCI
Before I go into detail on PCIe,
it helps to understand how PCI works and what its limitations are.
The PCI bus debuted over a decade
ago at 33MHz, with a 32-bit bus and a peak theoretical bandwidth of 132MB/s.
This was pretty good for the time, but as the rest of the system got more
bandwidth hungry, both the bus speed and bus width were cranked up in an effort
to keep pace. Later flavors of PCI included a 64-bit, 33MHz combination with a
peak bandwidth of 264MB/s, and a more recent 64-bit, 66MHz combination with a
peak bandwidth of 528MB/s.
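The peak figures for a parallel bus come from simple arithmetic: bytes per
transfer multiplied by transfers per second. Here is a minimal sketch, assuming
one transfer per clock and the nominal 33MHz and 66MHz clock rates; the function
name is ours and is purely illustrative.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak theoretical bandwidth in MB/s, assuming one transfer per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# The three PCI flavors discussed in the text.
for width, clock in [(32, 33), (64, 33), (64, 66)]:
    rate = peak_bandwidth_mb_s(width, clock)
    print(f"{width}-bit @ {clock}MHz: {rate:.0f} MB/s")
```

With nominal clocks this works out to 132, 264, and 528 MB/s. These are
theoretical ceilings; sustained real-world throughput is lower.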
PCI uses a shared bus topology
to allow for communication among the different devices on the bus; the different
PCI devices (e.g., a network card, a sound card, or a RAID card) are all
attached to the same bus, which they use to communicate with the CPU. Take a
look at the following diagram to get a feel for what a shared bus looks like.
Figure 3: the shared bus
Because all of the devices
attached to the bus must share it among themselves, there has to be some kind of
bus arbitration scheme in place for deciding who gets access to the bus
and when, especially in situations where multiple devices need to use the bus at
the same time. Once a device has control of the bus, it becomes the bus
master, which means that it can use the PCI bus to talk to the CPU or memory
via the chipset's southbridge.
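To make the idea of arbitration concrete, here is a toy arbiter that grants the
bus to one requesting device at a time. It assumes a round-robin policy, chosen
only because it is easy to follow; the article does not say which policy a real
chipset uses, and the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Toy bus arbiter: grants the shared bus to one requester per turn."""

    def __init__(self, device_names):
        self._devices = list(device_names)
        self._last = -1  # index of the device that most recently held the bus

    def grant(self, requesting):
        """Return the next bus master among the requesters, or None if idle."""
        count = len(self._devices)
        # Start searching just after the previous bus master so nobody starves.
        for offset in range(1, count + 1):
            index = (self._last + offset) % count
            if self._devices[index] in requesting:
                self._last = index
                return self._devices[index]
        return None


arbiter = RoundRobinArbiter(["nic", "sound", "raid"])
# The NIC and the RAID card both want the bus for three turns in a row.
grants = [arbiter.grant({"nic", "raid"}) for _ in range(3)]
print(grants)
```

The two competing devices take turns as bus master, while the sound card, which
never asks, is skipped.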
Speaking of the southbridge, the
large system diagram that I presented on the first page (the one with the PCI
devices attached to the southbridge) represents how things are actually
configured in the real world, as opposed to the idealized representation given
immediately above. The southbridge, the northbridge, and the CPU all combine to
fill the host or root role, which we'll discuss in a bit more
detail momentarily. For now, it will suffice to note that the root runs the show:
it detects and initializes the PCI devices, and it controls the PCI bus by
default. Or another way to put it would be to say that the purpose of the PCI
bus is to connect I/O devices to the root, so that the root can read from them
and write to them, and just generally use them to talk either to storage devices
or to the outside world.
The shared bus topology's main
advantages are that it's simple, cheap, and easy to implement, or at least,
that's the case as long as you're not trying to do anything too fancy with it.
Once you start demanding more performance and functionality from a shared bus,
then you run into its limitations. Let's take a look at some of those
limitations, in order to motivate our discussion of PCI Express's improvements.
From the CPU's perspective, PCI
devices are accessible via a fairly straightforward load-store mechanism.
There's a flat, unified chunk of address space dedicated for PCI use, which looks
to the CPU much like a flat chunk of main memory address space, the primary
difference being that at each range of addresses there sits a PCI device instead
of a group of memory cells containing code or data.
Figure 4: memory space
So in the same way that the CPU
accesses memory by performing loads and stores to specific addresses, it accesses
PCI devices by performing reads and writes to specific addresses.
When a PCI-enabled computer boots
up, it must initialize the PCI subsystem by assigning chunks of the PCI address
space to the different devices so that they'll be accessible to the CPU. Once
the devices are initialized and know which parts of the address space they
"own," they start listening to the bus for any commands and data that might be
directed their way. Once an individual PCI device "hears" an address that it
owns being placed on the bus, then it reads any data following behind that
address.
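The boot-time address assignment and the "listening" behavior described above
can be sketched in a few lines. This is a simplified model, not real PCI
configuration code: the window start address, the 4KB range size, and all of the
names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PciDevice:
    name: str
    base: int = 0        # start of the address range assigned at boot
    size: int = 0x1000   # bytes of address space the device asks for (assumed)
    received: list = field(default_factory=list)

    def owns(self, address: int) -> bool:
        return self.base <= address < self.base + self.size

    def snoop(self, address: int, data: bytes) -> bool:
        """Hear a bus transaction; latch the data only if we own the address."""
        if self.owns(address):
            self.received.append((address - self.base, data))
            return True
        return False


def initialize_bus(devices, window_start=0xE000_0000):
    """Boot-time step: give each device its own non-overlapping address range."""
    next_base = window_start
    for device in devices:
        device.base = next_base
        next_base += device.size


def bus_write(devices, address, data):
    """Broadcast a write on the shared bus; every device hears it, one claims it."""
    claimed = [device.name for device in devices if device.snoop(address, data)]
    return claimed[0] if claimed else None


devices = [PciDevice("nic"), PciDevice("sound"), PciDevice("raid")]
initialize_bus(devices)
target = bus_write(devices, 0xE000_1010, b"\x01\x02")
print(target)
```

Every device sees the address, but only the one whose range contains it (here,
the sound card) picks up the data that follows.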
This scheme works fine when there
are only a few devices attached to the bus, listening to it for addresses and
data. But the nature of a bus is that any device that's attached to it and is "listening"
to it injects a certain amount of noise onto the bus. Thus the more devices that
listen to the bus (and thereby place an electrical load on the bus), the more
noise there is on the bus and the harder it becomes to get a clean signal
through.
Sharing the bus
In this respect, the shared bus
is kind of like the following slightly loopy scenario: Imagine an office
building in which there is only one phone line that everyone shares. People work
all day in their cubicles with their phones off the hook and their hands-free
speakerphones turned on, listening for the front-office secretary to call out
their name, "Mr. Smith, Ms. Jones is here at my desk and wants to talk to you,
so I'm going to put her on. Now pay attention, because here she is..." With only
a few employees this lame scheme would be a pain but it would at least be
feasible. But in an office of hundreds, the amount of ambient background noise
pouring into each speakerphone would combine to make the entire line a noisy
mess, and it would be very hard to hear your name called out above the racket.
This load-related noise
phenomenon, along with clock skew issues, is the reason that PCI buses are
limited to five card-based devices at most. (If you solder PCI devices directly
onto the motherboard, the signal is cleaner so you can put a few more than five
on a single bus.)
What this means in real life is
that if you want to put more than five PCI devices on a system, then you must
use PCI-to-PCI bridge chips configured in the following manner:
Figure 5: PCI-to-PCI bridge chips
This hierarchical tree structure,
outlined above, is one of the features that distinguishes PCI from peer-to-peer
and point-to-point next-generation interconnects like HyperTransport and
Infiniband. The root at the top of the diagram is the master controller which is
responsible for initializing and configuring all of the PCI devices in the
system at boot-up. This makes every PCI device a slave device, with one master
controlling them. And because the master must enumerate all of the devices and
configure the entire system at boot time, there can be no hot-plugging or
hot-swapping.
Excursus: organizing bus traffic
Generally speaking, there are two
pairs of categories into which all bus traffic can be placed. The first pair of
categories is address traffic and data traffic. The data is the
information that you're using the bus to send to or receive from a device that's
attached to it, and the address is the location of the particular device (or the
region within a particular device) where the information is being sent. So any bus
which supports multiple devices will need a way of handling both address traffic
and data traffic, and of distinguishing between the two.
The second pair of categories,
which overlaps the first pair, is command traffic and read/write
traffic. A command consists of a chunk of data containing some type of
configuration or control information (= a specific type of data) which is sent
to a particular device (= a particular address) on the bus. So command traffic
includes both address and data traffic. Examples of command traffic are
initialization instructions for a device, a device reset signal, a configuration
command that causes the device to switch operating modes, etc. Command traffic
allows the CPU to control how the PCI device handles the data that flows in and
out of it.
Read/write traffic is the most
important type of traffic, because it consists of the actual information that is
being sent to the device. For instance, a PCI RAID controller uses read and
write traffic to send and receive the actual files which it reads from and
writes to its attached hard disks, a PCI sound card uses read/write traffic to
get the sound data that it puts out through its speaker jack, and so on. Like
command traffic, read/write traffic consists of addresses coupled with data, and
so accounts for part of both of these types of traffic.
Different buses and bus protocols
have different ways of handling these four overlapping types of traffic. For
instance, many common bus types actually consist of two separate buses: an
address bus and a data bus. Addresses are placed on the address bus and data is
placed on the data bus, with the result that data is able to flow quickly
between devices because each type of traffic has its own dedicated bus.
The alternative to this would be
to "multiplex" address and data onto the same bus. This involves first placing
the address on the bus, and then following it with the data that is to be sent
to that address. PCI takes this approach, with a single 32-bit bus on which
addresses and data are multiplexed. In fact, remember the office phone line
analogy? "Mr. Smith, Ms. Jones is here at my desk and wants to talk to you, so
I'm going to put her on. Now pay attention, because here she is..." The "Mr.
Smith" in this sentence would be the address, and Ms. Jones's speech to Mr.
Smith would be the data.
Obviously multiplexing is a
little less bandwidth-efficient than having two dedicated buses, because address
traffic takes up precious bandwidth that could be put to better use carrying data
traffic. But multiplexed buses are a lot cheaper than separate address and data
buses, because only half as many bus lines are needed, and the devices on the bus
need half as many pins.
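The cost of multiplexing is easy to see in a small model. The sketch below
assumes a single 32-bit bus on which each write is one address cycle followed by
one or more data cycles. It ignores the extra cycles a real PCI transaction
involves, and the function names are ours.

```python
WORD_MASK = 0xFFFF_FFFF  # the bus is 32 bits wide


def multiplexed_write(address: int, payload: list[int]):
    """Yield the bus cycles for one write: the address first, then the data."""
    yield ("address", address & WORD_MASK)
    for word in payload:
        yield ("data", word & WORD_MASK)


def data_efficiency(payload_words: int) -> float:
    """Fraction of bus cycles that carry data rather than an address."""
    return payload_words / (payload_words + 1)


cycles = list(multiplexed_write(0x1000, [0xAAAA, 0xBBBB]))
print(cycles)
print(data_efficiency(1), data_efficiency(4))
```

A single-word write spends half of its cycles on the address, while a four-word
burst spends only a fifth, which is why longer bursts soften the penalty.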
The other popular way of handling
bus traffic is to split it into control traffic and read/write traffic and give
each its own bus. To return to our office analogy, this would be like installing
a separate line for management to use to talk to employees.
PCI and MSI
Later versions of the PCI
specification opt in part for the last method of organizing bus traffic outlined
above, and have what is called a "side-band bus" for transmitting some types of
command traffic. The side-band bus is a smaller bus consisting of a few lines
dedicated to the transmission of control and configuration information. Of
course, this side-band bus increases pin count, power draw, cost, etc., so it's
not an ideal solution.
Even more recent versions of the
PCI spec dictate a method for using standard memory write operations to pass
one type of command and control traffic between PCI devices and the CPU. This
method, called Message Signaled Interrupts (MSI), sets aside a special message
space in the PCI flat memory space for passing a certain type of control message
called an interrupt. This message space is kind of like a bulletin board, onto
which a device writes interrupt messages that are then delivered to the CPU. As
we'll see below, PCI Express expands the MSI spec to include not just interrupts
but all side-band control signals. But we're getting ahead of ourselves...
Summary of PCI's shortcomings
To summarize, PCI as it exists
today has some serious shortcomings that prevent it from providing the bandwidth
and features needed by current and future generations of I/O and storage devices.
Specifically, its highly parallel shared-bus architecture holds it back by
limiting its bus speed and scalability, and its simple, load-store, flat
memory-based communications model is less robust and extensible than a routed,
packet-based model.
PCI-X: wider and faster, but still outdated
The PCI-X spec was an attempt to
update PCI as painlessly as possible and allow it to hobble along for a few more
years. This being the case, the spec doesn't really fix any of the inherent
problems outlined above. In fact, it actually makes some of the problems worse.
The PCI-X spec essentially
doubled the bus width from 32 bits to 64 bits, thereby increasing PCI's parallel
data transmission abilities and enlarging its address space. The spec also ups
PCI's basic clock rate to 66MHz with a 133MHz variety on the high end, providing
yet another boost to PCI's bandwidth and bringing it up to 1GB/s (at 133MHz).
The latest version of the
PCI-X spec (PCI-X 266) also double-pumps the bus, so that data is transmitted on
the rising and falling edges of the clock. While this improves PCI-X's peak
theoretical bandwidth, its real-world sustained bandwidth gains are more modest.
(See
this article for more on
the relationship between peak theoretical bandwidth and real-world bandwidth.)
While both of these moves
significantly increased PCI's bandwidth and its usefulness, they also made it
more expensive to implement. The faster a bus runs, the more sensitive it becomes
to noise. Manufacturing standards for high-speed buses are exceptionally strict
for this very reason: shoddy materials and/or wide margins of error translate
directly into noise at higher clock speeds. This means that the higher-speed
PCI-X bus is more expensive to make.
The higher clock speed isn't the
only thing that increases PCI-X's noise problems and manufacturing costs. The
other factor is the increased bus width. Because the bus is wider and consists
of more wires, there's more noise in the form of crosstalk. Furthermore, all of
those new wires are connected at their endpoints to multiple PCI devices, which
means an even larger load on the bus and thus more noise injected into the bus
by attached devices. And then there's the fact that the PCI devices themselves
need 32 extra pins, which increases the manufacturing cost of each individual
device and of the connectors on the motherboard.
All of these factors, when taken
together with the increased clock rate, combine to make PCI-X a more
expensive proposition than PCI, which keeps it out of mainstream PCs. And it
should also be noted that most of the problems with increasing bus parallelism
and double-pumping the bus also plague recent forms of DDR, and especially the
DDR-II spec.
And after all of that pain, you
still have to deal with PCI's shared-bus topology and all of its attendant ills.
Fortunately, there's a better way.
PCI Express: the next generation
PCI Express (PCIe) is the
newest name for the technology formerly known as 3GIO. Though the PCIe
specification was finalized in 2002, PCIe-based devices have just now
started to debut on the
market.
PCIe's most drastic and obvious
improvement over PCI is its point-to-point bus topology. Take a look at the
following diagram, and compare it to the layout of the PCI bus.
Figure 6: shared switch
Figure 3: the shared bus
In a point-to-point bus topology,
a shared switch replaces the shared bus as the single shared resource by means
of which all of the devices communicate. Unlike in a shared bus topology, where
the devices must collectively arbitrate among themselves for use of the bus,
each device in the system has direct and exclusive access to the switch. In
other words, each device sits on its own dedicated bus, which in PCIe lingo is
called a link.
Like a router in a network or a
telephone switchbox, the switch routes bus traffic and establishes
point-to-point connections between any two communicating devices on a system. To
return to our office analogy from the previous section, each employee has his or
her own private line to the front desk; so instead of shouting over a shared
line to get a particular employee's attention, the front desk secretary uses a
switchboard to connect employees directly to incoming callers and to each other.
In the point-to-point diagram
above, the CPU at the top can talk to any of the PCIe devices by "dialing" that
device's address and opening up a direct and private communications link, via
the switch, with it. Of course, as with a modern telephone call, or even better,
an internet connection between a browser and a website, the two communicating
parties only think they're talking to each other via a private, direct,
continuous link; in reality, though, the communications stream is broken up into
discrete packets of data, which the switch routes (like a postal worker
delivering addressed envelopes) back and forth between the two parties.
Enabling Quality of Service
The overall effect of the switched fabric topology is that it allows the "smarts"
needed to manage and route traffic to be centralized in one single chip: the
switch. With a shared bus, the devices on the bus must use an arbitration scheme
to decide among themselves how to distribute a shared resource (i.e., the bus).
With a switched fabric, the switch makes all the resource-sharing decisions.
By centralizing the
traffic-routing and resource-management functions in a single unit, PCIe also
enables another important and long overdue next-generation function: quality of
service (QoS). PCIe's switch can prioritize packets, so that real-time streaming
packets (e.g., a video stream or an audio stream) can take priority over packets
that aren't as time critical. This should mean fewer dropped frames in your
first-person shooter and lower audio latency in your digital recording software.
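As a rough illustration of what packet prioritization buys you, here is a toy
switch that always forwards the most urgent queued packet first. It sketches the
general idea only and does not model PCIe's actual arbitration mechanism; the
class name, the numeric priority scheme, and the device names are all
assumptions.

```python
import heapq
import itertools


class QosSwitch:
    """Toy switch that forwards queued packets highest priority first."""

    def __init__(self):
        self._queue = []
        self._arrival = itertools.count()  # tie-breaker: FIFO within a priority

    def enqueue(self, priority: int, destination: str, payload: str) -> None:
        # Lower number means more urgent (a convention of this sketch).
        entry = (priority, next(self._arrival), destination, payload)
        heapq.heappush(self._queue, entry)

    def forward_next(self):
        """Pop the most urgent packet, or return None if nothing is queued."""
        if not self._queue:
            return None
        _priority, _order, destination, payload = heapq.heappop(self._queue)
        return destination, payload


switch = QosSwitch()
switch.enqueue(2, "raid", "bulk block 0")
switch.enqueue(0, "sound", "audio frame 0")
switch.enqueue(2, "raid", "bulk block 1")
switch.enqueue(0, "sound", "audio frame 1")

order = []
while (packet := switch.forward_next()) is not None:
    order.append(packet)
print(order)
```

Both audio frames leave the switch before either bulk storage block, even though
a storage block arrived first.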
Backwards compatibility
Now, you've probably heard that
PCIe is backwards-compatible with PCI, and that operating systems can boot on
and use a PCIe-based system without modification. So you're no doubt wondering
how PCI's load-store model, described previously, can be compatible with the
switched packet-based model outlined here. The answer is more straightforward
than you might think.
PCI and PCI Express, like
many computer systems designed to transmit data, implement a part of the
OSI network stack. This
article is not the place for a detailed breakdown of a network stack, but the
basic idea behind it is easy enough to grasp.
PCI implements the first four
layers of the OSI stack, which specify the physical aspects of transmission (i.e.,
the wire-level signals) up through the higher-level load-store interface that
software uses to send and receive via PCI. PCI Express's designers have left
this load-store-based, flat memory model unchanged. So a legacy application that
wants to communicate via PCIe still executes a read from or a write to a
specific address. The next two stack levels down, however, take this read or
write request and convert it into a packet by appending routing and flow control
information, as well as CRC information, placing it in a frame, and then sending
it to its destination.
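The encapsulation step can be sketched as follows. The packet layout here (a
one-byte kind, a two-byte sequence number, a four-byte address, the data, and a
trailing CRC-32) is invented for illustration and is not the real PCIe packet
format; the point is only that a plain "write this data to this address" request
gets wrapped and checked without the software above it knowing.

```python
import struct
import zlib

# Hypothetical header: kind (1 byte), sequence (2 bytes), address (4 bytes).
HEADER = struct.Struct(">BHI")
WRITE = 1


def encapsulate_write(sequence: int, address: int, data: bytes) -> bytes:
    """Wrap a load-store style write request in a framed, checksummed packet."""
    body = HEADER.pack(WRITE, sequence, address) + data
    return body + struct.pack(">I", zlib.crc32(body))


def decapsulate(packet: bytes):
    """Verify the CRC, then recover (sequence, address, data) at the receiver."""
    body = packet[:-4]
    (crc,) = struct.unpack(">I", packet[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch: packet corrupted in transit")
    _kind, sequence, address = HEADER.unpack(body[:HEADER.size])
    return sequence, address, body[HEADER.size:]


packet = encapsulate_write(7, 0xE000_1010, b"\xde\xad\xbe\xef")
print(decapsulate(packet))
```

The sender and receiver agree on the framing, so the layer above still sees
nothing more than a write to an address.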
So the application still thinks
that it's reading from or writing to a memory address when it talks to a PCI
device, but behind the scenes there's a totally different network of protocols
and signals at work shuffling that read or write request along to its
destination.
This brings us back to the
topic of command and control signals. As I hinted at earlier, PCIe takes all PCI
side-band signals and converts them to MSI signals (which are load-store) so
that they can be encapsulated into packets and routed just like any other read/write
traffic. Of course, this means that all types of PCIe traffic (whether command
or read/write, or address or data) are transmitted over a single bus.
It's important to note at this
point that the two pairs of bus traffic types are logically divided under PCIe,
even if they're not physically separated onto different buses. The first two
types of traffic, address and data, are combined in the form of the packet. The
core of a packet consists of an address combined with a chunk of data; so the
packet structure fuses these two types.
The packets themselves, though,
generally fall into the two other categories: command and read/write. In fact,
literature on a packet-based bus system like PCIe or RAMBUS will often talk of
command packets and data packets, the latter being the more common
name for what I'm calling read/write packets.
Traffic runs in lanes
When PCIe's designers started
thinking about a true next-generation upgrade for PCI, one of the issues that
they needed to tackle was pin count. In the section on PCI above, I covered some
of the problems with the kind of large-scale data parallelism that PCI exhibits
(i.e. noise, cost, poor frequency scaling, etc.). PCIe solves this problem by
taking a serial approach.
As I noted previously, a
connection between a PCIe device and a PCIe switch is called a link. Each
link is composed of one or more lanes, and each lane is capable of
transmitting one byte at a time in both directions at once. This full-duplex
communication is possible because each lane is itself composed of one pair of
signals: send and receive.
Figure 7: Links and lanes
In order to transmit PCIe
packets, which are composed of multiple bytes, a one-lane link must break down
each packet into a series of bytes, and then transmit the bytes in rapid
succession. The device on the receiving end must collect all of the bytes and
then reassemble them into a complete packet. This disassembly and reassembly
must happen rapidly enough that it's transparent to the next layer
up in the stack. This means that it requires some processing power on each end
of the link. The upside, though, is that because each lane is only one byte wide,
very few pins are needed to transmit the data. You might say that this serial
transmission scheme is a way of turning processing power into bandwidth; this is
in contrast to the old PCI parallel approach, which turns bus width (and hence
pin counts) into bandwidth. It so happens that thanks to Moore's Curves,
processing power is cheaper than bus width, hence PCIe's tradeoff makes a lot of
sense.
I stated earlier that a link can
be composed of "one or more lanes", so let me clarify that now. One of PCIe's
nicest features is the ability to aggregate multiple individual lanes together
to form a single link. In other words, two lanes could be coupled together to
form a single link capable of transmitting two bytes at a time, thus doubling
the link bandwidth. Likewise, you could combine four lanes, or eight lanes, and
so on.
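Spreading a packet over several lanes and putting it back together can be
sketched as a round-robin deal of bytes. This is a simplified model that assumes
plain byte-by-byte striping and leaves out everything the physical layer adds on
the wire; the function names are ours.

```python
def stripe(packet: bytes, lanes: int) -> list[bytes]:
    """Deal the packet's bytes round-robin across the lanes of a link."""
    return [packet[lane::lanes] for lane in range(lanes)]


def reassemble(lane_streams: list[bytes]) -> bytes:
    """Interleave the per-lane byte streams back into the original packet."""
    lanes = len(lane_streams)
    packet = bytearray(sum(len(stream) for stream in lane_streams))
    for lane, stream in enumerate(lane_streams):
        # Lane N carried bytes N, N + lanes, N + 2 * lanes, and so on.
        packet[lane::lanes] = stream
    return bytes(packet)


packet = b"ABCDEFGHIJ"
streams = stripe(packet, 4)
print(streams)
print(reassemble(streams))
```

On an x4 link each lane carries roughly a quarter of the bytes, so the same
packet crosses the link in about a quarter of the time it would take on x1.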
A link that's composed of a
single lane is called an x1 link; a link composed of two lanes is called an x2
link; a link composed of four lanes is called an x4 link, etc. PCIe supports x1,
x2, x4, x8, x12, x16, and x32 link widths.
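To see how bandwidth scales with link width, the sketch below multiplies out the
per-lane rate for each supported width. It assumes first-generation PCIe's
2.5Gbps per-lane, per-direction signaling rate and its 8b/10b line encoding (10
bits on the wire for every 8 bits of payload), a detail this article does not
go into. The names are illustrative.

```python
LANE_MBPS = 2500                       # raw signaling rate per lane, per direction
LINK_WIDTHS = (1, 2, 4, 8, 12, 16, 32)


def link_rates(lanes: int):
    """Return (raw Gbps, payload MB/s) per direction for a link of this width."""
    raw_mbps = LANE_MBPS * lanes
    payload_mbps = raw_mbps * 8 // 10  # 8b/10b: 8 payload bits per 10 wire bits
    return raw_mbps / 1000, payload_mbps // 8


for width in LINK_WIDTHS:
    raw_gbps, payload_mb_s = link_rates(width)
    print(f"x{width}: {raw_gbps} Gbps raw, {payload_mb_s} MB/s payload")
```

Under these assumptions an x1 link moves about 250MB/s of payload in each
direction, and an x16 link about 4,000MB/s.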
PCIe's bandwidth gains over PCI
are considerable. A single lane is capable of transmitting 2.5Gbps in each
direction, simultaneously. Add two lanes together to form an x2 link and you've
got 5Gbps, and so on with each link width. These high transfer speeds are good,
good news, and will enable a new class of applications, like...