The S4810 is a 1U top-of-rack switch with multiple interface options. It has 48 SFP+ ports for 1G/10G Ethernet (we tested it with 48 10G Ethernet transceivers) and four QSFP+ ports for 40G uplinks. With 10GBase-SR transceivers, the switch drew 202 watts when idle and 219 watts with its data plane fully loaded.

The switch runs the Force10 Operating System (FTOS), which includes a command line interface (CLI) that's nearly a clone of Cisco's IOS. Experienced Cisco users will have no trouble configuring and managing this switch.

Although we tested the switch as a layer-2 data centre device, it also supports layer-3 features, including major IPv4 routing protocols and static routing of IPv6 traffic, via a $2,000 software upgrade.

Significantly, the switch does not yet support some key data centre protocols, according to a features questionnaire completed by Force10. These include the Data Center Bridging Exchange (DCBX) protocol, IEEE 802.1Qbb priority-based flow control (PFC), 802.1Qau congestion notification and 802.1Qaz enhanced transmission selection. Force10 says these features are slated for release in the third quarter of 2011.

Unicast performance

We used the same methodology to test the S4810 as in our January 2010 comparison of 10G Ethernet top-of-rack switches. The only difference this time was that we used 48 instead of 24 ports in measuring layer-2 unicast and multicast performance.

The S4810 put up solid numbers in basic unicast traffic handling. It delivered line-rate throughput regardless of unicast frame size. Better still for delay-sensitive applications, the S4810 offered sub-microsecond average latency when configured in store-and-forward mode. This is one of the first store-and-forward switches we've tested to break the microsecond barrier.

We expected average latency to be lower still with the S4810 configured as a cut-through device, but that wasn't always the case. For frame sizes of 256 bytes and larger, cut-through latency was significantly higher than the equivalent test in store-and-forward mode. Further, cut-through latency increased with frame length.

Cut-through devices usually have two properties: They tend to be very fast (since they start forwarding a frame before it's fully received, unlike store-and-forward devices, which wait until the entire frame is cached before switching it), and they have roughly the same average latency regardless of frame length.

With the S4810, these properties described the store-and-forward results better than the cut-through ones.

Store and forward vs Cut through

This is partially explained by a characteristic of the Broadcom 56845 application-specific integrated circuit (ASIC) used in the S4810. According to Force10, the chip still acts in store-and-forward mode for frames shorter than 624 bytes, even when set for cut-through operation.

This could explain higher cut-through latency for medium-length frames (say, between 256 and 624 bytes), but it's still puzzling why cut-through latency would be higher for longer frames. The testing RFCs require different measurement methods for store-and-forward and cut-through latency, and we checked and rechecked results to verify we'd used the appropriate method for each. Force10 and other labs have also confirmed this behavior.
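For readers who want to see how the two measurement methods relate, here is a minimal Python sketch. It assumes the RFC 1242 definitions: store-and-forward latency is timed from the last bit in to the first bit out (LIFO), while cut-through latency is timed from the first bit in to the first bit out (FIFO). Under that assumption the two differ by the time the frame spends on the wire. The function names are ours, for illustration only, and the sketch contains no measured results from this test.

```python
LINE_RATE_BPS = 10_000_000_000  # 10G Ethernet, as on the S4810's SFP+ ports


def serialization_delay_ns(frame_bytes: int, rate_bps: int = LINE_RATE_BPS) -> float:
    """Time needed to clock one frame onto the wire, in nanoseconds."""
    return frame_bytes * 8 * 1e9 / rate_bps


def fifo_from_lifo_ns(lifo_ns: float, frame_bytes: int) -> float:
    """Restate a LIFO (store-and-forward) latency as its FIFO equivalent.

    FIFO timing starts one frame-time earlier than LIFO timing, so the
    frame's serialization delay is added.
    """
    return lifo_ns + serialization_delay_ns(frame_bytes)


if __name__ == "__main__":
    # Frame sizes chosen for illustration; 624 is the threshold Force10 cites.
    for size in (64, 256, 624, 1518):
        print(f"{size:>5} bytes: {serialization_delay_ns(size):7.1f} ns on the wire")
```

At 10 Gbit/s a 1,518-byte frame takes about 1.2 microseconds to serialize, so the choice of measurement method matters a great deal at the latencies discussed here.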

Given the latency results, we'd recommend leaving the switch in its default store-and-forward mode. There's a performance advantage in doing so, and users get the extra benefit of the error checking that store-and-forward operation provides.

MAC address capacity

Another anomaly appeared in tests of MAC address capacity, which determines how many devices can be attached to a switch. This metric is especially important for virtualisation and cloud computing, where virtual machine counts in a single broadcast domain can rise into the tens of thousands.

The S4810's data sheet states its MAC capacity as 128,000. In practice, we found the limit to be lower, averaging 117,145 addresses, with the exact figure depending on which set of pseudo-random addresses we used. The switch ASIC's hashing algorithm accounts for the difference. To save memory and speed lookup times, ASICs store a hash of each MAC address. With a particular set of addresses perfectly matched to a given hashing algorithm, no two hashes will ever overlap or "collide." In practice, vendors cannot predict what addresses customers will use, so some collisions are inevitable.
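The effect of hash collisions on usable table size can be sketched with a small Python simulation. Everything about the table here is an assumption made for illustration: we don't know the hashing algorithm or table layout of the Broadcom ASIC, so the sketch uses CRC-32 as a stand-in hash and a hypothetical layout of 32,000 buckets with four entries each (128,000 entries in total). The printed counts are simulation output, not test results, and will not match the figures measured on the S4810.

```python
import random
import zlib


def learned_addresses(macs, buckets, slots_per_bucket):
    """Count how many unique MACs fit in a bucketed hash table.

    A MAC that hashes to a bucket that is already full is not learned.
    """
    fill = [0] * buckets
    learned = 0
    for mac in macs:
        index = zlib.crc32(mac) % buckets  # stand-in hash, NOT the ASIC's
        if fill[index] < slots_per_bucket:
            fill[index] += 1
            learned += 1
    return learned


def random_macs(count, seed):
    """Return `count` unique pseudo-random 48-bit addresses as 6-byte strings."""
    rng = random.Random(seed)
    macs = set()
    while len(macs) < count:
        macs.add(rng.getrandbits(48).to_bytes(6, "big"))
    return list(macs)


NOMINAL = 128_000            # data sheet capacity
BUCKETS, SLOTS = 32_000, 4   # hypothetical geometry: 32,000 x 4 = 128,000

if __name__ == "__main__":
    for seed in range(3):
        macs = random_macs(NOMINAL, seed)
        count = learned_addresses(macs, BUCKETS, SLOTS)
        print(f"address set {seed}: learned {count:,} of {NOMINAL:,}")
```

Each address set fills the table to a different level below the nominal capacity, because some buckets overflow while others still have free entries. That is the same kind of shortfall, with the same dependence on the address set, that the capacity test exposes.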

What's more, the actual number of addresses the switch can learn in production is likely to be far lower than 117,000. Typically, address capacity tests are conducted using only three ports. When we configured the Spirent TestCenter traffic generator to offer a set of nearly 100,000 pseudo-random addresses across 48 ports, the switch learned only about 94,000 of these due to hash collisions. Through trial and error, we found that the switch would learn at most around 25,000 addresses without hash collisions when we distributed addresses across 48 ports.

To be sure, 25,000 addresses is still a huge number, more than enough for the vast majority of data centres. Then again, some heavy users of virtualisation are already pushing above this figure. Further, we think data sheet numbers should give users meaningful guidance on the limits of switch performance, not theoretical best-case estimates.