Sunday, August 15, 2021

Chameleon IO Controller - Adventures in Latency Testing

Chameleon IO Controller & Round-Trip Latency
One of my primary objectives when designing the Chameleon IO Controller was minimizing round-trip latency.  In mechanical games like pinball, lower latency improves game feel, by reducing the amount of time from pressing a cabinet button to a solenoid firing.  If latency is too great, you will notice a lag after pressing the button until the flipper fires, or after a ball hits a slingshot and the slingshot activates.

Unlike a typical pinball controller where the switch to fire a solenoid (i.e. a flipper) is directly connected to the solenoid, my design takes the button press and sends it over USB to the host PC, where my Chameleon Pinball Engine software processes the button press, and sends a command signal back over USB to turn on the solenoid.  

As the Chameleon IO Controller has both the gamepad buttons and LED outputs all on a single PCB, effectively the signal makes a round-trip from IO Controller to host PC and back to the IO Controller.

This design introduces latency at several distinct points.  There is the latency in the IO Controller to detect a button press and send the new button state info to the host PC.  There is latency in my pinball software to recognize and process the button press.  There is the latency in sending the new LED state back to the IO Controller.  And finally there is latency in the IO Controller to actually turn on the output.

My previous post detailed working on the Chameleon IO Controller's firmware to improve PWM performance, so of those four distinct points affected by latency, I've already addressed the last two.  My goal with this new round of testing is to work on the other two points.

Below a certain threshold, latency isn't an actual problem.  When you're playing video games, as long as latency from button pressed to pixels updating stays below 40ms, response feels instantaneous and you'll have a great experience.  But real world games like pinball are more demanding - physics waits for no-one - so latency needs to be even lower for good ball control.


Building a Round-Trip Latency Measurement Tool
I calculated preliminary latency numbers by recording slow motion video (240 fps) of two LED's, one illuminated when a gamepad switch was pressed on the IO Controller, and the other a LED output on the same IO Controller.  The first LED was in-line with the button, so it had to illuminate to complete the circuit, and thus it had zero delay.  The second LED had to wait for the button press to travel to the computer, be processed by my software, and travel back to the IO Controller, so there was a tiny delay to light it up.  That's the round-trip latency I wanted to measure.

I would load the video up in Adobe Premiere and count the frames from the first LED lighting to the second.  Multiplying the number of frames by 4.17 gave me the approximate latency.

This method was low accuracy (4.17ms increments are huge when measuring microseconds), and was a lot of work to record the video and analyze the frames.

Wanting a better tool for analyzing latency, I hooked up a Raspberry Pi 4 and wrote a few lines of code.  Now the Raspberry Pi could trigger the switch and detect the return signal, and report the delta between the two events.  I could also calculate and average delta over hundreds of cycles, giving me a completely automated, repeatable tool for measuring delta.

Before I measured the IO Controller, I connected the Raspberry Pi to itself in a loopback mode, and ran a few hundred cycles to see how much internal latency the Raspberry Pi contributed.  On average, the Pi had about 1ms of latency from triggering the button to detecting the press.  Based upon this, I would subtract 1ms of latency from any values captured from the IO Controller.


The Initial Latency Numbers
When I first started measuring the round-trip latency of the Chameleon IO Controller using the Pi, I was getting consistent averages of about 16.1ms.  This aligned fairly well the latency numbers I had calculated from the slow-motion video, which typically captured 4 frames between LEDs lighting, for 16.67ms.  This confirmed that my latency meter was working correctly.

While these latency numbers aren't bad (certainly better than video gaming latencies), my original design goal was to get it down below 10ms.


Improving Latency
My pinball test code was scanning for button presses at approximately 1000Hz.  Essentially the button scanning thread would scan for button changes, then sleep 1 ms, then repeat.  I made this tweakable in the GUI so I could test different values.  I hoped there might be a sweet spot where polling aligned better with the USB HID gamepad's reporting rate.  

Trying thread sleep values from 1 to 5 ms, I noticed almost no change, consistently getting 16.1ms average latencies.  Wondering if my settings were even having an impact, I cranked it up even higher, to 50ms, and instantly the Raspberry Pi began reporting much higher latencies between 25-50ms, which seemed correct, so these values were affecting latency after all.

On a hunch, I tried a 0 ms sleep, which caused the button scanning thread to run at 100% non-stop.  Instantly the CPU on my dual-core/quad-thread laptop went from 0% CPU to 25% consuming an entire thread, the heat output rose and the cooling fan noisily ramped up.  While none of this was desirable, I was amazed to see the Pi latency numbers drop down to a consistent 8ms!  I had cut latency in half with this one single tweak.

Obviously, sleeping the polling thread was necessary to keep CPU utilization in check.  Perhaps if I could sleep for less than 1ms I could keep both latency and CPU utilization low.

Unfortunately, 1ms is the smallest amount of time you can sleep a thread on Windows.  Instead, I modified my button scanning loop to only sleep after X number of loops, and made X configurable in the GUI.

For the next test, I configured my button scanning thread to sleep 1ms every 100 loops.  I figured this would result in effective sleeping of 0.01ms.  Surprisingly, CPU utilization stayed at 0%, while average latency dropped to 11.15ms.  Progress!

Next I tried sleeping 1ms every 1000 loops, for the equivalent of 0.001ms of sleep.  The CPU ramped up to 18%, higher than I wanted.  Average latency was 8.6ms, very close to the no-sleep 8ms result.

At 1ms every 500 loops, the CPU utilization was 12.5%.  This seemed much better, as the CPU stayed cooler and the fan didn't ramp up.  Latency averaged 9ms. 


Achievement Unlocked
Surprising things occurred when I tried 1ms every 250 loops.  Amazingly, CPU utilization dropped back to 0% (I was expecting around 6%).  Even better, latency actually dropped to 8.4ms.  Win-win!

I tested a few other values, but found that sleeping 1ms every 250 loops, for the equivalent of 0.004ms of sleep, seemed to be the best compromise.


What about the LED-Wiz in Modern Firepower?
For the past 7 years I've wondered what the round trip latency was in Modern Firepower.  My expectations were low.

Compared to the Chameleon IO Controller, the LED-Wiz runs slower USB 1.1, with a much slower system chip, slower PWM, and a less efficient algorithm for transferring LED PWM values from the host PC.  On Modern Firepower, I had to be very careful with my code and heavily multi-thread every task to keep lag at bay.  While the pin played great, I always felt it was on the high end of acceptable latency.  My guess was that latency with this solution was in the 20-30ms range.

Imagine my surprise, then, when I hooked up the LED-Wiz in my new latency tester, and with 5ms sleeps on every button scanning loop (current config on Modern Firepower) it got a 14ms response time!  In the same configuration, the Chameleon IO Controller was performing at 16.1ms.  

Does this mean my new design is a failure?  Not really, we're not comparing apples to apples.  There are a few notable differences.  

The LED-Wiz only needs 4 bytes transmitted to turn each output On/Off (completely separate from the PWM values) for 32 outputs.  The Chameleon IO Controller uses 256 bytes for 56 outputs, but this sets both On/Off and PWM at the same time.  The LED-Wiz is extremely slow when you update the PWM, so for the most part I avoided that feature, and Modern Firepower is basically limited to On/Off only.  

Another difference is that the Chameleon IO Controller has a built-in USB hub, whereas the LED-Wiz was directly connected to the laptop in this test.  On Modern Firepower, I have 3 LED-Wiz's connected via a USB hub, so actual latency might have been worse in that real-world configuration.

Another difference is that for this test I was using my own gamepad buttons built into the Chameleon IO Controller, where Modern Firepower uses an off the shelf U-HID, which might have different latency.

The takeaway here is that for an extra 2ms in latency, the Chameleon IO Controller adds PWM capability plus a lot more outputs.  That's a fair trade in my book.

And considering the pinball software has to update 3 or more LED-Wiz's just to get the minimum number of outputs for pinball, I'd expect those minor latency advantage to disappear.  I doubt I'll bother testing latency on Modern Firepower itself in this config.  

Besides, I've already tweaked the loop to improve the Chameleon IO Controller far beyond what the LED-Wiz was delivering in Modern Firepower, so there's no way my controller's going to be worse.


What about running the LED-Wiz with the New Loop Timings?
Since I got the Chameleon IO Controller down to 8.4ms latency with the improved button scanning loop timings, effectively sleeping the threat 0.004ms instead of 5ms, I was curious how the LED-Wiz would fair.  It improved just as much, dropping by 50% to 6.6ms.  

I wish I had discovered this years ago.  This is a change I'm excited to apply to Modern Firepower and see if it plays any better.


HID Polling Rate
Another distinct point of latency is the USB polling rate.  You may already be familiar with this, with all the gaming mice and keyboards on the market advertising 1000Hz or even faster polling.

I had already tweaked the polling rate for the LED outputs, changing the polling rate from 100Hz to 1000Hz, which is the fastest possible speed for HID devices (Note: special mice that exceed 1000Hz are not actually HID devices).  But I had neglected to do this for the Chameleon IO Controller's gamepads.  Sure enough, when I checked it was still set to 100Hz, so I cranked it up to 1000Hz.

Because I still had the LED-Wiz hooked up, I tested it first.  I was amazed that latency had dropped to 3.2ms, another 50% reduction!

Out of curiosity, I removed the sleep statement in my button scanning loop, and while CPU utilization shot up, round-trip LED-Wiz latency plummeted to 2.6ms.  It is simply amazing to witness latency improve 5-fold, I definitely didn't have Modern Firepower optimized.

Switching back over to the Chameleon IO Controller's LED outputs, latency was 5.1ms when using PFM calculated brightness values, and 5.0ms with PWM brightness values, so I was right in my earlier observations that my PWM code had a slight latency advantage, but that's too small to worry about.  I'll stick with the PFM (Pulse Frequency Modulation), as it gives me variable frequency from 780Hz to 11,700Hz, improving lighting quality, while the PWM method is constant at 780Hz.

Again out of curiosity, I turned off the sleep in my button scanning loop, to see what the lowest possible latency could be, and it dropped to 4.6ms.  I was actually quite pleased that performance with the sleep statement was so close to the absolute hardware limits, within half a milliseond.  

The 2ms delta in performance when compared with the LED-Wiz remained constant, and is unlikely anything I can remedy.  I'm not complaining one bit.  At 5ms latencies, The IO Controller should respond 3x faster than the already great performance on Modern Firepower.


Chameleon IO Controller Heat
One thing I noticed during testing is that the controller chips on the IO Controller became warm under constant load, especially as I improved performance.  The gamepad input controller chip stayed cool at all times, but the LED controllers were warm to the touch.  And the USB controller chip linking everything together was the warmest of the bunch.

For comparison, the chips on the Raspberry Pi 4 that is running as my latency tester were almost too hot to touch, and they have heat sinks.  So the heat on the IO Controller's chips isn't nearly so bad.

I didn't observe any failures, and I don't think the temperatures were hot enough to cause immediate problems, but long-term chip longevity is a concern.  I've ordered some small 7mmx7mm heatsinks to hopefully extend lifespan.  I also want to make sure that none of the components thermal throttle.

As part of my design validation, I will probably setup a control board without any heatsinks to do long-term stress testing and see if there are any failures.


How Does 5ms Compare to a Classic Solid State Pinball?
Good question.  I've never measured a "real" pinball machine myself, and my Google-Fu came up short in finding an official answer.  I don't have any "real" SS pinball machines by Williams or Stern or anyone else to test.  So the honest answer is that I really don't know about SS latency.  My expectation is that it should be better than what I can achieve with my design, as a SS design has behaviors coded control chip firmware for fast reaction times.

I did find a YouTube video by Kern Psy showing a high-speed, 960fps recording of a solid state pinball compared side by side with virtual pinball.  Kern was documenting lag in Visual Pinball and using the solid state pinball machine as the baseline.  Though he did overlay some times on the video, to be clear Kern didn't actually measure the latency.  He also slowed down the video by 75% (as if 960 fps wasn't slow enough).  All of this makes my analysis of the video below circumspect.

A typical cabinet button makes contact about halfway through the stroke to ensure solid contact and make the button more responsive to touch.  This isn't a hard and fast rule, and many flipper buttons can be adjusted to make them close either earlier or later in the stroke.  Accordingly, I stopped the video when the button was approximately 50% pushed in.  Then using the period (.), I was able to advance the YouTube video frame by frame, and I counted 60 frames.  I repeated this analysis a few times, and consistently counted around 60 frames.  Since Kern reduced the video speed by 75%, I made an educated guess to reverse this and divided this number by 4, for 15 frames.  This equates to 15.625ms of latency on a 960fps video.  Alternatively, counting from the button being fully depressed, there are only 12 frames, for 3.125ms of latency.

I think it is safe to say that Kern's pinball machine was in the range of 3-16ms of latency, most likely somewhere in the middle, around 8ms.  Kern's pinball machine is a Williams "The Getaway" from 1992, which uses a Williams WPC (Fliptronics 2) MPU.

UPDATE:  In a comment below, reader zacaj graciously shared his latency numbers from Black Rose, which uses the same MPU as The Getaway.  With an oscilloscope, he measured 1-4ms of latency to the flippers.  This aligns better with my expectations for solid-state performance.  

Zacaj's testing methodology also aligns better with my own, as he is measuring just the electrical signals.  Kern's video is measuring physical indicators, like the flipper moving, which is a completely different metric.  There is a small lag from the coil being energized, converting electrical energy to kinetic energy, until the flipper bat starts moving, and you can't account for that in a simple video analysis.

A huge thanks to zacaj for this info!

So the Chameleon IO Controller, with the refined button scanning logic in my Chameleon Pinball Engine software, is nearly on par with an official solid state pinball machine.  I consider that mission accomplished...


Side Effects of Better Latency
One concern I have with improving button scanning latency is picking up additional switch bounce.  Bounce was already a minor problem on Modern Firepower (and every pinball machine ever...), and I didn't have any debounce code in my software. 

In theory, the much lower latencies I've achieved here could potentially expose more bounce signals, so this is a real concern.

I do have some new debounce code that is currently untested, hopefully it will keep switch bounce at bay.


Final Thoughts
Modern Firepower has performed perfectly for over 7 years on the U-HID & LED-Wiz with these "at best"14ms latencies, and the real-world latencies were probably a little worse due to multiple boards and a USB hub to connect them.  While the LED-Wiz's PWM specs are decent on paper with 5.5-bit, 49 levels at 60Hz, I couldn't effectively use PWM dimming because those command updates were too slow for pinball.

I used these devices as my benchmark when designing the Chameleon IO Controller, and set out to improve on all metrics - performance, size, capabilities, ease of use, and even cost.  I feel I succeeded.

With my new Chameleon IO Controller I've achieved 5ms round-trip latencies with 112 total outputs,  while adding in PWM capabilities.  The IO Controller's PWM is 5-bit, 32 levels at 780Hz+, plus has true hardware PWM on 8 outputs for controlling servos.  The 56 button gamepad improves on the U-HID which only had up to 32 gamepad buttons, with the remaining switch inputs only configurable as problematic keyboard buttons, and the 1000Hz polling is likely faster than what I assume is 100Hz or 125Hz polling on the U-HID.

I fully expect The Black Knight Rises to be noticeably better than Modern Firepower, I can't wait to play it...

2 comments:

  1. I've taken some readings of a Black Rose with an oscilloscope, and I got a 1-4ms lag for the flippers, fyi

    ReplyDelete
    Replies
    1. That's awesome info, thank you for sharing zacaj! The Black Rose share's the MPU with The Getaway, so they should be the same. Seems my analysis of Kern's video was a bit harsh.

      Delete