How a digital sound source affects sound quality

How can a digital sound source have an impact on sound quality? As long as an audio signal is in the digital domain, the signal is fully specified by a sequence of numbers. As the digital sound source always delivers exactly the same numbers to a DAC, it ought to be the case that sound sources have no impact on sound quality. Nevertheless, they do.

Contents:

The problems
Choosing between USB and S/PDIF
The 3beez BitScrubber board
Objective tests
Subjective tests

Extra material:

Bits are still bits
What happened to XLR?

The problems

The digital circuitry in a digital sound source produces wide-bandwidth analog noise. That noise is present in the ground and power lines, the signal lines, and as electromagnetic interference (EMI). If it reaches the analog circuitry associated with the DAC, it will corrupt the analog output by merging with it or by causing jitter. Packaging the DAC in a separate, metal enclosure will protect the circuitry from the EMI produced by the digital sound source. However, the noise can still reach the DAC circuitry over the interconnect. All audio components should incorporate defenses that permit them to achieve an intended level of performance in typical environments. However, our experience and the experience of many serious listeners suggest that some DACs do not.

The cables used to connect external DACs to the digital sound source can cause problems other than the conveyance of noise. Cables have limited high-frequency response. This limit will cause transitions of the digital waveform to be more gradual. With long cables (e.g., 100m), signal degradation can be sufficient to make decoding impossible. Cables are shorter in consumer applications, so the problems are more subtle. When the clock is embedded in the signal, as is the case for S/PDIF, AES3, and TOSLINK, slower transitions make it more difficult for the receiver circuitry to determine the point at which a transition from one logic level to the other takes place. The problem is compounded by the presence of noise. The consequence is jitter. The phase-locked loop (PLL) in the receiver can attenuate the jitter, but minimizing jitter at its source is prudent. Note that a USB interface is not susceptible to this problem because the audio clock is not embedded in the signal (it is generated in the DAC).
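
The link between slower transitions, noise, and jitter can be illustrated with a rough calculation. To a first approximation, noise at the receiver's decision threshold shifts the moment of the crossing by the noise voltage divided by the slew rate of the edge. The sketch below assumes a linear edge; the 0.5V swing, the rise times, and the 10mV noise level are hypothetical values chosen only for illustration, not measurements.

```python
def edge_timing_error(noise_volts, swing_volts, rise_time_s):
    """Approximate timing shift of a threshold crossing caused by noise.

    Treats the edge as a linear ramp, so slew rate = swing / rise time and
    the crossing moves by noise / slew rate.
    """
    slew_rate = swing_volts / rise_time_s  # volts per second
    return noise_volts / slew_rate         # seconds


# Hypothetical values: 0.5 V swing, 10 mV of noise at the threshold.
fast = edge_timing_error(noise_volts=0.010, swing_volts=0.5, rise_time_s=5e-9)
slow = edge_timing_error(noise_volts=0.010, swing_volts=0.5, rise_time_s=25e-9)

print(f"fast edge (5 ns rise):  {fast * 1e12:.0f} ps of timing error")
print(f"slow edge (25 ns rise): {slow * 1e12:.0f} ps of timing error")
```

With the same noise, an edge that is five times slower yields five times the timing error, which is why both edge sharpness and noise matter at the receiver.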

Cables vary in the degree to which they exclude EMI. The coaxial cable used by S/PDIF (RCA) and AES-3id (BNC) has a shield that protects the signal from EMI. AES3 (XLR) uses shielded twisted pair (STP). The differential signal carried by the twisted pair might provide better isolation from EMI, but an STP cable has higher capacitance, which results in more signal loss over long distances. The losses won't matter over the short distances typical of consumer applications, but neither will the difference in EMI protection. STP makes more sense for analog connections where the differential form minimizes the problem of ground loops. For digital interconnects, we think that coax is the more sensible solution (see What happened to XLR? below for more discussion).

Choosing between USB and S/PDIF

The ideal solution to all these problems is elusive. A metal case is sufficient for isolating the DAC circuitry from EMI generated by the digital sound source. To protect the circuitry from noise on the ground of the sound source, DAC designers can galvanically isolate the inputs. Powering the circuitry in a DAC from its own power supply isolates the circuitry from noise in the power of the sound source. Cables and their connectors must have the right impedance, adequate bandwidth, low signal loss, and effective shielding. These measures are straightforward. The solution to other problems depends on the choice of interface.

Asynchronous USB originally struck us as a superior interface because it puts the critical clock circuitry in the DAC. Unfortunately, some users were unable to achieve excellent sound quality when connecting by USB to a DAC. After puzzling for a while over this contradiction, we made three observations. First, the data rate of HD audio requires that USB operate at "high speed" (480 Mb/s). Because USB connections are bidirectional, galvanically isolating the USB inputs would be extremely difficult with signals of this bandwidth. It is possible to isolate the output of the USB subsystem (at which point the signal is unidirectional), but we wonder whether that solution is as effective as arresting ground noise at the input. Second, the USB protocol is so complicated that it requires what is essentially a computer to interface with it. Remember that the original rationale for an external DAC was to separate the sensitive analog circuitry associated with the DAC from the electrically noisy computer in the sound source. Replacing that computer with another one inside the DAC to implement the USB interface (typically an xCORE multi-core processor) accomplishes nothing. (The same could be said about the circuitry required to implement an Ethernet interface.) Third, the crystal-stabilized clock in the digital sound source is sufficiently stable for audio, so replacing that clock with one in the external DAC also accomplishes nothing. Having dismissed the two putative advantages of a USB interface (separation from the noisy computer and a clock located in the DAC), we are left only with disadvantages: USB makes galvanic isolation difficult (at best) and it requires the introduction of electrically noisy digital circuitry. We doubt that these disadvantages are insuperable, but they make the achievement of great sound quality with a USB interface challenging.
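
A rough calculation suggests why HD audio pushes USB beyond "full speed" (12 Mb/s). The sketch below assumes a single isochronous endpoint, which at full speed carries at most 1023 bytes in each 1ms frame; the stereo, 24-bit packing is our assumption for illustration, and real devices may pack samples differently.

```python
FULL_SPEED_ISO_MAX_BYTES_PER_FRAME = 1023  # USB full speed: one packet per 1 ms frame
FRAMES_PER_SECOND = 1000


def bytes_per_usb_frame(sample_rate_hz, channels=2, bytes_per_sample=3):
    """Average audio payload needed in each 1 ms USB frame."""
    return sample_rate_hz * channels * bytes_per_sample / FRAMES_PER_SECOND


for rate in (44_100, 96_000, 192_000):
    needed = bytes_per_usb_frame(rate)
    fits = needed <= FULL_SPEED_ISO_MAX_BYTES_PER_FRAME
    print(f"{rate} Hz: {needed:.1f} bytes/frame, fits at full speed: {fits}")
```

Under these assumptions, stereo audio at 192kHz needs more payload per frame than a full-speed isochronous endpoint can carry, so the interface must run at high speed.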

The remaining common interface (we will not consider I²S here because only a few products support it) is S/PDIF and its relatives. S/PDIF embeds the clock from the source in the digital audio stream. The destination device must extract it in a way that does not introduce jitter. This requirement still strikes us as suboptimal, but it seems that DAC designers are able to avoid problems. Decoding S/PDIF does not require processing as complicated as the processing required by USB. Also, the signal is unidirectional, so it is straightforward to provide galvanic isolation even at the high sample rates of HD audio. Forced to choose the lesser of two evils, we now believe that S/PDIF wins.

The 3beez BitScrubber™ board

[Image: the BitScrubber board (bitscrubber.png)]

The 3beez BitScrubber digital interface board performs four functions to optimize the S/PDIF signal it transmits to the DAC:

  • It eliminates common-mode noise
  • It isolates ground (galvanic isolation)
  • It conveys a signal with appropriate signal levels and edge sharpness so that the received signal is good enough despite any degradation in the cable
  • It matches its output impedance to the cable impedance required by the standards to eliminate transmission line effects
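
The benefit of impedance matching in the last item can be sketched with the standard reflection coefficient for a transmission line. The 75 ohm figure is the nominal impedance of S/PDIF and AES-3id coax; the mismatched values are hypothetical examples, not measurements of any product.

```python
def reflection_coefficient(z_load_ohms, z_line_ohms):
    """Fraction of the incident voltage wave reflected at an impedance step."""
    return (z_load_ohms - z_line_ohms) / (z_load_ohms + z_line_ohms)


COAX_IMPEDANCE = 75.0  # nominal impedance for S/PDIF and AES-3id coax

for z in (75.0, 50.0, 110.0):
    gamma = reflection_coefficient(z, COAX_IMPEDANCE)
    print(f"{z:5.0f} ohm on 75 ohm coax: reflection coefficient {gamma:+.3f}")
```

A matched impedance reflects nothing; any mismatch sends part of each edge back along the cable, where it can disturb later transitions.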

The BitScrubber board uses several measures to confine electrical noise. Multiple pulse transformers isolate the ground of the interface board from the ground of the input signal (the S/PDIF signal from the computer in the sound source) and from the output signals (the grounds of the interconnects). Galvanic isolation of the output is necessary because some DACs do not provide galvanic isolation themselves. Transformers also provide common-mode noise rejection. Theoretically, their common-mode noise rejection is perfect, but parasitic capacitance limits their performance. The particular transformers on the BitScrubber board – manufactured by Scientific Conversion – have very low parasitic capacitance for exceptional common-mode noise rejection. The board also isolates the metal case of the sound source from the metal case of the DAC.

The BitScrubber board uses a differential line receiver to convert the single-ended S/PDIF signal to differential form. Careful circuit design and PCB layout assure that the differential signal is balanced. A balanced, differential signal provides additional protection from noise, particularly noise that might be induced in the circuitry of the BitScrubber board itself or in the interconnects.
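
The idea behind a balanced, differential signal can be illustrated numerically. This is an idealized sketch, not a model of the BitScrubber circuitry: it assumes the two conductors pick up exactly the same noise, and the signal levels and noise range are hypothetical.

```python
import random

random.seed(0)  # repeatable noise for the illustration

signal = [0.25, -0.25, -0.25, 0.25, 0.25, -0.25]     # hypothetical levels, volts
noise = [random.uniform(-1.0, 1.0) for _ in signal]  # same noise on both conductors

positive_leg = [+s + n for s, n in zip(signal, noise)]
negative_leg = [-s + n for s, n in zip(signal, noise)]

# A differential receiver responds to the difference between the legs,
# so noise that is identical on both cancels.
received = [p - q for p, q in zip(positive_leg, negative_leg)]

print([round(v, 6) for v in received])
```

In practice the cancellation is only as good as the balance between the two legs, which is why careful layout matters.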

We power the BitScrubber board from an external linear power supply to assure that noise in the power supply for the computer or for the hard disk drives is not able to corrupt the signals generated by the board.

Tests of the BitScrubber board

Objective tests

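[Figure: cmnoise.png. Oscilloscope screenshot showing the noisy S/PDIF test signal at the input of the BitScrubber board, the board's output (brown), and the signal at the far end of a 100m cable (green).]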

The screenshot above [thanks to Jon Paul of Scientific Conversion] shows the measured performance of the BitScrubber board. The first waveform is the input test signal. It is an S/PDIF signal with so much common-mode noise that it is almost impossible to see the signal (close to 0dB SNR). The second waveform (brown) shows the output of the BitScrubber board. The common-mode noise has been almost completely eliminated. The third waveform (green) shows the signal at the far end of a 100m cable. There is some rounding because of high-frequency loss in the cable, but the transitions are still very sharp and clean. Obviously, the length of the cable in a typical home system would be much shorter, so the transitions will normally be even sharper.

The board also optimizes the signal presented to the interconnect to minimize its effect on the signal. The line drivers have adequate bandwidth to keep edges sharp even for digital audio signals with a sample rate of 192kHz (which requires a bandwidth of at least 125MHz). They also have sufficient drive capability to handle the load presented by the interconnect. The amplitude of the signal presented to the interconnect is increased to the nominal value specified by the standards to assure the best possible signal-to-noise ratio at the DAC end of the interconnect. The output connectors made by Cardas (RCA) and TE Connectivity (BNC) provide optimal impedance, low resistance, and resistance to tarnishing.
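
The bandwidth figure can be reproduced from the S/PDIF frame format: each sample period carries 64 bits (two 32-bit subframes), and biphase-mark coding uses two cells per bit. The sketch below combines that with the rule of thumb, stated later in these notes, that a good eye pattern needs a bandwidth of at least 5x the symbol rate. The function and constant names are ours, for illustration only.

```python
BITS_PER_FRAME = 64    # two 32-bit subframes per sample period
CELLS_PER_BIT = 2      # biphase-mark coding uses two cells per data bit
BANDWIDTH_FACTOR = 5   # rule of thumb: bandwidth of 5x the symbol rate


def symbol_rate_hz(sample_rate_hz):
    """Cell (symbol) rate of the encoded stream."""
    return sample_rate_hz * BITS_PER_FRAME * CELLS_PER_BIT


def required_bandwidth_hz(sample_rate_hz):
    """Bandwidth suggested by the 5x rule of thumb."""
    return BANDWIDTH_FACTOR * symbol_rate_hz(sample_rate_hz)


for fs in (44_100, 96_000, 192_000):
    print(f"{fs} Hz: symbol rate {symbol_rate_hz(fs) / 1e6:.3f} MHz, "
          f"bandwidth {required_bandwidth_hz(fs) / 1e6:.2f} MHz")
```

For a 192kHz sample rate this gives about 123MHz, which rounds to the 125MHz quoted above.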

Subjective tests

Careful listening tests comparing sound quality using the BitScrubber board (AES-3id) versus USB are ongoing. We are inviting outsiders and skeptics to participate. Tests are single blind. A-B comparisons are impossible because the master clocks are different, but we play the same material and switch as quickly as possible. Listeners cannot be certain that we are actually making a change when we interrupt the music to switch. We ask listeners whether they hear a difference and, if they do, which they prefer. Listeners generally have no trouble hearing a difference, and when they do, they almost always prefer the sound with the BitScrubber board. Most listeners comment that localization is more vivid and articulation is clearer. Some have also commented that the sound of instruments with significant high-frequency energy is less abrasive. We will update these notes as these subjective tests advance.

[update: 24 August 2016] Most of the comments that we are hearing about the improvement to sound quality are related to spatial effects. Listeners say that the sound stage is more vivid and that there is more of a sense of the space that the performers are in. We have also heard that sibilants and plosives are more lifelike.

Conclusion

Connecting an excellent external DAC to an excellent digital sound source does not assure excellent sound quality. We have no information about the input circuitry of specific commercial DACs, but it seems that some, at least, do not adequately protect themselves from analog noise conveyed from the digital sound source by the interconnect. Our formal subjective testing and the experience of many of our customers indicate that a digital output with demonstrable excellence in common-mode noise rejection and high-frequency response will improve sound quality with at least some DACs.

Additional comments for readers who are not satiated

Bits are still bits

As we said in the introduction, the digital sound source always delivers exactly the same numbers to a DAC – unless something is seriously wrong. There could be so much loss in the interconnect or so much noise on the signal that the DAC is no longer able to interpret the digital audio stream properly. As the distortion reaches this level, the receiver will begin to make single-bit errors. Such errors are readily audible as ticks or pops (depending on the significance of the bit that is in error). As the distortion gets worse, the sound will transition to very loud noise. When the loss is severe, the receiver in the DAC will no longer be able to lock to the digital audio stream, at which point a mute function will activate to silence the output. If none of these problems is evident, then the bits reaching the DAC are exactly correct regardless of any degradation of the digital waveform.
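
To see why the audibility of a single-bit error depends on the significance of the bit, consider the following sketch. It assumes 16-bit samples stored in two's complement; the sample value of 1000 is an arbitrary, hypothetical example.

```python
def flip_bit(sample, bit, width=16):
    """Flip one bit of a signed PCM sample stored in two's complement."""
    mask = (1 << width) - 1
    raw = (sample & mask) ^ (1 << bit)  # to unsigned, then flip the bit
    if raw >= 1 << (width - 1):         # back to signed
        raw -= 1 << width
    return raw


FULL_SCALE = 1 << 15  # magnitude of the most negative 16-bit sample

sample = 1000  # hypothetical sample value
for bit in (0, 8, 15):
    corrupted = flip_bit(sample, bit)
    error = abs(corrupted - sample) / FULL_SCALE
    print(f"bit {bit:2d}: {sample} -> {corrupted}, error = {error:.5f} of full scale")
```

An error in the least significant bit changes the sample by a tiny fraction of full scale, whereas an error in the most significant bit produces a full-scale jump.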

The problems that we explored in this treatise are analog in nature. They can occur because the clock signal embedded in the digital audio stream is analog. It may appear to be digital because the waveform assumes two values, just like a digital waveform. Indeed, the same waveform contains both the audio stream, which is digital, and the clock. However, the clock signal is encoded in transitions of the waveform, not in its amplitude, and the timing of the transitions is continuous in nature (analog) not discrete (digital). The digital values of the audio waveform are robust to noise, transmission-line effects, and cable losses because they are digital. The clock waveform is not.
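
S/PDIF and AES3 carry data and clock together using biphase-mark coding. The following minimal sketch shows how the code guarantees a transition at every bit boundary (the clock) while the presence or absence of a mid-bit transition carries the data. It ignores the preambles and subframe structure of the real format, and the function names are ours.

```python
def biphase_mark_encode(bits, start_level=0):
    """Encode data bits as biphase-mark cells (two cells per bit).

    The level always toggles at the start of a bit; it toggles again in the
    middle of the bit only when the bit is 1.
    """
    level = start_level
    cells = []
    for bit in bits:
        level ^= 1          # transition at every bit boundary (carries the clock)
        cells.append(level)
        if bit:
            level ^= 1      # extra mid-bit transition marks a 1
        cells.append(level)
    return cells


def biphase_mark_decode(cells):
    """Recover data bits: a bit is 1 when its two cells differ."""
    return [int(cells[i] != cells[i + 1]) for i in range(0, len(cells), 2)]


data = [1, 0, 1, 1, 0, 0, 1]
cells = biphase_mark_encode(data)
print(cells)
print(biphase_mark_decode(cells) == data)
```

The decoder only asks whether two cells differ, a discrete question that tolerates distortion. The receiver's clock recovery, by contrast, depends on exactly when each transition arrives, which is a continuous quantity.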

Knowledgeable readers might be aware that a USB connection also contains a clock, and that the clock is embedded in the data stream just as it is in S/PDIF. However, that clock is unrelated to the clock driving the DAC chip. The USB clock is used to fill a first-in, first-out (FIFO) buffer with audio data. The DAC circuitry pulls data out of this buffer at a rate determined by its own clock. When the buffer reaches a given state of emptiness, the USB controller sends a request back to the host to provide more data. Thus, USB does not suffer from the same sensitivities as S/PDIF. However, as noted before, it does suffer from other deficiencies that tip the balance away from this significant virtue. The same is true of Ethernet.
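
The decoupling described above can be sketched as a toy simulation. It follows the simplified description in the text rather than the details of any real USB audio implementation; the packet size, the low-water mark, and all names are hypothetical.

```python
from collections import deque


def simulate_fifo(samples, packet_size=8, low_water_mark=4):
    """Drain one sample per DAC clock tick; refill in bursts when running low."""
    source = deque(samples)
    fifo = deque()
    played = []
    requests = 0
    while source or fifo:
        # Controller side: ask the host for another packet when the buffer runs low.
        if source and len(fifo) <= low_water_mark:
            requests += 1
            for _ in range(min(packet_size, len(source))):
                fifo.append(source.popleft())
        # DAC side: exactly one sample leaves per tick of the DAC's own clock.
        played.append(fifo.popleft())
    return played, requests


samples = list(range(40))
played, requests = simulate_fifo(samples)
print(played == samples, requests)
```

Data arrives in bursts timed by the USB side, but samples leave the buffer one per tick of the DAC's clock, so the timing of the bursts never reaches the output.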

What happened to XLR?

When we started designing the BitScrubber board, we intended to provide support for all three electrical versions of S/PDIF: AES3 (XLR), AES-3id (BNC), and S/PDIF (RCA). Support for S/PDIF was a given (even though RCA connectors are abhorrent) because it is the most common interface. The two professional versions of the standard are similar, so we decided to support only one to minimize the expense of what was clearly going to be an expensive board. We chose AES-3id for the second interface because we felt that it is better for digital audio than AES3.

Obvious advantages of BNC coax

Some of the advantages of AES-3id over AES3 are straightforward. Unlike BNC connectors, XLR connectors have an indeterminate impedance, so they undermine efforts to match the impedance of the output to the impedance of the cable. It is easier to assemble BNC connectors to coax cable than it is to assemble an XLR connector to shielded twisted pair cable. Also, XLR connectors are clunky. It was easier to accommodate BNC connectors on the back panel of the Wax Box. In a survey of DACs, we found only one that does not support either RCA or BNC, and it is intended primarily for professional applications.

Shielded twisted pair cables suffer from high loss

AES3 specifies shielded twisted pair (STP) for interconnects. Such cables have relatively high loss. The standard accommodates cable loss by permitting a large voltage swing at the input to the cable of 2-7V and a small voltage at the far end of the cable of only 0.2V. Thus, the standard tolerates a loss of over 97% of the signal, or over 30dB! Note that losses in a cable are greater at high frequencies, so the consequence of these losses is that transitions are slower and the waveform is more rounded. Such distortions will result in greater jitter.

The coax specified in AES-3id is less lossy, so AES-3id specifies a lower input voltage (1.0V) and a higher minimum voltage at the end of the cable (0.32V) – and that value refers to the amplitude at the end of a 1000m cable, not a 100m one. Thus, the maximum loss tolerated in a 1000m coax cable is less than 10dB. The loss in a coax cable whose length is the 100m maximum permitted by AES3 would be only 1dB as opposed to the 30dB of AES3. Cables are typically even shorter in consumer applications so the effect on signal amplitude of cable losses will be less significant, but greater losses at high frequencies will still distort the signal in ways that can increase jitter.
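
The loss figures for the two standards follow directly from the voltages quoted above. This short check uses only those voltages; the 100m estimate assumes that loss in decibels scales in proportion to cable length.

```python
import math


def loss_db(v_in, v_out):
    """Voltage loss between cable input and output, in decibels."""
    return 20 * math.log10(v_in / v_out)


aes3_db = loss_db(7.0, 0.2)               # largest AES3 swing vs minimum at far end
aes3id_db = loss_db(1.0, 0.32)            # AES-3id figures, over 1000 m
aes3id_100m_db = aes3id_db * 100 / 1000   # assumes loss in dB scales with length

print(f"AES3:            {aes3_db:.1f} dB "
      f"({(1 - 0.2 / 7.0) * 100:.1f}% of the voltage lost)")
print(f"AES-3id, 1000 m: {aes3id_db:.1f} dB")
print(f"AES-3id, 100 m:  {aes3id_100m_db:.1f} dB")
```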

Large voltage swing is a disadvantage

A potential advantage of a large input signal is that it could improve the SNR relative to any noise induced in the cable. Induced noise is a significant concern in studio applications where engineers have to run long cables in close proximity to each other. The cables might approach power lines; there might be lights with dimmers in the studio. The environment in consumer applications probably has less EMI, but in any case the shorter cables typical of consumer applications minimize the opportunity for EMI to infiltrate the cable. Thus, any benefit to noise immunity of a larger voltage swing is minimal.

A significant disadvantage of a large voltage swing is that properly driving a cable with a higher voltage requires special circuitry. A high-bandwidth line driver is necessary in any implementation to drive the capacitive load presented by a cable over the bandwidth required for HD audio. However, line driver chips typically operate from a 5V power supply and deliver a voltage swing of 2V. Thus, such chips would need help from an output stage of discrete components to realize a voltage swing close to the 7V permitted by AES3. That circuitry would require a power supply with higher voltage – 15V, most likely. Driving the capacitive load of the cable over a wider voltage swing would require proportionately higher current drive capability (7x voltage swing requires 7x current drive). The higher capacitance of STP further increases the current drive requirements (to something like 10x). A discrete output stage would also be responsible for providing the necessary current drive. Any manufacturer claiming to provide a higher output voltage without using special active circuitry is not delivering a signal with sufficient bandwidth for HD audio – perhaps not even for Red Book audio. The consequence will be higher jitter. Demand to see an eye pattern such as the one we provide above.
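
The scaling of drive current follows from I = C × dV/dt: for a fixed transition time, the current needed to charge a capacitive load grows with both the voltage swing and the capacitance. In the sketch below, the 1.4x capacitance ratio for STP relative to coax is a hypothetical value, chosen only because it roughly reproduces the "something like 10x" estimate above.

```python
def drive_current_ratio(voltage_ratio, capacitance_ratio=1.0):
    """Relative peak current needed to charge a capacitive load in the same time.

    From I = C * dV/dt: for a fixed transition time, current scales with both
    the voltage swing and the load capacitance.
    """
    return voltage_ratio * capacitance_ratio


swing_only = drive_current_ratio(7.0)
# Hypothetical: STP with about 1.4x the capacitance of coax of the same length.
with_stp = drive_current_ratio(7.0, capacitance_ratio=1.4)

print(f"7x swing, same cable:       {swing_only:.1f}x current")
print(f"7x swing, 1.4x capacitance: {with_stp:.1f}x current")
```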

Signals are differential in coax as well as STP

AES3 specifies the use of a differential signal in the cable. However, the galvanic isolation at the output of the BitScrubber board means that the BitScrubber board drives the two conductors of a coax cable differentially. Likewise, the input of a properly designed DAC will treat the two conductors of the coax as a differential signal. Thus, in practice there will be little difference between versions of the standard in this regard.

Short cables obviate differences

The short cable lengths characteristic of consumer applications minimize the effects of differences between the AES3 and AES-3id standards. The higher voltage of AES3 connections has little impact on noise immunity because short cables provide little opportunity for EMI to infiltrate the cable. Properly designed inputs and outputs treat signals as differential whether the cable provides twisted pair or not. The higher losses in AES3 cables probably won't matter much in a short cable, but if they do then they are a disadvantage for AES3. Reflections caused by the indeterminate impedance of the XLR connector won't matter much in short cables. The only factors that are really significant are the practical ones: BNC connectors are smaller, they are easier to attach to the cable, and they are easier to use. Audiophiles who want to err on the side of caution should favor AES-3id for its lower losses and higher bandwidth because those advantages might reduce jitter.

Conclusion

The engineers who designed the original AES/EBU standard probably used XLR cables because they were common in professional audio applications for connecting analog components. The engineers probably didn't have time to think about whether the cables were a good choice for digital audio until much later, by which time it was too late to change. Moreover, some of the problems didn't become serious until we started using AES/EBU to convey digital audio signals with a sample rate of 2-4 times the sample rate of the original specification. As noted above, a signal with a sample rate of 192kHz requires a bandwidth of at least 125MHz; that figure reflects the rule that cables must have a bandwidth of at least 5x the symbol rate to produce a good eye pattern (the green trace in the figure above). The bandwidth for which XLR cables were originally intended was 20kHz, a bandwidth that is over 6000 times smaller. Coax cables with BNC connectors were designed for radio broadcasting and video, so they were always meant for high bandwidth. Using XLR cables for digital audio was a mistake. The professional audio industry corrected the mistake by introducing a new standard, AES-3id, 10 years later. 20 years after that, the consumer audio industry still has not caught on.