Audio Video Bridging (AVB) Ethernet implementation with XMOS
XMOS Ethernet AVB implementation with an XMOS controller
In spite of the constant growth of bandwidth and improvements in quality of service, Ethernet still lacks real-time capabilities. This flaw becomes particularly apparent in the transmission of multimedia content, which requires packet delivery that is as synchronous as possible. This may now change, as the “AVB” standard extension allows the deterministic delivery of packets in local networks. This is useful for the distribution of high-quality audio, and indispensable when audio and video have to be delivered in lip sync.
With 802.1BA for synchronous Ethernet
The new IEEE standard 802.1BA or Audio Video Bridging (AVB), currently only available in draft form, essentially consists of three extensions that make Ethernet real-time-capable in a backward-compatible manner. 802.1AS delivers the necessary accurate time information, 802.1Qav describes the implementation of QoS by means of queuing and forwarding rules, while 802.1Qat defines the reservation and allocation of transmission resources and the identification of media streams. All these standards ultimately extend the media access procedure at layer 2 of the OSI model, and in practice consequently lead to changes in the Media Access Controllers (MAC) of NICs. In addition, setting up an AVB domain requires a slight extension of the “Link Layer Discovery Protocol” (LLDP), a layer-2 protocol used by network devices to communicate their technical characteristics and identification. AVB uses LLDP to evaluate the suitability of a network device and to determine the members of an AVB cloud. To this end, a new data unit (DU) format has been defined with specific AVB TLVs (Type Length Value).
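The TLV structure used by LLDP can be sketched in a few lines of Python. The header layout (7-bit type, 9-bit length) and the organizationally specific TLV type 127 with a 3-byte OUI and a 1-byte subtype follow the general LLDP format. The subtype value and the one-byte "AVB-capable" flag below are hypothetical placeholders, not the actual AVB TLV contents.

```python
import struct

ORG_SPECIFIC_TLV = 127                  # LLDP type for organizationally specific TLVs
IEEE_8021_OUI = bytes.fromhex("0080c2")
HYPOTHETICAL_AVB_SUBTYPE = 0xFF         # placeholder, not taken from the standard


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Pack one LLDP TLV: 7-bit type and 9-bit length, followed by the value."""
    if not 0 <= tlv_type <= 127:
        raise ValueError("TLV type must fit into 7 bits")
    if len(value) > 511:
        raise ValueError("TLV value must fit into a 9-bit length field")
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value


def decode_tlv(data: bytes) -> tuple[int, bytes]:
    """Unpack the first TLV in data and return (type, value)."""
    (header,) = struct.unpack("!H", data[:2])
    length = header & 0x1FF
    return header >> 9, data[2:2 + length]


# Hypothetical payload: OUI, subtype and a single "AVB-capable" flag byte.
payload = IEEE_8021_OUI + bytes([HYPOTHETICAL_AVB_SUBTYPE]) + b"\x01"
tlv = encode_tlv(ORG_SPECIFIC_TLV, payload)
```

A receiver would walk through the TLVs of a received data unit with `decode_tlv` and look for the AVB-specific entries.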
Setting up the AVB cloud
The term “AVB cloud”, frequently found in the literature, is somewhat misleading, as it evokes associations with the Internet cloud, which now has a global dimension and is understood by many as “more of a prediction than a definition”. The concept of the “AVB cloud”, however, can be defined easily: it describes a domain whose members operate according to the same clock using the Precision Time Protocol (PTP). Because layer 2 has no superordinate server that searches the LAN for AVB-capable devices, the participants themselves determine the borders of their domain by exchanging LLDP messages.
After an AVB device has been powered up or connected to an Ethernet LAN, and after the subsequent link negotiation (automatic determination of the maximum common transmission rate), it is clear whether the link partner supports full duplex, which is mandatory for AVB. AVB does not support “shared media” systems such as 10BASE2 on 50-ohm coaxial cable. However, wireless network devices with 802.11 should be suitable AVB participants, at least in the final version of 802.1BA.
Subsequently, the device exchanges LLDP data units with its link partners to make sure that both sides are “AVB-capable”, i.e. equipped with the necessary MAC extensions. Once this process has been completed on all ports of a local network, the extent of an AVB domain is established without any central “intelligence”. For example, AVB bridges that have only one AVB-capable link partner automatically form a border post and thus establish the border line. A single switch without AVB support in a larger AVB network is therefore enough to split the network into two domains.
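How a single non-AVB switch splits the network can be illustrated with a small Python sketch. It is a simplified model, not the actual protocol: each device is reduced to one "AVB-capable" flag, and a link belongs to a domain only if both ends carry that flag. The device names are made up for the example.

```python
from collections import defaultdict


def avb_domains(avb_capable: dict[str, bool], links: list[tuple[str, str]]) -> list[set[str]]:
    """Group AVB-capable devices into domains connected by AVB-capable links."""
    adjacency = defaultdict(set)
    for a, b in links:
        # A link is part of a domain only if both partners are AVB-capable.
        if avb_capable.get(a) and avb_capable.get(b):
            adjacency[a].add(b)
            adjacency[b].add(a)

    domains, seen = [], set()
    for device in sorted(name for name, ok in avb_capable.items() if ok):
        if device in seen:
            continue
        members, stack = set(), [device]
        while stack:                      # depth-first walk over AVB links
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        domains.append(members)
    return domains


capable = {"talker": True, "switch1": True, "legacy_switch": False,
           "switch2": True, "listener": True}
links = [("talker", "switch1"), ("switch1", "legacy_switch"),
         ("legacy_switch", "switch2"), ("switch2", "listener")]
domains = avb_domains(capable, links)
```

In this example the legacy switch in the middle leaves the talker and the listener in two separate domains, so no AVB stream can be set up between them.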
As the third step, the participants have to be synchronized: a special selection procedure determines a “clock grandmaster”, which subsequently provides the common clock for the network.
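The selection procedure can be sketched as follows. This is a strongly simplified version of the data set comparison used by the Best Master Clock algorithm: the clock attributes are compared in a fixed order and the lowest value wins, with the clock identity as the final tie-breaker. Topology-related steps are left out, and all names and values in the example are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDataset:
    name: str
    priority1: int
    clock_class: int
    clock_accuracy: int
    variance: int
    priority2: int
    clock_identity: int

    def rank(self) -> tuple:
        # Lower values are better; the fields are compared from left to right.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.clock_identity)


def elect_grandmaster(clocks: list[ClockDataset]) -> ClockDataset:
    """Return the clock that wins the simplified comparison."""
    return min(clocks, key=ClockDataset.rank)


clocks = [
    ClockDataset("box-a", 246, 248, 0xFE, 0xFFFF, 248, 0x01),
    ClockDataset("box-b", 246, 248, 0xFE, 0xFFFF, 248, 0x02),
    ClockDataset("studio-clock", 128, 6, 0x21, 0x4E5D, 128, 0x03),
]
grandmaster = elect_grandmaster(clocks)
```

If the best clock is removed, the remaining devices repeat the comparison; between two otherwise identical boxes, the lower clock identity decides.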
Media transport
To enable the flexible exchange of multimedia content within an existing AVB cloud, media streams in various formats need to be packaged in a standardized way. For audio streams, for example, the sample rate, the number of channels, and sometimes even the codec used must be provided as supplementary information, while video requires attributes such as the resolution (SD, HD) or the compression method.
This purpose is served by the new IEEE P1722 standard (currently draft version V1.7), which builds on AVB and carries the following media protocols:
- 61883-2: SD-DVCR data transmission
- 61883-4: MPEG2-TS data transmission
- 61883-6: Audio and music data transmission protocol
- 61883-7: Transmission of ITU-R BO.1294 System B
- 61883-8: Transmission of ITU-R BT.601 style Digital Video Data
- IIDC: Instrumentation and Industrial Control Digital Camera
P1722 also determines how media streams are synchronized and rendered, based on the so-called presentation time and the global 802.1AS time. The AVBTP presentation time must not be confused with the global time stamp of an 802.1AS packet.
Presentation time
Presentation time is the time at which a media sample enters the talker plus the max transit time, which accounts for the latency of the network path. Max transit time, the maximum end-to-end delay, amounts to 2 ms for AVB class A traffic and 50 ms for class B. However, the talker and the listener may negotiate a lower max transit time where the network allows it, e.g. on 1000BASE-T links.
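The calculation can be written down as a minimal Python sketch. The class limits of 2 ms and 50 ms come from the text above; the function and parameter names are chosen for this example only.

```python
NS_PER_MS = 1_000_000

# Default max transit time per traffic class, as given in the text.
MAX_TRANSIT_NS = {"A": 2 * NS_PER_MS, "B": 50 * NS_PER_MS}


def presentation_time_ns(ingress_time_ns, traffic_class, negotiated_transit_ns=None):
    """Time at which the listener should render a sample.

    ingress_time_ns: global time at which the sample entered the talker.
    negotiated_transit_ns: optional lower transit time agreed between
    talker and listener (hypothetical parameter name).
    """
    limit = MAX_TRANSIT_NS[traffic_class]
    transit = limit if negotiated_transit_ns is None else negotiated_transit_ns
    if transit > limit:
        raise ValueError("negotiated transit time exceeds the class limit")
    return ingress_time_ns + transit


class_a = presentation_time_ns(1_000_000_000, "A")
class_b = presentation_time_ns(1_000_000_000, "B")
fast_link = presentation_time_ns(1_000_000_000, "A", negotiated_transit_ns=500_000)
```

A sample captured at global time 1 s is therefore rendered at 1.002 s in class A, at 1.050 s in class B, and at 1.0005 s if both sides have agreed on 0.5 ms.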
Implementation on the XMOS processor
Ethernet AVB, along with its substandards, requires specific extensions of the media access controller of all participants, as well as support for digital media recording and playback. Embedded processors are generally suitable for the implementation, but they need to be largely deterministic, i.e. real-time-capable, to meet the requirements of synchronous Ethernet.
Architectures based on state machines, which can be implemented e.g. in FPGAs and CPLDs, naturally satisfy this requirement; however, writing the necessary function blocks as HDL code is quite cumbersome. The ideal option would therefore be a highly deterministic processor that can be programmed in a standard high-level language, like the “event-driven processor” (EDP) of the so-called XCore architecture that XMOS has been offering for some time.
Event-driven processor
The idea is basically very simple: define as many functions as possible in software, even those that would normally be realized in hardware, and let them run on an array of processor cores. Each task occupies a thread, and eight threads run in a fixed cycle on each core. The threads cannot influence one another, so the duration of any given action can already be predicted during development.
This high degree of determinism is a key property of the chips, which enables developers to define not only system processes like protocol stacks, but also hardware interfaces in a shared environment. This integrated hardware/software development flow is based on a modified C compiler named XMOS-C (XC), in practice a C extension that provides access to and control of the multi-threading and I/O resources.
Here are the key properties of XMOS processors:
- Each XCore can process up to 8 threads simultaneously at a speed of 400 MIPS, and each thread has its own register file, which effectively makes it a logical core.
- The 8 threads share 64 kbytes of unified memory without access conflicts.
- Integer and fixed-point operations are enabled to facilitate efficient signal processing and cryptographic operations.
- There are 64 GPIO pins, programmable through software.
- The threads are absolutely deterministic, and, as a result, each thread allows the implementation of a “hard” real-time task, irrespective of the behaviour of other tasks.
- I/O pins are grouped as logical ports with a width of 1, 4, 8, 16 and 32 bits. Additionally, each port features a SerDes, synchronization options with external interfaces and accurate timing.
- Each XCore contains 8 timers, capable of measuring time relative to a 100 MHz reference clock.
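A 100 MHz timer reference corresponds to a resolution of 10 ns per tick. The following Python sketch shows how an interval would be derived from two timer readings. It assumes a free-running 32-bit counter that wraps around; the counter width is an assumption made for this example and is not stated above.

```python
TIMER_HZ = 100_000_000                      # 100 MHz timer reference
NS_PER_TICK = 1_000_000_000 // TIMER_HZ     # 10 ns per tick
COUNTER_MASK = 0xFFFFFFFF                   # assumed 32-bit free-running counter


def elapsed_ticks(start: int, end: int) -> int:
    """Ticks between two readings, correct across one counter wrap-around."""
    return (end - start) & COUNTER_MASK


def ticks_to_ns(ticks: int) -> int:
    """Convert timer ticks to nanoseconds."""
    return ticks * NS_PER_TICK


# Two readings taken shortly before and shortly after the counter wraps.
interval_ns = ticks_to_ns(elapsed_ticks(0xFFFFFFF0, 0x00000010))
```

Masking the difference keeps the result correct even when the second reading is numerically smaller than the first.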
These features make the XMOS processor well suited for the implementation of AVB. While the first generation of XMOS devices only supported Fast Ethernet, the second generation of EDPs now makes it possible to provide AVB participants with Gigabit Ethernet connectivity.
Communication channels
The data communication required between the threads is implemented via so-called channels, whose channel ends may be situated on the same core, on different cores of the same chip, or even on different chips, connected by a link structure called XLink. During implementation, it therefore makes no fundamental difference to the software developer which physical sections a channel is made up of; the same notation is used in every case to transmit messages consisting of data and control tokens.
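The channel concept can be approximated in Python, purely as an analogy: two threads exchange messages over a queue, and each message is tagged either as data or as a control token. This is not XC code and does not use any XMOS API; `queue.Queue` merely stands in for a channel, and the tags are made up for the example.

```python
import queue
import threading


def producer(channel: queue.Queue, samples: list[int]) -> None:
    """Send each sample as a data token, then a control token to end the stream."""
    for sample in samples:
        channel.put(("data", sample))
    channel.put(("ctrl", "end"))


def consumer(channel: queue.Queue, out: list[int]) -> None:
    """Receive tokens until the control token arrives."""
    while True:
        kind, value = channel.get()
        if kind == "ctrl":
            break
        out.append(value * 2)   # stand-in for real processing


channel = queue.Queue(maxsize=1)   # small buffer keeps both ends in step
received: list[int] = []
threads = [
    threading.Thread(target=producer, args=(channel, [1, 2, 3])),
    threading.Thread(target=consumer, args=(channel, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

As with the channels described above, neither thread needs to know where its counterpart runs; it only sees its own end of the connection.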
Design flow
The development environment, available for Windows, Linux and Mac operating systems, can be obtained free of charge by registering on the XMOS website. According to XMOS, all working C programs compile without problems in the XC environment, including those that integrate XMOS-specific include libraries. If the code fits the footprint of the chip, it is executed there, and the component behaves like dedicated hardware. XC is a programming language aimed at concurrency and real-time behaviour, which has been developed specifically for the XS1-G architecture.
XC programs are easy to write and debug – no deadlocks, no race conditions, no memory violations.
XC uses the described channels to realize fast bi-directional communication and synchronization between the end points of channels. End points may be located either inside a thread on the same processor, or on two processors of the same chip, or even on the processors of two different chips.
The XS1 processor architecture implements hardware functions as software, so hardware solutions are in this respect no more than software libraries. They can be compiled and linked like standard C code, and the resulting binary file is loaded into and executed on the processor.
Accordingly, the AVB reference design is a library of source files that can be integrated in larger applications. Naturally, the reference design also contains project templates and sample programs for development support and documentation.
Development platforms
As mentioned earlier, the entire software development environment can be obtained free of charge from the XMOS website. Once registered, you are granted access to tools and sample programs, as well as the Xlinkers site, which is the official XMOS web community. The site includes a large number of projects, programs and code snippets, as well as blogs offering an opportunity to discuss problems and solutions, making the life of developers easier.
On the hardware side, XMOS offers several evaluation boards based on the XS1-G4 and the new single-core device XS1-L1. For the AVB implementation, XMOS uses the XS1-G evaluation system, a break-out box in a shiny black housing with a QVGA LCD and the look and feel of a modder PC. It is comprehensively equipped with LVDS interfaces, audio I/O, Ethernet, an SD card slot and XIO ports with pin headers.
Upon delivery, a few sample programs are already installed, which can be launched from the on-screen menu or the keyboard. In addition to “Pong” and “Mandelbrot”, there is also an audio spectrum analyzer that processes the audio input of the evaluation system in real time and displays it on screen as a frequency diagram.
Ethernet AVB
To present its reference applications, XMOS has recorded short videos and placed them on its website. One of them, the Ethernet AVB demo, shows the implementation of the Ethernet AVB standard together with IEEE 1722 and IEEE 1588. The reference software is also available free of charge as source code on the website, and can therefore be freely used for your own projects.
The reference design essentially consists of the following three components:
- Ethernet MAC and MII interface
- Implementation of the Precision Time Protocol (PTP, IEEE 1588 or 802.1AS)
- Audio streaming component (talker and listener)
Ethernet MAC component
The MAC component, a key element of the AVB implementation, packages data in Ethernet frames and, on the transmit side, delivers them to the MII interface; on the receive side, it collects the incoming frames and sorts them according to content and header information. The PHYs are not integrated in the processor and need to be implemented externally with the required galvanic isolation (XMOS uses the LAN8700 PHY from SMSC for this purpose). The component is made up of six threads, which must run on the same core and communicate with other components and threads via channels. So-called filters let you control which data is forwarded over these links. The channels are set up and configured through a client C API; for details, see the corresponding header files. In addition to its familiar functions, the MAC component supports two features that are important for AVB:
- Time stamping allows Ethernet frames to be received and sent with time stamps based on a clock signal, e.g. a 100 MHz clock with a resolution of 10 ns.
- Flow control implements the 802.1Qav standard, which manages traffic of different priority classes separately and limits the bandwidth available to each of them.
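The bandwidth limitation of 802.1Qav is based on a credit-based shaper. The following Python sketch models a single queue in strongly simplified form: all frames are already waiting at time zero, no other traffic interferes, and a frame may only start when the credit is not negative. The variable names follow the usual idle slope and send slope terminology; the figures are illustrative and not taken from the XMOS implementation.

```python
def shaped_start_times(frame_bits, port_rate_bps, idle_slope_bps):
    """Return the transmission start time in seconds of each queued frame."""
    send_slope = idle_slope_bps - port_rate_bps   # negative: credit drains while sending
    credit = 0.0                                  # in bits
    now = 0.0
    starts = []
    for bits in frame_bits:
        if credit < 0:
            # Wait until the credit has recovered to zero at the idle slope.
            now += -credit / idle_slope_bps
            credit = 0.0
        starts.append(now)
        duration = bits / port_rate_bps
        credit += send_slope * duration
        now += duration
    return starts


# Three 1000-bit frames on a 100 Mbit/s port with 25 Mbit/s reserved.
starts = shaped_start_times([1000, 1000, 1000], 100e6, 25e6)
```

Each frame occupies the line for 10 µs and is followed by a 30 µs pause, so the stream averages exactly the reserved 25 Mbit/s instead of sending in bursts.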
PTP component
The Precision Time Protocol (PTP) component delivers network-wide time information for the synchronization of all participants of an AVB cloud. In its reference system, XMOS provides implementations of both the IEEE 1588 (V2) standard and the new 802.1AS standard in a single package, which can optionally be integrated in the application.
This component consists of only two threads, is connected to the Ethernet MAC, and offers so-called channel ends through which clients can query time information. It interprets PTP packets from the MAC and makes the global time information available. At run time, its role as grandmaster or slave is determined in accordance with 802.1AS and the Best Master Clock algorithm. As grandmaster, it delivers global time information to all participants of the AVB cloud and thus acts as the central clock; as slave, it acquires time information from an external grandmaster.
The connection between the local time reference and the global time of the AVB cloud is ensured via the channels, allowing the client to either set very precise time stamps when sending media packets, or create a very accurate sample clock when receiving streams.
Audio stream component
The audio stream component is connected to the software MAC and works bi-directionally to receive and send audio samples. As audio talker, it takes digitized audio streams from the I2S interface, bundles them into IEEE 1722-compliant packets, and forwards them via Ethernet. As audio listener, it takes an audio stream from the network, unpacks the packets, and sends the audio samples to an audio DAC through the I2S interface.
The talker
The talker component is basically composed of two threads: a buffer for audio samples from the I2S interface, and a packetizer, which assembles the payload of the Ethernet packets from individual samples according to the IEEE 1722 standard. The data are then passed on to the MAC component, which builds complete Ethernet frames from them.
While the Ethernet packet is being assembled, the talker generates time stamps from the global time provided by the PTP component plus an offset that compensates for transit delays inside the AVB cloud. The time stamps thus define the “presentation time”, i.e. the times at which the audio samples need to be rendered at the listener.
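The work of the packetizer can be sketched in Python as follows. The sketch assumes a 48 kHz stream with 6 samples per packet (one packet every 125 µs) and a 2 ms offset; these figures are assumptions chosen for the example, as are the dictionary layout and all names. The real packet format defined by IEEE 1722 is not reproduced here.

```python
NS_PER_S = 1_000_000_000


def packetize(samples, first_sample_time_ns, sample_rate_hz=48_000,
              samples_per_packet=6, offset_ns=2_000_000):
    """Group samples into packets and stamp each with a presentation time."""
    packets = []
    for start in range(0, len(samples), samples_per_packet):
        # Global time at which the first sample of this packet was captured.
        capture_ns = first_sample_time_ns + start * NS_PER_S // sample_rate_hz
        packets.append({
            "presentation_time_ns": capture_ns + offset_ns,
            "samples": samples[start:start + samples_per_packet],
        })
    return packets


packets = packetize(list(range(12)), first_sample_time_ns=0)
```

The two resulting packets carry presentation times 125 µs apart, which is exactly the duration of six samples at 48 kHz.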
The listener
The listener takes the incoming IEEE P1722 packets from the MAC and converts them into a stream of media samples, which are in turn fed into a buffer thread. As the audio stream was produced on another system, i.e. based on another clock generator with a slightly different frequency, the playback clock frequency must be adjusted.
This is the job of an audio clock recovery thread, which determines the correct clock rate using one of several methods and controls an internal or external clock generator accordingly. The XMOS reference system offers three synchronization variants. The standard-compliant one uses the IEEE 1722 time stamps: for each packet, the thread determines the correct clock rate from the difference between successive time stamps and the number of audio data words, and forwards control information to the selected generator. Alternatively, the playback clock can be synchronized with an external signal, or derived from the global PTP time and some additional information. As soon as the sampling rate has been adjusted, the listener can correctly dimension its output buffer so that the media stream is played back exactly at the presentation time and synchronously with other AVB participants.
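The time-stamp-based variant can be illustrated with a short Python sketch: the talker's sample rate follows from the number of audio words received and the time span covered by the time stamps, and the deviation from the local nominal rate tells the clock generator how far to adjust. The function names and the averaging over several packets are choices made for this example, not details of the XMOS implementation.

```python
def recovered_rate_hz(timestamps_ns, samples_per_packet):
    """Estimate the talker's sample rate from successive packet time stamps."""
    span_ns = timestamps_ns[-1] - timestamps_ns[0]
    sample_count = (len(timestamps_ns) - 1) * samples_per_packet
    return sample_count * 1e9 / span_ns


def correction_ppm(recovered_hz, nominal_hz):
    """Deviation of the recovered rate from the local nominal rate in ppm."""
    return (recovered_hz - nominal_hz) / nominal_hz * 1e6


# Talker running exactly at 48 kHz: 6 samples every 125 µs.
exact = recovered_rate_hz([0, 125_000, 250_000], samples_per_packet=6)

# Talker running slightly fast: 6 samples every 124.99 µs.
fast = recovered_rate_hz([0, 124_990, 249_980], samples_per_packet=6)
adjust = correction_ppm(fast, 48_000)
```

In the second case the listener would have to speed up its playback clock by about 80 ppm to keep its buffer level constant.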
System design
The MAC, PTP and audio interface components require a total of 13 threads and a mere 80 kbytes of memory. With the 32 threads and 256 kbytes of memory of the XS1-G4 processor, this leaves 19 threads (roughly 60%) and 176 kbytes (almost 70%) of the chip resources for applications, e.g. protocol implementations for network configuration, user interfaces, audio processing routines, or simply additional audio stream instances.
You can see the distribution of components among the 4 processor cores in Figure 16. Certain time-critical threads must run on a common core to ensure the shortest link delays.
Demo system
The configuration of the XMOS AVB reference system makes it possible to build an AVB cloud with four participants, where each box can function as a talker or a listener. The necessary software is stored on an SD card in each box and can be selected via a menu, with the program alternatives “AVB_T.XB” for use as a talker and “AVB_L.XB” for use as a listener. Audio signals are fed into the boxes through the line-in jacks, and can be output to active loudspeakers via line-out.
The central Ethernet switch makes it possible to monitor data traffic at the Ethernet level via a mirror port, e.g. using a packet sniffer like the well-known Wireshark.
During operation, the boxes display a frequency diagram of the audio signals fed in and transmitted; in the thread diagram, this function appears as the FFT & Buffer component. This is another impressive demonstration that threads on this processor do not influence one another, and that they preserve their real-time capabilities even under load.






















