Evaluating encoders for Live Smooth Streaming
March 16, 2012
As part of a recent project, we designed the infrastructure for delivering TV channels over the Internet in an over-the-top TV solution. One of our tasks was to evaluate several encoders available on the market. I gathered a lot of useful data during that evaluation and will share it here.
This article is written primarily to help you make an informed choice in a similar situation – I will outline the key factors that deserve special attention and describe the potential issues you may encounter when looking for a suitable encoder.
In the interest of fairness – products evolve over time and what was true when we performed our evaluation might no longer be true today – I will not name the encoders that I evaluated, with the exception of the best performing one.
The input video was a set of live standard definition television channels delivered over a satellite feed and packaged in an MPEG-2 transport stream. Our solution had to receive the channels, deliver them to encoders as IP streams over UDP multicast, process them and publish output in the form of a PlayReady-protected Live Smooth Streaming presentation.
We chose to encode the output using the H.264 and AAC-LC codecs to achieve the best compatibility with our target platforms. In addition, the video was protected with Microsoft PlayReady, so that it could be used together with our SilverHD DRM service. Since the target audience was home users who might have highly variable bandwidth conditions, we chose Smooth Streaming as the underlying transport technology. We have had very good results with it in the past – Smooth Streaming can very effectively ensure that a stream with the optimal video quality is delivered to the end-user.
These constraints primarily defined the selection of encoders that we evaluated, so the advice in this article might not apply to other scenarios. Our evaluation was performed with real TV channels coming over a satellite feed and the output was transmitted internationally over channels that would mirror our real architecture.
Key factor: reliability of the encoder
Our TV channel broadcast solution was designed to operate 24/7, without any interruptions in service. This set high expectations for reliability – I wanted to configure the encoder and not need to touch it thereafter. Barring any network connectivity problems, it was critical for the video feed to stay up indefinitely.
Real-life testing over several months resulted in a greatly disappointing outcome – only one evaluated encoder was able to stay functional for a whole week! All of the others failed after 2-3 days of encoding (at best) and required human intervention to correct the situation. The failure rate was also dependent on the input signal – some TV channels caused one encoder to fail very quickly, generally within 10 hours.
I recommend you conduct similar tests using real-life video input for any encoders you consider for use, since an unreliable encoder will have a large negative effect on service quality and greatly raise administration costs. The rate of failure will certainly depend on the input video and encoder configuration, so our results might not directly carry over to another scenario.
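When comparing failure rates across encoders, it helps to record how clean the input actually was during each test run. The following is a minimal sketch, not part of our evaluation tooling, that counts continuity counter gaps in a captured buffer of MPEG-2 transport stream packets. The function name is hypothetical, and the sketch assumes the buffer starts on a packet boundary; it ignores the duplicate-packet and discontinuity-indicator rules that a complete analyzer would honor.

```python
from typing import Tuple

TS_PACKET_SIZE = 188   # fixed size of an MPEG-2 transport stream packet
SYNC_BYTE = 0x47       # first byte of every TS packet
NULL_PID = 0x1FFF      # stuffing packets, which carry no continuity information


def count_continuity_errors(data: bytes) -> Tuple[int, int]:
    """Return (packets_checked, continuity_errors) for a buffer of TS packets."""
    last_counter = {}  # PID -> last continuity counter seen on that PID
    checked = 0
    errors = 0
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            continue  # not aligned on a packet; skipped in this sketch
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        has_payload = bool(packet[3] & 0x10)
        counter = packet[3] & 0x0F
        # The counter only advances on packets that carry a payload.
        if pid == NULL_PID or not has_payload:
            continue
        checked += 1
        previous = last_counter.get(pid)
        if previous is not None and counter != (previous + 1) % 16:
            errors += 1  # at least one packet on this PID went missing
        last_counter[pid] = counter
    return checked, errors
```

Each error indicates at least one lost packet on that PID, so the ratio of errors to checked packets is only a lower bound on the real loss rate.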
Key factor: reaction to failures
The abovementioned reliability problems occurred with a near-perfect input signal (less than 0.0001% packet loss), without any hard failures such as disruptions of network or satellite connectivity. When I deliberately introduced such failures (e.g. by physically disconnecting the receiver), I discovered that around half of the encoders simply stopped encoding at this point! This is unacceptable, since it adds considerable administration overhead. Occasional signal failures are bound to happen and should just be ignored by the encoders.
A far more useful behavior was seen with other devices, which simply paused the broadcast and automatically resumed once the input signal was restored. One encoder replayed the last 30 seconds when this happened, which does not seem to add any value – I would prefer a loss of output signal, which can be detected by the player software, allowing appropriate error information to be displayed.
I encourage you to see what happens when the input signal or output connection disappears for an encoder candidate. Ideally, everything should resume automatically once the failure is corrected.
If the encoder does not resume operation automatically, you need to make sure you have a setup that allows you to manually detect and fix these errors (possibly using scripted automation). Note that while several vendors offer “encoder farm management software”, the impression I have is that these do not offer significant additional functionality and are likely to be as problematic as a constantly failing encoder itself. Our solution architecture did not use any such product provided by the manufacturer and was instead designed to use custom management software to drive the encoder farm workflows.
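As an illustration of the kind of scripted detection mentioned above, here is a minimal sketch of a stall detector. It is not our actual management software: the class name, the 30-second threshold and the idea of calling `record_output` whenever a channel publishes new output are all assumptions made for the example. How you observe output and how you restart an encoder depend entirely on your setup and are left out.

```python
import time


class ChannelWatchdog:
    """Tracks when each channel last produced output and reports stalls."""

    def __init__(self, stall_seconds=30.0, clock=time.monotonic):
        self._stall_seconds = stall_seconds  # silence longer than this is a stall
        self._clock = clock                  # injectable so the logic can be tested
        self._last_output = {}               # channel name -> time of last output

    def record_output(self, channel):
        """Call this whenever new output is observed for a channel."""
        self._last_output[channel] = self._clock()

    def stalled_channels(self):
        """Return the sorted names of channels that have been silent too long."""
        now = self._clock()
        return sorted(
            channel
            for channel, seen in self._last_output.items()
            if now - seen > self._stall_seconds
        )
```

A monitoring loop would call `stalled_channels()` periodically and then alert an administrator or trigger whatever restart mechanism the encoder offers.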
Key factor: variable aspect ratio compensation
This concern may only be relevant when you are streaming TV channels, but there it is critically important. The main concern is that TV channels show content that has multiple aspect ratios in the same video feed. For example, a movie may be in 16:9 but then a 4:3 advertisement is displayed. The way this is handled for most channels is that every video frame contains information about its presentation (e.g. “Show this frame in 16:9”) and this information drives whatever device is using the signal.
For an encoder to be acceptable for use in our solution, it needed to use this information and compensate for the change in aspect ratio by adjusting the letterboxing and pillarboxing of the output picture. That is, since the output aspect ratio was fixed at 16:9, the encoder had to add black borders to the sides when showing 4:3 content, otherwise the picture would be stretched.
While all but one of the evaluated encoders claimed to support this, in fact only one encoder was capable of performing the required compensation! The others that claimed to do it had a fatal defect – they did the proper manipulation of the picture… but only at the start of the encode process! If I started the encode when a 4:3 advertisement was playing, it correctly added the black borders… and a few minutes later when the 16:9 movie started playing, the black borders still remained! The encoders were not able to detect changes in aspect ratio at runtime, only on startup.
When you are looking for a TV channel encoder, pay special attention to whether it compensates dynamically for aspect ratio changes as they happen or only when starting the encode. The latter encoders are not usable for TV channel scenarios.
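The compensation itself is simple arithmetic, which makes it all the more surprising that so few encoders apply it continuously. The sketch below is only an illustration under my own assumptions (square output pixels, a hypothetical function name, an example 1024x576 output frame); it computes the scaled picture size and the black border widths for content of a given aspect ratio.

```python
from fractions import Fraction


def fit_into_frame(frame_width, frame_height, content_aspect):
    """Return (width, height, pad_x, pad_y) for content placed in a fixed frame.

    pad_x is the black border on each side (pillarboxing) and pad_y is the
    black border on top and bottom (letterboxing).
    """
    frame_aspect = Fraction(frame_width, frame_height)
    if content_aspect < frame_aspect:
        # Content is narrower than the frame: use full height, pad the sides.
        height = frame_height
        width = int(frame_height * content_aspect)
    else:
        # Content is as wide as or wider than the frame: use full width.
        width = frame_width
        height = int(frame_width / content_aspect)
    pad_x = (frame_width - width) // 2
    pad_y = (frame_height - height) // 2
    return width, height, pad_x, pad_y


# 4:3 content in a 16:9 frame gets pillarboxed.
print(fit_into_frame(1024, 576, Fraction(4, 3)))
# 16:9 content fills the same frame with no borders.
print(fit_into_frame(1024, 576, Fraction(16, 9)))
```

An encoder that handles TV channels correctly has to redo this calculation every time the signalled aspect ratio changes, not just once when the encode starts.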
Key factor: TTML subtitles
The target market for our solution was a group of countries in Europe, which meant that many of our TV channels had multilanguage subtitles that had to be delivered to the end-user. The Smooth Streaming technology natively supports multilanguage subtitles delivered using a format called TTML. All that is needed is for the encoder to take the subtitles from the incoming satellite stream and transform them into TTML. Sounds fairly simple – after all, subtitles are mostly only text, right?
Unfortunately, only one of the evaluated encoders supported TTML subtitles, although that encoder worked perfectly with them. Another encoder claimed to support it, but on closer inspection I discovered that it actually does not generate any subtitles no matter what subtitle settings you configure, which was also confirmed by the manufacturer.
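For readers who have not seen the format, TTML is an XML document with timed paragraphs of text. The sketch below builds a minimal one with Python's standard library. It is an illustration only and not what any evaluated encoder does internally: the function names and cue data are made up, and both the extraction of subtitles from the satellite stream and the packaging of TTML into the Smooth Streaming presentation are not shown.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def format_clock_time(seconds):
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_ttml(cues, language):
    """Build a minimal TTML document from (begin, end, text) cues in seconds."""
    ET.register_namespace("", TTML_NS)  # serialize TTML as the default namespace
    tt = ET.Element(f"{{{TTML_NS}}}tt", {f"{{{XML_NS}}}lang": language})
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for begin, end, text in cues:
        paragraph = ET.SubElement(
            div,
            f"{{{TTML_NS}}}p",
            {"begin": format_clock_time(begin), "end": format_clock_time(end)},
        )
        paragraph.text = text
    return ET.tostring(tt, encoding="unicode")


# Illustrative cue; real cues would come from the broadcast subtitle stream.
print(build_ttml([(1.5, 4.0, "Good evening.")], "en"))
```

Real broadcast subtitles also carry styling and positioning information, which is where most of the conversion complexity lies.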
Note that if you use TTML subtitles with Live Smooth Streaming in a Silverlight player that is based on the Microsoft Media Platform v2.6 or earlier, you will need to set some additional configuration options in the player to avoid subtitles breaking after a few hours of video playback. Leave a comment if you are impacted by this and would like to know more.
Key factor: management system usability
Running an encoder farm requires administrative effort and training of personnel, which is a cost that is best minimized. To this end, it is important that the management system be easy to understand without being an expert in digital television broadcasting.
Around half of the encoders we evaluated had very obscure user interfaces in their management system – it was obvious that these applications were very thin layers over a programmatic object model that might make sense for software developers but is largely impenetrable to most people.
I was most impressed by one device that had a web interface that felt “hardware-like”. That is, it almost succeeded in giving me the feeling that I was pressing buttons on a real physical interface, which gives a nice feeling of security and simplicity. For a very illustrative example: it was the only one that had a single prominent on/off button for each channel! This is just one example of UI design that makes it very obvious to control the encoder. Compare this to other encoders that had “Start” and “Stop” buttons (and one also had a “Prepare” button) that were often hidden away in some status screen and that might exist in different states at various times – for example, on one device, when an encode was starting or stopping, all such buttons were disabled. This makes it very simple to connect the UI with the programmatic model underneath, but is not a natural user interface paradigm – so what if something is already happening? Maybe I changed my mind and do not actually want to start it. I should not have to wait to press “Stop”.
Real usage of the devices is the main way to determine whether an encoder is easily usable by administrative personnel or whether extensive training is required (and mistakes easy to make) due to the complexity of the user interface.
A quick rule of thumb might be to see if the UI offers an easy “New presentation/channel” button that quickly gets you started. The more impenetrable user interfaces I encountered required you to start your configuration with less meaningful concepts like “codec profiles”.
Irrelevant factor: picture quality
While encoder manufacturers love to claim great picture quality with highly flexible compression configuration options, the truth is that if you configure the encoders with the same settings, they will all produce a picture quality that is generally not distinguishable by the human eye. Do not use any picture quality claims as the basis for selecting an encoder.
If the product supports your desired video configuration, don’t worry about picture quality. Even though some encoders were highly flexible in their configuration, this did not result in any real benefit – for us this was simply unused complexity. The important settings are fairly basic – what codecs and codec profiles does the encoder support? For example, some evaluated encoders did not support H.264 High profile, which offers best compression at the expense of decoding complexity. Other than the codec, the main determining factor in picture quality is still the video bitrate, which is always configurable.
Even though we did not find any really noticeable differences in picture quality during our evaluations, I should mention that at least one encoder had an important encoding setting set to “Fastest” (i.e. lowest quality) and changing it required digging fairly deep into advanced configuration options. In our testing, I set it to a medium quality level which seemed to roughly match the options provided by the other devices. Do not trust the default settings to be appropriate when you are evaluating an encoder!
Be suspicious of performance claims
Naturally, almost every encoder manufacturer has happy performance claims on their websites, mostly in the style of “We can process n HD channels per encoder”. However, in my experience, these claims only apply if you are using very lightweight and low quality settings which tend to be the “recommended” configuration.
Using realistic settings, the best encoder could process 3 SD channels and the others 2 SD channels. With HD picture quality, a realistic estimate would be 1 channel per device; 2 if you are feeling optimistic. In our evaluation, all encoders had roughly similar performance characteristics – no matter whether their encoding was performed on the CPU or the GPU.
Pay attention to the difference between “streams” and “channels” – in a Smooth Streaming presentation, a single video channel will consist of multiple streams (five in our case). If the manufacturer gives the estimated performance in video streams, be sure to re-calculate this into actual video channels.
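The conversion is trivial but easy to forget when reading a datasheet. A minimal sketch follows; the function names are hypothetical, and the five streams per channel reflect our configuration while the other figures are purely illustrative.

```python
def channels_from_streams(claimed_streams, streams_per_channel=5):
    """Convert a per-encoder stream count into whole Smooth Streaming channels."""
    return claimed_streams // streams_per_channel


def encoders_needed(channel_count, channels_per_encoder):
    """Number of encoders required for a channel lineup, rounded up."""
    return -(-channel_count // channels_per_encoder)  # ceiling division


# A claim of "10 streams per encoder" covers only 2 channels at 5 streams each.
print(channels_from_streams(10))
# A hypothetical 40-channel lineup at 3 channels per encoder needs 14 encoders.
print(encoders_needed(40, 3))
```

This count does not include spare devices, which you will want given the reliability results described earlier.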
Other topics of interest
Most encoders offered an API in one form or another. If you are interested in creating a solution that includes management of a large encoder farm, you will want to investigate the capabilities of the API in detail. Our findings indicate that most encoders offer relatively poor API support – just a bolt-on feature rather than the primary control system. There was one, however, which exposed a very detailed and impressive REST-based API. When you are looking for a suitable encoder, pay attention to the functionality exposed through the API – if the manufacturer’s own management software does not use this API, it is likely to be lacking in its capabilities.
Our solution did not include channels with multilanguage audio, so we did not look into this in depth. Casual observation, however, indicated that most encoders were not capable of providing multiple audio streams. Make sure to try any encoder candidates with real input data if you are looking for multilanguage audio.
When I started the evaluation, I expected all encoders to be roughly equal, with some unique distinguishing characteristics here and there. I was greatly surprised to find that only one encoder was even remotely suitable for use with TV channels! The Envivio 4Caster C4 Gen III encoder proved itself reliable, recovered well from error conditions and supported both critical TV channel features – dynamic aspect ratio compensation and TTML subtitles. While the product was not perfect, any downsides were trivial compared to the positive results it produced. In the current encoder market, it is our recommended encoder for Live Smooth Streaming scenarios.
We discussed the deficiencies we found in other products with encoder manufacturers and they all claimed that they would fix any such problems if we placed a large order. However, can you really depend on an encoder that has never been used in your scenario before? I think not – if I have a choice between playing the role of beta tester for an experimental product or using a product that already supports the features we need, I will pick the latter.
I’m looking forward to evaluating the next generation of Smooth Streaming capable encoders in the future – there is a lot of room for innovation and most of the products available are not yet mature enough to use in a large-scale deployment.
If you have any questions about encoder usage or would like to learn about our evaluation results in detail, please leave a comment or get in touch via e-mail!