Realtime and streaming playback from fast I/O devices

RV has a number of features intended to make it possible to read 2k or greater directly from fast I/O devices. Because there are more than a few variables that determine I/O and decoding speed you should try to start with a simple set of parameters in RV and then adjust one at a time. If you adjust all of them at the same time it’s much harder to figure out a sweet spot.

We will try and update this with more information when we can. 

Quick Start

 RV is most often used on the artist desktop, so the default preferences are not configured for streaming I/O. The most important changes to make to enable streaming I/O are listed here. More detail and other options are described below:

  1. Turn on the Lookahead Cache
  2. Increase the number of Reader Threads in the RV preferences (experiment to find the best number)
  3. If necessary, try alternate I/O Methods in the per-format image preferences
  4. Use prefetch (with or without PBOs on) found in the Preferences->Render section
  5. On linux try running RV as root with -scheduler SCHED_RR -priorities 99 99 (see below) to make playback soft real-time 

System Requirements

We don't have a fixed system which we can recommend. People have been using a variety of different setups to get streaming playback. However, you will definitely benefit if you have:

  • Lots of cores + procs
  • Lots of memory (6 GB on a 64 bit machine is a good idea)
  • 64 bit machine
  • A recent GPU (doesn't always have to be a Quadro)
  • A good motherboard 
  • Some kind of fast I/O device, e.g. a RAID, ioXTREME, etc. (A single "fast" disk drive probably won't cut it)
  • NVIDIA card. A cheapo 580 Quadro is pretty good, and newer is probably better. Newer GeForce cards are probably good too.

Streaming I/O Image Formats

Streaming will only work with these formats. If you manage to get it to work with something other than one of these you are a miracle worker.

  • DPX and Cineon -- especially 10 bit files
  • JPEG
  • TARGA (TGA)
  • TIFF -- typically RAW tiff works best, no compression
  • EXR or ACES

Multiple Reader Threads

It is necessary to use multiple reader threads to get any kind of fast I/O streaming. In order to do so you must be using the Look-Ahead cache. We recommend using the smallest possible value for the look-ahead cache size, ideally no more than 1 GB, although some people have said that more than 1 GB (and sometimes A LOT more) works for them.

If you don't have at least two cores available you will probably not get streaming playback.

Start with two threads. Optimize for speed with two threads and then try increasing the number after that. The number of threads can be controlled from the preferences under the caching tab. You need to restart RV when you change the number of threads.

The more cores and procs you have the better. The more memory you have the better.
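
As a rough way to pick a starting point for that experimentation, the sketch below estimates how many reader threads are needed to keep frames arriving at the playback rate. It is not RV's own logic: the function name and the timings are hypothetical, and it assumes each thread handles one frame at a time and that threads do not slow each other down, which real storage rarely guarantees.

```python
import math

def reader_threads_needed(fps, seconds_per_frame):
    """Estimate the threads needed so frames complete at `fps`.

    Assumption: one thread reads and decodes one frame in
    `seconds_per_frame`, and threads scale perfectly in parallel.
    """
    return max(1, math.ceil(fps * seconds_per_frame))

# Hypothetical timings for 24 fps playback.
print(reader_threads_needed(24, 0.05))  # fast local storage, 50 ms per frame
print(reader_threads_needed(24, 0.40))  # high-latency storage, 400 ms per frame
```

Treat the result as a first guess only. Measured playback with the thread count adjusted one step at a time is still the deciding test.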

Multiple EXR decoding threads

If you want to stream EXR files you may want to reserve some of your cores for decoding. The number of EXR decoding threads can be changed from the preferences under the formats tab. Try "automatic" first.

I/O Methods

Which I/O method to use is hard to determine without experimenting. You can change the I/O method from the preferences formats tab. The basic I/O methods are:

  • Standard (for some formats). This uses whatever is considered the standard or normal way to read these files. For example with EXR this uses the "normal" EXR I/O streams that come as part of the EXR libraries.
  • Buffered. The data is "streamed" as a single logical read of all data. The file data is allowed to reside in the filesystem cache if the kernel decides to do so. The file data is decoded after all the data is read.
  • Unbuffered. Similar to buffered except that RV provides a hint to the kernel that file data should not be put into the filesystem cache. In theory this could lead to faster I/O because a copy of the data is not created during reading. In practice we've never seen this really help. There's a lot of debate on the internet about whether Unbuffered I/O is useful or not. 
  • Memory Mapped. The file contents are mapped directly to main memory. This has the advantage that it may not be filesystem cached and the memory is easily reclaimed by the process when no longer needed. In some respects it is similar to the buffered method above. On windows this method can increase the speed of local I/O significantly compared to the buffered, unbuffered, and standard methods above.
  • Asynchronous Buffered. Similar to buffered above, however, the kernel may provide the data to RV in some random order instead of waiting to assemble the data in order itself. In addition the low-level I/O chunk size can be used to tune the I/O to maximize bandwidth. This method is most useful on Windows. On Linux and Mac Async I/O is not as useful in the context of RV because RV manages decode and read concurrency itself.
  • Asynchronous Unbuffered. Same as Asynchronous buffered, but a hint is provided to the kernel to omit storing the data in the filesystem cache if possible. This is the default method on Windows.

Rules of Thumb

  • Use the Prefetch option in the preferences. This will double the amount of VRAM required, but can significantly increase the amount of bandwidth between RV and the graphics card. This can be especially important when viewing stereo. Ideally, Pixel Buffer Objects (PBOs) are also turned on. Modern graphics cards benefit from turning on PBOs. For older cards you should experiment with them on and off -- even some pre-fermi Quadro cards can get worse performance with them on.
  • With NVIDIA quadro fermi and kepler GPUs (fermi 4000, 6000, kepler 4000, 5000, 6000) enable Multithreaded GPU Uploads for linux and windows. On Mac enable Apple Client Storage.
  • On Windows: try Asynchronous Unbuffered first. The only other method which might produce good results is Memory Mapped.
  • If you are on linux, memory mapping may cause erratic playback (esp. with software RAID). Unbuffered or buffered are the preferred methods. However, if it works for you, great.
  • The Unbuffered method may be a placebo on linux. It appears to depend on the type of filesystem you are reading from. For example on ubuntu 10.04 with ext4 it seems to be identical to buffered. It may actually be counterproductive over NFS.
  • For EXR don't use the standard method. Unbuffered is typically the best method if the file system supports it.
  • EXR B44 images are ideally subsampled as 4:2:0. Also, keep in mind that B44/A must be 16 bit. PIZ, ZIP, and ZIPS encoded EXR are CPU intensive and may require more EXR decoder threads.
  • If you have a recent graphics card try setting the DPX and Cineon 10 bit display depth to 10 Bits/Pixel Reversed in the format preferences. This is the fastest and most color preserving method of dealing with 10 bit data in RV. If you have a "30 bit" capable monitor like an HP dreamcolor you can additionally put the X server or Windows in 30 bit mode to get both high precision color and fast streaming this way.
  • Displaying 10 bit DPX in 16 bit mode requires 2x the bandwidth from the I/O device AND to the graphics card that 8 bit mode does. If you can't use the 10 bit mode for DPX/Cineon try 8 bit first.
  • If your I/O device has big latency, you may need to increase the number of reader threads dramatically to amortize the delay. In some cases it may be beneficial to use more threads than you have cores. This can happen with network storage.
  • Make sure RV's v-sync is not on at the same time that the driver's GL v-sync is on.
  • DPX files tend to stream best when they are written so that pixel data starts 4096 bytes into the file, are little-endian, use 4 channel 8 bit, 3 channel 10 bit, or 4 channel 16 bit pixels, and have a resolution width divisible by 8.
  • For all speed tests, be sure you're in "Play All Frames" mode (Control menu) as opposed to "Realtime". In this context, "Realtime" means RV will skip video frames in order to keep pace with the audio.
  • Not all I/O methods are supported by all file systems. In particular, the Unbuffered I/O method may not be supported by the underlying file system implementation.
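
To put numbers on the display depth rule of thumb above, here is a small sketch of the sustained data rate needed for uncompressed frames. The resolution, frame rate, and bytes-per-pixel figures are assumptions chosen for illustration (in particular, 10 bit RGB is assumed to be packed into one 32 bit word per pixel). The function name is hypothetical and not part of RV.

```python
def stream_rate_mb_per_s(width, height, bytes_per_pixel, fps):
    """Sustained data rate in MB/s (1 MB = 10**6 bytes) for uncompressed frames."""
    return width * height * bytes_per_pixel * fps / 1e6

# Assumed example: 2048x1556 RGB frames at 24 fps.
#   8 bit RGB                    -> 3 bytes per pixel
#   10 bit RGB in a 32 bit word  -> 4 bytes per pixel
#   16 bit RGB                   -> 6 bytes per pixel
for label, bytes_per_pixel in [("8 bit", 3), ("10 bit packed", 4), ("16 bit", 6)]:
    rate = stream_rate_mb_per_s(2048, 1556, bytes_per_pixel, 24)
    print(f"{label:14s} {rate:7.1f} MB/s")
```

The 16 bit figure comes out at exactly twice the 8 bit one, which is the 2x mentioned above. Real requirements also depend on file headers, alpha channels, and padding.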

A Note on Testing and the Filesystem Cache

All operating systems (that RV runs on) try to maximize IO throughput by holding some pages from the filesystem in memory. The algorithms can be quite hard to predict, but the upshot is that if you're trying to test realtime streaming IO performance this "assistance" from the OS can invalidate your numbers; to be clear, for testing, you want to ensure that you start each run with none of the sequence to be played in memory. One way to do this is with a very large test set. Say several sequences, each of several thousand frames. If the frames are big enough and your RAM is small enough, then switching to a new sequence for each testing run will ensure that no part of the new sequence is already in the filesystem cache.

But a more certain way to ensure that none of your frames are in the memory, which also lets you use the same sequence over and over for testing, is to forcibly clear the filesystem cache before each testing run. On Linux you can run this command:

sync; echo 1 | sudo tee /proc/sys/vm/drop_caches

On Mac OSX, you can:

purge

Unfortunately we don't know how to clear the filesystem cache on Windows. If you do, please drop us a note!
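
With the cache cleared, it can help to know what the storage delivers on its own before tuning RV. The following is a minimal sketch of such a baseline, not an RV tool: it reads a list of frame files one after another on a single thread and reports frames per second and MB/s. The function name and the example path in the comment are hypothetical.

```python
import time

def measure_read_rate(paths):
    """Read each file fully, in order; return (frames_per_second, mb_per_second)."""
    total_bytes = 0
    start = time.perf_counter()
    for path in paths:
        # buffering=0 skips Python's own buffer; the OS filesystem cache
        # still applies, so clear it first for a meaningful number.
        with open(path, "rb", buffering=0) as f:
            total_bytes += len(f.read())
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return len(paths) / elapsed, total_bytes / elapsed / 1e6

# Example use with a hypothetical sequence location:
#   import glob
#   fps, mbps = measure_read_rate(sorted(glob.glob("/path/to/shot.*.dpx")))
#   print(f"{fps:.1f} frames/s, {mbps:.1f} MB/s")
```

Because this reads with one thread and does no decoding, it is only a lower bound on what several reader threads might achieve. If the second run over the same files is much faster than the first, the filesystem cache was not clear.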

Real-Time

As of RV version 3.10.8, on linux, you can tell RV to run as a real-time application. This mode enables the most stable possible playback on linux, especially if the machine has been set up with server time slice durations (which is often the case when the machine is tuned for maximum throughput). Ideally, RV runs with more and smaller time slices -- at least for display and audio threads.

To start RV in this mode use:

rv -scheduler SCHED_RR -priorities 99 99

RV linux will try and use either the FIFO or Round Robin scheduler in place of the normal linux scheduler in this case. In order to do so, it must have the capability CAP_SYS_NICE. This can be achieved in a number of ways, but currently we have only been able to make two of them work: running setuid root on the linux binary so it has root privileges or just running rv as root. Ideally, you can use setcap to mark the rv.bin binary on the filesystem so that it has only CAP_SYS_NICE so non-root users can run it real-time, but we have not been able to get this to work (probably because we don't really understand it yet).
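
To see why the capability matters, the sketch below uses Python's standard os module to query the Round Robin priority range and to attempt the same kind of scheduler switch for the calling process. It illustrates the underlying Linux facility only. It is not RV's implementation, and both function names are hypothetical.

```python
import os

def rr_priority_range():
    """Return (min, max) priority for SCHED_RR, or None if unavailable here."""
    if not (hasattr(os, "SCHED_RR") and hasattr(os, "sched_get_priority_max")):
        return None
    return (os.sched_get_priority_min(os.SCHED_RR),
            os.sched_get_priority_max(os.SCHED_RR))

def try_round_robin(priority):
    """Try to move the calling process to SCHED_RR; return True on success."""
    if not hasattr(os, "sched_setscheduler"):
        return False
    try:
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
        return True
    except OSError:
        # Includes PermissionError, the usual result without CAP_SYS_NICE or root.
        return False

print("SCHED_RR priority range:", rr_priority_range())
```

On Linux the reported range normally tops out at 99. Calling try_round_robin as an ordinary user is expected to return False, which is the same restriction RV runs into.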

On the Mac, RV 3.10.8 is a real-time app by default and does not require any special privileges.

On Windows, RV 3.10.8 will elevate its priorities as high as possible without admin privileges. 

NOTE: When RV runs with higher priorities, this refers to only two of its threads: the display thread and the audio thread. Neither of these threads does much computational work. They are both usually blocked. So you shouldn't need to worry about RV consuming too many kernel resources.

Refresh Rate and V-Sync

The biggest hurdle to making playback absolutely smooth is to recognize the effects of playback FPS coupled with the monitor refresh rate. For example, it’s typically the case that an LCD monitor will have a refresh rate of ~60Hz by default (i.e. it refreshes 60 times a second). Playing 24 FPS material on a 60Hz monitor will result in something similar to a 3/2 pulldown. In order to get smoothest playback the ideal refresh rate would be 48 or 72Hz or some other multiple of 24. Also, be aware that a "60Hz" monitor may actually be 59.88Hz which means that even 30 FPS material will not play back perfectly smoothly. You need 59.88 / 2 (29.94) FPS for best results.
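
The arithmetic behind this can be sketched as follows. The function is hypothetical and assumes an idealized display in which each frame becomes visible on the first refresh at or after its presentation time. It returns how many refreshes each frame stays on screen.

```python
from fractions import Fraction
from math import ceil

def refresh_cadence(fps, refresh_hz, frames):
    """Return the number of monitor refreshes each of the first `frames` frames is held."""
    # Exact rational arithmetic avoids float drift for rates like 59.88.
    ratio = Fraction(str(refresh_hz)) / Fraction(str(fps))
    # Frame i appears on the first refresh at or after i * ratio.
    edges = [ceil(i * ratio) for i in range(frames + 1)]
    return [later - earlier for earlier, later in zip(edges, edges[1:])]

print(refresh_cadence(24, 60, 4))   # [3, 2, 3, 2]
print(refresh_cadence(24, 48, 4))   # [2, 2, 2, 2]

# 30 FPS on a 59.88Hz monitor: most frames get 2 refreshes, some only 1.
uneven = refresh_cadence(30, 59.88, 300)
print(sorted(set(uneven)))

# 29.94 FPS on the same monitor: every frame gets exactly 2 refreshes.
print(sorted(set(refresh_cadence(29.94, 59.88, 300))))
```

An uneven list means frames are held for different lengths of time, which shows up as judder. A constant list is the smooth case.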

Multi-monitor Systems

Systems with multiple attached monitors have another problem: the monitor that RV is playing on may not be the monitor that it’s syncing to. If that happens the playback can become very irregular. On linux for example, the driver can only sync to one monitor -- it can't change the sync monitor once RV has started. On linux you can change which monitor the driver uses for sync by setting the environment variable __GL_SYNC_DISPLAY_DEVICE. Here's a relevant passage from NVIDIA's driver README:

When using __GL_SYNC_TO_VBLANK with TwinView, OpenGL can only sync to one of the display devices; this may cause tearing corruption on the display device to which OpenGL is not syncing. You can use the environment variable __GL_SYNC_DISPLAY_DEVICE to specify to which display device OpenGL should sync. You should set this environment variable to the name of a display device; for example "CRT-1". Look for the line "Connected display device(s):" in your X log file for a list of the display devices present and their names. You may also find it useful to review Chapter 10, "Configuring TwinView," and the section on Ensuring Identical Mode Timings in Chapter 16, Programming Modes.

RV's V-Sync versus driver V-Sync

Finally, do not run RV with both the driver's GL v-sync and RV's v-sync turned on. This will almost guarantee bad playback. Use one or the other. You may want to experiment to see if one results in better timing than the other on your system. You can find these settings as follows:

  • In RV Preferences under the Rendering tab, there is a "Video Sync" checkbox.
  • In the Nvidia-settings GUI there is an OpenGL 'Sync to VBlank' checkbox.

Nvidia recommends using the driver v-sync and disabling RV's v-sync if possible. On Linux, RV 3.12.12 will try to detect which monitor the driver is using for sync and warn you if it’s not the one RV is playing on (it outputs an INFO message in the shell or console window). In presentation mode, RV 3.12.12 will show a message box if the presentation device is not the sync device on linux.
 
See Also: V-Sync Article

4 Comments

  • John RA Benson

    This seems to be important:

    • Make sure RV's v-sync is not on at the same time that the driver's GL v-sync is on.

    Where is that set in RV or GL?

  • Seth Rosenthal

    Hey John, 

    I updated the article with this info.

    Cheers,

    Seth

  • Michael Kessler

    This article has been invaluable in recent setups.

    Two things I was hoping you could comment on:

    1) Hyper-threading so far seems to cause more trouble than it's worth.  Playing back difficult media does load faster, but on large/difficult media I get more dropped frames.  This might just require further tuning of my threads, but is there any sort of HyperThreaded core avoidance in the playback threads (if that even makes sense; my recent reading seems to suggest disparity between physical cores and HyperThreaded logical cores)?

    2) When using SDI; my assumption is that no v-sync should be enabled; correct?

  • Jim Hourihan

    Hi Michael, the answer to #2 is yes turn off v-sync (in RV's prefs). It gets turned off no matter what when SDI presentation is turned on.

    #1 is a bit complicated from what I understand. Depending on the SDI output type (AJA, BM, or NVIDIA) we use one or two extra threads to feed the SDI device. I'd set aside at least one core if not two for AJA and BM devices if you're seeing dropped frames. The hyper threads are most useful in the case of computationally complex formats like EXR PIZ but can lead to too many threads in other cases. Also: this can vary between the platforms and how the time slices are allocated in the kernel. Basically you need to find the sweet spot manually unfortunately.
