RV has a number of features intended to make it possible to read 2K or larger frames directly from fast I/O devices. Because many variables determine I/O and decoding speed, start with a simple set of parameters in RV and then adjust them one at a time. If you adjust all of them at once, it's much harder to find the sweet spot.
We will try and update this with more information when we can.
RV is most often used on the artist desktop, so the default preferences are not configured for streaming I/O. The most important changes to make to enable streaming I/O are listed here. More detail and other options are described below:
- Turn on the Lookahead Cache
- Increase the number of Reader Threads in the RV preferences (experiment to find the best number)
- If necessary, try alternate I/O Methods in the per-format image preferences
- Use prefetch (with or without PBOs on) found in the Preferences->Render section
- On linux try running RV as root with -scheduler SCHED_RR -priorities 99 99 (see below) to make playback soft real-time
We don't have a fixed system that we can recommend. People have used a variety of setups to get streaming playback. However, you will definitely benefit if you have:
- Lots of cores + procs
- Lots of memory (6 GB on a 64 bit machine is a good idea)
- 64 bit machine
- A recent GPU (doesn't always have to be a Quadro)
- A good motherboard
- Some kind of fast I/O device, e.g. a RAID, ioXTREME, etc. (A single "fast" disk drive probably won't cut it)
- NVIDIA card. A cheap Quadro 580 is pretty good; newer is probably better. Newer GeForce cards are probably good too.
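To judge whether an I/O device is fast enough, it can help to estimate the sustained read rate a sequence needs. The sketch below is a back-of-envelope calculation, not something RV provides. The function name stream_mb_per_sec is hypothetical, and the example assumes 2048x1556 frames stored at 4 bytes per pixel (typical for 10 bit RGB DPX packed into 32 bit words), ignoring file headers and filesystem overhead.

```shell
# Rough sustained read bandwidth for an uncompressed image sequence.
# usage: stream_mb_per_sec WIDTH HEIGHT BYTES_PER_PIXEL FPS
stream_mb_per_sec() {
    # bytes per frame * frames per second, in whole MB (10^6 bytes)
    echo $(( $1 * $2 * $3 * $4 / 1000000 ))
}

# Assumed example: 2048x1556, 4 bytes per pixel, 24 FPS.
stream_mb_per_sec 2048 1556 4 24
```

This prints 305, i.e. roughly 300 MB/s of sustained reads for that example, which is why a single disk drive is unlikely to keep up.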
Streaming I/O Image Formats
Streaming will only work with these formats. If you manage to get it to work with something other than one of these you are a miracle worker.
- DPX and Cineon -- especially 10 bit files
- TARGA (TGA)
- TIFF -- typically RAW tiff works best, no compression
- EXR or ACES
Multiple Reader Threads
It is necessary to use multiple reader threads to get any kind of fast I/O streaming. In order to do so you must be using the Lookahead Cache. We recommend using the smallest possible value for the lookahead cache size. Some people have said that more than 1 GB (and sometimes A LOT more) works for them. Ideally this number is not more than 1 GB.
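To get a feel for what a given lookahead cache size means in playback time, you can estimate how many frames fit in it. This is only a sketch under stated assumptions: frames_in_cache is a hypothetical helper, and it assumes each cached frame occupies width x height x bytes-per-pixel bytes, which may not match how RV actually stores frames in memory.

```shell
# Approximate number of frames that fit in a cache of a given size.
# usage: frames_in_cache CACHE_MIB WIDTH HEIGHT BYTES_PER_PIXEL
frames_in_cache() {
    echo $(( $1 * 1024 * 1024 / ($2 * $3 * $4) ))
}

# Assumed example: 1024 MiB cache, 2048x1556 frames at 4 bytes per pixel.
frames_in_cache 1024 2048 1556 4
```

This prints 84, about 3.5 seconds of material at 24 FPS.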
If you don't have at least two cores available you will probably not get streaming play back.
Start with two threads. Optimize for speed with two threads and then try increasing the number after that. The number of threads can be controlled from the preferences under the caching tab. You need to restart RV when you change the number of threads.
The more cores and procs you have the better. The more memory you have the better.
Multiple EXR decoding threads
If you want to stream EXR files you may want to reserve some of your cores for decoding. The number of EXR decoding threads can be changed from the preferences under the formats tab. Try "automatic" first.
Which I/O method to use is hard to determine without experimenting. You can change the I/O method from the preferences formats tab. The basic I/O methods are:
- Standard (for some formats). This uses whatever is considered the standard or normal way to read these files. For example with EXR this uses the "normal" EXR I/O streams that come as part of the EXR libraries.
- Buffered. The data is "streamed" as a single logical read of all data. The file data is allowed to reside in the filesystem cache if the kernel decides to do so. The file data is decoded after all the data is read.
- Unbuffered. Similar to buffered except that RV provides a hint to the kernel that file data should not be put into the filesystem cache. In theory this could lead to faster I/O because a copy of the data is not created during reading. In practice we've never seen this really help. There's a lot of debate on the internet about whether Unbuffered I/O is useful or not.
- Memory Mapped. The file contents are mapped directly to main memory. This has the advantage that it may not be filesystem cached and the memory is easily reclaimed by the process when no longer needed. In some respects it is similar to the buffered method above. On windows this method can increase the speed of local I/O significantly compared to the buffered, unbuffered, and standard methods above.
- Asynchronous Buffered. Similar to buffered above, however, the kernel may provide the data to RV in some random order instead of waiting to assemble the data in order itself. In addition the low-level I/O chunk size can be used to tune the I/O to maximize bandwidth. This method is most useful on Windows. On Linux and Mac Async I/O is not as useful in the context of RV because RV manages decode/read concurrency itself.
- Asynchronous Unbuffered. Same as Asynchronous buffered, but a hint is provided to the kernel to omit storing the data in the filesystem cache if possible. This is the default method on Windows.
Rules of Thumb
- Use the Prefetch option in the preferences. This will double the amount of VRAM required, but can significantly increase the amount of bandwidth between RV and the graphics card. This can be especially important when viewing stereo. Ideally, Pixel Buffer Objects (PBOs) are also turned on. Modern graphics cards benefit from turning on PBOs. For older cards you should experiment with them on and off -- even some pre-fermi Quadro cards can get worse performance with them on.
- With NVIDIA quadro fermi and kepler GPUs (fermi 4000, 6000, kepler 4000, 5000, 6000) enable Multithreaded GPU Uploads for linux and windows. On Mac enable Apple Client Storage.
- On Windows: try Asynchronous Unbuffered first. The only other method which might produce good results is Memory Mapped.
- On Linux, memory mapping may cause erratic playback (especially with software RAID). Unbuffered and Buffered are the preferred methods. However, if it works for you, great.
- The Unbuffered method may be a placebo on linux. It appears to depend on the type of filesystem you are reading from. For example on ubuntu 10.04 with ext4 it seems to be identical to buffered. It may actually be counterproductive over NFS.
- For EXR don't use the standard method. Unbuffered is typically the best method if the file system supports it.
- EXR B44 images are ideally subsampled as 4:2:0. Also, keep in mind that B44/A must be 16 bit. PIZ, ZIP, and ZIPS encoded EXR are CPU intensive and may require more EXR decoder threads.
- If you have a recent graphics card, try setting the DPX and Cineon 10 bit display depth to 10 Bits/Pixel Reversed in the format preferences. This is the fastest and most color preserving method of dealing with 10 bit data in RV. If you have a "30 bit" capable monitor like an HP DreamColor you can additionally put the X server or Windows in 30 bit mode to get both high precision color and fast streaming this way.
- Displaying 10 bit DPX in 16 bit mode requires 2x the bandwidth, both from the I/O device and to the graphics card, that 8 bit mode does. If you can't use the 10 bit mode for DPX/Cineon, try 8 bit first.
- If your I/O device has high latency, you may need to increase the number of reader threads dramatically to amortize the delay. In some cases it may be beneficial to use more threads than you have cores. This can happen with network storage.
- Make sure RV's v-sync is not on at the same time that the driver's GL v-sync is on.
- DPX files tend to stream best when they are written so that pixel data starts 4096 bytes into the file, are little-endian, use four channel 8 bit, three channel 10 bit, or four channel 16 bit pixels, and have a resolution width divisible by 8.
- For all speed tests, be sure you're in "Play All Frames" mode (Control menu) as opposed to "Realtime". In this context, "Realtime" means RV will skip video frames in order to keep pace with the audio.
- Not all I/O methods are supported by all file systems. In particular, the Unbuffered I/O method may not be supported by the underlying file system implementation.
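The rule of thumb above about high-latency devices can be made concrete with a rough estimate. If one thread needs a certain time to fetch and decode a frame, then sustaining a target frame rate needs at least that time multiplied by the frame rate in concurrent reads. This is a back-of-envelope sketch, not a formula from RV: min_reader_threads is a hypothetical helper, and the 200 ms figure is an assumed example that you would replace with a measurement from your own storage.

```shell
# Lower bound on reader threads needed to hide per-frame latency.
# usage: min_reader_threads MS_PER_FRAME FPS
min_reader_threads() {
    # ceil(ms_per_frame * fps / 1000) using integer arithmetic
    echo $(( ($1 * $2 + 999) / 1000 ))
}

# Assumed example: each frame takes 200 ms to arrive, target is 24 FPS.
min_reader_threads 200 24
```

This prints 5. Treat the result as a starting point for experimentation, since real devices rarely scale perfectly with the number of concurrent reads.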
A Note on Testing and the Filesystem Cache
All operating systems (that RV runs on) try to maximize I/O throughput by holding some pages from the filesystem in memory. The algorithms can be quite hard to predict, but the upshot is that if you're trying to test realtime streaming I/O performance, this "assistance" from the OS can invalidate your numbers. To be clear: for testing, you want to ensure that you start each run with none of the sequence to be played in memory. One way to do this is with a very large test set, say several sequences of several thousand frames each. If the frames are big enough and your RAM is small enough, then switching to a new sequence for each testing run will ensure that no part of the new sequence is already in the filesystem cache.
But a more certain way to ensure that none of your frames are in the memory, which also lets you use the same sequence over and over for testing, is to forcibly clear the filesystem cache before each testing run. On Linux you can run this command:
sudo sh -c 'sync; echo 1 > /proc/sys/vm/drop_caches'
On Mac OSX, you can:
Unfortunately we don't know how to clear the filesystem cache on Windows. If you do, please drop us a note!
As of RV version 3.10.8, on Linux, you can tell RV to run as a real-time application. This mode enables the most stable possible playback on Linux, especially if the machine has been set up with server time slice durations (which is often the case when the machine is tuned for maximum throughput). Ideally, RV runs with more and smaller time slices -- at least for display and audio threads.
To start RV in this mode use:
rv -scheduler SCHED_RR -priorities 99 99
RV linux will try and use either the FIFO or Round Robin scheduler in place of the normal linux scheduler in this case. In order to do so, it must have the capability CAP_SYS_NICE. This can be achieved in a number of ways, but currently we have only been able to make two of them work: running setuid root on the linux binary so it has root privileges or just running rv as root. Ideally, you can use setcap to mark the rv.bin binary on the filesystem so that it has only CAP_SYS_NICE so non-root users can run it real-time, but we have not been able to get this to work (probably because we don't really understand it yet).
On the Mac, RV 3.10.8 is a real-time app by default and does not require any special privileges.
On Windows, RV 3.10.8 will elevate its priorities as high as possible without admin privileges.
NOTE: When RV runs with higher priorities, this refers to only two of its threads: the display thread and the audio thread. Neither of these threads does much computational work, and both are usually blocked. So you shouldn't need to worry about RV consuming too many kernel resources.
Refresh Rate and V-Sync
The biggest hurdle to making playback absolutely smooth is to recognize the effects of playback FPS coupled with the monitor refresh rate. For example, it's typically the case that an LCD monitor will have a refresh rate of ~60Hz by default (i.e. it refreshes 60 times a second). Playing 24 FPS material on a 60Hz monitor will result in something similar to a 3/2 pulldown. To get the smoothest playback, the ideal refresh rate would be 48 or 72Hz or some other multiple of 24. Also, be aware that a "60Hz" monitor may actually be 59.88Hz, which means that even 30 FPS material will not play back perfectly smoothly. You need 59.88 / 2 (29.94) FPS for best results.
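The pulldown effect described above can be seen by counting how many monitor refreshes each frame stays on screen. The following is an illustrative sketch only: refresh_cadence is a hypothetical helper, it assumes each frame is shown from the first refresh at or after its start time, and it takes whole-number rates (scale both rates by the same factor to try fractional ones, e.g. 3000 and 5988).

```shell
# Print how many refreshes each of the first FRAMES frames is displayed for.
# usage: refresh_cadence FPS HZ FRAMES
refresh_cadence() {
    fps=$1 hz=$2 n=$3 i=0 out=""
    while [ "$i" -lt "$n" ]; do
        start=$(( i * hz / fps ))        # refresh index where frame i begins
        end=$(( (i + 1) * hz / fps ))    # refresh index where frame i+1 begins
        out="$out$(( end - start )) "
        i=$(( i + 1 ))
    done
    echo "${out% }"
}

refresh_cadence 24 60 4   # uneven: 2 3 2 3
refresh_cadence 24 72 4   # even:   3 3 3 3
```

An uneven pattern such as 2 3 2 3 is what shows up as judder. A refresh rate that is a multiple of the playback rate gives every frame the same number of refreshes.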
Systems with multiple attached monitors have another problem: the monitor that RV is playing on may not be the monitor that it's syncing to. If that happens the playback can become very irregular. On Linux, for example, the driver can only sync to one monitor -- it can't change the sync monitor once RV has started. On Linux you can change which monitor the driver uses for sync by setting the environment variable __GL_SYNC_DISPLAY_DEVICE. Here's a relevant passage from NVIDIA's driver README:
When using __GL_SYNC_TO_VBLANK with TwinView, OpenGL can only sync to one of the display devices; this may cause tearing corruption on the display device to which OpenGL is not syncing. You can use the environment variable __GL_SYNC_DISPLAY_DEVICE to specify to which display device OpenGL should sync. You should set this environment variable to the name of a display device; for example "CRT-1". Look for the line "Connected display device(s):" in your X log file for a list of the display devices present and their names. You may also find it useful to review Chapter 10, "Configuring TwinView," and the section on Ensuring Identical Mode Timings in Chapter 16, Programming Modes.
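In practice, following the README passage might look like the sketch below. The device name "DFP-1" and the log path /var/log/Xorg.0.log are assumptions for illustration; the log location varies between distributions, and the name must be one your own X log reports.

```shell
# Hypothetical device name; use one reported in your X log.
sync_device="DFP-1"

# Assumed log location; list the connected display devices if it is readable.
xlog="/var/log/Xorg.0.log"
if [ -r "$xlog" ]; then
    grep -i "connected display device" "$xlog" || true
fi

# Exported so that an rv started from this same shell inherits the setting.
export __GL_SYNC_DISPLAY_DEVICE="$sync_device"
```

Because the driver picks its sync monitor when RV starts, the variable has to be set before launching RV, not afterwards.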
RV's V-Sync versus driver V-Sync
Finally, do not run RV with both the driver's GL v-sync and RV's v-sync turned on. This will almost guarantee bad playback. Use one or the other. You may want to experiment to see if one results in better timing than the other on your system. You can find these settings as follows:
- In RV Preferences under the Rendering tab, there is a "Video Sync" checkbox.
- In the nvidia-settings GUI there is an OpenGL 'Sync to VBlank' checkbox.