Optimizing filters and API scripts to make your Shotgun site faster

Shotgun's flexibility allows for customization of pages through the filter panel and the development of sophisticated API-based integration scripts. This flexibility makes it easy to create complex or advanced filters that are intuitively sensible, but in practice perform poorly and consume large amounts of system resources.

The purpose of this document is to help Shotgun users, especially those writing API scripts, understand how to make complex filters that are fast and efficient. Most filters that are well-written finish in a small fraction of a second. But inefficient filters that return exactly the same data may take seconds, or even minutes, to complete.

To optimize filter performance, we recommend that you:

  • Minimize the number of table rows the database must load and process.
    • This is done most efficiently by using filters on columns of the main entity’s table, not fields belonging to other tables, such as linked fields, multi-entity fields, tags, and URL fields.
  • Avoid text-matching filters because they are slow and should not be used by themselves for this purpose.

A slow filter example

filter example

The Version page filter example above performs slowly for the following reasons:

  1. All of the filters are “text-matching” filters (“name contains”, “contains”, “starts with”, etc.), which are slow for the database to evaluate.
  2. The number of Versions that have to be checked for these text matching filters is very large—all Versions!
  3. A number of the filters are on linked fields, so the database must draw data from other tables to evaluate the filters.

Furthermore, since the Versions table is often the fastest growing table, this filter will become slower and slower as time goes on.

To improve the performance of a complex filter, add filters that can more efficiently eliminate most Versions from consideration. This ensures that the slow text-matching queries do not have to be evaluated for many Versions.

What works for you will depend on your workflow. If your workflow depends very heavily on text-matching filters or linked field filters, you may find you need to tweak your workflow to give yourself better leverage to filter out Versions or other entities efficiently. One example of an effective workflow change is to make better use of status fields.

Improving the filter example

improved filter example

Now we’ve added a group of three new filters that are more efficient. These three filters will improve this complex filter by eliminating the majority of the studio’s Versions, so the slow pattern-matching and linked-field filters are evaluated on far fewer Versions.

  • Project: We have explicitly restricted the project with an “is” filter. This will efficiently eliminate all Versions in all other past and present projects, which should make a big difference if the studio has many projects. This may require a workflow change to keep the project filter up to date, because the users of this page have been used to collecting all VFX Editorial versions regardless of the project.
  • Status: It is a good “best practice” to update each Version’s status from “Pending Review” to “Viewed” once they have been reviewed. This makes it easy to filter down Version pages to only the small number of relevant new Versions.
  • Date Created (or Date Updated): These fields, which all entities have, are often useful to efficiently filter down page content.

Filter performance in more detail

1. Text-matching filters are slow.

The reason text-matching filters are slow is because the database must do a complicated test against every table row that could be part of the results. There are no optimization strategies available for these filter types.

This guideline does not mean you should never use text-matching filters, only that in addition to text-matching filters, you must use other kinds of more efficient filters to reduce results (which is the second guideline, discussed in detail below). Those other filters will ensure your text-matching filters are applied to fewer table rows and therefore consume fewer resources.

Here are the text-matching filter types that are inherently slow:

  • contains
  • does not contain (not_contains)
  • starts with (starts_with)
  • ends with (ends_with)
  • name contains (name_contains)
  • name doesn't contain (name_not_contains)

Text-matching queries are especially slow when they are:

  • the only filters on an entity search
  • part of an "or" ("any of") filter group

Then the result may be hundreds or thousands of times slower than a complex filter that uses another efficient filter type to reduce the number of results.

If you are currently dependent on text-matching filters for API scripts that are performing poorly, you may want to look at the section called Changing your Integration Approach to Avoid Text-Matching.

2. Reduce the number of entities returned by using a simple, efficient filter that operates on the entity's own fields.

This is the most important principle for making your complex filters perform well.

An efficient filter on the entity's own fields allows the database to immediately and quickly eliminate the majority of the entity's table rows from consideration. Therefore the database processes less data in subsequent steps. All remaining parts of the complex filter, such as other filters, sorting, or grouping, become faster as a result, compounding the benefit.

Making the final result set smaller reduces the data formatting and network resources for the response to the API script or Shotgun page, which further improves performance.

What kinds of filters efficiently improve performance this way?

  1. An efficient "reducing" filter must significantly reduce the number of entities under consideration.
    If there are 1 million Versions, filtering on a project that has only 8,000 Versions should be helpful. On the other hand, filtering on a status like "viewed" that includes 800,000 Versions is not likely to improve performance.
  2. The filter must be inside an AND ('all of') filter group.
    Note that the top-level array of conditions in filters in the python API are an AND group. In the following example, the filter on "entity" likely provides a good reduction in rows:
    EFFICIENT_filters = [[ "entity", "is", { "type": "Asset", "id": 9 } ]]

    However, if the filter is in a subgroup that's inside other filter groups, it must not be inside any OR ('any of') filter group at any level above it. In the following example, the filter on "entity" probably does not improve performance:
    SLOW_filters = [ 
      {
        "filter_operator":"or",
        "filters":[
          [ "code", "contains", "e" ],
          [ "entity", "is", { "type": "Asset", "id": 9 } ]
        ]
      }
    ]

    A filter that's inside an OR group at any level does not necessarily get applied to all table rows, so it does not ensure that the number of rows under consideration is reduced. (See the Advanced Performance section below for an exception.)
  3. The filter field is not a linked field.
    Filters on linked fields are not a fast way to eliminate rows because they require the database to find and evaluate rows in another entity's table. You may need filters on linked fields, but also use additional filters on fields that belong to the entity itself to efficiently eliminate rows. (See the Advanced Performance section below for an exception.)
  4. The filter field's data type is not Multi-entity or Tag-list.
    Multi-entity and tag fields are not fast because they are similar to linked fields. Filters on these fields require the database to evaluate rows in other tables, which is not an efficient way to eliminate rows from the main table.
    There is nothing wrong with filtering on multi-entity fields or tag fields, but for good performance, you need additional filters that will more efficiently reduce the number of entities.
    Note that unlike multi-entity fields, single-entity fields are fast and can be a good "reducing" filter. This is because they belong to the main entity table.
  5. The filter is not a text-matching filter type.
    Text matching filters are slow and can't be optimized well by the database.
    All of our other filter types, such as "is", "is_not", "greater than", "is after", "in the next __ weeks", etc., are relatively efficient, as long as the filter eliminates a significant number of rows.

Example

Here are filters that will likely perform slowly for any site with a large number of Versions:

SLOW_filters = [
  {
    "filter_operator":"or",
    "filters":[
      [ "code", "contains", "_rev_" ],
      [ "code", "contains", "_redo_" ]
    ]
  }
]

Some obvious candidates for eliminating rows are restrictions on the project or a date range. Below is an example that will try both.

Note that the top level filter is an OR, ("any of") filter. Make sure the new reducing filters are outside the OR filter group.

EFFICIENT_filters = [
  [ "project", "is", { "type": "Project", "id": 49 } ],
  [ "updated_at", "in_last", [ 1, "WEEK" ] ],
  {
    "filter_operator":"or",
    "filters":[
      [ "code", "contains", "_rev_" ],
      [ "code", "contains", "_redo_" ]
    ]
  }
]

Benefitting from database indexes

Some fields can provide especially efficient filtering, because they are indexed in the database. A filter on an indexed field allows the database to determine which rows to consider by consulting an efficient hash index of values of the indexed field.

Note: Using an indexed field provides absolutely no benefit unless you follow the other filter guidelines provided. The goal is to efficiently reduce the number of rows processed, and using an indexed field will help.

Database indexes can improve performance of some reads from the database, but they slow down writes to the database—because the index must be updated on each write. Therefore Shotgun does not index every field in every table.

For each entity type, by default, Shotgun indexes these common fields:

  • Id (id)
  • Project (project)
  • Status (sg_status_list)
  • Date Created (created_at)
  • Date Updated (updated_at)

A positive filter, such as 'is' or 'greater than,' that uses one of these fields may radically improve performance by eliminating rows from consideration much more quickly than filters on non-indexed fields.

Note that negative filters, such as 'is not', do not usually allow the database to use an index. However, negative filters are still useful for improving performance, as long as they significantly reduce the number of rows that must be processed.

As an example of the effectiveness of indexes, our API event polling script uses a filter on the Event Log Entry id field to restrict Event Log Entry table rows to a small number of new rows only (e.g., "Id is greater than N"). This filter performs extremely quickly for very large client databases, even though the Event Log Entry table is by the largest table.

However, the database uses complex heuristics and statistics, that it maintains for each index, in order to decide when to employ an index, or which one to use. As a result, it is hard to predict when you will gain performance improvements by using a filter on one of these indexed fields.

Therefore it is important to provide an efficient filter that significantly reduces the number of rows under consideration, not necessarily on an indexed field. Do not spend a lot of extra effort using an indexed field that may not actually be suitable for the case at hand.

3. Linked fields add cumulative cost per filter and per row filtered.

Linked fields are not especially slow, though they add some overhead because they evaluate rows in additional tables.

If an efficient filter exists to eliminate most rows from the main entity table, then the complex filter may also have other filters on linked fields without a significant negative performance impact.

On the other hand, if the number of table rows to be evaluated is large, then each filter, sort, or grouping on a linked field is going to add a much more significant cost to a complex filter. Additionally, "double hop" linked fields will have an even larger impact, since they require rows in more tables to be evaluated.

Changing your integration approach to avoid text-matching

If you find that your API integration depends heavily on text-matching filters, or filters on linked fields, without applying faster types of filters according to the second guideline, then your integration is unlikely to scale well as your production activity increases. You should consider looking for an approach that depends on exact matches on fields belonging to the entity itself—it will pay off, even if it requires some work. For example, you may need to automatically update a status or add and populate a new list or text field that you will be able to use with exact matching values.

Here is an example API summary filter that depends on text matching to identify PublishedFile animation file types for a report in a hypothetical animation workflow:

filters = [
  {
    "filter_operator":"or","filters":[
      [ "path_cache", "contains", "/anim_face" ],
      [ "path_cache", "contains", "/anim_fine" ]
    ]
  }
]
summaries = sg.summarize(entity_type='PublishedFile', filters, summary_fields=[{'field':'id', 'type':'count'}])

There are many possibilities to further restrict this filter, such as limiting the project, date range, or PublishedFileType.

However, if identifying specific animation file types is the only acceptable means for the workflow, then adding a new custom list column for resource type (that always gets set by a custom publish script or some other way), would improve performance. Here is what that filter would look like using a field called "My Resource Subtype":

filters = [
  {
    "filter_operator":"or",
    "filters":[
      [ "sg_my_resource_subtype", "is", "anim_face" ],
      [ "sg_my_resource_subtype", "is", "anim_fine" ]
    ]
  }
]
summaries = sg.summarize(entity_type='PublishedFile', filters, summary_fields=[{'field':'id', 'type':'count'}])

This is still likely to be a slow filter if there are very many 'anim_face' and 'anim_fine' PublishedFiles, but it should be much faster than the previous text-matching version. (See the Advanced Performance section below if you are wondering why this OR ("any of") filter may improve performance.)

Advanced performance notes

Using OR ("any of") groups

In practice, some combinations of filters in an OR group could still eliminate table rows efficiently enough to improve complex filter performance. That would depend on the particular filters, of course, and the degree to which they eliminate table rows. See the example above in the Changing Your Integration Approach section.

Using linked fields

In practice, a filter on a linked field through an Entity field may still get good results. This is because a linked field specifies the entity type for the linked field value. A filter linked up through an Entity field may eliminate most rows in the parent entity table if very few values in the Entity field belong to the linked entity type. Note that this does not apply to multi-entity fields.

Below is an example of a filter for Tasks:

filters = [[ "entity.Sequence.sg_status_list", "is", "hld" ]]

For example, the filter above for Tasks may actually perform well if very few Tasks have a Sequence in their "entity" field. That is because only the small number of Task rows that have a Sequence in the entity column will be processed.

On the other hand, these filters will perform poorly if millions of Tasks are linked to Sequences, because those millions of Task rows will have to be processed.

For more information on how to get the most out of Shotgun, see our checklist of best practices.

Follow

0 Comments

Please sign in to leave a comment.