Django ORM performance tips - part 1

This blog post will be split into two part. After I finish part two, I will add a link at the bottom of this one. For both parts I will use a sample project with sqlite3 database. models.py file for sample application is here:

Django encapsulates SQL queries with its ORM infrastructure. This is a very nice feature for developers, since it makes implementing database based code easy. Code base also gets a lot smaller.

On the other hand, ORM abstraction hides what really is happening behind the scenes. If developer is not careful enough, application might be doing unnecessary queries that really hurts the performance of the application. Here are some tips that every django developer should follow;

Know how querysets work

Django querysets run their queries only once. If you try to read the queryset second time results will come from the fetched values. The trick is to know which ORM methods are creating a new queryset. Here is an example:

   el = Employee.objects.filter(birthday__year=1976) # creates a queryset
   for e in el: # runs the query at first iteration and fetches all results.
       print e.name

   for e in el: # since query is fetched. this loop will not run any other query
       print e.birthday.day,e.birthday.month

In line one, we are creating a queryset and assigning it to the variable 'el'. Creating a queryset does not query database but creates the queryset so it will query database first time it needs to. Line two is the place query runs. First iteration of the for loop is the first time we need data from the queryset and that's why this is the place the query runs. Here is the SQL query line two will run:

      SELECT "companies_employee"."id",
               "companies_employee"."name",
               "companies_employee"."last_name",
               "companies_employee"."birthday",
               "companies_employee"."department_id"
      FROM "companies_employee"
      WHERE "companies_employee"."birthday" BETWEEN 1976-01-01
              and 1976-12-31 23:59:59.999999

Since the queryset is already fetched its results, second for loop does not run any other query.

count() vs len

Sometimes you need to learn the row count of query. There are two ways to accomplish this:

You can use len() function to determine the size of resultset.

len function will run the query and count the result rows. It will fetch the query. So if you will already fetch the resultset, you might choose to use len function. As a coding practice I don't recommend using len.

   el = Employee.objects.filter(birthday__year=1976)
   len(el)

In this example code len function will run the following query.

    SELECT "companies_employee"."id",
             "companies_employee"."name",
             "companies_employee"."last_name",
             "companies_employee"."birthday",
             "companies_employee"."department_id"
    FROM "companies_employee"
    WHERE "companies_employee"."birthday" BETWEEN 1976-01-01
           and 1976-12-31 23:59:59.999999
You can use count() method of queryset.

Count method creates a new queryset which constructs a count SQL query. So this new queryset will not fetch any results but return the row count of queryset.

         el = Employee.objects.filter(birthday__year=1976)
         el_count1 = el.count()
         el_count2 = el.count()
         el_count3 = len(el)
         el_count4 = el.count()

First line creates a queryset(But does not run the query). In second line el.count() creates a new queryset. This queryset runs the following query:

      SELECT COUNT(*)
      FROM "companies_employee"
      WHERE "companies_employee"."birthday" BETWEEN 1976-01-01
               and 1976-12-31 23:59:59.999999

In the third line, we create another queryset and run the query again(This is a new queryset. So result do not come from fetched value).

In fourth line, we use len function to get size of the queryset. This method will run the actual query. And return the raw count. This will run a lot slower than count query but queryset results will be fetched. Finally the fifth line creates a new queryset. But does not run the query because parent queryset already fetched the results so child queryset will use those results.

Sliced querysets creates a new queryset

When we slice a queryset, this will create a new queryset and newly created querysets will always run a new query. But if we fetch the parent query first, child querysets will use parent query's fetched data. Here is an example:

   el = Employee.objects.filter(birthday__year=1976)
   e0 = el[0] #runs the query
   el1 = el[:5] #creates a new queryset
   el2 = el[5:10] #creates a new queryset
   e_count = len(el) # fetches the parent query
   e1 = el[0] #parent query fetched. No query runs
   el1 = el[:5] #new queryset is created with fetched values from parent
   el2 = el[5:10]

As you can see, If a queryset is created slicing an already fetched queryset, new queryset uses fetched values from parent queryset.

Some methods just do not get fetched

Some calls will never get fetched so they always will run their query.

Manager methods: objects (for example Employee.objects) attirubute of a model object is a Query manager. Query managers are starting point of the django queries so their methods will always run their query:

    el1 = Employee.objects.filter(birthday__year=1980)
    print len(el1)
    el2 = Employee.objects.filter(birthday__year=1980)
    print len(el2)

    el1 = Employee.objects.all()
    print len(el1)
    el2 = Employee.objects.all()
    print len(el2)

Here we call filter and all methods of query manager twice. and then call len function on them to trigger query execution. and all 4 len methods will run 4 queries. manager methods will not use any fetched values.

filter method: When you call filter method of either query manager or queryset, new queryset will form and it will always run its query.

   el1 = Employee.objects.filter(birthday__year=1980)
   len(el1)
   el2 = el1.filter(birthday__month=9)
   len(el2)

In this example we create el1 queryset and fetch it with len function. Then we create el2 queryset and call len on it. Unlike slicing or count method, filter method will always run its query even if parent query is already fetched.

In line two following query will run:

    SELECT "companies_employee"."id",
           "companies_employee"."name",
           "companies_employee"."last_name",
           "companies_employee"."birthday",
           "companies_employee"."department_id"
    FROM "companies_employee"
    WHERE "companies_employee"."birthday" BETWEEN 1980-01-01
          and 1980-12-31 23:59:59.999999

in line four (second len function), query changes:

    SELECT "companies_employee"."id",
           "companies_employee"."name",
           "companies_employee"."last_name",
           "companies_employee"."birthday",
           "companies_employee"."department_id"
    FROM "companies_employee"
    WHERE ("companies_employee"."birthday" BETWEEN 1980-01-01
          and 1980-12-31 23:59:59.999999
          AND django_extract('month',
                      "companies_employee"."birthday") = 9)

Second part of this post can be found here

blog comments powered by Disqus