Django ORM performance tips - part 2

This is the second part of the two-part series. First part can be found here. Here are some more tips that every django developer should know in order to write faster apps.

If you need only id of a related object use <modelname>_id variable

Lets say we need company id of an employee. We can get it with

employee.department.company.id

Here, we have an employee object. And we are querying its related department object and company object. Following that, we are getting company object and reading its id. Executed queries are:

SELECT "companies_department"."id",
       "companies_department"."name",
       "companies_department"."company_id"
 FROM "companies_department"
 WHERE "companies_department"."id" = 8

 SELECT "companies_company"."id",
        "companies_company"."name"
 FROM "companies_company"
 WHERE "companies_company"."id" = 1

The problem here is that department table already has company_id field. Ids of related models are mapped to an attribute named <related model name>_id. For example, every employee object has department_id field. The value of this field is coming from employee table but not from department table. Also every department object has company_id field which gets its value from company_id field of the department table. So, instead of using the call above, we can use:

employee.department.company_id

Which runs only one query:

SELECT "companies_department"."id",
       "companies_department"."name",
       "companies_department"."company_id"
FROM "companies_department"
WHERE "companies_department"."id" = 8

So, if you only need the id of a related object, use <related model name>_id attribute of the model to save an extra query.

For simple inner joins use select_related method

Lets say we need an employee with id 401. We also need his/her department and company name. For this, we can do:

e = Employee.objects.get(id=401)
print e.name #prints "employee name 29"
print e.department.name #prints department7
print e.department.company.name #prints company0

Those 4 lines run following queries

--runs at line 1
SELECT "companies_employee"."id",
       "companies_employee"."name",
       "companies_employee"."last_name",
       "companies_employee"."birthday",
       "companies_employee"."department_id"
FROM "companies_employee"
WHERE "companies_employee"."id" = 401

--runs at line 3
SELECT "companies_department"."id",
       "companies_department"."name",
       "companies_department"."company_id"
FROM "companies_department"
WHERE "companies_department"."id" = 8

--runs at line 4
SELECT "companies_company"."id",
       "companies_company"."name"
FROM "companies_company"
WHERE "companies_company"."id" = 1

Instead of running three queries, we can run three queries all at once with an inner join using select_related method:

e = Employee.objects.select_related('department','department__company').get(id=401)
print e.name #prints "employee name 29"
print e.department.name #prints department7
print e.department.company.name #prints company0

In this version, we are querying department and company tables with employee table. Here is the query we will execute

-- runs at line 1
SELECT "companies_employee"."id",
       "companies_employee"."name",
       "companies_employee"."last_name",
       "companies_employee"."birthday",
       "companies_employee"."department_id",
       "companies_department"."id",
       "companies_department"."name",
       "companies_department"."company_id",
       "companies_company"."id",
       "companies_company"."name"
FROM "companies_employee"
INNER JOIN "companies_department" ON ("companies_employee"."department_id" = "companies_department"."id")
INNER JOIN "companies_company" ON ("companies_department"."company_id" = "companies_company"."id")
WHERE "companies_employee"."id" = 401

If your are familiar with select_related method, you probably know that we could use this method like

e = Employee.objects.select_related().get(id=401)

This code runs the exact same query with the first select_related example. If you don't specify model names in parameters, select_related will include all relationships and will go as deep as possible. So, if we add other relationships to employee table, this code would also add them to its query. So I recommend using select_related method with model names specified.

There is one little problem with select_related method. For every row of the result set, django creates one object for queried table and one object for each related table. Lets say we need every employee and department for companies with ids 4,5 and 6. To achieve this, we can do following query:

    Employee.objects.select_related('department__company').filter(department__company__id__in=[4,5,6])

This line will run an inner join query with employee, department and company tables. Normally, we need to have 3 companies, 42 departments and 2374 employees. However, when we run the query, django will create 2374 employee, 2374 department and 2374 company objects. Creating that many model objects will cause a performance bottleneck. So, if you know that you will get too many rows with your query, you might actually choose to use three separate queries for every model. This way, it will take up a lot less memory. Another solution would be using values or values_list method as described below.

Use "only" method to limit fields

At the example above, we only needed employee, department and company name. However, this query fetches all the fields of our three tables. In this case, it is a better idea to restrict query fields. By doing this, resulting objects will use less memory space and will be initialized faster.

e = Employee.objects.select_related('department','department__company').only('name','department__name','department__company__name').get(id=401)

This line run following query

SELECT "companies_employee"."id",
       "companies_employee"."name",
       "companies_employee"."department_id",
       "companies_department"."id",
       "companies_department"."name",
       "companies_department"."company_id",
       "companies_company"."id",
       "companies_company"."name"
FROM "companies_employee"
INNER JOIN "companies_department" ON ("companies_employee"."department_id" = "companies_department"."id")
INNER JOIN "companies_company" ON ("companies_department"."company_id" = "companies_company"."id")
WHERE "companies_employee"."id" = 401

No matter how many fields employee, department and company tables have, this query will always return with same number of fields.

Use values and values_list methods if you can

Values method returns a query_set object that holds dictionaries instead of django objects as it would by default. Every dictionary will have table field names as keys and field values as values.

values_list method is almost same as values method. The difference is returned query_set will hold tuples instead of dictionaries. If your query returns a few rows, using those methods probably will not make much difference. However, if you need to process thousands of rows, your program will use like 10 times less memory than using django models. It also will run 10 times faster. Lets rewrite the example that we queried employees of a company. We will again query Employees and departments of companies that have ids 4,5 and 6:

e = Employee.objects.filter(department__company__id__in=[4,5,6]).values('name','department__name','department__company__name')

Result of this query will return a query set that holds dictionary of fields. An example dictionary is following:

print e[0]
"""{'department__company__name': u'company3',
    'department__name': u'department0',
    'name': u'employee name 0'}"""

We can also use values_list method here.

e = Employee.objects.filter(department__company__id__in=[4,5,6]).values_list('name','department__name','department__company__name')
print e[0]
# (u'employee name 0', u'department0', u'company3')

values_list is especially helpful if you only need one field from your query. Lets say you need only employee ids from this query:

e = Employee.objects.filter(department__company__id__in=[4,5,6]).values_list('id')
# [(1817,), (1818,), (1819,), (1820,), (1821,), (1822,), (1823,), (1824,),...]
result_list = zip(*e)[0] # next example is a better solution to this
# (1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824,...)

Here, we got a query set of tuples which only have employee ids inside. We zip the query set to have whole ids in an iterable. Better way to this would be:

e = Employee.objects.filter(department__company__id__in=[4,5,6]).values_list('id',flat=True) # Notice flat=True parameter
# [1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832, 1833, 1834,....

blog comments powered by Disqus
- Follow yilmaz_huseyin on Twitter