Virtual attributes in Rails can be useful for eliminating N+1 queries, which often lead to slower requests and increase the need for premature scaling.
What Are N+1 Queries?
N+1 queries can occur whenever we have records with has_many
associations that we lazy load in our application. When loading a set of records and then loading each of those records’ associated records, queries are issued for both the parent record and for every child record, thus generating N+1 queries. Doing this can potentially flood our database with queries, which is something we should avoid.
Recently, we ran into the N+1 query problem while working on a Rails API for a client project where we wanted to take in a geographic point and serve up a collection of resources, ordered by distance from that point, and also provide the distance for each resource in the collection. Iād like to share how we solved that problem using the #attribute method in Rails.
For the sake of illustration, let’s say we have a polling_location
model, and we want to find our nearest polling_locations
and calculate how far away each polling_location
is from us. Using the PostGIS extension for PostgreSQL, we can query for all polling_locations
ordered by distance by defining a method like this one:
class PollingLocation < ApplicationRecord def self.near(lat, lon, distance_in_meters = 500_000_000) where("ST_DWithin(lonlat, 'POINT(? ?)', ?)", lon, lat, distance_in_meters) .order("ST_Distance(lonlat, 'POINT(#{lon} #{lat})')") end end
This method takes longitude
and latitude
arguments and an optional distance_in_meters
argument that defaults to an arbitrarily large number (large enough to span the United States in this example). The lonlat
reference in the SQL string is a geography column on the polling_locations
table. With this geographic data, we can use PostGIS’ ST_DWithin
and ST_Distance
functions to query for all of the polling_locations
whose lonlat
is within a given distance to the provided coordinates and return these ordered by distance.
Now we can call PollingLocation.near(lat, lon)
to find our nearest polling_locations
. Great! But we also want to know how far each one is from us. How do we do that? We might start by writing a method to calculate distance for each polling_location
:
def distance(lat, lon) query = <<-QUERY SELECT ST_Distance(gg1, gg2) AS spheroid_dist FROM (SELECT ST_GeogFromText('SRID=4326;POINT(#{lon} #{lat})') As gg1, ST_GeogFromText('SRID=4326;#{lonlat}') As gg2 ) AS foo; QUERY PollingLocation.connection.select_one(query)['spheroid_dist'] end
This distance
method takes our coordinates and makes a SQL query using functions from PostGIS to calculate the distance in meters. Now we can call distance
on a polling_location
and see how far away it is. š
The above method works fine for a single polling_location
instance, but we want our API to serve up all of our polling_locations
near a given point and provide the calculated distance for every one. How can we do that? Well, we could call the distance
method for every polling_location
that is returned in our collection, but that would make a new SQL query for every object. Depending on how many records we have in our database, that could potentially mean thousands or hundreds of thousands of SQL queries for each API request. š¢ How can we do this without generating N+1 queries?
If we look closely at our code, we might notice that we are already doing the work of calculating distance in the ordering of the .near
method. What would be great is if we could take that distance we’ve already calculated and return that along with each polling_location
. Here’s where virtual attributes can help us.
Virtual Attributes To The Rescue
Using the #attribute method provided to us by Rails, we can add a virtual attribute to our model, which means an attribute with no database column backing it. Let’s call ours distance_in_meters
and give it a type of float
:
attribute :distance_in_meters, :float
With this in place, we can chain a .select
method to the query in our .near
method, taking the distance we’ve calculated and setting it dynamically as distance_in_meters
on our model:
def self.near(lat, lon, distance_in_meters = 500_000_000) where("ST_DWithin(lonlat, 'POINT(? ?)', ?)", lon, lat, distance_in_meters) .order("ST_Distance(lonlat, 'POINT(#{lon} #{lat})')") .select("\"polling_locations\".*, ST_Distance(lonlat, 'POINT(#{lon} #{lat})') AS distance_in_meters") end
Now, when we call PollingLocation.near(lat, lon)
, we get all of the polling_locations
ordered by distance, and each has a distance_in_meters
attribute that shows how many meters away each polling_location
is — all in a single SQL query!
We could leave this is as, but if we wanted to go one step further and serve up the distance in another unit of length, such as miles, we could define a distance conversion method on our PollingLocationSerializer
:
class PollingLocationSerializer < ActiveModel::Serializer attributes :id, :distance_in_meters, :distance_in_miles, :lonlat def distance_in_miles (object.distance_in_meters / 1609.34).round(2) end end
Now, let’s take a look at our new PollingLocation
model:
class PollingLocation < ApplicationRecord attribute :distance_in_meters, :float def self.near(lat, lon, distance_in_meters = 500_000_000) where("ST_DWithin(lonlat, 'POINT(? ?)', ?)", lon, lat, distance_in_meters) .order("ST_Distance(lonlat, 'POINT(#{lon} #{lat})')") .select("\"polling_locations\".*, ST_Distance(lonlat, 'POINT(#{lon} #{lat})') AS distance_in_meters") end end
With the help of Rails virtual attributes, we were able to arrive at a solution that avoids N+1 queries and helps us find our nearest polling locations more quickly. š
Special thanks to Dave Kroondyk, who helped me work out this solution and provided feedback on this blog post.