Optimising Ruby on Rails database query

Optimising Ruby on Rails database query

ActiveRecord makes it easy to write queries but understanding how it works behind the hood is important for performance.

I was asked to investigate a very slow RoR application and determined that it was due to the way the ActiveRecord was written. These were the 2 models causing the problems:

# app/models/event.rb
class Event < ApplicationRecord
  scope :final, -> { where final: true }

  validates :event_number, uniqueness: { scope: :country_id }

  belongs_to :country
  has_many :races
  has_many :trials
  has_many :race_timings

  def calculate_completion runner
    pids = races.where.not run_type: 'trial'
    best_timings = pids.pluck(:id).map { |pid| runner.get_best_time nil, rid, false }
    best_timings << runner.get_best_time(id, nil, true)
    best_timings = best_timings.map { |e| e ? e.time : 0.0 }
    best_timings = best_timings.map { |e| !e.nil? && e>1.0 ? 1.0 : e } 
    pids.count==0 || runner.nil? ? 0.0 : (100.0 * best_timings.compact.sum / best_timings.count.to_f).ceil 
  end
end

# app/models/runner.rb
class Runner < ApplicationRecord
  has_many :trials
  has_many :race_timings

  validates :uid, uniqueness: true
  validates :email, uniqueness: true

  def get_best_time event_id, race_id, trial
    s = race_timings.where event_id: event_id, trial: true, best_time: true if trial==true
    s = race_timings.where race_id: race_id, trial: false, best_time: true if trial==false
    s.first
  end
end

It seems too easy to write database queries in Ruby on Rails, that some developers forgot about how it works behind the hood. The above code were taking approximately 28 seconds to complete. I've made some comments in the code to explain why:

  def calculate_completion runner
    # At this point, the query is not made yet.
    pids = races.where.not run_type: 'trial'

    # Inefficient 1: the .pluck() method triggers the query but the result is passed immediately to .map() method. 
    # Inefficient 2: each id then triggers a call to runner.get_best_time().
    best_timings = pids.pluck(:id).map { |pid| runner.get_best_time nil, rid, false }

    best_timings << runner.get_best_time(id, nil, true)
    best_timings = best_timings.map { |e| e ? e.time : 0.0 }
    best_timings = best_timings.map { |e| !e.nil? && e>1.0 ? 1.0 : e } 

    # Inefficient 3: since .pluck() result was retained, the .count() method triggers another query.
    pids.count==0 || runner.nil? ? 0.0 : (100.0 * best_timings.compact.sum / best_timings.count.to_f).ceil 
  end
  • It should be noted that the query is not called until some trigger. The method .pluck() is that trigger which then returns an array and is immediately chained to the method .map(). This means the result is not retained for use in the last method .count().

  • Within each .map() there were multiple calls to runner.get_best_time().

I re-wrote the methods:

class Event < ApplicationRecord
  scope :final, -> { where final: true }

  validates :event_number, uniqueness: { scope: :country_id }

  belongs_to :country
  has_many :races
  has_many :trials
  has_many :race_timings

  def calculate_completion runner
    # No change.
  end

  # Added new method for optimization.
  def calculate_completion_v2 runner
    # The .pluck() method is called and results retained.
    pids = races.where.not(run_type: 'trial').pluck(:id)

    # A single combined query instead.
    best_timings = runner.get_best_times(id, pids).to_a

    # No change to calculation.
    best_timings << runner.get_best_time(id, nil, true)
    best_timings = best_timings.map { |e| e ? e.time : 0.0 }
    best_timings = best_timings.map { |e| !e.nil? && e>1.0 ? 1.0 : e } 

    # The new multiple queries does not return results for some pids, thus have to pad it.
    ([pids.length + 1 - best_timings.length, 0].max).times.each{|i| best_timings << 0}

    # Using .length instead of .count() method.
    pids.length==0 || runner.nil? ? 0.0 : (100.0 * best_timings.compact.sum / best_timings.length.to_f).ceil 
  end

end


class Runner < ApplicationRecord
  has_many :trials
  has_many :race_timings

  validates :uid, uniqueness: true
  validates :email, uniqueness: true

  def get_best_time event_id, race_id, trial
    s = race_timings.where event_id: event_id, trial: true, best_time: true if trial==true
    s = race_timings.where race_id: race_id, trial: false, best_time: true if trial==false
    s.first
  end

  # Added a new method for combining queries.
  def get_best_times event_id, race_ids
    # Using a PostgreSQL specific syntax to combine the queries.
    race_timings.select('distinct on(race_id) *').where(race_id: race_ids, trial: false, best_time: true).order(:race_id, :id)
  end
end

The new code now takes about 8 seconds. The next optimisation is to create indexes:

CREATE INDEX race_timings_runner_best_time_idx ON public.race_timings
(
    runner_id,
    best_time
);

Now it is less than 3 seconds.