While building OnCompare, we’ve been adding specifications to test the application at different levels. These come in three varieties: tests of the services, tests of the models (both of which use the database to exercise ActiveRecord and assert our relations), and UI integration tests (using zombie.js, which is probably worth a separate blog entry). To test something you need a context, and for our services that context can mean a lot of interconnected data to validate that the algorithms are performing correctly. We ran through three different approaches to managing test data under Ruby on Rails, and none of them are great.


At first, I used plain ActiveRecord calls to create the data:

before do
  # Every attribute has to be spelled out by hand
  category = Category.create(:name => 'Service',
        :description => 'Lorem ipsum dolor sit amet')
  Question.create(:text => 'Question A', :category => category,
        :key => 'Key')
end

This worked fine for the model tests, because I was creating only a small number of records. But when I got to the PriceCalculator service, I needed to load a large set of data to test it, and this was just too much typing. I looked at fixtures and immediately threw out the YAML approach because it didn’t reduce typing. I also tried CSV fixtures, but those didn’t work either, because the records need to be tied together through associations (as in the example above).


I know nobody likes fixtures, and they clearly have some significant shortcomings:

  • Not recommended by most Rails pros; I trust their judgement
  • Data is tied to the model, not the test context; I need different data for different tests
  • Data is loaded when fixtures is called but is never cleaned up until the next fixtures call, so if some of your tests use other data, it can clash with the fixture data

Factory Girl

Following Aaron’s suggestion, I added factory_girl, which can fill in all our required (i.e. not-null) fields with default data, so each test only has to specify the few attributes it actually cares about. We defined some factories to create records:

Factory.define :category do |f|
  f.sequence(:name) { |n| "Category Lorem Ipsum-#{n}" }
  f.description 'SaaS for You'
end

Factory.define :question do |f|
  f.sequence(:text) { |n| "Question-#{n}" }
  f.association :category   # creates a Category unless one is passed in
  f.sequence(:key) { |n| "Key-#{n}" }
end

The tests then were much simpler to write (compare to the opening ActiveRecord example):

before do
  category = Factory.create(:category)
  # Only the attribute under test is spelled out; factory_girl fills in the rest
  Factory.create(:question, :category => category, :key => 'Key under test')
end

There are a couple of really nice things to note here:

  1. I didn’t have to fill in all the default fields, so there’s much less repetition, and the only data visible in the test is the data that’s critical to it
  2. I could have skipped creating the Category entirely; because the question factory declares the association, factory_girl will auto-generate one for me
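
The following sketch makes those two points concrete by showing, without any dependencies, what sequences and associations do for us. It is plain Ruby, not factory_girl. The MiniFactory module, its create method and the Struct stand-ins for our models are all hypothetical, chosen only so the example runs without Rails or a database. Defaults are stored as lambdas, so a sequence only advances, and a parent Category is only built, when the test doesn’t supply that attribute itself.

```ruby
# A hypothetical, dependency-free sketch of the idea behind factory_girl's
# sequences and associations. It is NOT factory_girl's implementation or API.
Category = Struct.new(:name, :description, keyword_init: true)
Question = Struct.new(:text, :category, :key, keyword_init: true)

module MiniFactory
  @counters = Hash.new(0)

  # Returns the next number in a named sequence: 1, 2, 3, ...
  def self.next_seq(name)
    @counters[name] += 1
  end

  # Defaults are lambdas, so a sequence only advances (and an associated
  # category is only created) when the caller does not supply that attribute.
  DEFINITIONS = {
    category: [Category, {
      name: -> { "Category Lorem Ipsum-#{next_seq(:category_name)}" },
      description: -> { 'SaaS for You' }
    }],
    question: [Question, {
      text: -> { "Question-#{next_seq(:question_text)}" },
      category: -> { create(:category) }, # association built on demand
      key: -> { "Key-#{next_seq(:question_key)}" }
    }]
  }.freeze

  # Builds a record; explicit overrides win over the defaults.
  def self.create(name, overrides = {})
    klass, defaults = DEFINITIONS.fetch(name)
    attrs = defaults.to_h do |attr, default|
      [attr, overrides.key?(attr) ? overrides[attr] : default.call]
    end
    klass.new(**attrs)
  end
end

# Usage: the first question gets every default, including a generated
# category; the second passes only the attribute under test.
first  = MiniFactory.create(:question)
second = MiniFactory.create(:question, key: 'Key under test')
```

Here first.key comes out as "Key-1" and first.category is a freshly generated "Category Lorem Ipsum-1". The second question keeps the key it was given, and the key sequence does not advance for it.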

Note that it’s important never to use data from your factory definitions in an assertion, because that makes your tests brittle. (I added a comment at the top of our factory file to that effect.) Thus, in the example above, I set the key attribute explicitly because, presumably, I’m testing it.
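
Here is a minimal illustration of that rule, again in plain Ruby. The names QuestionFactory and find_question are hypothetical stand-ins for a factory and for the code under test. The spec asserts only on the key it set itself. Asserting on a generated default such as 'Key-2' would break as soon as the factory definition, or the number of records built before it, changed.

```ruby
# Hypothetical stand-ins: neither name comes from OnCompare or factory_girl.
Question = Struct.new(:text, :key, keyword_init: true)

module QuestionFactory
  @seq = 0

  # Sequence-style defaults; any attribute passed in overrides its default.
  def self.build(overrides = {})
    @seq += 1
    defaults = { text: "Question-#{@seq}", key: "Key-#{@seq}" }
    Question.new(**defaults.merge(overrides))
  end
end

# Hypothetical code under test: looks a question up by its key.
def find_question(questions, key)
  questions.find { |q| q.key == key }
end

# Robust: the spec sets the one value it asserts on.
target    = QuestionFactory.build(key: 'Key under test')
questions = [QuestionFactory.build, target, QuestionFactory.build]
found     = find_question(questions, 'Key under test')
raise 'lookup failed' unless found.equal?(target)

# Brittle (avoid): find_question(questions, 'Key-2') only works because of how
# many records happened to be built first; reorder the setup and it fails.
```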

Making lots of records

Unfortunately, this still doesn’t address my issue of large amounts of data. For large data sets there are a couple of things I want to do. One is to build a highly integrated data set that includes lots of dependencies to simulate the production configuration. It turns out that the more thoroughly we tested our core logic, the less data I actually needed to generate to validate the test cases and boundary conditions.

The other place to use large data sets is stress testing the system. While I can pare down my specs to require a dozen or fewer records to test the basic logic of the algorithms, that doesn’t tell me how the system will hold up in production. I need to know that the algorithm will remain responsive when we have 300 products. At this point, I’m looking at running loops around factory_girl calls so that factory_girl generates most of the non-essential information. Setting up a single product matrix may take a dozen records, but I can loop over that 300 times, and factory_girl will automatically give each record a new name (using the sequence method shown in the factory definitions above).

before do
  category = Factory.create(:category)
  300.times do
    product = Factory.create(:product, :category => category)
    # A dozen questions per product; sequences keep the names unique
    12.times do
      Factory.create(:question, :category => category, :product => product)
    end
  end
end
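
Here is a rough sketch of how such a loop can double as a stress test. It assumes we simply time the calculation and fail if it exceeds a budget. To keep the sketch self-contained, it builds the 300 products with 12 questions each as in-memory Structs instead of calling factory_girl. The price_matrix method, the weights and the 0.5-second budget are all hypothetical placeholders for the real PriceCalculator and for whatever responsiveness target production actually needs.

```ruby
# Hypothetical in-memory stand-ins for the ActiveRecord models.
Category = Struct.new(:name, keyword_init: true)
Product  = Struct.new(:name, :category, keyword_init: true)
Question = Struct.new(:text, :category, :product, :weight, keyword_init: true)

PRODUCT_COUNT         = 300
QUESTIONS_PER_PRODUCT = 12
BUDGET_SECONDS        = 0.5 # hypothetical responsiveness target

# Mirrors the nested loop in the spec: one category, many products, a dozen
# questions per product, with sequence-style unique names.
def build_data_set(category, product_count, questions_per_product)
  products  = []
  questions = []
  seq = 0
  product_count.times do |i|
    product = Product.new(name: "Product-#{i + 1}", category: category)
    products << product
    questions_per_product.times do
      seq += 1
      questions << Question.new(text: "Question-#{seq}", category: category,
                                product: product, weight: seq % 5)
    end
  end
  [products, questions]
end

# Hypothetical stand-in for the service under test: totals the question
# weights for each product.
def price_matrix(products, questions)
  by_product = questions.group_by(&:product)
  products.to_h { |p| [p.name, by_product.fetch(p, []).sum(&:weight)] }
end

category = Category.new(name: 'Service')
products, questions = build_data_set(category, PRODUCT_COUNT, QUESTIONS_PER_PRODUCT)

# Time only the calculation, not the data setup, using a monotonic clock.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
matrix  = price_matrix(products, questions)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

raise "price_matrix took #{elapsed.round(3)}s" if elapsed > BUDGET_SECONDS
puts "#{questions.size} questions priced in #{elapsed.round(4)}s"
```

A monotonic clock is used so that adjustments to the system clock during the run can’t skew the measurement. In a real spec, a generous budget also helps keep the timing check from becoming flaky on slower machines.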

What are you using? Are there good data generators out there that we’re missing?