by Kevin Murphy
Standard Library Parsing
We're maintaining a system that tracks information about books, including their publication dates. On occasion, publishers will send us CSVs with updated publication dates, and we need to update our Rails application to have those dates.
We want a repeatable process, so rather than updating these by hand, let's use Ruby's CSV class to parse this data.
CSV.parse(csv_text, headers: true).map do |row| book = Book.find_by(isbn: row["ISBN"]) book.update(publication_date: row["Pub Date"]) end
Four lines and we have a functionally complete parser that updates our system how we expect. That all seems great. Until, that is, we actually run it.
Book Checked Out
We execute our parser on the first data file we receive, and it quickly stops with the following error:
NoMethodError: undefined method `update' for nil:NilClass
On inspecting the state of our database, we see that the first three books in the CSV file had their publication dates updated, but the rest didn't. Looking at the fourth row in the CSV, we discover that the ISBN for that row isn't in our database. In that case
nil, and calling
nil is exactly our problem. An exception is raised, and further rows of the CSV aren't parsed.
We can fix that! If we don't find the book, let's log the error and move on to the next row, without calling
CSV.parse(csv_text, headers: true).map do |row| book = Book.find_by(isbn: row["ISBN"]) if book.blank? Rails.logger.error("Could not find book") next end book.update(publication_date: row["Pub Date"]) end
Now the entire CSV processes, our books are updated, and everyone's happy. Right?
Days later, we discover that a book which previously had a publication date doesn't anymore. It's not unusual for a book to not have a publication date - we have records of books that haven't been published yet. However, books shouldn't lose an existing publication date.
We see that the book in question was in the CSV, and the Pub Date column was empty for that row. Turns out, that was an error from the publisher in preparing the CSV. Any book in that file should always have a publication date - the purpose of this process is to provide publication dates.
We can prevent this from happening in the future by ensuring that a row has a publication date before attempting to process it:
CSV.parse(csv_text, headers: true).map do |row| if row["Pub Date"].blank? Rails.logger.error("Skipped updating book with no publication date") next end book = Book.find_by(isbn: row["ISBN"]) if book.blank? Rails.logger.error("Could not find book") next end book.update(publication_date: row["Pub Date"]) end
Losing The Plot
Our "simple" parser is now a lot more complicated. Business rules about the shape, structure, and expectations of the data are now littered along with actions consuming the data to do things like find the book and update it with the appropriate publication date. It's harder to identify what the core responsibility of this snippet of code is.
Let's try to increase the clarity of our code by separating out how to process an individual row of the CSV.
First Act: Establishing A New Character (Class)
We'll start by making a class that takes in the needed data from the CSV row and finds the book associated with the ISBN.
class BookPublicationDateImportRow attr_reader :isbn, :publication_date def initialize(isbn:, publication_date:) @isbn = isbn @publication_date = publication_date end def book @book ||= Book.find_by(isbn: isbn) end end
Perhaps you've heard of a form object to represent data associated with a particular form on your web application. You can consider that's what we're doing here, except our input isn't from a form on a web page - it's a row from a CSV file.
Second Act: Rising Validations
Now, rather than rewriting validation logic, as we had in our procedural code initially, we can use ActiveModel's Validations module. That'll allow us access to the validation helpers we're used to using with other Rails ActiveRecord models.
class BookPublicationDateImportRow include ActiveModel::Validations validates :book, presence: true validates :publication_date, presence: true ... end
We would have caught our last problem of losing publication dates if that validation were on the Book model itself - and we may be tempted to look to add it now. However, remember - having a book with no publication date is totally normal for our application. It's only in this instance of receiving a publication date update from a publisher with no publication date where it's unacceptable to have a value for that attribute.
Third Act: Update Resolution
We'll also mirror ActiveRecord's API by adding in a
save method that makes sure our instance is passing its own validations before updating the book:
class BookPublicationDateImportRow ... def save return false unless valid? book.update(publication_date: @publication_date) end end
Rewriting The Intro
Now that we have something that's responsible for managing an individual row, we can update our parsing code to be responsible for iterating through that CSV and pass off the details of how to manage that data to our new class.
CSV.parse(csv_text, headers: true).map do |row| book_import = BookPublicationDateImportRow.new( isbn: row["ISBN"], publication_date: row["Pub Date"], ) if !book_import.save Rails.logger.error(book_import.errors.full_messages.join(", ")) end end
By adding this new class, we've given ourselves an extension point for additional logic on the row. Any data transformation, like converting the publication date string to a date object, can be handled here (however, for CSV parsing, do take a look at the standard library's converters as well!).
Any further validations that need to be exercised on the data can take place in this separate class. Moreover, we can use framework features and concepts that we're already familiar with, rather than rewriting our own business logic for validation.