Rubyish hashes

Published
6 June 2012
Tagged

Something that's probably not new, but was interesting for me to discover.

There is no to_hash method in ruby, which is probably because you hardly ever need to convert something to a hash. Here's a sample method, if we wanted to define a to_hash method on an Array:

class Array
  def to_hash
    return_hash = {}
    each{ |k,v| return_hash[k] = v
    return_hash
  end
end

This assumes that each entry in your array is itself an array with two elements, in a [key, vale] pattern.

In my view, there's a prettier, "rubier" way to write our to_hash method:

class Array
  def to_hash
    reduce({}){ |hsh, (k,v)| hsh.merge(k=>v) }
  end
end

Now we have it on one line. Pretty, compact, better, right? Let's benchmark:

#!/usr/bin/env ruby

require 'benchmark'

# Generate some sample data
sample_array = []
10000.times do
  sample_array << [rand(1000), rand(1000)]
end

Benchmark.bmbm do |x|
  x.report('Array#each:') do
    hash = {}
    sample_array.each do |k,v|
      hash[k] = v
    end
  end

  x.report('Array#reduce:') do
    hash = sample_array.reduce({}){ |hsh,(k,v)| hsh.merge(k=>v)}
  end
end

Rehearsal -------------------------------------------------
Array#each:     0.000000   0.000000   0.000000 (  0.003334)
Array#reduce:   4.550000   0.210000   4.760000 (  5.099619)
---------------------------------------- total: 4.760000sec

                    user     system      total        real
Array#each:     0.000000   0.000000   0.000000 (  0.003777)
Array#reduce:   4.430000   0.150000   4.580000 (  4.644474)

Ouch.

So what causes the problem? Is it the use of reduce, or of merge? Here's a method that could test this:

class Array
  def to_hash
    reduce({}){ |hsh, (k,v)| hsh[k] = v; hsh}
  end
end

If this runs as fast as our each-using to_hash method, we know the problem lies in merge. Benchmark results:

Rehearsal -------------------------------------------------
Array#each:     0.000000   0.000000   0.000000 (  0.003309)
Array#reduce:   0.010000   0.000000   0.010000 (  0.005039)
-------------------------------------
---
total: 0.010000sec

                    user     system      total        real
Array#each:     0.000000   0.000000   0.000000 (  0.003523)
Array#reduce:   0.010000   0.000000   0.010000 (  0.004758)

It's very, very definitely the merge that's causing the slowdown. So how much slower does our reduce method go than the each method? Time to pull out graphing on varying size arrays:

OK, so it's nothing to write home about. Especially when we compare it with the reduce + merge method shown above:

Slightly worse performance.