- Published
- 6 June 2012
- Tagged
Something that's probably not new, but was interesting for me to discover.
There is no to_hash
method in ruby, which is probably because you hardly ever need to convert something to a hash. Here's a sample method, if we wanted to define a to_hash
method on an Array
:
class Array
def to_hash
return_hash = {}
each{ |k,v| return_hash[k] = v
return_hash
end
end
This assumes that each entry in your array is itself an array with two elements, in a [key, vale]
pattern.
In my view, there's a prettier, "rubier" way to write our to_hash
method:
class Array
def to_hash
reduce({}){ |hsh, (k,v)| hsh.merge(k=>v) }
end
end
Now we have it on one line. Pretty, compact, better, right? Let's benchmark:
#!/usr/bin/env ruby
require 'benchmark'
# Generate some sample data
sample_array = []
10000.times do
sample_array << [rand(1000), rand(1000)]
end
Benchmark.bmbm do |x|
x.report('Array#each:') do
hash = {}
sample_array.each do |k,v|
hash[k] = v
end
end
x.report('Array#reduce:') do
hash = sample_array.reduce({}){ |hsh,(k,v)| hsh.merge(k=>v)}
end
end
Rehearsal -------------------------------------------------
Array#each: 0.000000 0.000000 0.000000 ( 0.003334)
Array#reduce: 4.550000 0.210000 4.760000 ( 5.099619)
---------------------------------------- total: 4.760000sec
user system total real
Array#each: 0.000000 0.000000 0.000000 ( 0.003777)
Array#reduce: 4.430000 0.150000 4.580000 ( 4.644474)
Ouch.
So what causes the problem? Is it the use of reduce
, or of merge
? Here's a method that could test this:
class Array
def to_hash
reduce({}){ |hsh, (k,v)| hsh[k] = v; hsh}
end
end
If this runs as fast as our each
-using to_hash
method, we know the problem lies in merge
. Benchmark results:
Rehearsal -------------------------------------------------
Array#each: 0.000000 0.000000 0.000000 ( 0.003309)
Array#reduce: 0.010000 0.000000 0.010000 ( 0.005039)
-------------------------------------
---
total: 0.010000sec
user system total real
Array#each: 0.000000 0.000000 0.000000 ( 0.003523)
Array#reduce: 0.010000 0.000000 0.010000 ( 0.004758)
It's very, very definitely the merge
that's causing the slowdown. So how much slower does our reduce
method go than the each
method? Time to pull out graphing on varying size arrays:
OK, so it's nothing to write home about. Especially when we compare it with the reduce + merge
method shown above:
Slightly worse performance.