JRuby ActiveRecord performance
June 11, 2008
In our current project we are loading a fairly large amount of XML data into the database. The XML parsing is very fast because we use the streaming flavor of REXML, like this:
source = File.new(fp)
REXML::Document.parse_stream(source, ImportListener.new)

class ImportListener
  def tag_start(name, attrs)
    @tags.unshift name
    @langs.unshift attrs['xml:lang']
    @origin = extract_id(attrs['rdf:about']) if attrs['rdf:about']
    relation_name = nil
    case name
    when 'rdf:Description' # Concept
      @pref_label = ''
      @definition = ''
    ...
  def current_language
    @langs.detect do |l|
      !l.nil? && !l.empty?
    end
  end
  def text(t)
    case current_tag
    when 'rdfs:label'
      @label += t.strip
    ...
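Since the listener above is abbreviated, here is a minimal, self-contained sketch of the same streaming pattern. It is not the actual ImportListener: the class name LabelListener, the tag_end handling and the sample XML are assumptions made for illustration. It only shows how the tag and language stacks work together.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical, cut-down listener (not the real ImportListener): collects the
# text of every rdfs:label together with the nearest enclosing xml:lang value.
class LabelListener
  include REXML::StreamListener # supplies no-op defaults for unused events

  attr_reader :labels

  def initialize
    @tags = []    # stack of open element names, innermost first
    @langs = []   # xml:lang of each open element (nil when absent)
    @labels = []  # collected [language, text] pairs
  end

  def tag_start(name, attrs)
    @tags.unshift name
    @langs.unshift attrs['xml:lang']
    @label = '' if name == 'rdfs:label'
  end

  def text(t)
    # Ignore whitespace and text that belongs to other elements.
    @label += t.strip if @tags.first == 'rdfs:label'
  end

  def tag_end(name)
    @labels << [current_language, @label] if name == 'rdfs:label'
    @tags.shift
    @langs.shift
  end

  # First non-empty xml:lang, searching from the innermost element outwards.
  def current_language
    @langs.detect { |l| !l.nil? && !l.empty? }
  end
end

# Made-up sample document, only for this sketch.
xml = <<~XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xml:lang="en">
    <rdf:Description rdf:about="http://example.org/concept/1">
      <rdfs:label>Tree</rdfs:label>
      <rdfs:label xml:lang="de">Baum</rdfs:label>
    </rdf:Description>
  </rdf:RDF>
XML

listener = LabelListener.new
REXML::Document.parse_stream(xml, listener)
p listener.labels
```

The first label has no xml:lang of its own, so it picks up the language of the root element; the second one overrides it. Nothing is kept in memory except the two small stacks, which is why this style of parsing stays fast on large files.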
Since the parsing is cheap, most of the time is consumed by ActiveRecord, with calls like find_or_create_by_xxx. The whole import took 20 minutes / 14 minutes / 52 seconds (real / user / sys) with MySQL running on the same machine. I hoped it would go faster with JRuby, run as time jruby -S rake xxxx:reimport. I'm using JRuby 1.1.2 built from source (rev 6586) with the jdbcmysql adapter. With JRuby it takes 24 minutes / 18 minutes / 44 seconds, which is about 20% slower in wall-clock time.
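For reference, the slowdown can be worked out from the numbers above. This is only a quick sketch; it assumes the rounded minute values quoted in the post are exact, so the percentages are approximate.

```ruby
# Timings quoted in the post, converted to seconds (minutes assumed exact).
mri   = { real: 20 * 60, user: 14 * 60, sys: 52 }
jruby = { real: 24 * 60, user: 18 * 60, sys: 44 }

# Relative change of JRuby against MRI, in percent (positive = slower).
slowdown = mri.keys.map do |k|
  [k, ((jruby[k].to_f / mri[k] - 1) * 100).round(1)]
end.to_h

slowdown.each { |k, pct| puts format('%-4s %+.1f%%', k, pct) }
```

The 20% figure refers to real (wall-clock) time. User time grows more than that, while sys time actually drops a little.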