Erik has written about Solr’s usage in libraries on the Lucid Imagination Blog. Solr has found its way into many libraries and quite rightly so. However, one of the main things that Erik talks about in that blog post is the performance of DataImportHandler vs SolrMarc (the indexing library used by both VUFind and Blacklight).
Quoting from Erik’s email to the solrmarc-tech google group:
The difference in speeds:
SolrMarc: 22 docs / s
DIHmarc: 1,745 docs / s
W00t!
Well, I don’t know much about SolrMarc but I’ve seen DataImportHandler instances with comparable (or even better) throughput many times. There is something fishy inside SolrMarc for sure and I have a feeling that fixing it would be a low hanging fruit. However, Erik’s opinion is that DataImportHandler is a better way to index and that he will devote a portion of his time to helping the Solr using library community (thanks Erik!).
DIH has really taken off since it debuted in Solr 1.3 and it would be safe to call it the de-facto standard for indexing data into Solr. It may not be the most elegant way to index data but it is quick and it works great. With the planned features for DataImportHandler in Solr v1.5, it will continue to improve and it makes sense to base VUFind and Blacklight’s indexing infrastructure on top of it. I’m very excited to see this happening.
-
faustino-gudroe liked this
-
maryrussell liked this
-
shalinmangar posted this