Shalin Says...

There is a large amount of work being done in Lucene 2.9, in which a large portion is related to adding support for near real-time search.

To put it very simply, search engines transfer a lot of work from query-time to index-time. The reason this is done, is to speed up queries at the cost of adding documents slower. Until now, Lucene based systems have had problems with dealing with scenarios in which the searchers need to see the changes instantly (think Twitter Search). There exist a variety of tricks and techniques to acheive this even now. However, near real-time search support in Lucene itself is a boon to all those people who have been building and managing such systems because the grunt work will be done by Lucene itself.

This is still under development and will probably take a few more months to mature. Solr will benefit from it as well but before that can happen, a lot of work will be needed under the hood particularly in the way Solr handles its caching.

Michael McCandless has summarized the current state of Lucene trunk in this email on java-dev mailing list. In fact, there is so much activity that, at times, it becomes very difficult to follow all the excellent discussions that go on. There are some very talented people on that forum and it is a lot of learning for a guy like me, who started with Solr and is still trying to find his way in the Lucene code base.

Lucene 2.9 will bring huge improvements and I’m looking forward to working with other Solr developers to integrate them with Solr.

blog comments powered by Disqus