A couple of weeks back, Michael McCandless, Apache Lucene committer and PMC member, started a discussion on factoring out a shared, standalone Analysis package for Lucene, Solr and Nutch. During that discussion, Yonik Seeley, the creator of Solr, proposed merging the development of Lucene and Solr. After intense discussions and multiple rounds of voting, the following changes are being put into effect:
- Merging the developer mailing lists into a single list.
- Merging the set of committers.
- When any change is committed (to a module that “belongs to” Solr or to Lucene), all tests must pass.
- Release details will be decided by the dev community, but Lucene may release without Solr.
- Modularize the sources: pull things out of Lucene’s core (break out query parser, move all core queries & analyzers under their contrib counterparts), pull things out of Solr’s core (analyzers, queries).
The following things do not change:
- Besides modularizing (above), the source code would remain factored into separate dirs/modules the way it is now.
- Issue tracking remains separate (SOLR-XXX and LUCENE-XXX issues).
- User lists remain separate.
- Web sites remain separate.
- Release artifacts/jars remain separate.
So what does it mean for Lucene/Solr users? Nothing much, really, except that you should see tighter coordination between Lucene and Solr development. New Lucene features should reach Solr faster and releases should be more frequent. Solr features may also be made available to Lucene users who do not want to set up Solr or use its RESTy APIs.
Already, Solr has been upgraded to use Lucene trunk (in branches/solr), and that branch should soon become the new Solr trunk. There is talk of re-organizing the source structure to better fit the new model. Things are moving fast!
Personally, I feel that this merge is a good thing for both Lucene and Solr:
- Solr users get the latest Lucene improvements faster and releases get streamlined.
- Lucene users get access to Solr features such as faceting.
- The in-sync trunk allows new features to make their way into the right place (Lucene vs Solr) more easily and duplication is minimized.
- Bugs are caught earlier by the huge combined test suite.
- More committers means more ideas and hands available to the projects.
- Other Lucene based projects can benefit too because many Solr features will be made available through Java APIs.
There are a couple of things to be worked out. For example, we need to decide where the integrated sources should live and whether or not to sync Solr’s version with Lucene’s. All this will take some time but I am confident that our combined community will manage the transition well.

Congratulations to the Apache Lucene team on releasing Lucene Java 3.0.1 and 2.9.2. Both of these are bug fix releases and are backwards compatible with Lucene Java 3.0.0 and 2.9.1 respectively.
From the official announcement:
Hello Lucene users,
On behalf of the Lucene development community I would like to announce the release of Lucene Java versions 3.0.1 and 2.9.2:
Both releases fix bugs in the previous versions:
- 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4
- 3.0.1 has the same bug fix level but is for the Lucene Java 3.x series, based on Java 5
New users of Lucene are advised to use version 3.0.1 for new developments, because it has a clean, type-safe API.
Important improvements in these releases include:
- An increased maximum number of unique terms in each index segment.
- Fixed experimental CustomScoreQuery to respect per-segment search. This introduced an API change!
- Important fixes to IndexWriter: a commit() thread-safety issue, lost document deletes in near real-time indexing.
- Bugfixes for Contrib’s Analyzers package.
- Restoration of some public methods that were lost during deprecation removal.
- The new Attribute-based TokenStream API now works correctly with different class loaders.
Both releases are fully compatible with the corresponding previous versions. We strongly recommend upgrading to 2.9.2 if you are using 2.9.1 or 2.9.0; and to 3.0.1 if you are using 3.0.0.
See core changes at:
and contrib changes at:
Binary and source distributions are available at http://www.apache.org/dyn/closer.cgi/lucene/java/
Lucene artifacts are also available in the Maven2 repository at http://repo1.maven.org/maven2/org/apache/lucene/

Apache Lucene Java 3.0.0 has been released. Lucene Java 3.0.0 is mostly a clean-up release without any new features. It paves the way for refactoring and adding new features without the shackles of backwards compatibility. All APIs deprecated in Lucene 2.9 have been removed and Lucene Java has officially moved to Java 5 as the minimum requirement.
See the announcement email for more details. Congratulations, Lucene devs!

Apache Lucene 2.9 has been released. Apache Lucene is a high performance, full-featured text search engine library written entirely in Java.
From the official announcement email:
Lucene 2.9 comes with a bevy of new features, including:
- Per segment searching and caching (can lead to much faster reopen among other things)
- Near real-time search capabilities added to IndexWriter
- New Query types
- Smarter, more scalable multi-term queries (wildcard, range, etc)
- A freshly optimized Collector/Scorer API
- Improved Unicode support and the addition of Collation contrib
- A new Attribute based TokenStream API
- A new QueryParser framework in contrib with a core QueryParser replacement impl included.
- Scoring is now optional when sorting by Field, or using a custom Collector, gaining sizable performance when scores are not required.
- New analyzers (PersianAnalyzer, ArabicAnalyzer, SmartChineseAnalyzer)
- New fast-vector-highlighter for large documents
- Lucene now includes high-performance handling of numeric fields. Such fields are indexed with a trie structure, enabling simple to use and much faster numeric range searching without having to externally pre-process numeric values into textual values.
- And many, many more features, bug fixes, optimizations, and various improvements.
See the release announcement for more details.
Congratulations to the Lucene team! Great work as always.
This is also the last minor release that supports the Java 1.4 platform. The next release will be 3.0, in which deprecated APIs will be removed and Lucene will officially move to Java 5 as the minimum requirement.
Solr 1.4 is not far behind and we hope to release it within two weeks.

A large amount of work is going into Lucene 2.9, and much of it is related to adding support for near real-time search.
To put it very simply, search engines move a lot of work from query time to index time: queries get faster at the cost of slower document additions. Until now, Lucene-based systems have struggled with scenarios in which searchers need to see changes instantly (think Twitter Search). A variety of tricks and techniques exist to achieve this even now. However, near real-time search support in Lucene itself is a boon to everyone who has been building and managing such systems, because the grunt work will be done by Lucene.
Near real-time search is still under development and will probably take a few more months to mature. Solr will benefit from it as well, but before that can happen a lot of work is needed under the hood, particularly in the way Solr handles its caching.
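To make the index-time versus query-time trade-off mentioned above a little more concrete, here is a minimal sketch of a toy inverted index. It is not Lucene's API or its implementation: the class `InvertedIndexSketch` and its `add` and `search` methods are hypothetical names invented for this illustration, and the tokenizer is a plain regular-expression split. The point is only that `add` does the expensive part (tokenizing and updating posting lists), so that `search` can be a single lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, NOT Lucene's API): illustrates the trade-off only. */
public class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Index-time work: tokenize the text and update one posting list per term. */
    public void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Query-time work: one map lookup, with no scan over the document text. */
    public List<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene is a search library");
        index.add(2, "Solr is a search server");
        index.add(3, "Solr builds on Lucene");
        System.out.println(index.search("lucene")); // prints [1, 3]
        System.out.println(index.search("search")); // prints [1, 2]
    }
}
```

In this toy version a document becomes searchable as soon as `add` returns, because everything lives in one in-memory map. A real engine writes its index structures in batches, which is roughly why making fresh changes visible quickly takes extra machinery.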
Michael McCandless has summarized the current state of Lucene trunk in this email on the java-dev mailing list. In fact, there is so much activity that, at times, it becomes very difficult to follow all the excellent discussions that go on. There are some very talented people on that forum and there is a lot to learn for a guy like me, who started with Solr and is still trying to find his way in the Lucene code base.
Lucene 2.9 will bring huge improvements and I’m looking forward to working with other Solr developers to integrate them with Solr.