April 2011
1 post
Apache Lucene 3.1.0 and Apache Solr 3.1.0 →
This is the first release bringing Lucene and Solr release versions in sync. There are numerous bug fixes, optimizations and new features. Download from here
August 2010
1 post
2 tags
My Android phone - Samsung Galaxy S Review
I had been holding out on buying a more internet-friendly phone for some time now, waiting for 3G service to start in India. After my iPad experience, it was clear to me that I couldn’t be happy with an iPhone but it was also obvious that none of the available Android phones were good enough.
Enter the new Samsung Galaxy S with Android 2.1, awesome 4” display, light weight and a good...
June 2010
2 posts
1 tag
Solr 1.4.1 Released →
From the mailing list announcement:
Apache Solr 1.4.1 has been released and is now available for public download! http://www.apache.org/dyn/closer.cgi/lucene/solr/
Solr 1.4.1 is a bug fix release. See the change log for more details.
The iPad experience
Now that I have had the chance to play with the Apple iPad, I thought I’d put down some observations and opinions about the device. I know that I’m late to the party but hey, they don’t sell these devices here in India.
First of all, I love that I do not have to sit before a desk to use this. There is a real need in the world for a device like this and even though many have...
March 2010
2 posts
1 tag
Apache Mahout 0.3 Released
Apache Mahout 0.3 has been released. Apache Mahout is a project which attempts to make machine learning both scalable and accessible. It is a sub-project of the excellent Apache Lucene project which provides open source search software.
From the project website:
The Apache Lucene project is pleased to announce the release of Apache Mahout 0.3. Highlights include:
New: math and collections...
2 tags
Merging Lucene and Solr
A couple of weeks back, Apache Lucene committer and PMC member Michael McCandless started a discussion on factoring out a shared, standalone Analysis package for Lucene, Solr and Nutch. During the discussions, Yonik Seeley, Solr's creator, proposed merging the development of Lucene and Solr. After intense discussions and multiple rounds of voting, the following changes are being put into effect:
...
February 2010
2 posts
1 tag
Apache Lucene Java 3.0.1 and 2.9.2 Released
Congratulations to the Apache Lucene team on releasing Lucene Java 3.0.1 and 2.9.2. Both of these are bug fix releases and are backwards compatible with Lucene Java 3.0.0 and 2.9.1 respectively.
From the official announcement:
Hello Lucene users,
On behalf of the Lucene development community I would like to announce the release of Lucene Java versions 3.0.1 and 2.9.2:
Both releases fix bugs...
2 tags
Microsoft dropping FAST search for Linux, Unix
According to a blog post from Bjørn Olstad, Microsoft Distinguished Engineer and CTO of FAST, the 2010 products will be the last to have a search core that runs on Linux and UNIX.
Being involved in Apache Solr and the newly formed Lucene Connectors Framework (LCF) project, I’m very interested in the implications. Undoubtedly, at least some FAST customers will not be happy with this decision...
January 2010
4 posts
1 tag
2 tags
Solr In Action Case Studies
Well, the cat is out of the bag. I’ve been working with Otis on Solr In Action. We’re looking for a couple of contributors to write case studies for the book describing how they have used Solr. Otis just posted this to his blog and to the Solr mailing list as well.
So, if you are using Apache Solr in some clever, interesting or unusual way, or deal with large indexes or large...
The Total Growth of Open Source →
Amit Deshpande and Dirk Riehle from SAP Labs have conducted and published research on the growth of open source software.
The data has been culled from Ohloh.net and is based on the stats and activity of around 5000 open source projects written in 30 different languages and released under 103 open source licenses.
Some interesting quotes from the publication:
Successful open source projects like Linux,...
1 tag
SolrMarc vs DIHMarc
Erik has written about Solr’s usage in libraries on the Lucid Imagination Blog. Solr has found its way into many libraries and quite rightly so. However, one of the main things that Erik talks about in that blog post is the performance of DataImportHandler vs SolrMarc (the indexing library used by both VUFind and Blacklight).
Quoting from Erik’s email to the solrmarc-tech google...
December 2009
4 posts
Exploding mobile web usage in India →
Opera published a study titled State of the Mobile Web, November 2009, which I found through TechCrunch. I can’t help but notice the tremendous growth in web usage through mobile phones in India. Page views have grown by 228.5% Y/Y and unique users have grown by 208.4%, but if you look at metrics like page views per user or the amount of data transferred per user, you’ll see that they...
Migrating from Blogger to Tumblr
I had been thinking about moving away from Blogger to my own domain. Finally, I decided to give in and I was fortunate enough to buy this domain. Blogger has been a simple service but I wanted to try one of the new kids, Tumblr or Posterous. After spending some time fiddling with both of them, I decided to go with Tumblr.
It took me some time to figure out the right way to move from Blogger. I used the...
1 tag
AOL lists on NYSE
AOL listed on the New York Stock Exchange on 10th December 2009. This has been in the works for a long time and I’m glad we’re finally here. Things are changing around the company and I’m happy to be a part of this change.
AOL has a new logo (and yes, it is still to be written as AOL). I loved the new brand videos, watch them on YouTube -...
November 2009
3 posts
1 tag
Apache Lucene Java 3.0 Released
Apache Lucene Java 3.0.0 has been released. Lucene Java 3.0.0 is mostly a clean-up release without any new features. It paves the path for refactoring and adding new features without the shackles of backwards compatibility. All APIs deprecated in Lucene 2.9 have been removed and Lucene Java has officially moved to Java 5 as the minimum requirement. See the announcement email for more details....
1 tag
Apache Mahout 0.2 Released
Apache Mahout 0.2 has been released. Apache Mahout is a project which attempts to make machine learning both scalable and accessible. It is a sub-project of the excellent Apache Lucene project which provides open source search software. From the project website: The Apache Lucene project is pleased to announce the release of Apache Mahout 0.2. Highlights include: Significant performance increase...
1 tag
Apache Solr 1.4 Released
From the official announcement: Apache Solr 1.4 has been released and is now available for public download! http://www.apache.org/dyn/closer.cgi/lucene/solr/ Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and...
October 2009
1 post
Why you should contribute to Open Source
Note: The following material and presentation were prepared for students of the Indian Institute of Information Technology (IIIT), Allahabad. The aim was to get them excited about contributing to open source projects and in particular about Apache Lucene, Solr and Hadoop. The first talk was titled “Why you should contribute to Open Source” and was aimed at freshmen and has no technical...
September 2009
2 posts
2 tags
What's new in DataImportHandler in Solr 1.4
DataImportHandler is an Apache Solr module that provides a configuration-driven way to import data from databases, XML and other sources into Solr in both “full builds” and incremental delta imports. A large number of new features have been added since it was introduced in Solr 1.3.0. Here’s a quick look at the major new features: Error Handling & Rollback Ability to...
1 tag
Apache Lucene 2.9 Released
Apache Lucene 2.9 has been released. Apache Lucene is a high performance, full-featured text search engine library written entirely in Java. From the official announce email: Lucene 2.9 comes with a bevy of new features, including: Per segment searching and caching (can lead to much faster reopen among other things); Near real-time search capabilities added to IndexWriter; New Query types; Smarter,...
July 2009
1 post
1 tag
Thoughts on Tomcat
I saw an advertisement today for taking a survey on Tomcat to help define its future directions. I don’t usually click on ads but this one seemed interesting so I did. It was a short one (thanks guys!) so I didn’t mind completing it. What I did not like so much was the focus on questions on how Tomcat can compete with “enterprise” application servers. What is...
May 2009
1 post
5 tags
Solr in PHP/Drupal, Ruby/Sunspot and...
Adoption of Apache Solr is accelerating. Being accessible through HTTP makes it possible for Solr (a Java webapp) to be used with any language. All you need is support for making HTTP calls and parsing one of the many available formats such as XML or JSON. Drupal Drupal is one of the most popular CMSs available as open source. It is written in PHP and boasts a huge user and developer base....
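The point about needing only HTTP calls and a parser for a format such as JSON can be sketched roughly as follows. This is a minimal illustration, not code from the post: the base URL `http://localhost:8983/solr/select` and the helper names `build_select_url` and `parse_select_response` are assumptions, and the sample response body is hand-written to mirror the general shape of Solr's JSON output.

```python
import json
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust for a real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, rows=10, base_url=SOLR_SELECT_URL):
    """Build a search URL that asks Solr for a JSON response (wt=json)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url + "?" + urlencode(params)


def parse_select_response(body):
    """Extract the hit count and the documents from a JSON response body."""
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], response["docs"]


# Hand-written sample body (hypothetical data) so the sketch runs offline.
sample_body = (
    '{"responseHeader": {"status": 0}, '
    '"response": {"numFound": 1, "start": 0, "docs": [{"id": "doc1"}]}}'
)

url = build_select_url("title:solr", rows=5)
num_found, docs = parse_select_response(sample_body)
```

In a real client the URL would be fetched with any HTTP library (for example `urllib.request.urlopen(url)`) and the body passed to `parse_select_response`; the same two steps apply in PHP, Ruby or any other language.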
April 2009
5 posts
1 tag
Burst of activity in Lucene
There is a large amount of work being done in Lucene 2.9, a large portion of which is related to adding support for near real-time search. To put it very simply, search engines transfer a lot of work from query-time to index-time. This is done to speed up queries at the cost of slower document additions. Until now, Lucene-based systems have had problems dealing with scenarios in...
2 tags
Google App Engine and Maven
Google has announced support for building Java applications on the App Engine platform. This is great news for new App Engine developers and especially for those Java developers who had to learn Python to use App Engine. I created a project for App Engine using Maven for builds. These were the steps I needed to follow: 1. Publish the App Engine libraries to the local Maven repository. Go to the...
2 tags
Apache Mahout 0.1 Released
Apache Mahout 0.1 has been released. Apache Mahout is a project which attempts to make machine learning both scalable and accessible. It is a sub-project of the excellent Apache Lucene project which provides open source search software. This is also the first public release of the Taste collaborative filtering project since it was donated to Apache Mahout last year. From the official announce...
2 tags
Tagging and Excluding Filters
Multi-select faceting is a new feature in the soon-to-be-released Solr 1.4. It introduces support for tagging and excluding filters, which enables us to request facets on a super-set of results from Solr. The Problem Out-of-the-box support for faceted search is a very compelling enhancement that Solr provides on top of Lucene. I highly recommend reading through the excellent article by Yonik on...
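As a rough sketch of what tagging and excluding filters look like in a request: a filter query is tagged with the `{!tag=...}` local parameter, and the facet on the same field excludes that tag with `{!ex=...}`, so the facet counts are computed as if the filter were not applied. The helper name `multi_select_facet_params`, the tag name `sel` and the field `doctype` are illustrative assumptions, not taken from the post.

```python
from urllib.parse import urlencode, parse_qs


def multi_select_facet_params(query, field, selected_value, tag="sel"):
    """Return request parameters for a multi-select facet on one field.

    The filter query is tagged, and the facet on the same field excludes
    that tag, so counts for the field ignore the user's current selection.
    """
    return [
        ("q", query),
        ("facet", "true"),
        ("fq", "{!tag=%s}%s:%s" % (tag, field, selected_value)),
        ("facet.field", "{!ex=%s}%s" % (tag, field)),
    ]


# Hypothetical field and value, used only to show the resulting parameters.
query_string = urlencode(multi_select_facet_params("mainquery", "doctype", "pdf"))
decoded = parse_qs(query_string)
```

With these parameters the result list is restricted to `doctype:pdf`, while the `doctype` facet still reports counts for the other values, which is what a multi-select UI needs.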
3 tags
Inside Solr: Improvements in Faceted Search...
Yonik Seeley recently implemented a new method for faceting which will be available in Solr 1.4 (yet to be released). It is optimized for faceting on multi-valued fields with a large number of unique terms but a relatively low number of terms per document. The new method has made a large improvement in performance for faceted search and has cut memory usage at the same time. Background When you facet...
March 2009
1 post
2 tags
The architecture behind popular websites
Sharing a few interesting articles I read in the past few weeks on the interweb about Twitter, LinkedIn, eBay and Google. Improving running components at Twitter describes the evolution of Twitter’s technology and their new message queue server, named Kestrel, written in approximately 1.5K lines of Scala. LinkedIn Communication Architecture details the heavy usage of Java, Tomcat,...
February 2009
3 posts
1 tag
Helpful hints on Large Solr Indexes and Schema...
Solr user Lance Norskog has been kind enough to contribute documentation on: Common Solr Schema Design problems and solutions; Design considerations for Unique Keys for common use-cases; Common problems and solutions for Large Solr Indexes. Very useful documentation which, no doubt, will be made more comprehensive with time. Update - Mark Miller has written a very nice article on Scaling Lucene and...
3 tags
Google Summer of Code 2009 at Apache
The Google Summer of Code program is back again this year and Apache is looking for students interested in contributing and making money with the program. The Apache Software Foundation received quite a few students with excellent proposals who did a lot of great work last year. Take a look at last year’s proposals to get a feel for the level of competition. I’m sure there would be quite a...
2 tags
Announcing my return to blogging
Yes, it has been a long long time since my last post. I guess I lost interest in writing about the myriad of things out there. But, I sure did not lose interest in reading and learning about them. I work at AOL Bangalore Development Center as a Software Engineer on a variety of cool projects. Life is great, work is fun and I’m having a good time. They pay me to work on such interesting...