Thursday, November 02, 2006

Integrating Lucene with Spring Framework & Hibernate

While looking for a way to integrate Lucene with Spring Framework & Hibernate, I came across a full-blown open source Java search engine framework called Compass Framework. It is built on top of the Lucene search engine and provides seamless integration with popular development frameworks like Hibernate and Spring Framework.

Why do we need yet another framework for implementing search functionality?

Lucene is a low-level API, which means that using it directly can easily couple search code to the rest of the application, especially to the domain objects. Coding the Lucene API directly into the application may hurt performance and can also become a maintenance nightmare in the future as the domain model changes. Looking for other options for integrating Lucene with our Spring-based application, I came across two alternatives in the open-source arena:

1. Lucene Spring Modules

One option is "Lucene Spring Modules", which is part of the "Spring Modules project". That project extends the functionality of Spring Framework to cover other open-source tools, and it is intended to facilitate integration between Spring Framework and other projects without cluttering or expanding the Spring core.

2. Compass Framework

Another option is Compass Framework, which provides a high-level abstraction on top of Lucene's low-level API and supports declarative mapping of the domain model to the search engine. It externalizes all dependencies and coupling into a Compass metadata file, so the domain objects themselves stay free of search code. Compass also implements fast index operations and optimization, which improves application performance.
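To make the idea of externalized, declarative mapping a bit more concrete, here is a minimal sketch in plain Java. It is not Compass's API and it does not read a Compass metadata file: the `Pet` class, the `toDocument` helper, and the mapping table are all hypothetical names I made up for illustration. The point is only that the domain class knows nothing about search, while a separate mapping says which fields end up in the index and under what names.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class DeclarativeMappingSketch {

    // Plain domain object: no search-related code or imports in here.
    static final class Pet {
        private final String name;
        private final String type;
        private final String internalNote; // not mapped, so never indexed

        Pet(String name, String type, String internalNote) {
            this.name = name;
            this.type = type;
            this.internalNote = internalNote;
        }
    }

    // Builds an index "document" (property name -> value) from an entity,
    // driven only by the external mapping: field name -> index property name.
    static Map<String, String> toDocument(Object entity, Map<String, String> mapping) {
        Map<String, String> doc = new TreeMap<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            try {
                Field f = entity.getClass().getDeclaredField(e.getKey());
                f.setAccessible(true);
                doc.put(e.getValue(), String.valueOf(f.get(entity)));
            } catch (ReflectiveOperationException ex) {
                throw new IllegalStateException("Bad mapping for field " + e.getKey(), ex);
            }
        }
        return doc;
    }
}
```

If the domain model changes, only the mapping has to follow; the indexing code above stays the same. A real framework would of course load the mapping from a file rather than build it in code.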

Compass Framework provides a module named "Compass::Spring" which is intended to provide closer integration with the Spring Framework. It supports IoC using Spring's Application Context and provides support for the Hibernate Session Factory. Compass claims to support complex applications with large domain models easily, and to bring the maintenance and performance overhead down to negligible values. Compass comes with a sample project (the old petclinic sample with additional search functionality built on Compass Framework) that demonstrates its integration with Spring Framework & Hibernate. The product is also quite mature, with elaborate documentation. The current stable version is Compass 1.1M2.

More about Compass Framework

Compass is a first-class open source Java Search Engine Framework that brings the power of search engine semantics to your application stack declaratively. Compass is a powerful, transactional Object to Search Engine Mapping (OSEM) Java framework which allows you to declaratively map your object domain model to the underlying search engine, synchronizing data changes between the index and different datasources. Compass provides a high-level abstraction on top of the Lucene low-level API. Compass also implements fast index operations and optimization, and introduces transaction capabilities to the search engine.

In recent versions, Compass provides a Lucene Jdbc Directory implementation, which allows a Lucene index to be stored within a database for both pure Lucene applications and Compass-enabled applications. Compass also provides a SpringHibernate Gps Device (configured in the Spring context file using IoC) which uses the Compass OSEM feature (Object to Search Engine Mappings) and the Hibernate ORM feature (Object to Relational Mappings) to provide simple database indexing. All the OSEM mappings are defined in a Compass metadata file, and the SpringHibernate Gps Device intercepts the Hibernate session factory object to index data transparently. The Gps Device also provides real-time mirroring of data changes made through Hibernate, so you don't have to explicitly re-index data after a store/update/delete. The path data travels through the system is: Database -- Hibernate -- Objects -- Compass::Gps -- Compass::Core (Search Engine). For each match, Compass returns the id of the object along with a tag that identifies the class the object belongs to.
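The real-time mirroring described above can be pictured with a small plain-Java sketch. To be clear, this is not how Compass or Hibernate are actually wired together, and every name in it (`Store`, `ChangeListener`, `IndexMirror`) is hypothetical. It only shows the general pattern: the persistence layer announces each write, and a listener keeps the search index in step, so the application never re-indexes by hand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class MirroringSketch {

    // Hypothetical callback, standing in for events emitted by the persistence layer.
    interface ChangeListener {
        void onSave(int id, String text);
        void onDelete(int id);
    }

    // Stand-in for the ORM-managed store: every write is forwarded to listeners.
    static final class Store {
        private final Map<Integer, String> rows = new HashMap<>();
        private final List<ChangeListener> listeners = new ArrayList<>();

        void register(ChangeListener listener) {
            listeners.add(listener);
        }

        void save(int id, String text) {
            rows.put(id, text);
            for (ChangeListener l : listeners) l.onSave(id, text);
        }

        void delete(int id) {
            rows.remove(id);
            for (ChangeListener l : listeners) l.onDelete(id);
        }
    }

    // Stand-in for the search index: kept in sync purely through the callbacks.
    static final class IndexMirror implements ChangeListener {
        private final Map<Integer, String> indexed = new HashMap<>();

        public void onSave(int id, String text) {
            indexed.put(id, text.toLowerCase(Locale.ROOT));
        }

        public void onDelete(int id) {
            indexed.remove(id);
        }

        // Returns the ids of matching objects, in ascending order.
        List<Integer> search(String term) {
            String needle = term.toLowerCase(Locale.ROOT);
            List<Integer> hits = new ArrayList<>();
            for (Map.Entry<Integer, String> e : indexed.entrySet()) {
                if (e.getValue().contains(needle)) hits.add(e.getKey());
            }
            Collections.sort(hits);
            return hits;
        }
    }
}
```

Note that the search returns ids only, in the same spirit as the behaviour described above; loading the full objects is left to the persistence layer.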

Dear readers, don’t forget to read about the origin of Compass Framework as described by its author, Shay Banon, on his blog. It is well written and I bet you will enjoy the narration!

References:

Open Symphony's Page
Shay Banon’s Blog


Wednesday, November 01, 2006

Full Text Search

In this article I have tried to evaluate some of the options for integrating full-text search features into Java applications.

MySQL’s built-in Full Text Search engine

From my initial search, I found that MySQL’s built-in Text Search Engine is surprisingly effective at full-text searching when the dataset is small. It also has the lowest implementation cost, since the search criteria can be specified as part of the query itself. But as the dataset grows, its efficiency becomes dependent on system resources such as CPU and RAM.
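As a rough illustration of what "part of the query itself" means, here is a small JDBC sketch using MySQL's `MATCH ... AGAINST` syntax. The `articles` table and its `id`, `title` and `body` columns are hypothetical, and I am assuming a FULLTEXT index already exists on `(title, body)`; without one MySQL will reject the query.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class MySqlFullTextSketch {

    // Hypothetical table and columns; assumes a FULLTEXT index on (title, body).
    static final String SQL =
        "SELECT id FROM articles WHERE MATCH(title, body) AGAINST(?)";

    // Runs the full-text query on an already opened connection and returns matching ids.
    static List<Integer> search(Connection conn, String terms) throws SQLException {
        List<Integer> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, terms);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getInt("id"));
                }
            }
        }
        return ids;
    }
}
```

No extra infrastructure is involved here: the search is just another SQL statement, which is why this option is the cheapest to implement.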

Open Source Full Text Search engines

Most of the external full text search engines work by keeping a separate index of the table data, which is updated at frequent intervals (maybe with some amount of caching), so that less time is spent on the database server searching for information. This approach will certainly lessen the load on the database server.
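The "separate index" these engines keep is typically an inverted index: a map from each term to the rows that contain it. The following is a toy sketch of that idea in plain Java, not the implementation of any of the engines discussed here; the class name and its simple tokenizer are my own assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyInvertedIndex {

    // term -> ids of the rows whose text contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexes one row's text under its primary key.
    void add(int rowId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(rowId);
        }
    }

    // Returns the ids of rows containing every term of the query (AND semantics).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    // Lower-cases the text and splits it on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}
```

A search only touches the map, never the original table, which is where the reduced load on the database server comes from. The price is that the index has to be refreshed whenever the table data changes.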

A complete list of popular full text search engines is available on the WikiMedia site.

1. Sphinx

Sphinx is a full-text search engine, distributed under GPL version 2. Generally, it's a standalone search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently built-in data source drivers support fetching data either via direct connection to MySQL, PostgreSQL, or from a pipe in a custom XML format.

2. Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java with features like Scalable High-Performance Indexing, Powerful Accurate and Efficient Search Algorithms etc.

As a full-text search engine, Lucene needs little introduction. Lucene, an open source project hosted by Apache, aims to produce high-performance full-text indexing and search software. The Java Lucene product itself is a high-performance, high capacity, full-text search tool used by many popular Websites such as the Wikipedia online encyclopedia and TheServerSide.com, as well as in many, many Java applications. It is a fast, reliable tool that has proved its value in countless demanding production environments.

Although Lucene is well known for its full-text indexing, many developers are less aware that it can also provide powerful complementary searching, filtering, and sorting functionalities. Indeed, many searches involve combining full-text searches with filters on different fields or criteria. For example, you may want to search a database of books or articles using a full-text search, but with the possibility to limit the results to certain types of books. Traditionally, this type of criteria-based searching is in the realm of the relational database. However, Lucene offers numerous powerful features that let you efficiently combine full-text searches with criteria-based searches and sorts.
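To show the shape of such a combined search, here is a plain-Java sketch of the book example: a text match on the title, a filter on the type of book, and a sort on the year. It deliberately does not use the Lucene API and it scans a list instead of an index, so treat it only as a picture of the query, not of how Lucene executes it. The `Book` class and its fields are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class FilteredSearchSketch {

    // Hypothetical record of a book in the catalogue.
    static final class Book {
        final String title;
        final String type;
        final int year;

        Book(String title, String type, int year) {
            this.title = title;
            this.type = type;
            this.year = year;
        }
    }

    // Titles that contain the term AND have the given type, newest first.
    static List<String> search(List<Book> books, String term, String type) {
        String needle = term.toLowerCase(Locale.ROOT);
        return books.stream()
            .filter(b -> b.title.toLowerCase(Locale.ROOT).contains(needle)) // full-text part
            .filter(b -> b.type.equals(type))                               // criteria part
            .sorted(Comparator.comparingInt((Book b) -> b.year).reversed()) // sort part
            .map(b -> b.title)
            .collect(Collectors.toList());
    }
}
```

In a relational database the second and third steps would be a WHERE clause and an ORDER BY; the point of the paragraph above is that Lucene can take over all three steps in one place.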

Benchmarks


The results of benchmarking the most popular full text search engines (MySQL’s built-in Text Search engine, the Sphinx Text Search engine plug-in for MySQL, and Lucene) are published on the PlanetMySQL site.

Conclusion

Lucene is “the most” popular full text search solution available now for running efficient full text searches over database content, compared to MySQL’s built-in Text Search engine and the Sphinx plug-in for MySQL. Lucene is just a Java API, so it integrates seamlessly with other Java programs, unlike Sphinx, which is written in C++. It is a set of tools that allows us to create an index and then search it, so we need to handle index creation, updates, and searches manually using the API. But the good news is that Spring & Hibernate support integration of Lucene through various support classes. So go ahead and use it in your Java projects. Have a great time searching inside your applications with Lucene!

Reference:

Java World Article on Integrating Lucene