Roshan's World

Introduction to Elasticsearch

2015-05-31T10:10:00.001-07:00

Going through my previous blogs it will be clearly evident that I am very fond of writing about the topic of "Full Text Search". This time also, I would like to discuss on a topic related to Searching and Indexing. That is, about a document based NoSQL database named Elasticsearch which uses an underlying full-text search engine to store data as indexes and query it efficiently. Recently, I had an opportunity to work on this product and was fascinated by its concept and features.

I don't wish to get into the theory of NoSQL databases and Polyglot persistence here. For that you can refer to the book NoSQL Distilled by Martin Fowler, which gives a good introduction to this subject. Here I would like to give a brief introduction about ElasticSearch and explain how ElasticSearch is related to the concept of full text search.

A little bit of history (as per Wikipedia):

Shay Banon created Compass Framework in 2004. (You can read about Compass from my previous article here ). While thinking about the third version of Compass he realized that it would be necessary to rewrite big parts of Compass to "create a scalable search solution". So he created "a solution built from the ground up to be distributed" and used a common interface, JSON over HTTP, suitable for programming languages other than Java as well. Shay Banon released the first version of Elasticsearch in February 2010.

By the way, what is Elasticsearch?:

Elasticsearch is a distributed RESTful search engine built for the cloud. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. You can Download Elasticsearch from the official distribution site.

How is Elasticsearch related to Full Text Search?:

ElasticSearch is powered by Lucene, a powerful open-source full-text search library, under the hood. ElasticSearch uses Apache Lucene to create and manage an inverted index and instead of searching the text directly, it searches an index instead. Hence ElasticSearch is able to achieve fast search responses. This is like retrieving pages in a book related to a keyword by scanning the index at the back of a book, as opposed to searching every word of every page of the book.

My Experiences with Elasticsearch:

Recently, I was involved in a migration of Data-warehouse from relational star based schema to Elasticsearch NoSQL data store. This rewrite simplified the schema definition for analytics significantly and gave amazing performance benefits for storing (indexing) and querying (searching) data. ElasticSearch also gave us the benefits of horizontal scaling (with the concept of shrads) which is excellent to handle big data.

But one challenge with NoSQL database is to deal with the non-conventional style of queries and resultset (usually a JSON based notation). It might seem non-intuitive to Developers, who are much familiar with the intuitive and standard notation of Structured Query Language (SQL) for relational databases. This is clearly a disadvantage for projects using NoSQL datastores, as a lot of time is spend by Developers in writing code for translating SQL statements to its NoSQL counterpart statements.

Due to the complexity and learning curve associated with the non-conventional style of querying, firing even simple queries against NoSQL databases becomes cumbersome and time consuming. When I was working on the migration project, I was searching for an open source tool that could give a SQL interface for the underlying NoSQL data store, so that I could verify data easily outside the application. But I couldn't find one, so I had to use the web-based Marvel plugin for Elasticsearch and had to spent lot of time learning the syntax for creating expressions to query data in the format Elasticsearch understands.

So after finishing the migration project, I started working an open source project for developing a web based query tool for querying Elasticsearch using standard SQL syntax. This project is currently publicly hosted in GitHub under name sql-query-browser-for-elasticsearch. This project is still under development (with daily check-ins), but at it current state, it can be used as a thin web client for Elasticsearch Server running on any machine to execute simple queries against a hosted Elasticsearch server. The project can be downloaded using the Git client and run locally.

Emerging visualization trends with Elasticsearch:

Elastic Search, Logstash and Kibana – the ELK Stack – is emerging as the best technology stack to collect, manage and visualize big data. These technology stack provides good visualization tools and relieves developers from the need to write custom queries to fetch data.

Conclusion:

The world of NoSQL is still in its infancy and need to go a long way to get to a position where relational databases are positioned now. Maybe in near future, we can expect a standardization in the language for getting information from and updating NoSQL databases, similar to what SQL provides now for relational databases. Till then, we need a community involvement in producing more open-source, user friendly query tools to interface with these feature rich databases.

References:

1. Wikipedia entry for Elasticsearch

2. Github project for Elasticsearch

Full-Text Search using Oracle Text

2009-03-21T11:19:00.000-07:00

This is in continuation with my previous post on Full Text Search in which I discussed about MySQL’s built-in Full Text Search engine and external Open Source Full Text Search engines as options for performing integrating full-text search features in java applications. This time I want to share information on building Full-Text Search Applications with Oracle Text.

Oracle Text:

Oracle Text is a powerful search technology built into all Oracle Database editions, including the free Express Edition (XE). The development APIs provided by Oracle Text allow software developers to easily implement full-featured content search applications.

Oracle Text is suitable for a wide variety of search-related use cases and storage structures. Application areas for Text include e-business, document and records management as well as issue tracking just to name a few. Retrievable text can reside in a structured form inside the database or in unstructured form either in a local file system or on the Web.

Oracle Text can be used to search structured and unstructured documents complementing the SQL wildcard matching. Oracle Text provides a complete SQL-based search API that consists of custom query operators, DDL syntax extensions, a set of PL/SQL procedures and database views. Text API gives the application developer full control over indexing, queries, security, presentation, and software configuration that is sometimes required. Oracle Text is also programming-language agnostic and works equally well for PHP as well as Java applications.

Setting Up Oracle Text:

Oracle Text is installed with an Oracle Database XE installation by default. With other database editions, you need to install the Oracle Text feature yourself. Once the feature is present, you only need to create a normal database user and grant the CTXAPP role to the user. This will allow the user to execute certain index management procedures:

Indexing Process and Searching

Oracle Text indexes retrievable data items before users are able to find content with search. This is a common approach used to ensure adequate search performance. The Oracle Text indexing process is modeled after a pipeline, where data items retrieved from a data store pass through a series of transformations before their keywords are added to the index. The indexing process is split into multiple phases, where each phase is handled by a separate entity and configurable by the application developer.

Oracle Text has different index types that are suitable for different purposes. For full-text search with large documents, the CONTEXT index is the appropriate index type. The indexing process includes the following phases:

Data Retrieval: Data is simply fetched from a data store, for example, a Web page, database large object, or local file system, and passed as a stream of data to the next phase.
Filtering: The filters are responsible for converting data in different file formats to plain text. The other components in the indexing pipeline only process plain text data and don't know about file formats such as Microsoft Word or Excel.
Sectioning: The sectioner adds metadata about the structure of the original data item.
Lexing: A stream of characters is split into words based on the language of the item.
Indexing: In this final phase, the keywords are added to the actual index.

Once the index has been built, an application can use plain SQL queries to execute a search entered by an end user.

Searching

The CONTAINS operator is used for searching CONTEXT indexes.

Index Maintenance

Because base table data is replicated by the index, the data needs to be periodically synchronized to the index. Index maintenance procedures can be found in the CTX_DDL PL/SQL package.

Summary

Oracle Text allows users to create full-text index on a single column / multiple columns in a single table as well across multiple tables in a database. Details on creating the index, searching and index maintence is discussed comprehensively in the OTN Developer article on full text indexing.

References:

1. OTN Developer article on full text indexing

2. OTN Discussion Forum - Topic on multi-table indexing

3. Thread on full-text indexing

Flex Profiling

2008-11-20T23:06:00.001-08:00

I am back with my boring technical posts again! This time I have tried to collect and organize some helpful material for anyone looking into Flex Profiling, because the noise coming out from Google searches usually ends up with information of no worth.

Profiling Flash Applications with Flex Builder 3

The Flex Profiler is a new addition to Flex Builder 3 and is a powerful tool that enables you to watch an application as it allocates and clears memory and objects. It connects to your application with a local socket connection.

As the Profiler runs, it takes a snapshot of data every few milliseconds and records the state of the Flash Player at that snapshot, a process referred to as sampling. By parsing the data from sampling, the Profiler can show every operation in your application. The Profiler records the execution time of those operations, as well as the total memory usage of objects in the Flash Player at the time of the snapshot.

Following links provides a step-by-step information on how to start profile applications using Flex Builder 3.

http://x-geom.net/blog/?p=48
http://blogs.adobe.com/aharui/profiler/ProfilerScenarios.swf
http://www.insideria.com/2008/06/profiling-flex-applications-sa.html
http://labs.adobe.com/wiki/index.php/Flex_3:Feature_Introductions:_Performance_and_Memory_Profiling
http://livedocs.adobe.com/flex/3/html/help.html?content=profiler_4.html

Flex Builder 3 allows to profile both Memory and Performance of Flex applications. Following information helps one to quick start on both.

1. Flex Memory Profiling

Memory profiling involves examining the memory used—as well as the memory currently in use—by objects in your application. Those objects could be simple classes, such as Strings, or complex visual objects, such as DataGrids. Using memory profiling, you can determine whether an appropriate number of objects exist and whether those objects are using an appropriate amount of memory.

Understanding Flex Memory Management and VM Garbage Collection

Flash Player Memory Allocation

Flash Player is responsible for providing memory for your Flex application at runtime. When you execute a line of code that creates a new instance of the DataGrid class, Flash Player provides a piece of memory for that instance to occupy. Flash Player in turn needs to ask your computer’s operating system for memory to use for this purpose.

The process of asking the operating system for memory is slow, so Flash Player asks for much larger blocks than it needs, and keeps the extra available for the next time the application requests more space. Additionally, Flash Player watches for memory that’s no longer in use, so that it can be reused before asking the operating system for more.

Flash Player Garbage Collection

Garbage collection is a process that reclaims memory no longer in use, so that it can be reused by the application—or, in some cases, given back to the operating system. Garbage collection happens automatically at allocation, which can be confusing to new developers. This means that garbage collection doesn’t occur when memory is no longer in use, but rather when the application asks for more memory. At that point, the process responsible for garbage collection, called the garbage collector, attempts to reclaim available memory for reallocation.

The garbage collector follows a two-part procedure to determine which portions of memory are no longer in use:

1. Reference counting
2. Mark and sweep

Following are some links which will be useful for anyone looking into Flex GC and Flash Player memory design.

1. PPT from Adobe on Memory Management and GC
http://blogs.adobe.com/aharui/GarbageCollection/GCAtomic.ppt

2. Resource Management
http://www.adobe.com/devnet/flashplayer/articles/resource_management.html

3. Article on GC from Adbove Devnet
http://www.adobe.com/devnet/flashplayer/articles/garbage_collection.html

4. Sample Flex app with a straightforward memory leak
http://blogs.adobe.com/aharui/GarbageCollection/MemoryLeakTest.zip

General observations on Flex GC

GC is invoked during memory allocation and not asynchronously as a background thread in Java JVM
IE minimize/maximize/restore operations seems to fire the GC, releasing the memory

Flex memory tuning tips

1. Memory Leaks caused by Event Listeners

Always remove unused event listeners. Each time an event listener is added to an object, it increases the object's reference count. So the reference remains, until the event listener is removed. If for some reason, you cannot remove the event listener use the useWeakReference parameter in the addEventListener. This does not increase the reference count.

Problem:

When you call addEventListener() on the TextInput instance, it responds by adding a reference to the object (the one that contains the handleTextChanged method) to a list of objects that need to be notified when this event occurs. When it's time to broadcast the change event, the TextInput instance loops through this list and notifies each object that registered as a listener. In terms of garbage collection, this means that, in certain circumstances, if an object is listening for events it may never be available for garbage collection.

The following example shows a simple case:

var textInput:TextInput = new TextInput();
textInput.addEventListener('change', handleTextChanged);

Solution:

When adding an event listener to a broadcaster, the developer can specify that the event listener should use weak references. This is accomplished by specifying extra parameters for the addEventListener() method:

var textInput:TextInput = new TextInput();
textInput.addEventListener('change', handleTextChanged, false, 0, true);

2. Using Item Renderer

Use item renderers judiciously. An item renderer derived from a Container class comes with lot of unnecessary overhead. Instead use a simpler class, may be an Actionscript class derived from a UIComponent. This would reduce a lot of overhead.

3. Using Images

Use images that are smaller in size and when they are large in number prefer not to embed them in the application. As far as the image formats are concerned, PNG images are much faster than other image types.

Use BitmapData as much as possible. Use dispose() method of BitmapData to free memory that is used to store the BitmapData object.

4. Bindings

Use Binding only when necessary. Data binding expressions usually take up memory. Prefer assignments to Binding whenever possible.

5. Variables

Accessing local variables is much faster. If you have variables that need to be accessed more use local variables as they are stored on the stack and accessing them is much faster.

You could help the garbage collector by assigning unused variables to null.

6. Instance Creation

For components use deferred instantiation. This would immensly reduce the startup time. Be wary of creationPolicy="all". Try to avoid removeChild() / addChild() when it would work just as well to reuse an object or just toggle the visible property.

7. Containers

Minimize the use of containers. Try not to nest HBoxes within VBoxes and so on. Nested containers make up to huge overheads.

8. Types Conversions

Use types for the variables. Avoid implicit type conversions and when unsure of the type use the "as" operator.

9. Repeaters

Repeaters have a property called recycleChildren.Set it to true. When set to true, the repeater reuses the children it already created instead of creating new ones.

10. Dictionary

Use weak references in the Dictionary object.

11. Modules

Don't unnecessarily load/unload modules. If you need to unload a module, make sure to remove all references pointing to it. In particular if you have an event listener from within the module to something outside the module, that can prevent the module's memory from being reclaimed.

12. NativeWindow

After making a NativeWindow you must call close() before it can be GC'ed. But you must remove references before you can call close(). Also when you open a FileStream object in asynchronous mode, pending event listeners can prevent the FileStream object from being GC'ed.

Sources:

http://www.peachpit.com/articles/article.aspx?p=1182473
http://www.adobe.com/devnet/flashplayer/articles/resource_management.html

2. Flex Performance Profiling

Performance profiling is used to find aspects of the flex application that are unresponsive, or where performance can be improved. When profiling for performance, generally one should be looking for methods that are executed very frequently, or methods that take a long time whenever they’re executed. The combination of those two factors usually provides a good indication of where your time should be spent in optimizing or potentially refactoring.

Using the Flex Builder 3 Profiler, one can identify the slowest portions of the application and optimize it. Flex Builder profiler allows one to take performance snapshots to record how long was spent in each function. This is useful to identify the areas of code that might benefit from optimization. While profiler is running everything is much slower. Often Mouse or similar Events will seem to take lots of time, you can ignore these. Investigate your own functions and see if they have been called too often or they take too long.

Flex Performance Tuning Tips

Flex Speed Tips: Matt chotin has shared a some great tips to improve Flex performance. Following are a few of them:

* If you have a type in AS3, which you are not sure of always use the As operator to cast the type before you use it. This avoids VM errors with try/catch, which slow the execution and is ten times slower than the As operator.

* Array is access is slow if the array is sparse. It may be faster to put nulls in empty values as this speeds things up. Array misses are very slow, up to 20 times slower than finding a valid entry.

* Avoid implicit type conversion. In the player it will convert integers to numbers and back when asked to add integers. You might as well use numbers for everything and convert back to integer at the end.

* Local variable access is faster, so assign variables to local if they are accessed a lot. They will be stored on the stack and access is much quicker.

* Data Binding expressions take up memory and can slow down the application startup. It may be more efficient to do an assignment in code rather than using binding.

* Find a slow computer and run your application. If it runs OK ship it! Other wise you can use flash.utils.getTimer():int to get a time value in miliseconds before and after some process to time it.

More information on tuning ActionScript can be found in Matt chotin's article

http://www.adobe.com/devnet/flex/articles/as3_tuning.html

Sources:

http://www.peachpit.com/articles/article.aspx?p=1182473&seqNum=3
http://www.adobe.com/devnet/flex/articles/as3_tuning.html

Integrating Lucene with Spring Framework & Hibernate

2006-11-02T00:51:00.001-08:00

While looking for integration support for Lucene with Spring Framework & Hibernate, I have come across a full-blown open source Java Search Engine Framework called Compass Framework which is built on top of the Lucene Search Engine and provides seamless integration support to popular development frameworks like Hibernate and Spring Framework.

Why do we need yet another framework for implementing search functionality?

Lucene is a low level API which implies that it can easily cause coupling problems especially with the domain objects. This way of directly coding the Lucene API into the application maybe a performance killer and can also become a cause of maintenance nightmare in future (with domain model changes). Looking for other options for integrating Lucene with our Spring based application, I came across two alternatives that exist in the open-source arena:

1. Lucene Spring Modules

One option is using the "Lucene Spring Modules", which is a part of "Spring Modules project" which tries to extend the functionalities of Spring Framework to include other open-source tools. The project is intended to facilitate integration between Spring Framework and other projects without cluttering or expanding the Spring core.

2. Compass Framework

Another option is to use Compass Framework which provides a declarative way to map the domain model to the search engine. Compass provides a high level abstraction on top of the Lucene's low level API which supports a declarative mapping of domain objects. It externalizes all dependencies and coupling in a compass meta data file and thus provides a declarative technique to map the domain objects. Compass also implements fast index operations and optimization which increases the application performance.

Compass Framework provides a module named "Compass::Spring" which is intended to provide closer integration with the Spring Framework. It supports IoC using Spring's Application Context and provides support for Hibernate Session Factory. CF claims to support complex applications with bigger domain models easily. Compass also claims to bring maintenance and performance down to negligible values. Compass comes with a sample project (the old petclinic sample with additional search functionalities using Compass Framework) that demonstrates its integration support with Spring Framework & Hibernate. The product is also quite mature with much elaborate documentation. The current stable version is compass version 1.1M2.

More about Compass Framework

Compass is a first class open source Java Search Engine Framework, enabling the power of Search Engine semantics to your application stack decoratively. Compass is a powerful, transactional Object to Search Engine Mapping (OSEM) Java framework which allows you to declaratively map your Object domain model to the underlying Search Engine, synchronizing data changes between Index and different datasources. Compass provides a high level abstraction on top of the Lucene low level API. Compass also implements fast index operations and optimization and introduce transaction capabilities to the Search Engine.

In recent versions, compass provides a Lucene Jdbc Directory implementation, allowing storing Lucene index within a database for both pure Lucene applications and Compass enabled applications. Compass also provides support to SpringHibernate Gps Device (configured in Spring context file using IoC) which utilizes Compass OSEM feature (Object to Search Engine Mappings) and Hibernate ORM feature (Object to Relational Mappings) to provide simple database indexing. All the OSEM mappings are defined in a compass meta-data file and the SpringHibernate Gps Device intercepts the Hibernate session factory object to index data transparently. The Gps Device also provide real time mirroring of data changes done through Hibernate so you didn't have to explicitly re-index data after a store/update/delete. The path data travels through the system are: Database -- Hibernate -- Objects -- Compass::Gps -- Compass::Core (Search Engine). The compass returns the ids of objects matched along with a tag that identifies the class of object it belongs.

Dear readers don’t forget to read about the origin of compass framework as described by the author Shay Banon’s on his blog. It is well written and I bet you will surely enjoy the narration!!!!!!!!

References:

Open Symphony's Page
Shay Banon’s Blog

Full Text Search

2006-11-01T21:45:00.001-08:00

In this article I have tried to evaluate some of the options for integrating full-text search features in java applications.

MySQL’s built-in Full Text Search engine

From my initial search what I could find was that MySQL’s built-in Text Search Engine surprisingly does effective full-text searching if the dataset is small. Also it has the least cost to implement since the search criteria can be specified as a part of query itself. But as the size of dataset grows its efficiency becomes dependent on the system resources like CPU, RAM etc.

Open Source Full Text Search engines

Most of the external full text search engines work by keeping a separate index of the table data which will be updated at frequent intervals (maybe with some amount of caching) so that time spend on the database server is less for searching for information. This approach will certainly lessen the load on the database server.

A complete list of popular full text search engines is available at WikiMedia site

1. Sphinx

Sphinx is a full-text search engine, distributed under GPL version 2. Generally, it's a standalone search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently built-in data source drivers support fetching data either via direct connection to MySQL, PostgreSQL, or from a pipe in a custom XML format.

2. Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java with features like Scalable High-Performance Indexing, Powerful Accurate and Efficient Search Algorithms etc.

As a full-text search engine, Lucene needs little introduction. Lucene, an open source project hosted by Apache, aims to produce high-performance full-text indexing and search software. The Java Lucene product itself is a high-performance, high capacity, full-text search tool used by many popular Websites such as the Wikipedia online encyclopedia and TheServerSide.com, as well as in many, many Java applications. It is a fast, reliable tool that has proved its value in countless demanding production environments.

Although Lucene is well known for its full-text indexing, many developers are less aware that it can also provide powerful complementary searching, filtering, and sorting functionalities. Indeed, many searches involve combining full-text searches with filters on different fields or criteria. For example, you may want to search a database of books or articles using a full-text search, but with the possibility to limit the results to certain types of books. Traditionally, this type of criteria-based searching is in the realm of the relational database. However, Lucene offers numerous powerful features that let you efficiently combine full-text searches with criteria-based searches and sorts.

Bench Marks

The results of benchmarking the most popular full text search engines (MySQL’s built-in Text Search engine, Sphinix Text Search engine plug-in for MySQL and Lucene) is published in the PlanetMySQL site.

Conclusion

Lucene is “the most” popular full text search solution available now to conduct efficient full text searches on database compared to MySQL’s built-in Text Search engine and Sphinix plug-in for MySQL. Lucene is just a java API so it provides seamless integration with other Java programs as compared to Sphinix written in pearl. It is a set of tools that allows us to create an index and then search it. So we need to manually handle the index creation/updation and searches using the API. But the good news is that Spring & Hibernate supports integration of Lucene through various support classes. So go ahead and use it in your java projects. Have a great time searching inside your applications with Lucene!!!

Reference:

Java World Article on Integrating Lucene

Part III - Open Source Java Frameworks for Web Development

2006-04-14T00:42:00.000-07:00

This is the last part of the article on Frameworks. In the previous articles I have given an introduction on the concepts of frameworks and how foundations on modern java frameworks were laid. In this article, I will compare several production quality web frameworks, such as Struts, Spring, and Hibernate and go over basic similarities and underlying concepts.

Basic Concepts

Almost all modern Web-development frameworks follow the Model-View-Controller (MVC) design. Business logic and presentation are separated and a controller of logic flow coordinates requests from clients and actions taken on the server. This approach has become the de facto of Web development. The underlying mechanics of each framework are of course different, but the APIs that developers use to design and implement their Web applications are very similar. The difference also lies in the extensions that each framework provides, such as tag libraries, Java Server Faces, or Java Bean wrappers.

All frameworks use different techniques to coordinate the navigation within the Web application, such as the XML configuration file, java property files, or custom properties. All frameworks also differ in the way the controller module is implemented. For instance, EJBs may instantiate classes needed in each request, or Java reflection can be used to dynamically invoke an appropriate action classes. Also, frameworks may differ conceptually. For example, one framework may define the user request and response (and error) scenario, and another may only define a complete flow from one request to multiple responses and subsequent requests.

Java frameworks are similar in the way they structure data flow. After request, some action takes place on the application server, and some data populated objects are always sent to the JSP layer with the response. Data is then extracted from those objects, which could be simple classes with setter and getter methods, java beans, value objects, or some collection objects. Modern Java frameworks also simplify a developer's tasks by providing automatic Session tracking with easy APIs, database connection pools, and even database call wrappers. Some frameworks either provide hooks into other J2EE technologies, such as JMS (Java Messaging Service) or JMX, or have these technologies integrated. Server data persistence and logging also could be part of a framework.

Popular Web Frameworks

Apache Struts Framework
The Struts framework is an open-source product for building Web applications based on the model-view-controller (MVC) design paradigm. It uses and extends the Java Servlet API and was originally created by Craig McClanahan. In May 2000, it was donated to the Apache Foundation. It features a powerful custom tag library, tiled displays, form validation, and I18N (internationalization). Also, Struts supports a variety of presentation layers, including JSP, XML/XSLT, JavaServer Faces (JSF), and Velocity, as well as a variety of model layers, including JavaBeans and EJB.

Spring Framework
The Spring Framework is a layered Java/J2EE application framework based on code published in Expert One-on-One J2EE Design and Development. The Spring Framework provides a simple approach to development that does away with numerous properties files and helper classes that litter projects. Key features of the Spring Framework include:

Powerful JavaBeans-based configuration management, applying Inversion-of-Control (IoC) principles.

A core bean factory, usable in any environment, from applets to J2EE containers.

Generic abstraction layer for database transaction management, allowing for pluggable transaction managers, and making it easy to demarcate transactions without dealing with low-level issues.

JDBC abstraction layer with a meaningful exception hierarchy.

Integration with Hibernate, DAO implementation support, and transaction strategies.

Hibernate Framework
Hibernate is an object-relational mapping (ORM) solution for the Java language. It is also open source software, as is Struts, and is distributed under the LGPL. Hibernate was developed by a team of Java software developers around the world. It provides an easy to use framework for mapping an object-oriented domain model to a traditional relational database. It not only takes care of the mapping from Java classes to database tables (and from Java data types to SQL data types), but also provides data query and retrieval facilities and can significantly reduce development time otherwise spent with manual data handling in SQL and JDBC.

Hibernate's goal is to relieve the developer from a significant amount of common data persistence-related programming tasks. Hibernate adapts to the development process, whether it is started with a design from scratch or from a legacy database. Hibernate generates the SQL, and relieves the developer from manual result set handling and object conversion, and keeps the application portable to all SQL databases. It provides transparent persistence, the only requirement for a persistent class is a no-argument constructor.

There are more frameworks than I have described here, of course, both open-source and commercial, such as

WebWork - http://www.opensymphony.com/webwork/
Tapestry - http://jakarta.apache.org/tapestry/,

and many frameworks were in-house developed by extending some other MVC frameworks.

Enterprise Development Environments

Some of these frameworks became very popular within the Web developer Community and enterprise development space. As these frameworks matured into stable releases, commercial IDE (integrated development environment) toolmakers started to build support for them into their products. Some even went as far as to develop whole products based on the concepts of the framework. For example, BEA WebLogic Workshop is build around the Struts framework.Borland JBuilder has built-in support for Struts and features JSF and JSTL support as well.

The Eclipse platform became a very popular development tool, partly because of its plug-in base and partly because of its Web framework support. Numerous plug-ins to Eclipse or even entire distributions of Eclipse-based IDEs appeared. Many of the plug-ins were designed for Struts framework development, such as MyEclipse (www.myeclipse.org) or M7 (www.m7.com).

As the Web development arena continues to evolve its tools and programming methodologies, so will the Java application frameworks continue to grow. The future seems very bright for the Java Web-development frameworks.

End of Part III

Part II - Java Frameworks (The evolution of Java development)

2006-04-14T00:33:00.000-07:00

This is the second part of the article on Frameworks which illustrates the evolution of java development. Major part of it have been gleaned from the article “Java Frameworks Take Hold “ By Rene Bonvanie

Java 2 Platform, Enterprise Edition (J2EE) is an incredibly powerful technology. It is designed to be flexible enough to adapt to many different types of applications without requiring developers to invent new approaches.

Start that first project, and the questions come fast and furious. What combination of JavaServer Pages (JSP), Enterprise JavaBeans (EJB), and servlet components should you use to build each part of the system? How will performance be ensured? Is one approach more scalable than another? And finally, how can the choices, once selected, be enforced consistently across a development team?

These questions are at the core of one of the most important discussions in the Java community today. And the mass adoption of Java in internet development projects has resulted in a flood of solutions in the shape of best practices, frameworks, and development tools.

In The Beginning: Design Patterns

The Java community recognized early on that guidelines were necessary to help developers deal with the myriad J2EE-related choices. Gradually, a set of best practices emerged, usually called the J2EE Design Patterns.

J2EE Design Patterns generalize proven, high-quality approaches for frequently encountered design issues with the J2EE application model in a format that all developers can use. Typically, a design pattern is a written description of the problem domain followed by some sample code implementing a solution.

Take, for example, the Web tier in a typical J2EE application. JSP and servlets do a great job at increasing developer productivity when building individual dynamic Web pages but provide little support for managing page-to-page flow. Furthermore, on their own, JSP and servlets do not enforce separation of the Web presentation and business logic.

Here is where design patterns fit in. The basic problem just described is resolved by a pattern called the Model-View-Controller (MVC) design pattern. This pattern specifies a way to build an application so there is a consistent way to control page flow and to separate presentation and business logic layers. The MVC approach naturally builds on JSP and servlets, using the strength of these core specifications.

Next Generation: Frameworks

Developers have gravitated to the J2EE Design Patterns en masse because they represent some of the best-known practices for J2EE application development. Incorpora-ting design patterns into applications promises high-quality, high-performance implementations.

Yet the problem most developers face when working with design patterns is that they are exactly as their name implies: a set of patterns that tend to be academically rigorous but are not easily enforceable or automated on their own. Design patterns are merely coding templates and recommendations that developers are expected to follow, with no guarantees of consistency, understanding, or enforceability. Vendors and developers are now moving to the next generation: developing frameworks based on the J2EE Design Patterns.

At the lowest level, J2EE frameworks automate the easily repeatable coding aspects of the patterns with techniques such as automatic code generation or a metadata-driven approach. At the highest level, J2EE frameworks turn into visual design and declarative programming environments.

An example framework is Apache Struts, implementing the MVC design pattern. It is a popular open source Web-tier framework that originated in an effort to provide a standard implementation of the MVC design pattern. Struts took the major concepts of MVC and created a consistent, reusable metadata layer into which J2EE developers plug the specifics of their applications.

With Struts, J2EE developers no longer have to worry about building the MVC design pattern "plumbing" in every project; rather, they can focus on applying their creative thinking to the presentation layer of the business application itself. Struts—and frameworks in general—bring other benefits too: reduced training costs, faster project delivery, and consistency across application implementations.

One can work across a standard J2EE architecture and find representative frameworks implementing design patterns in each tier. For example, Web-tier frameworks such as Apache Struts are easily combined with business-tier frameworks such as Business Components for Java (BC4J), to write entire applications.

There are many options in the data tier. For instance, BC4J provides a highly scalable implementation of the data access object pattern for persistence. Such business-tier frameworks are also frequently paired with persistence layers such as Oracle9iAS TopLink that help developers map general-purpose business-domain models to data stores such as relational databases.

Frameworks such as Apache Struts and BC4J represent a growing trend in the framework world: implement a set of J2EE Design Patterns and ensure the framework is open and flexible enough to easily plug into other popular frameworks. The goal is to give J2EE developers choice and productivity at the same time.

Making Choices

In the open source and commercial space, dozens of frameworks are emerging. Logical questions follow. What makes a successful framework? How does a developer choose the right framework? Which ones will survive?

One answer is that the surviving, widely adopted frameworks will likely be those that cleanly and elegantly solve architectural problems and significantly increase productivity over straight programming. Leading J2EE frameworks will be judged on quality of implementation, maturity, usability, cost, performance, and reliability.

As the core J2EE specifications evolve to incorporate framework features, the J2EE containers will provide developers with best practices and design consistency already built in. When this happens, developers will focus their selection criteria on the second major area frameworks tend to feature: productivity and ease of use for developers.

Open source Java Frameworks

Below is the list of some popular frameworks in the open source space.

Open Source J2EE Application Frameworks

Spring - Spring is a layered Java/J2EE application framework, based on code published in Expert One-on-One J2EE Design and Development

Jeenius - Jeenius is a framework to simplify the creation of J2EE applications. It has a strong focus on building web-based applications.

Open Source Web Frameworks in Java

Struts - The core of the Struts framework is a flexible control layer based on standard technologies like Java Servlets, JavaBeans, ResourceBundles, and XML, as well as various Jakarta Commons packages. Struts encourages application architectures based on the Model 2 approach, a variation of the classic Model-View-Controller (MVC) design paradigm.

Spring MVC – MVC framework provided by Spring is amost similar to Struts but is more powerful and easy to use.

WebWork - WebWork is a web application framework for J2EE. It is based on a concept called "Pull HMVC" (Pull Hierarchical Model View Controller).

Cocoon - Apache Cocoon is a web development framework built around the concepts of separation of concerns and component-based web development. Cocoon implements these concepts around the notion of 'component pipelines', each component on the pipeline specializing on a particular operation. This makes it possible to use a Lego(tm)-like approach in building web solutions, hooking together components into pipelines without any required programming.

Turbine - Turbine is a servlet based framework that allows experienced Java developers to quickly build secure web applications. Turbine is an excellent choice for developing applications that make use of a services-oriented architecture. Some of the functionality provided with Turbine includes a security management system, a scheduling service, XML-defined form validation server, and an XML-RPC service for web services. It is a simple task to create new services particular to your application.

Tapestry - Tapestry is a powerful, open-source, all-Java framework for creating leading edge web applications in Java. Tapestry reconceptualizes web application development in terms of objects, methods and properties instead of URLs and query parameters. Tapestry is an alternative to scripting environments such as JavaServer Pages or Velocity. Tapestry goes far further, providing a complete framework for creating extremely dynamic applications with minimal amounts of coding.

Open Source Persistence Frameworks in Java

Hibernate - Hibernate is a powerful, ultra-high performance object/relational persistence and query service for Java. Hibernate lets you develop persistent objects following common Java idiom - including association, inheritance, polymorphism, composition and the Java collections framework. Extremely fine-grained, richly typed object models are possible. The Hibernate Query Language, designed as a "minimal" object-oriented extension to SQL, provides an elegant bridge between the object and relational worlds. Hibernate is now the most popular ORM solution for Java.

OJB - ObJectRelationalBridge (OJB) is an Object/Relational mapping tool that allows transparent persistence for Java Objects against relational databases.

Ibatis SQL Maps - The SQL Maps framework will help to significantly reduce the amount of Java code that is normally needed to access a relational database. This framework maps JavaBeans to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of SQL Maps over other frameworks and object relational mapping tools. To use SQL Maps you need only be familiar with JavaBeans, XML and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using SQL Maps you have the full power of real SQL at your fingertips. The SQL Maps framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.

End of Part II

Part I - Introduction to Frameworks

2006-04-14T00:26:00.000-07:00

For all of my friends who are already familiar with development of web based applications with Java (J2EE) using JSP and servlets, and would like to start using Java Frameworks, here is a three part article that introduce the basics of modern java frameworks.

The concept of framework has been kicking around in software development for a long time in one form or another. In its simplest form, a framework is simply a body of tried and tested code that is reused in multiple software development projects. A framework in general, provides an implementation for the core and unvarying functions and includes mechanisms to allow developer to plug-in various functions or to extend the funtions.

Frameworks can be classified into 3 based on their scope, as follows:

1. System infrastructure frameworks - These frameworks simplify the development of portable and efficient system infrastructure such as operating system and communication frameworks, and frameworks for user interfaces and language processing tools. System infrastructure frameworks are primarily used internally within a software organization and are not sold to customers directly.

2. Middleware integration frameworks - These frameworks are commonly used to integrate distributed applications and components. Middleware integration frameworks are designed to enhance the ability of software developers to modularize, reuse, and extend their software infrastructure to work seamlessly in a distributed environment. There is a thriving market for Middleware integration frameworks, which are rapidly becoming commodities. Common examples include ORB frameworks, message-oriented middleware, and transactional databases.

3. Enterprise application frameworks - These frameworks address broad application domains (such as telecommunications, avionics, manufacturing, and financial engineering) and are the cornerstone of enterprise business activities. Relative to System infrastructure and Middleware integration frameworks, Enterprise frameworks are expensive to develop and/or purchase. However, Enterprise frameworks can provide a substantial return on investment since they support the development of end-user applications and products directly.

Regardless of their scope, frameworks can also be classified by the techniques used to extend them, which range along a continuum from whitebox frameworks to blackbox frameworks.

1. Whitebox frameworks rely heavily on OO language features like inheritance and dynamic binding to achieve extensibilty. Existing functionality is reused and extended by (1) inheriting from framework base classes and (2) overriding pre-defined hook methods using patterns like Template Method. Whitebox frameworks require application developers to have intimate knowledge of the frameworks' internal structure. Although whitebox frameworks are widely used, they tend to produce systems that are tightly coupled to the specific details of the framework's inheritance hierarchies.

2. Blackbox frameworks support extensibility by defining interfaces for components that can be plugged into the framework via object composition. Existing functionality is reused by (1) defining components that conform to a particular interface and (2) integrating these components into the framework using patterns like Strategy and Functor. Blackbox frameworks are structured using object composition and delegation more than inheritance. As a result, blackbox frameworks are generally easier to use and extend than whitebox frameworks. However, blackbox frameworks are more difficult to develop since they require framework developers to define interfaces and hooks that anticipate a wider range of potential use-cases.

Object-Oriented (OO) Application Frameworks

Object-oriented (OO) application frameworks are a promising technology for reifying proven software designs and implementations in order to reduce the cost and improve the quality of software. An OO application framework is a reusable, ``semi-complete'' application that can be specialized to produce custom applications. In contrast to earlier OO reuse techniques based on class libraries, frameworks are targeted for particular business units (such as data processing or cellular communications) and application domains (such as user interfaces or persistance).

The primary benefits of OO application frameworks stem from the modularity, reusability, extensibility, and inversion of control they provide to developers, as described below:

Modularity - Frameworks enhance modularity by encapsulating volatile implementation details behind stable interfaces. Framework modularity helps improve software quality by localizing the impact of design and implementation changes. This localization reduces the effort required to understand and maintain existing software.

Reusability - The stable interfaces provided by frameworks enhance reusability by defining generic components that can be reapplied to create new applications. Framework reusability leverages the domain knowledge and prior effort of experienced developers in order to avoid re-creating and re-validating common solutions to recurring application requirements and software design challenges. Reuse of framework components can yield substantial improvements in programmer productivity, as well as enhance the quality, performance, reliability and interoperability of software.

Extensibility - A framework enhances extensibility by providing explicit hook methods that allow applications to extend its stable interfaces. Hook methods systematically decouple the stable interfaces and behaviors of an application domain from the variations required by instantiations of an application in a particular context. Framework extensibility is essential to ensure timely customization of new application services and features.

Inversion of control - The run-time architecture of a framework is characterized by an ``inversion of control.'' This architecture enables canonical application processing steps to be customized by event handler objects that are invoked via the framework's reactive dispatching mechanism. When events occur, the framework's dispatcher reacts by invoking hook methods on pre-registered handler objects, which perform application-specific processing on the events. Inversion of control allows the framework (rather than each application) to determine which set of application-specific methods to invoke in response to external events (such as window messages arriving from end-users or packets arriving on communication ports).

Early object-oriented frameworks (such as MacApp and Interviews) originated in the domain of graphical user interfaces (GUIs). The Microsoft Foundation Classes (MFC) is a contemporary GUI framework that has become the de facto industry standard for creating graphical applications on PC platforms. Although MFC has limitations (such as lack of portability to non-PC platforms), its wide-spread adoption demonstrates the productivity benefits of reusing common frameworks to develop graphical business applications.

The next generation of OO application frameworks targeted at complex business and application domains. At the heart of this effort were the Object Request Broker (ORB) frameworks, which facilitate communication between local and remote objects. ORB frameworks eliminate many tedious, error-prone, and non-portable aspects of creating and managing distributed applications and reusable service components. This enables programmers to develop and deploy complex applications rapidly and robustly, rather than wrestling endlessly with low-level infrastructure concerns. Widely used ORB frameworks include CORBA, DCOM, and Java RMI.

In server-side development, a number of core tasks crop up over and over again. Such tasks can be pulled into a core framework, built and tested once, and reused across multiple projects. Utilizing this opportunity, many frameworks emerged that simplified the development of web based projects. As development of Web-based application servers and their applications expanded, so did the frameworks that supported these technologies. Currently, there are many software frameworks in the enterprise development space especially for the Java J2EE platform.

A good framework enhances the maintainability of software through API consistency, comprehensive documentation, and thorough testing. Some companies invest formally in frameworks and developers build up a library of components that they use often. Such actions reduce development time while improving delivered software quality - which means that developers can spend more time concentrating on the business-specific problem at hand rather than on the plumbing code behind it. There are also many mature frameworks available in the open source arena. Adopting such stable frameworks are more effective than going on to develop a framework from scratch.

End of Part I

Object Oriented Database Management Systems

2006-04-12T07:02:00.000-07:00

In today's world, Client-Server applications that rely on a database on the server as a data store while servicing requests from multiple clients are quite commonplace. Most of these applications use a Relational Database Management System (RDBMS) as their data store while using an object oriented programming language for development. This causes a certain inefficency as objects must be mapped to tuples in the database and vice versa instead of the data being stored in a way that is consistent with the programming model. The "impedance mismatch" caused by having to map objects to tables and vice versa has long been accepted as a necessary performance penalty. The following article is aimed at seeking out an alternative that avoids this penalty.This information was gleaned from the article “An Exploration Of Object Oriented Database Management Systems“ by Dare Obasanjo.

Overview of OODBMS

An OODBMS is the result of combining object oriented programming principles with database management principles. Object oriented programming concepts such as encapsulation, polymorphism and inheritance are enforced as well as database management concepts such as the ACID properties (Atomicity, Consistency, Isolation and Durability) which lead to system integrity, support for an ad hoc query language and secondary storage management systems which allow for managing very large amounts of data.

The Object Oriented Database Manifesto specifically lists the following features as mandatory for a system to support before it can be called an OODBMS; Complex objects, Object identity, Encapsulation, Types and Classes, Class or Type Hierarchies, Overriding, overloading and late binding, Computational completeness, Extensibility, Persistence, Secondary storage management, Concurrency, Recovery and an Ad Hoc Query Facility. An OODBMS is thus a full scale object oriented development environment as well as a database management system. Features that are common in the RDBMS world such as transactions, the ability to handle large amounts of data, indexes, deadlock detection, backup and restoration features and data recovery mechanisms also exist in the OODBMS world.

A primary feature of an OODBMS is that accessing objects in the database is done in a transparent manner such that interaction with persistent objects is no different from interacting with in-memory objects. This is very different from using an RDBMSs in that there is no need to interact via a query sub-language like SQL nor is there a reason to use a Call Level Interface such as ODBC, ADO or JDBC. Database operations typically involve obtaining a database root from the the OODBMS which is usually a data structure like a graph, vector, hash table, or set and traversing it to obtain objects to create, update or delete from the database.

Comparisons of OODBMSs to RDBMSs

There are concepts in the relational database model that are similar to those in the object database model. A relation or table in a relational database can be considered to be analogous to a class in an object database. A tuple is similar to an instance of a class but is different in that it has attributes but no behaviors. A column in a tuple is similar to a class attribute except that a column can hold only primitive data types while a class attribute can hold data of any type. Finally classes have methods which are computationally complete (meaning that general purpose control and computational structures are provided) while relational databases typically do not have computationally complete programming capabilities although some stored procedure languages come close.

Below is a list of advantages and disadvantages of using an OODBMS over an RDBMS with an object oriented programming language.

Advantages

Composite Objects and Relationships: Objects in an OODBMS can store an arbitrary number of atomic types as well as other objects. It is thus possible to have a large class which holds many medium sized classes which themselves hold many smaller classes, ad infinitum. In a relational database this has to be done either by having one huge table with lots of null fields or via a number of smaller, normalized tables which are linked via foreign keys. Having lots of smaller tables is still a problem since a join has to be performed every time one wants to query data based on the "Has-a" relationship between the entities. Also an object is a better model of the real world entity than the relational tuples with regards to complex objects. The fact that an OODBMS is better suited to handling complex,interrelated data than an RDBMS means that an OODBMS can outperform an RDBMS by ten to a thousand times depending on the complexity of the data being handled.

Class Hierarchy: Data in the real world is usually has hierarchical characteristics. The ever popular Employee example used in most RDBMS texts is easier to describe in an OODBMS than in an RDBMS. An Employee can be a Manager or not, this is usually done in an RDBMS by having a type identifier field or creating another table which uses foreign keys to indicate the relationship between Managers and Employees. In an OODBMS, the Employee class is simply a parent class of the Manager class.

Circumventing the Need for a Query Language: A query language is not necessary for accessing data from an OODBMS unlike an RDBMS since interaction with the database is done by transparently accessing objects. It is still possible to use queries in an OODBMS however.

No Impedence Mismatch: In a typical application that uses an object oriented programming language and an RDBMS, a signifcant amount of time is usually spent mapping tables to objects and back. There are also various problems that can occur when the atomic types in the database do not map cleanly to the atomic types in the programming language and vice versa. This "impedance mismatch" is completely avoided when using an OODBMS.

No Primary Keys: The user of an RDBMS has to worry about uniquely identifying tuples by their values and making sure that no two tuples have the same primary key values to avoid error conditions. In an OODBMS, the unique identification of objects is done behind the scenes via OIDs and is completely invisible to the user. Thus there is no limitation on the values that can be stored in an object.

One Data Model: A data model typically should model entities and their relationships, constraints and operations that change the states of the data in the system. With an RDBMS it is not possible to model the dynamic operations or rules that change the state of the data in the system because this is beyond the scope of the database. Thus applications that use RDBMS systems usually have an Entity Relationship diagram to model the static parts of the system and a seperate model for the operations and behaviors of entities in the application. With an OODBMS there is no disconnect between the database model and the application model because the entities are just other objects in the system. An entire application can thus be comprehensively modelled in one UML diagram.

Disadvantages

Schema Changes: In an RDBMS modifying the database schema either by creating, updating or deleting tables is typically independent of the actual application. In an OODBMS based application modifying the schema by creating, updating or modifying a persistent class typically means that changes have to be made to the other classes in the application that interact with instances of that class. This typically means that all schema changes in an OODBMS will involve a system wide recompile. Also updating all the instance objects within the database can take an extended period of time depending on the size of the database.

Language Dependence: An OODBMS is typically tied to a specific language via a specific API. This means that data in an OODBMS is typically only accessible from a specific language using a specific API, which is typically not the case with an RDBMS.

Lack of Ad-Hoc Queries: In an RDBMS, the relational nature of the data allows one to construct ad-hoc queries where new tables are created from joining existing tables then querying them. Since it is currently not possible to duplicate the semantics of joining two tables by "joining" two classes then there is a loss of flexibility with an OODBMS. Thus the queries that can be performed on the data in an OODBMS is highly dependent on the design of the system.

List of Object Oriented Database Management Systems

Proprietary
Object Store
O2
Gemstone
Versant
Ontos
DB/Explorer ODBMS
Ontos
Poet
Objectivity/DB
EyeDB

Open Source
Ozone
Zope
FramerD
XL2

Conclusion
The gains from using an OODBMS while developing an application using an OO programming language are many. The savings in development time by not having to worry about seperate data models as well as the fact that there is less code to write due to the lack of impedance mismatch is very attractive. There is little reason to pick an RDBMS over an OODBMS system for new application development unless there are legacy issues that have to be dealt with.

Technorati links: Databases Open Source