Saturday, March 21, 2009

Full-Text Search using Oracle Text

This post continues my previous post on Full Text Search, in which I discussed MySQL’s built-in Full Text Search engine and external Open Source Full Text Search engines as options for integrating full-text search features into Java applications. This time I want to share information on building Full-Text Search applications with Oracle Text.

Oracle Text:

Oracle Text is a powerful search technology built into all Oracle Database editions, including the free Express Edition (XE). The development APIs provided by Oracle Text allow software developers to easily implement full-featured content search applications.

Oracle Text is suitable for a wide variety of search-related use cases and storage structures. Application areas include e-business, document and records management, and issue tracking, to name a few. Retrievable text can reside in structured form inside the database or in unstructured form, either in a local file system or on the Web.

Oracle Text can be used to search structured and unstructured documents, complementing SQL wildcard matching. It provides a complete SQL-based search API that consists of custom query operators, DDL syntax extensions, a set of PL/SQL procedures, and database views. The API gives the application developer full control over indexing, queries, security, presentation, and the software configuration that is sometimes required. Oracle Text is also programming-language agnostic and works equally well for PHP and Java applications.

Setting Up Oracle Text:

Oracle Text is installed with an Oracle Database XE installation by default. With other database editions, you need to install the Oracle Text feature yourself. Once the feature is present, you only need to create a normal database user and grant the CTXAPP role to that user. This allows the user to execute certain index management procedures.
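A minimal sketch of the grant step from a Java application might look like the following. The class name, the method names and the example user name search_app are illustrative assumptions and not part of Oracle Text; only the CTXAPP role comes from the text above. The connection is assumed to belong to an account that is allowed to grant roles.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch only: grants the CTXAPP role to an existing database user. */
final class OracleTextSetup {

    /**
     * Builds the GRANT statement. DDL cannot take bind variables, so the
     * user name is checked against a conservative pattern first.
     */
    static String grantCtxAppSql(String user) {
        if (!user.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected user name: " + user);
        }
        return "GRANT CTXAPP TO " + user;
    }

    /** Runs the grant over a connection opened by a privileged account (assumption). */
    static void grantCtxApp(Connection dbaConnection, String user) throws SQLException {
        try (Statement stmt = dbaConnection.createStatement()) {
            stmt.execute(grantCtxAppSql(user));
        }
    }
}
```

Calling OracleTextSetup.grantCtxApp(connection, "search_app") would issue GRANT CTXAPP TO search_app.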

Indexing Process and Searching

Oracle Text indexes retrievable data items before users are able to find content with search. This is a common approach used to ensure adequate search performance. The Oracle Text indexing process is modeled after a pipeline, where data items retrieved from a data store pass through a series of transformations before their keywords are added to the index. The indexing process is split into multiple phases, where each phase is handled by a separate entity and configurable by the application developer.

Oracle Text has different index types that are suitable for different purposes. For full-text search with large documents, the CONTEXT index is the appropriate index type. The indexing process includes the following phases:

  1. Data Retrieval: Data is simply fetched from a data store, for example, a Web page, database large object, or local file system, and passed as a stream of data to the next phase.
  2. Filtering: The filters are responsible for converting data in different file formats to plain text. The other components in the indexing pipeline only process plain text data and don't know about file formats such as Microsoft Word or Excel.
  3. Sectioning: The sectioner adds metadata about the structure of the original data item.
  4. Lexing: A stream of characters is split into words based on the language of the item.
  5. Indexing: In this final phase, the keywords are added to the actual index.
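The whole pipeline above is started by a single CREATE INDEX statement with the CTXSYS.CONTEXT index type. The sketch below shows how that statement could be issued over JDBC. The table docs, the column body and the index name docs_body_idx are hypothetical, and the optional PARAMETERS string (here selecting the system-defined CTXSYS.NULL_FILTER for data that is already plain text) is just one possible configuration.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch only: builds and runs the DDL for a CONTEXT index. */
final class ContextIndexDdl {

    /** Rejects anything that is not a simple identifier, since DDL cannot use binds. */
    private static String ident(String name) {
        if (!name.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected identifier: " + name);
        }
        return name;
    }

    /**
     * Builds CREATE INDEX ... INDEXTYPE IS CTXSYS.CONTEXT.
     * Pass null for parameters to accept the default pipeline settings.
     */
    static String createContextIndexSql(String index, String table, String column, String parameters) {
        String sql = "CREATE INDEX " + ident(index) + " ON " + ident(table)
                + " (" + ident(column) + ") INDEXTYPE IS CTXSYS.CONTEXT";
        if (parameters != null) {
            if (parameters.contains("'")) {
                throw new IllegalArgumentException("quotes are not allowed in parameters");
            }
            sql += " PARAMETERS ('" + parameters + "')";
        }
        return sql;
    }

    /** Creates the index on the hypothetical docs.body column. */
    static void createDocsIndex(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(createContextIndexSql("docs_body_idx", "docs", "body", null));
        }
    }
}
```

Index creation runs every phase for every existing row, so on a large table this statement can take a while.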

Once the index has been built, an application can use plain SQL queries to execute a search entered by an end user.

Searching

The CONTAINS operator is used for searching CONTEXT indexes.
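As a rough illustration, a Java application could run a CONTAINS query through a PreparedStatement as sketched below. The docs table with id, title and body columns is hypothetical. The toContainsQuery helper is my own assumption about how to treat user input: it wraps every word in braces so that it is matched literally instead of being parsed as a query operator, and it combines the words with AND. Other policies (OR, phrase search, fuzzy matching) are equally possible.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: full-text search against a CONTEXT index on docs.body. */
final class DocumentSearch {

    // The label 1 links the CONTAINS call to SCORE(1), which returns the relevance.
    static final String SEARCH_SQL =
            "SELECT id, title, SCORE(1) AS relevance FROM docs "
          + "WHERE CONTAINS(body, ?, 1) > 0 ORDER BY SCORE(1) DESC";

    /** Turns free-form user input into a query such as "{oracle} AND {text}". */
    static String toContainsQuery(String userInput) {
        List<String> terms = new ArrayList<>();
        for (String word : userInput.trim().split("\\s+")) {
            // Braces are dropped so a word cannot end the escape sequence early.
            String cleaned = word.replace("{", "").replace("}", "");
            if (!cleaned.isEmpty()) {
                terms.add("{" + cleaned + "}");
            }
        }
        return String.join(" AND ", terms);
    }

    /** Returns the titles of matching documents, best match first. */
    static List<String> search(Connection conn, String userInput) throws SQLException {
        List<String> titles = new ArrayList<>();
        String query = toContainsQuery(userInput);
        if (query.isEmpty()) {
            return titles; // nothing to search for
        }
        try (PreparedStatement ps = conn.prepareStatement(SEARCH_SQL)) {
            ps.setString(1, query);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }
}
```

Binding the query text as a parameter keeps it out of the SQL string itself. The brace escaping is a separate concern that only affects how Oracle Text parses the search expression.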

Index Maintenance

Because base table data is replicated by the index, the data needs to be periodically synchronized to the index. Index maintenance procedures can be found in the CTX_DDL PL/SQL package.
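For example, the CTX_DDL.SYNC_INDEX and CTX_DDL.OPTIMIZE_INDEX procedures could be called from Java roughly as follows. The class and method names are illustrative assumptions, and how often to run either call (after each batch of changes, from a scheduled job, and so on) depends on the application.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Locale;

/** Sketch only: periodic maintenance calls for a CONTEXT index. */
final class IndexMaintenance {

    static final String SYNC_CALL = "{call CTX_DDL.SYNC_INDEX(?)}";
    static final String OPTIMIZE_CALL = "{call CTX_DDL.OPTIMIZE_INDEX(?, ?)}";

    /** Accepts only the two simplest optimization levels, FAST and FULL. */
    static String normalizeLevel(String level) {
        String upper = level.trim().toUpperCase(Locale.ROOT);
        if (!upper.equals("FAST") && !upper.equals("FULL")) {
            throw new IllegalArgumentException("unsupported level: " + level);
        }
        return upper;
    }

    /** Applies pending inserts, updates and deletes on the base table to the index. */
    static void syncIndex(Connection conn, String indexName) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(SYNC_CALL)) {
            cs.setString(1, indexName);
            cs.execute();
        }
    }

    /** Defragments the index, which tends to fragment after many syncs. */
    static void optimizeIndex(Connection conn, String indexName, String level) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(OPTIMIZE_CALL)) {
            cs.setString(1, indexName);
            cs.setString(2, normalizeLevel(level));
            cs.execute();
        }
    }
}
```

The connection must belong to a user that can execute CTX_DDL, which is what the CTXAPP role is for.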

Summary

Oracle Text allows users to create a full-text index on a single column or multiple columns in a single table, as well as across multiple tables in a database. Details on creating the index, searching, and index maintenance are discussed comprehensively in the OTN Developer article on full text indexing.
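As a small illustration of the multi-column case, one option is a MULTI_COLUMN_DATASTORE preference created through CTX_DDL and referenced when the index is built. The sketch below assumes a hypothetical docs table with title and body columns and a preference named docs_multi_ds. Depending on the database version, creating this preference type may require additional privileges. All names are fixed literals here and are not taken from user input.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch only: one CONTEXT index covering several columns of one table. */
final class MultiColumnIndex {

    /** PL/SQL block that defines which columns are combined into one document. */
    static String createDatastoreBlock(String prefName, String columnList) {
        return "BEGIN "
             + "CTX_DDL.CREATE_PREFERENCE('" + prefName + "', 'MULTI_COLUMN_DATASTORE'); "
             + "CTX_DDL.SET_ATTRIBUTE('" + prefName + "', 'COLUMNS', '" + columnList + "'); "
             + "END;";
    }

    /** The index is declared on one column but reads its data through the preference. */
    static String createIndexSql(String index, String table, String column, String prefName) {
        return "CREATE INDEX " + index + " ON " + table + " (" + column + ")"
             + " INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('DATASTORE " + prefName + "')";
    }

    /** Creates the preference and then the index on the hypothetical docs table. */
    static void create(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(createDatastoreBlock("docs_multi_ds", "title, body"));
            stmt.execute(createIndexSql("docs_all_idx", "docs", "title", "docs_multi_ds"));
        }
    }
}
```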

References:

1. OTN Developer article on full text indexing

2. OTN Discussion Forum - Topic on multi-table indexing

3. Thread on full-text indexing

39 comments:

Rupam Srivastava said...

Hi Roshan.
Thanks for this very useful post.
Exactly what I was looking for.
A brief description given by you and a handful of useful and comprehensive links.!!
Keep up the good work
Thanks again !!
