In our last post we built a simple index over file system. While our example works fine but cannot be extended over clustered environment and also cannot be used for a large document because of memory foot print. Lucene doesn’t provide a direct in built JDBC interface but Compass does, though the JDBC interface of Compass is not compatible with Lucene 3.6. We will extend Compass JDBC interface as per Lucene 3.6 changes.
First we will create our Maven Project in Eclipse and add required dependencies.
pom.xml
[sourcecode language=”xml”]
<project
xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.mumz.test.lucene</groupId>
<artifactId>ApacheLuceneTest</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>ApacheLuceneTest</name>
<description>ApacheLuceneTest</description>
<dependencies>
<dependency>
<artifactId>lucene-core</artifactId>
<groupId>org.apache.lucene</groupId>
<type>jar</type>
<version>3.6.1</version>
</dependency>
<dependency>
<groupId>org.compass-project</groupId>
<artifactId>compass</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.21</version>
</dependency>
</dependencies>
</project>
[/sourcecode]
Second since JdbcDirectory provided with Compass 2.2.0 doesn’t implement all the methods defined in abstract class Directory, we will provide our own implementation which will extend JdbcDirectory.
JdbcDirectory
[sourcecode language=”java”]
package com.mumz.test.lucene.jdbc;
import java.io.IOException;
import javax.sql.DataSource;
import org.apache.lucene.store.jdbc.JdbcDirectory;
import org.apache.lucene.store.jdbc.JdbcDirectorySettings;
import org.apache.lucene.store.jdbc.JdbcStoreException;
import org.apache.lucene.store.jdbc.dialect.Dialect;
import org.apache.lucene.store.jdbc.support.JdbcTable;
/**
* The Class MyJDBCDirectory.
*
* @author prabhat.jha
*/
public class MyJDBCDirectory extends JdbcDirectory {
/**
* Instantiates a new my jdbc directory.
*
* @param dataSource
* the data source
* @param dialect
* the dialect
* @param settings
* the settings
* @param tableName
* the table name
*/
public MyJDBCDirectory(DataSource dataSource, Dialect dialect, JdbcDirectorySettings settings, String tableName) {
super(dataSource, dialect, settings, tableName);
}
/**
* Instantiates a new my jdbc directory.
*
* @param dataSource the data source
* @param dialect the dialect
* @param tableName the table name
*/
public MyJDBCDirectory(DataSource dataSource, Dialect dialect, String tableName) {
super(dataSource, dialect, tableName);
}
/**
* Instantiates a new my jdbc directory.
*
* @param dataSource the data source
* @param settings the settings
* @param tableName the table name
* @throws JdbcStoreException the jdbc store exception
*/
public MyJDBCDirectory(DataSource dataSource, JdbcDirectorySettings settings, String tableName) throws JdbcStoreException {
super(dataSource, settings, tableName);
}
/**
* Instantiates a new my jdbc directory.
*
* @param dataSource the data source
* @param table the table
*/
public MyJDBCDirectory(DataSource dataSource, JdbcTable table) {
super(dataSource, table);
}
/**
* Instantiates a new my jdbc directory.
*
* @param dataSource the data source
* @param tableName the table name
* @throws JdbcStoreException the jdbc store exception
*/
public MyJDBCDirectory(DataSource dataSource, String tableName) throws JdbcStoreException {
super(dataSource, tableName);
}
/**
* (non-Javadoc).
*
* @return the string[]
* @throws IOException Signals that an I/O exception has occurred.
* @see org.apache.lucene.store.Directory#listAll()
*/
@Override
public String[] listAll() throws IOException {
return super.list();
}
}
[/sourcecode]
JdbcDirectory has a predefined table structure where it works on a clob field.
Third we will write our Indexer code. In our code we will achieve following:
- Add few records in our database by using
JDBCBatchInsert.java - Create table to be used by
JdbcDirectoryusingcreateIndexTablemethod - Build an index on this record by using
indexmethod
Let’s add some records in our database for testing our application.
JDBCBatchInsert.java
[sourcecode language=”java”]
package com.mumz.test.lucene.jdbc;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
/**
* The Class JDBCBatchInsert.
* @author prabhat.jha
*/
public class JDBCBatchInsert {
/** The Constant QUERY. */
private static final String QUERY = "INSERT INTO BOOKS (BOOK_ID, BOOK_NAME, BOOK_AUTHOR, BOOK_PUBLISHER) VALUES (?, ?, ?, ?)";
/** The Constant BOOK_FIRST_PART. */
private final static String[] BOOK_FIRST_PART = {"Spring", "Hibernate", "Lucene", "Mahout", "JPA", "JSF", "Swing", "Hadoop", "Hbase"};
/** The Constant BOOK_LAST_PART. */
private final static String[] BOOK_LAST_PART = {"In Action", "Complete Reference", "Demystified", "Tutorial", "Explained",
"Simplified", "Bible", "Cook Book", "Crash Course"};
/** The Constant BLANK_SPACE. */
private final static String BLANK_SPACE = " ";
/**
* Insert records.
*/
public void insertRecords() {
Connection connection = null;
PreparedStatement pstmt = null;
ResultSet resultSet = null;
try {
connection = JDBCDatabaseUtil.getConnection();
pstmt = connection.prepareStatement(QUERY);
int index = 0;
for (String firstPart : BOOK_FIRST_PART) {
for (String lastPart : BOOK_LAST_PART) {
pstmt.setInt(1, ++index);
pstmt.setString(2, firstPart + BLANK_SPACE + lastPart);
pstmt.setString(3, "Test Author" + BLANK_SPACE + firstPart + BLANK_SPACE + lastPart + index);
pstmt.setString(4, "Test Publisher" + BLANK_SPACE + firstPart + BLANK_SPACE + lastPart + index);
pstmt.addBatch();
}
}
pstmt.executeBatch();
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
if (resultSet != null) {
resultSet.close();
}
if (pstmt != null) {
pstmt.close();
}
if (connection != null) {
connection.close();
}
resultSet = null;
pstmt = null;
connection = null;
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
[/sourcecode]
Next we will add index in our database.
JDBCIndexer.java
[sourcecode language=”java” highlight=”75,76,110,111,112,113,114,115,116,117,118,119,120,121,122,124,125,126,127,128,129,130,131,132,134,135,136,137,138,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150″]
package com.mumz.test.lucene.jdbc;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.jdbc.JdbcDirectory;
import org.apache.lucene.store.jdbc.dialect.MySQLDialect;
import org.apache.lucene.util.Version;
/**
* The Class JDBCIndexer.
*
* @author prabhat.jha
*/
public class JDBCIndexer {
/** The jdbc directory. */
private Directory jdbcDirectory = null;
/**
* Instantiates a new jDBC indexer.
*
* @param jdbcDirectory
* the jdbc directory
*/
public JDBCIndexer(Directory jdbcDirectory) {
super();
this.jdbcDirectory = jdbcDirectory;
}
/**
* Gets the jdbc directory.
*
* @return the jdbc directory
*/
public Directory getJdbcDirectory() {
if (jdbcDirectory == null) {
throw new IllegalStateException("Index not yet build, rerun indexing");
}
return jdbcDirectory;
}
/**
* Sets the jdbc directory.
*
* @param jdbcDirectory
* the new jdbc directory
*/
public void setJdbcDirectory(Directory jdbcDirectory) {
this.jdbcDirectory = jdbcDirectory;
}
/**
* Builds the index.
*/
public void buildIndex() {
createAndBuildIndex();
}
/**
* Creates the and build index.
*/
private void createAndBuildIndex() {
createIndexTable();
index();
}
/**
* Index.
*/
private void index() {
Analyzer analyzer = new SimpleAnalyzer(Version.LUCENE_36);
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_36, analyzer);
IndexWriter indexWriter = null;
try {
indexWriter = new IndexWriter(getJdbcDirectory(), indexWriterConfig);
addIndex(indexWriter);
} catch (CorruptIndexException e) {
e.printStackTrace();
} catch (LockObtainFailedException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (indexWriter != null) {
try {
indexWriter.close();
} catch (CorruptIndexException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
indexWriter = null;
}
}
}
}
/**
* Add index on records present in BOOKS table
*
* @param indexWriter
* the index writer
*/
private void addIndex(IndexWriter indexWriter) {
try {
Connection connection = JDBCDatabaseUtil.getConnection();
String query = "SELECT BOOK_ID, BOOK_NAME, BOOK_AUTHOR, BOOK_PUBLISHER FROM BOOKS";
PreparedStatement pstmt = connection.prepareStatement(query);
ResultSet resultSet = pstmt.executeQuery();
while (resultSet.next()) {
Document document = new Document();
document.add(new Field("name", String.valueOf(resultSet.getString(2)), Field.Store.YES, Field.Index.ANALYZED));
document.add(new Field("author", String.valueOf(resultSet.getString(3)), Field.Store.YES, Field.Index.ANALYZED));
document.add(new Field("publisher", String.valueOf(resultSet.getString(4)), Field.Store.YES, Field.Index.ANALYZED));
indexWriter.addDocument(document);
}
} catch (SQLException e) {
e.printStackTrace();
}
}
/**
* Creates the index table.
*/
private void createIndexTable() {
if (this.jdbcDirectory == null) {
setJdbcDirectory(new MyJDBCDirectory(JDBCDatabaseUtil.getDataSource(), new MySQLDialect(), "LUCENE_INDEX_TABLE"));
}
try {
/**
* No need to manually create index table, create method will
* automatically create it.
*/
((JdbcDirectory) getJdbcDirectory()).create();
} catch (IOException e) {
e.printStackTrace();
}
}
}
[/sourcecode]
Last we have used getDataSource and getConnection methods while inserting records and building index, these two methods are utility methods part of our database util.
JDBCDatabaseUtil.java
[sourcecode language=”java”]
package com.mumz.test.lucene.jdbc;
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;
import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;
/**
* The Class JDBCDatabaseUtil.
* @author prabhat.jha
*/
public class JDBCDatabaseUtil {
/**
* Gets the data source.
*
* @return the data source
*/
public static DataSource getDataSource() {
MysqlDataSource dataSource = new MysqlDataSource();
dataSource.setUser("root");
dataSource.setPassword("root");
dataSource.setUrl("jdbc:mysql://localhost:3306/search_schema?emulateLocators=true&useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false");
return dataSource;
}
/**
* Gets the connection.
*
* @return the connection
* @throws SQLException
* the sQL exception
*/
public static Connection getConnection() throws SQLException {
return getDataSource().getConnection();
}
}
[/sourcecode]
Finally sql script to create our database schema which will be used in this tutorial.
[sourcecode language=”sql”]
SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE=’TRADITIONAL’;
DROP SCHEMA IF EXISTS `search_schema` ;
CREATE SCHEMA IF NOT EXISTS `search_schema` DEFAULT CHARACTER SET utf8 ;
USE `search_schema` ;
— —————————————————–
— Table `search_schema`.`books`
— —————————————————–
DROP TABLE IF EXISTS `search_schema`.`books` ;
CREATE TABLE IF NOT EXISTS `search_schema`.`books` (
`BOOK_ID` INT(11) NOT NULL AUTO_INCREMENT ,
`BOOK_NAME` VARCHAR(45) NOT NULL ,
`BOOK_AUTHOR` VARCHAR(45) NOT NULL ,
`BOOK_PUBLISHER` VARCHAR(45) NOT NULL ,
PRIMARY KEY (`BOOK_ID`) )
ENGINE = InnoDB
AUTO_INCREMENT = 82
DEFAULT CHARACTER SET = utf8;
SET SQL_MODE=@OLD_SQL_MODE;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;
[/sourcecode]
Leave a Reply