Chris Umbel

Employing Solr/Lucene with SQL Server for Full-Text Searching

Solr logo I've been fiddling with Lucene a good bit of late and have been quite impressed. It's more than just a "blazing fast" full-text indexing system, especially when implemented via Solr. With Solr it becomes an incredibly scalable, full-featured and extensible search engine platform.

I had always assumed that the Lucene stack wasn't for me. For the most part I store my data either in SQL Server or MySQL, both of which have perfectly adequate full-text search capability. It turns out that I could have saved myself a few headaches and saved my employer some money by adopting Solr and not writing my own faceting, caching, etc.

Lucene Logo

Naturally, Lucene/Solr isn't for everyone. If you just have a few hundred-thousand rows of text that you want to perform some basic searches on under light load then you're probably better off using the full-text search facility within your RDMS.

However, If you need to scale out widely, perform faceted searches or use some advanced/custom search techniques then it's probably worth looking into Solr, even if you're already deployed under an RDBMS with full-text support.

Apache logo

In this article I'll outline the *VERY* basics of getting Solr up and running using SQL Server as a data source. While I'm actually doing this in production under Linux I'm going to tailor my instructions to Windows here to appeal to the average SQL Server DBA. I'll also employ the AdventureWorks sample database for demonstrative purposes.

Note that you'll have to have TCP/IP enabled in your SQL Server instance. Named pipes, VIA and shared memory won't cut it.

Step 1: Download and install Java

Solr and Lucene are written in Java so a Java Runtime is a prerequisite. It can be downloaded here.

After installation make sure to set the JRE_HOME environment variable to your Java install directory i.e. C:\Program Files\Java\jre6

Step 2: Download Tomcat

Tomcat Logo Solr requires a servlet container. I recommend Tomcat which can be downloaded here. Then extract it to C:\tomcat6 (Note that I'm going to hang this all right off C:\ to keep the tutorial simple).

Step 3: Download Solr

This whole thing's about Solr, right? You can pick it up here. Extract the contents to a temporary location.

Step 4: Move Solr into Tomcat

Copy:

  • apache-solr-1.4.0\example\solr to c:\tomcat6
  • apache-solr-1.4.0\dist\apache-solr-1.4.0.war to c:\tomcat6\webapps\solr.war

Congratulations! Solr is essentially operational now, or would be upon starting tomcat. It'd just be devoid of data.

Step 5: Download and install a SQL Server JDBC driver

In order for Java to talk to SQL Server we'll have to supply a JDBC driver. There are many available but I used Microsoft's which can be downloaded here. Note that there's also a unix version available.

Now create a C:\tomcat6\solr\lib folder. Copy the file sqljdbc4.jar out of the archive downloaded above into it.

Step 6: Configure the import

Create a C:\tomcat6\solr\conf\data-config.xml file and put the following content in it, modifying it to the details of your configuration, naturally. This file defines what data we're going to import (SQL statement), how we're going to get it (definition of JDBC driver class) and where form (connection string and authentication information). The resultant columns are then mapped to fields in Lucene.

<dataConfig>
  <dataSource type="JdbcDataSource"
	    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
	    url="jdbc:sqlserver://localhost\INSTANCENAME;databaseName=AdventureWorks" 
	    user="TESTUSER"
	    password="TESTUSER"/>
  <document name="productreviews">
    <entity name="review" query="
        SELECT ProductReviewID, ProductID, EmailAddress, Comments
        FROM Production.ProductReview">
	    
      <field column="ProductReviewID" name="id"/>
      <field column="ProductID" name="product_id"/>
      <field column="EmailAddress" name="email"/>
      <field column="Comments" name="comments"/>
    </entity>
  </document>
</dataConfig>

Step 7: Tell Solr about our import

Add the following requesthandler to C:\tomcat6\solr\conf\solrconfig.xml:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">C:\tomcat6\solr\conf\data-config.xml</str>
  </lst>
</requestHandler>

This essentially allows Solr to perform operations of the data import we defined above upon a visit to the /dataimport URL.

Step 8: Configure schema

Ensure the fields are set up correctly in C:\tomcat6\solr\conf\schema.xml. There will be plenty of example fields, copy fields dynamic fields and a default search field in there to start with. Just get rid of them.

 <fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="comments" type="text" indexed="true" stored="true"/>
  <field name="email" type="string" indexed="true" stored="true"/>
  <field name="product_id" type="int" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
 </fields>

 <copyField source="comments" dest="text"/>
 <copyField source="email" dest="text"/>
  
 <defaultSearchField>text</defaultSearchField>

There's quite a bit of power that I won't go into in this article dealing with Solr schemas. Dynamic fields, copy fields, compression... Needless to say it's worth reading up on which you can do here.

Step 9: Start Tomcat

OK! We're finally configured well enough for an import. All we have to do is start up Tomcat. Make sure you're in Tomcat's directory as the quick-and-dirty configuration I showed you here requires it in order to find the Solr webapp.

c:\tomcat6>.\bin\startup.bat

If you'd like to move the Solr webapp elsewhere on the filesystem, remove the requirement for starting in Tomcat's directory or perform an advanced configuration please see the Solr with Apache Tomcat article in the Solr Wiki. Pay special attention to the section labeled, "Installing Solr instances under Tomcat" where they show you how to create contexts.

Step 10: Import

Now visit http://localhost:8080/solr/dataimport?command=full-import with a web browser. That'll trigger the import. Because we're just importing a small amount of test data the process will be nearly instantaneous.

Step 11: Observe your results

That's it! You can verify your work by issuing a query against Solr with a RESTful query like http://localhost:8080/solr/select/?q=heavy&version=2.2&start=0&rows=10&indent=on that searches the index for all reviews with the word heavy in the comments.

Pitfalls

There are a number of reasons a data import could fail, most likely due to problem with the configuration of data-config.xml. To see for sure what's going on you'll have to look in C:\tomcat6\solr\logs\catalina.*.

If you happen to find that your import is failing due to system running out of memory, however, there's an easy, SQL Server specific fix. Add responseBuffering=adaptive and selectMethod=cursor to the url attribute of the dataSource node in data-config.xml. That stops the JDBC driver from trying to load the entire result set into memory before reads can occur.

Next Steps

So we've gone from zero to a functioning Solr instance rather quickly there. Not too shabby! However, we've only queried Solr through REST. Libraries like solrnet are handy for wrapping objects around the data in .Net. For example:

/* review domain object */
public class Review
{
    /*  attribute decorations tell solrnet how to map
        the properties to Solr fields. */
    [SolrUniqueKey("id")]
    public string Id { get; set; }

    [SolrField("product_id")]
    public string ProductID { get; set; }

    [SolrField("email")]
    public string EmailAddress { get; set; }

    [SolrField("comments")]
    public string Text { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        /* create a session */
         Startup.Init<Review>("http://localhost:8080/solr");
        ISolrOperations<Review> solr =
                  ServiceLocator.Current.GetInstance<ISolrOperations<Review>>();
        /* issue a lucene query */
        ICollection<Review> results = solr.Query("comments:heavy");

        foreach (Review r in results)
        {
            Console.WriteLine(r.Id);
        }
    }
}

Resulting in:

2
4

If you're totally new to Solr it's worth checking out the wiki. It outlines the handy features such as replication, facets and distribution.

Sat Dec 05 2009 23:12:00 GMT+0000 (UTC)

8 Comments Comment Feed - Permalink
Thanks alot for the detailed step by step procedure. 

I'm facing a problem while using the data-config. while starting solr in tomcat it is saying that "content is not allowed in prolog". When I take off the dih, solr is up and running fine. I couldn't find any syntax mistakes in my data-config. Can there be any other reason for this error?
Please let me know.
by ram on Mon Mar 14 2011 20:45:48 GMT+0000 (UTC)
Thanks Chris. How did you get SolrNET to work with a console app. I am getting an error. a system.web dependency
by Brian Walters on Mon Mar 28 2011 17:27:46 GMT+0000 (UTC)
Brian,

Been a while since I've done any .Net work, but I suspect you need a reference to System.Web.dll
by Chris Umbel on Wed Mar 30 2011 12:45:26 GMT+0000 (UTC)
Nice tutorial Chris thanks a million! ran into a couple of small problems following it.

Error: Neither the JAVA_HOME nor the JRE_HOME environment variable is defined 
At least one of these environment variable is needed to run this program 

Solution: Edit startup.bat add path to your java dir eg:
JAVA_HOME=c:\Program Files\Java\jdk1.7.0

---

When doing import there if you get error 

Error: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423) at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:459)

Solution: apache-solr-dataimporthandler-3.4.0.jar needs to be copied from solr dist folder to \tomcat6\solr\lib\
by Proge on Mon Sep 26 2011 14:05:34 GMT+0000 (UTC)
Hey there, thanks for the feedback!

Yeah, when I wrote the post it was during Solr 1.4.x. A lot has changed since then. I'll update the post shortly based on your findings. The JAVA_HOME is important because windows users might not think to have that set.

Thanks again for dropping me a line!
by Chris Umbel on Mon Sep 26 2011 14:37:39 GMT+0000 (UTC)
Thank you,This tutorial saved my time ..Thanks once again.
by Mukthyar on Sat Nov 05 2011 12:35:24 GMT+0000 (UTC)
Thanks Chris!! This tutorial helped me in configuring Solr in a couple of minutes. Very helpful and we appreciate a lot.
by Mark on Tue Dec 20 2011 10:52:43 GMT+0000 (UTC)
Thanks for your tutorial. It was written very well and high level - perfect for beginners. 
by Shan on Sat Jun 02 2012 12:33:15 GMT+0000 (UTC)
Add a comment
Name
E mail (Private)
URL
Follow Chris
RSS Feed
Twitter
Facebook
CodePlex
github
LinkedIn
Google