Chris Umbel

Employing Solr/Lucene with SQL Server for Full-Text Searching

I've been fiddling with Lucene a good bit of late and have been quite impressed. It's more than just a "blazing fast" full-text indexing system, especially when implemented via Solr. With Solr it becomes an incredibly scalable, full-featured and extensible search engine platform.

I had always assumed that the Lucene stack wasn't for me. For the most part I store my data either in SQL Server or MySQL, both of which have perfectly adequate full-text search capability. It turns out that I could have saved myself a few headaches and saved my employer some money by adopting Solr and not writing my own faceting, caching, etc.


Naturally, Lucene/Solr isn't for everyone. If you just have a few hundred thousand rows of text that you want to perform some basic searches on under light load, then you're probably better off using the full-text search facility within your RDBMS.

However, if you need to scale out widely, perform faceted searches or use some advanced/custom search techniques, then it's probably worth looking into Solr, even if you're already deployed on an RDBMS with full-text support.


In this article I'll outline the *VERY* basics of getting Solr up and running using SQL Server as a data source. While I'm actually doing this in production under Linux, I'm going to tailor my instructions to Windows here to appeal to the average SQL Server DBA. I'll also employ the AdventureWorks sample database for demonstrative purposes.

Note that you'll have to have TCP/IP enabled in your SQL Server instance. Named pipes, VIA and shared memory won't cut it.

Step 1: Download and install Java

Solr and Lucene are written in Java, so a Java Runtime Environment (JRE) is a prerequisite. It can be downloaded here.

After installation make sure to set the JRE_HOME environment variable to your Java install directory, e.g. C:\Program Files\Java\jre6.

Step 2: Download Tomcat

Solr requires a servlet container. I recommend Tomcat, which can be downloaded here. Then extract it to C:\tomcat6 (note that I'm going to hang this all right off C:\ to keep the tutorial simple).

Step 3: Download Solr

This whole thing's about Solr, right? You can pick it up here. Extract the contents to a temporary location.

Step 4: Move Solr into Tomcat

Copy the following from the extracted archive:
  • apache-solr-1.4.0\example\solr to c:\tomcat6\solr
  • apache-solr-1.4.0\dist\apache-solr-1.4.0.war to c:\tomcat6\webapps\solr.war

Congratulations! Solr is essentially operational now, or would be upon starting Tomcat. It'd just be devoid of data.

Step 5: Download and install a SQL Server JDBC driver

In order for Java to talk to SQL Server we'll have to supply a JDBC driver. There are many available, but I used Microsoft's, which can be downloaded here. Note that there's also a Unix version available.

Now create a C:\tomcat6\solr\lib folder. Copy the file sqljdbc4.jar out of the archive downloaded above into it.

Step 6: Configure the import

Create a C:\tomcat6\solr\conf\data-config.xml file and put the following content in it, modifying it to the details of your configuration, naturally. This file defines what data we're going to import (the SQL statement), how we're going to get it (the JDBC driver class) and where from (connection string and authentication information). The resultant columns are then mapped to fields in Lucene.

  <dataConfig>
    <dataSource type="JdbcDataSource"
        driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
        url="jdbc:sqlserver://localhost;databaseName=AdventureWorks"
        user="yourUser" password="yourPassword"/>
    <document name="productreviews">
      <entity name="review" query="
          SELECT ProductReviewID, ProductID, EmailAddress, Comments
          FROM Production.ProductReview">
        <field column="ProductReviewID" name="id"/>
        <field column="ProductID" name="product_id"/>
        <field column="EmailAddress" name="email"/>
        <field column="Comments" name="comments"/>
      </entity>
    </document>
  </dataConfig>

Step 7: Tell Solr about our import

Add the following requestHandler to C:\tomcat6\solr\conf\solrconfig.xml:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">C:\tomcat6\solr\conf\data-config.xml</str>
  </lst>
</requestHandler>

This essentially allows Solr to perform the data import we defined above whenever the /dataimport URL is visited.

Step 8: Configure schema

Ensure the fields are set up correctly in C:\tomcat6\solr\conf\schema.xml. There will be plenty of example fields, copy fields, dynamic fields and a default search field in there to start with. Just get rid of them and use the following instead:

  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="comments" type="text" indexed="true" stored="true"/>
  <field name="email" type="string" indexed="true" stored="true"/>
  <field name="product_id" type="int" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

  <copyField source="comments" dest="text"/>
  <copyField source="email" dest="text"/>
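The stock schema also declares a unique key and a default search field. Since we stripped out the examples, make sure those declarations still point at fields that actually exist. A minimal sketch matching the fields above:

```xml
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
```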

There's quite a bit of power in Solr schemas that I won't go into in this article: dynamic fields, copy fields, compression and more. Needless to say it's worth reading up on, which you can do here.

Step 9: Start Tomcat

OK! We're finally configured well enough for an import. All we have to do is start up Tomcat (bin\startup.bat). Make sure you're in Tomcat's directory when you do, as the quick-and-dirty configuration I showed you here requires it in order to find the Solr webapp.


If you'd like to move the Solr webapp elsewhere on the filesystem, remove the requirement for starting in Tomcat's directory or perform an advanced configuration, please see the Solr with Apache Tomcat article in the Solr Wiki. Pay special attention to the section labeled "Installing Solr instances under Tomcat" where they show you how to create contexts.
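For reference, a context fragment of the sort that article describes might look like the following (a sketch using this tutorial's paths), saved as C:\tomcat6\conf\Catalina\localhost\solr.xml:

```xml
<Context docBase="C:\tomcat6\webapps\solr.war" debug="0" crossContext="true">
  <!-- points the webapp at the Solr home directory we set up earlier -->
  <Environment name="solr/home" type="java.lang.String"
               value="C:\tomcat6\solr" override="true"/>
</Context>
```

With a context in place Tomcat can be started from any directory.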

Step 10: Import

Now visit http://localhost:8080/solr/dataimport?command=full-import with a web browser. That'll trigger the import. Because we're just importing a small amount of test data the process will be nearly instantaneous.

Step 11: Observe your results

That's it! You can verify your work by issuing a RESTful query against Solr like http://localhost:8080/solr/select/?q=heavy&version=2.2&start=0&rows=10&indent=on, which searches the index for all reviews containing the word heavy.
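Once queries contain spaces or Lucene's special characters (like the ':' in a fielded query), the q parameter has to be URL-encoded before it goes on the wire. A quick Java sketch of building such a select URL; the class name and method here are my own illustration, not part of Solr's API:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrSelectUrl {
    /* Builds a /solr/select URL, URL-encoding the query string so
       characters like ':' and spaces survive the trip. */
    public static String build(String host, String q, int start, int rows)
            throws UnsupportedEncodingException {
        return host + "/solr/select/?q=" + URLEncoder.encode(q, "UTF-8")
                + "&version=2.2&start=" + start + "&rows=" + rows + "&indent=on";
    }

    public static void main(String[] args) throws Exception {
        // the fielded query used later in this article
        System.out.println(build("http://localhost:8080", "comments:heavy", 0, 10));
    }
}
```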


There are a number of reasons a data import could fail, most likely due to a problem with the configuration of data-config.xml. To see for sure what's going on you'll have to look in C:\tomcat6\solr\logs\catalina.*.

If you happen to find that your import is failing because the system is running out of memory, however, there's an easy, SQL Server-specific fix. Add responseBuffering=adaptive and selectMethod=cursor to the url attribute of the dataSource node in data-config.xml. That stops the JDBC driver from trying to load the entire result set into memory before reads can occur.
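In other words, the dataSource node would end up looking something like this (the server and database names here are just the ones from this tutorial's example; substitute your own):

```xml
<dataSource type="JdbcDataSource"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost;databaseName=AdventureWorks;responseBuffering=adaptive;selectMethod=cursor"
    user="yourUser" password="yourPassword"/>
```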

Next Steps

So we've gone from zero to a functioning Solr instance rather quickly. Not too shabby! However, we've only queried Solr through REST. Libraries like solrnet are handy for wrapping objects around the data in .NET. For example:

using System;
using System.Collections.Generic;
using Microsoft.Practices.ServiceLocation;
using SolrNet;
using SolrNet.Attributes;

/* review domain object */
public class Review
{
    /*  attribute decorations tell solrnet how to map
        the properties to Solr fields. */
    [SolrUniqueKey("id")]
    public string Id { get; set; }

    [SolrField("product_id")]
    public string ProductID { get; set; }

    [SolrField("email")]
    public string EmailAddress { get; set; }

    [SolrField("comments")]
    public string Text { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        /* create a session */
        Startup.Init<Review>("http://localhost:8080/solr");
        ISolrOperations<Review> solr =
            ServiceLocator.Current.GetInstance<ISolrOperations<Review>>();

        /* issue a lucene query */
        ICollection<Review> results = solr.Query("comments:heavy");

        foreach (Review r in results)
            Console.WriteLine("{0}: {1}", r.EmailAddress, r.Text);
    }
}

Running that prints each matching review's e-mail address and comments to the console.


If you're totally new to Solr it's worth checking out the wiki. It outlines handy features such as replication, faceting and distribution.

Sat Dec 05 2009 23:12:00 GMT+0000 (UTC)
