java – Using Solr search index as a database – is this wrong?


Yes, you can use Solr as a database, but there are some really serious caveats:

  1. Solr's most common access pattern, which is over HTTP, doesn't respond particularly well to batch querying. Furthermore, Solr does NOT stream data, so you can't lazily iterate through millions of records at a time. This means you have to be very thoughtful when you design large-scale data access patterns with Solr.

  2. Although Solr performance scales horizontally (more machines, more cores, etc.) as well as vertically (more RAM, better machines, etc.), its querying capabilities are severely limited compared to those of a mature RDBMS. That said, there are some excellent functions, like the field stats queries, which are quite convenient.

  3. Developers who are used to relational databases will often run into problems when they apply the same DAO design patterns to Solr, because of the way Solr uses filters in queries. There will be a learning curve in finding the right approach to building an application that uses Solr for part of its large queries or stateful modifications.

  4. The enterprisey tools for advanced session management and stateful entities that many advanced web frameworks (Ruby on Rails, Hibernate, …) offer will have to be thrown completely out the window.

  5. Relational databases are meant to deal with complex data and relationships, and they are thus accompanied by state-of-the-art metrics and automated analysis tools. In Solr, I've found myself writing such tools and manually stress-testing a lot, which can be a time sink.

  6. Joining: this is the big killer. Relational databases support methods for building and optimizing views and queries that join tuples based on simple predicates. In Solr, there aren't any robust methods for joining data across indices.

  7. Resiliency: For high availability, SolrCloud can use a distributed file system underneath (e.g. HCFS). This model is quite different from that of a relational database, which usually achieves resiliency using masters and slaves, or RAID, and so on. So you have to be ready to provide the resiliency infrastructure Solr requires if you want it to be cloud-scalable and resilient.
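
To make caveats 1 to 3 a bit more concrete, here is a minimal Java sketch of the request URLs involved: filter queries (`fq`), the stats component (`stats=true`, `stats.field`), and cursor-based paging (`cursorMark`, available in Solr 4.7 and later), which is the usual workaround for walking a large result set one page at a time. The core URL and the field names (`category`, `inStock`, `price`, `id`) are assumptions made up for illustration, not part of any real schema.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Builds Solr /select URLs. The base URL and all field names are hypothetical. */
class SolrSelectUrls {

    // Hypothetical core location; adjust to your own deployment.
    static final String BASE = "http://localhost:8983/solr/products/select";

    /** URL-encodes a single parameter value as UTF-8. */
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Main query plus one fq parameter per filter; Solr caches each fq separately. */
    static String filtered(String q, List<String> filters) {
        StringBuilder url = new StringBuilder(BASE).append("?q=").append(enc(q));
        for (String fq : filters) {
            url.append("&fq=").append(enc(fq));
        }
        return url.toString();
    }

    /** Asks the stats component for min/max/sum/mean etc. of one field, with no documents returned. */
    static String stats(String q, String field) {
        return BASE + "?q=" + enc(q) + "&rows=0&stats=true&stats.field=" + enc(field);
    }

    /**
     * One page of a cursor-based scan. The sort must include the uniqueKey field
     * (assumed here to be "id"), and the very first request passes "*" as the cursor.
     */
    static String cursorPage(String q, String sort, int rows, String cursorMark) {
        return BASE + "?q=" + enc(q)
                + "&sort=" + enc(sort)
                + "&rows=" + rows
                + "&cursorMark=" + enc(cursorMark);
    }

    public static void main(String[] args) {
        System.out.println(filtered("*:*", List.of("category:books", "inStock:true")));
        System.out.println(stats("*:*", "price"));
        System.out.println(cursorPage("*:*", "id asc", 500, "*"));
    }
}
```

For the cursor scan, each response carries a `nextCursorMark` value; you send it back as `cursorMark` on the next request and stop once the value returned equals the one you sent. It is still one HTTP round trip per page rather than a true stream, so the first caveat is softened, not removed.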

That said, there are plenty of obvious advantages to Solr for certain tasks (see http://wiki.apache.org/solr/WhyUseSolr): loose queries are much easier to run and return meaningful results. Indexing is done by default, so most arbitrary queries run pretty efficiently (unlike an RDBMS, where you often have to optimize and denormalize after the fact).

Conclusion: Even though you CAN use Solr as an RDBMS, you may find (as I have) that there is ultimately no free lunch: the benefits of super-cool Lucene text searches and high-performance, in-memory indexing are often paid for with less flexibility and the adoption of new data access workflows.

It's perfectly reasonable to use Solr as a database, depending on your application. In fact, that's pretty much what guardian.co.uk is doing.

It's definitely not bad practice per se. It's only bad if you use it the wrong way, just like any other tool at any level, even GOTOs.

When you say "An XML representation…", I assume you're talking about having multiple stored Solr fields and retrieving them using Solr's XML format, and not just one big XML-content field (which would be a terrible use of Solr). The fact that Solr uses XML as its default response format is largely irrelevant; you can also use a binary protocol, so it's quite comparable to traditional relational databases in that regard.
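
As a rough illustration of "multiple stored fields retrieved through the XML format", here is a small Java sketch that turns a Solr XML response into one map per document, using only the JDK's DOM parser. The sample field names (`id`, `title`, `price`) are invented, and multi-valued fields (`<arr>` elements) are deliberately not handled; this is a sketch of the idea, not a replacement for a proper client library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Reads the stored fields of each <doc> in a Solr XML response. */
class SolrXmlDocs {

    /** Returns one field-name -> text-value map per <doc> element. */
    static List<Map<String, String>> parse(String xml) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML response", e);
        }
        List<Map<String, String>> docs = new ArrayList<>();
        NodeList docNodes = dom.getElementsByTagName("doc");
        for (int i = 0; i < docNodes.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = docNodes.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace text nodes; each element is one stored field,
                // e.g. <str name="title">...</str> or <float name="price">...</float>.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    Element field = (Element) child;
                    fields.put(field.getAttribute("name"), field.getTextContent());
                }
            }
            docs.add(fields);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Hand-written sample in the shape of a Solr XML response.
        String sample =
                "<response>"
              + "<result name=\"response\" numFound=\"1\" start=\"0\">"
              + "<doc>"
              + "<str name=\"id\">42</str>"
              + "<str name=\"title\">Solr as a datastore</str>"
              + "<float name=\"price\">9.5</float>"
              + "</doc>"
              + "</result>"
              + "</response>";
        System.out.println(parse(sample));
    }
}
```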

Ultimately, it's up to your application's needs. Solr is primarily a text search engine, but it can also act as a NoSQL database for many applications.


This was probably done for performance reasons; if it doesn't cause any problems, I would leave it alone. There is a big grey area between what should be in a traditional database and what should be in a Solr index. I've seen people do similar things (usually key-value pairs or JSON instead of XML) for UI presentation, and only get the real object from the database if it is needed for updates or deletes. But all reads just go to Solr.
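
The split described above (reads served from the index, writes going to the database first) can be sketched in plain Java roughly as follows. Both stores are in-memory maps standing in for the real database and the Solr index, and the class and method names are hypothetical; the point is only the direction of the data flow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Reads come from the index view; writes hit the database, then refresh the index. */
class ProductStore {

    // Stand-in for the relational database: id -> authoritative name.
    private final Map<String, String> database = new HashMap<>();
    // Stand-in for the Solr index: id -> denormalized JSON used by the UI.
    private final Map<String, String> index = new HashMap<>();

    /** Read path: served entirely from the index, never touching the database. */
    Optional<String> read(String id) {
        return Optional.ofNullable(index.get(id));
    }

    /** Write path: update the authoritative record, then re-index its view. */
    void save(String id, String name) {
        database.put(id, name);
        index.put(id, toJson(id, name));
    }

    /** Delete path: remove from both stores so the index does not serve stale data. */
    void delete(String id) {
        database.remove(id);
        index.remove(id);
    }

    /** Naive JSON rendering; no escaping, which is fine only for this sketch. */
    static String toJson(String id, String name) {
        return "{\"id\":\"" + id + "\",\"name\":\"" + name + "\"}";
    }

    public static void main(String[] args) {
        ProductStore store = new ProductStore();
        store.save("1", "Widget");
        System.out.println(store.read("1").orElse("(missing)"));
        store.delete("1");
        System.out.println(store.read("1").isPresent());
    }
}
```

In a real system the two writes are not atomic, so the index can lag behind or drift from the database; that is the price of serving all reads from Solr.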
