In this post I want to cover Azure Search from a different perspective and share some details that might help people getting started in the interesting world of search. The main focus will be on the underlying technology, to help you better understand and build search solutions.
The information below is most likely of interest to anyone using SQL Server's full-text search indexes and looking for alternatives (if you are using SQL Azure, full-text search indexes aren't even an option). However, the main reason for writing this post is to get my DBA friends up to speed.
I won't be covering everything in great detail, but will instead provide some helpful links to interesting articles and courses.
Azure Search – SQL Server's full-text limitations
When it comes to search solutions within the Microsoft development eco-system, people have traditionally defaulted to full-text search indexes. I'm not saying this is wrong; however, there have always been many shortcomings in terms of features and manageability. Some issues have been addressed in recent releases, but you will notice more and more shortcomings after playing with a service like Azure Search.
Opening the Azure Search Matryoshka
If you really want to dig into the search domain, it's important to understand that Azure Search isn't just Microsoft reinventing the wheel, as we have often seen before. Over the last few years Microsoft has drastically changed its attitude towards adopting open source solutions, especially in the web and cloud space. You will recognize this in the architecture of Azure Search as well.
To keep things simple: Azure Search is built on top of a different product known as Elasticsearch. Elasticsearch (http://www.elasticsearch.org/) is a NoSQL-based full-text search engine responsible for the server-side plumbing (scalability) and for providing a common HTTP interface, which lets you interact with the indexing core known as Lucene (http://lucene.apache.org). Lucene is an indexing engine responsible for fast indexing and search. It's important to know that Lucene is a library rather than an end-to-end solution, so it can't really be used on its own: you need a container to host it, such as Elasticsearch.
Elasticsearch isn't the only option out there. Other products offer similar capabilities, Solr for example. Like Elasticsearch, Solr uses Lucene under the covers.
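At the heart of Lucene sits an inverted index: a map from each term to the documents that contain it, which is what makes lookups fast. The Python sketch below is a toy illustration of that idea only, not Lucene's implementation. The tokenizer, the sample documents and the function names are made up for this example, and real concerns such as analyzers, relevance scoring and on-disk segments are left out.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters; a crude stand-in for a
    # real Lucene analyzer (no stemming, no stop-word removal).
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document ids containing it (its postings).
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    # AND semantics: intersect the postings of every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Azure Search is a hosted search service",
    2: "Elasticsearch and Solr both use Lucene",
    3: "Lucene is a search library",
}
index = build_index(docs)
print(sorted(search_all(index, "search")))         # [1, 3]
print(sorted(search_all(index, "Lucene search")))  # [3]
```

Note that a query never scans the documents themselves; it only looks up and intersects small sets of ids, which is the basic reason an inverted index scales so well.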
Interesting read: http://solr-vs-elasticsearch.com/
You might ask yourself: why do I need to know all of this?
Well, you don't, if you just want to use Microsoft's abstraction and are happy to host your data in Azure. However, if you want to design and maintain search solutions, you need to know what's happening under the covers. Also keep in mind that Azure Search doesn't expose all the options available in Elasticsearch, and that the Azure Search team adds its own icing on the cake.
If you want to get started in the world of search, I highly recommend reading up on the basics of the topics listed below.
NOTE: this list was written with SQL Server DBAs and developers in mind.
REST (Representational State Transfer)
Interacting with Azure Search, Elasticsearch or Solr is done through REST-based API calls. Tools, API abstractions and management GUIs are available, but not being familiar with REST will slow you down or limit your options.
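To make this concrete, here is a minimal Python sketch that builds (but does not send) the same keyword query for all three engines using only the standard library. The service name, index and core names, key and API version are placeholders I've assumed; the endpoint shapes follow each product's public REST conventions, but check them against the documentation for the version you run.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholders: replace with your own service, index, core and key.
AZURE_SERVICE = "my-service"
AZURE_INDEX = "articles"
AZURE_API_KEY = "<query-key>"
AZURE_API_VERSION = "2020-06-30"  # assumption: use a version your service supports


def azure_search_request(text: str) -> Request:
    # Azure Search: GET /indexes/{index}/docs, query in the query string,
    # key in the api-key header.
    params = urlencode({"api-version": AZURE_API_VERSION, "search": text})
    url = f"https://{AZURE_SERVICE}.search.windows.net/indexes/{AZURE_INDEX}/docs?{params}"
    return Request(url, headers={"api-key": AZURE_API_KEY}, method="GET")


def elasticsearch_request(text: str, index: str = "articles") -> Request:
    # Elasticsearch: POST a JSON query DSL body to /{index}/_search.
    body = json.dumps({"query": {"match": {"title": text}}}).encode("utf-8")
    return Request(
        f"http://localhost:9200/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def solr_request(text: str, core: str = "articles") -> Request:
    # Solr: GET /solr/{core}/select with a Lucene-syntax query in q.
    params = urlencode({"q": f"title:{text}", "wt": "json"})
    return Request(f"http://localhost:8983/solr/{core}/select?{params}", method="GET")


for req in (azure_search_request("lucene"), elasticsearch_request("lucene"), solr_request("lucene")):
    print(req.get_method(), req.full_url)
```

Passing any of these `Request` objects to `urllib.request.urlopen` would actually send it. The point is that all three engines speak plain HTTP, so the same REST skills carry over.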
NoSQL and document databases
Some knowledge of document-based databases will help you understand several of these concepts. You don't have to be an expert, but you can't ignore NoSQL databases any longer. Being a relational-only DBA is a thing of the past!
If you are a Channel 9 fan like me, it shouldn't be hard to find some interesting series on the topic.
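For a relational DBA, the biggest mental shift is denormalization: instead of joining tables at query time, each search document carries everything it needs. The minimal Python sketch below folds a one-to-many relationship into a list field. The product and tag rows are made up for illustration; the id is turned into a string because Azure Search requires the key field to be a string.

```python
import json

# Hypothetical relational rows, as they might come out of two SQL tables.
products = [
    {"product_id": 1, "name": "Road bike"},
    {"product_id": 2, "name": "Helmet"},
]
tags = [
    {"product_id": 1, "tag": "cycling"},
    {"product_id": 1, "tag": "outdoor"},
    {"product_id": 2, "tag": "safety"},
]


def to_documents(products, tags):
    # Group the one-to-many tag rows by product first.
    tags_by_product = {}
    for row in tags:
        tags_by_product.setdefault(row["product_id"], []).append(row["tag"])
    # Emit one self-contained document per product, with its tags embedded,
    # so a search index can match on tags without a join at query time.
    return [
        {
            "id": str(p["product_id"]),
            "name": p["name"],
            "tags": tags_by_product.get(p["product_id"], []),
        }
        for p in products
    ]


docs = to_documents(products, tags)
print(json.dumps(docs, indent=2))
```

The trade-off is the usual one for document stores: reads become simple and fast, while an update to a tag means re-sending every document that embeds it.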
Elasticsearch (and Apache Solr)
The more you know about Elasticsearch, the better you'll be at understanding and working with Azure Search. It lets you identify the areas where Elasticsearch has been extended and where Azure Search doesn't expose functionality. It also enables you to choose a different Elasticsearch provider, or to host the solution on premises if required.
I would also recommend reading up on Solr. I’ve been working with Solr for quite some time and really like the community and the resources available on the web.
There are excellent introductory courses available on pluralsight.com:
- Getting Started with Enterprise Search Using Apache Solr by Xavier Morera
- Getting Started With Elasticsearch for .NET Developers by JP Toto
Elasticsearch and Solr eco-system and community
This is a very important part, for a couple of reasons. First of all, this is where you will find answers to most of your basic questions, as well as best practices and common mistakes.
Even more valuable, however, are the domain-specific solutions such as importers, parsers, crawlers and other extensions that add new functionality to Elasticsearch and/or Solr. Keep in mind that you most likely can't extend a hosted search solution such as Azure Search, but in some cases the same logic can be added as a pre-processing or post-processing step when feeding the index or submitting a search request.
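As an example of such a pre-processing step, the Python sketch below cleans HTML before it is fed to the index and wraps the result in the `{"value": [...]}` batch body with `@search.action` entries that the Azure Search REST API accepts for document uploads. The field names (`id`, `content`), the regex-based cleanup and the sample record are assumptions for illustration, and actually posting the batch to `/indexes/{index}/docs/index` is left out.

```python
import html
import json
import re


def clean_text(raw: str) -> str:
    # Crude pre-processing: drop HTML tags, decode entities such as &nbsp;,
    # then collapse all whitespace. A real pipeline would use a proper
    # HTML parser or a document parser such as Tika.
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def build_upload_batch(records):
    # Wrap cleaned records in an Azure Search indexing batch; "upload"
    # inserts a document or replaces an existing one with the same key.
    return {
        "value": [
            {"@search.action": "upload", "id": str(r["id"]), "content": clean_text(r["html"])}
            for r in records
        ]
    }


records = [{"id": 7, "html": "<p>Full text&nbsp;search <b>in Azure</b></p>"}]
batch = build_upload_batch(records)
print(json.dumps(batch))
```

Because the cleanup runs in your own code before the upload, it works the same whether the target is Azure Search or a self-hosted Elasticsearch or Solr instance; only the batch format would change.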
I've recently blogged about how you could host a document parser such as Tika in a separate process (more details here). The same could be done for other tools, such as Apache Nutch.
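One way to run Tika as a separate process (not necessarily the setup described in the post mentioned above) is tika-server, which extracts text from a document sent with an HTTP PUT to `/tika`; an `Accept: text/plain` header asks for plain text instead of XHTML. The Python sketch assumes tika-server is listening on its default port 9998 on localhost. `tika_extract_request` only builds the request, while `extract_text` actually sends it and therefore needs a running server.

```python
from urllib.request import Request, urlopen

# Assumption: tika-server running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def tika_extract_request(file_bytes: bytes) -> Request:
    # PUT the raw document bytes; ask for plain-text output.
    return Request(TIKA_URL, data=file_bytes, headers={"Accept": "text/plain"}, method="PUT")


def extract_text(file_bytes: bytes, timeout: float = 30.0) -> str:
    # Sends the request and returns the extracted text (requires tika-server).
    with urlopen(tika_extract_request(file_bytes), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage, with a hypothetical file: `extract_text(Path("report.pdf").read_bytes())` (after `from pathlib import Path`). The extracted text can then go through a pre-processing step like the one above before it is uploaded to the index.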
Now we know a bit more about what's under the covers of Azure Search. There is a lot more, of course, but learning more about Elasticsearch will give you a deeper understanding of its limitations and upcoming features, and may even lead you to choose a 100% Elasticsearch or Solr solution instead.