In this post, I want to cover Azure Search from a different perspective and share some details that might help people getting started in the exciting world of search. The main focus will be on the underlying technology, to help you better understand and build search solutions.
The information below is most likely of interest to everyone using SQL Server's full-text search indexes and looking for alternatives (if you are using SQL Azure, full-text search indexes aren't even an option). However, the main reason for writing this post is to get my DBA friends up to speed.
I won't be covering everything in great detail, but will instead provide some helpful links to interesting articles or courses.
Azure Search – SQL Server's full-text limitations
When it comes to search solutions within the Microsoft development eco-system, people have always blindly selected full-text search indexes. I'm not saying that this is wrong; however, there have always been many shortcomings when it comes to features and manageability. Some issues have been addressed in the most recent releases, but you will still notice more and more gaps.
Opening the Azure Search Matryoshka
If you want to dig into the search domain, it's necessary to understand that Azure Search isn't just Microsoft reinventing the wheel, as we have often seen before. Microsoft has changed drastically over the last few years when it comes to adopting open source solutions, especially within the web and cloud space. This is also something you will recognize within the architecture of Azure Search.
To keep things simple: Azure Search is built on top of a different product known as Elasticsearch. Elasticsearch (http://www.elasticsearch.org/) is a NoSQL-based full-text search engine that takes care of the server-side plumbing (scalability) and provides a standard HTTP interface for interacting with the indexing core, known as Lucene (http://lucene.apache.org). Lucene is the indexing engine responsible for rapid indexing and search. It's important to remember that Lucene cannot just be used on its own, because it's not an end-to-end solution. You need a container for hosting Lucene, such as Elasticsearch.
Elasticsearch isn't the only option out there. Other solutions with similar options are available, Solr for example. Solr also uses Lucene under the covers, just like Elasticsearch.
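To give a feel for what an indexing engine does at its core, here is a toy inverted index in Python. It is only a sketch of the general idea (map every term to the documents that contain it), not how Lucene is actually implemented; the function names and sample documents are made up for illustration.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample documents, keyed by document id.
docs = {
    1: "Azure Search is a hosted search service",
    2: "Lucene is an indexing engine",
    3: "Solr and Elasticsearch both build on Lucene",
}
index = build_inverted_index(docs)
print(sorted(search(index, "lucene")))          # [2, 3]
print(sorted(search(index, "search service")))  # [1]
```

A real engine adds analysis (stemming, stop words), relevance scoring, and on-disk index segments on top of this, which is where most of the complexity lives.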
Interesting read: http://solr-vs-elasticsearch.com/
You might ask yourself, why do I need to know all of this?
Well, you don't, if you just want to use Microsoft's abstraction and are able to host your data within Azure. If you're going to design and maintain search solutions, you need to know what's happening under the covers. Also, keep in mind that Azure Search isn't exposing all options available within Elasticsearch, and besides, the Azure Search team is putting its own icing on the cake.
If you want to get started in the world of search, I highly recommend reading up on the basics of the items listed below:
NOTE: this was written with SQL Server DBAs / developers in mind.
REST (Representational state transfer)
Interacting with Azure Search, Elasticsearch, or Solr is performed via REST-based API calls. There are tools, different API abstractions, and management GUIs available; however, not being familiar with REST will slow you down or limit your options.
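As a rough illustration of what such a REST call looks like, the Python sketch below builds a simple match query against Elasticsearch's _search endpoint. The host, port, index name ("articles"), and field name ("title") are assumptions for the example; Azure Search uses its own URL format and authentication, so treat this purely as a demonstration of the request/response style. The request is only constructed and printed, not sent.

```python
import json
import urllib.request

# Hypothetical local Elasticsearch endpoint and index name, for illustration only.
BASE_URL = "http://localhost:9200"
INDEX = "articles"


def build_search_request(base_url, index, field, text):
    """Build (but do not send) an HTTP request for a simple match query."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(BASE_URL, INDEX, "title", "azure search")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending it requires a running server:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```

Everything is a URL, an HTTP verb, and a JSON body, which is why any language or tool that can speak HTTP can talk to the search service.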
Knowing document-based databases will help you understand some of the concepts better. You don't have to be an expert, but on the other hand, you can't ignore NoSQL databases any longer. Being a relational-only DBA is a thing of the past!
If you are a Channel 9 fan like I am, it shouldn't be tough to find some exciting series.
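For readers coming from the relational side, the document model mentioned above can be sketched as follows: rows from a parent and a child table are folded into one self-contained JSON document, which is the shape a search index typically expects. The table and column names here are invented for the example.

```python
import json

# Hypothetical rows as they might come back from two relational tables.
posts = [{"post_id": 1, "title": "Azure Search under the covers"}]
tags = [
    {"post_id": 1, "tag": "search"},
    {"post_id": 1, "tag": "azure"},
]


def to_document(post, tag_rows):
    """Fold a parent row and its child rows into one self-contained document."""
    return {
        "id": str(post["post_id"]),
        "title": post["title"],
        "tags": [row["tag"] for row in tag_rows if row["post_id"] == post["post_id"]],
    }


documents = [to_document(post, tags) for post in posts]
print(json.dumps(documents[0], indent=2))
```

The join happens once, at indexing time, instead of on every query.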
Elasticsearch (and Apache Solr)
It's evident that the more you know about Elasticsearch, the better you'll be at understanding and working with Azure Search. This will allow you to identify the areas where Elasticsearch has been extended, or where Azure Search isn't exposing functionality. It also enables you to choose a different Elasticsearch provider or host the solution on-premises if required.
I would also recommend reading up on Solr. I’ve been working with Solr for quite some time and really like the community and the resources available on the web.
There are excellent introductory courses available on pluralsight.com:
- Getting Started with Enterprise Search Using Apache Solr by Xavier Morera
- Getting Started With Elasticsearch for .NET Developers by JP Toto
Elasticsearch and Solr eco-system and community
This is a very important part for a couple of reasons. First of all, you will be able to find an answer to most of your rudimentary questions. This is also the place where you will find the best practices and common mistakes.
However, what's even more valuable are the domain-specific solutions like importers, parsers, crawlers, and other extensions, allowing you to add new functionality within Elasticsearch and/or Solr. You do have to keep in mind that it's most likely not possible to extend hosted search solutions such as Azure Search. Still, in some cases, this logic can be added as a pre-processing or post-processing step while feeding the index or submitting a search request.
I've recently blogged about how you could host a document parser like Tika within a separate process (more details here). The same could be done for other tools, such as Apache Nutch.
Now we know a bit more about what's under the covers of Azure Search. There is a lot more, of course, but learning more about Elasticsearch will give you a deeper understanding of Azure Search's limitations and upcoming features, and even allow you to opt for a 100% Elasticsearch or Solr solution instead.
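The pre-processing idea described above can be sketched in a few lines of Python. This example strips HTML markup from a source document before it is fed to the index, using only the standard library; the function names and the "id"/"content" field names are assumptions for illustration, and a real pipeline would more likely delegate extraction to a dedicated parser such as Tika.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment and ignore the markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def preprocess(raw_html):
    """Pre-processing step: reduce HTML to whitespace-normalized plain text."""
    extractor = TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    return " ".join(" ".join(extractor.parts).split())


def to_index_document(doc_id, raw_html):
    """Build the document that would be sent to the search service."""
    return {"id": doc_id, "content": preprocess(raw_html)}


doc = to_index_document("42", "<h1>Azure Search</h1><p>Built on   <b>Lucene</b></p>")
print(doc)  # {'id': '42', 'content': 'Azure Search Built on Lucene'}
```

Because this step runs in your own process before the document is submitted, it works the same way whether the index lives in Azure Search, Elasticsearch, or Solr.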