Azure Search Indexers – Indexing collections

Azure Search Indexers

I previously wrote a blog post called Index data without writing code and recently received some interesting comments / questions in regards to using the Collection(Edm.String) collection datatype when having a SQL Server based data source. And I therefore decided to compile the information within a separate blog post. (Special thanks to the Azure search team for answering some of my queries!)

Populating data into a collection datatype isn’t really hard to realize however, the structure of the Azure Search documentation is somewhat odd from time to time. Therefore it was a bit of a struggle to get all pieces together. This is also why I would like to briefly cover some of the index and indexer characteristics.

Azure Search Indexers – Limitations

When it comes to Azure Search Indexers, you will have to keep in mind that the indexers are there to simplify the implementation process. The drawback is that you will lose some control on how and where you can place code to extend the solution (clean and massage data), as well as control over how to update the index.

Azure Search Indexers – Collection(Edm.String) datatype and restrictions.

Before we start working on a solution, it’s important to briefly look at the characteristics of the Collection(Edm.String) datatype. Unfortunately opening the “Supported data types (Azure Search)” https://msdn.microsoft.com/en-us/library/azure/dn798938.aspx – only contains the following statement:

Collection(Edm.String): A list of strings that can optionally be tokenized for full-text search.

Tokenized: This is the process of splitting a value into individual tokens. For example you set a searchable field to a value like “apple pie”, the index will contain individual tokens “apple” and “pie”

The page called “Create Index (Azure Search Service REST API)” https://msdn.microsoft.com/en-us/library/azure/dn798941.aspx – contains more helpful information listing the index restrictions for each datatype. I’ve summed up the relevant information below;

  • searchable: Fields of type Edm.String or Collection(Edm.String) are searchable by default.
  • filterable: Fields of type Collection(Edm.String)that are filterable do not undergo word-breaking, so comparisons are for exact matches only.
  • sortable: Fields of type Collection(Edm.String) cannot be sortable.

At this point we are ready to build a small sample project and get a Collection(Edm.String) collection field populated from a SQL server data source.

Indexing collection sample setup

Step 1 – Database setup

In order to get a simple sample running, we will require some data to work with. I’m using the AdventureWorksLT [V12] Azure SQL sample database. In addition we will also require a view or table with a flattened version of the collection data.

id name categories
804 HL Fork [ ‘Road Bikes’ , ‘Bikes’ ]
805 Men’s Bib-Shorts, M [ ‘Bib-Shorts’ , ‘Clothing’ ]
806 LL Road Pedal [ ‘Pedals’ , ‘Components’ ]

I’ve created a view and within this view I’m using a function which is responsible for resolving the product category hierarchy and returning a flattened version of the results. This is just a small sample and therefore performing all operations on the fly. Depending on your scenario, you might want to store the result separate column and or table.

As said, this is just a sample. Do whether you prefer when it comes to generating and populating a field with a JSON based collection. SQL server 2016 includes JSON support, so this might be a bit more efficient in the future.

SQL Server Function

SQL Server View

Testing the view

2 – Index creation

At this point it’s time to create the Azure Search index by using the Management Portal or the REST API. Just keep the collection indexing constraints in mind as mentioned before. My setup can be found within the JSON definition below:

3 – Create The Datasource and Indexer

Now it’s time to create the Datasource and the indexer. For this you will need to utilize the REST API. I’ve covered this within my previous blog on Azure search indexers which can be found here. However there are some important changes when it comes to creating the indexer.

Here you will need to explicitly tell the indexer that you are populating a string collection field with a JSON Array. This is done by specifying a mapping function called jsonArrayToStringCollection as shown in the example below.

4 – Testing the index.

Before being able to search for data it’s required to start the indexer.

Within Postman and make sure to alter the following:

  • Change the type to POST
  • Provide the URL pointing to your Search Service endpoint including the api-version and name of the indexer as shown below
    https://[servicename].search.windows.net/indexers/indexer1/run?api-version=2015-02-28-Preview
  • Within the headers section add api-key : [your admin key]

The following URL template can be used to test the index. Just keep in mind that it might take some time before the data has been indexed, depending on the amount of data and the selected tier.

https://[servicename].search.windows.net/indexes/[indexname]/docs?search=Cleaners&api-version=2015-02-28-Preview

As you can see, working with SQL server based collections isn’t that hard if you know whats involved.

NOTE: If you run into the flowing error, make sure to verify the mapping and JSON data
“errorMessage”: “The data field ‘x’ has an invalid value. The expected type was ‘Collection(Edm.String)’.”

Post Navigation