“Searching” is a classic problem that many web applications face.
Look at any e-commerce site — to find what you want from their vast catalog of products, you can run a search on related keywords. Or social websites like Facebook and Reddit — search serves as an entry point to find relevant content like users, threads, and so on.
At times, implementing search on a backend application can be a simple process. If you are using a relational database that supports Structured Query Language (SQL), plugging in the LIKE operator in a query could work in finding relevant data.
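As a rough illustration of the LIKE approach, here is a minimal Go sketch. The `products` table and `name` column are hypothetical, and the `!` escape character is my own choice so that a user typing `%` or `_` does not accidentally inject wildcards.

```go
package main

import (
	"fmt"
	"strings"
)

// likePattern escapes the LIKE wildcards (% and _) in the user's search term
// using "!" as the escape character, then wraps the term in % so the pattern
// matches the term anywhere inside the column value.
func likePattern(term string) string {
	escaper := strings.NewReplacer("!", "!!", "%", "!%", "_", "!_")
	return "%" + escaper.Replace(term) + "%"
}

func main() {
	// Hypothetical table and column names; the placeholder syntax (?) varies
	// between database drivers.
	const query = "SELECT id, name FROM products WHERE name LIKE ? ESCAPE '!'"

	fmt.Println(query)
	fmt.Println(likePattern("50%_off"))
}
```

The pattern would be passed as the query argument to whichever database driver you use. Note that nothing here ranks the matching rows, which is the limitation discussed next.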
However, what if you need to order the results by relevance to the search terms? The LIKE operator may not help us here, but there are other features, like full-text search, that aim to solve this problem. Still, the problem remains if we stretch the question further to searching across database instances. The conventional database is not fully equipped to solve this application problem, but we can look elsewhere for a solution.
Enter Elasticsearch. Can it solve our problem?
Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.
From elastic.co, “What is Elasticsearch?”
Ironically, I came to know of Elasticsearch when I was instructed to use the ELK stack (Elasticsearch, Logstash, and Kibana) as the logging solution by my past employer. It’s a powerful solution, able to retrieve search results from both structured and unstructured data.
To understand Elasticsearch’s search capabilities, there are two basic concepts to know: “indices” and “documents”.
To an application developer like myself, an index in Elasticsearch is most similar to a “table” in a database. In the same way, documents are similar to rows in a table. From this comparison, searching is equivalent to fetching documents from an index. Since these documents can be unstructured, we can simply maintain documents that hold all the information we want to query on an index, which solves the search problem for a single data source.
What if we want to search, or even sort, based on related data across multiple data sources? This would often occur in architectural patterns, such as Service-Oriented Architecture (SOA), which advocates for breaking down groups of functionalities into more fine-grained services (as compared to the coarse-grained monoliths).
To solve the search problem in an SOA, we could join related information from different datastores into one common document, then call Elasticsearch’s search APIs to retrieve the relevant document(s) in the intended sort order.
For example, we could have the user index on Elasticsearch for the user service, where each document on the user index contains only the fields required to search and sort for a user entity. Each user entity may belong to an organization, where the name and address are potential search terms. These organization fields can be added to the document for each user before we use them as fields for searching relevant documents.
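To make the idea concrete, here is a minimal sketch of how such a flattened user document could be assembled. All type names, field names, and sample values are hypothetical; the point is only that the organization's name and address are copied onto each user document before it is indexed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User and Organization stand in for records owned by two separate services.
type User struct {
	ID   string
	Name string
}

type Organization struct {
	Name    string
	Address string
}

// UserDocument is the flattened document stored on the user index. It holds
// only the fields needed to search and sort users.
type UserDocument struct {
	ID                  string `json:"id"`
	Name                string `json:"name"`
	OrganizationName    string `json:"organization_name"`
	OrganizationAddress string `json:"organization_address"`
}

// buildUserDocument copies the searchable organization fields onto the user.
func buildUserDocument(u User, o Organization) UserDocument {
	return UserDocument{
		ID:                  u.ID,
		Name:                u.Name,
		OrganizationName:    o.Name,
		OrganizationAddress: o.Address,
	}
}

// documentJSON renders the document as the JSON body that would be indexed.
func documentJSON(d UserDocument) (string, error) {
	b, err := json.Marshal(d)
	return string(b), err
}

func main() {
	doc := buildUserDocument(
		User{ID: "u-1", Name: "Ada"},
		Organization{Name: "Acme", Address: "1 Main St"},
	)
	body, err := documentJSON(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

The trade-off of this denormalization is that the user documents have to be updated whenever the organization changes, which is where a partial-update endpoint becomes useful.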
Separately, as open source software (OSS), Elasticsearch is free to use. This brings additional benefits, such as wider support from a community that already uses it (and contributes fixes along the way). Being able to host the service yourself can also be a big plus, especially if you are a small company or don’t have the budget for third-party tools.
On the other hand, Elasticsearch is also supported by different cloud providers. This is a plus if you are concerned about achieving high availability with Elasticsearch. One such example is Google Cloud, where Elasticsearch can be used as a search-as-a-service solution.
As a developer, the fact that it is open source also helps my efforts in exploring it. I can easily host an Elasticsearch cluster locally for testing, and replicate containerized setups that work reliably in the production environment.
Earlier, I talked about how Elasticsearch can be viewed in terms of its “indices” and “documents”. In some ways, I view Elasticsearch as a datastore: instead of using SQL queries as with SQL datastores, we use its APIs for reading and writing information.
However, one huge difficulty in integrating with Elasticsearch was understanding how it works. To be clear, Elasticsearch is extensively documented, as seen here. But while its API is rich in functionality, a simple, easy-to-understand interface for communicating with Elasticsearch seems like a better fit for our use case: a search solution across multiple services.
And what better way to represent this search interface than with an easy-to-understand and widely used pattern: RESTful API endpoints?
We can achieve this by building a wrapper with these RESTful API endpoints, wrapping around Elasticsearch:
If you are already using an SOA with RESTful API endpoints, the above will be doubly beneficial for your organization. Using the API becomes simpler with the wrapper, and developers familiar with this pattern will be able to integrate with your new Elasticsearch wrapper service with ease.
Elasticsearch API, an API wrapper
Putting together the above observations, I came up with Elasticsearch Wrapper API, serving as a wrapper to do the following:
- HTTP endpoints to create and retrieve indices
- HTTP endpoints to create and retrieve documents
- An HTTP endpoint to patch (partially update) documents whenever data is updated on the originating datastore
- The endpoint to list documents must have a simple interface for searching and sorting
Retrieval and creation of resources — here as indices and documents — have the standard behavior of GET and POST. As long as the request body or parameters are valid, the right action will be undertaken on creating or retrieving the resource.
To support syncing of data on Elasticsearch, we also introduced the PATCH endpoint for documents. This allows any service in your architecture to partially update any entities, without overwriting other valid properties of the document.
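The intended PATCH semantics can be sketched in a few lines of Go. This is an in-memory illustration under my own assumptions (a shallow merge of top-level fields, with hypothetical field names), not the wrapper's actual code: fields present in the patch replace the stored values, and everything else is left untouched.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch returns a copy of doc with every top-level field from patch
// applied on top. Fields that the patch does not mention keep their values.
func mergePatch(doc, patch map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(doc)+len(patch))
	for k, v := range doc {
		out[k] = v
	}
	for k, v := range patch {
		out[k] = v
	}
	return out
}

// applyPatch decodes both JSON bodies, merges them, and re-encodes the result.
// encoding/json writes map keys in sorted order, so the output is stable.
func applyPatch(docJSON, patchJSON string) (string, error) {
	var doc, patch map[string]interface{}
	if err := json.Unmarshal([]byte(docJSON), &doc); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(patchJSON), &patch); err != nil {
		return "", err
	}
	merged, err := json.Marshal(mergePatch(doc, patch))
	return string(merged), err
}

func main() {
	stored := `{"name":"Ada","organization_name":"Acme","organization_address":"1 Main St"}`
	// The organization service only knows about the address it just changed.
	patch := `{"organization_address":"2 Side St"}`

	merged, err := applyPatch(stored, patch)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(merged)
}
```

In this example the organization service can push a new address without knowing, or overwriting, the user's name.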
To solve our search problem, we have two endpoints.
The first endpoint is the standard GET endpoint for listing all documents, which was extended to respond to different query parameters. In our implementation, we use a list of query parameters instead of Elasticsearch’s original request body on the GET endpoint, or a query parameter that uses the Lucene syntax. This allows our GET endpoint to be compliant with caching on proxies and clients, and also makes it easy to integrate the search function without prior knowledge of the Lucene query language.
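As a sketch of what parsing such query parameters could look like, consider the Go snippet below. The parameter names (`filter.<field>`, `q`, `fields`, `sort` with a leading `-` for descending order) are hypothetical and chosen for illustration; the wrapper's real parameter names may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// SortField is one sort criterion; earlier entries take priority.
type SortField struct {
	Field string
	Desc  bool
}

// SearchRequest is the parsed form of the list endpoint's query string.
type SearchRequest struct {
	Filters map[string]string // exact-match filters, keyed by field name
	Term    string            // free-text search term from the end-user
	Fields  []string          // fields the term is matched against
	Sort    []SortField
}

// parseSearchQuery turns a raw query string into a SearchRequest.
func parseSearchQuery(raw string) (SearchRequest, error) {
	values, err := url.ParseQuery(raw)
	if err != nil {
		return SearchRequest{}, err
	}
	req := SearchRequest{Filters: map[string]string{}, Term: values.Get("q")}
	for key, vals := range values {
		if strings.HasPrefix(key, "filter.") && len(vals) > 0 {
			req.Filters[strings.TrimPrefix(key, "filter.")] = vals[0]
		}
	}
	if fields := values.Get("fields"); fields != "" {
		req.Fields = strings.Split(fields, ",")
	}
	if sortParam := values.Get("sort"); sortParam != "" {
		for _, part := range strings.Split(sortParam, ",") {
			req.Sort = append(req.Sort, SortField{
				Field: strings.TrimPrefix(part, "-"),
				Desc:  strings.HasPrefix(part, "-"),
			})
		}
	}
	return req, nil
}

func main() {
	req, err := parseSearchQuery("filter.status=active&q=ada&fields=name,organization_name&sort=-created_at,name")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", req)
}
```

Because everything lives in the URL, two identical searches produce identical URLs, which is what makes the GET endpoint friendly to caches.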
The second endpoint is a custom POST endpoint, with the same search and sort capabilities. This is an alternative endpoint that accepts a JSON request body instead of the query values of the first endpoint.
The four categories of query fields above (in the JSON request body variant) are closely tuned to how search is used internally.
First, we want to implement criteria for filtering on fully matched keywords. This provides services the opportunity to search and sort on selected subsets of data.
Second, we want to enable services to be able to search or partially match on different properties of a document, with a search term as supplied by the end-user.
Third, we want to introduce multiple sort values. These are in decreasing order of priority, as needed by the caller service.
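One way these criteria could translate into an Elasticsearch query is sketched below. The JSON request fields (`filters`, `term`, `fields`, `sort`) are hypothetical, and the mapping I chose (`term` clauses under `bool.filter` for exact matches, a `multi_match` clause for the search term, and a `sort` array in priority order) is an assumption about a reasonable translation, not a description of the wrapper's internals. The `term` clauses also assume the filtered fields are mapped as keywords.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// SortField is one sort criterion; Order is "asc" or "desc".
type SortField struct {
	Field string `json:"field"`
	Order string `json:"order"`
}

// SearchBody is a hypothetical JSON request body for the POST search endpoint.
type SearchBody struct {
	Filters map[string]string `json:"filters"`
	Term    string            `json:"term"`
	Fields  []string          `json:"fields"`
	Sort    []SortField       `json:"sort"`
}

type obj = map[string]interface{}

// buildQuery maps the request onto an Elasticsearch search body.
func buildQuery(body SearchBody) obj {
	boolQuery := obj{}

	// Exact-match filters, in a stable (alphabetical) order.
	keys := make([]string, 0, len(body.Filters))
	for k := range body.Filters {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	filters := make([]obj, 0, len(keys))
	for _, k := range keys {
		filters = append(filters, obj{"term": obj{k: body.Filters[k]}})
	}
	if len(filters) > 0 {
		boolQuery["filter"] = filters
	}

	// Full-text match of the end-user's term across the chosen fields.
	if body.Term != "" {
		match := obj{"query": body.Term}
		if len(body.Fields) > 0 {
			match["fields"] = body.Fields
		}
		boolQuery["must"] = []obj{obj{"multi_match": match}}
	}

	// Sort criteria, kept in the caller's order of priority.
	sorts := make([]obj, 0, len(body.Sort))
	for _, s := range body.Sort {
		order := "asc"
		if s.Order == "desc" {
			order = "desc"
		}
		sorts = append(sorts, obj{s.Field: obj{"order": order}})
	}
	return obj{"query": obj{"bool": boolQuery}, "sort": sorts}
}

// translate decodes a request body and returns the Elasticsearch query JSON.
func translate(requestJSON string) (string, error) {
	var body SearchBody
	if err := json.Unmarshal([]byte(requestJSON), &body); err != nil {
		return "", err
	}
	out, err := json.Marshal(buildQuery(body))
	return string(out), err
}

func main() {
	query, err := translate(`{"filters":{"status":"active"},"term":"ada","fields":["name","organization_name"],"sort":[{"field":"created_at","order":"desc"},{"field":"name","order":"asc"}]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(query)
}
```

Placing the exact-match criteria under `filter` rather than `must` keeps them out of relevance scoring, so only the end-user's search term influences how results are ranked.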
Using seeded data of 100 different documents, we can verify that the search endpoints with the different query values work as expected:
This service is written in Go, using the go-kit framework. The framework makes it easy to extend the service in the future to other communication protocols, such as gRPC. This creates the potential for an easy-to-integrate service, especially if you have many different teams that use different communication protocols.
By creating an API wrapper on Elasticsearch, we can use it to solve search problems in a more consistent way. This is especially powerful when integrating into service clusters, such as those built under SOA.
My solution for the API wrapper on Elasticsearch can be found here, with the documentation found here. You can also check out the official Elasticsearch documentation here for more information on Elasticsearch.