
This design applies the framework proposed in this space (Alex Xu): Getting started - a framework to propose...

Introduction

Autocomplete or word completion is a feature in which an application predicts the rest of a word a user is typing. In graphical user interfaces, users can typically press the tab key to accept a suggestion or the down arrow key to accept one of several. Autocomplete is involved in many search features on various platforms like Facebook, Instagram, Google search, etc.

Step 1 - Understand the problem and establish the scope

Matching at the beginning only, or also in the middle?

Both.

How many suggestions should be returned?

5.

How does the system know which 5 suggestions to return?

By popularity, determined by historical query frequency.

Does the system support spell check?

No.

Are search queries in English only, or multi-language?

English only, for now.

Are capitalization and special characters allowed?

No.

How many users?

10 million DAU (daily active users).

Requirements

  • Fast response time: suggestions must appear within 100 ms, i.e. low latency.

  • Relevant: suggestions must be relevant to whatever prefix the user has entered.

  • Sorted: by popularity or other ranking models.

  • Scalable: the system must handle high traffic volume.

  • Highly available: the system must remain accessible even when parts of it go offline, slow down, or experience network errors.

Back-of-the-envelope calculations

Traffic estimations

  • Assume 10 million DAU

  • Average person performs 10 searches per day.

  • 20 bytes of data per query string: assume 4 words per query and 5 characters per word on average.

    • 4 words * 5 characters = 20 bytes per query (ASCII, 1 byte per character).

  • For each character entered in the search box, a request is sent to the backend: on average, 20 requests are sent per search query.

  • 10 million * 10 queries per day * 20 characters / (24 hours * 3600 sec) ≈ 24,000 queries per second (QPS).

  • Peak QPS = QPS * 2 ≈ 48,000 QPS.
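As a sanity check, the traffic arithmetic above can be reproduced in a few lines (the numbers are the assumptions stated in this section):

```python
# Back-of-the-envelope QPS estimate, using the assumptions above.
DAU = 10_000_000          # daily active users
SEARCHES_PER_DAY = 10     # average searches per user per day
CHARS_PER_QUERY = 20      # one backend request per character typed

requests_per_day = DAU * SEARCHES_PER_DAY * CHARS_PER_QUERY
qps = requests_per_day / (24 * 3600)   # 86,400 seconds per day
peak_qps = qps * 2

print(round(qps))       # ≈ 23148, i.e. roughly 24,000 QPS
print(round(peak_qps))  # ≈ 46296, i.e. roughly 48,000 QPS
```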

Storage estimations

  • Assume 20% of daily queries are new. 10 million * 10 queries per day * 20 bytes per query * 20% = 0.4 GB of new data added to storage daily.

  • 0.4 GB * 365 ≈ 146 GB per year.
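The storage growth can be checked the same way:

```python
# Daily and yearly storage growth for new queries, per the assumptions above.
DAU = 10_000_000
SEARCHES_PER_DAY = 10
BYTES_PER_QUERY = 20
NEW_QUERY_RATIO = 0.2    # 20% of daily queries are new

daily_gb = DAU * SEARCHES_PER_DAY * BYTES_PER_QUERY * NEW_QUERY_RATIO / 10**9
yearly_gb = daily_gb * 365

print(daily_gb)   # 0.4 GB per day
print(yearly_gb)  # 146.0 GB per year
```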


Step 2 - High-level design

According to our requirements, autocomplete must be real-time, i.e. new search queries must be added to our database as they occur. Hence, not only do we have to design a system that gives suggestions, but we also have to incorporate new popular search queries into our database so that users get suggestions based on recent as well as popular searches.

Hence our autocomplete consists of two parts:

  • Giving Suggestions or Query Service

  • Adding new trending queries to the database or Data Gathering service

Data gathering service

It gathers user input queries and aggregates them in real-time. We have a table that stores the query string and its frequency.
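A minimal in-memory sketch of this aggregation, using a Python `Counter` as the frequency table (illustrative only; a real system would aggregate analytics logs in batch or via a stream processor rather than on every request):

```python
from collections import Counter

# Frequency table: query string -> number of times it was searched.
frequency_table = Counter()

def record_search(query: str) -> None:
    """Called whenever a user submits a full search query."""
    frequency_table[query.lower()] += 1

record_search("twitter")
record_search("twitter")
record_search("twitch")
print(frequency_table["twitter"])  # 2
```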

Query service

It stores the query string and its frequency (the number of times the query has been searched) in a table. When a user types “tw” in the search box, the top 5 queries are displayed based on frequency (see the SQL query in “References & Glossary for search autocomplete system”).

Step 3 - Design deep dive

Query Service & Scaling the storage

Query Service calls the DB directly to fetch the top 5 results.

To optimize :

  1. AJAX requests to fetch autocomplete results; the benefit is that the whole web page is not refreshed.

  2. Browser caching (Cache-Control: private): most of the time, autocomplete suggestions do not change much within a short period, so results can be cached in the browser.

  3. Data sampling: for a large-scale system, logging every search query requires a lot of processing power and storage, so only a sample of requests is logged → SCALE the storage.
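For the browser-caching point, the idea is simply to return a Cache-Control header with the autocomplete response; the values below are illustrative, not any particular search engine's actual configuration:

```python
# Illustrative HTTP response headers for a cacheable autocomplete response.
# "private" means only the user's browser may cache it; max-age is the TTL.
headers = {
    "Cache-Control": "private, max-age=3600",  # per-user cache, valid for 1 hour
    "Content-Type": "application/json",
}
print(headers["Cache-Control"])  # private, max-age=3600
```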

Scale the storage with DB sharding

Trie data structure to support the calls

  1. Find the prefix node. Time complexity : O(p)

  2. Traverse subtree from prefix node to get all children. Time complexity : O(c)

  3. Sort children and get top k. Time complexity : O(c log c)

Total for the algorithm = O(p) + O(c) + O(c log c)
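The three steps above can be sketched with a minimal trie (simplified for illustration; a production trie would be more compact and would be persisted and sharded):

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.frequency = 0   # > 0 iff this node terminates a stored query

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query: str, frequency: int) -> None:
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.frequency = frequency

    def top_k(self, prefix: str, k: int = 5) -> list:
        # Step 1: find the prefix node - O(p)
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Step 2: traverse the subtree to collect all queries - O(c)
        results, stack = [], [(node, prefix)]
        while stack:
            cur, word = stack.pop()
            if cur.frequency:
                results.append((cur.frequency, word))
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        # Step 3: sort and keep the top k - O(c log c)
        return [w for _, w in heapq.nlargest(k, results)]

trie = Trie()
for q, f in [("tree", 10), ("true", 35), ("try", 29), ("toy", 14)]:
    trie.insert(q, f)
print(trie.top_k("tr", 2))  # ['true', 'try']
```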

This algorithm can be too slow because, in the worst case, we may need to traverse the entire trie to get the top k results. There are 2 potential optimizations :

  1. Limit the max length of a prefix : so, the time complexity for “Find the prefix” can be reduced from O(p) to O(small constant), aka O(1).

  2. Cache top search queries at each node : only the top k queries are cached. However, this design requires a lot of space to store top queries at every node.

With both optimizations applied, the algorithm becomes :

  1. Find the prefix node. Time complexity: O(1)

  2. Return top k queries (cached). Time complexity: O(1)
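A sketch of this optimized design, where each node carries a precomputed top-k list and the prefix length is bounded (the names and the 50-character bound are illustrative assumptions):

```python
class CachedNode:
    def __init__(self):
        self.children = {}      # char -> CachedNode
        self.top_queries = []   # precomputed offline, e.g. by a periodic batch job

def lookup(root, prefix, max_prefix_len=50):
    node = root
    # Optimization 1: only the first max_prefix_len characters matter -> O(1)
    for ch in prefix[:max_prefix_len]:
        if ch not in node.children:
            return []
        node = node.children[ch]
    # Optimization 2: the answer is already cached at the node -> O(1)
    return node.top_queries

# Build a tiny trie by hand for the prefix "tw".
root = CachedNode()
node = root
for ch in "tw":
    node.children[ch] = CachedNode()
    node = node.children[ch]
node.top_queries = ["twitter", "twitch"]
print(lookup(root, "tw"))   # ['twitter', 'twitch']
```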

Data Gathering Service & Trie operations

Step 4 - Wrap up

  • How to support non-English queries → we store Unicode characters in trie nodes (https://docs.microsoft.com/en-us/dotnet/standard/base-types/character-encoding ).

  • What if top search queries in one country are different from others → we store tries in CDNs to improve response time.

  • How to support real-time search queries →

    • Reduce the working data set by sharding;

    • Change the ranking model and assign more weight to recent search queries;

    • Stream processing, to handle data that is generated continuously (ex. Apache Spark Streaming, Apache Kafka, …).
