Step 1 : Understand the problem and establish the design scope
Understand the context and the constraints: ask questions to identify which kinds of constraints apply (technology, infrastructure, etc.).
What specific features are we going to build?
How many users does the product have? How many users do we expect in the future?
How fast does the company anticipate scaling up? What scale is expected in 3 months, 6 months, and 1 year?
What is the company's technology stack? What legacy technology is in place? What existing services might we leverage to simplify the design?
Example questions: What is the traffic volume? What is the expected traffic volume after a time X? etc.
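The answers to the scale questions above can be turned into rough numbers. A minimal sketch, assuming compound monthly growth; the function name `project_users` and the figures used (100,000 users, 10% growth per month) are made up for illustration and are not part of the original framework.

```python
def project_users(current_users: int, monthly_growth: float, months: int) -> int:
    """Project a user count under compound monthly growth.

    monthly_growth is a fraction: 0.10 means +10% per month (an assumed rate).
    """
    return round(current_users * (1 + monthly_growth) ** months)


if __name__ == "__main__":
    # Horizons taken from the questions above: 3 months, 6 months, 1 year.
    for horizon in (3, 6, 12):
        print(f"{horizon:>2} months: {project_users(100_000, 0.10, horizon):,} users")
```

Real growth is rarely this regular, so the result is best read as an order of magnitude to validate with the interviewer or the product team.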
Step 2 : Establish a high-level design
Establish an initial blueprint for the design based on step 1: for example, is the organization going hybrid as a first step?
Draw the design with its key components:
client layer (mobile, web, etc.), APIs, web servers, data store, cache, CDN, message queue, etc.
Estimate the system capacity or performance requirements (See the page "Back-of-the-envelope calculation").
If possible, go through a few concrete use cases, as Alex Xu recommends.
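The capacity estimate mentioned in this step can be sketched as a few lines of arithmetic. This is only an illustration under stated assumptions: the helper names, the peak factor of 2, and the input figures are hypothetical, and the real values come from the answers gathered in step 1.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Average queries per second, spreading the daily load evenly over the day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 2.0) -> float:
    """Peak load as a multiple of the average; the factor of 2 is an assumption."""
    return avg_qps * peak_factor


def storage_bytes(writes_per_day: int, bytes_per_write: int, retention_days: int) -> int:
    """Raw storage needed to keep every write for the whole retention period."""
    return writes_per_day * bytes_per_write * retention_days


if __name__ == "__main__":
    avg = average_qps(daily_active_users=864_000, requests_per_user_per_day=10)
    print(f"average QPS: {avg:.0f}, peak QPS: {peak_qps(avg):.0f}")
    total = storage_bytes(writes_per_day=1_000_000, bytes_per_write=1_000, retention_days=365)
    print(f"storage for one year: {total / 1e9:.0f} GB")
```

The storage figure ignores replication, indexes, and compression, which is usually acceptable at this level of precision.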
Step 3 : Design deep dive
This step starts after step 2, once the overall goals have been agreed on.
Focus the deep dive based on the feedback received on the high-level design.
Identify and prioritize the components in the architecture; where possible, focus on the bottlenecks and resource estimations (see "Back-of-the-envelope calculation").
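One possible way to prioritize components is to compare the expected load with each component's estimated capacity and look at the most saturated ones first. The sketch below assumes every request hits every component, which is a simplification; the `Component` class, the capacities, and the 70% target utilization are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    capacity_qps_per_instance: float  # assumed figure, e.g. from a load test
    instances: int

    def utilization(self, load_qps: float) -> float:
        """Fraction of total capacity consumed by the given load."""
        return load_qps / (self.capacity_qps_per_instance * self.instances)


def rank_bottlenecks(components: list[Component], load_qps: float) -> list[Component]:
    """Return components ordered from most to least saturated."""
    return sorted(components, key=lambda c: c.utilization(load_qps), reverse=True)


def instances_needed(component: Component, load_qps: float,
                     target_utilization: float = 0.7) -> int:
    """Instances required to keep utilization at or below the target."""
    usable_qps = component.capacity_qps_per_instance * target_utilization
    return math.ceil(load_qps / usable_qps)


if __name__ == "__main__":
    architecture = [
        Component("web servers", 1_000, 4),
        Component("database", 500, 2),
        Component("cache", 10_000, 1),
    ]
    for c in rank_bottlenecks(architecture, load_qps=800):
        print(f"{c.name}: {c.utilization(800):.0%} utilized")
```

With these made-up numbers the database comes out first, which would make it the natural candidate for the deep dive.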
Step 4 : Establish the pros and the cons
This is the final step, and it can be decisive: a wrap-up.
Establish the pros and benefits
Establish the cons and considerations:
Is there any single point of failure in our system? What are we doing to mitigate it?
Do we have enough replicas of the data so that we can still serve our users if we lose a few servers?
Similarly, do we have enough copies of different services running such that a few failures will not cause a total system shutdown?
How are we monitoring the performance of our service? Do we get alerts whenever critical components fail or their performance degrades?
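The questions about replicas and single points of failure can be made more concrete with a rough availability estimate. A minimal sketch, assuming that failures are independent (often optimistic in practice, since replicas may share a rack, a region, or a bad deployment); the function names and the 99% figures are illustrative assumptions.

```python
def parallel_availability(instance_availability: float, replicas: int) -> float:
    """Probability that at least one of several identical replicas is up,
    assuming independent failures."""
    return 1 - (1 - instance_availability) ** replicas


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a request path that needs every listed component to be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


if __name__ == "__main__":
    # A single database at 99% caps the whole path: a single point of failure.
    single_db = serial_availability([parallel_availability(0.99, 3), 0.99])
    # Replicating the database as well removes that cap.
    replicated_db = serial_availability([parallel_availability(0.99, 3),
                                         parallel_availability(0.99, 2)])
    print(f"single database: {single_db:.4%}")
    print(f"replicated database: {replicated_db:.4%}")
```

The comparison shows why a single unreplicated component dominates the result: the path can never be more available than its weakest mandatory link.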