Step 1 : Understand the problem and establish the design scope

  • Understand the context and the constraints: ask questions to establish which kinds of constraints apply (technological, infrastructure, etc.).

  • What specific features are we going to build?

  • How many users does the product have? How many users do we expect in the future?

  • How fast does the company anticipate scaling up over 3 months, 6 months, and 1 year?

  • What is the company's technology stack? What legacy technology is in place? What existing services might we leverage to simplify the design?

Example questions: What is the current traffic volume? What is the expected traffic volume after a given time X?
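Once the answers to these questions are known, they can be turned into rough numbers. The sketch below projects a user count at the 3, 6, and 12 month horizons. It assumes a constant compound monthly growth rate, and the function name `project_users` and the input values are hypothetical placeholders, not figures from a real product.

```python
def project_users(current_users: int, monthly_growth_rate: float, months: int) -> int:
    """Project a user count, assuming a constant compound monthly growth rate."""
    return round(current_users * (1 + monthly_growth_rate) ** months)


# Hypothetical answers gathered during step 1.
current_users = 100_000
monthly_growth_rate = 0.10  # assumed 10% growth per month

for horizon_months in (3, 6, 12):
    projected = project_users(current_users, monthly_growth_rate, horizon_months)
    print(f"{horizon_months:>2} months: {projected:,} users")
```

Real growth is rarely this smooth, so the result is best treated as an order-of-magnitude input for the later steps.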

Step 2 : Establish a high-level design 

  • Establish an initial blueprint for the design according to step 1: for example, is the organization going hybrid in the first step?

  • Draw the design with key components

    • Client layer (mobile/web, etc.), APIs, web servers, data store, cache, CDN, message queue, etc.

  • Estimate the system capacity or performance requirements (see the page "Back-of-the-envelope calculation").
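As an illustration of the capacity estimate, here is a minimal back-of-the-envelope sketch. Every input (daily active users, requests per user, write ratio, payload size, peak factor, retention) is a hypothetical assumption, and the function name `estimate_capacity` is invented for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_capacity(
    daily_active_users: int,
    requests_per_user_per_day: float,
    write_ratio: float,
    bytes_per_write: int,
    peak_factor: float = 2.0,
    retention_days: int = 365,
) -> dict:
    """Rough traffic and storage estimate from a handful of assumed inputs."""
    requests_per_day = daily_active_users * requests_per_user_per_day
    average_qps = requests_per_day / SECONDS_PER_DAY
    # Peak traffic is approximated as a fixed multiple of the average.
    peak_qps = average_qps * peak_factor
    writes_per_day = requests_per_day * write_ratio
    storage_bytes = writes_per_day * bytes_per_write * retention_days
    return {
        "average_qps": average_qps,
        "peak_qps": peak_qps,
        "storage_bytes": storage_bytes,
    }


# Hypothetical inputs: 1M daily users, 10 requests each, 10% writes of 1 kB.
estimate = estimate_capacity(
    daily_active_users=1_000_000,
    requests_per_user_per_day=10,
    write_ratio=0.1,
    bytes_per_write=1_000,
)
print(f"average QPS: {estimate['average_qps']:.0f}")
print(f"peak QPS:    {estimate['peak_qps']:.0f}")
print(f"storage:     {estimate['storage_bytes'] / 1e9:.0f} GB per year")
```

The point is the order of magnitude rather than the exact figure: it indicates whether a single data store is plausible or whether sharding, caching, or a CDN belongs in the high-level design.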

If possible, go through a few concrete use cases, as Alex Xu recommends.

Step 3 : Design deep dive

This step comes after step 2, once the overall goals have been agreed on.

  • Focus the deep dive according to the feedback received.

  • Identify and prioritize components in the architecture; if possible, focus on the bottlenecks and resource estimates (see "Back-of-the-envelope calculation").
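One simple way to prioritize components for the deep dive is to compare each component's expected load with its estimated capacity and look first at the most heavily utilized ones. The sketch below assumes load and capacity can both be expressed in queries per second. The `Component` class, the component names, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    expected_load_qps: float  # assumed load from the step 2 estimate
    capacity_qps: float       # assumed maximum sustainable throughput

    @property
    def utilization(self) -> float:
        """Fraction of capacity consumed by the expected load."""
        return self.expected_load_qps / self.capacity_qps


def rank_bottlenecks(components: list[Component]) -> list[Component]:
    """Order components from most to least utilized."""
    return sorted(components, key=lambda c: c.utilization, reverse=True)


# Hypothetical figures for the key components of the high-level design.
components = [
    Component("web servers", expected_load_qps=2_000, capacity_qps=10_000),
    Component("cache", expected_load_qps=1_500, capacity_qps=50_000),
    Component("data store", expected_load_qps=500, capacity_qps=600),
    Component("message queue", expected_load_qps=300, capacity_qps=5_000),
]

for component in rank_bottlenecks(components):
    print(f"{component.name:<14} {component.utilization:6.1%}")
```

With these assumed numbers the data store comes out on top, so it would be the first candidate for the deep dive.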

Step 4 : Establish the pros and the cons

This is the final step, and it can be decisive: a wrap-up.

  • Establish the pros and benefits

  • Establish the cons and considerations:

    • Is there any single point of failure in our system? What are we doing to mitigate it?

    • Do we have enough replicas of the data so that we can still serve our users if we lose a few servers?

    • Similarly, do we have enough copies of different services running such that a few failures will not cause a total system shutdown?

    • How are we monitoring the performance of our service? Do we get alerts whenever critical components fail or their performance degrades?
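The questions about replicas and single points of failure can be made quantitative with a standard redundancy formula: if each replica is available with probability a and failures are independent, at least one of n replicas is available with probability 1 - (1 - a)^n. The sketch below applies it. The independence assumption is a simplification (replicas that share a rack or a region often fail together), and the function names are hypothetical.

```python
def combined_availability(single_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, assuming independent failures.

    The service counts as available when at least one copy is up.
    """
    return 1 - (1 - single_availability) ** replicas


def replicas_needed(single_availability: float, target: float) -> int:
    """Smallest replica count whose combined availability reaches `target`."""
    if not 0 < single_availability < 1 or not 0 < target < 1:
        raise ValueError("availabilities must be strictly between 0 and 1")
    replicas = 1
    while combined_availability(single_availability, replicas) < target:
        replicas += 1
    return replicas


# Hypothetical example: each server is up 99% of the time.
for n in (1, 2, 3):
    print(f"{n} replica(s): {combined_availability(0.99, n):.6f}")

print("replicas for a 99.999% target:", replicas_needed(0.99, 0.99999))
```

A component running with a single replica is a single point of failure by definition, which is why this kind of check pairs naturally with the first question in the list.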