Tip

The framework proposed in this space (Alex Xu) is applied here to propose a design: Getting started - a framework to propose...

Introduction

A web crawler, also known as a robot or spider, is commonly used by search engines to discover new or updated content on the web (Alex Xu): web pages, videos, PDF files, etc.
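To make "discover new or updated content" concrete, here is a minimal breadth-first sketch of the basic crawl loop: start from seed URLs, download each page, extract its links, and queue only links not seen before. It is an illustration, not the design developed in the steps below. `crawl`, `fetch_links` and `FAKE_WEB` are hypothetical names, and the in-memory dictionary stands in for real HTTP downloads and HTML parsing.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed_urls: Iterable[str],
          fetch_links: Callable[[str], list[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl: visit each URL at most once, in discovery order."""
    frontier = deque(dict.fromkeys(seed_urls))  # URLs waiting to be downloaded (seeds de-duplicated)
    seen = set(frontier)                        # URLs already queued or visited
    visited: list[str] = []                     # order in which pages were "downloaded"
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()                # FIFO queue -> breadth-first order
        visited.append(url)
        for link in fetch_links(url):           # hypothetical: download page, extract links
            if link not in seen:                # skip URLs already seen (avoids loops)
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web" standing in for real fetching and parsing.
FAKE_WEB = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["a.com", "d.com"],
    "c.com": ["d.com"],
    "d.com": [],
}

if __name__ == "__main__":
    print(crawl(["a.com"], lambda u: FAKE_WEB.get(u, [])))
```

Running it prints `['a.com', 'b.com', 'c.com', 'd.com']`: every page is visited once even though a.com and b.com link to each other. The `seen` set is what keeps the crawler out of cycles, and `max_pages` bounds the crawl.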

Purposes:

  • Search engine indexing: create a local index for search engines.

  • Web archiving: collect information from the web to preserve data for future use.

  • Web mining: extract data and useful knowledge from the internet.

  • Web monitoring: monitor copyright and trademark infringements over the internet.


Table of Contents

STEP 1 - Understand the problem and establish the scope

STEP 2 - High-level design

STEP 3 - Design deep dive

STEP 4 - Pros & Cons