Since the Microsoft 365 ecosystem's data center is located in Europe, it could only be considered a viable and acceptable application solution if the persons covered by the local legal framework (Quebec's Laws 25 and 64) gave their consent. However, the complexity of that process means that Microsoft 365 document management applications such as SharePoint will not be part of the application architecture.

From milestone 1 to milestone 2

...


Choosing the technology for transferring data to or from Azure

  1. Criteria

  • Need to transfer large amounts of data over a slow Internet connection: physical transfer (Azure Data Box).

  • Need to transfer large amounts of data with a moderate or fast Internet connection:

    • if moderate Internet speed: physical transfer (Azure Data Box).

    • if high-speed (1 Gbps or more): AzCopy, Azure Data Box (virtual version), Azure Data Factory, Azure Storage REST APIs (SDKs).

  • Need to orchestrate transfers of any size: Azure Data Factory.

  • Need logging regardless of volume: Azure Data Factory, Azure Storage REST APIs.

  • Need to transfer small amounts of data with a slow to moderate Internet connection:

    • Without programming: GUI (Azure Storage Explorer Utility), Azure Portal, SFTP Client.

    • With programming: AzCopy/PowerShell, Azure CLI, Azure Storage REST APIs (Azure Functions, Applications).

  • Regular or continuous transfer requirements:

    • Regular interval: AzCopy, Azure Storage APIs.

    • Continuous interval: Azure Data Factory, Azure Data Box for online transfer.
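
The criteria above amount to a small decision tree. Below is a minimal Python sketch that encodes them; the helper name and the numeric thresholds (10 TB for "large amounts", 1 Gbps for "high-speed") are illustrative assumptions, not official Azure guidance.

```python
# Minimal decision sketch encoding the criteria above. Thresholds are
# illustrative assumptions, not official Azure guidance.

def choose_transfer_technology(data_size_tb: float,
                               bandwidth_gbps: float,
                               needs_orchestration: bool = False,
                               needs_logging: bool = False,
                               continuous: bool = False) -> list[str]:
    """Return candidate Azure transfer technologies for a scenario."""
    if continuous:
        return ["Azure Data Factory", "Azure Data Box (online transfer)"]
    if needs_orchestration:
        return ["Azure Data Factory"]
    if needs_logging:
        return ["Azure Data Factory", "Azure Storage REST APIs"]
    large = data_size_tb >= 10          # assumed cut-off for "large amounts"
    if large and bandwidth_gbps < 1:
        # Slow or moderate link: ship the data physically.
        return ["Azure Data Box (physical)"]
    if large:
        # High-speed link (1 Gbps or more).
        return ["AzCopy", "Azure Data Box (virtual version)",
                "Azure Data Factory", "Azure Storage REST APIs (SDKs)"]
    # Small amounts: GUI tools without programming, or scripted options.
    return ["Azure Storage Explorer", "Azure Portal", "SFTP client",
            "AzCopy/PowerShell", "Azure CLI", "Azure Storage REST APIs"]

print(choose_transfer_technology(data_size_tb=50, bandwidth_gbps=0.1))
# -> ['Azure Data Box (physical)']
```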

...

Limitations by technology

SFTP Client

Authentication & Authorization

  • Microsoft Entra ID (formerly Azure Active Directory) is not supported for the SFTP endpoint.

  • ACLs (access control lists) are not supported for the SFTP endpoint.

  • Local user is the only form of identity management supported.

Network

  • The network must allow traffic on port 22.

  • Static IP addresses are not supported.

  • Must use Microsoft network routing; the Internet routing option is not supported.

  • 2-minute timeout for standby or inactive connections.

Performance

  • Performance can degrade over time and is affected by network latency.

Other

  • The storage account's hierarchical namespace feature must be enabled (this requires Azure Data Lake Storage Gen2 capabilities).

  • Maximum file size is 100 GB.

  • SFTP must be deactivated before storage redundancy/replication can be initiated.
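
As a concrete illustration of the namespace constraint above, the hedged sketch below creates a storage account with both the hierarchical namespace and SFTP enabled, using the azure-mgmt-storage SDK. It assumes a recent SDK version that exposes is_sftp_enabled; the subscription ID and all resource names are placeholders.

```python
# Hedged sketch: creating a storage account that satisfies the constraints
# above (hierarchical namespace + SFTP enabled, local-user identities).
# Assumes a recent azure-mgmt-storage package exposing is_sftp_enabled;
# the subscription ID and resource names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.storage_accounts.begin_create(
    "my-resource-group",       # hypothetical resource group
    "mydocumentsaccount",      # hypothetical account name
    {
        "location": "canadaeast",
        "sku": {"name": "Standard_LRS"},
        "kind": "StorageV2",
        # SFTP requires Data Lake Gen2 capabilities (hierarchical namespace).
        "is_hns_enabled": True,
        "is_sftp_enabled": True,
    },
)
account = poller.result()
print(account.name, "created")
```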

AzCopy Utility

  • Synchronous operation only, no asynchronous copying.

  • Potential timeouts; dependent on the on-premises infrastructure and on our network.

  • Impact of log generation on performance.

  • Unexpected behavior: if file size reaches 200 GB, AzCopy splits the exported file.

  • No functional traceability; only a technical log is produced.

Azure Data Factory

  • Resource limits exist, though they are rarely reached in practice since the service is designed for large data volumes.

  • If pipelines perform many transformations outside ADF's native activities, estimating the final cost of an operation becomes difficult.

  • Long-running operations are more expensive than on-premises solutions.

Azure Function

  • Not suitable for long, resource-intensive runs.

  • Avoiding cold-start latency requires paying for a more expensive hosting plan.

  • Functions are stateless; if data persistence is required, the architecture becomes more complex because data must be fetched across services.

  • Limited debugging capabilities.

Initial/subsequent document loading process

Option A: Azure Data Factory with Copy Data Tool (built-in service)

...

1- Configure the self-hosted integration runtime with the on-premises server containing the documents.

2- Azure Data Factory invokes its native copy task with the Copy Data Tool.

3 & 4- Azure Data Factory extracts and copies the documents into the Storage Account's Blob Storage.

5- From the Azure Storage Explorer application interface, visual validation is possible.
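
For illustration, step 2 can also be triggered programmatically. The hedged sketch below starts a pipeline run with the azure-mgmt-datafactory SDK and polls its status; the pipeline, factory, and resource group names are hypothetical placeholders, and the pipeline is assumed to have been built with the Copy Data Tool.

```python
# Hedged sketch: triggering the copy pipeline of step 2 programmatically.
# All names are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = adf.pipelines.create_run(
    resource_group_name="my-resource-group",   # hypothetical
    factory_name="my-data-factory",            # hypothetical
    pipeline_name="CopyDocumentsPipeline",     # hypothetical pipeline name
)
print("Pipeline run started:", run.run_id)

# Steps 3 & 4 run inside the pipeline; poll the run status to follow them.
status = adf.pipeline_runs.get(
    "my-resource-group", "my-data-factory", run.run_id)
print("Status:", status.status)
```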

Option B: Azure Data Factory with Azure Files

...

1- Configure Azure Files with the on-premises network directory. Any document deposited in the local network directory will automatically appear in the Azure Files share.

2- Azure Data Factory invokes its native Copy Data Tool task and extracts documents from the network directory.

3- Azure Data Factory with Copy Data Tool stores the documents in the Storage Account's Blob Storage.

4- From the Azure Storage Explorer application interface, visual validation is possible.
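
As a quick check of step 1, the hedged sketch below lists the contents of the Azure Files share with the azure-storage-file-share SDK to confirm that documents dropped in the network directory are visible; the connection string and share name are placeholders.

```python
# Hedged sketch verifying step 1: list the Azure Files share to confirm
# that documents dropped in the on-premises network directory are visible.
from azure.storage.fileshare import ShareClient

share = ShareClient.from_connection_string(
    conn_str="<storage-account-connection-string>",
    share_name="documents",     # hypothetical share name
)

# ADF's Copy Data Tool (steps 2-3) would then read these same files
# and write them to Blob Storage.
for item in share.list_directories_and_files():
    print(item.name)
```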


Option C: Azure Data Factory with Azure Function

...

1- Azure Data Factory invokes Azure Function (triggering).

2- The Azure Function uses the self-hosted integration runtime with the on-premises server to communicate with the document server.

3- Azure Function copies the documents.

4- Azure Function stores the documents in the Storage Account's Blob Storage.

5- From the Azure Storage Explorer application interface, visual validation is possible.
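
The hedged sketch below illustrates steps 3 and 4 with the Azure Functions Python v2 programming model: an HTTP-triggered function that copies one document into Blob Storage. How the function reaches the on-premises document server (step 2) is deployment-specific and is stubbed out; the route, container, and app-setting names are placeholders.

```python
# Hedged sketch of steps 3-4: an HTTP-triggered Azure Function (Python v2
# programming model) that copies one document into Blob Storage.
import os

import azure.functions as func
from azure.storage.blob import BlobServiceClient

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="copy-document")
def copy_document(req: func.HttpRequest) -> func.HttpResponse:
    file_name = req.params.get("file")
    if not file_name:
        return func.HttpResponse("Missing 'file' parameter.", status_code=400)

    # Steps 2-3 (stubbed): fetch the document bytes from the on-premises
    # document server; the transport is deployment-specific.
    data = fetch_from_document_server(file_name)

    # Step 4: store the document in the Storage Account's Blob Storage.
    # STORAGE_CONNECTION_STRING is a hypothetical app setting.
    service = BlobServiceClient.from_connection_string(
        os.environ["STORAGE_CONNECTION_STRING"])
    service.get_blob_client(container="documents", blob=file_name).upload_blob(
        data, overwrite=True)

    return func.HttpResponse(f"Copied {file_name}.", status_code=200)

def fetch_from_document_server(file_name: str) -> bytes:
    """Placeholder for the on-premises retrieval; not shown here."""
    raise NotImplementedError
```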

Option D: AzCopy

...

1- PowerShell invokes directory loading with the azcopy.exe executable.

2- AzCopy extracts the documents from the source directory.

3- AzCopy stores the documents in the Storage Account's Blob Storage.

4- From the Azure Storage Explorer application interface, visual validation is possible.
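
The same sequence can be scripted. The hedged sketch below drives azcopy.exe from Python rather than PowerShell, for consistency with the other sketches on this page; the local path and SAS URL are placeholders, and azcopy must be installed and on the PATH.

```python
# Hedged sketch of steps 1-3: invoking azcopy to upload a directory to
# Blob Storage. The source path and SAS URL are placeholders.
import subprocess

source_dir = r"D:\documents"   # hypothetical on-premises directory
dest_url = ("https://mydocumentsaccount.blob.core.windows.net/"
            "documents?<sas-token>")   # container URL + SAS token

result = subprocess.run(
    ["azcopy", "copy", source_dir, dest_url, "--recursive=true"],
    capture_output=True, text=True,
)
print(result.stdout)            # AzCopy's technical log output
result.check_returncode()       # raises if the transfer failed
```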

Option E: SFTP Transfer

...

1- Activate the SFTP protocol in the storage account.

2- Use an SFTP client (software) or a PowerShell script to extract the documents (likely limited to files only, not directories). The SFTP client allows multiple files to be selected at once.

3- The SFTP client stores documents in the Blob Storage container directory.

4- From the Azure Storage Explorer application interface, visual validation is possible.
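
The hedged sketch below illustrates steps 2 and 3 with the paramiko library. The Blob Storage SFTP endpoint accepts local users only, with usernames of the form <storage-account>.<local-user>; the account, user, and file names here are placeholders.

```python
# Hedged sketch of steps 2-3: uploading a file to the Blob SFTP endpoint.
# All names are placeholders.
import paramiko

transport = paramiko.Transport(("mydocumentsaccount.blob.core.windows.net", 22))
transport.connect(
    username="mydocumentsaccount.docuploader",   # <account>.<local user>
    password="<local-user-password>",
)
sftp = paramiko.SFTPClient.from_transport(transport)

# Step 3: store the document in the Blob Storage container directory.
sftp.put("report.pdf", "documents/report.pdf")

sftp.close()
transport.close()
```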

Milestone 3: Architecture - Search & Indexing

...

  1. Documents (files, videos, images) are stored in Blob Storage, with user-identity access control applied at the document level (Security filters for trimming results - Azure AI Search | Microsoft Learn).

  2. Users (segmented by department) submit a search through the web interface hosted on Azure App Service (PaaS).

  3. The query is sent to Azure AI Search, which works together with Azure AI Video Indexer; both services are used to index the documents.

  4. Azure AI Search extracts the matching information stored in Blob Storage.

  5. Azure AI Search returns the response to the client (Azure App Service).
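
The hedged sketch below illustrates steps 2-5 from the client side with the azure-search-documents SDK, including a security filter that trims results to the user's groups as described in the linked article. The endpoint, index name, and field names (group_ids, metadata_storage_name) are placeholder assumptions.

```python
# Hedged sketch of steps 2-5: querying the search index with a
# document-level security filter. Names are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://my-search-service.search.windows.net",  # hypothetical
    index_name="documents-index",                             # hypothetical
    credential=AzureKeyCredential("<query-api-key>"),
)

# Groups resolved from the signed-in user's identity (per department here).
user_groups = ["finance-dept"]
security_filter = "group_ids/any(g: search.in(g, '{}'))".format(
    ",".join(user_groups))

# Step 3: send the query with the security filter applied.
results = client.search(search_text="quarterly report", filter=security_filter)

# Steps 4-5: matching documents come back to the calling application.
for doc in results:
    print(doc["metadata_storage_name"], doc["@search.score"])
```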