Hosting in the Microsoft 365 ecosystem's European data centers could only be considered a viable and acceptable application solution if the individuals concerned gave the consent required by the local legal framework (Quebec's Law 25, formerly Bill 64). Given the complexity of that consent process, Microsoft 365 document management applications such as SharePoint will not be part of the application architecture.
From milestone 1 to milestone 2
Initial process for loading scanned documents onto the on-premises servers: the service that retrieves the documents can run a dual loading process, i.e. one copy to the vault server and another to the staging server that acts as a springboard for loading into an Azure storage service. A minimal sketch of this service follows the numbered steps below.
1- Scanning is performed from a secure workstation connected to the on-site network, and documents are automatically deposited in a network directory.
2- The service is triggered automatically or manually to retrieve the documents from the network directory.
3- The service stores the documents on the vault server, in a secure network directory.
4- In parallel, the service stores the documents on the server that acts as a springboard to the Azure storage service.
5- Steps 2, 3 and 4 are logged on the server on which the service is deployed.
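The following sketch illustrates steps 2 to 5, assuming hypothetical directory paths (the real service would read these from configuration); it is an illustration, not the production service:

```python
"""Sketch of the dual-loading service: vault copy plus staging copy."""
import logging
import shutil
from pathlib import Path

# Hypothetical placeholders for the real network shares.
SCAN_DIR = Path(r"\\onsite\scans")          # step 1: scanner drop directory
VAULT_DIR = Path(r"\\vault\documents")      # step 3: secure vault directory
STAGING_DIR = Path(r"\\staging\azure-out")  # step 4: springboard to Azure

logging.basicConfig(filename="loader.log", level=logging.INFO)

def load_documents() -> None:
    """Retrieve each scanned document and copy it to both destinations."""
    for doc in SCAN_DIR.iterdir():  # step 2: retrieve from the drop directory
        if not doc.is_file():
            continue
        shutil.copy2(doc, VAULT_DIR / doc.name)    # step 3: vault copy
        shutil.copy2(doc, STAGING_DIR / doc.name)  # step 4: staging copy
        doc.unlink()                               # clear the drop directory
        logging.info("Loaded %s into vault and staging", doc.name)  # step 5

if __name__ == "__main__":
    load_documents()
```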
Subsequent process for loading scanned documents into the on-site vault: the service retrieving the documents executes a single loading step, i.e. it deposits the document on the vault server. A second service then reacts to this event by extracting the document and depositing it on the server dedicated to integration with the Azure storage service. A minimal sketch of this second service follows the numbered steps below.
1- Scanning is performed from a secure workstation connected to the on-site network, and documents are automatically deposited in a network directory.
2- The service is triggered automatically or manually to retrieve the documents from the network directory.
3- The service stores the documents on the vault server, in a secure network directory.
4- The service logs the processing on its execution server.
5- The service deployed on the server bridging to the Azure storage service extracts the document stored in the vault and stores it on its runtime server.
6- Once the process is complete, the service logs the processing.
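A sketch of the second service (steps 5 and 6), which reacts to new vault deposits by polling at a fixed interval; the paths and the interval are hypothetical:

```python
"""Sketch of the bridge service: stage newly vaulted documents for Azure."""
import logging
import shutil
import time
from pathlib import Path

VAULT_DIR = Path(r"\\vault\documents")      # watched vault directory (hypothetical)
STAGING_DIR = Path(r"\\staging\azure-out")  # bridge server toward Azure (hypothetical)

logging.basicConfig(filename="bridge.log", level=logging.INFO)

def watch_vault(poll_seconds: int = 30) -> None:
    """Copy every newly deposited vault document to the staging server."""
    seen: set[str] = set()
    while True:
        for doc in VAULT_DIR.iterdir():
            if doc.is_file() and doc.name not in seen:
                shutil.copy2(doc, STAGING_DIR / doc.name)  # step 5: extract and stage
                logging.info("Staged %s for Azure upload", doc.name)  # step 6: log
                seen.add(doc.name)
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch_vault()
```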
From milestone 2 to milestone 3: Storage Architecture
The application architecture covers only milestone 3 (use of Azure services), in which there are two components to consider:
the document/image component: document indexing with a document indexer (AI Search)
the video component: indexing videos with a video indexer (AI Video Indexer).
These two components suggest a dual application architecture, because the AI technologies of the Azure platform offer two data mining approaches:
→ for video, knowledge mining with the video indexer;
→ for documents and images, knowledge mining of documents and images.
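To illustrate the document/image component, a minimal sketch of querying an AI Search index with the azure-search-documents SDK; the endpoint, index name, key, and field names are hypothetical placeholders:

```python
"""Sketch: querying scanned documents indexed by AI Search."""
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",  # hypothetical
    index_name="scanned-documents-index",                    # hypothetical
    credential=AzureKeyCredential("<query-key>"),
)

# Full-text query over the indexed document content.
for result in search_client.search(search_text="insurance contract"):
    print(result["metadata_storage_name"], result["@search.score"])
```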
Choosing the technology for transferring data to or from Azure
Criteria
Need to transfer large amounts of data over a slow Internet connection: physical transfer (Azure Data Box).
Need to transfer large amounts of data with a moderate or fast Internet connection:
if moderate Internet speed: physical transfer (Azure Data Box).
if high-speed (1 Gbps or more): AzCopy, Azure Data Box (virtual version), Azure Data Factory, Azure Storage REST APIs (SDKs).
Need to orchestrate transfers of any volume: Azure Data Factory.
Need logging regardless of volume: Azure Data Factory, Azure Storage REST APIs.
Need to transfer small amounts of data with a slow to moderate Internet connection:
Without programming: GUI tools (Azure Storage Explorer), Azure Portal, SFTP client.
With programming: AzCopy/PowerShell, Azure CLI, Azure Storage REST APIs (Azure Functions, applications); see the SDK sketch after this list.
Regular or continuous transfer requirements:
Regular interval: AzCopy, Azure Storage REST APIs.
Continuous: Azure Data Factory, Azure Data Box Gateway (the online-transfer variant of Data Box).
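For the "with programming" option, a minimal sketch of an upload through the Azure Storage SDK for Python (azure-storage-blob), which wraps the Storage REST APIs; the connection string, container, and file names are hypothetical:

```python
"""Sketch: uploading a staged document to Blob Storage via the SDK."""
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage-account-connection-string>"  # hypothetical
service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client("scanned-documents")  # hypothetical name

# Upload one staged document as a block blob; overwrite any previous version.
with open("staging/contract-001.pdf", "rb") as data:
    container.upload_blob(name="contract-001.pdf", data=data, overwrite=True)
```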
Limitations and disadvantages (to aid decision-making)
| Technology | Limitations |
| --- | --- |
| SFTP Client | Authentication & Authorization |
| AzCopy Utility | |
| Azure Data Factory | |
| Azure Function | |
Initial/subsequent document loading process
Option A: Azure Data Factory with Copy Data Tool (built-in service)
1- Configure the self-hosted integration runtime on the on-premises server containing the documents.
2- Azure Data Factory invokes its native copy task with the Copy Data Tool.
3 & 4- Azure Data Factory extracts the documents and copies them into the Storage Account's Blob Storage.
5- From the Azure Storage Explorer application interface, visual validation is possible.
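A pipeline created with the Copy Data Tool can also be triggered programmatically; a minimal sketch with the azure-mgmt-datafactory SDK, assuming hypothetical subscription, resource group, factory, and pipeline names:

```python
"""Sketch: starting one run of the Option A copy pipeline on demand."""
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # hypothetical
)

# Kick off one run of the pipeline generated by the Copy Data Tool (steps 2-4).
run = adf_client.pipelines.create_run(
    resource_group_name="rg-documents",    # hypothetical
    factory_name="adf-documents",          # hypothetical
    pipeline_name="CopyScannedDocuments",  # hypothetical
)
print("Pipeline run started:", run.run_id)
```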
Option B: Azure Data Factory with Azure Files
1- Configure Azure Files against the on-premises network directory (for example, via Azure File Sync). Any document deposited on the local network then automatically appears in the Azure Files share.
2- Azure Data Factory invokes its native Copy Data Tool task and extracts documents from the network directory.
3- Azure Data Factory with Copy Data Tool stores the documents in the Storage Account's Blob Storage.
4- From the Azure Storage Explorer application interface, visual validation is possible.
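As a programmatic alternative to the visual validation in the last step of both options, a minimal sketch that lists the uploaded blobs; the connection string and container name are hypothetical:

```python
"""Sketch: confirming the copied documents landed in Blob Storage."""
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<storage-account-connection-string>",  # hypothetical
    container_name="scanned-documents",     # hypothetical
)

# List the uploaded blobs to confirm the copy activity succeeded.
for blob in container.list_blobs():
    print(blob.name, blob.size)
```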