Files Service
This microservice allows you to upload files to a third-party storage service and download them from it. The following file bucket services are currently supported:
- Google Cloud Storage
- MongoDB GridFS
- Amazon S3 and third-party vendors compliant with its specifications, such as Oracle Object Storage.
- Azure Blob Storage
In addition, after each upload it saves the file's information, using the CRUD Service, in a configurable MongoDB collection (usually named files).
CRUD collection
The CRUD collection can be named as you want, but must contain the following fields:
- name (type: String): original file name.
- file (type: String): unique name of the file that should be used to retrieve it using this service.
- size (type: Number): size in bytes of the uploaded file.
- location (type: String): the URL that can be used to download the file using the same service that performed the upload.
These fields will be automatically filled during the upload of files.
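As an illustration, a record in the collection could look like the object in the sketch below. The sample values and the hasRequiredFileFields helper are hypothetical; only the four field names and their types come from the list above.

```javascript
'use strict'

// Required fields and their JavaScript types, taken from the list above.
const REQUIRED_FILE_FIELDS = {
  name: 'string',
  file: 'string',
  size: 'number',
  location: 'string',
}

// Hypothetical helper: returns true when a record carries every required
// field with the expected type.
function hasRequiredFileFields(record) {
  return Object.entries(REQUIRED_FILE_FIELDS)
    .every(([field, type]) => typeof record[field] === type)
}

// Sample record with made-up values.
const sampleRecord = {
  name: 'report.pdf',
  file: '0b1c2d3e.pdf',
  size: 48213,
  location: 'https://example.com/files/0b1c2d3e.pdf',
}

console.log(hasRequiredFileFields(sampleRecord))
```

A check of this kind can be handy when verifying that a newly created CRUD collection exposes the fields the service expects.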
Environment variables
- CONFIG_FILE_PATH (required): the path of the configuration file that defines the connection to the storage bucket of one of the supported services.
- CRUD_URL (required): the CRUD URL, including the name of the files collection chosen during the CRUD collection creation (e.g. http://crud-service/files/ where files is the CRUD collection name).
- PROJECT_HOSTNAME: the hostname that will be saved in the database as the root of the file location. Incompatible with PATH_PREFIX.
- PATH_PREFIX: a relative path used as the file location prefix. Incompatible with PROJECT_HOSTNAME.
- SERVICE_PREFIX: the prefix used for the path of the service endpoints.
- HEADERS_TO_PROXY: comma separated list of the headers to proxy (the Mia-Platform headers).
- FILE_TYPE_INCLUDE_LIST (from v2.3.0): comma separated list of file extensions (without the dot) to be accepted for upload. If you do not set the variable, the service will accept all uploaded file types.
- ADDITIONAL_MIME_TYPES: comma separated list of key:value pairs where each item is an additional extension to mime-type relationship to add to the mime-type database. This allows the upload of files whose mime-type is correctly recognized but has no file extension linked to it in the mime-db used by the mime-types library. This happens, for example, with DICOM files.
- TRUSTED_PROXIES: the string containing the trusted proxies values.
- ADDITIONAL_FUNCTION_CASTER_FILE_PATH: the path of the file that exports the caster function (see the Caster file section).
- GOOGLE_APPLICATION_CREDENTIALS: the path to the Google Storage credentials file. This is required for the googleStorage type.
Exactly one of PATH_PREFIX and PROJECT_HOSTNAME must be set.
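The rules above can be sketched as a small pre-flight check. This is not the service's own validation code: the checkEnv and parseList helpers and the sample values are hypothetical, and only the variable names and the required/incompatible rules come from the list above.

```javascript
'use strict'

// Splits a comma separated value into trimmed, non-empty items.
function parseList(value) {
  return (value || '').split(',').map(item => item.trim()).filter(Boolean)
}

// Hypothetical check mirroring the rules listed above.
function checkEnv(env) {
  const errors = []
  for (const name of ['CONFIG_FILE_PATH', 'CRUD_URL']) {
    if (!env[name]) {
      errors.push(`${name} is required`)
    }
  }
  // PATH_PREFIX and PROJECT_HOSTNAME are incompatible, but one of them is needed.
  const hasPrefix = Boolean(env.PATH_PREFIX)
  const hasHostname = Boolean(env.PROJECT_HOSTNAME)
  if (hasPrefix === hasHostname) {
    errors.push('set exactly one of PATH_PREFIX and PROJECT_HOSTNAME')
  }
  return {
    errors,
    headersToProxy: parseList(env.HEADERS_TO_PROXY),
    fileTypeIncludeList: parseList(env.FILE_TYPE_INCLUDE_LIST),
  }
}

// Made-up values, for illustration only.
const result = checkEnv({
  CONFIG_FILE_PATH: '/path/to/config.json',
  CRUD_URL: 'http://crud-service/files/',
  PATH_PREFIX: '/files',
  FILE_TYPE_INCLUDE_LIST: 'pdf, png,jpg',
})

console.log(result.errors.length, result.fileTypeIncludeList)
```

In a real deployment the same function could be called with process.env before starting the service.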
Configuration file
Files Service can be used in multi-bucket mode (from version v2.7.0), meaning that you can configure it to manage different buckets. These must all belong to the same technology provider (e.g., only MongoDB).
Every multi-bucket configuration requires a scope attribute for each bucket. It identifies the bucket instance and separates the different domains of data that the service manages, so its value must be unique across the whole configuration. In this way, you can use the same instance of the Files Service to manage different types of data.
Each scope is backed by a separate CRUD collection named after the value of the scope attribute. The same value also becomes the root of the API path for that bucket.
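A minimal sketch of the two constraints just described (unique scope values, one technology provider) is shown here. The findDuplicateScopes and usesSingleProvider helpers are hypothetical names introduced for illustration; the configuration shape follows the multi-bucket examples on this page.

```javascript
'use strict'

// Returns the scope values that appear more than once in a multi-bucket configuration.
function findDuplicateScopes(buckets) {
  const seen = new Set()
  const duplicates = new Set()
  for (const bucket of buckets) {
    const { scope } = bucket.options
    if (seen.has(scope)) {
      duplicates.add(scope)
    }
    seen.add(scope)
  }
  return Array.from(duplicates)
}

// Returns true when every bucket declares the same type (technology provider).
function usesSingleProvider(buckets) {
  return new Set(buckets.map(bucket => bucket.type)).size <= 1
}

// Deliberately wrong sample: the second entry reuses "scope-1".
const config = [
  { type: 'mongodb', options: { url: 'url-to-mongo-1', bucketName: 'my-bucket-1', scope: 'scope-1' } },
  { type: 'mongodb', options: { url: 'url-to-mongo-2', bucketName: 'my-bucket-2', scope: 'scope-1' } },
]

console.log(findDuplicateScopes(config)) // [ 'scope-1' ]
console.log(usesSingleProvider(config)) // true
```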
Below you can find examples related to each different bucket service supported.
- MongoDB (single-bucket)
- MongoDB (multiple-bucket)
- Amazon S3 (single-bucket)
- Amazon S3 (multiple-bucket)
- Google Storage (single-bucket)
- Google Storage (multiple-bucket)
- Azure Storage (single-bucket)
- Azure Storage (multiple-bucket)
MongoDB GridFS configuration file (single-bucket option)
You need to specify the database URL and the name of the GridFS bucket where the files will be stored. If the bucket doesn't exist, the Files Service will create it as soon as it is needed.
{
"type": "mongodb",
"options": {
"url": "url-to-mongo",
"bucketName": "my-bucket"
}
}
MongoDB GridFS configuration file (multiple-bucket option)
You need to specify an array with the different database URLs and the names of the GridFS buckets where the files will be stored, each with its own scope. If a bucket doesn't exist, the Files Service will create it as soon as it is needed.
[
{
"type": "mongodb",
"options": {
"url": "url-to-mongo-1",
"bucketName": "my-bucket-1",
"scope": "scope-1"
}
},
{
"type": "mongodb",
"options": {
"url": "url-to-mongo-2",
"bucketName": "my-bucket-2",
"scope": "scope-2"
}
}
]
S3 configuration file (single-bucket option)
This configuration allows you to store files on any S3-compatible object storage.
Example: Amazon S3. To use Amazon S3, configure the Files Service as follows:
{
"type": "s3",
"options": {
"key": "<aws-s3-key>",
"secret": "<aws-s3-secret>",
"bucketName": "<aws-bucket-name>",
"region": "<aws-bucket-region>"
}
}
Example: Oracle Object Storage S3 Compatible: Follow the documentation to obtain a pair of customer access and secret keys.
{
"type": "s3",
"options": {
"key": "<customer-secret-access-key>",
"secret": "<customer-secret>",
"bucketName": "<name-of-the-bucket>",
"region": "<oracle-region>",
"endpoint": "<bucket-name-space>.compat.objectstorage.<oracle-region>.oraclecloud.com",
"s3ForcePathStyle": true,
"signatureVersion": "v4"
}
}
S3 configuration file (multiple-bucket option)
This configuration allows you to store files on any S3-compatible object storage.
Example: Amazon S3. To use Amazon S3, configure the Files Service as follows:
[
{
"type": "s3",
"options": {
"key": "<aws-s3-key>",
"secret": "<aws-s3-secret>",
"bucketName": "<aws-bucket-name>",
"region": "<aws-bucket-region>",
"scope": "scope-1"
}
},
{
"type": "s3",
"options": {
"key": "<aws-s3-key>",
"secret": "<aws-s3-secret>",
"bucketName": "<aws-bucket-name>",
"region": "<aws-bucket-region>",
"scope": "scope-2"
}
}
]
Example: Oracle Object Storage S3 Compatible: Follow the documentation to obtain a pair of customer access and secret keys.
[
{
"type": "s3",
"options": {
"key": "<customer-secret-access-key>",
"secret": "<customer-secret>",
"bucketName": "<name-of-the-bucket>",
"region": "<oracle-region>",
"endpoint": "<bucket-name-space>.compat.objectstorage.<oracle-region>.oraclecloud.com",
"s3ForcePathStyle": true,
"signatureVersion": "v4",
"scope": "scope-1"
}
},
{
"type": "s3",
"options": {
"key": "<customer-secret-access-key>",
"secret": "<customer-secret>",
"bucketName": "<name-of-the-bucket>",
"region": "<oracle-region>",
"endpoint": "<bucket-name-space>.compat.objectstorage.<oracle-region>.oraclecloud.com",
"s3ForcePathStyle": true,
"signatureVersion": "v4",
"scope": "scope-2"
}
}
]
Google Storage configuration file (single-bucket option)
{
"type": "googleStorage",
"options": {
"bucketName": "my-bucket"
}
}
For this configuration, you must set the GOOGLE_APPLICATION_CREDENTIALS environment variable and provide the credentials file. To obtain the credentials file, follow this guide. Once you have it, do not commit the private_key_id and private_key values.
The private_key is a certificate containing newline escape sequences (\n). To interpolate it during the deploy stage of GitLab CI, save it with each \n replaced by \\n.
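The replacement described above can be sketched as follows. The sketch assumes the key is handled as the raw text found in the JSON credentials file, where each newline appears as the two characters backslash and n; the escapeForCiVariable name and the placeholder key are hypothetical.

```javascript
'use strict'

// Replaces every two-character sequence "\n" (backslash + n) with "\\n"
// (two backslashes + n), as described above.
function escapeForCiVariable(rawKey) {
  return rawKey.replace(/\\n/g, '\\\\n')
}

// Placeholder value, not a real key. String.raw keeps the backslashes as written.
const rawKey = String.raw`-----BEGIN PRIVATE KEY-----\n<key-body>\n-----END PRIVATE KEY-----\n`

console.log(escapeForCiVariable(rawKey))
```

If the key has already been parsed and contains real newline characters instead, the pattern to match would be /\n/g rather than /\\n/g.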
Google Storage configuration file (multiple-bucket option)
[
{
"type": "googleStorage",
"options": {
"bucketName": "my-bucket",
"scope": "scope-1"
}
},
{
"type": "googleStorage",
"options": {
"bucketName": "my-bucket",
"scope": "scope-2"
}
}
]
Azure Storage configuration file (single-bucket option)
You need to specify the Azure Storage account, its key and the name of the container (bucket) where the files will be stored. If the container doesn't exist, the Files Service will create it at startup.
{
"type": "azureStorage",
"options": {
"account": "azure-account",
"accountKey": "azure-account-key",
"containerName": "my-container"
}
}
Azure Storage configuration file (multiple-bucket option)
You need to specify, for each entry of the array, the Azure Storage account, its key and the name of the container (bucket) where the files will be stored. If a container doesn't exist, the Files Service will create it at startup.
[
{
"type": "azureStorage",
"options": {
"account": "azure-account",
"accountKey": "azure-account-key",
"containerName": "my-container",
"scope": "scope-1"
}
},
{
"type": "azureStorage",
"options": {
"account": "azure-account",
"accountKey": "azure-account-key",
"containerName": "my-container",
"scope": "scope-2"
}
}
]
Cache configuration
If the bucket in use does not provide any caching mechanism, the Files Service can provide it. To enable it, add the cache property to the configuration file. If you set the cacheControlMaxAge property, a cache-control header is added to the response of each file, with the max-age attribute set to the provided number of seconds.
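As a rough illustration of this rule, the sketch below derives a header value from a bucket configuration. The cacheControlHeader function is hypothetical, and the exact value emitted by the service may contain other directives; only the cache and cacheControlMaxAge names come from this page.

```javascript
'use strict'

// Hypothetical helper: returns a cache-control value when cacheControlMaxAge is set,
// undefined otherwise.
function cacheControlHeader(bucketConfig) {
  const maxAge = bucketConfig.cache && bucketConfig.cache.cacheControlMaxAge
  if (typeof maxAge !== 'number') {
    return undefined
  }
  return `max-age=${maxAge}`
}

const bucketConfig = {
  type: 'mongodb',
  options: { url: 'url-to-mongo', bucketName: 'my-bucket' },
  cache: { cacheControlMaxAge: 3000 },
}

console.log(cacheControlHeader(bucketConfig)) // max-age=3000
```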
Caster file
An example of a custom caster file. This file adds the tags, authorId and ownerId params (if present in the post parameters) to the CRUD collection.
'use strict'
module.exports = function caster(doc) {
return {
tags: (doc.tags || '').split(','),
authorId: doc.authorId || undefined,
ownerId: doc.ownerId || undefined,
}
}
module.exports.additionalPropertiesValidator = {
tags: { type: 'string' },
authorId: { type: 'string' },
ownerId: { type: 'string' },
}
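To see what this caster produces, the snippet below calls it on a made-up set of post parameters. The function is copied from the example above so that the snippet runs on its own; the input values are hypothetical.

```javascript
'use strict'

// Same caster as in the example above, repeated so the snippet is self-contained.
function caster(doc) {
  return {
    tags: (doc.tags || '').split(','),
    authorId: doc.authorId || undefined,
    ownerId: doc.ownerId || undefined,
  }
}

const casted = caster({ tags: 'invoice,2023', authorId: 'user-1' })
console.log(casted.tags) // [ 'invoice', '2023' ]
console.log(casted.ownerId) // undefined

// When tags is missing, ''.split(',') yields [ '' ] rather than an empty array.
console.log(caster({}).tags) // [ '' ]
```

If an empty array is preferable when no tags are sent, the caster would need an explicit check before splitting.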
Moving from a single-bucket to a multi-bucket configuration
A Files Service that already runs in a single-bucket configuration can be switched to multi-bucket mode. As the multi-bucket mode is backward compatible, there is no need to set up a new Files Service instance.
Single-bucket configuration
Let's say we have a single-bucket Files Service already configured. Its link to the CRUD Service is specified in its environment variable:
CRUD_URL=http://localhost:3001/animals/
As we can see, the CRUD Service is managing the animals collection, where the metadata about the files stored in the bucket is saved.
Being in a single-bucket configuration, there is only one bucket configured (a mongodb type in this case):
{
"type": "mongodb",
"options": {
"url": "mongodb://localhost:27017/animals",
"bucketName": "filesbucket"
},
"cache": {
"cacheControlMaxAge": 3000
}
}
To manage another domain of files, for example files related to fruits, we have to configure another Files Service, with the CRUD Service pointing to:
CRUD_URL=http://localhost:3001/fruits/
and the config for another bucket:
{
"type": "mongodb",
"options": {
"url": "mongodb://localhost:27017/fruits",
"bucketName": "filesbucket"
},
"cache": {
"cacheControlMaxAge": 3000
}
}
Multi-bucket configuration
We now want to adopt a multi-bucket configuration. In this case, we can use just one instance of the Files Service by changing the pointer to the CRUD Service and updating the bucket configuration file.
The CRUD environment variable will become:
CRUD_URL=http://localhost:3001/
and the configuration for the bucket will now be an array of configurations:
[
{
"type": "mongodb",
"options": {
"url": "mongodb://localhost:27017/animals",
"bucketName": "filesbucket",
"scope": "animals"
},
"cache": {
"cacheControlMaxAge": 3000
}
},
{
"type": "mongodb",
"options": {
"url": "mongodb://localhost:27017/fruits",
"bucketName": "filesbucket",
"scope": "fruits"
},
"cache": {
"cacheControlMaxAge": 3000
}
}
]
As we can see, we simply removed the name of the specific collection from the CRUD Service environment variable. The Files Service itself builds the proper URL to the CRUD Service using the scope parameter.
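The migration can be summarized with the sketch below. How the service composes the URL internally is not shown on this page, so crudUrlForScope is only an assumption about the resulting address; toMultiBucket is likewise a hypothetical helper that rebuilds the array shown above from the two single-bucket configurations.

```javascript
'use strict'

// Hypothetical helper: appends the scope to the base CRUD URL.
function crudUrlForScope(crudBaseUrl, scope) {
  const base = crudBaseUrl.endsWith('/') ? crudBaseUrl : `${crudBaseUrl}/`
  return new URL(`${encodeURIComponent(scope)}/`, base).href
}

// Hypothetical helper: turns single-bucket configurations into one multi-bucket
// array by copying each configuration and adding its scope to the options.
function toMultiBucket(entries) {
  return entries.map(({ scope, config }) => ({
    ...config,
    options: { ...config.options, scope },
  }))
}

const cache = { cacheControlMaxAge: 3000 }
const multiBucket = toMultiBucket([
  {
    scope: 'animals',
    config: { type: 'mongodb', options: { url: 'mongodb://localhost:27017/animals', bucketName: 'filesbucket' }, cache },
  },
  {
    scope: 'fruits',
    config: { type: 'mongodb', options: { url: 'mongodb://localhost:27017/fruits', bucketName: 'filesbucket' }, cache },
  },
])

console.log(crudUrlForScope('http://localhost:3001/', 'animals')) // http://localhost:3001/animals/
console.log(multiBucket.map(bucket => bucket.options.scope)) // [ 'animals', 'fruits' ]
```

Note that the URLs obtained for each scope match the CRUD_URL values used by the two single-bucket instances, which is why existing records remain reachable after the switch.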