Version: 14.x

Files Service

This microservice allows you to upload files to, and download files from, a third-party storage service. The following file bucket services are currently supported:

Google Cloud Storage
MongoDB GridFS
Amazon S3 and third-party vendors compliant with its specification, such as Oracle Object Storage
Azure Blob Storage

After each upload, the service also saves the file's metadata through the CRUD Service in a configurable MongoDB collection (usually named files).

CRUD collection

The CRUD collection can have any name, but it must contain the following fields:

name (type: String): original file name.
file (type: String): unique name of the file that should be used to retrieve it using this service.
size (type: Number): size in bytes of the uploaded file.
location (type: String): the URL that can be used to download the file using the same service that performed the upload.

These fields are filled in automatically when a file is uploaded.
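As a rough illustration of the record shape described above, the following sketch checks that a metadata record carries the four required fields with the expected types. The helper name findInvalidFileFields and every value in exampleRecord are hypothetical and are not part of the service.

```javascript
'use strict'

// Required fields and their JavaScript types, as listed in this section.
const REQUIRED_FILE_FIELDS = {
  name: 'string',
  file: 'string',
  size: 'number',
  location: 'string',
}

// Hypothetical helper: returns the names of the required fields that are
// missing or have the wrong type in a metadata record.
function findInvalidFileFields(record) {
  return Object.entries(REQUIRED_FILE_FIELDS)
    .filter(([field, type]) => typeof record[field] !== type)
    .map(([field]) => field)
}

// Illustrative record: all values are made up for this example.
const exampleRecord = {
  name: 'report.pdf',
  file: '0b9d2c1e.pdf',
  size: 48213,
  location: 'https://example.com/files/download/0b9d2c1e.pdf',
}

console.log(findInvalidFileFields(exampleRecord))
```

An empty array means the record has all the required fields.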

Environment variables

CONFIG_FILE_PATH (required): the path of the configuration file that defines the connection to the bucket of one of the supported services.
CRUD_URL (required): the CRUD URL, including the name of the files collection chosen when the CRUD collection was created (e.g. http://crud-service/files/ where files is the CRUD collection name).
PROJECT_HOSTNAME: the hostname that will be saved in the database as the root of the file location. Incompatible with PATH_PREFIX.
PATH_PREFIX: a relative path to use as the file location prefix. Incompatible with PROJECT_HOSTNAME.
SERVICE_PREFIX: the prefix used for the path of the service endpoints.
HEADERS_TO_PROXY: comma-separated list of the headers to proxy (the Mia-Platform headers).
FILE_TYPE_INCLUDE_LIST (from v2.3.0): comma-separated list of file extensions (without the dot) accepted for upload. If the variable is not set, the service accepts all file types.
ADDITIONAL_MIME_TYPES: comma-separated list of key:value pairs, where each item adds an extension/MIME type association to the MIME type database. This allows a file to be uploaded when its MIME type is correctly recognized but mime-db, the database used by the mime-types library, links no MIME type to the file extension. This happens, for example, with DICOM files.
TRUSTED_PROXIES: a string containing the trusted proxy values.
ADDITIONAL_FUNCTION_CASTER_FILE_PATH: the path of the file that exports the caster function (see the Caster file section).
GOOGLE_APPLICATION_CREDENTIALS: the path to the Google Storage credentials file. Required for the googleStorage type.

Exactly one of PATH_PREFIX and PROJECT_HOSTNAME must be set.
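The constraints on these variables can be sketched as a small startup check. This is only an illustration, not the service's actual code: the function names are hypothetical, and trimming whitespace around the extensions is an assumption.

```javascript
'use strict'

// Hypothetical check: PROJECT_HOSTNAME and PATH_PREFIX are mutually
// exclusive and one of them is required. Returns the name of the one in use.
function checkLocationSettings(env) {
  const hasHostname = Boolean(env.PROJECT_HOSTNAME)
  const hasPrefix = Boolean(env.PATH_PREFIX)
  if (hasHostname === hasPrefix) {
    throw new Error('Set exactly one of PROJECT_HOSTNAME and PATH_PREFIX')
  }
  return hasHostname ? 'PROJECT_HOSTNAME' : 'PATH_PREFIX'
}

// Hypothetical parser for FILE_TYPE_INCLUDE_LIST: returns the list of
// extensions, or null when the variable is unset (all file types accepted).
// Trimming whitespace around each item is an assumption.
function parseIncludeList(value) {
  if (!value) {
    return null
  }
  return value.split(',').map(item => item.trim()).filter(Boolean)
}

console.log(checkLocationSettings({ PATH_PREFIX: '/files' }))
console.log(parseIncludeList('pdf,png,jpg'))
```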

Configuration file

The Files Service can be used in multi-bucket mode (from version v2.7.0), meaning that you can configure it to manage several buckets. All of them must belong to the same technology provider (e.g., only MongoDB).

Every bucket in a multi-bucket configuration needs a scope attribute. It differentiates the bucket instances and separates the data domains that the service manages, so its values must be unique across the whole configuration. This lets a single Files Service instance manage different types of data: each bucket uses a separate CRUD collection named after its scope value, and the same value becomes the root of the API path for that bucket.
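The two rules above (one technology provider, unique scope values) can be sketched as a validation pass over the configuration array. The function validateMultiBucketConfig is hypothetical and its error messages are invented for the example; the service may validate its configuration differently.

```javascript
'use strict'

// Hypothetical validator for a multi-bucket configuration array.
// Returns a list of human-readable problems; an empty list means valid.
function validateMultiBucketConfig(buckets) {
  const errors = []

  // All buckets must use the same technology provider.
  const types = new Set(buckets.map(bucket => bucket.type))
  if (types.size > 1) {
    errors.push(`mixed bucket types: ${[...types].join(', ')}`)
  }

  // Every bucket needs a scope, and scope values must be unique.
  const seen = new Set()
  for (const bucket of buckets) {
    const scope = bucket.options && bucket.options.scope
    if (!scope) {
      errors.push('a bucket is missing the scope attribute')
    } else if (seen.has(scope)) {
      errors.push(`duplicate scope: ${scope}`)
    } else {
      seen.add(scope)
    }
  }
  return errors
}

console.log(validateMultiBucketConfig([
  { type: 'mongodb', options: { url: 'url-to-mongo-1', bucketName: 'my-bucket-1', scope: 'scope-1' } },
  { type: 'mongodb', options: { url: 'url-to-mongo-2', bucketName: 'my-bucket-2', scope: 'scope-1' } },
]))
```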

The examples below cover each supported bucket service.

MongoDB GridFS configuration file (single-bucket option)

You need to specify the database URL and the name of the GridFS bucket where the files will be stored. If the bucket doesn't exist, the Files Service creates it as soon as it is needed.

{
  "type": "mongodb",
  "options": {
    "url": "url-to-mongo",
    "bucketName": "my-bucket"
  }
}

MongoDB GridFS configuration file (multiple-bucket option)

You need to specify an array of configurations, each with the database URL and the name of the GridFS bucket where the files will be stored. If a bucket doesn't exist, the Files Service creates it as soon as it is needed.

[
  {
    "type": "mongodb",
    "options": {
      "url": "url-to-mongo-1",
      "bucketName": "my-bucket-1",
      "scope": "scope-1"
    }
  },
  {
    "type": "mongodb",
    "options": {
      "url": "url-to-mongo-2",
      "bucketName": "my-bucket-2",
      "scope": "scope-2"
    }
  }
]

S3 configuration file (single-bucket option)

This configuration allows you to store files on any S3-compatible object storage.

Example: Amazon S3. To use Amazon S3, configure the Files Service as follows:

{
  "type": "s3",
  "options": {
    "key": "<aws-s3-key>",
    "secret": "<aws-s3-secret>",
    "bucketName": "<aws-bucket-name>",
    "region": "<aws-bucket-region>"
  }
}

Example: Oracle Object Storage (S3-compatible). Follow the documentation to obtain a pair of customer access and secret keys.

{
  "type": "s3",
  "options": {
    "key": "<customer-secret-access-key>",
    "secret": "<customer-secret>",
    "bucketName": "<name-of-the-bucket>",
    "region": "<oracle-region>",
    "endpoint": "<bucket-name-space>.compat.objectstorage.<oracle-region>.oraclecloud.com",
    "s3ForcePathStyle": true,
    "signatureVersion": "v4"
  }
}

S3 configuration file (multiple-bucket option)

This configuration allows you to store files on any S3-compatible object storage.

Example: Amazon S3. To use Amazon S3, configure the Files Service as follows:

[
  {
    "type": "s3",
    "options": {
      "key": "<aws-s3-key>",
      "secret": "<aws-s3-secret>",
      "bucketName": "<aws-bucket-name>",
      "region": "<aws-bucket-region>",
      "scope": "scope-1"
    }
  },
  {
    "type": "s3",
    "options": {
      "key": "<aws-s3-key>",
      "secret": "<aws-s3-secret>",
      "bucketName": "<aws-bucket-name>",
      "region": "<aws-bucket-region>",
      "scope": "scope-2"
    }
  }
]

Example: Oracle Object Storage (S3-compatible). Follow the documentation to obtain a pair of customer access and secret keys.

[
  {
    "type": "s3",
    "options": {
      "key": "<customer-secret-access-key>",
      "secret": "<customer-secret>",
      "bucketName": "<name-of-the-bucket>",
      "region": "<oracle-region>",
      "endpoint": "<bucket-name-space>.compat.objectstorage.<oracle-region>.oraclecloud.com",
      "s3ForcePathStyle": true,
      "signatureVersion": "v4",
      "scope": "scope-1"
    }
  },
  {
    "type": "s3",
    "options": {
      "key": "<customer-secret-access-key>",
      "secret": "<customer-secret>",
      "bucketName": "<name-of-the-bucket>",
      "region": "<oracle-region>",
      "endpoint": "<bucket-name-space>.compat.objectstorage.<oracle-region>.oraclecloud.com",
      "s3ForcePathStyle": true,
      "signatureVersion": "v4",
      "scope": "scope-2"
    }
  }
]

Google Storage configuration file (single-bucket option)

{
  "type": "googleStorage",
  "options": {
    "bucketName": "my-bucket"
  }
}

This configuration requires the GOOGLE_APPLICATION_CREDENTIALS environment variable and the credentials file it points to. To obtain the credentials file, follow this guide. Once you have it, do not commit private_key_id and private_key. The private key is a certificate that contains newline codes (\n). To interpolate it during the deploy stage of GitLab CI, save it with each \n replaced by \\n.
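The replacement described above can be sketched as follows. The sketch assumes that \n means the literal two-character escape sequence (a backslash followed by n) as it appears in the raw text of the credentials JSON, and that the goal is to double the backslash. The function name escapeNewlineCodes is hypothetical.

```javascript
'use strict'

// Hypothetical helper: doubles the backslash of every literal "\n" escape
// sequence found in the raw text of the private key.
function escapeNewlineCodes(rawText) {
  // The regular expression /\\n/g matches a backslash followed by "n".
  // The replacement string '\\\\n' is two backslashes followed by "n".
  return rawText.replace(/\\n/g, '\\\\n')
}

// String.raw keeps the backslashes exactly as typed.
console.log(escapeNewlineCodes(String.raw`-----BEGIN PRIVATE KEY-----\nabc\n`))
```

The helper is not idempotent: applying it twice escapes the sequences twice, so it should run once, on the original text.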

Google Storage configuration file (multiple-bucket option)

[
  {
    "type": "googleStorage",
    "options": {
      "bucketName": "my-bucket",
      "scope": "scope-1"
    }
  },
  {
    "type": "googleStorage",
    "options": {
      "bucketName": "my-bucket",
      "scope": "scope-2"
    }
  }
]

Azure Storage configuration file (single-bucket option)

You need to specify the Azure Storage account, its key, and the name of the container (bucket) where the files will be stored. If the container doesn't exist, the Files Service creates it at startup.

{
  "type": "azureStorage",
  "options": {
    "account": "azure-account",
    "accountKey": "azure-account-key",
    "containerName": "my-container"
  }
}

Azure Storage configuration file (multiple-bucket option)

[
  {
    "type": "azureStorage",
    "options": {
      "account": "azure-account",
      "accountKey": "azure-account-key",
      "containerName": "my-container",
      "scope": "scope-1"
    }
  },
  {
    "type": "azureStorage",
    "options": {
      "account": "azure-account",
      "accountKey": "azure-account-key",
      "containerName": "my-container",
      "scope": "scope-2"
    }
  }
]

Cache configuration

If the bucket in use does not provide a caching mechanism, the Files Service can provide one. To enable it, add the cache property to the configuration file. If you set the cacheControlMaxAge property, a cache-control header is added to the response of each file, with the max-age attribute set to the provided number of seconds.
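As an illustration of the behavior described above, the sketch below derives the header value from a bucket configuration in which cache sits next to type and options. The helper cacheControlValue is hypothetical, and whether the service adds other directives besides max-age is not covered here.

```javascript
'use strict'

// Hypothetical helper: returns the Cache-Control header value for a bucket
// configuration, or undefined when no valid cacheControlMaxAge is set.
function cacheControlValue(bucketConfig) {
  const maxAge = bucketConfig.cache && bucketConfig.cache.cacheControlMaxAge
  if (!Number.isInteger(maxAge) || maxAge < 0) {
    return undefined
  }
  return `max-age=${maxAge}`
}

const bucketConfig = {
  type: 'mongodb',
  options: { url: 'url-to-mongo', bucketName: 'my-bucket' },
  cache: { cacheControlMaxAge: 3000 },
}

console.log(cacheControlValue(bucketConfig))
```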

Caster file

The following is an example of a custom caster file. It adds the tags, authorId, and ownerId parameters to the CRUD collection when they are present in the POST parameters.

'use strict'

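// Builds the additional fields to store in the CRUD collection
// from the POST parameters of the upload request.
module.exports = function caster(doc) {
  return {
    // tags arrives as a comma-separated string; leave it unset when absent
    tags: doc.tags ? doc.tags.split(',') : undefined,
    authorId: doc.authorId || undefined,
    ownerId: doc.ownerId || undefined,
  }
}

// Validation schema for the additional POST parameters.
module.exports.additionalPropertiesValidator = {
  tags: { type: 'string' },
  authorId: { type: 'string' },
  ownerId: { type: 'string' },
}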

Passing from a single-bucket configuration to a multi-bucket

A Files Service that already runs in a single-bucket configuration can be switched to multi-bucket mode. Because multi-bucket mode is backward compatible, there is no need to set up a new Files Service instance.

Single-bucket configuration

Let's say we have a single-bucket Files Service already configured. Its link to the CRUD Service is specified in the CRUD_URL environment variable:

CRUD_URL=http://localhost:3001/animals/

As we can see, the CRUD Service manages the animals collection, which stores the metadata of the files saved in the bucket.

Being in a single-bucket configuration, there is only one bucket configured (a mongodb type in this case):

{
  "type": "mongodb",
  "options": {
    "url": "mongodb://localhost:27017/animals",
    "bucketName": "filesbucket"
  },
  "cache": {
    "cacheControlMaxAge": 3000
  }
}

To manage another domain of files, say files related to fruits, we have to configure another Files Service, with its CRUD Service URL pointing to:

CRUD_URL=http://localhost:3001/fruits/

and the config for another bucket:

{
  "type": "mongodb",
  "options": {
    "url": "mongodb://localhost:27017/fruits",
    "bucketName": "filesbucket"
  },
  "cache": {
    "cacheControlMaxAge": 3000
  }
}

Multi-bucket configuration

We now want to adopt a multi-bucket configuration. In this case, a single Files Service instance is enough: we only need to change the URL of the CRUD Service and update the bucket configuration file.

The CRUD environment variable becomes:

CRUD_URL=http://localhost:3001/

and the configuration for the bucket will now be an array of configurations:

[
  {
    "type": "mongodb",
    "options": {
      "url": "mongodb://localhost:27017/animals",
      "bucketName": "filesbucket",
      "scope": "animals"
    },
    "cache": {
      "cacheControlMaxAge": 3000
    }
  },
  {
    "type": "mongodb",
    "options": {
      "url": "mongodb://localhost:27017/fruits",
      "bucketName": "filesbucket",
      "scope": "fruits"
    },
    "cache": {
      "cacheControlMaxAge": 3000
    }
  }
]

As we can see, we simply removed the name of the specific collection from the CRUD Service environment variable. The Files Service itself builds the proper URL to the CRUD Service using the scope parameter.
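The migration walked through above can be sketched in two steps: merging the single-bucket configurations into one array with a scope each, and deriving the per-scope CRUD URL from the shared base. Both function names are hypothetical, and the exact way the service composes the URL is an assumption that mirrors the trailing-slash form of the single-bucket examples.

```javascript
'use strict'

// Hypothetical helper: turns a map of scope -> single-bucket configuration
// into a multi-bucket array, adding the scope to each options object.
function toMultiBucketConfig(singleConfigsByScope) {
  return Object.entries(singleConfigsByScope).map(([scope, config]) => ({
    ...config,
    options: { ...config.options, scope },
  }))
}

// Hypothetical helper: builds the CRUD URL for one scope from the base
// CRUD_URL. Appending "<scope>/" is an assumption.
function crudUrlForScope(crudUrl, scope) {
  const base = crudUrl.endsWith('/') ? crudUrl : `${crudUrl}/`
  return `${base}${encodeURIComponent(scope)}/`
}

const cache = { cacheControlMaxAge: 3000 }
const multiBucket = toMultiBucketConfig({
  animals: { type: 'mongodb', options: { url: 'mongodb://localhost:27017/animals', bucketName: 'filesbucket' }, cache },
  fruits: { type: 'mongodb', options: { url: 'mongodb://localhost:27017/fruits', bucketName: 'filesbucket' }, cache },
})

console.log(JSON.stringify(multiBucket, null, 2))
console.log(crudUrlForScope('http://localhost:3001/', 'animals'))
```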
