Carbonio Storages and S3 buckets

Carbonio Storages works together with S3 buckets: the Primary and Secondary volumes generated with Carbonio Storages can be hosted on S3-compatible storage, so the majority of your data can live on secure and long-lasting cloud storage.
S3-compatible Services
Although Carbonio Storages should work out of the box with any storage service compatible with the Amazon S3 API, only the following systems are officially supported:
  • FileBlob (Standard Local Volume)
  • Amazon S3
  • EMC
  • OpenIO
  • Swift
  • Scality S3
  • Cloudian
  • Custom S3 (any other S3-compliant solution, not officially supported)
The “Incoming” Directory and Primary Volumes
A local “Incoming” directory must exist on a mailbox server before a remote Primary Volume can be created on it. The default directory is /opt/zextras/incoming; the following commands allow you to view or change the current setting:
 
zextras$ carbonio config server get $(zmhostname) attribute incomingPath

zextras$ carbonio config server set $(zmhostname) attribute incomingPath value /path/to/dir
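For example, to move the Incoming directory to a new location and then verify the change (the path below is purely illustrative; any directory readable and writable by the zextras user will do):

zextras$ carbonio config server set $(zmhostname) attribute incomingPath value /opt/zextras/incoming2
zextras$ carbonio config server get $(zmhostname) attribute incomingPath
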
Local Cache
When hosting a volume on a third-party remote storage solution, the zextras user must have read and write access to the local directory that will be used for item caching.
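A minimal preparation sketch, assuming the hypothetical cache path /opt/zextras/cache:

zextras$ mkdir -p /opt/zextras/cache    # create the cache directory as the zextras user
zextras$ chmod 750 /opt/zextras/cache   # ensure the zextras user can read and write it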
 

Warning

Failing to configure the cache directory correctly will make items unretrievable: users will get a No such BLOB error when trying to access any item stored on an S3 volume.

Bucket Setup
You won’t be able to create any secondary volumes on an S3-compatible device or service until the Local Cache directory is configured. Beyond that, setting up your volumes is simple, because Carbonio Storages requires no special S3 configuration or settings. Although a dedicated bucket, user, and access policy are not essential, they are highly advised, since they make management much simpler.
 
To begin storing your secondary volumes on S3, you only need:
  • an S3 bucket; to use it, you must know its name and location
  • a user’s Access Key and Secret
  • a policy granting that user full access to your bucket
Management of buckets
The Carbonio Admin Panel offers a centralised Bucket Management UI. This lets you save bucket information and reuse it when creating a new volume on S3-compatible storage, instead of entering the details each time.
 
Go to Mailstore ‣ Global Servers ‣ Bucket List in the Carbonio Admin Panel to view the Bucket Management UI.
 
Any bucket added to the system will be available when creating a new volume of the following types: Amazon S3, Ceph, Cloudian, EMC, Scality S3, Custom S3, Yandex, Alibaba.
 
New buckets can also be created from the CLI using the carbonio core doCreateBucket command.
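A hedged sketch of what a CLI bucket creation might look like (the bucket name, keys, and label below are placeholders, and the exact argument order may differ between versions, so check the command’s inline help first):

zextras$ carbonio core doCreateBucket S3 MyBucketName MyAccessKey MySecretKey MyBucketLabel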
 
Bucket Paths and Naming
Files are stored in buckets under well-defined paths that can be customised at will, making it easier to identify the contents of your bucket even in multi-server situations with several secondary volumes:
 
/Bucket Name/Destination Path/[Volume Prefix-]serverID/
  • The Bucket Name and Destination Path are not tied to the volume itself, so there can be as many volumes under the same destination path as you like.
  • The Volume Prefix, on the other hand, is unique to each volume and serves as a quick means of differentiating and identifying the various volumes inside the bucket (see the example below).
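For instance, with purely hypothetical values, a bucket named example-bucket, a destination path of carbonio, a volume prefix of vol2, and a server ID of f47ac10b would result in files being stored under:

/example-bucket/carbonio/vol2-f47ac10b/
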
Amazon S3 Tips
There are no specific bucket requirements for storing your secondary Carbonio volumes on Amazon S3; however, we advise creating a dedicated bucket and turning off Static Website Hosting to make maintenance simpler.
User
Programmatic Access is required in order to receive an Access Key and the associated Secret. For simpler management, we advise you to create a dedicated user in Amazon’s IAM service.
 
Rights Administration
You can define access policies for your users in Amazon’s IAM. The user owning your Access Key and Secret must be granted a suitable set of privileges on both the bucket itself and its contents. For simpler administration, we advise assigning full permissions, as in the example below.
Example structure of user’s permission
{
    "Version": "[LATEST API VERSION]",
    "Statement": [
        {
            "Sid": "[AUTOMATICALLY GENERATED]",
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "[BUCKET ARN]/*",
                "[BUCKET ARN]"
            ]
        }
    ]
}

Warning

This is not a valid configuration policy. Don’t copy and paste it into your user’s settings, as it won’t validate.

If you simply want to grant the bare minimum of permissions, change the Action section to:

"Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListBucket"
              ],

According to Amazon’s standard naming convention, the bucket’s ARN is written as arn:partition:service:region:account-id:resource. Please refer to Amazon’s documentation for further details on this subject.
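For an S3 bucket, the region and account-id fields of the ARN are left empty; for example, a hypothetical bucket named example-bucket would appear in the policy as:

"Resource": [
    "arn:aws:s3:::example-bucket/*",
    "arn:aws:s3:::example-bucket"
]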

Infrequent Access Storage Class
Carbonio Storages is compatible with the Amazon S3 Standard - Infrequent Access storage class. As long as the option has been enabled on the volume, Carbonio Storages will assign this storage class to any file bigger than the Infrequent Access Threshold value.
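If you want to double-check on the AWS side which storage class an object actually landed in, a head-object call can be used (a sketch assuming the AWS CLI is installed; the bucket and key names are hypothetical):

zextras$ aws s3api head-object --bucket example-bucket --key carbonio/vol2-f47ac10b/example-blob --query StorageClass
# prints e.g. "STANDARD_IA"; the field is absent (null) for objects in the STANDARD class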
 

See also

The official Amazon S3 documentation on Infrequent Access

Intelligent Tiering Storage Class
Carbonio Storages is compatible with the Amazon S3 Intelligent-Tiering storage class: as long as the option has been enabled on the volume, it will set the necessary Intelligent-Tiering flag on all files.
 

See also

The official Amazon S3 documentation on Intelligent Tiering

Item Deduplication
What is Item Deduplication
Item deduplication is a technique that saves disk space by keeping only one copy of an item and referencing it multiple times, rather than storing several copies of the same item and referencing each copy only once.
 
This might seem a modest improvement, but in practice it makes a big difference.
 
Item Deduplication in Carbonio

Carbonio performs item deduplication when a new item is stored in the Current Primary Volume.

The Message-ID of a newly created item is compared against a list of previously cached items. If there is a match, instead of creating a brand-new BLOB for the message, a hard link to the BLOB of the cached message is generated.
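At the filesystem level, the hard-link step works like the following plain-shell illustration (not Carbonio’s actual code path; the file names are made up):

zextras$ echo "blob payload" > msg-1.msg
zextras$ ln msg-1.msg msg-2.msg        # second reference points at the same BLOB
zextras$ ls -li msg-1.msg msg-2.msg    # same inode, link count 2: only one copy on disk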
 
The following configuration attributes are used to control item deduplication and the dedupe cache in Carbonio.

zimbraPrefDedupeMessagesSentToSelf

Used to set the deduplication behavior for sent-to-self messages:

<attr id="144" name="zimbraPrefDedupeMessagesSentToSelf" type="enum" value="dedupeNone,secondCopyifOnToOrCC,dedupeAll" cardinality="single"
optionalIn="account,cos" flags="accountInherited,domainAdminModifiable">
  <defaultCOSValue>dedupeNone</defaultCOSValue>
  <desc>dedupeNone|secondCopyIfOnToOrCC|moveSentMessageToInbox|dedupeAll</desc>
</attr>

zimbraMessageIdDedupeCacheSize

Number of cached Message IDs:

<attr id="334" name="zimbraMessageIdDedupeCacheSize" type="integer" cardinality="single" optionalIn="globalConfig" min="0">
  <globalConfigValue>3000</globalConfigValue>
  <desc>
    Number of Message-Id header values to keep in the LMTP dedupe cache.
    Subsequent attempts to deliver a message with a matching Message-Id
    to the same mailbox will be ignored.  A value of 0 disables deduping.
  </desc>
</attr>

zimbraPrefMessageIdDedupingEnabled

Manages deduplication at the account or COS level:

<attr id="1198" name="zimbraPrefMessageIdDedupingEnabled" type="boolean" cardinality="single" optionalIn="account,cos" flags="accountInherited"
 since="8.0.0">
  <defaultCOSValue>TRUE</defaultCOSValue>
  <desc>
    Account-level switch that enables message deduping.  See zimbraMessageIdDedupeCacheSize for more details.
  </desc>
</attr>

zimbraMessageIdDedupeCacheTimeout

Timeout for each entry in the dedupe cache:

<attr id="1340" name="zimbraMessageIdDedupeCacheTimeout" type="duration" cardinality="single" optionalIn="globalConfig" since="7.1.4">
  <globalConfigValue>0</globalConfigValue>
  <desc>
    Timeout for a Message-Id entry in the LMTP dedupe cache. A value of 0 indicates no timeout.
    zimbraMessageIdDedupeCacheSize limit is ignored when this is set to a non-zero value.
  </desc>
</attr>
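These attributes can be inspected and modified with the standard provisioning CLI; a brief sketch (the account name and values are examples only):

zextras$ carbonio prov gcf zimbraMessageIdDedupeCacheSize                        # read the global cache size
zextras$ carbonio prov mcf zimbraMessageIdDedupeCacheSize 5000                   # change it globally
zextras$ carbonio prov ma user@example.com zimbraPrefMessageIdDedupingEnabled FALSE   # disable per account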
 
Item Deduplication with Carbonio Storages
The doDeduplicate operation of Carbonio Storages parses a target volume to identify and deduplicate any duplicated items.

While Carbonio’s automated deduplication is restricted to a small cache, Carbonio Storages’s deduplication also detects and handles multiple copies of the same email, independent of any cache or timing. This saves even more disk space.

Running the doDeduplicate operation after a migration or a significant data import is also strongly advised, in order to optimise your storage consumption.
Running Volume Deduplication
Via the CLI

To run a volume deduplication through the CLI, use the carbonio powerstore doDeduplicate command.

zextras$ carbonio powerstore doDeduplicate volume_name [param VALUE[,VALUE]]

Parameter List

NAME              TYPE          EXPECTED VALUES   DEFAULT
volume_name (M)   String[,..]
dry_run (O)       Boolean       true|false        false

(M) == mandatory parameter, (O) == optional parameter

Usage Example

zextras$ carbonio powerstore doDeduplicate secondvolume

Starts a deduplication on volume secondvolume
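To preview what the operation would do without actually modifying anything, add the dry_run parameter from the table above:

zextras$ carbonio powerstore doDeduplicate secondvolume dry_run true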

Use the carbonio core getAllVolumes command to view a list of all available volumes.
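For example, to list the volumes before picking a deduplication target:

zextras$ carbonio core getAllVolumes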
 

doDeduplicate statistics

You can observe the statistics of a doDeduplicate operation while it is running by using the carbonio powerstore monitor [operationID] command; the doDeduplicate operation is a suitable target for the monitor command. Typical output is:

Current Pass (Digest Prefix):  63/64
 Checked Mailboxes:             148/148
 Deduplicated/duplicated Blobs: 64868/137089
 Already Deduplicated Blobs:    71178
 Skipped Blobs:                 0
 Invalid Digests:               0
 Total Space Saved:             21.88 GB
  • Current Pass (Digest Prefix): the doDeduplicate command analyses BLOBs in groups, based on the first character of their digest (name).
  • Checked Mailboxes: number of mailboxes checked during the current pass.
  • Deduplicated/duplicated Blobs: number of BLOBs deduplicated by the current operation / total number of duplicated BLOBs on the volume.
  • Already Deduplicated Blobs: number of duplicated BLOBs on the volume that were already deduplicated by a previous run.
  • Skipped Blobs: BLOBs that have not been examined, typically because of a read error or a missing file.
  • Invalid Digests: BLOBs with an invalid digest (a name that differs from the file’s real digest).
  • Total Space Saved: amount of disk space saved by the doDeduplicate operation.
The example output above shows the following:
 
  • The operation is on its second-to-last pass (63/64) and has checked the last mailbox.
  • 137089 duplicated BLOBs have been found, of which 71178 had already been deduplicated by previous runs.
  • 64868 BLOBs were deduplicated by the current operation, saving a total of 21.88 GB of disk space.