Introduction to Carbonio Storages

Each Carbonio installation consists of one primary volume and a variable number of secondary volumes. The Carbonio Storages module manages the secondary volumes and moves items between them.

Items can be moved using Hierarchical Storage Management (HSM), a policy-based technique. One of the most useful policies, for instance, is to reserve the fastest storage for intensive I/O operations and frequently accessed data, while slower storage handles older data.

The rest of this section covers volumes and their management, policies, HSM, and other advanced techniques.

The Foundations of Carbonio Stores: Store Types and Their Functions

Carbonio supports two distinct kinds of stores:

Index Store

A store that holds the metadata about your data and is used by Apache Lucene for indexing and search.

Data Store

A store that contains all of your Carbonio data, organised in a MySQL database.

You can have more than one store of each kind, but only one Index Store, one Primary Data Store, and one Secondary Data Store can be designated as Current, meaning that it is the one Carbonio is presently using.

Secondary and Primary Data Stores

Carbonio’s data stores come in two flavours: primary data stores and secondary data stores.

Volumes
Carbonio distinguishes three categories of volumes:
 
Primary Current
A volume where new data is written immediately.
 
Secondary Current
A volume to which data is written once an HSM policy has been applied.
 
Not Current
Volumes that are not marked as Current, to which data can be written only through specific manual operations.
 
By default, items are placed in the Primary Current volume of the destination server.
Centralised Storage
The Centralised Storage feature lets an S3 bucket host data from multiple servers at the same time, all sharing a single directory structure. This is in contrast to “independent” volumes, which are self-contained and whose directory structure relates only to the server and the volume itself.
 
This dramatically increases mailbox move speed and enables better data management in big multistore systems.
 
Before using centralised storage, two important aspects need to be considered:
 
  1. Item deduplication is lost.
  2. Only S3 buckets can be used for centralised storage.
Configuring Centralised Storage
Setting up a bucket for centralised storage takes a few steps. The procedure below walks you through creating the bucket and associating the new storage with the various AppServers.
 
The procedure is the same for all supported bucket types, although the syntax or some of the parameters may differ depending on the bucket type. For instance, the URL parameter specifies the API endpoint of the object storage, so it must be given in a form that the object storage itself understands.
The example sets up an S3 bucket; to set up a different type of bucket, use the corresponding command.
 
  1. Use the doCreateBucket command to create an S3 bucket.
The example uses the following values:
  • S3 as the bucket type
  • BucketName as the bucket name, which must match the name on the remote provider exactly for the command to succeed
  • X58Y54E5687R543 as the remote username
  • abCderT577eDfjhf as the remote password
  • My_New_Bucket as the label given to the bucket
  • https://example_bucket_provider.com as the endpoint through which Carbonio Storages connects to the bucket
If the command is successful, it prints the bucket’s unique identifier (UUID), for example 60b8139c-d56f-4012-a928-4b6182756301. Make a note of it, because the rest of the procedure requires it.
 
  2. Test the connection using the bucket ID received in the previous step (60b8139c-d56f-4012-a928-4b6182756301).
If the command is successful, it reports that the connection is fine.
 
  3. Create a volume associated with the bucket on the first AppServer.
The example uses the following values:
 
  • S3: the type of bucket
  • Store_01: the volume name, as defined on the server where the command is run
  • secondary: the type of volume
  • 60b8139c-d56f-4012-a928-4b6182756301: the bucket ID received in step 1
  • volume_prefix: a label assigned to the volume, such as main_vol, which is used for quick searches
  • true: the volume is centralised and can be used by several AppServers
 
  4. Set the volume to current with the corresponding command, so that it can receive data immediately.
The example uses the following values:
 
  • S3: the type of bucket
  • Store_01: the volume name, as defined on the server where the command is run
  • secondary: the type of volume
  5. Once the centralised volume has been created, its configuration must be copied from the first server to every other mailbox server and the volume added to their volume lists. To do so, run two commands on every other AppServer.
The first command uses the following values:
  • S3: the type of bucket
  • Store_01: the volume name, as defined on the server where the command is run
  • mailbox_01.example.com: the _servername_ of the server on which the volume was defined and created
The second command is the one given in the previous step, which sets the volume to current.
Centralised Storage Structure
A centralised volume stores data plainly: the main directory of the volume contains a single empty directory for each server connected to the volume and, at the same level, a directory for each mailbox stored in it.
 
In the example, the servers 595a4409-6aa1-413f-9f45-3ef0f1e560f5 and 3aa2d376-1c59-4b5a-94f6-101602fa69c6 are both connected to the same centralised volume, which holds three mailboxes. The storage layout is independent of the server that actually hosts each mailbox.
Volume Management
Both primary and secondary volumes can be created on local storage as well as on supported third-party storage systems.
 
Carbonio Volumes
A volume is a distinct entity (path) on a filesystem, with all its associated properties, that contains Carbonio Blobs.
 
Volume Properties
All Carbonio volumes are defined by the following properties:
 
  • Name: a unique identifier for the volume.
  • Path: the path where the data is saved. The zextras user must have r/w permissions on this path.
  • Compression: enables or disables file compression for the volume.
  • Compression Threshold: the minimum file size that triggers compression. Files smaller than this size are never compressed, even if compression is enabled.
  • Current: a current volume is one to which data is written immediately upon arrival (primary current) or upon application of an HSM policy (secondary current).
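As an illustration of how Compression and Compression Threshold interact, here is a minimal Python sketch. The function name, the byte sizes, and the choice to compress a file that sits exactly at the threshold are assumptions made for the example, not Carbonio's actual implementation.

```python
def should_compress(file_size: int, compression_enabled: bool, threshold: int) -> bool:
    """Decide whether a blob of `file_size` bytes would be compressed.

    Assumption: a file whose size equals the threshold is compressed;
    the text only states that smaller files never are.
    """
    return compression_enabled and file_size >= threshold


# Hypothetical volume settings: compression on, 4 KiB threshold.
for size in (1024, 4096, 65536):
    print(size, should_compress(size, compression_enabled=True, threshold=4096))
```

With compression disabled the function returns False for every size, which mirrors the rule that the threshold only matters once compression is turned on.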
Local Volumes

Local Volumes (i.e., FileBlob type) can be hosted on any mountpoint on the system, regardless of where the mountpoint is located, and are defined by the following properties:

  • Name: a unique identifier for the volume.
  • Path: the path where the data is saved. The zextras user must have r/w permissions on this path.
  • Compression: enables or disables file compression for the volume.
  • Compression Threshold: the minimum file size that triggers compression. Files smaller than this size are never compressed, even if compression is enabled.
Current Volumes
A current volume is one to which data is written immediately upon arrival (primary current) or upon application of an HSM policy (secondary current). Volumes that are not set as Current are not written to, except through specific manual operations such as the Volume-to-Volume move.
 
Managing Volumes using Carbonio Storages
There are essentially three volume management commands:
 
  zextras$ carbonio powerstore doUpdateVolume [storeType] | doDeleteVolume [name] | doCreateVolume [storeType]
 
Deleting a volume requires only the volume name, whereas the other two commands require the storeType parameter, which always comes first and accepts any value corresponding to an S3-compatible service. The parameters that follow depend on the chosen storeType.
The parameters these commands require therefore vary with the [type] of volume being defined, which is one of the following:
 
  • FileBlob (Local)
  • Alibaba
  • Ceph
  • OpenIO
  • Swift
  • Cloudian (S3-compatible object storage)
  • S3 (Amazon and any S3-compatible solution not explicitly supported)
  • Scality (S3-compatible object storage)
  • EMC (S3-compatible object storage)
  • Custom S3
 
Hierarchical Storage Management
The Hierarchical Storage Management Technique
HSM is a data storage technique that moves data between stores according to a defined policy.
 
The most common use of HSM is to move older data from a faster but more expensive storage device to a slower but cheaper one, based on the following assumptions:
  • Fast storage costs more.
  • Slow storage costs less.
  • Old data is accessed much less frequently than new data.
The HSM technique has two clear benefits: it lowers overall storage costs, because only a small part of your data needs to be on expensive storage, and it improves the overall user experience.
 
Stores, Volumes, and Policies
To use HSM, it is important to understand the following terms:
 
  • Primary Store: the fast-but-expensive store where all your data is initially placed.
  • Secondary Store: the slow-but-cheap store to which older data is moved.
Moving Items Between Stores
The key feature of the Carbonio Storages module is the ability to apply defined HSM policies.
 
To start the move, run the doMoveBlobs operation from the CLI.
 
Once the move has started, the following operations are performed:
 
  1. Carbonio Storages scans the Primary Store to find the items that comply with the defined policy.
  2. All the Blobs of the items found in the first step are copied to the Secondary Store.
  3. The database entries of the copied items are updated to reflect the move.
  4. If (and only if) the second and third steps complete successfully, the old Blobs are deleted from the Primary Store.
 
Each step of the move operation is stateful and is carried out only if the previous one has completed successfully, so the risk of data loss is minimal.
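The copy, update, and delete sequence can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the stores and the database are plain dictionaries, and move_blob and FailingDb are hypothetical names, none of which reflect how Carbonio Storages is actually implemented.

```python
def move_blob(item_id, primary, secondary, db):
    """Move one blob from `primary` to `secondary`, deleting the source last."""
    blob = primary[item_id]

    # Copy the blob to the secondary store.
    secondary[item_id] = blob
    try:
        # Update the database record to point at the new location.
        db[item_id] = "secondary"
    except Exception:
        # Roll back the copy so the item is left exactly as it was.
        secondary.pop(item_id, None)
        raise

    # Only after both previous steps succeeded is the old blob removed.
    del primary[item_id]


class FailingDb(dict):
    """Hypothetical database stub whose updates always fail."""

    def __setitem__(self, key, value):
        raise RuntimeError("database unavailable")


primary, secondary = {"msg-1": b"payload"}, {}
try:
    move_blob("msg-1", primary, secondary, FailingDb())
except RuntimeError:
    pass
print("msg-1" in primary, "msg-1" in secondary)
```

Because the source blob is deleted only at the very end, a failure in the database update leaves the item untouched in the primary store.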
 
doMoveBlobs: The doMoveBlobs Operation of Carbonio Storages
The doMoveBlobs operation is the heart of Carbonio Storages.
 
It moves items between the Current Primary Store and the Current Secondary Store according to the applicable HSM policy.
 
The move is performed by a transactional algorithm. If one of the steps fails, a rollback takes place and no data is changed.
 
Once Carbonio Storages has determined which items need to be moved, the following steps are taken:
  • The Blob is copied to the Current Secondary Store.
  • The Carbonio database is updated to record the item’s new location.
  • The original Blob is deleted from the Current Primary Store.
Every item that complies with the specified HSM policy is moved.
Policy Order
The conditions of a policy are executed in the exact order in which they are listed. Carbonio Storages loops over all the items in the Current Primary Store and applies each condition in full before moving on to the next one.
 
This means that two policies made of the same two conditions, message,document:before:-20day and message:before:-10day has:attachment, listed in opposite orders, produce the same final result when applied every day on a sample server that sends and receives a total of 1000 emails per day, 100 of which contain one or more attachments. However, the policy that lists message:before:-10day has:attachment first will probably take longer to run, possibly much longer, depending on the number and size of the emails stored on the server.
 
This is because, when message,document:before:-20day comes first, it loops over all the items and moves many of them to the Current Secondary Store, leaving fewer items for the second condition to loop over.
 
Conversely, when message:before:-10day has:attachment comes first, fewer items are moved and the second condition has more items to loop over.
This is only an example and does not apply in every case, but it shows why an HSM policy needs careful planning.
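The effect of condition order can be reproduced with a small simulation. The Python sketch below is a toy model, not Carbonio's algorithm: the Item class, the sample data, and the reading of before:-20day as "older than 20 days" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    kind: str
    age_days: int
    has_attachment: bool


def older_messages_and_documents(item):
    # Models the condition message,document:before:-20day
    return item.kind in ("message", "document") and item.age_days > 20


def older_messages_with_attachment(item):
    # Models the condition message:before:-10day has:attachment
    return item.kind == "message" and item.age_days > 10 and item.has_attachment


def apply_policy(items, conditions):
    """Apply each condition in order; return moved ids and items scanned per condition."""
    primary = list(items)
    moved, scanned = set(), []
    for condition in conditions:
        scanned.append(len(primary))
        remaining = []
        for item in primary:
            if condition(item):
                moved.add(item.item_id)   # item goes to the secondary store
            else:
                remaining.append(item)    # item stays for the next condition
        primary = remaining
    return moved, scanned


# Deterministic sample data: 100 messages, one in ten with an attachment.
items = [Item(i, "message", i % 40, i % 10 == 0) for i in range(100)]

moved_a, scanned_a = apply_policy(
    items, [older_messages_and_documents, older_messages_with_attachment])
moved_b, scanned_b = apply_policy(
    items, [older_messages_with_attachment, older_messages_and_documents])
print(moved_a == moved_b, scanned_a, scanned_b)
```

In this toy data set both orders move the same items, but the second condition has fewer items to scan when the broader condition runs first.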
 
Applying the HSM Policy (Running the doMoveBlobs Operation)
Applying a policy means running the doMoveBlobs operation to move items between the Primary and Secondary Stores according to the defined policy.
 
Carbonio Storages gives you two ways to do this:
  • Via the CLI
  • Through scheduling
To apply the HSM policy via the CLI, run the doMoveBlobs command as the zextras user.
Policy Management: What Is a Policy?
An HSM policy is a set of rules that defines which items are moved from the Primary Store to the Secondary Store when the doMoveBlobs operation of Carbonio Storages is run, either manually or by scheduling.
 
A policy can consist of a single rule that applies to all item types (a simple policy) or of multiple rules that apply to one or more item types (a composite policy).
 
Policy Examples
Here are some examples of policies. See the information below to learn how to define policies in the Carbonio Storages module.
 
“Move all items older than 30 days”
 
“Move emails older than 15 days and items of all other kinds older than 30 days”
“Move calendar items older than 15 days, Carbonio Files items older than 20 days, and all emails in the Archive folder”
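To show how age-based rules like these map onto the condition strings quoted in this article (for example message,document:before:-20day), here is a minimal Python sketch. The helper age_condition is hypothetical; only the message and document item types appear in this article, so any other type name passed to it would be an assumption.

```python
def age_condition(item_types, older_than_days):
    """Build a condition string such as message,document:before:-20day."""
    if not item_types or older_than_days <= 0:
        raise ValueError("need at least one item type and a positive age")
    return f"{','.join(item_types)}:before:-{older_than_days}day"


# A composite policy: emails older than 15 days, documents older than 30 days.
policy = [
    age_condition(["message"], 15),
    age_condition(["document"], 30),
]
print("\n".join(policy))
```

Each entry in the list corresponds to one condition of the policy, and the order of the list is the order in which the conditions would be applied.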
 
Defining a Policy
Policies can be defined from the CLI using one of the two available policy management commands.
The difference between them is that setHSMPolicy creates new policies and replaces the existing ones, while +setHSMPolicy adds policies to the existing ones.
 

Carbonio Storages and S3 buckets
Primary and Secondary volumes created with Carbonio Storages can be hosted on S3 buckets, which lets you keep the majority of your data in secure and durable cloud storage.
S3-Compatible Services
Although Carbonio Storages should work out of the box with any storage service compatible with the Amazon S3 API, only the following platforms are officially supported:
  • FileBlob (standard local volume)
  • Amazon S3
  • EMC
  • OpenIO
  • Swift
  • Scality S3
  • Cloudian
  • Custom S3 (any unsupported S3-compliant solution)
The “Incoming” Directory and Primary Volumes
A local “Incoming” directory must exist on a mailbox server before a remote Primary Store may be created there. The default directory is /opt/|carbonio|/incoming; the following commands allow you to view or change the current setting:
Local Cache
When a volume is stored on a third-party remote storage solution, a local directory is used for item caching, and the zextras user must have read and write access to it.
 
If the Local Cache directory is not configured, you will not be able to create any secondary volume on an S3-compatible device or service.
Bucket Setup
Carbonio Storages requires no special S3 configuration or settings, so setting up a bucket for your volumes is simple. Creating a dedicated user, bucket, and access policy is not required, but it is strongly recommended because it makes management much easier.
 
All you need to start storing your secondary volumes on S3 is:
  • An S3 bucket. You need to know the bucket’s name and location to use it.
  • A user’s Access Key and Secret.
  • A policy that grants the user full access to your bucket.
Bucket Management
The Carbonio Admin Panel offers a centralised Bucket Management UI. It lets you save bucket information once and reuse it when creating a new volume on S3-compatible storage, instead of entering it each time.
 
To open the Bucket Management UI, go to Mailstore → Global Servers → Bucket List in the Carbonio Admin Panel.
 
Any bucket added to the system is available when creating a new volume of the following types: Amazon S3, Ceph, Cloudian, EMC, Scality S3, Custom S3, Yandex, Alibaba.
New buckets can also be created from the CLI with the Carbonio Core doCreateBucket command.
 
Files are stored in a bucket under a well-defined path, which can be customised to make the bucket’s contents easier to recognise, even in multi-server scenarios with several secondary volumes.
The Bucket Name and Destination Path are not tied to the volume itself, so you can have as many volumes as you like under the same destination path.
 
The Volume Prefix, on the other hand, is specific to each volume and is a quick way to tell apart and recognise different volumes within the bucket.
Amazon S3 Tips
Bucket
Storing your secondary Carbonio volumes on Amazon S3 has no specific bucket requirements, but we recommend creating a dedicated bucket and disabling Static Website Hosting to make management easier.
User
To obtain an Access Key and the associated Secret, a Programmatic Access user is required. We recommend creating a dedicated user in Amazon’s IAM service for easier management.
 
Rights Management
In Amazon’s IAM you can define access policies for your users. The user of your Access Key and Secret must be granted an appropriate set of rights on both the bucket itself and its contents. For easier management, we recommend granting full rights.
If you prefer to grant only the minimum permissions, restrict the Action section of the policy accordingly.
The bucket’s ARN follows Amazon’s standard naming format, arn:partition:service:region:account-id:resource. See Amazon’s documentation for more details on this topic.
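As a rough illustration of a full-rights policy and of the ARN format, the Python sketch below builds an IAM policy document for a single bucket. The bucket name my-carbonio-bucket and the helper names are hypothetical, and the policy is a generic example of IAM policy syntax rather than a policy prescribed for Carbonio Storages.

```python
import json


def bucket_arn(bucket_name: str) -> str:
    """Return the ARN of an S3 bucket in the standard aws partition.

    For S3 buckets the region and account-id fields of the ARN are empty.
    """
    return f"arn:aws:s3:::{bucket_name}"


def full_access_policy(bucket_name: str) -> dict:
    """Build a policy granting every S3 action on the bucket and its contents."""
    arn = bucket_arn(bucket_name)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:*"],
                # The first ARN covers the bucket itself, the second its objects.
                "Resource": [arn, f"{arn}/*"],
            }
        ],
    }


print(json.dumps(full_access_policy("my-carbonio-bucket"), indent=2))
```

Granting only the minimum permissions would mean replacing the s3:* entry with an explicit list of actions, while the Resource entries stay the same.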

 
Infrequent Access Storage Class
Carbonio Storages is compatible with the Amazon S3 Standard – Infrequent Access storage class and, as long as the option has been enabled on the volume, sets any file larger than the Infrequent Access Threshold value to this storage class.
Intelligent Tiering Storage Class
Carbonio Storages is compatible with the Amazon S3 – Intelligent Tiering storage class and, as long as the option has been enabled on the volume, sets the appropriate Intelligent Tiering flag on all files.
What Is Item Deduplication?
Item deduplication is a technique that saves disk space by storing a single copy of an item and referencing it multiple times, instead of storing several copies of the same item and referencing each copy only once.
 
This might seem a minor improvement, but in practice it makes a significant difference.
Item Deduplication in Carbonio
Carbonio performs item deduplication when a new item is stored in the Current Primary Volume.
 
When a new item is created, its message ID is compared with a list of cached items. If there is a match, a hard link to the cached message’s BLOB is created instead of a brand-new BLOB for the message.
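The hard-link mechanism can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the cache is an in-memory dictionary keyed by message ID, the file naming is arbitrary, and store_message is a hypothetical helper, not Carbonio code. It also assumes a filesystem that supports hard links.

```python
import os
import tempfile


def store_message(message_id, payload, volume_dir, cache):
    """Store a message blob, hard-linking to a cached blob when the ID is known."""
    target = os.path.join(volume_dir, f"blob-{len(os.listdir(volume_dir))}")
    cached_path = cache.get(message_id)
    if cached_path is not None:
        # Same message seen before: create a hard link instead of a new blob.
        os.link(cached_path, target)
    else:
        # First time this message is seen: write the blob and remember it.
        with open(target, "wb") as handle:
            handle.write(payload)
        cache[message_id] = target
    return target


with tempfile.TemporaryDirectory() as volume:
    cache = {}
    first = store_message("<id-1@example.com>", b"hello", volume, cache)
    second = store_message("<id-1@example.com>", b"hello", volume, cache)
    # Both paths refer to the same underlying file, so the data is stored once.
    print(os.path.samefile(first, second))
```

Both entries appear as regular files in the volume, but they share the same data on disk, which is where the space saving comes from.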
 
The dedupe cache is controlled through configuration attributes in Carbonio.
Item Deduplication and Carbonio Storages
The doDeduplicate operation of Carbonio Storages parses a target volume to find and deduplicate any duplicated items.
 
Carbonio’s automatic deduplication is limited to a small cache, whereas the deduplication performed by Carbonio Storages also finds and handles multiple copies of the same email regardless of any cache or timing, saving even more disk space.
 
Running the doDeduplicate operation is also strongly recommended after a migration or a large data import, in order to optimise storage usage.
Running a Volume Deduplication
Use the carbonio getAllVolumes command to list all available volumes.
doDeduplicate Stats
The doDeduplicate operation is a valid target for the monitor command, so you can watch its statistics while it is running with the carbonio powerstore monitor [operationID] command. The output includes the following fields:
  • Current Pass (Digest Prefix): the doDeduplicate command analyses the BLOBs in groups, based on the first character of their digest (name).
  • Checked Mailboxes: the number of mailboxes analysed during the current pass.
  • Deduplicated/Duplicated Blobs: the number of BLOBs deduplicated by the current operation / the total number of duplicated items on the volume.
  • Already Deduplicated Blobs: the number of duplicated BLOBs on the volume that were already deduplicated by a previous run.
  • Skipped Blobs: BLOBs that have not been analysed, usually because of a read error or a missing file.
  • Invalid Digests: BLOBs with a bad digest (a name that differs from the file’s actual digest).
  • Total Space Saved: the amount of disk space freed by the doDeduplicate operation.
The example output shows the following:
 
  • The operation is running the second-to-last pass on the last mailbox.
  • 137089 duplicated BLOBs have been found, 71178 of which had already been deduplicated.
  • The current operation has deduplicated 64868 BLOBs, saving a total of 21.88GB of disk space.
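The figures in this example can be cross-checked with a little arithmetic. The Python sketch below assumes that the BLOBs deduplicated by earlier runs are counted within the 137089 total, as the wording suggests; the variable names are illustrative.

```python
total_duplicated = 137089          # duplicated BLOBs found on the volume
already_deduplicated = 71178       # handled by previous runs
deduplicated_by_this_run = 64868   # handled so far by the current operation

# Duplicates that were left for the current run to process.
left_for_this_run = total_duplicated - already_deduplicated

# Duplicates not yet deduplicated by the current run.
still_pending = left_for_this_run - deduplicated_by_this_run

print(left_for_this_run, still_pending)
```

Under that reading, roughly a thousand duplicated BLOBs are still pending or were skipped, which is consistent with an operation that is close to finishing.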
Advanced Volume Operations
Carbonio Storages: More than Meets the Eye
At first glance, Carbonio Storages seems to be focused on HSM alone. However, it also includes a number of very useful volume-related tools that are unrelated to HSM.
 
Because of the risks involved in volume management, these tools are available only through the CLI.
 
Volume Operations at a Glance
The following volume operations are available:
 
doCheckBlobs: Perform BLOB coherency checks on one or more volumes.
 
doDeduplicate: Start item deduplication on a volume.
 
doVolumeToVolumeMove: Move all items from one volume to another.
 
getVolumeStats: Display information about a volume’s size and the number of items and blobs it contains.
 
