In many enterprise integration scenarios, data security and efficient handling of large datasets are non-negotiable. Modern enterprises struggle with transferring massive datasets securely to cloud storage while maintaining optimal performance. Traditional upload methods often fail when handling large data volumes due to network timeouts, memory constraints, and security vulnerabilities.
This implementation demonstrates how combining MuleSoft’s integration capabilities with Java-based client-side encryption and AWS S3’s multi-part upload features creates a robust solution for enterprise-scale data operations. The architecture processes datasets from gigabytes to terabytes while keeping memory usage constant and encrypting all data before transmission.
Encrypting data while uploading to AWS S3 with MuleSoft
Enterprise data integration projects frequently encounter challenges when transferring large datasets to cloud storage, particularly when dealing with sensitive information requiring end-to-end encryption.
The combination of MuleSoft’s enterprise service bus capabilities, Java’s robust encryption libraries, and AWS S3’s scalable storage creates a powerful architecture for handling massive data volumes securely and efficiently. This implementation leverages client-side AES-256 encryption through Java, ensuring data is encrypted before leaving the enterprise network perimeter.
When dealing with large file uploads to Amazon S3, performance, resilience, and security are critical. Amazon S3’s Multipart Upload capability improves efficiency by allowing you to upload large files in parallelized chunks, reducing the risk of failure and improving throughput. However, when sensitive data is involved, security must come first. That’s where client-side encryption becomes vital: it ensures your files are encrypted before they leave your network and gives you full control over encryption keys and compliance requirements.
We’ll discuss how to build a scalable, secure pipeline that encrypts data using a custom AES key and uploads it in parts to an AWS S3 bucket – all orchestrated via MuleSoft and Java.
Multi-part upload with client-side encryption architecture
The multi-part upload mechanism serves as the foundation for handling large datasets efficiently, breaking down massive files into manageable chunks that can be processed and uploaded in parallel. Each part is independently encrypted using AES-256 encryption with customer-managed keys retrieved from secure key management systems like HashiCorp Vault or AWS KMS.
The Java encryption component creates individual AES encryption contexts for each part, ensuring that even if one part is compromised, the remaining data remains secure. This part-level encryption strategy also enables parallel processing, where multiple encryption and upload operations can occur simultaneously without compromising security.
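As a concrete illustration of that key-management step, the sketch below shows one way a customer-managed AES-256 data key could be fetched from AWS KMS and wrapped as a Java SecretKey. This is an assumption-laden example: the key alias is a placeholder, and the implementation described here may instead pull its key from HashiCorp Vault or a secure Mule property.

// Hypothetical sketch only: fetching a customer-managed AES-256 data key from AWS KMS.
// The key alias is a placeholder; the actual implementation may source its key elsewhere.
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import com.amazonaws.services.kms.AWSKMS;
import com.amazonaws.services.kms.AWSKMSClientBuilder;
import com.amazonaws.services.kms.model.DataKeySpec;
import com.amazonaws.services.kms.model.GenerateDataKeyRequest;
import com.amazonaws.services.kms.model.GenerateDataKeyResult;

public class KmsDataKeyExample {
    public static SecretKey fetchDataKey() {
        AWSKMS kms = AWSKMSClientBuilder.defaultClient();
        GenerateDataKeyRequest request = new GenerateDataKeyRequest()
                .withKeyId("alias/data-upload-key")     // placeholder key alias
                .withKeySpec(DataKeySpec.AES_256);      // request a 256-bit AES data key
        GenerateDataKeyResult result = kms.generateDataKey(request);
        ByteBuffer plaintext = result.getPlaintext();   // plaintext key material; never persist to disk
        byte[] keyBytes = new byte[plaintext.remaining()];
        plaintext.get(keyBytes);
        return new SecretKeySpec(keyBytes, "AES");
    }
}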
Problem statement
- Incrementally extract data from a database or any source
- Secure the data before transmission by encrypting it on the client side
- Upload the data efficiently using AWS S3’s multi-part upload feature
- Build a reusable, secure, and MuleSoft-compatible module

Architecture overview
The solution combines several key components optimized for large data handling:
- MuleSoft integration platform: Orchestrates high-volume data flows with batch processing
- Custom Java encryption layer: Implements client-side AES-256 encryption for data at any scale
- AWS S3 multi-part upload: Enables efficient large file uploads with parallel processing
- Streaming database processing: Handles large result sets without memory exhaustion

High-level flow
1. MuleSoft invokes a pagination-enabled SQL query to fetch a batch of incremental records
2. The data is serialized and passed to the custom Java module
3. Java encrypts the data using Advanced Encryption Standard (AES)
4. The encrypted data is uploaded to AWS S3 using the multipart upload API
5. MuleSoft continues with the next page of data
Pagination logic in MuleSoft
On the MuleSoft side, we built pagination logic to handle large datasets without overwhelming the memory footprint. This was done using:
- Query parameters like OFFSET and LIMIT
- A While scope to continue fetching pages until no more results exist
- Conditional checks to break the loop
Each batch of data is passed to the Java module for encryption:
public class AWSS3ClientSideEncryption {

    private static AmazonS3Encryption s3ClientEncryption;
    private static final Logger logger = LoggerFactory.getLogger(AWSS3ClientSideEncryption.class);

    public AWSS3ClientSideEncryption() {
    }

    /**
     * Constructor to initialize an AWSS3ClientSideEncryption object
     * @param accessKey AWS S3 accessKey
     * @param accessSecret AWS S3 accessSecret
     * @param encryptionKey Client-side encryption key
     */
    public AWSS3ClientSideEncryption(String accessKey, String accessSecret, String encryptionKey)
            throws NoSuchProviderException, InvalidKeySpecException, IOException, NoSuchAlgorithmException {
        // Register Bouncy Castle as the JCE provider used for AES encryption
        Security.addProvider(new BouncyCastleProvider());
        SecretKey aesKey = null;
        try {
            // Derive the AES key from the client-supplied encryption key
            aesKey = generateAESKey(encryptionKey);
        } catch (Exception e) {
            logger.error("Failed to generate AES key", e);
        }
        BasicAWSCredentials awsCreds = new BasicAWSCredentials(accessKey, accessSecret);
        try {
            // Helper (not shown) that sets the AWS_REGION environment variable; "REGION" is a placeholder
            updateEnv("AWS_REGION", "REGION");
        } catch (ReflectiveOperationException e) {
            logger.error("Failed to set AWS_REGION", e);
        }
        // Build an S3 encryption client that encrypts every object client-side with the AES key
        AWSS3ClientSideEncryption.s3ClientEncryption = AmazonS3EncryptionClient.encryptionBuilder()
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds)).withRegion(Regions.REGION)
                .withEncryptionMaterials(new StaticEncryptionMaterialsProvider(new EncryptionMaterials(aesKey)))
                .build();
    }
    /**
     * Method to place the client-side encrypted file in AWS S3
     * @param bucket AWS S3 bucket name
     * @param key AWS S3 key name
     * @param content Content of the file to be placed in S3
     * @return PutObjectResult for the object created in S3
     */
    public PutObjectResult encryptPayload(String bucket, String key, String content) {
        // Encrypt and store the object; encryption happens client-side before the PUT request
        PutObjectResult putObjectResult = s3ClientEncryption.putObject(bucket, key, content);
        return putObjectResult;
    }
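Note that the constructor relies on a generateAESKey helper that isn’t shown in the snippet above. A minimal sketch, assuming the client-side encryption key is supplied as a Base64-encoded 256-bit value, might look like the following; the decoding scheme is an assumption rather than the exact implementation.

    // Hypothetical sketch of the generateAESKey helper referenced in the constructor,
    // assuming the encryption key is passed in as a Base64-encoded 256-bit value.
    private static SecretKey generateAESKey(String encryptionKey) {
        byte[] keyBytes = java.util.Base64.getDecoder().decode(encryptionKey);
        return new javax.crypto.spec.SecretKeySpec(keyBytes, "AES");
    }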
Multipart upload to AWS S3
To handle large files efficiently, we used AWS S3’s Multipart Upload API. This allows for resumable uploads, parallel uploads, and optimized performance. Here’s a peek at the logic used to perform the multipart upload to S3:
    /**
     * Method to initiate a multi-part upload to AWS S3
     * @param bucket AWS S3 bucket name
     * @param fileName name of the file to be created in S3
     * @return uploadId identifier of the initiated multi-part upload
     */
    public String initiateMultiPartUpload(String bucket, String fileName) {
        InitiateMultipartUploadRequest initiateMultiPartUploadRequest = new InitiateMultipartUploadRequest(bucket,
                fileName);
        InitiateMultipartUploadResult initiateMultipartUploadResult = s3ClientEncryption
                .initiateMultipartUpload(initiateMultiPartUploadRequest);
        logger.info("Successfully initiated multi-part upload. UploadId is :: "
                + initiateMultipartUploadResult.getUploadId());
        return initiateMultipartUploadResult.getUploadId();
    }
Upload the parts:
    /**
     * Method to encrypt and upload one batch of data as parts of a multi-part upload
     * @param uploadId upload id from S3
     * @param bucketName AWS S3 bucket name
     * @param objectName AWS S3 key name
     * @param content content of the current batch
     * @param partNumber part number to start from for this batch
     * @param isLast whether this batch is the final one of the upload
     * @param partSize part size in bytes (5MB minimum for all but the last part)
     * @return list of part ETags required to complete the multi-part upload
     */
    public List<PartETag> uploadpart(String uploadId, String bucketName, String objectName,
            String content, Integer partNumber, Boolean isLast, Integer partSize) {
        List<PartETag> partETags = new ArrayList<>();
        byte[] payload = content.getBytes(StandardCharsets.UTF_8);
        InputStream is = new ByteArrayInputStream(payload);
        int offset = 0;
        // Process large data in fixed-size segments to keep memory usage constant;
        // with client-side encryption, every part except the last must be a multiple of the cipher block size.
        while (offset < payload.length) {
            int currentPartSize = Math.min(partSize, payload.length - offset);
            boolean lastPart = isLast && (offset + currentPartSize >= payload.length);
            UploadPartRequest uploadPartRequest = new UploadPartRequest()
                    .withUploadId(uploadId).withBucketName(bucketName).withKey(objectName)
                    .withPartNumber(partNumber++).withInputStream(is)
                    .withPartSize(currentPartSize).withLastPart(lastPart);
            // Each segment is encrypted by the S3 encryption client before it leaves the JVM
            UploadPartResult uploadPartResult = s3ClientEncryption.uploadPart(uploadPartRequest);
            partETags.add(uploadPartResult.getPartETag());
            offset += currentPartSize;
        }
        return partETags;
    }
Complete the multipart upload:
    /**
     * Method to complete the multi-part upload to AWS S3
     * @param uploadId upload id from S3
     * @param bucket AWS S3 bucket name
     * @param fileName name of the file created in S3
     * @param partETags list of part ETags collected from the uploaded parts
     * @return eTag of the completed object in S3
     */
    public String completeMultiPartUpload(String uploadId, String bucket, String fileName, List<PartETag> partETags) {
        CompleteMultipartUploadRequest completeMultipartUploadRequest = new CompleteMultipartUploadRequest()
                .withUploadId(uploadId).withBucketName(bucket).withKey(fileName).withPartETags(partETags);
        CompleteMultipartUploadResult completeMultipartUploadResult = s3ClientEncryption
                .completeMultipartUpload(completeMultipartUploadRequest);
        return completeMultipartUploadResult.getETag();
    }
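Outside of the Mule flow, the three methods can be chained directly in Java. The following is a minimal usage sketch, assuming the paginated batches have already been collected into a list and using placeholder credentials, bucket, and file names; it also assumes uploadpart returns the part ETags as shown above.

    // Minimal usage sketch; credentials, bucket, file name, and batches are placeholders.
    // Call this from a method that propagates the constructor's checked exceptions.
    AWSS3ClientSideEncryption s3Client =
            new AWSS3ClientSideEncryption(accessKey, accessSecret, encryptionKey);
    String uploadId = s3Client.initiateMultiPartUpload("my-bucket", "large-dataset.csv");
    List<PartETag> partETags = new ArrayList<>();
    for (int i = 0; i < batches.size(); i++) {                  // batches produced by the paginated query
        boolean isLast = (i == batches.size() - 1);
        partETags.addAll(s3Client.uploadpart(uploadId, "my-bucket", "large-dataset.csv",
                batches.get(i), partETags.size() + 1, isLast, 10 * 1024 * 1024));
    }
    String eTag = s3Client.completeMultiPartUpload(uploadId, "my-bucket", "large-dataset.csv", partETags);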
Dynamic part sizing for optimal performance
For datasets under 1GB, the system uses 10MB parts with standard processing, while larger datasets benefit from 50-100MB parts that maximize throughput. Configurable sizing adjusts the part size based on real-time bandwidth analysis, so high-bandwidth connections can leverage larger parts for increased efficiency.
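As a rough illustration of those sizing rules, a helper along the following lines could choose the part size. The thresholds are assumptions based on the guidance above, not values taken from the implementation.

    // Hypothetical sizing helper; thresholds follow the guidance in this section.
    private static int choosePartSize(long totalBytes) {
        final long ONE_GB = 1024L * 1024 * 1024;
        final int ONE_MB = 1024 * 1024;
        if (totalBytes < ONE_GB) {
            return 10 * ONE_MB;   // standard processing for datasets under 1GB
        }
        return 64 * ONE_MB;       // larger parts (in the 50-100MB range) maximize throughput
    }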
Benefits
- Memory-efficient: Processes TB-scale data without loading entire datasets into memory
- Parallel processing: Multiple 5MB+ chunks uploaded simultaneously for maximum throughput
- Resumable operations: Large uploads can resume from failed parts, not from the beginning
- Scalable architecture: Seamlessly handles datasets from GB to multi-TB scale
Java class initialization in Mule
<java:invoke-static doc:name="newS3EncryptionObject"
doc:id="67c69083-c7f3-49d6-a5f0-c35e4d3bdd63" class="com.sfdc.edh.AWSS3ClientSideEncryption"
method="evaluate(String,String,String)" target="encryptor">
<java:args><![CDATA[#[{
arg0 : Mule::p('aws.accessKey'),
arg1 : Mule::p('aws.accessSecret'),
arg2 : Mule::p('aws.clientSideEncryptionKey')
}]]]></java:args>
</java:invoke-static>
<java:invoke doc:name="initiate Multi-part upload" doc:id="868be535-ff39-4c29-acd5-8c41b74bad7a"
instance="#[vars.encryptor]" class="com.sfdc.edh.AWSS3ClientSideEncryption"
method="initiateMultiPartUpload(String, String)" target="uploadId">
<java:args><![CDATA[#[{
arg0: Mule::p('aws.bucketName') as String,
arg1: vars.fileName as String
}]]]></java:args>
</java:invoke>
<java:invoke doc:name="Upload Part" doc:id="4d438f37-ed76-4933-b65a-bd9346762bde"
class="com.sfdc.edh.AWSS3ClientSideEncryption"
method="uploadpart(String, String, String, String, Integer, Boolean, Integer)" instance="#[vars.encryptor]">
<java:args><![CDATA[#[{
arg0: vars.uploadId,
arg1: Mule::p('aws.bucketName'),
arg2: vars.fileName,
arg3: vars.partData,
arg4: vars.partIndex as Number,
arg5: vars.isLast as Boolean,
arg6: Mule::p('min.part.size') as Number
}]]]></java:args>
</java:invoke>
Benefits
- Security: Full control of encryption keys; data encrypted before network transmission
- Efficiency: Multipart upload seamlessly handles large files
- Modularity: The Java class is reusable and maintainable
- Scalability: Pagination and streaming prevent memory bloat
Secure, scalable enterprise data uploads
By combining the orchestration power of MuleSoft, the cryptographic control of Java, and the scalability of AWS S3, we created a secure and production-ready pattern for enterprise data uploads. If you’re handling regulated data or building a high-volume pipeline, this approach offers both security and scale by design.