With its headquarters close to Zurich, Switzerland, and branch offices in New Jersey, USA, and Belgrade, Serbia, Brasidas Group is a multinational strategic intelligence and risk advisory company. Built on the principles of confidentiality, reliability, integrity, and timeliness, they provide high-end, customized business intelligence services. As a partner in global risk advisory, their goal is to anticipate today's international headlines and provide useful information that changes lives. They asked us to help them make better use of AWS services and manage their data lake.
Brasidas had a variety of unstructured data in different formats and locations. Their wish was to aggregate these multiple data sources into a centralized Data Lake, use AWS ETL services to ingest the data, and then query it via SQL for later data-analytical purposes.
The project considered several approaches to the data pipeline, but the biggest challenge was connecting to the various resources and converting unsupported data formats. This was handled in two ways: a prototype Glue Python script as one example, and an AWS Glue DataBrew pipeline as a second.
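To give a feel for the first approach, below is a minimal sketch of what such a Glue Python (PySpark) conversion job might look like. The bucket paths, the CSV-to-Parquet conversion and the job parameters are illustrative assumptions, not the client's actual resources or formats.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job bootstrap: resolve the job name argument and create contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw source files from the landing bucket (placeholder path and format).
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-raw-bucket/landing/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Write the data back out as Parquet so Glue crawlers and Athena can work with it.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/parquet/"},
    format="parquet",
)

job.commit()
```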
The architecture consists of a three-tier VPC for public, private and database workloads, an RDS instance, a bastion host EC2 instance, S3 buckets, and Glue jobs and crawlers. Athena is used to query the crawled data via SQL statements.
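On the querying side, the following is a rough boto3 sketch of how data cataloged by the Glue crawlers could be queried through Athena. The database, table and result-bucket names are assumptions made for the example only.

```python
import time
import boto3

# Placeholder names: the Glue database, table and results bucket are illustrative.
DATABASE = "example_datalake_db"
OUTPUT_LOCATION = "s3://example-athena-results/queries/"

athena = boto3.client("athena")

# Start a SQL query against a table created by a Glue crawler.
query = athena.start_query_execution(
    QueryString="SELECT * FROM example_table LIMIT 10",
    QueryExecutionContext={"Database": DATABASE},
    ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
)
execution_id = query["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=execution_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```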
Any confidential data, such as secrets, is kept in AWS Secrets Manager and encrypted with a project-bound KMS key. The IAM security policies are based on ABAC (attribute-based access control), meaning resources must carry the appropriate attributes (tags) in order to be accessed by AWS services.
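As an example of how a workload might consume such a secret, the sketch below retrieves database credentials with boto3. The secret name and JSON keys are placeholders; decryption with the project-bound KMS key happens server-side, provided the caller's role satisfies the ABAC conditions.

```python
import json
import boto3

# Placeholder secret name; in the project the actual name is defined in Terraform.
SECRET_NAME = "example/rds/credentials"

secrets = boto3.client("secretsmanager")

# Retrieve the secret; Secrets Manager decrypts it with the KMS key transparently
# as long as the calling role is allowed to use that key.
response = secrets.get_secret_value(SecretId=SECRET_NAME)
credentials = json.loads(response["SecretString"])

db_user = credentials["username"]
db_password = credentials["password"]
```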
The project demonstrates several examples of using Glue for the ETL process, as well as allowing Glue to communicate with various services and resources in order to fetch data.
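One such example, sketched below under assumed catalog names, is a Glue job step that fetches a table from the RDS instance via the Glue Data Catalog (populated by a crawler through a Glue connection inside the VPC) and lands it in S3 as Parquet alongside the other sources.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Catalog database/table names are placeholders for entries created by a Glue
# crawler that cataloged the RDS instance through a Glue connection in the VPC.
rds_frame = glue_context.create_dynamic_frame.from_catalog(
    database="example_datalake_db",
    table_name="example_rds_public_customers",
)

# Land a copy of the relational data in the curated S3 bucket as Parquet,
# where it can be crawled and queried together with the other sources.
glue_context.write_dynamic_frame.from_options(
    frame=rds_frame,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/rds/customers/"},
    format="parquet",
)
```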
The entire project is written in Terraform. Updates and changes should be made in Terraform rather than through the Console/UI in order to avoid configuration drift.
The client can use the PoC to build a full pipeline, all the way from ingesting data to querying it for further analytics. The PoC follows best practices, applies security measures that enforce a zero-trust policy, and its infrastructure is written in Terraform and version controlled in CodeCommit. All changes to the stack are visible and can go through a series of reviews before being deployed to production.
No matter the field, the situation or the initial setup, we delivered. We love to make the journey to the cloud easy and secure, and that is how our clients like it. Please see our success stories.
Tell us more about you and we'll connect you with a TrustSoft expert who can give you more information about our products and services.