
Brasidas - Utilization of AWS services, pipeline for unstructured data
Headquartered near Zurich, Switzerland, with branch offices in New Jersey, USA, and Belgrade, Serbia, Brasidas Group is a multinational strategic intelligence and risk advisory company. Guided by the principles of secrecy, dependability, truthfulness, and timeliness, they provide high-end, customized business intelligence services. Their goal as a partner in global risk advisory is to predict today's international headlines and to provide useful information that changes lives. They asked us to help them make use of AWS services and manage their data lake.
The starting point
Brasidas had a variety of unstructured data in different formats and locations. They wanted to aggregate these data sources into a centralized data lake, use AWS ETL services to ingest the data, and then query it via SQL for later analysis.
The challenge
There were several possible ways to approach the data pipeline. The biggest challenge was establishing connections to the various resources and converting unsupported data formats. We addressed the conversion with two prototypes: a Glue Python script and an AWS Glue DataBrew pipeline.
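To illustrate the kind of conversion a Glue Python script might perform, here is a minimal sketch. The case study does not name the unsupported formats, so the input format below (blank-line-separated blocks of key=value lines) and the function names are purely hypothetical. The sketch converts such text into JSON Lines, a format that Glue crawlers and Athena can read. Reading the input from S3 and writing the output back are left out.

```python
import json
from typing import Iterator


def parse_records(text: str) -> Iterator[dict]:
    """Yield one dict per blank-line-separated block of key=value lines.

    The input format is a hypothetical stand-in for an unsupported source format.
    """
    record: dict = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current record, if there is one.
            if record:
                yield record
                record = {}
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not key=value pairs
        record[key.strip()] = value.strip()
    if record:
        yield record


def to_json_lines(text: str) -> str:
    """Serialize the parsed records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in parse_records(text))


sample = "id=1\nsource=report\n\nid=2\nsource=feed\n"
print(to_json_lines(sample))
```

In a real job, the converted output would typically be written to an S3 prefix that a crawler then catalogs.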
What we did
The architecture consists of a three-tier VPC for public, private, and database workloads, an RDS instance, an EC2 bastion host, S3 buckets, and Glue jobs and crawlers. Athena is used to query the crawled data with SQL statements.
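As a rough sketch of how the crawled data could be queried programmatically, the function below starts an Athena query and polls until it finishes. It expects an Athena client such as the one returned by `boto3.client("athena")`. The database name, table name, and S3 output location in the usage comment are assumptions and do not come from the project.

```python
import time

# Query states after which Athena will not change the status again.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query and block until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)


# Hypothetical usage (all names are placeholders):
#   import boto3
#   athena = boto3.client("athena")
#   run_athena_query(
#       athena,
#       "SELECT * FROM example_table LIMIT 10",
#       database="example_database",
#       output_location="s3://example-athena-results/",
#   )
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, without needing AWS credentials.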
Confidential data such as secrets is stored in AWS Secrets Manager and encrypted with a project-bound KMS key. The IAM security policies are based on ABAC (attribute-based access control), meaning that resources must carry the appropriate attributes (tags) before AWS services can access them.
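The ABAC idea can be sketched as an IAM policy document that allows reading a secret only when the secret's tag matches the caller's tag. The project itself defines its policies in Terraform. The Python below only builds the equivalent JSON for illustration, and the tag key `Project` and the statement ID are assumptions rather than the project's actual values.

```python
import json


def abac_secret_policy(tag_key: str = "Project") -> dict:
    """Build an IAM policy allowing GetSecretValue only on secrets whose
    tag value equals the calling principal's tag value for `tag_key`.

    The tag key is a hypothetical example, not taken from the project.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSecretsWithMatchingTag",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        # Resource tag must equal the principal's tag.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
            }
        ],
    }


print(json.dumps(abac_secret_policy(), indent=2))
```

With a policy like this, access is granted through tagging: a newly tagged secret becomes readable to matching principals without editing the policy itself.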
The project includes several examples of using Glue for ETL processing and of enabling Glue to connect to various services and resources in order to fetch data.
The entire project is written in Terraform. To avoid configuration drift, all updates and changes should be made in Terraform rather than through the AWS Console.

Results
Testimonial
