Kieran Blackwood

Data Engineer

Phone: (555) 847-2963
Address: Seattle, WA
Website: https://github.com/kblackwood
Email:

Summary
  • Experienced Data Platform Engineer with 5+ years building scalable data infrastructure processing 50TB+ daily across cloud and hybrid environments
  • Specialized in real-time streaming architectures using Kafka and Spark, delivering sub-second analytics for business-critical applications
  • Proven track record of migrating legacy systems to modern data platforms, reducing costs by 40% while improving performance by 3x

Work Experience

CloudTech Solutions

Senior Data Engineer

March 2022 - Present
  • Architected multi-tenant data platform on AWS processing 75TB daily data from 200+ sources, supporting 500+ concurrent users with 99.95% uptime
  • Designed real-time fraud detection pipeline using Kafka Streams and Apache Flink, reducing detection time from 24 hours to 30 seconds
  • Optimized Snowflake data warehouse performance through intelligent clustering and materialized views, cutting query costs by 45% while maintaining sub-second response times
  • Led migration of 15 legacy ETL jobs to modern ELT architecture using dbt and Airflow, reducing maintenance overhead by 60%
  • Implemented data quality framework with Great Expectations, achieving 99.8% data accuracy across critical business metrics

FinanceFlow Inc

Data Engineer

June 2020 - February 2022
  • Built end-to-end analytics pipeline processing 2M financial transactions daily using Apache Spark and Delta Lake, enabling real-time regulatory reporting
  • Migrated on-premise Teradata warehouse to AWS Redshift, reducing infrastructure costs by $300K annually while improving query performance by 4x
  • Developed automated data reconciliation system using Python and Pandas, reducing manual validation time from 8 hours to 15 minutes daily
  • Created self-service analytics platform using Apache Superset, empowering 50+ business users to generate reports independently
  • Established CI/CD pipelines for data workflows using GitLab and Terraform, reducing deployment time from 4 hours to 20 minutes

RetailMetrics Corp

Junior Data Engineer

August 2019 - May 2020
  • Developed ETL pipelines using Python and Apache Airflow to process e-commerce data from 25+ sources, supporting $50M annual revenue analytics
  • Optimized PostgreSQL database queries reducing average report generation time from 45 minutes to 5 minutes
  • Built automated data monitoring system using Prometheus and Grafana, reducing data incident response time by 70%
  • Collaborated with data science team to productionize ML models, implementing feature stores serving 100+ model predictions per second

Technical Skills

Programming Languages: Python, Scala, SQL, Java, Bash
Big Data Technologies: Apache Spark, Hadoop, Kafka, Flink, Storm
Cloud Platforms: AWS (S3, EMR, Glue, Redshift, Lambda), GCP (BigQuery, Dataflow, Pub/Sub), Azure (Synapse, Data Factory)
Databases: PostgreSQL, MySQL, MongoDB, Cassandra, Redis, Snowflake, BigQuery
Data Orchestration: Apache Airflow, Prefect, Luigi, dbt
Data Modeling: Dimensional Modeling, Data Vault 2.0, Star Schema, Kimball Methodology
DevOps & Tools: Docker, Kubernetes, Git, CI/CD, Terraform, Jenkins, Prometheus

Education

University of Washington

Bachelor of Science in Computer Science | GPA: 3.8/4.0

May 2019
Relevant Coursework: Database Systems, Distributed Computing, Big Data Analytics, Machine Learning, Data Structures & Algorithms, Statistics
Capstone Project: Built real-time recommendation engine processing 1M user interactions daily using Apache Kafka and collaborative filtering algorithms

Certifications

AWS Certified Data Analytics - Specialty

Amazon Web Services

March 2023

Google Cloud Professional Data Engineer

Google Cloud

January 2023

Databricks Certified Associate Developer for Apache Spark

Databricks

November 2022

Snowflake SnowPro Core Certification

Snowflake

September 2022

Honors

1st Place, AWS re:Invent Hackathon

Amazon Web Services

December 2023
Designed a serverless data lake solution processing IoT sensor data with 99.9% accuracy, and implemented a cost-optimization algorithm using intelligent tiering that reduced storage costs by 55%

CloudTech Innovation Award

CloudTech Solutions

September 2023
Recognized for developing automated data lineage tracking system used across 15+ data teams

Publications

Optimizing Spark Performance for Large-Scale ETL Workloads

Data Engineering Weekly

August 2023
Technical article demonstrating a 60% performance improvement through advanced partitioning strategies; the featured approach has since been adopted by 500+ data engineers, based on community feedback

Open Source Contributions

Apache Airflow

Contributor

2022-2023
Contributed 3 bug fixes and performance improvements

dbt-utils

Contributor

Developed custom macro for data quality validation, adopted by 200+ projects