Data Engineer | Data Scientist | Technical Writer

Building data systems people can trust, use, and understand.

I build scalable data pipelines, extract actionable insights with statistics and machine learning, and document complex systems so technical and business teams can make better decisions.

4+ years in data roles
18+ technical tutorials
40+ students mentored
95%+ accuracy focus in client insight delivery
Samuel Shaibu
Current focus

Python, SQL, AWS ETL, Power BI, ML modeling, and data education.

Based in

Vilnius, Lithuania. Working across analytics, data platforms, and technical writing.

What I do

Three disciplines, one workflow.

My best work happens where data engineering, data science, and communication overlap: build the system, analyze the signal, explain the result.

Data Engineering

I design pipelines that move, validate, transform, and serve data reliably for analytics and operations.

ETL pipelines · Data validation · AWS Glue · S3 + Athena · Step Functions · Terraform

Data Science

I use statistical modeling and machine learning to answer business questions and make trade-offs visible.

Machine learning · Pricing analysis · Experimentation · Forecasting · Model evaluation · Responsible AI

Technical Writing

I turn technical systems into tutorials, documentation, and explanations people can actually use.

DataCamp tutorials · API-style docs · Newsletter writing · Model cards · Guides · Developer education
Featured projects

Case studies, not just repository links.

Pipeline pattern
How I architect data products: reliable flow, visible decisions, useful outputs.
Architecture
Flow: Sources (Files · APIs · DBs) → Glue ETL (PySpark · Crawlers) → S3 Storage (Bronze → Gold) → Athena + ML (SQL · scikit-learn) → BI / Docs (Power BI · Reports)
Orchestrate: Step Functions · EventBridge Scheduler
Validate: Quality Checks · Logging · Monitoring
Infrastructure: Terraform · IAM · Policies · Secrets
Data Engineering

AWS Survey & Ad-Awareness ETL Pipeline

Problem

Survey and ad-awareness data needed repeatable ingestion, transformation, validation, and query-ready outputs.

Solution

Built a Terraform-managed AWS workflow with Glue, S3, Athena, IAM, EventBridge Scheduler, and Step Functions.

Impact

Created an automated, reviewable pipeline across bronze and silver layers with documented resource flow and data checks.

Python · AWS Glue · S3 · Athena · Step Functions · Terraform
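A minimal sketch of the kind of data check that can sit between a bronze and a silver layer. It is not the project's code: the field names `respondent_id` and `ad_awareness`, the three rules, and the sample rows are all assumptions made up for illustration, and it runs in plain Python rather than inside a Glue job.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, reason) pairs


def validate_rows(rows):
    """Split raw (bronze) rows into clean (silver) rows and rejects."""
    result = ValidationResult()
    seen_ids = set()
    for row in rows:
        respondent_id = row.get("respondent_id")  # hypothetical field
        awareness = row.get("ad_awareness")       # hypothetical field
        if not respondent_id:
            result.rejected.append((row, "missing respondent_id"))
        elif respondent_id in seen_ids:
            result.rejected.append((row, "duplicate respondent_id"))
        elif awareness not in (0, 1):
            result.rejected.append((row, "ad_awareness must be 0 or 1"))
        else:
            seen_ids.add(respondent_id)
            result.valid.append(row)
    return result


# Invented sample rows, only to exercise each rule once.
bronze_rows = [
    {"respondent_id": "r1", "ad_awareness": 1},
    {"respondent_id": "r1", "ad_awareness": 0},   # duplicate id
    {"respondent_id": "r2", "ad_awareness": 5},   # out-of-range value
    {"respondent_id": None, "ad_awareness": 1},   # missing id
    {"respondent_id": "r3", "ad_awareness": 0},
]
checked = validate_rows(bronze_rows)
print(len(checked.valid), "valid,", len(checked.rejected), "rejected")
```

Keeping the rejected rows together with a reason, instead of dropping them silently, is one way to make a pipeline reviewable.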
Python Engineering

Legacy Integration Pipeline Modernization

Problem

A large legacy integration workflow created operational friction and required too much manual handling.

Solution

Modernized the Python codebase, added multi-format ingestion, SFTP/FTPS handling, S3 delivery, logging, and monitoring.

Impact

Reduced manual operations workload by about 40% and improved consistency for downstream reporting datasets.

Python · SQL · pandas · SFTP · AWS S3 · pytest
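A rough sketch of one common way to structure multi-format ingestion: a small registry that maps file extensions to parsers. The function names, the `PARSERS` table, and the two supported formats are assumptions for illustration, not the modernized codebase itself. SFTP/FTPS transfer and S3 delivery are left out.

```python
import csv
import io
import json
import logging
from pathlib import PurePath

logger = logging.getLogger("ingest")  # hypothetical logger name


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(text):
    """Parse JSON text; wrap a single object so callers always get a list."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


# Registry of supported formats; adding a format means adding one entry.
PARSERS = {".csv": parse_csv, ".json": parse_json}


def ingest(filename, text):
    """Pick a parser from the file extension and return the parsed records."""
    suffix = PurePath(filename).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"Unsupported file format: {suffix or filename}")
    records = parser(text)
    logger.info("Parsed %d records from %s", len(records), filename)
    return records


csv_records = ingest("orders.csv", "id,qty\n1,2\n")
json_records = ingest("orders.JSON", '{"id": 1, "qty": 2}')
```

Note that the CSV parser returns every value as a string, so type conversion would still need to happen downstream.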
Responsible AI

AI Ethics Personalized Budget Predictor

Problem

A recommender-style model needed transparency around bias, fairness trade-offs, and threshold decisions.

Solution

Built a reproducible scikit-learn pipeline with MLflow tracking, fairness metrics, reweighing, threshold sweeps, and a Streamlit app.

Impact

Produced governance-ready artifacts including a model card, data sheet, risk assessment, and trade-off summary.

Python · Scikit-learn · MLflow · Streamlit · Fairness metrics
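A minimal sketch of what a threshold sweep with a fairness check can look like. It is plain Python rather than the project's scikit-learn and MLflow setup. The scores, labels, and group assignments are invented toy values, and the only fairness measure shown is the gap in selection rate between groups (a demographic parity difference).

```python
def sweep_thresholds(scores, labels, groups, thresholds):
    """For each threshold, report accuracy and the selection-rate gap across groups."""
    rows = []
    for t in thresholds:
        preds = [int(s >= t) for s in scores]
        accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

        # Selection rate per group: share of that group predicted positive.
        rates = {}
        for g in set(groups):
            group_preds = [p for p, gg in zip(preds, groups) if gg == g]
            rates[g] = sum(group_preds) / len(group_preds)

        parity_gap = max(rates.values()) - min(rates.values())
        rows.append({"threshold": t, "accuracy": accuracy, "parity_gap": parity_gap})
    return rows


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

sweep = sweep_thresholds(scores, labels, groups, thresholds=[0.3, 0.5, 0.7])
for row in sweep:
    print(row)
```

Putting accuracy and the parity gap side by side for each threshold is what makes the trade-off visible. Which threshold to choose remains a judgment call that a model card can document.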
Data Science

Brand Tracking & Pricing Analytics

Problem

Global brand teams needed clearer pricing, perception, and campaign-performance insights across multiple markets.

Solution

Delivered Power BI dashboards, SQL models, CVI/CBI computations, MaxDiff, Van Westendorp, and key driver analysis.

Impact

Supported client-facing decisions while holding to a 95%+ data accuracy target and improving pricing strategy precision by 15-20%.

Power BI · SQL · Python · DAX · MaxDiff · Van Westendorp
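For readers unfamiliar with Van Westendorp, here is a simplified sketch of one part of the method: locating the point where the "too cheap" and "too expensive" curves cross, often called the optimal price point. The survey answers and the price grid are invented toy values. The "cheap" and "expensive" curves used for the indifference price point are omitted, and the crossing is approximated on a coarse grid instead of being interpolated.

```python
def share_at_or_above(values, price):
    """Share of respondents whose stated threshold is at or above the price."""
    return sum(v >= price for v in values) / len(values)


def share_at_or_below(values, price):
    """Share of respondents whose stated threshold is at or below the price."""
    return sum(v <= price for v in values) / len(values)


def optimal_price_point(too_cheap, too_expensive, price_grid):
    """Grid price where the 'too cheap' and 'too expensive' curves are closest.

    A price counts as 'too cheap' for a respondent if it is at or below the
    level they named as too cheap, and as 'too expensive' if it is at or
    above the level they named as too expensive.
    """
    def gap(price):
        cheap_share = share_at_or_above(too_cheap, price)
        expensive_share = share_at_or_below(too_expensive, price)
        return abs(cheap_share - expensive_share)

    return min(price_grid, key=gap)


# Toy survey answers (one value per respondent), invented for illustration.
too_cheap = [5, 6, 8, 9, 10]
too_expensive = [8, 9, 11, 12, 14]

opp = optimal_price_point(too_cheap, too_expensive, price_grid=range(5, 15))
print("Approximate optimal price point:", opp)
```

In practice the curves would come from far more respondents, and the crossing would usually be read from interpolated curves.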
Writing & publications

Technical writing is part of the work.

I write tutorials, explainers, and newsletters that make data engineering, analytics, machine learning, and AI ethics easier to understand and apply.

Python · DataCamp · Practical read

Python Circular Import: Causes, Fixes, and Best Practices

A practical breakdown of circular imports, why they happen, and how to restructure Python code cleanly.

Machine Learning · DataCamp · Practical read

Linear Regression in Python

A hands-on guide to the core ideas behind linear regression and how to implement them in Python.

SQL · DataCamp · Practical read

Normalization in SQL

Explains database normalization from 1NF to 5NF with examples for cleaner and more reliable schemas.

Newsletter · Substack · Practical read

All About Data & More

A newsletter for aspiring data professionals, covering data analytics, machine learning, SQL, and job-ready workflows.

Excel · DataCamp · Practical read

How to Make a Bar Graph in Excel

A beginner-friendly guide to building clear Excel bar charts for reporting and analysis.

Python · DataCamp · Practical read

Python Inheritance: Best Practices for Reusable Code

Covers inheritance in Python with examples that make object-oriented programming easier to apply.

Tech stack

Grouped by how the work gets done.

No percentage bars. The stack is organized by the job it performs: pipelines, modeling, dashboards, applications, and writing.

Data Engineering

Python · SQL · pandas · AWS Glue · S3 · Athena · Step Functions · EventBridge · Terraform · SSIS

Data Science

Scikit-learn · Statistics · Model evaluation · Feature importance · Experimentation · Pricing models · Responsible AI

Analytics & BI

Power BI · DAX · KPI design · Data modeling · CVI/CBI · MaxDiff · Van Westendorp · Dashboard design

Apps & Writing

Streamlit · LangChain · OpenAI API · Technical tutorials · Documentation · Model cards · Data sheets · Curriculum design
Experience

A career built around data work that ships.

2025 - Present

Data Scientist

Syno International

Building ETL workflows, AI-powered internal tools, Power BI outputs, and statistical models for survey, brand, and pricing analytics.

2024 - 2025

Data Analyst

Syno International

Delivered dashboards, SQL models, pricing studies, brand equity analysis, and client-ready insight workflows across multiple markets.

2024 - 2025

Technical Writer

DataCamp

Published peer-reviewed tutorials on Python, SQL, Excel, statistics, machine learning, and practical data concepts.

2023 - 2024

Data Science Instructor

GOMYCODE

Taught and mentored 40+ students through Python, SQL, machine learning, visualization, and end-to-end data projects.

2020 - 2024

BI Specialist / Data Scientist

Freelance

Built dashboards, ETL processes, documentation, and analytical solutions for clients across energy, finance, and edtech.