HPLC/GC Performance Data Analytics for Quality Control Intelligence
Author: Christopher Edozie Sunday
Tools Used: Excel, SQL (SQLite), Python, Jupyter Notebook, Tableau & Power BI
Domain: Analytical Chemistry / Data Analytics
Date: 31st December 2025
Project Description
This end-to-end data analytics project analyzes data from HPLC and GC instruments using Excel, SQL, Python, Tableau, and Power BI. It covers data cleaning, relational database modeling, statistical QC, anomaly detection, and time-series analysis, culminating in interactive dashboards for monitoring calibration performance and instrument health.
Project Overview
This project demonstrates an analytics workflow that transforms laboratory-generated performance data from HPLC and GC instruments into actionable quality intelligence, enabling early detection of instrument drift, calibration instability, and process anomalies before they compromise results. It bridges the gap between raw chromatographic outputs and decision-ready insights, allowing scientists, quality professionals, and supervisors to proactively monitor instrument performance, method stability, and analytical reliability.
Project Relevance to Quality Control Scientist and Quality Intelligence Roles
This project is designed to reflect real-world laboratory operations in manufacturing quality control, mining assay laboratories, and research facilities, where data integrity, reproducibility, and timely interpretation of results are critical. It covers responsibilities typical of QC Scientists, Analytical Chemists, QC Managers, Lab Supervisors, and Data Analysis Teams that work in regulated environments, including:
- Trending of chromatographic performance data for compliance monitoring
- Identification of early warning signals prior to specification failure
- Support of deviation investigations through data-driven evidence
- Instrument performance monitoring and method lifecycle management
- Translation of raw analytical data into actionable quality intelligence
The workflow mirrors industry practice by integrating Excel-based laboratory data, SQL-driven data structuring, and Python-based statistical analysis to support regulatory-compliant quality decisions.
Scientific & Quality Context of this Project
Chromatographic data is foundational to analytical decision-making, yet it is often siloed within instrument software, inconsistently structured, and underutilized for trend-based quality monitoring. Across regulated and non-regulated environments, laboratories face common challenges:
- Instrument drift and performance variability
- Delayed detection of analytical anomalies
- Fragmented data across instruments and runs
- Limited visibility into long-term trends
In pharmaceutical and mining laboratories, these issues directly impact compliance, throughput, and risk management. In academic or research settings, they affect data reliability, reproducibility, and research validity. This project addresses these challenges through structured data modeling and statistical analysis.
Quality & Compliance Relevance
This project applies data analytics to chromatographic quality control within a regulated laboratory framework. Analytical performance metrics are evaluated using principles consistent with:
- ICH Q2(R2): Validation of Analytical Procedures
- ICH Q14: Analytical Procedure Development
- FDA and EMA expectations for ongoing performance verification
- ISO/IEC 17025 requirements for method control and monitoring
Key quality objectives addressed include:
- System suitability trending and early detection of performance drift
- Instrument equivalency and data comparability
- Method precision, accuracy, and robustness assessment
- Proactive identification of risks leading to OOS or OOT results
The analytical insights generated support deviation prevention, CAPA prioritization, and data-driven decision-making in QC and analytical development environments.
Primary Project Goal
Enable data-driven quality decisions by converting data from chromatographic instruments into clear, traceable, and regulator-aligned performance insights.
Project Objectives
- Clean and convert raw HPLC/GC data into structured, analysis-ready datasets
- Design a 3NF relational schema suitable for chromatographic quality control data
- Implement foreign keys and indexing, and demonstrate SQL joins
- Compute accuracy, precision, calibration, and system suitability metrics
- Monitor instrument and method performance over time
- Detect instrument drift, anomalies, and out-of-control conditions early
- Produce an audit-ready, reproducible end-to-end analytics workflow in JupyterLab (DDL + queries)
- Produce narrative-driven Jupyter Notebooks explaining analytical intent and methodology
- Create scientific visualizations in Python for QC interpretation and decision support
- Build interactive Tableau dashboards for performance monitoring
- Generate a KPI-driven report in Power BI for management and quality review
- Bridge analytical chemistry and data analytics
Scope of Work
This project delivers a complete, reproducible analytics pipeline that:
- Simulates realistic chromatographic QC datasets
- Structures data into a normalized relational database
- Computes regulatory-aligned QC metrics
- Applies statistical quality control (SQC) and trend analytics
- Visualizes results in interactive dashboards for decision-makers
Tools & Technologies
- Excel → Data simulation, initial data profiling, and validation
- SQLite + SQL (DBeaver) → Data modeling, normalization, querying
- Python (Pandas, NumPy, Scikit-Learn) → QC metrics, SQC, trend analysis
- Tableau → Interactive dashboards for QC monitoring and reporting
- Power BI → KPI-driven reporting for management and quality review
This report is divided into sections corresponding to the phases of the data analysis life cycle: Ask, Prepare, Process, Analyze, Share, and Act.
ASK PHASE — Defining the Problem
Problem Statement
Laboratory QC data is often:
- Locked inside vendor software
- Reviewed manually and retrospectively
- Poorly structured for trend analysis
This limits early detection of:
- Calibration drift
- Instrument instability
- Method performance degradation
Key Business Questions Addressed
- Are HPLC/GC instruments operating consistently over time?
- Are calibration models stable and linear?
- Can early warning signals detect drift before failure?
- How can QC data be summarized clearly for decision-makers?
PREPARE PHASE — Data Sources & Rationale
Data Source Description:
Because real chromatographic QC data is often proprietary, this project uses an Excel-simulated but analytically realistic dataset designed to closely reflect real HPLC/GC laboratory outputs while ensuring reproducibility and avoiding exposure of proprietary or confidential data. Key variables include:
| Variables | Purpose |
|---------------------|-----------------------------------------------------|
| Sample_ID | Unique identifier for each QC injection |
| Instrument_ID | Differentiates HPLC vs GC systems |
| RetentionTime_min | Indicates chromatographic stability |
| Peak_Area | Primary quantitative detector response |
| PeakWidth_min | Reflects column efficiency and system dispersion |
| Concentration_mgL | Calculated analyte concentration |
| TrueValue_mgL | Reference value for accuracy assessment |
| Run_Date | Enables time-based trend analysis |
The Excel simulation steps are detailed in notebooks/02_data_simulation.ipynb, and the raw dataset is saved as:
hplc_gc_qc_data_raw.xlsx
PROCESS PHASE — Cleaning & Structuring
Data Cleaning & Validation (Excel)
Why This Matters:
QC statistics are only meaningful if the underlying data is valid.
Key Checks Performed:
- Missing values detection
- Outlier screening (not blind removal)
- Date standardization for time-series analysis
The Cleaning & Validation (Excel) steps are detailed in notebooks/02_data_simulation.ipynb, and the cleaned dataset is saved as:
hplc_gc_qc_data_cleaned.xlsx
This file serves as the single source of truth for all downstream analysis.
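The key checks above were carried out in Excel. As a rough pandas equivalent (a sketch under stated assumptions, not the workbook logic itself), the three checks might look like the following. Only the column names come from the variable table; the miniature dataset, the median/MAD screening rule, and its 3.5 threshold are illustrative assumptions.

```python
import pandas as pd

# Hypothetical miniature of the raw dataset. Column names follow the
# variable table; the values are invented for illustration only.
raw = pd.DataFrame({
    "Sample_ID": ["S001", "S002", "S003", "S004", "S005", "S006"],
    "Peak_Area": [1520.0, 1498.0, None, 1510.0, 1505.0, 2400.0],
    "Run_Date": ["2025-01-03", "2025-01-04", "2025-01-05",
                 "2025-01-06", "2025-01-07", "2025-01-08"],
})

# 1. Missing values detection: count gaps per column.
missing_per_column = raw.isna().sum()

# 2. Outlier screening: flag suspicious rows rather than deleting them.
#    Assumption: a modified z-score based on the median and MAD, which is
#    less distorted by the outlier itself than a mean/SD z-score.
median = raw["Peak_Area"].median()
mad = (raw["Peak_Area"] - median).abs().median()
raw["robust_z"] = 0.6745 * (raw["Peak_Area"] - median) / mad
raw["outlier_flag"] = raw["robust_z"].abs() > 3.5

# 3. Date standardization: convert text dates to a datetime column so that
#    time-series operations (sorting, resampling, rolling windows) work.
raw["Run_Date"] = pd.to_datetime(raw["Run_Date"], errors="coerce")

print(missing_per_column)
print(raw[["Sample_ID", "Peak_Area", "outlier_flag"]])
```

Flagged rows stay in the table with an `outlier_flag` column, which keeps the screening step reviewable instead of silently dropping data.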
SQL Data Modeling & Normalization
A fully normalized SQLite schema (3NF) was designed to emulate real laboratory data infrastructure.
The SQL data modeling and normalization steps, carried out in DBeaver, are detailed in notebooks/03_SQL_database_relational_schema_creation.ipynb.
Core Tables:
- instruments
- samples
- sample_metrics
- calibrations
- system_suitability
- control_summary
Why This Matters:
- Eliminates redundancy
- Enables traceability
- Supports many-to-one relationships
- Supports robust and scalable QC analytics
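To make the schema idea concrete, here is a minimal sketch of three of the core tables using Python's built-in `sqlite3` module. The table names come from the list above; every column name, type, and constraint is an assumption for illustration and may differ from the schema in the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Assumed columns: many samples belong to one instrument (many-to-one),
# and each sample has one row of metrics.
conn.executescript("""
CREATE TABLE instruments (
    instrument_id   TEXT PRIMARY KEY,
    instrument_type TEXT NOT NULL CHECK (instrument_type IN ('HPLC', 'GC'))
);
CREATE TABLE samples (
    sample_id     TEXT PRIMARY KEY,
    instrument_id TEXT NOT NULL REFERENCES instruments(instrument_id),
    run_date      TEXT NOT NULL
);
CREATE TABLE sample_metrics (
    sample_id         TEXT PRIMARY KEY REFERENCES samples(sample_id),
    peak_area         REAL,
    concentration_mgl REAL,
    true_value_mgl    REAL
);
-- Index to speed up per-instrument, time-ordered trend queries.
CREATE INDEX idx_samples_instrument_date ON samples (instrument_id, run_date);
""")

conn.execute("INSERT INTO instruments VALUES ('HPLC-01', 'HPLC')")
conn.execute("INSERT INTO samples VALUES ('S001', 'HPLC-01', '2025-01-03')")
conn.execute("INSERT INTO sample_metrics VALUES ('S001', 1520.0, 9.8, 10.0)")

# Join the normalized tables back into an analysis-ready row.
rows = conn.execute("""
    SELECT i.instrument_type, s.sample_id, m.concentration_mgl
    FROM samples AS s
    JOIN instruments AS i ON i.instrument_id = s.instrument_id
    JOIN sample_metrics AS m ON m.sample_id = s.sample_id
""").fetchall()

# Traceability check: a sample that points at an unknown instrument is rejected.
try:
    conn.execute("INSERT INTO samples VALUES ('S999', 'UNKNOWN', '2025-01-04')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(rows, fk_enforced)
```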
ANALYZE PHASE — Statistical & QC Analytics
Python-Driven QC Analytics
Using Pandas, NumPy, and Scikit-Learn, the project computes the following:
(a) Sample-Level Metrics
- Error (mg/L, %)
- Percent recovery
- Response factor
- Z-score outlier detection
- %RSD (precision)
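The sample-level metrics above can be sketched with NumPy as follows. The replicate values are invented, and two definitions are assumptions rather than the notebook's confirmed choices: the response factor is taken as peak area per unit of true concentration, and the z-score uses the sample standard deviation (`ddof=1`).

```python
import numpy as np

# Hypothetical replicate injections of a 10 mg/L standard.
true_value = 10.0                                    # TrueValue_mgL
measured = np.array([9.8, 10.1, 10.0, 9.9, 10.2])    # Concentration_mgL
peak_area = np.array([1470.0, 1515.0, 1500.0, 1485.0, 1530.0])

# Error in absolute (mg/L) and relative (%) terms.
error_mgl = measured - true_value
error_pct = 100.0 * error_mgl / true_value

# Percent recovery: measured concentration relative to the reference value.
recovery_pct = 100.0 * measured / true_value

# Response factor (assumed definition: detector response per unit concentration).
response_factor = peak_area / true_value

# Z-scores for outlier screening and %RSD for precision.
mean = measured.mean()
sd = measured.std(ddof=1)
z_scores = (measured - mean) / sd
rsd_pct = 100.0 * sd / mean

print(f"%RSD = {rsd_pct:.2f}, max |z| = {np.abs(z_scores).max():.2f}")
```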
(b) Calibration Analytics
- Slope, intercept, R²
- Response factor stability
- Linearity assessment
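As an illustration of the calibration analytics, the sketch below fits a straight line with `numpy.polyfit` and derives R² from the residuals. The five calibration levels are invented, and using the %RSD of per-level response factors as a stability indicator is an assumption about how the project summarizes it.

```python
import numpy as np

# Hypothetical five-level calibration curve (concentration in mg/L vs peak area).
conc = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
area = np.array([152.0, 301.0, 748.0, 1505.0, 2996.0])

# Least-squares straight line: area = slope * conc + intercept.
slope, intercept = np.polyfit(conc, area, deg=1)

# Coefficient of determination from residual and total sums of squares.
predicted = slope * conc + intercept
ss_res = np.sum((area - predicted) ** 2)
ss_tot = np.sum((area - area.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Response factor per level; a low %RSD suggests a stable, proportional response.
rf = area / conc
rf_rsd_pct = 100.0 * rf.std(ddof=1) / rf.mean()

print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"R2={r_squared:.5f}, RF %RSD={rf_rsd_pct:.2f}")
```

A high R² alone does not prove linearity, which is why the response factor spread (or a residual plot) is worth checking alongside it.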
(c) Statistical Process Control
- Shewhart limits
- EWMA charts
- CUSUM charts
- Rolling mean, std, CV
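The three control-chart techniques can be compared on one series. The sketch below uses invented peak areas with a small sustained upward shift. The EWMA weight (λ = 0.2), limit width (L = 3), and CUSUM parameters (k = 0.5σ, h = 5σ) are common textbook defaults chosen here as assumptions, not values taken from the project.

```python
import numpy as np

# Hypothetical peak areas: ten in-control baseline runs, then a small upward shift.
baseline = np.array([1500.0, 1504.0, 1497.0, 1502.0, 1499.0,
                     1501.0, 1496.0, 1503.0, 1500.0, 1498.0])
recent = np.array([1504.0, 1505.0, 1506.0, 1505.0, 1507.0, 1506.0])
series = np.concatenate([baseline, recent])

# Centre line and sigma are estimated from the baseline period only.
mu = baseline.mean()
sigma = baseline.std(ddof=1)

# Shewhart chart: flag individual points beyond mu +/- 3 sigma.
shewhart_flags = np.abs(series - mu) > 3.0 * sigma

# EWMA chart: exponentially weighted mean with time-varying control limits.
lam, width = 0.2, 3.0
ewma = np.empty_like(series)
prev = mu
for i, x in enumerate(series):
    prev = lam * x + (1.0 - lam) * prev
    ewma[i] = prev
t = np.arange(1, len(series) + 1)
ewma_sigma = sigma * np.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
ewma_flags = np.abs(ewma - mu) > width * ewma_sigma

# Tabular CUSUM: accumulate deviations beyond a slack k, signal above h.
k, h = 0.5 * sigma, 5.0 * sigma
c_plus = c_minus = 0.0
cusum_flags = []
for x in series:
    c_plus = max(0.0, x - (mu + k) + c_plus)
    c_minus = max(0.0, (mu - k) - x + c_minus)
    cusum_flags.append(c_plus > h or c_minus > h)
cusum_flags = np.array(cusum_flags)

print("Shewhart signals:", int(shewhart_flags.sum()))
print("First EWMA signal at run:", int(np.argmax(ewma_flags)) + 1)
print("First CUSUM signal at run:", int(np.argmax(cusum_flags)) + 1)
```

In this example no single point crosses the Shewhart limits, while EWMA and CUSUM both signal during the shifted period. That is the usual motivation for pairing them: Shewhart charts catch large isolated excursions, and the memory-based charts catch small persistent drift.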
(d) System Suitability
- Plate count
- Resolution
- Tailing factor
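For reference, the three system suitability parameters can be written as small Python functions using the standard chromatographic formulas. This sketch assumes peak widths are measured at the baseline (tangent method) for plate count and resolution, and at 5% of peak height for the tailing factor. Whether `PeakWidth_min` in the dataset follows the baseline convention is an assumption; half-height widths would need the 5.54 form of the plate count equation instead.

```python
def plate_count(retention_time_min: float, baseline_width_min: float) -> float:
    """Theoretical plates, N = 16 * (tR / W)^2, with W the baseline peak width."""
    return 16.0 * (retention_time_min / baseline_width_min) ** 2


def resolution(rt1_min: float, w1_min: float, rt2_min: float, w2_min: float) -> float:
    """Resolution between adjacent peaks, Rs = 2 * (tR2 - tR1) / (W1 + W2)."""
    return 2.0 * (rt2_min - rt1_min) / (w1_min + w2_min)


def tailing_factor(width_at_5pct_min: float, front_half_width_min: float) -> float:
    """Tailing factor, T = W0.05 / (2 * f), where f is the distance from the
    leading edge to the peak maximum, both measured at 5% of peak height."""
    return width_at_5pct_min / (2.0 * front_half_width_min)


# Hypothetical peak measurements, for illustration only.
n_plates = plate_count(5.0, 0.25)
rs = resolution(5.0, 0.25, 5.6, 0.27)
tf = tailing_factor(0.12, 0.05)

print(f"N = {n_plates:.0f}, Rs = {rs:.2f}, T = {tf:.2f}")
```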
Detailed Python code blocks for these statistical and analytical computations are shown in notebooks/04_data_importation_key_metrics_computation.ipynb. These analyses align conceptually with USP <621>, USP <1225>, and ICH Q2(R2) expectations.
Key Analysis Categories:
- Calibration Trend & Stability
- Method Performance (Accuracy & Precision)
- QC & Anomaly Detection
- Instrument & System Suitability
Detailed steps for each analysis category are shown, respectively, in:
- notebooks/05_calibration_trend_analysis.ipynb
- notebooks/06_method_performance_analysis.ipynb
- notebooks/07_qc_anomaly_analysis.ipynb
- notebooks/08_system_suitabilty_analysis.ipynb
SHARE PHASE — Visualization & Communication
Tableau Dashboards
Dashboard 1 — Calibration Performance Overview
- Parity plots
- Accuracy heatmaps
- Response factor trends
- R² linearity indicators
Dashboard 2 — Instrument Health Monitoring
- Peak area control charts
- EWMA & CUSUM charts
- Rolling statistics
Dashboard 3 — Method Performance
- Precision distributions
- Accuracy (%Recovery)
- Outlier detection maps
Audience Considerations
| Audience | Needs |
|---|---|
| QC Managers | Stability & compliance signals |
| Lab Supervisors | Instrument health indicators |
| Data Teams | Reproducibility & structure |
| Recruiters | Tool proficiency & clarity |
ACT PHASE — Insights & Recommendations
Insight-Driven Analysis Report
(a) Peak Area Trend
Data-driven insight: Peak area trends remain statistically controlled across instruments, with isolated excursions beyond ±3σ detected over time.
Risk: Potential loss of quantitative accuracy due to intermittent injector, detector, or sample preparation variability, increasing OOS risk.
Actionable recommendation: Initiate targeted deviation review, verify system suitability, and trend instrument performance per ICH Q2/Q14 and ISO 17025.
(b) EWMA Chart
Data-driven insight: EWMA trends show gradual peak area shifts approaching control limits across sequential runs, indicating low-level systematic drift not evident in individual results.
Risk: Early method bias may progress to OOS or compromised quantitation if unaddressed.
Actionable recommendation: Enhance trending review, verify system suitability parameters, and perform proactive instrument maintenance per ICH Q2/Q14 expectations.
(c) CUSUM Chart
Data-driven insight: CUSUM charts demonstrate sustained cumulative deviation from the historical mean across sequential runs, indicating persistent low-magnitude bias not captured by Shewhart limits.
Risk: Undetected systematic drift may compromise long-term method accuracy and lead to delayed OOS findings.
Actionable recommendation: Trigger trend-based investigation, reassess method control strategy, and implement preventive maintenance per ICH Q14 and ISO 17025.
(d) Rolling Statistics
Data-driven insight: Rolling mean and variability metrics show gradual increases over time, indicating emerging instability in quantitative response across sequential runs.
Risk: Progressive loss of method precision and robustness, increasing likelihood of OOS or invalid trend conclusions.
Actionable recommendation: Strengthen ongoing performance monitoring, review maintenance and consumables, and reassess control limits per ICH Q2/Q14 and ISO 17025 expectations.
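Rolling statistics of the kind described in (d) can be computed with pandas as sketched below. The peak area values and the 5-run window are illustrative assumptions; the window length used in the project is not stated here.

```python
import pandas as pd

# Hypothetical sequential peak areas whose scatter widens in later runs.
peak_area = pd.Series([1500.0, 1502.0, 1498.0, 1501.0, 1499.0,
                       1506.0, 1494.0, 1510.0, 1490.0, 1512.0])

window = 5  # assumed window length, in runs

rolling_mean = peak_area.rolling(window).mean()
rolling_std = peak_area.rolling(window).std()      # sample SD (ddof=1) by default
rolling_cv_pct = 100.0 * rolling_std / rolling_mean

summary = pd.DataFrame({
    "peak_area": peak_area,
    "rolling_mean": rolling_mean,
    "rolling_std": rolling_std,
    "rolling_cv_pct": rolling_cv_pct,
})
print(summary.round(3))
```

The first `window - 1` rows are `NaN` because a full window is not yet available. A rising `rolling_cv_pct` with a flat `rolling_mean` points to a loss of precision rather than a shift in level.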
(e) Parity Plot
Data-driven insight: Parity plots show strong linear agreement with the 1:1 line across instruments, with minor dispersion at higher concentrations, indicating generally acceptable calibration accuracy.
Risk: Emerging bias at range extremes may impact quantitation accuracy and reportable results.
Actionable recommendation: Review calibration model fit and range suitability, verify R² and recovery against acceptance criteria, and update calibration strategy per ICH Q2/Q14 and ISO 17025.
(f) Accuracy Heatmap
Data-driven insight: Mean percent recovery remains largely centered around 100% across instruments and months, with localized periods approaching acceptance limits, indicating emerging temporal variability.
Risk: Sustained recovery drift may compromise method accuracy and reportable results, increasing OOS risk.
Actionable recommendation: Implement enhanced accuracy trending, review calibration and sample preparation practices, and initiate preventive CAPA per ICH Q2/Q14 and ISO 17025.
(g) Response Factor Stability
Data-driven insight: Response factor trends remain largely stable, with intermittent approaches to ±2σ Westgard limits across instruments, indicating early-stage analytical variability.
Risk: Continued drift may degrade calibration integrity and quantitative accuracy, increasing OOS risk.
Actionable recommendation: Apply Westgard rule evaluation, review calibration preparation and detector performance, and initiate preventive maintenance and trending per FDA, ICH Q2/Q14, and ISO 17025.
(h) Calibration Linearity (R²)
Data-driven insight: Calibration linearity (R²) meets predefined acceptance criteria across instruments, demonstrating adequate method linearity and model fit.
Risk: Marginal proximity to the acceptance threshold may reduce sensitivity to emerging non-linearity over time.
Actionable recommendation: Maintain periodic linearity verification, expand calibration range review, and trend R² results per ICH Q2/Q14 and ISO 17025 to ensure sustained method validity.
(i) Precision (RSD Distribution)
Data-driven insight: %RSD distributions fall predominantly within the typical ≤5% precision criterion across instruments, with occasional higher values indicating sporadic variability.
Risk: Intermittent precision failures may compromise method repeatability and confidence in reported results.
Actionable recommendation: Investigate high-%RSD events, assess replicate handling and instrument performance, and reinforce system suitability and precision monitoring per ICH Q2/Q14 and ISO 17025.
(j) Accuracy Recovery
Data-driven insight: %Recovery distributions are centered near 100% across instruments, with occasional values approaching acceptance limits, indicating generally acceptable accuracy with minor variability.
Risk: Outlier recoveries may signal calibration drift or sample preparation bias, potentially impacting reportable results.
Actionable recommendation: Review accuracy outliers, verify calibration integrity, and reinforce routine accuracy trending and system suitability checks per ICH Q2/Q14 and ISO 17025.
(k) Instrument Comparison
Data-driven insight: Measured concentration distributions are comparable across instruments, with modest inter-instrument spread, indicating generally consistent performance.
Risk: Systematic inter-instrument bias may affect data comparability and trending across platforms.
Actionable recommendation: Perform cross-instrument equivalency assessment, review calibration and standardization practices, and document instrument comparability per FDA and ISO 17025 requirements.
(l) Resolution Trend
Data-driven insight: Resolution trends remain above the minimum acceptance criterion (Rs ≥ 1.5) across instruments, with localized variability observed in rolling statistics.
Risk: Progressive resolution decline may impair peak separation, increasing mis-identification and quantitation errors.
Actionable recommendation: Intensify system suitability trending, assess column condition and mobile phase performance, and initiate preventive maintenance or method adjustment per ICH Q2/Q14 and ISO 17025.
(m) Cpk Analysis
Data-driven insight: Capability indices (Cpk) for resolution, tailing, plates, and retention time are generally ≥1.33, indicating adequate and stable instrument performance within defined system suitability limits.
Risk: Marginal Cpk values may reduce process robustness, increasing sensitivity to routine variability.
Actionable recommendation: Trend Cpk metrics routinely, tighten preventive maintenance and calibration schedules, and reassess suitability limits per FDA, ICH Q2/Q14, and ISO 17025.
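For readers unfamiliar with the index, Cpk compares the distance from the process mean to the nearest specification limit against three standard deviations. The sketch below is one possible implementation using only the standard library. The `cpk` helper and the sample values are hypothetical, and treating resolution as lower-limit-only and tailing as upper-limit-only follows the acceptance criteria quoted in this report (Rs ≥ 1.5, TF ≤ 2).

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def cpk(values: Sequence[float],
        lsl: Optional[float] = None,
        usl: Optional[float] = None) -> float:
    """Process capability index; supports one-sided or two-sided limits."""
    if lsl is None and usl is None:
        raise ValueError("At least one specification limit is required.")
    mu, sigma = mean(values), stdev(values)
    candidates = []
    if usl is not None:
        candidates.append((usl - mu) / (3.0 * sigma))
    if lsl is not None:
        candidates.append((mu - lsl) / (3.0 * sigma))
    return min(candidates)


# Hypothetical system suitability results, for illustration only.
resolution_values = [2.1, 2.0, 2.2, 2.1, 1.9, 2.0, 2.1]
tailing_values = [1.2, 1.3, 1.1, 1.2, 1.4, 1.3, 1.2]

cpk_resolution = cpk(resolution_values, lsl=1.5)   # lower limit only
cpk_tailing = cpk(tailing_values, usl=2.0)         # upper limit only

print(f"Cpk resolution = {cpk_resolution:.2f}, Cpk tailing = {cpk_tailing:.2f}")
```

Cpk assumes roughly normal, stable data, so it is most meaningful once the control charts show the parameter is in statistical control.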
(n) Plate Change Impact
Data-driven insight: Plate count trends are largely stable for both HPLC and GC systems, with isolated ≥3σ deviations indicating abrupt efficiency changes.
Risk: Sudden plate count loss may reflect column degradation or system contamination, reducing separation efficiency and method robustness.
Actionable recommendation: Investigate flagged events per deviation procedures, assess column condition and system cleanliness, and reinforce efficiency trending and preventive maintenance in alignment with ICH Q2/Q14 and ISO 17025.
(o) System Suitability Heatmap
Data-driven insight: Monthly system suitability performance is predominantly compliant across instruments, with intermittent metric-specific failures observed by period.
Risk: Recurrent localized failures may indicate emerging instrument or method instability, increasing deviation and OOS risk.
Actionable recommendation: Perform period-based root cause analysis, strengthen trending reviews, and implement targeted CAPA and preventive maintenance per FDA, ICH Q2/Q14, and ISO 17025.
(p) Tailing Factor Trend
Data-driven insight: Tailing factor values remain predominantly within the established acceptance limit (TF ≤ 2), with occasional upward excursions observed across instruments.
Risk: Increased tailing may impair peak integration accuracy and quantitation reliability.
Actionable recommendation: Investigate excursions via deviation procedures, evaluate column health and system cleanliness, and reinforce routine tailing trending and preventive maintenance per ICH Q2/Q14 and ISO 17025.
Final Takeaway
Analytical evaluation of HPLC and GC instruments demonstrates overall compliance with system suitability and QC criteria (RSD ≤5%, %Recovery 95–105%, resolution ≥1.5, tailing ≤2, plates ≥2000). Minor trends in retention time, plate counts, and tailing suggest early-stage column or system variability. Proactive maintenance, targeted trending, and CAPA review are recommended to sustain method robustness, prevent OOS events, and ensure ongoing GxP/regulatory compliance.
This project therefore demonstrates how laboratory QC data can evolve from static records into proactive quality intelligence using accessible analytics tools. It bridges analytical chemistry and data analytics, showcasing a skill set directly relevant to modern, data-driven scientific and industrial environments.
Expected Impact
- Earlier detection of analytical drift
- Reduced risk of invalid results
- Improved regulatory defensibility
- Faster QC decision-making
- Demonstrated, scalable analytics maturity
Deliverables
| Deliverable | Description |
|---|---|
| Cleaned Dataset | Excel-validated QC data |
| SQLite Database | Fully normalized QC schema |
| Python Notebooks | Reproducible analytics pipeline |
| Tableau Dashboards | Interactive QC monitoring |
| GitHub Documentation | End-to-end project narrative |
Schedule Overview
| Phase | Duration |
|---|---|
| Data Simulation & Cleaning | Week 1 |
| SQL Modeling & Seeding | Week 2 |
| Python Analytics | Week 3 |
| Visualization & Reporting | Week 4 |
Estimated Completion: 4 weeks (part-time, revision-inclusive)