01_HPLC_GC_performance_analysis

HPLC/GC Performance Data Analytics for Quality Control Intelligence¶

Author: Christopher Edozie Sunday
Tools Used: Excel, SQL (SQLite), Python, Jupyter Notebook, Tableau & Power BI
Domain: Analytical Chemistry / Data Analytics
Date: 31st December 2025

Project Description¶

This end-to-end data analytics project analyzes data from HPLC and GC instruments using Excel, SQL, Python, Tableau, and Power BI. It demonstrates data cleaning, relational database modeling, statistical QC, anomaly detection, and time-series analysis, culminating in interactive dashboards for monitoring calibration performance and instrument health.

Project Overview¶

This project demonstrates an analytics workflow that transforms laboratory-generated performance data from HPLC and GC instruments into actionable quality intelligence, enabling early detection of instrument drift, calibration instability, and process anomalies before they compromise results. It bridges the gap between raw chromatographic outputs and decision-ready insights, allowing scientists, quality professionals, and supervisors to proactively monitor instrument performance, method stability, and analytical reliability.

Project Relevance to Quality Control Scientist and Quality Intelligence Roles¶

This project is designed to reflect real-world laboratory operations in manufacturing quality control, mining assay laboratories, and research facilities where data integrity, reproducibility, and timely interpretation of results are critical. It reflects the actual responsibilities of QC Scientists, Analytical Chemists, QC Managers, Lab Supervisors, and Data Analysis Teams that work in regulated environments, including:

Trending of chromatographic performance data for compliance monitoring
Identification of early warning signals prior to specification failure
Support of deviation investigations through data-driven evidence
Instrument performance monitoring and method lifecycle management
Translation of raw analytical data into actionable quality intelligence

The workflow mirrors industry practice by integrating Excel-based laboratory data, SQL-driven data structuring, and Python-based statistical analysis to support regulatory-compliant quality decisions.

Scientific & Quality Context of this Project¶

Chromatographic data is foundational to analytical decision-making, yet it is often siloed within instrument software, inconsistently structured, and underutilized for trend-based quality monitoring. Across regulated and non-regulated environments, laboratories face common challenges:

Instrument drift and performance variability
Delayed detection of analytical anomalies
Fragmented data across instruments and runs
Limited visibility into long-term trends

In pharmaceutical and mining laboratories, these issues directly impact compliance, throughput, and risk management. In academic or research settings, they affect data reliability, reproducibility, and research validity. This project addresses these challenges through structured data modeling and statistical analysis.

Quality & Compliance Relevance¶

This project applies data analytics to chromatographic quality control within a regulated laboratory framework. Analytical performance metrics are evaluated using principles consistent with:

ICH Q2(R2): Validation of Analytical Procedures
ICH Q14: Analytical Procedure Development
FDA and EMA expectations for ongoing performance verification
ISO/IEC 17025 requirements for method control and monitoring

Key quality objectives addressed include:

System suitability trending and early detection of performance drift
Instrument equivalency and data comparability
Method precision, accuracy, and robustness assessment
Proactive identification of risks leading to OOS or OOT results

The analytical insights generated support deviation prevention, CAPA prioritization, and data-driven decision-making in QC and analytical development environments.

Primary Project Goal:¶

Enable data-driven quality decisions by converting data from chromatographic instruments into clear, traceable, and regulator-aligned performance insights.

Project Objectives¶

Clean and convert raw HPLC/GC data into structured, analysis-ready datasets
Design a 3NF relational schema suitable for chromatographic quality control data
Implement foreign keys and indexing, and demonstrate SQL joins
Compute accuracy, precision, calibration, and system suitability metrics
Monitor instrument and method performance over time
Detect instrument drift, anomalies, and out-of-control conditions early
Produce an audit-ready and reproducible end-to-end analytics workflow in JupyterLab (DDL + queries)
Produce narrative-driven Jupyter Notebooks explaining analytical intent and methodology
Create scientific visualizations in Python for QC interpretation and decision support
Build interactive Tableau dashboards for performance monitoring
Generate a KPI-driven report in Power BI for management and quality review
Bridge analytical chemistry and data analytics

Scope of Work:¶

This project delivers a complete, reproducible analytics pipeline that:

Simulates realistic chromatographic QC datasets
Structures data into a normalized relational database
Computes regulatory-aligned QC metrics
Applies statistical quality control (SQC) and trend analytics
Visualizes results in interactive dashboards for decision-makers

Tools & Technologies¶

Excel → Data simulation, initial data profiling, and validation
SQLite + SQL (DBeaver) → Data modeling, normalization, querying
Python (Pandas, NumPy, Scikit-Learn) → QC metrics, SQC, trend analysis
Tableau → Interactive dashboards for QC monitoring and reporting
Power BI → KPI-driven reporting for management and quality review

This report is divided into sections corresponding to the phases of the data analysis life cycle: Ask, Prepare, Process, Analyze, Share, and Act.

ASK PHASE — Defining the Problem¶

Problem Statement

Laboratory QC data is often:

Locked inside vendor software
Reviewed manually and retrospectively
Poorly structured for trend analysis

This limits early detection of:

Calibration drift
Instrument instability
Method performance degradation

Key Business Questions Addressed

Are HPLC/GC instruments operating consistently over time?
Are calibration models stable and linear?
Can early warning signals detect drift before failure?
How can QC data be summarized clearly for decision-makers?

PREPARE PHASE — Data Sources & Rationale¶

Data Source Description:

Because real chromatographic QC data is often proprietary, this project uses an Excel-simulated but analytically realistic dataset. It is designed to closely reflect real HPLC/GC laboratory outputs while ensuring reproducibility and avoiding exposure of confidential data. Key variables include:

| Variables           | Purpose                                             |
|---------------------|-----------------------------------------------------|
| Sample_ID           | Unique identifier for each QC injection             |
| Instrument_ID       | Differentiates HPLC vs GC systems                   |
| RetentionTime_min   | Indicates chromatographic stability                 |
| Peak_Area           | Primary quantitative detector response              |
| PeakWidth_min       | Reflects column efficiency and system dispersion    |
| Concentration_mgL   | Calculated analyte concentration                    |
| TrueValue_mgL       | Reference value for accuracy assessment             |
| Run_Date            | Enables time-based trend analysis                   |

The Excel simulation steps are detailed in notebooks/02_data_simulation.ipynb, and the raw dataset is saved as hplc_gc_qc_data_raw.xlsx.

PROCESS PHASE — Cleaning & Structuring¶

Data Cleaning & Validation (Excel)

Why This Matters:

QC statistics are only meaningful if the underlying data is valid.

Key Checks Performed:

Missing values detection
Outlier screening (not blind removal)
Date standardization for time-series analysis

The Excel cleaning and validation steps are detailed in notebooks/02_data_simulation.ipynb, and the cleaned dataset is saved as hplc_gc_qc_data_cleaned.xlsx.

This file serves as the single source of truth for all downstream analysis.
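The three checks listed above can be sketched in a few lines of Python. This is an illustration only, not the Excel procedure used in the project: the rows, the accepted date formats, and the modified z-score cut-off of 3.5 (median/MAD based) are assumptions chosen for the example, and the standard library is used to keep the sketch dependency-free.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw rows; the values are invented for illustration.
rows = [
    {"Sample_ID": "S001", "Peak_Area": 1000.0, "Run_Date": "2025-01-02"},
    {"Sample_ID": "S002", "Peak_Area": None,   "Run_Date": "03/01/2025"},
    {"Sample_ID": "S003", "Peak_Area": 1010.0, "Run_Date": "2025-01-04"},
    {"Sample_ID": "S004", "Peak_Area": 1500.0, "Run_Date": "05/01/2025"},
    {"Sample_ID": "S005", "Peak_Area": 990.0,  "Run_Date": "2025-01-06"},
    {"Sample_ID": "S006", "Peak_Area": 1005.0, "Run_Date": "2025-01-07"},
]

def standardize_date(text, formats=("%Y-%m-%d", "%d/%m/%Y")):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable dates are surfaced for review, not guessed

# 1. Missing values detection
missing = [r["Sample_ID"] for r in rows if r["Peak_Area"] is None]

# 2. Outlier screening: flag, do not remove (modified z-score on median/MAD)
areas = [r["Peak_Area"] for r in rows if r["Peak_Area"] is not None]
med = median(areas)
mad = median(abs(a - med) for a in areas)
flagged = [
    r["Sample_ID"]
    for r in rows
    if r["Peak_Area"] is not None
    and mad > 0
    and abs(0.6745 * (r["Peak_Area"] - med) / mad) > 3.5
]

# 3. Date standardization for time-series analysis
iso_dates = [standardize_date(r["Run_Date"]) for r in rows]
```

Flagged rows stay in the dataset with their flag, so the decision to exclude a result remains a documented scientific judgment rather than an automatic deletion.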

SQL Data Modeling & Normalization

A fully normalized SQLite schema (3NF) was designed to emulate real laboratory data infrastructure. The SQL data modeling and normalization steps, carried out in DBeaver, are detailed in notebooks/03_SQL_database_relational_schema_creation.ipynb.

Core Tables:

instruments
samples
sample_metrics
calibrations
system_suitability
control_summary

Why This Matters:

Eliminates redundancy
Enables traceability
Supports many-to-one relationships
Supports robust and scalable QC analytics
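To make the many-to-one idea concrete, here is a minimal sketch of two of the core tables built with Python's built-in sqlite3 module. The column names, the CHECK constraint, the index name, and the sample rows are assumptions for illustration; the project's actual DDL is in the notebook referenced above.

```python
import sqlite3

# Hypothetical two-table slice of the schema; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE instruments (
    instrument_id   TEXT PRIMARY KEY,
    instrument_type TEXT NOT NULL CHECK (instrument_type IN ('HPLC', 'GC'))
);
CREATE TABLE samples (
    sample_id     TEXT PRIMARY KEY,
    instrument_id TEXT NOT NULL REFERENCES instruments(instrument_id),
    run_date      TEXT NOT NULL
);
CREATE INDEX idx_samples_instrument ON samples(instrument_id);
""")
conn.executemany("INSERT INTO instruments VALUES (?, ?)",
                 [("HPLC-01", "HPLC"), ("GC-01", "GC")])
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)",
                 [("S001", "HPLC-01", "2025-01-02"),
                  ("S002", "HPLC-01", "2025-01-03"),
                  ("S003", "GC-01", "2025-01-02")])

# Many samples belong to one instrument: join and count injections per instrument.
rows = conn.execute("""
    SELECT i.instrument_id, i.instrument_type, COUNT(s.sample_id) AS n_samples
    FROM instruments AS i
    JOIN samples AS s ON s.instrument_id = i.instrument_id
    GROUP BY i.instrument_id
    ORDER BY i.instrument_id
""").fetchall()

# Traceability: a sample pointing at an unknown instrument is rejected.
try:
    conn.execute("INSERT INTO samples VALUES ('S004', 'LC-99', '2025-01-04')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Because instrument attributes live only in `instruments`, a change to an instrument record is made once and every joined sample picks it up, which is the redundancy and traceability benefit listed above.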

ANALYZE PHASE — Statistical & QC Analytics¶

Python-Driven QC Analytics

Using Pandas, NumPy, and Scikit-Learn, the project computes the following:

(a) Sample-Level Metrics

Error (mg/L, %)
Percent recovery
Response factor
Z-score outlier detection
%RSD (precision)
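As a rough illustration of the sample-level metrics, the sketch below uses plain Python rather than the Pandas code in the notebooks. The replicate values are invented, and the response factor is assumed here to mean peak area per unit of known concentration; the project's own definition may differ.

```python
from statistics import mean, stdev

# Hypothetical replicate injections of one QC standard (values invented).
true_value = 50.0                                   # TrueValue_mgL
measured = [49.5, 50.5, 50.0, 51.0, 49.0]           # Concentration_mgL
peak_area = [990.0, 1010.0, 1000.0, 1020.0, 980.0]  # Peak_Area

error_mgL = [m - true_value for m in measured]
error_pct = [100.0 * e / true_value for e in error_mgL]
recovery_pct = [100.0 * m / true_value for m in measured]
response_factor = [a / true_value for a in peak_area]  # assumed definition

avg, sd = mean(measured), stdev(measured)  # sample standard deviation (n - 1)
z_scores = [(m - avg) / sd for m in measured]
rsd_pct = 100.0 * sd / avg                 # %RSD (precision)

# Z-score screening with an assumed |z| > 3 cut-off.
outliers = [m for m, z in zip(measured, z_scores) if abs(z) > 3]
```

For very small replicate sets the sample z-score is bounded by (n - 1)/√n, so a |z| > 3 rule cannot fire with five replicates; z-score screening is more informative on longer run histories.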

(b) Calibration Analytics

Slope, intercept, R²
Response factor stability
Linearity assessment
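Slope, intercept, and R² come from an ordinary least-squares fit of detector response against concentration. The sketch below writes the fit out by hand so each term is visible; the five calibration points are invented, and the notebooks may well use NumPy or Scikit-Learn for the same calculation.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical calibration levels (mg/L) and peak areas (arbitrary units).
conc = [1.0, 2.0, 3.0, 4.0, 5.0]
area = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept, r2 = fit_line(conc, area)
```

For these points the fit gives a slope of about 1.97, an intercept of about 0.09, and R² of about 0.9977. Tracking the same three numbers per calibration run is what turns a single linearity check into a stability trend.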

(c) Statistical Process Control

Shewhart limits
EWMA charts
CUSUM charts
Rolling mean, std, CV
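The four control-chart tools differ mainly in how much history each point carries. A minimal sketch follows, assuming limits estimated from an in-control baseline, an EWMA weight of λ = 0.2, and tabular CUSUM parameters k = 0.5σ and h = 5σ; these are common textbook defaults, not values taken from the project, and the data are invented.

```python
from statistics import mean, stdev

def shewhart_limits(baseline, n_sigma=3.0):
    """Return (LCL, center line, UCL) estimated from in-control baseline runs."""
    mu, sigma = mean(baseline), stdev(baseline)
    return mu - n_sigma * sigma, mu, mu + n_sigma * sigma

def ewma(values, mu, lam=0.2):
    """z_t = lam * x_t + (1 - lam) * z_(t-1), started at the baseline mean."""
    z, out = mu, []
    for x in values:
        z = lam * x + (1 - lam) * z
        out.append(z)
    return out

def cusum_first_alarm(values, mu, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM; return the index of the first alarm, or None."""
    hi = lo = 0.0
    for i, x in enumerate(values):
        hi = max(0.0, hi + (x - mu) - k * sigma)
        lo = max(0.0, lo + (mu - x) - k * sigma)
        if hi > h * sigma or lo > h * sigma:
            return i
    return None

def rolling(values, window, fn):
    """Apply fn to each full trailing window."""
    return [fn(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]

# Hypothetical peak areas: an in-control baseline, then runs shifted by about +1 sigma.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
runs = [102.0, 103.0, 101.0, 102.0, 103.0, 102.0,
        101.0, 103.0, 103.0, 103.0, 102.0, 103.0]

lcl, center, ucl = shewhart_limits(baseline)
sigma = stdev(baseline)
shewhart_hits = [i for i, x in enumerate(runs) if not lcl <= x <= ucl]
ewma_trace = ewma(runs, center)
first_alarm = cusum_first_alarm(runs, center, sigma)
rolling_mean = rolling(runs, 3, mean)
rolling_cv = rolling(runs, 3, lambda w: 100.0 * stdev(w) / mean(w))
```

In this example every run stays inside the ±3σ Shewhart limits, yet the CUSUM raises an alarm at the ninth run, which is the kind of small sustained shift the cumulative charts are meant to catch.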

(d) System Suitability

Plate count
Resolution
Tailing factor
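The three system suitability parameters follow the standard chromatographic definitions: N = 16(tR/W)² for plate count from the baseline peak width, Rs = 2(tR2 − tR1)/(W1 + W2) for resolution, and T = W0.05/(2f) for the tailing factor. The sketch below simply encodes them. Function and argument names are illustrative, the inputs are invented, and it assumes the peak width is a baseline (tangent) width; a half-height width would need the 5.54 form of the plate equation instead.

```python
def plate_count(t_r, w_base):
    """Theoretical plates from retention time and baseline peak width (same units)."""
    return 16.0 * (t_r / w_base) ** 2

def resolution(t_r1, w1, t_r2, w2):
    """Resolution between two adjacent peaks from baseline widths."""
    return 2.0 * (t_r2 - t_r1) / (w1 + w2)

def tailing_factor(w_005, f):
    """Tailing factor: width at 5% height over twice the front half-width at 5% height."""
    return w_005 / (2.0 * f)

# Hypothetical peak measurements in minutes.
n_plates = plate_count(t_r=5.0, w_base=0.25)
rs = resolution(t_r1=4.6, w1=0.25, t_r2=5.0, w2=0.25)
tf = tailing_factor(w_005=0.30, f=0.12)

# Pass/fail against the limits quoted in this report's final takeaway.
passes = {
    "plates": n_plates >= 2000,
    "resolution": rs >= 1.5,
    "tailing": tf <= 2.0,
}
```

With these inputs the peak has 6400 plates, a resolution of about 1.6, and a tailing factor of about 1.25, so all three checks pass.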

Detailed Python code blocks for these statistical and analytical computations are shown in notebooks/04_data_importation_key_metrics_computation.ipynb. These analyses align conceptually with USP <621>, USP <1225>, and ICH Q2(R2) expectations.

Key Analysis Categories:

Calibration Trend & Stability
Method Performance (Accuracy & Precision)
QC & Anomaly Detection
Instrument & System Suitability

Detailed steps for each of the analysis categories are shown, respectively, in:

notebooks/05_calibration_trend_analysis.ipynb
notebooks/06_method_performance_analysis.ipynb
notebooks/07_qc_anomaly_analysis.ipynb
notebooks/08_system_suitabilty_analysis.ipynb

Tableau Dashboards

Dashboard 1 — Calibration Performance Overview

Parity plots
Accuracy heatmaps
Response factor trends
R² linearity indicators

Dashboard 2 — Instrument Health Monitoring

Peak area control charts
EWMA & CUSUM charts
Rolling statistics

Dashboard 3 — Method Performance

Precision distributions
Accuracy (%Recovery)
Outlier detection maps

Audience Considerations

| Audience        | Needs                          |
|-----------------|--------------------------------|
| QC Managers     | Stability & compliance signals |
| Lab Supervisors | Instrument health indicators   |
| Data Teams      | Reproducibility & structure    |
| Recruiters      | Tool proficiency & clarity     |

ACT PHASE — Insights & Recommendations¶

Insight-Driven Analysis Report¶

(a) Peak Area Trend
Data-driven insight: Peak area trends remain statistically controlled across instruments, with isolated >±3σ excursions detected over time.
Risk: Potential loss of quantitative accuracy due to intermittent injector, detector, or sample preparation variability, increasing OOS risk.
Actionable recommendation: Initiate targeted deviation review, verify system suitability, and trend instrument performance per ICH Q2/Q14 and ISO 17025.

(b) EWMA Chart
Data-driven insight: EWMA trends show gradual peak area shifts approaching control limits across sequential runs, indicating low-level systematic drift not evident in individual results.
Risk: Early method bias may progress to OOS or compromised quantitation if unaddressed.
Actionable recommendation: Enhance trending review, verify system suitability parameters, and perform proactive instrument maintenance per ICH Q2/Q14 expectations.

(c) CUSUM Chart
Data-driven insight: CUSUM charts demonstrate sustained cumulative deviation from the historical mean across sequential runs, indicating persistent low-magnitude bias not captured by Shewhart limits.
Risk: Undetected systematic drift may compromise long-term method accuracy and lead to delayed OOS findings.
Actionable recommendation: Trigger trend-based investigation, reassess method control strategy, and implement preventive maintenance per ICH Q14 and ISO 17025.

(d) Rolling Statistics
Data-driven insight: Rolling mean and variability metrics show gradual increases over time, indicating emerging instability in quantitative response across sequential runs.
Risk: Progressive loss of method precision and robustness, increasing likelihood of OOS or invalid trend conclusions.
Actionable recommendation: Strengthen ongoing performance monitoring, review maintenance and consumables, and reassess control limits per ICH Q2/Q14 and ISO 17025 expectations.

(e) Parity Plot
Data-driven insight: Parity plots show strong linear agreement with the 1:1 line across instruments, with minor dispersion at higher concentrations, indicating generally acceptable calibration accuracy.
Risk: Emerging bias at range extremes may impact quantitation accuracy and reportable results.
Actionable recommendation: Review calibration model fit and range suitability, verify R² and recovery against acceptance criteria, and update calibration strategy per ICH Q2/Q14 and ISO 17025.

(f) Accuracy Heatmap
Data-driven insight: Mean percent recovery remains largely centered around 100% across instruments and months, with localized periods approaching acceptance limits, indicating emerging temporal variability.
Risk: Sustained recovery drift may compromise method accuracy and reportable results, increasing OOS risk.
Actionable recommendation: Implement enhanced accuracy trending, review calibration and sample preparation practices, and initiate preventive CAPA per ICH Q2/Q14 and ISO 17025.

(g) Response Factor Stability
Data-driven insight: Response factor trends remain largely stable, with intermittent approaches to ±2σ Westgard limits across instruments, indicating early-stage analytical variability.
Risk: Continued drift may degrade calibration integrity and quantitative accuracy, increasing OOS risk.
Actionable recommendation: Apply Westgard rule evaluation, review calibration preparation and detector performance, and initiate preventive maintenance and trending per FDA, ICH Q2/Q14, and ISO 17025.
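
A Westgard evaluation of this kind can be sketched as follows. Only two of the multi-rules are shown (1_3s: one point beyond ±3σ; 2_2s: two consecutive points beyond the same ±2σ limit), and the mean, standard deviation, and response factor values are invented for illustration rather than taken from the project data.

```python
def westgard_flags(values, mu, sigma):
    """Return the run indices that violate the 1_3s and 2_2s Westgard rules."""
    z = [(x - mu) / sigma for x in values]
    flags = {"1_3s": [], "2_2s": []}
    for i, zi in enumerate(z):
        if abs(zi) > 3:
            flags["1_3s"].append(i)
        # 2_2s: this point and the previous one exceed 2 sigma on the same side.
        if i > 0 and ((zi > 2 and z[i - 1] > 2) or (zi < -2 and z[i - 1] < -2)):
            flags["2_2s"].append(i)
    return flags

# Hypothetical response factors with an assumed historical mean and SD.
rf = [101.0, 104.5, 104.2, 99.0, 93.5]
flags = westgard_flags(rf, mu=100.0, sigma=2.0)
```

Here the third run trips 2_2s (two consecutive results above +2σ) and the fifth trips 1_3s, so both a systematic and a gross-error signal would be raised for review.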

(h) Calibration Linearity (R²)
Data-driven insight: Calibration linearity (R²) meets predefined acceptance criteria across instruments, demonstrating adequate method linearity and model fit.
Risk: Marginal proximity to the acceptance threshold may reduce sensitivity to emerging non-linearity over time.
Actionable recommendation: Maintain periodic linearity verification, expand calibration range review, and trend R² results per ICH Q2/Q14 and ISO 17025 to ensure sustained method validity.

(i) Precision (RSD Distribution)
Data-driven insight: %RSD distributions are predominantly within typical ≤5% precision criteria across instruments, with occasional higher values indicating sporadic variability.
Risk: Intermittent precision failures may compromise method repeatability and confidence in reported results.
Actionable recommendation: Investigate high-%RSD events, assess replicate handling and instrument performance, and reinforce system suitability and precision monitoring per ICH Q2/Q14 and ISO 17025.

(j) Accuracy Recovery
Data-driven insight: %Recovery distributions are centered near 100% across instruments, with occasional values approaching acceptance limits, indicating generally acceptable accuracy with minor variability.
Risk: Outlier recoveries may signal calibration drift or sample preparation bias, potentially impacting reportable results.
Actionable recommendation: Review accuracy outliers, verify calibration integrity, and reinforce routine accuracy trending and system suitability checks per ICH Q2/Q14 and ISO 17025.

(k) Instrument Comparison
Data-driven insight: Measured concentration distributions are comparable across instruments, with modest inter-instrument spread, indicating generally consistent performance.
Risk: Systematic inter-instrument bias may affect data comparability and trending across platforms.
Actionable recommendation: Perform cross-instrument equivalency assessment, review calibration and standardization practices, and document instrument comparability per FDA and ISO 17025 requirements.

(l) Resolution Trend
Data-driven insight: Resolution trends remain above the minimum acceptance criterion (Rs ≥ 1.5) across instruments, with localized variability observed in rolling statistics.
Risk: Progressive resolution decline may impair peak separation, increasing mis-identification and quantitation errors.
Actionable recommendation: Intensify system suitability trending, assess column condition and mobile phase performance, and initiate preventive maintenance or method adjustment per ICH Q2/Q14 and ISO 17025.

(m) Cpk Analysis
Data-driven insight: Capability indices (Cpk) for resolution, tailing, plates, and retention time are generally ≥1.33, indicating adequate and stable instrument performance within defined system suitability limits.
Risk: Marginal Cpk values may reduce process robustness, increasing sensitivity to routine variability.
Actionable recommendation: Trend Cpk metrics routinely, tighten preventive maintenance and calibration schedules, and reassess suitability limits per FDA, ICH Q2/Q14, and ISO 17025.
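
For reference, Cpk compares the distance from the process mean to the nearest specification limit with three standard deviations: Cpk = min(USL − μ, μ − LSL) / 3σ, using only the side that has a limit for one-sided criteria. A minimal sketch, with invented resolution values and the Rs ≥ 1.5 criterion treated as a lower limit:

```python
from statistics import mean, stdev

def cpk(values, lsl=None, usl=None):
    """Process capability index; one-sided when only one limit is supplied."""
    mu, sigma = mean(values), stdev(values)
    sides = []
    if lsl is not None:
        sides.append((mu - lsl) / (3 * sigma))
    if usl is not None:
        sides.append((usl - mu) / (3 * sigma))
    if not sides:
        raise ValueError("at least one specification limit is required")
    return min(sides)

# Hypothetical resolution results from eight system suitability runs.
resolution_values = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.0]
cpk_resolution = cpk(resolution_values, lsl=1.5)
```

These values give a Cpk of about 1.39, just above the 1.33 benchmark, which illustrates how a metric can pass while leaving little margin for added variability.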

(n) Plate Change Impact
Data-driven insight: Plate count trends are largely stable for both HPLC and GC systems, with isolated ≥3σ deviations indicating abrupt efficiency changes.
Risk: Sudden plate count loss may reflect column degradation or system contamination, reducing separation efficiency and method robustness.
Actionable recommendation: Investigate flagged events per deviation procedures, assess column condition and system cleanliness, and reinforce efficiency trending and preventive maintenance in alignment with ICH Q2/Q14 and ISO 17025.

(o) System Suitability Heatmap
Data-driven insight: Monthly system suitability performance is predominantly compliant across instruments, with intermittent metric-specific failures observed by period.
Risk: Recurrent localized failures may indicate emerging instrument or method instability, increasing deviation and OOS risk.
Actionable recommendation: Perform period-based root cause analysis, strengthen trending reviews, and implement targeted CAPA and preventive maintenance per FDA, ICH Q2/Q14, and ISO 17025.

(p) Tailing Factor Trend
Data-driven insight: Tailing factor values remain predominantly within the established acceptance limit (TF ≤ 2), with occasional upward excursions observed across instruments.
Risk: Increased tailing may impair peak integration accuracy and quantitation reliability.
Actionable recommendation: Investigate excursions via deviation procedures, evaluate column health and system cleanliness, and reinforce routine tailing trending and preventive maintenance per ICH Q2/Q14 and ISO 17025.

Final Takeaway¶

Analytical evaluation of HPLC and GC instruments demonstrates overall compliance with system suitability and QC criteria (RSD ≤5%, %Recovery 95–105%, resolution ≥1.5, tailing ≤2, plates ≥2000). Minor trends in retention time, plate counts, and tailing suggest early-stage column or system variability. Proactive maintenance, targeted trending, and CAPA review are recommended to sustain method robustness, prevent OOS events, and ensure ongoing GxP/regulatory compliance.

This project therefore demonstrates how laboratory QC data can evolve from static records into proactive quality intelligence using accessible analytics tools. It bridges analytical chemistry and data analytics, showcasing a skill set directly relevant to modern, data-driven scientific and industrial environments.

Expected Impact¶

Earlier detection of analytical drift
Reduced risk of invalid results
Improved regulatory defensibility
Faster QC decision-making
Scalable analytics maturity

Deliverables¶

| Deliverable          | Description                     |
|----------------------|---------------------------------|
| Cleaned Dataset      | Excel-validated QC data         |
| SQLite Database      | Fully normalized QC schema      |
| Python Notebooks     | Reproducible analytics pipeline |
| Tableau Dashboards   | Interactive QC monitoring       |
| GitHub Documentation | End-to-end project narrative    |

Schedule Overview¶

| Phase                      | Duration |
|----------------------------|----------|
| Data Simulation & Cleaning | Week 1   |
| SQL Modeling & Seeding     | Week 2   |
| Python Analytics           | Week 3   |
| Visualization & Reporting  | Week 4   |

Estimated Completion: 4 weeks (part-time, revision-inclusive)
