serkosi/discrimination-of-the-two-types-of-constraints-in-statistical-properties-of-the-production-data
This work composes Association Networks, Randomness Control, and synthetic data generation with simulation events.
Discriminating Technology-driven and Load-driven Constraints in Production Data
The Challenge
Steel manufacturing is a complex, multi-stage process where events flow sequentially through integrated production lines. Each event (a semi-finished steel product) carries attributes—width, thickness, weight, length—that change as it progresses through different machines. The critical question: Why do certain product attributes cluster together in production sequences?
The answer lies in constraints—invisible rules that govern which products can be processed together and in what order. But constraints are not monolithic. We identified two fundamentally different types operating in parallel:
- Technology-driven constraints arise from how machines process materials. For example, a hot rolling mill handles one slab at a time and must arrange events in a specific order (by width) to avoid equipment wear and maintain quality.
- Load-driven constraints come from capacity limits. A pickling tank treats multiple slabs simultaneously but is capped at a maximum batch volume. The order matters less than staying within capacity.
These constraints leave statistical fingerprints in production data. The challenge: can we formally discriminate between these two constraint types using only the patterns we observe in historical production sequences?
Our Solution: A Three-Step Framework
Step 1: Extract Hidden Patterns via Association Networks
Raw production data contains thousands of events arranged in sequences. Visually identifying which products cluster together is impossible at scale. We solve this through association rule mining:
- Calculate co-occurrence strength for every pair of binned product attributes within the same sequences using the Lift metric:
  - Lift > 1 means the two occur together more often than expected if they were independent
  - Lift < 1 means the two occur together less often than expected under independence
- Convert these patterns into networks where nodes represent product types (binned by attributes) and edges represent strong co-occurrences
- Analyze network topology to uncover whether constraints have created clusters, hierarchies, or random structures
The key insight: Different constraints generate different network patterns. We needed to expose these differences.
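As a minimal sketch of the Lift step, the snippet below treats each production sequence as a set of bin labels and computes Lift(a, b) = P(a and b) / (P(a) · P(b)) over sequences. The function names (`pairwise_lift`, `strong_edges`), the bin labels, and the choice of counting presence per sequence are assumptions made for illustration, not the project's actual implementation.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(sequences):
    """Return {(a, b): lift} for every pair of bin labels that co-occurs.

    Each probability is the fraction of sequences containing the label(s);
    this per-sequence counting is an assumption of the sketch.
    """
    n = len(sequences)
    item_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        items = sorted(set(seq))  # presence/absence per sequence
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


def strong_edges(lifts, threshold=1.0):
    """Keep only pairs whose lift exceeds the threshold (network edges)."""
    return sorted(pair for pair, lift in lifts.items() if lift > threshold)


# Hypothetical width-bin labels for four short sequences.
sequences = [
    ["w800", "w900"],
    ["w800", "w900"],
    ["w800", "w1000"],
    ["w1000", "w1100"],
]
lifts = pairwise_lift(sequences)
edges = strong_edges(lifts)
```

Here `edges` holds the pairs that would become links in the association network; pairs with Lift at or below 1 are dropped.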
Step 2: Operationalize Constraints via Dual Binning Schemes
We developed two complementary ways to group events:
Fixed Step Size (FSS) Binning: Divide the attribute range into equal intervals. For width, create bins 99 mm wide (e.g., 800–899 mm, 900–999 mm). This mirrors technology-driven constraints, which operate uniformly across the product range.
Fixed Bucket Size (FBS) Binning: Create bins with equal event counts. With events sorted by the attribute value, the first 20% go in bin 1, the next 20% in bin 2, and so on. This mirrors load-driven constraints, which care about batch volume regardless of product specifications.
By constructing networks using both schemes and comparing results, we can see whether FSS or FBS better explains the observed clustering—revealing which constraint type dominates.
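The two binning schemes can be sketched in a few lines. The helper names `fss_bin` and `fbs_bins`, the 100 mm step, and the sample widths are illustrative assumptions; the project's own binning code and resolutions may differ.

```python
def fss_bin(value, lower, step):
    """Fixed Step Size: index of the equal-width interval containing value."""
    return int((value - lower) // step)


def fbs_bins(values, n_bins):
    """Fixed Bucket Size: assign values to n_bins equally populated bins.

    Values are ranked in ascending order; rank r of n goes to bin r * n_bins // n.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


# Hypothetical slab widths in mm.
widths = [805, 1240, 990, 1500, 1010, 1995, 870, 1300, 1100, 1750]
fss = [fss_bin(w, lower=800, step=100) for w in widths]  # equal intervals
fbs = fbs_bins(widths, n_bins=5)                         # 20% of events per bin
```

FSS bins can end up empty or crowded depending on the width distribution, while FBS bins always hold the same number of events but cover intervals of varying width.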
Step 3: Quantify Constraint Effects via Statistical Validation
Network patterns alone aren't sufficient; we need statistical rigor. We employ:
Modularity Measurement: Quantify how tightly events cluster within groups. Higher modularity indicates strong constraints forcing specific combinations.
Randomness Testing via Null Models: Compare real networks against random networks that preserve degree structure. If the real network shows significantly higher modularity than random versions, we have evidence of true constraints—not accidental clustering.
Z-Score Standardization: Express deviation from randomness in standard units:
- |z| < 1: Network resembles randomness
- 1 ≤ |z| < 2: Borderline structure
- |z| ≥ 2: Significant, non-random constraint-driven patterns
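The sketch below shows one way to combine modularity, a degree-preserving null model, and the z-score thresholds listed above: z = (Q_real − mean(Q_null)) / std(Q_null). It uses double edge swaps for the null model and evaluates modularity on a fixed, known partition; both are simplifying assumptions for illustration, since the community detection and randomization procedures used in the project are not specified here. All function names are hypothetical.

```python
import random
from statistics import mean, stdev


def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for a fixed partition."""
    m = len(edges)
    inside = sum(1 for u, v in edges if community[u] == community[v])
    degree_sum = {}  # total degree per community
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
    return inside / m - sum((d / (2 * m)) ** 2 for d in degree_sum.values())


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization of a simple graph via double edge swaps."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Reject self-loops and duplicate edges.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def modularity_z_score(edges, community, n_null=100, seed=0):
    """Standardize real modularity against rewired (null model) networks."""
    rng = random.Random(seed)
    q_null = [
        modularity(rewire(edges, 10 * len(edges), rng), community)
        for _ in range(n_null)
    ]
    return (modularity(edges, community) - mean(q_null)) / stdev(q_null)


def classify(z):
    """Apply the |z| thresholds from the text."""
    if abs(z) < 1:
        return "resembles randomness"
    if abs(z) < 2:
        return "borderline structure"
    return "significant non-random structure"


def clique(nodes):
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]


# Toy network: two 6-node cliques joined by a single bridge edge.
edges = clique(list(range(6))) + clique(list(range(6, 12))) + [(0, 6)]
community = {n: (0 if n < 6 else 1) for n in range(12)}
z = modularity_z_score(edges, community)
label = classify(z)
```

On this deliberately clustered toy network the real modularity sits far above the rewired versions, so the z-score falls in the significant band.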
Robustness Testing: Perturb the data by removing 10% of it, recomputing the network measures, and restoring the removed part, repeated ten times. Stable results with small error bars indicate genuine constraints; large error bars signal artifacts.
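A perturbation loop of this kind might look like the following sketch. The function name `perturbation_error_bar`, the random choice of which records to drop, and the use of a plain mean as the statistic are assumptions for illustration; in the actual analysis the statistic would be a network measure such as modularity.

```python
import random
from statistics import mean, stdev


def perturbation_error_bar(records, statistic, fraction=0.10, repeats=10, seed=0):
    """Recompute a statistic on repeated 90% subsets and report mean and spread.

    Each repeat drops a random `fraction` of the records; sampling again from
    the full list on the next repeat plays the role of "restoring" the data.
    """
    rng = random.Random(seed)
    keep = len(records) - round(fraction * len(records))
    values = [statistic(rng.sample(records, keep)) for _ in range(repeats)]
    return mean(values), stdev(values)


# Toy data: the statistic here is just the mean of the retained values.
values = list(range(1000))
avg, spread = perturbation_error_bar(values, mean)
```

A small `spread` relative to `avg` corresponds to the low error bars described above.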
Implementation and Results
Data Foundation
We analyzed 23 years of production data across four integrated steel production lines:
- CCM (Continuous Casting Machine): 347,418 events
- CSP (Compact Strip Production): 205,496 events
- PLTCM (Pickling Line Tandem Cold Mill): 59,604 events
- CGL (Continuous Galvanizing Line): 27,147 events
Data Preparation Pipeline
Before analysis, we cleaned raw data through:
- Standardization: Converted string data to floats, normalized punctuation, filled nulls consistently
- Physical validation: Calculated density for every event (valid range: 6.5–8.5 × 10⁻⁶ kg/mm³); discarded outliers
- Capacity ranges: Applied machine input capacity limits:
  - Width: 800–2000 mm
  - Weight: 2669–26,690 kg
- Gap filling: Imputed missing values using density-mass-volume equations and cross-event consistency checks
- Sequence filtering: Removed sequences with fewer than 50 events (likely test processes)
- Cross-line data integration: Joined PLTCM and CGL datasets using Material ID and Piece ID as foreign keys to establish referential integrity across production stages; propagated PLTCM input-labeled attributes (input width, input thickness) to corresponding CGL events for downstream analysis
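The physical validation, capacity, and sequence-length filters can be sketched as below. The field names (`width`, `thickness`, `length`, `weight`, `sequence_id`) and the order in which the filters are applied are assumptions; only the numeric limits come from the list above.

```python
from collections import Counter

DENSITY_RANGE = (6.5e-6, 8.5e-6)  # kg/mm^3
WIDTH_RANGE = (800, 2000)         # mm
WEIGHT_RANGE = (2669, 26690)      # kg
MIN_SEQUENCE_LENGTH = 50          # events


def density(event):
    """Density in kg/mm^3 from weight (kg) and dimensions (mm)."""
    volume = event["width"] * event["thickness"] * event["length"]
    return event["weight"] / volume


def is_valid(event):
    """Check the density and machine capacity limits for one event."""
    return (
        DENSITY_RANGE[0] <= density(event) <= DENSITY_RANGE[1]
        and WIDTH_RANGE[0] <= event["width"] <= WIDTH_RANGE[1]
        and WEIGHT_RANGE[0] <= event["weight"] <= WEIGHT_RANGE[1]
    )


def clean(events):
    """Drop invalid events, then drop sequences that are too short."""
    valid = [e for e in events if is_valid(e)]
    sizes = Counter(e["sequence_id"] for e in valid)
    return [e for e in valid if sizes[e["sequence_id"]] >= MIN_SEQUENCE_LENGTH]


# Hypothetical slab: 1000 x 50 x 10000 mm at 3925 kg gives 7.85e-6 kg/mm^3.
good = {"width": 1000, "thickness": 50, "length": 10000, "weight": 3925}
events = (
    [dict(good, sequence_id="A") for _ in range(50)]
    + [dict(good, sequence_id="A", weight=10000)]       # density 2.0e-5: rejected
    + [dict(good, sequence_id="B") for _ in range(10)]  # short sequence: rejected
)
cleaned = clean(events)
```

Only the 50 physically plausible events of sequence "A" survive; the implausible event and the short sequence "B" are removed.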
Key Findings
CCM and CSP (Technology-driven lines) — Width dimension:
- FSS networks: consistently modular and hierarchically organized
- FBS networks: non-random, non-modular hierarchical structure (but unstable—FBS modularity breaks down after 10% data perturbation)
- Interpretation: Fixed-interval binning reveals genuine constraint structure; equal-population binning creates artificial patterns that don't persist
CCM and CSP (Technology-driven lines) — Thickness dimension:
- FSS networks: modular and simple (robust)
- FBS networks: non-modular and hierarchical (robust)
- Interpretation: Technology-driven machines show clear separation between features
PLTCM and CGL (Load-driven lines) — Thickness dimension:
- FSS networks: highly modular
- FBS networks: highly modular
- CGL networks: both approaches are robust (no complex hierarchical structure)
- PLTCM networks: both modular but hierarchical (less robust than CGL)
- Interpretation: Load constraints strongly organize thickness across both binning schemes; CGL shows more stable structure than PLTCM
PLTCM and CGL (Load-driven lines) — Width dimension:
- PLTCM: both FSS and FBS modular but hierarchical with non-robust structure
- CGL: FSS non-modular hierarchical; FBS modular hierarchical (non-robust)
- Interpretation: Width dimension shows sensitivity to binning method in load-driven lines; suggests multiple constraint mechanisms operate simultaneously
Temporal stability: Constraint patterns remain consistent across time windows, confirming they reflect persistent operational rules rather than transient variation
Theoretical Validation via Simulation
To validate findings beyond empirical observation, we developed a constraint-based simulation inspired by metabolic network modeling:
- Build artificial production networks with embedded constraints using Flux Balance Analysis (Homo sapiens metabolic model: 738 metabolites, 1008 reactions)
- Systematically vary constraints through two experimental designs:
  - Resource Utilisation (RU): RU-1 progressively deletes reactions (50–450 in 50-unit steps); RU-2 limits flux bounds (105–420 reactions)
  - Product Portfolio Diversification (PPD): PPD-1 varies objective function coefficients using directional intervals (±[1,4], ±[4,2], ±[2,4]); PPD-2 reduces objective function richness (25%, 50%, 75%)
- Compare simulated patterns to real production data using identical analysis methods
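For orientation, the standard Flux Balance Analysis problem is the linear program below, where S is the stoichiometric matrix (738 × 1008 for the model above), v is the vector of reaction fluxes, c holds the objective coefficients, and l, u are the flux bounds. Reading the experimental designs in these terms is an interpretation rather than a description of the project's code: deleting a reaction is commonly modelled by fixing its bounds to zero (RU-1), tightening l and u limits fluxes (RU-2), and the PPD designs change c.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{s.t.} \quad & S v = 0, \\
& l_j \le v_j \le u_j, \quad j = 1, \dots, n .
\end{aligned}
```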
Key simulation results:
Resource utilisation effects (RU-1): As deleted reactions increase, modularity decreases and NM-d z-scores show significant decline. NM-m z-scores remain near zero, indicating no hierarchical structure breakdown.
Portfolio diversification effects (PPD-1 & PPD-2):
- FSS networks: no modularity change across constraint variations
- FBS networks: modularity increases only for directional (asymmetric) objective functions
- Symmetric production plans (coefficients balanced around zero) show dampening effects regardless of richness reductions
- Interpretation: Production plan directionality matters for load-driven network topology; generic product capability does not significantly constrain network structure
What This Means
This framework enables:
- Constraint identification: Use network patterns to detect which constraint types are active in the production system of interest
- Process diagnostics: Understand how constraints shape observed production behavior and identify which attributes are most strongly affected
- Theoretical grounding: Validate constraint discrimination through constraint-based simulation with controlled perturbations
Tools and Data
- SQL extraction queries for pulling events from enterprise databases (supporting incremental updates across 23-year periods)
- Binning algorithms implementing FSS and FBS discretization at configurable resolutions
- Network construction via association rules and adjacency matrices
- Statistical analysis suite including modularity computation, null model randomization, z-score calculation, and error estimation
- Simulation framework integrating constraint-based optimization with production-compatible data generation
All methods support large-scale datasets (300,000+ events) with modular, reproducible implementation.
Status
The project is in active development, which includes porting the code from Wolfram Mathematica to Python.
The project is based on my master's thesis, which is available as a PDF file: report/paper/main.pdf.
The thesis provides a comprehensive overview of the project, including background, methods, results, and conclusions.
Focus
Statistical data analysis, constraint discrimination, network topology characterization