ogulcanakca/steel-defect-segmentation-unet-effnet
Pixel-level steel defect segmentation on the Severstal dataset using a U-Net model with an EfficientNetB4 backbone, implemented in TensorFlow/Keras.
Steel Surface Defect Detection and Segmentation Project
Overview
This project was developed to automatically detect and segment surface defects at the pixel level in images taken from Severstal steel production facilities. The project uses the dataset from the "Severstal: Steel Defect Detection" competition published on Kaggle. The goal is to create a deep learning model that accurately identifies and locates 4 different types of defects (Class 1, 2, 3, 4) on steel plates.
The developed solution aims to achieve high segmentation performance by combining the U-Net architecture with a powerful pre-trained backbone like EfficientNetB4. The model achieved a Dice Coefficient score of approximately 0.7172 on the validation set after training. Such a model has the potential to automate industrial quality control processes and increase production efficiency.
Note: The model's performance metrics on the test dataset and the public/private scores from the Kaggle competition were calculated; they are discussed later in this README. The training code is available here or on GitHub.
Note: Kaggle requires internet access to be disabled for competition submissions, so the required packages were downloaded in advance and attached as a dataset for offline installation. If you want to check it out, you can access it here or on GitHub.
Dataset
- Source: Severstal: Steel Defect Detection (Kaggle Competition)
- Description: The dataset consists of images taken from steel surfaces and mask information for 4 different defect types in Run-Length Encoded (RLE) format. Each image may have no defects, defects belonging to a single class, or defects belonging to multiple classes.
- Data Exploration: An examination of the `train.csv` file revealed 6666 unique images and a total of 7095 defect records for these images. There is a significant imbalance among the defect classes; in particular, Class 3 defects are far more common than the others. This imbalance was taken into account during model training and loss function selection.
Approach and Methodology
In this project, a deep learning-based approach was adopted for the image segmentation problem.
1. AI Framework and Libraries Used
- Main Framework: TensorFlow (with Keras API) was used. Keras offers a user-friendly interface that simplifies model development and training processes.
- Segmentation Models: The `segmentation-models` library (developed by Pavel Yakubovskiy) was used to easily integrate popular segmentation architectures like U-Net with various pre-trained backbones.
- Auxiliary Libraries:
  - `NumPy`: for numerical operations and working with multi-dimensional arrays.
  - `Pandas`: for data manipulation and reading the `train.csv` file.
  - `OpenCV (cv2)`: for image reading, basic processing, and creating masks from RLE.
  - `Matplotlib`: for data visualization and presenting results.
  - `Scikit-learn`: for splitting the dataset into training and validation subsets.
  - `tqdm`: for displaying progress bars in loops.
2. Data Preprocessing and Preparation
- RLE Decoding: The Run-Length Encoding (RLE) formatted mask information in the `EncodedPixels` column of the `train.csv` file was converted into binary masks for each defect class. This was done with a custom function called `rle_to_mask`, which takes an RLE string and produces a NumPy array mask in the original image dimensions (256x1600).
- Image Tiling: Original images are large (256x1600), which can strain GPU memory and hinder effective model learning. Therefore, each original image and its corresponding multi-channel mask (one channel per defect class) were divided into 6 sub-patches (tiles) of size `256x256`. This provided more manageable input sizes for the model and also increased the effective number of training samples.
- Data Normalization: Image pixels were normalized to the `0-1` range for more stable and faster training.
- `SteelDataGenerator` Class:
  - Instead of loading all data into memory, a custom data generator (`SteelDataGenerator`) inheriting from Keras's `Sequence` class was created to produce data in batches, on the fly, during training.
  - For each image ID, this generator:
    - Loads the original image.
    - Retrieves all defect records for the respective `ImageId` from `train.csv`.
    - Creates separate masks for each defect class (1-4) by decoding RLEs and combines them into a multi-channel master mask of shape `(original_height, original_width, NUM_CLASSES)`.
    - Divides the image and multi-channel mask into `N_TILES_PER_IMAGE` (6) tiles.
    - Feeds these tiles to the model in batches for training.
  - This structure provides memory efficiency when working with large datasets. Data augmentation techniques could also be integrated at this stage (left as optional in this project).
- Train and Validation Set Split: Using unique image IDs, the dataset was split into 85% training and 15% validation. This split is crucial for monitoring the model's performance on data it has not seen during training.
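The RLE decoding and tiling steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the notebook's exact code; in particular, the tiling scheme is an assumption, since 6 non-overlapping 256-wide tiles cover only 1536 of the 1600 columns.

```python
import numpy as np

def rle_to_mask(rle_string, height=256, width=1600):
    """Decode a Severstal RLE string into a binary (height, width) mask.

    RLE pairs are (start, run_length) with 1-indexed starts, and pixels
    are numbered in column-major (Fortran) order, top to bottom.
    """
    mask = np.zeros(height * width, dtype=np.uint8)
    if isinstance(rle_string, str) and rle_string.strip():
        values = list(map(int, rle_string.split()))
        for start, length in zip(values[0::2], values[1::2]):
            mask[start - 1:start - 1 + length] = 1
    return mask.reshape((height, width), order='F')

def tile_image(array, tile_w=256, n_tiles=6):
    """Split a (256, 1600, ...) array into n_tiles non-overlapping tiles.

    Note: 6 * 256 = 1536 < 1600, so this sketch simply drops the
    rightmost 64 columns -- the original notebook may instead pad,
    resize, or use an overlapping tiling.
    """
    return [array[:, i * tile_w:(i + 1) * tile_w] for i in range(n_tiles)]
```

The column-major pixel order is the part most easily gotten wrong: the Severstal masks number pixels down each column first, hence the `order='F'` reshape.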
3. Model Architecture and Training
- Model Architecture: U-Net + EfficientNetB4 Backbone
- U-Net was chosen as the base segmentation architecture. U-Net is a model proven successful in biomedical and industrial image segmentation, capable of capturing both semantic information and precise positional details thanks to its encoder (contracting path) and decoder (expanding path) structure and the "skip connections" between them.
- The encoder part of U-Net was replaced with the EfficientNetB4 backbone, using weights pre-trained on ImageNet (`encoder_weights='imagenet'`). This transfer learning approach enables the model to learn more powerful and general features, can shorten training time, and generally leads to better segmentation performance. The `segmentation-models` library facilitated this integration.
- The output layer of the model has a separate channel for each defect class, and the `sigmoid` activation function is used to predict the probability of each pixel belonging to that class.
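Building such a model with the `segmentation-models` library is roughly a one-liner. A minimal sketch follows; the `input_shape` and exact argument values are assumptions rather than copies from the original notebook.

```python
import os
os.environ['SM_FRAMEWORK'] = 'tf.keras'  # select the tf.keras backend before import
import segmentation_models as sm

NUM_CLASSES = 4  # one output channel per defect class

# U-Net decoder on top of an ImageNet-pretrained EfficientNetB4 encoder;
# sigmoid output gives an independent probability map per defect class.
model = sm.Unet(
    backbone_name='efficientnetb4',
    encoder_weights='imagenet',
    input_shape=(256, 256, 3),
    classes=NUM_CLASSES,
    activation='sigmoid',
)
```

Because each class gets its own sigmoid channel (rather than a single softmax), a pixel can belong to more than one defect class, which matches the dataset's multi-label structure.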
- Loss Function:
  - Dice Loss (`1 - Dice Coefficient`) was used. Dice Loss generally performs better than Binary Cross-Entropy in segmentation tasks, especially under class imbalance (as in this dataset), and directly optimizes segmentation overlap.
- Optimizer:
  - The Adam optimizer was used with an initial learning rate of `learning_rate=1e-4`. Adam is a popular choice that performs well across many deep learning problems thanks to its adaptive learning rates.
- Evaluation Metric:
- Model performance was tracked with the Dice Coefficient, which is also the main metric of the competition. This metric measures the pixel-based overlap between the predicted mask and the true mask.
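The Dice metric and loss can be illustrated in plain NumPy (the actual training code uses the tensor equivalent with Keras backend ops; the `smooth` stabilizer is a common convention and its value here is an assumption):

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice = 2|A ∩ B| / (|A| + |B|).

    `smooth` keeps the ratio well-defined on empty masks (both sums
    zero), which happens often on defect-free tiles.
    """
    y_true_f = y_true.reshape(-1).astype(np.float64)
    y_pred_f = y_pred.reshape(-1).astype(np.float64)
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    # Minimizing 1 - Dice directly maximizes mask overlap.
    return 1.0 - dice_coefficient(y_true, y_pred)
```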
- Callbacks: The following Keras callbacks were used to manage and improve the training process:
  - `ModelCheckpoint`: saved the weights of only the best-performing model (`best_unet_model_with_backbone.keras`) by monitoring the `val_dice_coefficient` metric at the end of each epoch.
  - `ReduceLROnPlateau`: automatically reduced the learning rate when the `val_dice_coefficient` metric did not improve for a given `patience` period, which can help the model escape plateaus.
  - `EarlyStopping`: stopped training early when `val_dice_coefficient` did not improve for a longer `patience` period and restored the best weights via `restore_best_weights=True`. This prevents overfitting and reduces unnecessary training time.
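These three callbacks can be configured roughly as follows. The monitored metric and checkpoint filename come from the text above, while the `patience` values and reduction `factor` are illustrative assumptions:

```python
from tensorflow.keras.callbacks import (
    ModelCheckpoint, ReduceLROnPlateau, EarlyStopping)

callbacks = [
    # Keep only the best weights, judged by validation Dice (higher is better).
    ModelCheckpoint('best_unet_model_with_backbone.keras',
                    monitor='val_dice_coefficient', mode='max',
                    save_best_only=True),
    # Halve the learning rate after a short plateau.
    ReduceLROnPlateau(monitor='val_dice_coefficient', mode='max',
                      factor=0.5, patience=3, verbose=1),
    # Stop after a longer plateau and roll back to the best epoch.
    EarlyStopping(monitor='val_dice_coefficient', mode='max',
                  patience=10, restore_best_weights=True, verbose=1),
]

# model.fit(train_gen, validation_data=val_gen, epochs=40, callbacks=callbacks)
```

Setting `mode='max'` is essential here: by default Keras assumes the monitored quantity should decrease, which would invert all three callbacks for a Dice metric.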
- Training Environment: The model was trained on a Tesla P100 GPU available in Kaggle notebooks. `EPOCHS = 40` was set, but thanks to the `EarlyStopping` callback training usually stopped earlier (e.g., at epoch 29, with the best result obtained at epoch 19).
4. Prediction and Submission File Generation
- After training was completed, the best model saved by `ModelCheckpoint` (`loaded_best_model`) was reloaded.
- For each test image in the `test_images/` folder:
  - The image was loaded and divided into 6 `256x256` tiles, as in training.
  - Each tile was normalized and fed to the model to obtain per-class probability predictions. The `predict` call was accelerated by feeding all tiles of an image as a single batch.
  - The tile predictions were combined to recreate a full probability mask at the original `256x1600` size.
  - A threshold such as `THRESHOLD = 0.5` was applied to this probability mask to obtain binary segmentation masks.
  - The binary mask for each class was converted to Run-Length Encoding (RLE) format using a custom function called `mask_to_rle`.
  - The results were collected in a Pandas DataFrame in the Kaggle competition format (`ImageId_ClassId`, `EncodedPixels` columns) and saved as a CSV file named `submission.csv`.
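The encoding side of this pipeline can be sketched as follows. `mask_to_rle` follows the standard Kaggle RLE convention; `stitch_and_threshold` is a hypothetical helper that mirrors the non-overlapping tiling assumed earlier, so columns beyond 6 × 256 = 1536 stay empty in this sketch.

```python
import numpy as np

def mask_to_rle(mask):
    """Encode a binary mask as a Severstal-style RLE string
    (1-indexed starts, column-major pixel order)."""
    pixels = mask.flatten(order='F')
    padded = np.concatenate([[0], pixels, [0]])
    # Positions where the value changes mark run boundaries.
    runs = np.where(padded[1:] != padded[:-1])[0] + 1
    runs[1::2] -= runs[0::2]  # convert (start, end) pairs to (start, length)
    return ' '.join(str(x) for x in runs)

def stitch_and_threshold(tile_probs, threshold=0.5, width=1600, tile_w=256):
    """Place per-tile probability maps back side by side and binarize."""
    height = tile_probs[0].shape[0]
    full = np.zeros((height, width), dtype=np.uint8)
    for i, probs in enumerate(tile_probs):
        full[:, i * tile_w:(i + 1) * tile_w] = (probs > threshold).astype(np.uint8)
    return full
```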
Results
The U-Net model (with EfficientNetB4 backbone) developed in this project has shown promising results on the Severstal steel defects dataset. The model achieved a Dice Coefficient score of approximately 0.7172 on the validation set. This score indicates that the model can segment the 4 different defect types with significant accuracy.
A submission was made to Kaggle; this model achieved a public score of 0.82716 and a private score of 0.84220, while the competition-winning solutions converged around 0.90. You can review the inference notebook by clicking here or on GitHub.
An example of a detected defect is shown below:
In a hypothetical industrial scenario, surface images of steel parts are first captured by a camera system (e.g., integrated into the production line or located at an inspection station). These images are fed to the trained deep learning model, which analyzes each pixel and produces detailed segmentation masks identifying potential defect regions and which of the four defined defect types they belong to. Because these masks reveal the exact location, size, and shape of each defect, defective parts can be automatically marked, removed from the line, or flagged for more detailed manual inspection by an operator. The detected defect data can also be recorded for continuous monitoring and improvement of production quality. As it stands, this project detects defects with approximately 85% accuracy. For more advanced model training, an experiment-tracking tool such as Comet can be used.
Challenges Encountered
- Class Imbalance: The significant imbalance among defect classes in the dataset could have made it difficult for the model to learn minority classes. The use of Dice Loss helped to somewhat mitigate the effect of this imbalance.
- Large Image Sizes: Directly feeding the original `256x1600` images to the model was impractical. Dividing them into `256x256` tiles solved this and also increased the amount of training data, though reassembling the tiles during the prediction phase required an additional step.
- RLE Format: Since the masks were provided in Run-Length Encoding (RLE) format and predictions were also required in it, custom functions for RLE decoding (`rle_to_mask`) and encoding (`mask_to_rle`) had to be developed.
- Resource Constraints: Limited GPU time and memory in Kaggle kernels required careful selection of parameters such as batch size, model complexity, and number of epochs.
Potential Real-World Applications (Industrial/Manufacturing)
The steel surface defect segmentation solution developed in this project can provide significant benefits in various industrial and manufacturing environments:
- Automated Quality Control: It can automate the time-consuming quality control processes performed by the human eye on steel production lines, offering the possibility of 100% inspection. This prevents defective products from reaching customers and raises quality standards.
- Early Defect Detection: Detecting defects in the early stages of the production process reduces scrap, saving materials and energy.
- Process Optimization: By analyzing what types of defects occur, how frequently, and at what stages of production, improvements can be made to production processes. For example, if a recurring defect type originates from a specific machine, it can be understood that the machine needs maintenance or adjustment.
- Other Surface Inspection Tasks: This approach can be adapted not only for steel but also for detecting defects (scratches, stains, cracks, color differences, etc.) on the surfaces of other materials such as textiles, ceramics, glass, wood, and plastics.
- Integration with Robotic Systems: The location information of detected defects can be transferred to robotic systems to automatically sort or mark defective products.
- Predictive Maintenance: If certain defect types or frequencies are associated with potential failures in production equipment, this system can provide data for predictive maintenance strategies.
- Data Collection and Analysis: Automatically collected defect data forms a valuable resource for long-term analysis and statistical process control.
This solution can be an important tool in achieving goals of increasing production efficiency, reducing costs, and ensuring product quality.


