CERTIFICATE
This is to certify that the following student of Gayatri Vidya Parishad College of
Engineering (Autonomous), Visakhapatnam, is engaged in the project work titled
PREDICTING CRITICAL PARAMETERS ON BLAST FURNACE AND
PREDICTING CO/CO2 RATIO USING MACHINE LEARNING from 6th May
2024 to 1st June 2024.
Place: Visakhapatnam
DECLARATION
I, a student under the guidance of Mrs. Sriya Basumallik (Manager), IT & ERP
Department, Visakhapatnam Steel Plant, hereby declare that the project entitled
“PREDICTING CRITICAL PARAMETERS ON BLAST FURNACE AND
PREDICTING CO/CO2 RATIO USING MACHINE LEARNING” is an
original work done at Rashtriya Ispat Nigam Limited (RINL), Visakhapatnam Steel
Plant, submitted in partial fulfillment of the requirements for the award of
Industrial Training project work. I assure that this project has not been submitted
to any other university or college.
Place: VISAKHAPATNAM
ACKNOWLEDGEMENT
We would like to express our deep gratitude to our guide, Mrs. Sriya Basumallik
(Manager), IT & ERP Department, Visakhapatnam Steel Plant, for all her guidance,
help, and ongoing support throughout the course of this work, and for explaining
the basic concepts of the yard management system and its functioning along with
industry requirements. We are immensely grateful to her; without her inspiration
and valuable support, our training would never have taken off.
We sincerely thank the Training and Development Center (T&DC) for their
guidance during safety training and encouragement in the successful completion of
our training.
YASWANTH ARREPU
B. SAI SHASHANK
JONNA KOUSHIK
ABSTRACT
This internship project aims to apply machine learning techniques to predict
critical parameters in blast furnace operations, with a specific focus on the CO/CO₂
ratio. The CO/CO₂ ratio is a vital indicator of the efficiency and environmental
impact of the blast furnace process. Due to the complexity and interdependence of
variables in blast furnace operations, traditional modeling approaches often fall
short in accuracy and reliability. This project seeks to overcome these challenges
by utilizing advanced machine learning algorithms.
Throughout the internship, data from various sensors and control systems within
the blast furnace will be collected and preprocessed to create a robust dataset.
Multiple machine learning models, including regression algorithms and neural
networks, will be developed and validated using this dataset. Feature selection
techniques will be employed to identify the most influential parameters affecting
the CO/CO₂ ratio, enabling the creation of a more targeted and efficient predictive
model.
INTRODUCTION
Visakhapatnam Steel Plant (VSP) is the integrated steel plant of Rashtriya Ispat Nigam Limited
in Visakhapatnam, founded in 1971. VSP strikes everyone with a tremendous sense of awe and
wonder, presenting a wide array of excellence in all its facets: scenic beauty, technology, human
resources, management, and product quality. On the coast of the Bay of Bengal, beside the
scenic Gangavaram beach, stand the tall and massive structures of the Visakhapatnam Steel
Plant. Nevertheless, the vistas of excellence do not rest with the inherent beauty of the location
or the sophistication of the technology; they march ahead, parading one aspect after another.
The decision of the Government of India to set up an integrated steel plant at Visakhapatnam was
announced by then Prime Minister Smt. Indira Gandhi in Parliament on 17 January 1971. VSP is
the first coastal-based integrated steel plant in India, located 16 km west of the city of destiny,
Visakhapatnam. Bestowed with modern technologies, VSP has an installed capacity of 3 million
tonnes per annum of liquid steel and 2.656 million tonnes of saleable steel. The saleable steel is
produced in the form of wire rod coils, structurals, special steel, rebars, forged rounds, etc. VSP
places emphasis on total automation, seamless integration, and efficient upgradation, resulting
in a wide range of long and structural products that meet the stringent demands of discerning
customers in India and abroad; VSP products meet exacting international quality standards such
as JIS, DIN, BIS, and BS.
VSP became the first integrated steel plant in the country to be certified to all three
international standards: quality (ISO 9001), environment management (ISO 14001), and
occupational health and safety (OHSAS 18001). The certification covers the quality systems of
all operational, maintenance, and service units, besides purchase systems, training, and
marketing functions spread over 4 regional marketing offices, 20 branch offices, and 22
stockyards located all over the country. By successfully installing and operating pollution and
environment control equipment worth Rs. 460 crore and planting more than 3 million trees
across a once-barren landscape, VSP has made the steel plant and its township green. VSP
exports quality pig iron and steel products to Sri Lanka, Myanmar, Nepal, the Middle East, the
USA, and South East Asia (pig iron). RINL-VSP was awarded “Star Trading House” status
during 1997-2000. Having established a fairly dependable export market, VSP plans to maintain
a continuous presence in the export market.
Different sections at the RINL VSP:
● Sinter plant
● Blast Furnace
● Rolling mills
Of these, the blast furnace is the focus of this project. Its working can be summarized as follows:
1. Structure:
○ A blast furnace is a large, vertical cylindrical structure lined with refractory materials to
withstand high temperatures. It has a series of layers, each playing a specific role in the smelting
process.
2. Raw Materials:
○ Iron Ore: The primary source of iron, charged from the top of the furnace.
○ Coke: A form of carbon derived from coal, used as both a fuel and a reducing agent.
○ Limestone: A flux that combines with impurities in the ore to form slag.
○ Hot Blast: Preheated air injected into the furnace to aid combustion and maintain high
temperatures.
3. Process Stages:
○ Charging: Raw materials are added from the top of the furnace in alternating layers.
○ Reduction Zone: As the materials descend, coke burns with the hot blast, producing
carbon monoxide (CO) which reduces the iron ore to molten iron.
○ Smelting Zone: At higher temperatures, the iron melts and collects at the bottom of the
furnace (hearth).
○ Slag Formation: Limestone reacts with impurities to form slag, which floats on the
molten iron and can be removed.
○ Tapping: Molten iron and slag are periodically tapped from the furnace for further
processing.
4. Gas Flow:
○ The upward flow of gases (primarily CO, CO₂, and N₂) generated from the combustion
of coke and reduction reactions plays a critical role in the efficiency of the process.
Key parameters monitored during operation include:
● Pressure: Ensures the proper flow of gases and materials. Measurements like
CB_PRESS, O2_PRESS, and TOP_PRESS are vital for operational stability.
● Gas Composition: Parameters such as CO, CO₂, and O₂ content provide insights into
combustion efficiency and environmental impact.
● Auxiliary Conditions: Factors like atmospheric humidity (ATM_HUMID) and PCI rate
(Pulverized Coal Injection) influence the overall efficiency and need to be finely controlled.
By leveraging data from sensors and control systems, machine learning models can provide
deeper insights into the blast furnace operations, predicting critical outcomes and allowing for
proactive adjustments. This project, for example, aims to utilize such techniques to predict the
CO/CO₂ ratio and optimize other operational parameters, paving the way for more sustainable
and efficient iron production.
Analyzing the dataset
This project focuses on enhancing the efficiency and sustainability of blast furnace operations
through the application of machine learning techniques. The primary goal is to predict critical
parameters, particularly the CO/CO₂ ratio, which is a key indicator of furnace performance and
environmental impact. The project involves analyzing a dataset containing various operational
parameters collected from the blast furnace, which will be used to develop robust predictive
models.
The dataset contains the following operational parameters:
● CB_FLOW (Coke Breeze Flow): The flow rate of coke breeze, which influences the
combustion process.
● CB_PRESS (Coke Breeze Pressure): The pressure of the coke breeze, impacting its
distribution and combustion efficiency.
● CB_TEMP (Coke Breeze Temperature): The temperature of the coke breeze, which
affects its reactivity and combustion.
● STEAM_FLOW (Steam Flow): The flow rate of steam, used to control the temperature
and reactions within the furnace.
● STEAM_PRESS (Steam Pressure): The pressure of the steam, affecting its distribution
and efficiency.
● O2_FLOW (Oxygen Flow): The flow rate of oxygen, directly impacting the combustion
process.
● O2_PER (Oxygen Percentage): The percentage of oxygen in the gas mixture, crucial
for maintaining optimal combustion.
● PCI (Pulverized Coal Injection): The rate at which pulverized coal is injected, affecting
the fuel-to-air ratio.
● HB_PRESS (Hearth Bottom Pressure): The pressure at the bottom of the hearth,
related to the internal pressure dynamics.
● TOP_PRESS (Top Pressure): The pressure at the top of the furnace, affecting gas flow
and reactions.
● TOP_SPRAY (Top Spray): The application rate of water spray at the top of the furnace,
used for cooling and controlling reactions.
By analyzing these parameters using machine learning algorithms, the project aims to develop
predictive models that can accurately forecast the CO/CO₂ ratio and other critical parameters.
This will facilitate better decision-making and improve operational efficiency.
Overview of the implementation of the solution from a machine learning perspective
Data cleaning is an essential step to ensure the dataset's quality and integrity before performing
any analysis or modeling. In this project, the data cleaning process involved the following steps,
utilizing key Python libraries such as pandas, numpy, and matplotlib:
- Original Dataset: The initial dataset consisted of sensor readings recorded every 10 minutes
over a five-month period.
- Resampling to Hourly Intervals: Using the pandas library, the data was resampled to hourly
intervals. This transformation involved aggregating the 10-minute readings into one-hour periods
using the `resample` function in pandas, which allowed for the calculation of averages, sums, or
other relevant statistics so that the transformed dataset accurately represented hourly trends.
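A minimal sketch of this resampling step, assuming the raw readings have already been loaded into a dataframe `data` with a `DATE_TIME` column (the mean is used here as the hourly aggregate):

```python
import pandas as pd

# Assume `data` holds the 10-minute readings with a DATE_TIME column.
data['DATE_TIME'] = pd.to_datetime(data['DATE_TIME'])
data = data.set_index('DATE_TIME')

# Aggregate the 10-minute readings into hourly means.
hourly = data.resample('1H').mean()
```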
- Identification of Missing Data: The dataset was examined for any missing or incomplete data
points using the `isnull` and `sum` functions in pandas. These functions helped in identifying the
extent of missing data across different columns.
- Imputation or Removal: Depending on the extent and pattern of missing data, appropriate
strategies were employed. Missing values were filled using the `fillna` or `interpolate` functions,
or rows with excessive missing data were removed using the `dropna` function in pandas.
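A sketch of this missing-data handling, applied to the hourly dataframe from the previous step:

```python
# Count missing values in each column.
print(hourly.isnull().sum())

# Fill short gaps by interpolating along the time index,
# then drop any rows that remain incomplete.
hourly = hourly.interpolate(method='linear')
hourly = hourly.dropna()
```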
- Standardization: The numpy library, along with the `StandardScaler` from the
`sklearn.preprocessing` module, was used for standardizing the data. This ensured that all
features contributed equally to the model training process by scaling features to have a mean of
zero and a standard deviation of one.
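A sketch of the standardization step; `StandardScaler` returns a numpy array, which is wrapped back into a dataframe here for convenience:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(hourly),
                      index=hourly.index, columns=hourly.columns)

# Each feature now has mean ~0 and standard deviation ~1.
print(scaled.mean().round(2).head())
print(scaled.std().round(2).head())
```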
- Identification of Outliers: Outliers in the data were identified using statistical methods or
visualization techniques. The `boxplot` function in the `matplotlib.pyplot` module was used to
create box plots, which helped in visualizing the spread of the data and detecting any outliers.
- Handling Outliers: Depending on the nature of the outliers, they were either removed or
treated. Treatment could involve capping values or using robust statistical methods to minimize
their impact.
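A sketch of the outlier inspection and the capping treatment mentioned above, using the `CB_FLOW` column as an illustrative example:

```python
import matplotlib.pyplot as plt

# Box plot to visualize the spread and spot outliers.
plt.boxplot(hourly['CB_FLOW'].dropna())
plt.title('CB_FLOW spread')
plt.show()

# One possible treatment: cap values outside 1.5*IQR of the quartiles.
q1, q3 = hourly['CB_FLOW'].quantile([0.25, 0.75])
iqr = q3 - q1
hourly['CB_FLOW'] = hourly['CB_FLOW'].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
```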
Once the data was cleaned, the next step involved an in-depth analysis to uncover patterns and
relationships within the dataset. The data analysis phase included:
1. Data Visualization: Various visualization techniques were employed to explore the data
visually using matplotlib. This included:
- Box Plots: Used to detect outliers and visualize the spread of the data.
- Time Series Plots: Employed to observe trends and patterns over time for the hourly
resampled data.
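A sketch of such a time series plot for the hourly data (the CO/CO₂ ratio shown here is illustrative; any column can be substituted):

```python
import matplotlib.pyplot as plt

# Time series plot of the hourly CO/CO2 ratio to observe trends.
plt.figure(figsize=(12, 4))
plt.plot(hourly.index, hourly['CO'] / hourly['CO2'])
plt.xlabel('Time')
plt.ylabel('CO/CO2 ratio')
plt.show()
```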
2. Correlation Analysis:
- Correlation Matrix: A correlation matrix was generated using the `corr` function in pandas to
quantify the linear relationships between different features. This helped in identifying which
features had strong positive or negative correlations with each other.
- Heatmaps: Visual representations of the correlation matrix were created using the `heatmap`
function from the seaborn library, making it easy to identify significant correlations.
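A minimal sketch of the correlation heatmap, assuming the cleaned hourly dataframe:

```python
import matplotlib.pyplot as plt
import seaborn as sns

corr = hourly.corr()  # pairwise linear correlations between features

plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap='coolwarm', annot=False)
plt.title('Feature correlation heatmap')
plt.show()
```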
3. Feature Engineering:
- Creating New Features: Based on the insights from the exploratory data analysis, new
features were engineered. For instance, to predict the CO/CO2 ratio, four new columns
representing the ratio for the 1st, 2nd, 3rd, and 4th hours were created.
- Temporal Patterns: Analysis was performed to detect temporal patterns, such as daily or
weekly cycles, in the data.
- Anomaly Detection: Any anomalies or unusual patterns were identified and investigated to
ensure they were understood and appropriately handled.
Modules used:
- pandas:
- Description: Pandas is used for data manipulation and analysis with dataframes, akin to
spreadsheets. It handles creating new columns, handling missing values, and reading/writing data
to/from Excel files.
- matplotlib.pyplot:
- Description: This module is part of the Matplotlib library and is used for creating
visualizations, like the correlation matrix plot in this case.
- seaborn:
- Description: A statistical visualization library built on top of Matplotlib, used here to render
correlation heatmaps.
The code below imports the necessary libraries and reads data from an Excel file located at
"/content/bf3_data_2022_01_07.xlsx" into a pandas dataframe (data). This sets up the
environment for further data analysis and visualization.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Read the raw blast furnace readings from the Excel export.
data = pd.read_excel('/content/bf3_data_2022_01_07.xlsx')

# Drop a column not used in this analysis and parse the timestamps.
data.drop(columns=['SKIN_TEMP_AVG'], inplace=True)
data['DATE_TIME'] = pd.to_datetime(data['DATE_TIME'])

# Remove rows with missing values.
data.dropna(inplace=True)
```
This code retrieves the column names of the dataframe (data), removes any rows with missing
values using `dropna()`, and then prints the data types of each column in the dataframe using
`dtypes`. This process ensures the dataset is clean and ready for analysis.
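A minimal sketch of the steps just described:

```python
print(data.columns)   # inspect the column names
data = data.dropna()  # remove any rows with missing values
print(data.dtypes)    # confirm each column's data type
```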
The fragment below belongs to the model training step described later; it is completed here with the imports it assumes (the split and model setup are covered in the Model Training section):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Assumes X_train, X_test, y_train, y_test from train_test_split (see below).
model = RandomForestRegressor(random_state=42)  # seed is a hypothetical choice
model.fit(X_train, y_train)
predictions = model.predict(X_test)
r2 = r2_score(y_test, predictions)
```
The code creates a new dataframe (predicted_df) by copying the columns from the original
dataframe (data). It then converts the `DATE_TIME` column to datetime format. Next, it
calculates the correlation matrix (corr) of the original dataframe. Finally, it plots the correlation
matrix using matplotlib, displaying the strength of relationships between different features with a
color-coded heatmap.
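A minimal sketch of these steps, using the `predicted_df` name from the description:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Working copy of the original dataframe, with parsed timestamps.
predicted_df = data.copy()
predicted_df['DATE_TIME'] = pd.to_datetime(predicted_df['DATE_TIME'])

# Correlation matrix of the numeric columns, plotted as a color-coded grid.
corr = data.drop(columns=['DATE_TIME']).corr()
plt.matshow(corr, cmap='coolwarm')
plt.colorbar()
plt.xticks(range(len(corr.columns)), corr.columns, rotation=90)
plt.yticks(range(len(corr.columns)), corr.columns)
plt.show()
```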
The next snippet assembles the feature matrix, indexed by timestamp, and initializes the targets:

```python
# Use all sensor columns as features, indexed by timestamp.
X = data.drop(columns=['DATE_TIME'])
X.index = data['DATE_TIME']
y = X.copy()  # targets: the same columns, to be shifted to future hours
```
The code creates new columns in new_df for predicting the CO/CO2 ratio at future time intervals
(1, 2, 3, and 4 hours ahead) using the `shift` function. These new columns are `Next_1hr`,
`Next_2hr`, `Next_3hr`, and `Next_4hr`. The modified dataframe is then saved to an Excel file
("data/demo3.xlsx"). Finally, the saved Excel file is read back into a new dataframe (df) and
printed.
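The column-creation code itself is not shown below; a minimal sketch, assuming `new_df` is a copy of the hourly data with CO and CO2 columns:

```python
import pandas as pd

new_df = data.copy()
new_df['CO_CO2_ratio'] = new_df['CO'] / new_df['CO2']

# Shift the ratio upward so each row carries its future values as targets.
for h in range(1, 5):
    new_df[f'Next_{h}hr'] = new_df['CO_CO2_ratio'].shift(-h)

# Persist and read back, as described in the text.
new_df.to_excel('data/demo3.xlsx', index=False)
df = pd.read_excel('data/demo3.xlsx')
print(df)
```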
The per-horizon prediction and export steps then follow (cleaned up; the `predicted_df_shift_*` dataframes hold the model's predictions at each horizon):

```python
# 1-hour-ahead results: derive the CO/CO2 ratio and save them.
predicted_df_shift_1['CO_CO2_ratio'] = predicted_df_shift_1['CO'] / predicted_df_shift_1['CO2']
predicted_df_shift_1.to_csv('predicted_data_after_1_hour.csv', index=False)
print(predicted_df_shift_1)

# Feed the 1-hour predictions back in as features for the 2-hour horizon.
X_shift_2 = predicted_df_shift_1.drop(columns=['DATE_TIME'])
X_shift_2.index = predicted_df_shift_1['DATE_TIME']
predicted_df_shift_2['CO_CO2_ratio'] = predicted_df_shift_2['CO'] / predicted_df_shift_2['CO2']
predicted_df_shift_2.to_csv('predicted_data_after_2_hours.csv', index=False)
print(predicted_df_shift_2)

# Repeat for the 3-hour horizon.
X_shift_3 = predicted_df_shift_2.drop(columns=['DATE_TIME'])
X_shift_3.index = predicted_df_shift_2['DATE_TIME']
predicted_df_shift_3['CO_CO2_ratio'] = predicted_df_shift_3['CO'] / predicted_df_shift_3['CO2']
predicted_df_shift_3.to_csv('predicted_data_after_3_hours.csv', index=False)
print(predicted_df_shift_3)

# And for the 4-hour horizon.
X_shift_4 = predicted_df_shift_3.drop(columns=['DATE_TIME'])
X_shift_4.index = predicted_df_shift_3['DATE_TIME']
predicted_df_shift_4.to_csv('predicted_data_after_4_hours.csv', index=False)
print(predicted_df_shift_4)
```
Model Training:
For this project, the `RandomForestRegressor` from the `sklearn.ensemble` module was chosen
for its robustness and accuracy in handling regression tasks. The training process included:
1. Data Preparation:
- Train-Test Split: The dataset was split into training and testing sets using the
`train_test_split` function from `sklearn.model_selection`. Typically, 80% of the data was used
for training and 20% for testing.
- Feature Selection: The relevant features for training were selected, excluding the target
variable. Features with high correlation or importance to the target variable were prioritized.
2. Model Initialization and Training:
- Initialization: A `RandomForestRegressor` was instantiated from the `sklearn.ensemble`
module.
- Training: The model was trained using the training set. The `fit` function was used to train
the model on the input features (X_train) and the target variable (y_train).
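A minimal sketch of the split and training steps (the 80/20 ratio follows the text; the target here is the 1-hour-ahead ratio, and the random seed is a hypothetical choice):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = df.dropna()  # shift() leaves NaNs in the trailing rows; drop them

# Features: the operational parameters; target: the 1-hour-ahead ratio.
feature_cols = df.columns.drop(['DATE_TIME', 'Next_1hr', 'Next_2hr',
                                'Next_3hr', 'Next_4hr'])
X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df['Next_1hr'], test_size=0.2, random_state=42)

model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
```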
Model Evaluation:
Evaluating the model’s performance was critical to ensure it could accurately predict future
CO/CO2 ratios. The evaluation process involved:
1. Prediction:
- Making Predictions: The trained model was used to make predictions on both the training
and testing sets using the `predict` function. This generated predicted values for the CO/CO2
ratio.
2. Performance Metrics:
- Mean Squared Error (MSE): Calculated using the `mean_squared_error` function from
`sklearn.metrics`, MSE measures the average squared difference between actual and predicted
values.
- Mean Absolute Error (MAE): Calculated using the `mean_absolute_error` function from
`sklearn.metrics`, MAE measures the average absolute difference between actual and predicted
values.
- R-squared (R2) Score: Calculated using the `r2_score` function from `sklearn.metrics`, the
R2 score measures the proportion of the variance in the target variable that is predictable from
the input features.
- Shifted Predictions: The model was evaluated for different shifts (1, 2, 3, and 4 hours) to
assess its performance over varying prediction horizons. The performance metrics for each shift
were calculated and analyzed.
- Visualizing Predictions: The predicted values were plotted against actual values to visually
assess the model’s accuracy. This helped in identifying any patterns or discrepancies.
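A sketch of the evaluation, combining the three metrics with the predicted-vs-actual plot:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score)

y_pred = model.predict(X_test)
print('MSE:', mean_squared_error(y_test, y_pred))
print('MAE:', mean_absolute_error(y_test, y_pred))
print('R2 :', r2_score(y_test, y_pred))

# Predicted vs. actual values for a visual check of accuracy.
plt.figure(figsize=(12, 4))
plt.plot(y_test.to_numpy(), label='Actual')
plt.plot(y_pred, label='Predicted')
plt.xlabel('Test sample')
plt.ylabel('CO/CO2 ratio (1 hour ahead)')
plt.legend()
plt.show()
```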
Key Findings:
1. Correlation and Feature Importance: The correlation matrix and visualizations highlighted
the relationships between different features and the target variable, guiding the feature selection
process.
2. Model Performance: The `RandomForestRegressor` demonstrated good performance in
predicting CO/CO2 ratios, with reasonable MSE, MAE, and R2 scores across different shifts.
3. Future Work: Potential areas for improvement include experimenting with other regression
models, performing hyperparameter tuning, and incorporating additional features to enhance
predictive accuracy.
Overall, the project successfully utilized data cleaning, analysis, and machine learning
techniques to predict future CO/CO2 ratios, providing valuable insights into the behavior of the
system under study.
This detailed approach, leveraging various data manipulation and machine learning techniques,
ensures the robustness and reliability of the model, contributing to accurate predictions and a
deeper understanding of the dataset.
Conclusion
In conclusion, both the blast furnace prediction system and the Production Management System
highlight the transformative potential of technology in industrial settings. Leveraging Python
modules such as pandas, numpy, matplotlib, and Scikit-learn, the blast furnace prediction system
effectively cleansed data, trained predictive models, and visualized results, ultimately offering
valuable insights for optimizing operations. Similarly, the Production Management System,
employing a combination of frontend technologies like HTML, CSS, JavaScript, and backend
logic in Java and Oracle, significantly enhances efficiency, transparency, and data-driven
decision-making in production processes.
The blast furnace prediction system's utilization of Gradient Boosting and XGBoost ensemble
methods showcases Python's versatility in handling complex data analysis and machine learning
tasks. Furthermore, Matplotlib's inclusion enables dynamic graph generation, facilitating user
comprehension and interaction with prediction outcomes.
Both systems underscore the importance of technology in driving operational excellence and
productivity gains in industrial environments. By harnessing the capabilities of Python and
associated technologies, organizations can make informed decisions, optimize processes, and
achieve tangible business outcomes.