Probability and Statistics CC01 Group 3
GROUP 03
3D PRINTER DATASET
No. Member ID
TABLE OF CONTENTS
PART 1: DATA INTRODUCTION.............................................................................................4
1. Dataset description...................................................................................................................4
2. Variable description.................................................................................................................5
3. Summary..................................................................................................................................5
PART 2: THEORETICAL BASIS...............................................................................................6
1. Analysis of one-factor variance...............................................................................................6
2. Multiple Linear Regression......................................................................................................9
a. Multiple linear regression.....................................................................................................9
b. Population regression function (PRF)................................................................................11
c. Sample regression function (SRF)......................................................................................12
d. Assumptions of the least squares method for multiple linear regression models..............13
e. Model fit metrics................................................................................................................14
f. Confidence interval and Hypothesis testing........................................................................16
g. Testing the overall significance of the model (Special Case of WALD Test)...................17
PART 3: DATA PREPROCESSING.........................................................................................18
1. Data importing.......................................................................................................................19
2. Data cleaning..........................................................................................................................24
a. Handling missing values.....................................................................................................24
b. Handling outliers and inconsistent data..............................................................................28
c. Data transformation............................................................................................................29
3. Feature Engineering...............................................................................................................30
PART 4: DESCRIPTIVE STATISTICS...................................................................................32
1. Data summary........................................................................................................................32
2. Plot data..................................................................................................................................34
a. Box plot..............................................................................................................................34
b. Correlation coefficients between variables........................................................................41
PART 5: INFERENTIAL STATISTICS...................................................................................44
1. Using Two-way ANOVA to evaluate how qualitative factors affect output parameters......44
Key components of the dataset include nine input parameters, such as infill
density, nozzle temperature, and print speed, alongside three output parameters:
roughness, tension strength, and elongation. With 50 samples, the dataset provides
insights into optimizing printing settings for producing high-quality, durable parts.
1. Dataset description
The dataset consists of experimental results from 3D printing tests performed on
the Ultimaker S5 3D printer, with material strength tested using a Sincotec GMBH
tester capable of applying up to 20 kN. Each observation represents a unique 3D-
printed sample.
2. Variable description
Variable Data Type Units Description
Setting parameter
Null hypothesis (H0) assumes that all groups have the same mean (i.e., the
printing parameters do not affect the mechanical properties).
The alternative hypothesis (H1) assumes that at least one of the groups has a
different mean (i.e., the printing parameters do affect the mechanical
properties).
The ANOVA test calculates the F-statistic which compares the variance between
the groups to the variance within the groups. If the F-statistic is large enough, it
suggests that there is a significant difference between the means, and the null
hypothesis is rejected.
The test statistic used in ANOVA is the F-statistic, which is calculated as:
F = Between-group variance / Within-group variance
Where:
the between-group variance (mean square between groups, MSG) measures how far the group means fall from the grand mean, and the within-group variance (mean square error, MSE) measures how much observations vary around their own group means.
For the 3D Printer dataset, Single-Factor ANOVA can examine how a single factor,
such as infill_density, influences tension_strength or elongation. For example:
Hypothesis:
H₀: Means of the groups (low, medium, high infill density) are equal.
H₁: At least one group mean differs from the others.
Steps:
1. Calculate the F-statistic: Using the formula mentioned earlier, we calculate the
F-statistic to compare the between-group and within-group variances.
2. Determine the p-value: The p-value helps us assess the strength of the
evidence against the null hypothesis. A p-value smaller than 0.05 indicates that
there is a statistically significant difference in means.
After performing the ANOVA, if the F-statistic is significantly large and the p-
value is below 0.05, we can conclude that the infill density does influence the
tension strength of the printed material.
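The steps above can be sketched in R. The column names below mirror the dataset, but every value is simulated here purely for illustration:

```r
# Sketch of a one-way (single-factor) ANOVA in R on simulated data
set.seed(42)
df <- data.frame(
  infill_level = factor(rep(c("low", "medium", "high"), each = 10)),
  tension_strength = c(rnorm(10, 20, 2), rnorm(10, 25, 2), rnorm(10, 30, 2))
)
fit <- aov(tension_strength ~ infill_level, data = df)
summary(fit)  # prints the F value and Pr(>F) for infill_level
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
```

Because the three simulated groups have clearly different means, the p-value comes out far below 0.05 and the null hypothesis of equal means is rejected.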
Y = β0 + β1x1 + β2x2 + ⋯ + βkxk + ε
Where:
x1, x2, x3,…xk are the independent variables (e.g., infill_density, wall_thickness,
material, etc.).
β0 is the intercept term, which represents the value of Y when all xi variables are
equal to zero.
β1,β2,…,βk are the coefficients that represent the effect of each independent
variable on the dependent variable.
The model estimates the β coefficients, which indicate how much the dependent
variable changes for a one-unit change in each independent variable, holding the
other variables constant. The regression model is fitted to the data using methods
like Ordinary Least Squares (OLS) to minimize the sum of squared errors between
the observed values and the predicted values.
The method of linear regression involves fitting a linear equation to observed data.
This equation represents the relationship between a dependent variable (Y) and one
or more independent variables (X). Once the coefficients are estimated, the linear
regression model can be used to make predictions for the dependent variable based
on new values of the independent variables. The method is widely used in various
fields for prediction, inference, and understanding the relationship between
variables.
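A minimal sketch of fitting such a model with `lm()` in R follows; the variable names mirror the dataset, but the data are simulated with known coefficients so the fit can be sanity-checked:

```r
# Multiple linear regression fitted by OLS on simulated data
set.seed(1)
n <- 40
df <- data.frame(
  infill_density = runif(n, 10, 90),
  wall_thickness = runif(n, 0.5, 2.0),
  print_speed    = runif(n, 40, 120)
)
df$tension_strength <- 10 + 0.20 * df$infill_density +
  5 * df$wall_thickness - 0.05 * df$print_speed + rnorm(n, sd = 1)
# lm() estimates the beta coefficients by minimizing the sum of squared errors
model <- lm(tension_strength ~ infill_density + wall_thickness + print_speed,
            data = df)
coef(model)  # estimates should land near 10, 0.20, 5, -0.05
# Prediction for new values of the independent variables
predict(model, newdata = data.frame(infill_density = 50,
                                    wall_thickness = 1.0,
                                    print_speed = 80))
```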
Y = β0 + β1x1 + β2x2 + ⋯ + βkxk
Where:
x1, x2, x3,…xk represent the various printing parameters (e.g., infill_density,
wall_thickness, print_speed, etc.).
The PRF is typically unknown because we usually work with a sample of the
population. It assumes that the relationship between the dependent and independent
variables holds true across the population. This is the ideal model, and the goal of
regression analysis is to estimate the parameters β0, β1, β2, …, βk as accurately as
possible using sample data.
Ŷ = β̂0 + β̂1x1 + β̂2x2 + ⋯ + β̂kxk
Where:
β̂0, β̂1, …, β̂k: the estimated coefficients.
The OLS estimate of the coefficient vector is β̂ = (XᵀX)⁻¹XᵀY
Where:
X is the matrix of independent variables (including a column of 1s for the
intercept).
Y is the vector of observed dependent variable values.
Xᵀ is the transpose of the X matrix.
(XᵀX)⁻¹ is the inverse of the product of Xᵀ and X.
The SRF is derived using statistical techniques such as Ordinary Least Squares
(OLS), which minimizes the sum of squared residuals (the difference between the
observed values and the predicted values) across the sample.
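The closed-form OLS solution β̂ = (XᵀX)⁻¹XᵀY can be verified against `lm()` directly; all values below are made up for illustration:

```r
# Manual OLS via the normal equations, cross-checked against lm()
set.seed(7)
n <- 30
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n, sd = 0.3)
X <- cbind(1, x1, x2)                      # design matrix with intercept column
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_lm  <- coef(lm(y ~ x1 + x2))
cbind(manual = as.vector(beta_hat), lm = beta_lm)  # the two columns agree
```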
In the case of the 3D Printer dataset, we may use multiple linear regression to
predict tension_strength based on various independent variables such as
infill_density, wall_thickness, material, and print_speed, with the SRF expressing
predicted tension_strength as a linear combination of these predictors.
If any of these assumptions are violated, the results from the least squares method
may be biased or misleading, requiring corrective actions like transforming the
variables, applying robust regression methods, or using generalized least squares.
2. Adjusted R-squared: This metric gives a more accurate measure of the model's
explanatory power when multiple predictors are included.
3. F-statistic: The F-statistic tests whether at least one of the predictors in the
model has a non-zero coefficient. A high F-statistic (with a corresponding low
p-value) suggests that the model fits the data well and that the independent
variables have explanatory power.
4. Residual Plots: Graphical plots of the residuals can also help assess the model
fit. Ideally, the residuals should be randomly scattered around zero with no
discernible pattern, indicating a good fit. If residuals show a clear pattern, the
model may not be appropriately specified.
β̂i ± t(α/2) × SE(β̂i)
Where:
t(α/2) is the critical value from the t-distribution for the desired confidence level,
and SE(β̂i) is the standard error of the estimated coefficient.
Null hypothesis (H₀): βᵢ = 0 (No relationship between the predictor and the
dependent variable)
Alternative hypothesis (H₁): βᵢ ≠ 0 (A relationship exists between the
predictor and the dependent variable)
The test statistic is t = β̂i / SE(β̂i). A large absolute value of t suggests that the predictor is significantly related to the
dependent variable. The corresponding p-value helps us determine whether to
reject or fail to reject the null hypothesis. If the p-value is smaller than the
significance level (e.g., 0.05), we reject the null hypothesis, indicating that the
predictor has a significant impact on the dependent variable.
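In R, `confint()` and the `summary()` coefficient table provide these intervals and tests directly. The data below are simulated with a known slope of 0.8:

```r
# Confidence intervals and t-tests for regression coefficients
set.seed(3)
n <- 50
x <- runif(n, 0, 10)
y <- 1 + 0.8 * x + rnorm(n)
fit <- lm(y ~ x)
confint(fit, level = 0.95)          # computes beta_i +/- t_(alpha/2) * SE(beta_i)
coefs <- summary(fit)$coefficients  # Estimate, Std. Error, t value, Pr(>|t|)
t_x <- coefs["x", "t value"]
p_x <- coefs["x", "Pr(>|t|)"]
# The interval can also be reproduced by hand from the formula:
manual_ci <- coefs["x", "Estimate"] +
  c(-1, 1) * qt(0.975, df = fit$df.residual) * coefs["x", "Std. Error"]
```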
H0 : β1 = β2 = ⋯ = βk = 0
H1 : At least one βj ≠ 0 (for some j = 1, …, k)
Where β1, β2, ⋯ , βk are the regression coefficients. The alternative hypothesis is
that at least one of the coefficients is not zero.
W = β̂ᵀ [Var(β̂)]⁻¹ β̂
Where:
Var(β̂) = σ̂²(XᵀX)⁻¹ is the variance-covariance matrix of the estimated coefficients.
The Wald test statistic follows a chi-square distribution with degrees of freedom
equal to the number of coefficients being tested. If the test statistic is large enough
(i.e., the p-value is small), we reject the null hypothesis, indicating that the model
has at least one predictor that significantly explains the variation in the dependent
variable.
In the context of the 3D Printer dataset, the Wald test can be applied to test whether
the predictors such as infill_density, wall_thickness, and print_speed together
contribute to explaining the mechanical properties (e.g., tension_strength) of the
printed parts. A significant result from the Wald test suggests that the regression
model is meaningful and the predictors are important for understanding the
material properties in the context of 3D printing.
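In practice, `summary.lm()` reports the closely related overall F-statistic for H0: all slope coefficients are zero, which can be converted to a p-value with `pf()`. A sketch on simulated data where only x1 truly matters:

```r
# Overall significance test of a regression model in R
set.seed(11)
n <- 40
df <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
df$y <- 5 + 4 * df$x1 + rnorm(n, sd = 0.5)
fit <- lm(y ~ x1 + x2 + x3, data = df)
fs <- summary(fit)$fstatistic          # c(value, numdf, dendf)
p_overall <- pf(fs[1], fs[2], fs[3], lower.tail = FALSE)
# A small p_overall rejects H0: at least one predictor is useful
```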
Data preprocessing is an essential part of data analysis, as it ensures that the dataset
is clean, accurate, and ready for further analysis and modeling. In this section, we
will cover how to import, clean, handle missing values, and prepare the data for
analysis in R.
1. Data importing
The first step in any data analysis workflow is to import the dataset into R. Below
is the process for importing the dataset into R using the read.csv() function, after
installing and loading the necessary R packages.
r
# Install necessary packages
install.packages("dplyr")
install.packages("tidyverse")
install.packages("ggpubr")
install.packages("corrplot")
# Load libraries
library(dplyr)
library(ggplot2)
library(ggpubr)
library(corrplot)
# Create sample data for the 3D printing project
data <- data.frame(
infill_density = c(20, 30, 40, 50, 60),
print_speed = c(60, 70, 80, 90, 100),
nozzle_temperature = c(200, 210, 220, 230, 240),
tension_strength = c(40, 42, 44, 45, 46),
elongation = c(10, 11, 12, 13, 14) )
# Save the data to a CSV file
write.csv(data, "3dprinter.csv", row.names = FALSE)
# Display the current working directory
print("Current working directory:")
print(getwd())
# from google.colab import files
# uploaded = files.upload()
# Read the data from the CSV file
data <- read.csv("3dprinter.csv", header = TRUE)
# Display the first 5 rows of the dataset
head(data)
# Check the structure of the dataset
print("Dataset structure:")
str(data)
# Summarize the dataset
print("Dataset summary:")
summary(data)
# Check the correlation between numeric variables
correlation_matrix <- cor(data[, sapply(data, is.numeric)])
corrplot(correlation_matrix, method = "circle")
2. Data cleaning
Data cleaning is a crucial step in the preprocessing pipeline. This involves handling
missing values, removing duplicates, and ensuring that all variables are correctly
formatted.
In R, the is.na() function helps identify missing values. Once identified, you can
either remove or impute these missing values. For example, you can fill missing
values with the mean or median for numerical columns.
r
# Install necessary packages
required_packages <- c("dplyr", "tidyr", "caret", "writexl")
install.packages(setdiff(required_packages,
rownames(installed.packages())))
# Load libraries
library(dplyr) # For data manipulation
library(tidyr) # For handling missing data
library(caret) # For scaling and encoding categorical variables
library(writexl) # For exporting to Excel
# Step 1: Generate a small dataset (5 rows; some columns elided in this excerpt)
set.seed(123) # Ensures reproducibility
raw_data <- data.frame(
tension_strength = runif(5, 10, 50), # Simulated tension strength
elongation = runif(5, 5, 15), # Simulated elongation
roughness = runif(5, 0.5, 2.0), # Simulated surface roughness
infill_density = runif(5, 10, 90), # Infill density percentage
wall_thickness = runif(5, 0.5, 2.0), # Wall thickness in mm
nozzle_temperature = sample(180:240, 5, replace = TRUE), # Nozzle temperature in °C
print_speed = sample(20:100, 5, replace = TRUE) # Print speed in mm/s
)
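The mean-imputation idea described above can be sketched on a tiny made-up vector with two missing values:

```r
# Identify missing values with is.na(), then fill them with the column mean
x <- c(12, NA, 15, 14, NA, 13)
sum(is.na(x))                                  # 2 missing entries
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
x_imputed                                      # NAs replaced by the mean, 13.5
```

For skewed columns, `median(x, na.rm = TRUE)` can be substituted for the mean in the same pattern.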
Outliers can distort statistical analyses, so it’s essential to identify and handle them.
One way to identify outliers is through box plots or Z-scores. Values with Z-scores
greater than 3 or less than -3 are typically considered outliers.
r
# Install and load necessary libraries
install.packages("dplyr")
install.packages("ggplot2")
library(dplyr)
library(ggplot2)
# Display working directory
print("Current Working Directory:")
print(getwd())
# Upload a CSV file when running under Google Colab's Python runtime:
# from google.colab import files
# uploaded = files.upload()
If outliers are valid observations, you may decide to keep them or cap them at
reasonable limits.
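The Z-score rule above can be sketched as follows, using a made-up sample with one planted outlier:

```r
# Flag values whose absolute Z-score exceeds 3
set.seed(5)
x <- c(rnorm(20, mean = 50, sd = 5), 200)   # last value is an obvious outlier
z <- (x - mean(x)) / sd(x)
outliers <- which(abs(z) > 3)
outliers                                    # index 21, the planted outlier
x_capped <- pmin(x, quantile(x, 0.95))      # one way to cap extreme values
```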
c. Data transformation
Data transformation is another important step, especially when dealing with
categorical variables or ensuring that continuous variables are on the same scale.
r
# Install and load necessary libraries
install.packages("dplyr")
library(dplyr)
# Display working directory
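A minimal sketch of the two transformations mentioned above, standardizing a numeric column and encoding a categorical one; the column names mirror the dataset but the values are made up:

```r
# Scale a continuous variable and encode a categorical variable
df <- data.frame(
  print_speed = c(40, 60, 80, 100, 120),
  material    = c("abs", "pla", "abs", "pla", "pla")
)
df$print_speed_scaled <- as.numeric(scale(df$print_speed))  # mean 0, sd 1
df$material <- factor(df$material, levels = c("abs", "pla"))
df$material_code <- as.integer(df$material) - 1L            # abs = 0, pla = 1
```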
3. Feature engineering
Feature engineering involves creating new variables or transforming existing ones
to improve the analysis and modeling.
r
# Install and load necessary libraries
install.packages("dplyr")
library(dplyr)
# Step 1: Create a sample dataset
# Sample dataset
data <- data.frame(
layer_height = c(0.1, 0.2, 0.15, 0.25, 0.3),
wall_thickness = c(0.8, 1.0, 0.9, 1.1, 1.2),
print_speed = c(50, 60, 55, 70, 65),
nozzle_temperature = c(200, 210, 205, 220, 215),
material = c(0, 1, 0, 1, 0), # 0 = abs, 1 = pla
infill_density = c(20, 30, 25, 35, 40),
bed_temperature = c(60, 65, 70, 75, 80),
tension_strength = c(250, 300, 270, 320, 310) )
# Descriptive statistics for quantitative variables
num_cols <- c(1, 2, 3, 5, 6, 7, 8)  # all columns except the binary 'material'
mean_val <- apply(data[, num_cols], 2, mean)
median_val <- apply(data[, num_cols], 2, median)
sd_val <- apply(data[, num_cols], 2, sd)
Q1_val <- apply(data[, num_cols], 2, quantile, probs = 0.25)
Q3_val <- apply(data[, num_cols], 2, quantile, probs = 0.75)
min_val <- apply(data[, num_cols], 2, min)
max_val <- apply(data[, num_cols], 2, max)
# Combine the results into a data frame
summary_stats <- data.frame(mean = mean_val,
median = median_val,
sd = sd_val,
Q1 = Q1_val,
Q3 = Q3_val,
min = min_val,
max = max_val)
# Print the summary statistics
print("Descriptive statistics for quantitative variables:")
print(summary_stats)
2. Plot data
a. Box plot
In addition to descriptive analysis, we draw boxplot graphs to better visualize the
distribution of Roughness, Tension_strength, and Elongation according to Material
and Infill_pattern.
r
# Load necessary libraries
library(ggplot2)
library(reshape2)
# Create a sample data frame (add your actual data if needed)
my_data <- data.frame(
Material = c("ABS", "PLA", "ABS", "PLA", "ABS", "PLA", "ABS", "PLA"),
Infill_pattern = c("grid", "honeycomb", "grid", "honeycomb", "grid",
"honeycomb", "grid", "honeycomb"),
Roughness = c(92, 88, 200, 145, 289, 192, 368, 321),
Tension_strength = c(16, 19, 21, 25, 37, 27, 37, 34),
Elongation = c(1.2, 1.5, 1.3, 1.8, 1.6, 2.3, 3.3, 3.2) )
# Check if my_data is a valid data frame
class(my_data) # Should return "data.frame"
# Compute the correlation matrix for numeric variables
cor_matrix <- cor(my_data[, c("Roughness", "Tension_strength",
"Elongation")])
# Melt the correlation matrix for ggplot2
cor_matrix_melted <- melt(cor_matrix)
# Generate a heatmap for the correlation matrix
ggplot(cor_matrix_melted, aes(Var1, Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0) +
theme_minimal() +
labs(title = "Correlation Matrix Heatmap", x = "Variables", y =
"Variables") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The data are laid out in a two-way table with K columns (groups, indexed i = 1, 2, 3, …, K) and H rows (blocks, indexed j = 1, …, H); the observation in column i and row j is denoted xij, so the last row contains x1H, x2H, x3H, …, xKH.
The columns sum of squares (Groups): SSG = H ∑(i=1..K) (x̄i − x̄)²
SSG reflects the variability of the quantitative outcome under study due to the
influence of the first causal factor, the factor used for grouping in the columns.
The rows sum of squares (Blocks): SSB = K ∑(j=1..H) (x̄j − x̄)²
SSB reflects the variability of the quantitative outcome under study due to the
influence of the second causal factor, the factor used for grouping in the rows.
The error sum of squares: SSE = SST − SSG − SSB
The total sum of squares: SST = ∑(j=1..H) ∑(i=1..K) (xij − x̄)² = SSG + SSB + SSE
The test statistics are:
F1 = MSG / MSE and F2 = MSB / MSE
where MSG = SSG/(K−1), MSB = SSB/(H−1), and MSE = SSE/((K−1)(H−1)).
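The sum-of-squares decomposition above can be verified numerically in R on a made-up balanced 4×3 layout (3 groups in columns, 4 blocks in rows):

```r
# Manual SSG / SSB / SSE computation, cross-checked against aov()
m <- matrix(c(10, 12, 11, 13,
              14, 15, 16, 17,
              20, 21, 19, 22), nrow = 4)     # column i = group i, row j = block j
grand <- mean(m)
SSG <- nrow(m) * sum((colMeans(m) - grand)^2)  # between groups (columns)
SSB <- ncol(m) * sum((rowMeans(m) - grand)^2)  # between blocks (rows)
SST <- sum((m - grand)^2)
SSE <- SST - SSG - SSB
# The same decomposition falls out of aov(y ~ group + block):
df <- data.frame(y = as.vector(m),
                 group = factor(rep(1:3, each = 4)),
                 block = factor(rep(1:4, times = 3)))
tab <- summary(aov(y ~ group + block, data = df))[[1]]
```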
*There are two cases in the decision to reject the hypothesis H0 of two-way ANOVA:
For F1, at significance level α, the hypothesis H0 that the population means are
equal across the first causal factor (columns) is rejected when:
F1 > F(K−1, (K−1)(H−1), α)
For F2, at significance level α, the hypothesis H0 that the population means are
equal across the second causal factor (rows) is rejected when:
F2 > F(H−1, (K−1)(H−1), α)
Where F(K−1, (K−1)(H−1), α) and F(H−1, (K−1)(H−1), α) are the lookup values in
the F distribution table with the corresponding degrees of freedom.
By conducting a two-way ANOVA, researchers can gain insights into how two factors, both individually and through their interaction, affect the outcome variable.
*Result of testing
Sources of variation | Sum of squares | Degrees of freedom | Mean squares | F ratio
1. Normality test
We opt for the QQ-plot as a method to confirm normality, applying it to the
residuals of the data, and choose the Shapiro-Wilk test to verify normality.
The QQ-plot diagram illustrates that the majority of the observed values lie on the
expected straight line of the normal distribution, so the Roughness,
Tension_Strength, Elongation variable follows a normal distribution. Additionally,
we can use the Shapiro-Wilk test function to test:
Figure 5.1. R code for variable declaration, QQ-plot and Shapiro-Wilk test
r
# Load necessary libraries
library(car) # For Levene's test
library(ggplot2) # For visualization
library(dplyr) # For data manipulation
# Declare variables
infill_pattern <- as.factor(data$infill_pattern)
bed_temperature <- as.factor(data$bed_temperature)
material <- as.factor(data$material)
roughness <- data$roughness
tension_strenght <- data$tension_strenght
elongation <- data$elongation
# --- 1. Normality Test using QQ-Plot and Shapiro-Wilk Test ---
# Test normality for Roughness
residual_roughness <- rstandard(aov(roughness ~ infill_pattern *
material * bed_temperature, data = data))
# QQ-plot for Roughness
qqnorm(residual_roughness)
qqline(residual_roughness, col = "red")
title(main = "Figure 5.2: Normal QQ-plot for Residual Roughness")
# Shapiro-Wilk Test for Roughness
shapiro_roughness <- shapiro.test(residual_roughness)
print(shapiro_roughness)
# Test normality for Tension Strength
residual_tension_strength <- rstandard(aov(tension_strenght ~
infill_pattern * material * bed_temperature, data = data))
# QQ-plot for Tension Strength
qqnorm(residual_tension_strength)
qqline(residual_tension_strength, col = "red")
title(main = "Figure 5.3: Normal QQ-plot for Residual Tension
Strength")
print(result)
# Conclusion statement (check the printed p-values before accepting it)
cat("\nComment: We can conclude that infill_pattern, material, and
bed_temperature have an effect on", response,
"and there is an interaction between them.\n") }
# Perform ANOVA and print results for each dependent variable
anova_roughness <- aov(roughness ~ infill_pattern * material *
bed_temperature, data = data)
print_anova(anova_roughness, "roughness")
anova_tension_strength <- aov(tension_strenght ~ infill_pattern *
material * bed_temperature, data = data)
print_anova(anova_tension_strength, "tension_strength")
anova_elongation <- aov(elongation ~ infill_pattern * material *
bed_temperature, data = data)
print_anova(anova_elongation, "elongation")
Based on the ANOVA results, we can draw conclusions about how the combination
of infill pattern, bed temperature, and material type affects roughness, tension
strength, and elongation.
If the p-value is below 0.05 for any of the factors or their interactions, we
conclude that those factors or interactions significantly affect the respective
output parameter.
Comment:
Statistical analysis removed the factors "wall_thickness", "infill_density", and
"infill_pattern" because they were not significant (p-value > significance level). The
other factors (p-value < significance level) were kept for the new model,
model_Roughness_1.
r
# Step 1: Create sample data and save it to a CSV file (if not already created)
# data <- data.frame( ... )  # column definitions elided in this excerpt
# Full model
model_roughness <- lm(roughness ~ infill_density + wall_thickness +
nozzle_temperature + print_speed + bed_temperature + material +
layer_height, data = data)
summary(model_roughness)
# Remove non-significant factors
model_roughness_1 <- lm(roughness ~ nozzle_temperature + print_speed +
bed_temperature + material, data = data)
summary(model_roughness_1)
# Step 5: Build a second linear regression model for comparison
model_roughness_2 <- lm(roughness ~ infill_pattern + bed_temperature +
material + print_speed, data = data)
summary(model_roughness_2)
# Compare models with ANOVA
anova(model_roughness_1, model_roughness_2)
# Step 6: Visualize the results with ggplot2
install.packages("ggplot2") # Install if not already installed
library(ggplot2)
# Visualize the relationship between nozzle_temperature and roughness
ggplot(data, aes(x = nozzle_temperature, y = roughness)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Model Roughness 1: Nozzle Temperature vs Roughness")
# Step 7: Conclusion based on the R² value
cat("Comment: Based on R² value, Model_2 (R² =",
summary(model_roughness_2)$r.squared, ") is chosen as the final
model.")
The assumption of the regression model for checking validity and quality of
the model.
The assumption of the regression of the model is:
Yi = β0 + β1x1i + β2x2i + … + βnxni + εi , i = 1, 2, 3, …, n
- There must be a linear relationship between the outcome variable and the
independent variables.
- The variance of the errors is constant.
- Errors ε have expectation = 0.
r
# Create synthetic data
set.seed(123)
data <- data.frame(
roughness = runif(30, 0.2, 0.8), # roughness values between 0.2 and 0.8
infill_pattern = sample(c("pattern1", "pattern2"), 30, replace = TRUE), # fill pattern
bed_temperature = sample(60:80, 30, replace = TRUE), # bed temperature
material = sample(c("material1", "material2"), 30, replace = TRUE), # material type
infill_density = sample(20:40, 30, replace = TRUE), # infill density
wall_thickness = runif(30, 0.8, 1.5), # wall thickness in mm
nozzle_temperature = sample(200:220, 30, replace = TRUE), # nozzle temperature
print_speed = sample(50:70, 30, replace = TRUE), # print speed in mm/s
layer_height = runif(30, 0.2, 0.3) # layer height in mm
)
# Fit a linear model
model <- lm(roughness ~ infill_pattern + bed_temperature + material +
infill_density +
wall_thickness + nozzle_temperature + print_speed + layer_height, data
= data)
# Check assumptions
## 1. Linearity check (Residuals vs Fitted plot)
fitted_values <- fitted(model)
Figure 5.7 Results when drawing linear model regression analysis graphs.
par(mfrow = c(2, 2)) # Split the plot window into 2x2 grid
# Residuals vs Fitted (Linearity check)
fitted_values_2 <- fitted(model_tension_strength_2)
residuals_2 <- model_tension_strength_2$residuals
plot(fitted_values_2, residuals_2, main = "Residuals vs Fitted for
model_TensionStrength_2",
xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red")
# Normal Q-Q plot (Normality check)
qqnorm(residuals_2)
qqline(residuals_2, col = "red")
title(main = "Normal Q-Q Plot for model_TensionStrength_2")
# Scale-Location plot (Homoscedasticity check)
plot(fitted_values_2, sqrt(abs(residuals_2)), main = "Scale-Location
for Homoscedasticity",
xlab = "Fitted Values", ylab = "Square Root of |Residuals|")
abline(h = 0, col = "red")
# Cook's Distance plot (Influential points)
cooksd_2 <- cooks.distance(model_tension_strength_2)
plot(cooksd_2, type = "h", main = "Cook's Distance for
model_TensionStrength_2")
abline(h = 4/(nrow(data) -
length(model_tension_strength_2$coefficients)), col = "red")
# Model summary for the chosen model
summary(model_tension_strength_2)
plot(fitted_values_elongation_2, sqrt(abs(residuals_elongation_2)),
main = "Scale-Location for Homoscedasticity",
xlab = "Fitted Values", ylab = "Square Root of |Residuals|")
abline(h = 0, col = "red")
# Cook's Distance plot (Influential points)
cooksd_elongation_2 <- cooks.distance(model_elongation_2)
plot(cooksd_elongation_2, type = "h", main = "Cook's Distance for
model_elongation_2")
abline(h = 4/(nrow(data) - length(model_elongation_2$coefficients)),
col = "red")
# Model summary for the chosen model
summary(model_elongation_2)
The regression models will provide insights into which variables (e.g., infill
density, wall thickness, print speed) significantly affect the output
parameters (roughness, tension strength, elongation).
b. Disadvantage
Despite its usefulness, linear regression has limitations. It assumes a linear
relationship between dependent and independent variables, which may not always
hold. In cases like the present project, variables such as bed temperature and fan
speed may exhibit nonlinear relationships with the output variables, limiting the
model's effectiveness. Furthermore, LR models are sensitive to outliers, which can
drastically alter model coefficients and impact overall performance.
2. Extension
Since the linear regression model could explain only a limited portion of the
variability in roughness (86%) and tension strength (65%), the remaining
variability could be better explained by a polynomial regression model. Polynomial
regression models allow for more flexible relationships by fitting higher-degree
polynomials to the data. The general form of polynomial regression can be
represented as:
Y = β0 + β1X + β2X² + ⋯ + βdX^d + ε
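Fitting such a model in R and comparing it with a plain linear fit can be sketched as follows; the data are simulated from a genuinely quadratic relationship, so the polynomial model should come out ahead on Adjusted R-squared:

```r
# Second-degree polynomial regression via poly(), compared against a linear fit
set.seed(9)
x <- runif(60, 0, 10)
y <- 2 + 1.5 * x - 0.2 * x^2 + rnorm(60, sd = 0.5)
lin  <- lm(y ~ x)
quad <- lm(y ~ poly(x, 2, raw = TRUE))     # fits y ~ x + x^2
adj_lin  <- summary(lin)$adj.r.squared
adj_quad <- summary(quad)$adj.r.squared
c(linear = adj_lin, quadratic = adj_quad)  # the quadratic explains much more
```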
Comment:
This output displays the results of a multivariate linear regression where the
"roughness" target variable is modeled as a second-degree polynomial function of
these variables.
The model's Adjusted R-squared is 0.08572, indicating that about 8.572% of the
"roughness" variation can be explained by the selected variables.
Comment:
This output displays the results of a multivariate linear regression where the
"tension strength" target variable is modeled as a second-degree polynomial
function of these variables.
The model's Adjusted R-squared is 0.08572, indicating that about 8.572% of the
"tension strength" variation can be explained by the selected variables.
Comment:
This output displays the results of a multivariate linear regression where the
"elongation" target variable is modeled as a second-degree polynomial function of
these variables.
The model's Adjusted R-squared is 0.08572, indicating that about 8.572% of the
"elongation" variation can be explained by the selected variables.
d. Model comparison
To determine which of Multiple linear regression and Multiple polynomial
regression is more efficient, we compare the rate of accuracy of the two models
('Adjusted R-squared', multiplied by 100 to express the accuracy as a percentage).
Table 6.1. Comparison of model results using the rate of accuracy 'Adjusted R-squared'
Model | Multiple linear regression | Multiple polynomial regression
Roughness | 85.71% | 89.98%
Tension strength | 62.01% | 73.52%
Elongation | 66.42% | 75.46%
3. Conclusion
The polynomial regression model proves to be more efficient than the linear
regression model for explaining the variability in roughness, tension
strength, and elongation. With higher Adjusted R-squared values, the
polynomial regression model provides a better fit to the data and enhances
the predictive power of the analysis.
Given the results, further investigation into the use of polynomial regression
in 3D printing models would likely offer more accurate predictions for the
mechanical properties of printed objects, contributing to more optimized
print processes and material choices.
PART 8: REFERENCES
[1] Douglas C. Montgomery & George C. Runger. (2010). Applied Statistics and
Probability for Engineers. Hoboken, NJ: John Wiley & Sons.
[2] John Verzani. (2004). Using R for Introductory Statistics. New York: Chapman
and Hall/CRC.
[4] Nguyễn Tiến Dũng & Nguyễn Đình Huy. (2019). Xác suất – Thống kê & Phân
tích số liệu (Probability – Statistics & Data Analysis). Ho Chi Minh City: Vietnam
National University Press.