BA unit 2 notes (1)
Trendiness
- Definition: Trendiness refers to the tendency of a time series to move in a specific direction over time.
- Types of Trends:
  - Upward trend: Increase in value over time
  - Downward trend: Decrease in value over time
  - No trend (stationary): No systematic increase or decrease in value over time
- Methods to detect trendiness:
  - Visual inspection of time series plots
  - Trend tests (e.g. Mann-Kendall test)
  - Regression analysis
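As an illustration of a trend test, the Mann-Kendall statistic can be sketched in plain Python. This is a minimal sketch, not a reference implementation: it assumes the series has no tied values (so the simple variance formula applies) and uses the normal approximation for the p-value. The function name `mann_kendall` is made up for this example, and the series is the monthly sales column from the worked problem in these notes.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumption: no tied values, so Var(S) = n(n-1)(2n+5)/18.
    """
    n = len(series)
    # S = (number of increasing pairs) - (number of decreasing pairs),
    # taken over every earlier/later pair of observations.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected Z score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return s, z, p


# Monthly sales revenue (in thousands of dollars) from the worked problem.
sales = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
s, z, p = mann_kendall(sales)
print(f"S = {s}, Z = {z:.3f}, p = {p:.5f}")
```

A positive S with a small p-value points to an upward trend, and a negative S with a small p-value points to a downward trend.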
Regression Analysis
- Definition: Regression analysis is a statistical method for estimating the relationship between a
dependent (outcome) variable and one or more independent (predictor) variables.
- Types of Regression:
  - Simple Linear Regression (SLR): One independent variable
  - Multiple Linear Regression (MLR): More than one independent variable
  - Non-Linear Regression: Non-linear relationship between variables
Modeling Relationships
- Goal: To understand the relationship between two continuous variables
- Simple Linear Regression: A linear model that predicts the value of a continuous outcome variable
based on a single predictor variable
- Assumptions:
  - Linearity: Relationship between variables is linear
  - Independence: Observations are independent
  - Homoscedasticity: Constant variance of residuals
  - Normality: Residuals are normally distributed
  - No multicollinearity: Predictors are not highly correlated with each other (applies only to multiple
  regression; with a single predictor there is nothing to check)
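For reference, the simple linear regression model and its ordinary least-squares estimates can be written out as follows. This is the standard textbook form; the index i, the sample size n, and the bar and hat notation are added here for illustration and do not appear elsewhere in these notes.

```latex
\begin{align}
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \\
\hat{\beta}_0 &= \bar{Y} - \hat{\beta}_1 \bar{X}
\end{align}
```

Here β0 is the intercept, β1 is the slope, and ε is the error term that the assumptions above describe.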
Model Evaluation
- Coefficient of Determination (R-squared): Measures goodness of fit as the proportion of variance in the
outcome explained by the model (ranges from 0 to 1)
- Residual Plots: Check for linearity, homoscedasticity, and normality
- Hypothesis Testing: Test for significance of coefficients (t-tests, F-tests)
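The R-squared and residual calculations can be sketched in a few lines of Python. This is a minimal sketch; the function names `residuals` and `r_squared` and the four observed/predicted values are made up purely for illustration.

```python
def residuals(observed, predicted):
    """Residual = observed value minus the model's predicted value."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def r_squared(observed, predicted):
    """R-squared = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Unexplained variation: squared distances from the fitted values.
    ss_res = sum(e ** 2 for e in residuals(observed, predicted))
    # Total variation: squared distances from the mean of the outcome.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Illustrative values only, not taken from these notes.
observed = [3.0, 5.0, 7.0, 10.0]
predicted = [3.0, 5.0, 8.0, 9.0]

print(residuals(observed, predicted))
print(round(r_squared(observed, predicted), 4))
```

Plotting the residuals against the fitted values is the usual way to look for curvature (non-linearity) or a funnel shape (non-constant variance).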
Common Applications
- Predicting continuous outcomes (e.g., stock prices, temperatures)
- Identifying relationships between variables (e.g., correlation between age and income)
- Making informed decisions based on data-driven insights
These notes cover the basics of simple linear regression, including the assumptions, equation,
interpretation of coefficients, model evaluation, and common applications.
Here are some common types of data models:
1. Conceptual Data Model: A high-level, abstract model that describes the overall structure and
relationships of data.
2. Logical Data Model: A detailed, formal model that describes the relationships and constraints of data.
3. Physical Data Model: A low-level, detailed model that describes how data is stored and managed in a
specific database management system.
4. Relational Data Model: A model that organizes data into tables with well-defined relationships
between them.
5. Entity-Relationship Data Model: A model that describes data in terms of entities and relationships
between them.
6. Dimensional Data Model: A model that organizes data into facts and dimensions for analytical
querying.
7. Object-Oriented Data Model: A model that represents data as objects with properties and
relationships.
9. Network Data Model: A model that represents data as a network of interconnected records.
10. NoSQL Data Model: A model that stores data in a variety of formats such as key-value, document,
graph, and column-family stores.
Each type of data model has its strengths and weaknesses, and the choice of which one to use depends on
the specific needs of the application or organization.
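To make the difference between two of these models concrete, the same small data set can be held relationally (tables linked by a key) and as a nested document. This is a rough sketch using Python's built-in `sqlite3` module; the table names, column names, and customer data are hypothetical.

```python
import sqlite3

# Relational model: two tables linked by a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE customer_order ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 99.5)],
)

# A join reassembles the relationship at query time.
relational_rows = conn.execute(
    "SELECT c.name, o.amount"
    " FROM customer c JOIN customer_order o ON o.customer_id = c.id"
    " ORDER BY o.id"
).fetchall()

# Document (NoSQL-style) model: the orders are nested inside the customer.
document = {
    "id": 1,
    "name": "Asha",
    "orders": [{"id": 10, "amount": 250.0}, {"id": 11, "amount": 99.5}],
}
document_rows = [(document["name"], o["amount"]) for o in document["orders"]]

print(relational_rows)
print(document_rows)
conn.close()
```

Both representations carry the same information; the relational form avoids duplication and enforces structure, while the document form keeps related data together for single-record reads.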
Problem:
A company wants to understand the relationship between the amount spent on advertising (in thousands
of dollars) and the sales revenue (in thousands of dollars) for their product. They have collected data for
10 months, with the following values:
| Month | Advertising Spend (X) | Sales Revenue (Y) |
| --- | --- | --- |
| 1 | 10 | 50 |
| 2 | 15 | 60 |
| 3 | 20 | 70 |
| 4 | 25 | 80 |
| 5 | 30 | 90 |
| 6 | 35 | 100 |
| 7 | 40 | 110 |
| 8 | 45 | 120 |
| 9 | 50 | 130 |
| 10 | 55 | 140 |
Task:
1. Develop a linear regression model to predict sales revenue (Y) based on advertising spend (X).
2. Interpret the coefficients of the model.
3. Evaluate the performance of the model using appropriate metrics (e.g. R-squared, residual plots).
4. Use the model to predict sales revenue for a new month with an advertising spend of $60,000 (X = 60).
Solution:
Y = β0 + β1X + ε
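Least-squares estimates from the table (mean of X = 32.5, mean of Y = 95):
β1 = Σ(X - 32.5)(Y - 95) / Σ(X - 32.5)² = 4125 / 2062.5 = 2
β0 = 95 - 2(32.5) = 30
Y = 30 + 2X
1. Interpretation of Coefficients: The slope β1 = 2 means each additional $1,000 of advertising spend is
associated with $2,000 more sales revenue. The intercept β0 = 30 is the predicted sales revenue ($30,000)
when advertising spend is zero; this lies outside the observed range of X (10 to 55), so read it with caution.
2. Model Evaluation: Every data point lies exactly on the fitted line, so all residuals are zero and
R-squared = 1. Real data rarely fits this perfectly; with noisy data, check the residual plots against the
assumptions listed earlier.
3. Prediction: For X = 60, Y = 30 + 2(60) = 150, i.e. predicted sales revenue of $150,000. X = 60 is slightly
beyond the observed range, so the prediction assumes the linear relationship continues to hold.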
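The least-squares fit for the advertising table can be reproduced with a short script. This is a minimal sketch in plain Python; the helper name `fit_line` is made up for this example, and the data is copied from the table in the problem statement. For this data it gives an intercept of 30 and a slope of 2.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Data from the problem table (both in thousands of dollars).
ad_spend = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
sales = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]

intercept, slope = fit_line(ad_spend, sales)

# Largest absolute residual: zero means every point is on the line.
max_abs_residual = max(
    abs(y - (intercept + slope * x)) for x, y in zip(ad_spend, sales)
)

# Task 4: predicted sales for an advertising spend of $60,000 (X = 60).
predicted_sales = intercept + slope * 60

print(f"Y = {intercept:.1f} + {slope:.1f}X")
print(f"max |residual| = {max_abs_residual:.1f}")
print(f"prediction at X = 60: {predicted_sales:.1f}")
```

Because the table is perfectly linear, the residuals are all zero; with real, noisy data the same function applies but the fit would not be exact.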