# Business Analytics and Decision Making

## SIMPLE LINEAR REGRESSION

### Assignment Overview

You are a consultant who works for the Diligent Consulting Group. In this Case, you are engaged on a consulting basis by Loving Organic Foods. To get a better idea of what might have motivated customers’ buying habits, you are asked to analyze the factors that affect organic food expenditures. You opt to do this using linear regression analysis.

### Case Assignment

Using Excel, generate regression estimates for the following model:

Annual Amount Spent on Organic Food = α + b(Age)
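
Excel's Regression tool (in the Analysis ToolPak) estimates α and b by ordinary least squares (OLS). As a minimal sketch of that arithmetic, the Python below computes the closed-form estimates b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and α = ȳ − b·x̄. The helper name `fit_simple_ols` and the ages and spending values are hypothetical illustration choices, not the BUS520 Module 3 Case data, so this only shows what the Excel output computes; it does not replace the Excel output the assignment asks for.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Closed-form OLS estimates (alpha, b) for the model y = alpha + b * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    b = sxy / sxx              # slope: change in y per one-unit change in x
    alpha = y_bar - b * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return alpha, b


# Hypothetical illustration data (NOT the BUS520 Module 3 Case file)
ages = [25, 35, 45, 55, 65]
spend = [300, 400, 520, 580, 700]  # annual dollars spent on organic food
alpha, b = fit_simple_ols(ages, spend)
print(f"Annual Amount Spent on Organic Food = {alpha:.2f} + {b:.2f} * Age")
```

With these made-up numbers the fitted line is y = 59 + 9.8x. Entering the same numbers in Excel's Regression tool should give the same intercept and Age coefficient, because both use least squares.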

After you have reviewed the results from the estimation, write a report to your boss that interprets the results that you obtained. Please include the following in your report:

1. The regression output you generated in Excel.
2. Your interpretation of the coefficient of determination (r-squared).
3. Your interpretation of the coefficient estimate for the Age variable.
4. Your interpretation of the statistical significance of the coefficient estimate for the Age variable.
5. The regression equation with estimates substituted into the equation. (Note: Once the estimates are substituted into the regression equation, it should take a form similar to this: y = 10 + 2x)
6. A discussion of how this equation in item 5 above can be used to estimate annual expenditures on organic food.
7. An estimate of “Annual Amount Spent on Organic Food” for the average consumer. (Note: Substitute the average age for x and the estimated intercept for α in the regression equation, then solve for y.)
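
The statistics behind items 2, 4, and 7 can be sketched the same way. The Python below, again on hypothetical data with a hypothetical helper name, computes r-squared as 1 − SSE/SST (Excel's "R Square"), the slope's standard error and t statistic (Excel's "Standard Error" and "t Stat" for Age), and the estimate at the average age. Excel also reports a P-value for that t statistic; as a rough rule, |t| above about 2 corresponds to a P-value below 0.05 in moderately large samples. Because an OLS line with an intercept always passes through (x̄, ȳ), the item 7 estimate equals the sample mean of annual spending, which is a handy check on the Excel calculation.

```python
from math import sqrt
from statistics import fmean


def simple_regression_summary(x, y):
    """OLS fit of y = alpha + b*x plus the fit and significance figures Excel reports."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations for a standard error")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    alpha = y_bar - b * x_bar
    fitted = [alpha + b * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    r_squared = 1 - sse / sst                                 # Excel: "R Square"
    se_b = sqrt(sse / (n - 2) / sxx)                          # standard error of the slope
    t_b = b / se_b                                            # Excel: "t Stat" for Age
    return {
        "alpha": alpha, "b": b, "r_squared": r_squared,
        "se_b": se_b, "t_b": t_b, "x_bar": x_bar, "y_bar": y_bar,
    }


# Hypothetical illustration data (NOT the BUS520 Module 3 Case file)
ages = [25, 35, 45, 55, 65]
spend = [300, 400, 520, 580, 700]
s = simple_regression_summary(ages, spend)

# Item 7: plug the average age into the fitted equation
avg_estimate = s["alpha"] + s["b"] * s["x_bar"]
print(f"R^2 = {s['r_squared']:.3f}, t = {s['t_b']:.2f}, estimate at mean age = {avg_estimate:.2f}")
```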

Data: Download the Excel-based data file: BUS520 Module 3 Case.

### Assignment Expectations

Written Report

Length requirements: 3–4 pages minimum (not including Cover and Reference pages). NOTE: You must submit 3–4 pages of written discussion and analysis.

Provide a brief introduction to/background of the problem, similar to the introduction/background you provided in your Modules 1 and 2 Case submissions.

Provide a brief discussion of linear regression analysis, including the value of using this estimation technique.

Provide a written analysis that addresses each of the requirements listed under the “Case Assignment” section.

Write clearly, simply, and logically. Use double-spaced, black Verdana or Times Roman font in 12 pt. type size.

Please use keywords as headings to organize the report.

Avoid redundancy and general statements such as “All organizations exist to make a profit.” Make every sentence count.

Paraphrase the facts using your own words and ideas, employing quotes sparingly. Quotes, if absolutely necessary, should rarely exceed five words.

Upload both your written report and Excel file to the Case 3 Dropbox.

______________________________________________________________________

## SIMPLE LINEAR REGRESSION

Assume once again that you are a consultant who works for the Diligent Consulting Group. You are continuing to work on the analysis of the customer database from Modules 1 and 2.

### SLP Assignment Expectations

Complete the following tasks in the Module 3 SLP assignment template:

1. Create a scatterplot in Excel with “Annual Amount Spent on Organic Food” on the y (vertical) axis and “Age” on the x (horizontal) axis.
2. Insert a trendline.
3. What does the trendline indicate about the relationship between these two variables?
4. Calculate the correlation coefficient for these two variables using the =CORREL() function in Excel.
5. Interpret the correlation coefficient.
6. Does the correlation coefficient agree with the slope of the best-fit line? Explain.
7. Add the equation for the best-fit line to the chart.
8. Does this equation match the linear regression equation from the Case for this module? Explain.
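
Items 4 through 8 rest on three identities that a short sketch can check: CORREL returns the Pearson correlation r; the least-squares slope equals r·s_y/s_x, so it always has the same sign as r; and Excel's linear trendline is fitted by least squares, so its equation should match the Case regression (and r² should equal the Case's R Square). The Python below checks these on hypothetical data; `pearson_r` is our own helper, not an Excel or library function.

```python
from math import sqrt
from statistics import fmean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical illustration data (NOT the BUS520 case file)
ages = [25, 35, 45, 55, 65]
spend = [300, 400, 520, 580, 700]

r = pearson_r(ages, spend)
# Least-squares trendline slope expressed through r: b = r * s_y / s_x
slope = r * stdev(spend) / stdev(ages)
intercept = fmean(spend) - slope * fmean(ages)

# Standard deviations are positive, so the slope always carries the sign of r
assert (slope > 0) == (r > 0)
print(f"r = {r:.3f}, trendline: y = {slope:.2f}x + {intercept:.2f}")
```

Here the trendline works out to y = 9.8x + 59, the same line an OLS regression of spending on age gives for these numbers.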

_______________________________________________________________

## MULTIVARIATE ESTIMATION AND MODEL FIT

### Assignment Overview

You are a consultant who works for the Diligent Consulting Group. In this Case, you are engaged on a consulting basis by Loving Organic Foods. To get a better idea of what might have motivated customers’ buying habits, you are asked to analyze the factors that affect organic food expenditures. You performed a simple linear regression analysis in the Module 3 Case. Now, you are adding a layer of complexity to that analysis by including more independent variables in your model.

### Case Assignment

Using Excel, generate regression estimates for the following model:

Annual Amount Spent on Organic Food = α + b1(Age) + b2(Annual Income) + b3(Number of People in Household) + b4(Gender)
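
With several independent variables, OLS solves the normal equations (X′X)b = X′y, where X holds a column of 1s for α plus one column per variable. The sketch below does this with plain Gaussian elimination and also computes the global F statistic, F = [(SST − SSE)/k] / [SSE/(n − k − 1)], which Excel reports as "F" alongside its "Significance F" p-value. All data are hypothetical, Gender is assumed to be coded as a 0/1 dummy (the case file's coding may differ), and the helper names are our own. Excel's Regression tool does all of this for you; the code only makes the arithmetic visible.

```python
def solve_linear_system(a, rhs):
    """Solve a @ coef = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    m = [list(row) + [v] for row, v in zip(a, rhs)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        coef[r] = (m[r][k] - sum(m[r][c] * coef[c] for c in range(r + 1, k))) / m[r][r]
    return coef


def fit_multiple_ols(columns, y):
    """OLS with an intercept via the normal equations (X'X) coef = X'y."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    p = k - 1                                                 # number of slope coefficients
    r_squared = 1 - sse / sst
    # Global F-test: explained variation per slope vs. unexplained variation per residual df
    f_stat = ((sst - sse) / p) / (sse / (n - p - 1)) if sse > 0 else float("inf")
    return coef, r_squared, f_stat, fitted


# Hypothetical illustration data (NOT the BUS520 Module 4 Case file)
age = [25, 32, 41, 38, 52, 60, 47, 29]
income = [40, 55, 62, 80, 75, 90, 58, 45]   # thousands of dollars
household = [1, 3, 2, 4, 2, 1, 5, 3]
gender = [0, 1, 1, 0, 1, 0, 0, 1]           # hypothetical 0/1 dummy coding
spend = [310, 520, 560, 640, 690, 720, 650, 410]

coef, r2, f_stat, fitted = fit_multiple_ols([age, income, household, gender], spend)
names = ["alpha", "Age", "AnnualIncome", "Household", "Gender"]
for name, value in zip(names, coef):
    print(f"{name:>12}: {value:10.3f}")
print(f"R^2 = {r2:.3f}, F = {f_stat:.2f}")

# Item 7: evaluating the equation at every variable's mean reproduces mean spending
means = [sum(col) / len(col) for col in (age, income, household, gender)]
avg_estimate = coef[0] + sum(b * m for b, m in zip(coef[1:], means))
```

As in the simple model, evaluating the fitted equation at every variable's mean returns the sample mean of spending, which gives a quick check on item 7.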

After you have reviewed the results from the estimation, write a report to your boss that interprets the results that you obtained. Please include the following in your report:

1. The regression output you generated in Excel.
2. Your interpretation of the coefficient of determination (r-squared).
3. Your interpretation of the global test for statistical significance (the F-test).
4. Your interpretation of the coefficient estimates for all the independent variables.
5. Your interpretation of the statistical significance of the coefficient estimates for all the independent variables.
6. The regression equation with estimates substituted into the equation. (Note: Once the estimates are substituted into the regression equation, it should take a form similar to this: y = 10 + 2x1 + 1x2 + 4x3 + 0.9x4)
7. An estimate of “Annual Amount Spent on Organic Food” for the average consumer. (Note: Substitute the average of each independent variable for its x and the estimated intercept for α in the regression equation, then solve for y.)
8. A discussion of whether or not the coefficient estimate on the Age variable in this estimation is different from what it was in the simple linear regression model from the Module 3 Case. Be sure to explain why it did or did not change.
9. You decide you want to generate an elasticity coefficient, so you take the logarithm of the following variables in Excel: Annual Amount Spent on Organic Food and Annual Income.
10. Using Excel, generate regression estimates for the following model:

Log(Annual Amount Spent on Organic Food) = α + b1(Age) + b2(Log(Annual Income)) + b3(Number of People in Household) + b4(Gender)

11. Your interpretation of the coefficient estimate for Log(Annual Income).
12. Your interpretation of the coefficient of determination (r-squared) for this new model.
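
When both spending and income are logged, the coefficient on Log(Annual Income) is an elasticity: the approximate percentage change in spending for a 1% change in income, holding the other variables constant. The sketch below isolates that idea with a single regressor. It builds hypothetical spending from spend = 2 × income^0.6, so the true elasticity is 0.6 by construction, and recovers it by regressing ln(spend) on ln(income). It also shows that Age, which stays unlogged, has a semi-elasticity reading; the 0.004 coefficient is invented for illustration and assumes natural logs. In Excel, LN takes the natural log while LOG defaults to base 10; the income coefficient is the same either way as long as both variables use the same base.

```python
from math import exp, log
from statistics import fmean


def fit_simple_ols(x, y):
    """Closed-form OLS (intercept, slope) for y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical data built from spend = 2 * income ** 0.6 (income elasticity 0.6 by construction)
incomes = [30_000, 45_000, 60_000, 80_000, 120_000]
spend = [2 * inc ** 0.6 for inc in incomes]

# Log both sides: ln(spend) = ln(2) + 0.6 * ln(income), a straight line in logs
intercept, elasticity = fit_simple_ols([log(v) for v in incomes], [log(v) for v in spend])

# A 1% rise in income goes with roughly an `elasticity`% change in spending;
# for a larger change, the exact implied change is (1 + pct) ** elasticity - 1.
pct_change_for_10pct_raise = ((1.10 ** elasticity) - 1) * 100
print(f"elasticity = {elasticity:.3f}; +10% income -> {pct_change_for_10pct_raise:.2f}% spending")

# Age is NOT logged in the case model, so (with natural logs) its coefficient b1 is a
# semi-elasticity: one extra year multiplies spending by exp(b1), about 100 * b1 percent.
b1_hypothetical = 0.004  # hypothetical coefficient, for illustration only
pct_per_year = (exp(b1_hypothetical) - 1) * 100
```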

Data: Download the Excel-based data file: BUS520 Module 4 Case.

### Assignment Expectations

Written Report

Length requirements: 3–4 pages minimum (not including Cover and Reference pages). Note: You must submit 3–4 pages of written discussion and analysis.

Provide a brief introduction to/background of the problem, similar to the introduction/background you provided in your Modules 1 through 3 Case submissions.

Provide a brief comparison of simple linear regression and multiple linear regression.

Provide a written analysis that addresses each of the requirements listed under the “Case Assignment” section.

Write clearly, simply, and logically. Use double-spaced, black Verdana or Times Roman font in 12 pt. type size.

_____________________________________________________

Assume once again that you are a consultant who works for the Diligent Consulting Group. You are continuing to work on the analysis of the customer database from Modules 1 through 3.

### SLP Assignment Expectations

Complete the following tasks in the Module 4 SLP assignment template:

1. Compare the coefficients of determination (r-squared values) from the three linear regressions: simple linear regression from Module 3 Case, multivariate regression from Module 4 Case, and the second multivariate regression with the logged values from Module 4 Case. Which model had the “best fit”?
2. Calculate the residual for the first observation from the simple linear regression model. Recall that Residual = Observed value – Predicted value, or e = y – ŷ.
3. What happens to the overall distance between the best-fit line and the data points in the scatterplot when the residuals shrink?
4. What happens to the coefficient of determination when the residuals shrink?
5. Consider the r-squared from the simple linear regression model and the r-squared from the first multivariate regression model. Why did the coefficient of determination change when more variables were added to the model?
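
The SLP questions about residuals and r-squared follow from R² = 1 − SSE/SST, where SSE is the sum of squared residuals. The sketch below, on hypothetical data with hypothetical helper names, computes the first observation's residual e = y − ŷ, then compares the simple model with one that adds a hypothetical Household column. Because the larger model could always set the extra coefficient to zero and reproduce the smaller model's fit, least squares can only lower (or keep) SSE, so R² cannot fall when variables are added; Excel's "Adjusted R Square" applies a penalty for each added variable to offset this.

```python
def ols(columns, y):
    """Least squares with an intercept; returns (coefficients, fitted values)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    m = [[sum(row[p] * row[q] for row in X) for q in range(k)]
         + [sum(row[p] * yi for row, yi in zip(X, y))] for p in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        coef[r] = (m[r][k] - sum(m[r][c] * coef[c] for c in range(r + 1, k))) / m[r][r]
    fitted = [sum(b * v for b, v in zip(coef, row)) for row in X]
    return coef, fitted


def r_squared(y, fitted):
    """R^2 = 1 - SSE/SST: smaller residuals mean a larger R^2."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Excel's 'Adjusted R Square': penalizes each of the k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical illustration data (NOT the BUS520 case files)
ages = [25, 35, 45, 55, 65]
household = [1, 3, 2, 2, 4]
spend = [300, 400, 520, 580, 700]

coef_simple, fit_simple = ols([ages], spend)
residual_first = spend[0] - fit_simple[0]  # e = y - y_hat for observation 1

coef_multi, fit_multi = ols([ages, household], spend)
r2_simple = r_squared(spend, fit_simple)
r2_multi = r_squared(spend, fit_multi)

# Nested least squares: adding a variable can only lower (or keep) SSE, so R^2 cannot fall
assert r2_multi >= r2_simple - 1e-12
print(f"e1 = {residual_first:.2f}; R^2 simple = {r2_simple:.4f}, with Household = {r2_multi:.4f}")
```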