Simple discussions (business analytics and machine learning)

Question 2: Machine Learning

a) Which of unsupervised or supervised machine learning is best suited to assessing causation? Explain your choice.

b) Your analytics team presents you with two sets of results that have improved the organization’s ability to predict customer defections. The first method uses deep learning and has a precision of 85%. The second method uses decision trees and has a precision of 70%. The previous approach had a precision of 40%.

i) Make a case for using the results of the deep learning method.

ii) Make a case for using the decision tree method.

In your answers, consider aspects of customer lifetime value and managerial decision making.

c) An analytics team used two different models to predict the likelihood of an outcome. The results from two different analysts are below:

Don’s Analysis

                      Actual Positive    Actual Negative
Predicted Positive          220                100
Predicted Negative           30                650

Katie’s Analysis

                      Actual Positive    Actual Negative
Predicted Positive          170                 10
Predicted Negative           80                740

i) Use the Confusion Matrix and Index Calculation tables below to calculate the model performance measures.

Confusion Matrix

                      Actual Positive    Actual Negative
Predicted Positive           TP                 FP
Predicted Negative           FN                 TN

Index Calculation

Measure               Formula                                           Don calculation   Katie calculation
Accuracy              (TP + TN) / (TP + TN + FP + FN)                   0.87              0.91
Precision             TP / (TP + FP)
Error rate            (FP + FN) / (TP + TN + FP + FN)
Recall                TP / (TP + FN)
Specificity           TN / (TN + FP)
False positive rate   FP / (TN + FP)
F-score               2 * (Precision * Recall) / (Precision + Recall)

Accuracy (completed as an example):
Don: (220 + 650) / (220 + 650 + 100 + 30) = 0.87
Katie: (170 + 740) / (170 + 740 + 10 + 80) = 0.91
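The formulas in the Index Calculation table can be checked with a short script. This is a minimal Python sketch, not part of the original worksheet: it assumes Don’s false negatives are 30 and Katie’s false positives and false negatives are 10 and 80, the values used in the worked accuracy example. The names `ConfusionMatrix` and `confusion_metrics` are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts from a 2x2 confusion matrix (hypothetical helper type)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def confusion_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Apply the Index Calculation formulas to one confusion matrix.

    No zero-division guard: assumes every denominator is non-zero.
    """
    total = cm.tp + cm.tn + cm.fp + cm.fn
    precision = cm.tp / (cm.tp + cm.fp)
    recall = cm.tp / (cm.tp + cm.fn)
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "precision": precision,
        "error_rate": (cm.fp + cm.fn) / total,
        "recall": recall,
        "specificity": cm.tn / (cm.tn + cm.fp),
        "false_positive_rate": cm.fp / (cm.tn + cm.fp),
        "f_score": 2 * (precision * recall) / (precision + recall),
    }


# Counts taken from the worked accuracy example (assumed for the cells
# that are missing from the extracted tables).
don = ConfusionMatrix(tp=220, fp=100, fn=30, tn=650)
katie = ConfusionMatrix(tp=170, fp=10, fn=80, tn=740)

if __name__ == "__main__":
    for name, cm in (("Don", don), ("Katie", katie)):
        metrics = confusion_metrics(cm)
        print(name, {key: round(value, 2) for key, value in metrics.items()})
```

With these counts, Don’s model has the higher recall (220 / 250 = 0.88 against Katie’s 170 / 250 = 0.68), while Katie’s has the higher precision (170 / 180 ≈ 0.94 against Don’s 220 / 320 ≈ 0.69), which is the trade-off behind parts ii) and iii).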

ii) Describe a medical or business context where you would prefer to use Don’s model. Why do you prefer Don’s model?

iii) Describe a medical or business context where you would prefer to use Katie’s model. Why do you prefer Katie’s model?

Ian, an intern on the team, claims he has made a breakthrough with a model that outperforms both Don’s and Katie’s. The confusion matrix for his model is below:

Ian’s Analysis

                      Actual Positive    Actual Negative
Predicted Positive          249
Predicted Negative                             748

iv) What could possibly have gone wrong that would result in his results being invalid? How could this be solved? (15 marks)

Question 3: Experiments

Jennifer was given the results of an experiment designed to determine whether a 10% reduction in price on an online shopping portal would lead to an increase in purchases. A control group and a treatment group were created. These groups are described below:

Group characteristics (Control Group / Treatment Group)

Number of males:

Number of females:

Average age: 47 years / 37 years

Average spend per visit in the month BEFORE the experiment: $25.00

Average spend per visit in the month AFTER the experiment: $25.00 / $29.00

a) Were the control and treatment groups effectively randomized? Why or why not?

b) What are the two most likely explanations for the treatment group showing a higher average spend than the control group?

c) What type of analysis could be used to remove one of the possible explanations for the difference in average spend?

d) Experiments are useful in helping determine if people have responded due to a stimulus or if they would have responded even without the stimulus. Design an experiment that could demonstrate what proportion of people have responded to a stimulus. These people could be customers or employees within a company. Examples could be an advertising campaign to customers, or a policy of flexible work hours for employees. Requirements:

i) How would you pick the treatment and control groups? Fill in the table below to indicate the number of people and three important characteristics that describe each group.

                      Control Group    Treatment Group
People
Characteristic 1:
Characteristic 2:
Characteristic 3:

ii) Predict the results and state the managerial conclusion you could make from this result. Use the table below to indicate the change in behavior you expect to observe.

                                     Control Group    Treatment Group
Observed behavior before treatment:
Observed behavior after treatment:

iii) State the managerial action you could take from the results of your experiment. Briefly describe a useful follow-up experiment that would further deepen understanding of why people behaved in the manner observed.
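Part d) rests on comparing a randomly assigned treatment group with a control group: the control group’s response rate estimates how many people would have responded even without the stimulus, and the difference between the two rates estimates the proportion who responded because of it. The Python sketch below illustrates that logic under stated assumptions; the function names and the customer IDs are hypothetical, and no real experiment data is implied.

```python
import random
from collections.abc import Sequence


def assign_groups(people: Sequence[str], seed: int = 42) -> tuple[list[str], list[str]]:
    """Randomly split people into (treatment, control) groups of equal size.

    Random assignment is what lets the control group stand in for
    "what would have happened without the stimulus".
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(people)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def response_rate(responded: Sequence[bool]) -> float:
    """Share of a group that showed the behaviour of interest."""
    return sum(responded) / len(responded)


def incremental_response(treated: Sequence[bool], control: Sequence[bool]) -> float:
    """Estimated share of people who responded because of the stimulus.

    The control rate estimates the share who would have responded anyway;
    the difference is the lift attributable to the treatment.
    """
    return response_rate(treated) - response_rate(control)


if __name__ == "__main__":
    # Hypothetical customer IDs, for illustration only.
    customers = [f"cust_{i:04d}" for i in range(1000)]
    treatment, control = assign_groups(customers)
    print(len(treatment), len(control))  # 500 500
```

Before comparing outcomes, the two groups would still be checked for balance on characteristics such as age and prior spend, which is the issue raised in part a).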

Solution Preview

Question 2: Machine Learning
a) Which of unsupervised or supervised machine learning is best suited to assessing causation? Explain your choice.
Unsupervised learning is best suited to assessing causation. Unsupervised learning techniques can rely on latent (unobserved) variables to assess causation. With unsupervised learning, it is also possible to learn larger and more complex models than with supervised learning, because supervised learning is limited to finding the connection between two sets of observations: the inputs and the labelled outputs.
