Simple discussions (business analytics and machine learning)
Question 2: Machine Learning
a) Which of unsupervised or supervised machine learning is best suited to assessing causation? Explain your choice.
b) Your analytics team presents you with two sets of results that have improved the organization’s ability to predict customer defections. The first method uses deep learning and has a precision of 85%. The second method uses decision trees and has a precision of 70%. The previous approach had a precision of 40%.
i) Make a case for using the results of the deep learning method.
ii) Make a case for using the decision tree method.
In your answers, consider aspects of customer lifetime value and managerial decision making.
c) An analytics team used two different models to predict the likelihood of an outcome. The results from two different analysts are below:
Don’s Analysis
                      Actual Positive   Actual Negative
Predicted Positive    220               100
Predicted Negative    30                650
Katie’s Analysis
                      Actual Positive   Actual Negative
Predicted Positive    170               10
Predicted Negative    80                740
i) Use the Confusion Matrix and Index Calculation tables below to calculate the model performance measures.
Confusion Matrix
                      Actual Positive   Actual Negative
Predicted Positive    TP                FP
Predicted Negative    FN                TN
Index Calculation
Measure              Formula                                          Don Calculation                              Katie Calculation
Accuracy (example)   (TP + TN) / (TP + TN + FP + FN)                  (220 + 650) / (220 + 650 + 100 + 30) = 0.87  (170 + 740) / (170 + 740 + 10 + 80) = 0.91
Precision            TP / (TP + FP)
Error rate           (FP + FN) / (TP + TN + FP + FN)
Recall               TP / (TP + FN)
Specificity          TN / (TN + FP)
False positive rate  FP / (TN + FP)
F-score              2 * (Precision * Recall) / (Precision + Recall)
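The formulas in the Index Calculation table can be checked with a short script. This is a minimal Python sketch, assuming the 2x2 layout used in the matrices (rows are predictions, columns are actual outcomes); the class name `ConfusionMatrix` and its field names are illustrative, not part of the assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts from a 2x2 confusion matrix (rows = predicted, columns = actual)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def metrics(self) -> dict[str, float]:
        # Each entry follows the formula column of the Index Calculation table.
        precision = self.tp / (self.tp + self.fp)
        recall = self.tp / (self.tp + self.fn)
        return {
            "accuracy": (self.tp + self.tn) / self.total,
            "precision": precision,
            "error_rate": (self.fp + self.fn) / self.total,
            "recall": recall,
            "specificity": self.tn / (self.tn + self.fp),
            "false_positive_rate": self.fp / (self.tn + self.fp),
            "f_score": 2 * (precision * recall) / (precision + recall),
        }


# Counts read from Don's and Katie's confusion matrices.
don = ConfusionMatrix(tp=220, fp=100, fn=30, tn=650)
katie = ConfusionMatrix(tp=170, fp=10, fn=80, tn=740)

for name, cm in (("Don", don), ("Katie", katie)):
    print(name)
    for measure, value in cm.metrics().items():
        print(f"  {measure:<20} {value:.3f}")
```

The output makes the trade-off between the two models visible: Katie's precision is higher (about 0.94 versus 0.69 for Don), while Don's recall is higher (0.88 versus 0.68).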
ii) Describe a medical or business context where you would prefer to use Don’s model. Why do you prefer Don’s model?
iii) Describe a medical or business context where you would prefer to use Katie’s model. Why do you prefer Katie’s model?
Ian, an intern with the team, claims he has made a breakthrough with a model that outperforms both Don’s and Katie’s. The confusion matrix for his model is below:
Ian’s Analysis
                      Actual Positive   Actual Negative
Predicted Positive    249               2
Predicted Negative    1                 748
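Applying the same formulas to Ian's matrix shows how close to perfect his reported results are: only 3 of 1,000 cases are misclassified. A minimal sketch; the helper name `summarize` is hypothetical and used only here.

```python
def summarize(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision and recall from 2x2 confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts read from Ian's confusion matrix.
ian = summarize(tp=249, fp=2, fn=1, tn=748)
print({k: round(v, 3) for k, v in ian.items()})
# {'accuracy': 0.997, 'precision': 0.992, 'recall': 0.996}
```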
iv) What could possibly have gone wrong that would result in his results being invalid? How could this be solved? (15 marks)
Question 3: Experiments
Jennifer was given the results of an experiment designed to determine whether a 10% price reduction on an online shopping portal would lead to an increase in purchases. Control and treatment groups were created. These groups are described below:
                                                            Control Group   Treatment Group
Number of males                                             25              25
Number of females                                           25              25
Average age                                                 47 years        37 years
Average spend per visit in the month BEFORE the experiment  $25.00          $25.00
Average spend per visit in the month AFTER the experiment   $25.00          $29.00
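The spend figures reduce to simple arithmetic: the change within each group from the month before to the month after, and the gap between those two changes. This sketch assumes the per-visit averages in the table are the only inputs; it computes the size of the gap but says nothing about why it arose.

```python
# Average spend per visit (dollars), read from the experiment table.
spend = {
    "control": {"before": 25.00, "after": 25.00},
    "treatment": {"before": 25.00, "after": 29.00},
}

# Change within each group from the month before to the month after.
change = {group: s["after"] - s["before"] for group, s in spend.items()}

# Naive estimate: treats the whole gap between the groups' changes as the
# effect of the price reduction and ignores any other difference between
# the groups (such as average age).
naive_effect = change["treatment"] - change["control"]

print(change)        # {'control': 0.0, 'treatment': 4.0}
print(naive_effect)  # 4.0
print(f"{naive_effect / spend['treatment']['before']:.0%}")  # 16%
```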
a) Were the control and treatment groups effectively randomized? Why or why not?
b) What are the two most likely explanations for the treatment group showing a higher average spend than the control group?
c) What type of analysis could be used to remove one of the possible explanations for the difference in average spend?
d) Experiments are useful in helping determine if people have responded due to a stimulus or if they would have responded even without the stimulus. Design an experiment that could demonstrate what proportion of people have responded to a stimulus. These people could be customers or employees within a company. Examples could be an advertising campaign to customers, or a policy of flexible work hours for employees. Requirements:
i) How would you pick the treatment and control groups? Fill in the table below to indicate the number of people and 3 important characteristics that describe each group.
                    Control Group   Treatment Group
People
Characteristic 1:
Characteristic 2:
Characteristic 3:
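One common way to form comparable groups is simple random assignment, followed by a balance check on the observed characteristics. This is a minimal sketch assuming a hypothetical population of 100 customers with randomly generated ages; the population, the function name `random_assignment`, and the seeds are illustrative assumptions, not data from the assignment.

```python
import random
from statistics import mean


def random_assignment(people, seed=0):
    """Shuffle a copy of the population and split it into two equal halves."""
    rng = random.Random(seed)
    shuffled = people[:]  # copy so the original list order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical population: illustrative values only.
rng = random.Random(42)
customers = [{"id": i, "age": rng.randint(20, 70)} for i in range(100)]

control, treatment = random_assignment(customers, seed=1)

# Balance check: with random assignment the group averages should be close;
# a large gap (like 47 vs 37 years) would suggest the split was not random.
print(len(control), len(treatment))
print(round(mean(c["age"] for c in control), 1),
      round(mean(c["age"] for c in treatment), 1))
```

The same check can be repeated for any other characteristic recorded for each person, such as gender or prior spend.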
ii) Predict the results and state the managerial conclusion you could make from this result. Use the table below to indicate the change in behavior you expect to observe.
                                     Control Group   Treatment Group
Observed behavior before treatment:
Observed behavior after treatment:
iii) State the managerial action you could take from the results of your experiment. Briefly describe a useful follow-up experiment that would further deepen understanding of why people behaved in the manner observed.
Solution Preview
Question 2: Machine Learning
a) Which of unsupervised or supervised machine learning is best suited to assessing causation? Explain your choice.
Unsupervised learning is best suited to assessing causation. Unsupervised techniques model latent (unobserved) variables, and they rely on these latent variables to assess causation. Unsupervised learning can also fit larger and more complex models than supervised learning, because supervised learning is restricted to finding the connection between two sets of observations: the inputs and the labeled outcomes.