Marketing Intelligence
Company_XXX Case Study
Company_XXX is an online company that meets the growing demand for independent travel information. It offers an extensive hotel meta search to travellers.
The following document details results from the ‘Marketing Intelligence’ data task.
The task involves two datasets i.e Marketing Campaigns and Session data.
The Marketing campaigns data contains weekly information about different online marketing campaigns in one market.
The Session data contains information about single visits to the Company_XXX website (= sessions). A click out is logged whenever a user clicks on a hotel and is redirected to the booking page. The booking field is binary and indicates if a hotel booking was logged after one of the click outs.
### 0.1 Install the libraries required
## Create a vector of packages to be installed
pkgs <- c("tidyverse","data.table","DT","lubridate","ggthemes","randomForest","readODS","ggcorrplot")
## Check if there are packages you want to load, that are not already installed.
miss_pkgs <- pkgs[!pkgs %in% installed.packages()[,1]]
## Installing the missing packages
if(length(miss_pkgs)>0){
install.packages(miss_pkgs)
}
## Loading all the packages
invisible(lapply(pkgs,library,character.only=TRUE))
## Remove the objects that are no longer required
rm(miss_pkgs)
rm(pkgs)
### Setting the plot theme
Company_XXX_theme<- theme_hc()+ theme(legend.position = "right",
legend.direction = "vertical",
#legend.title = element_blank(),
plot.title = element_text( size = rel(1.6), hjust = 0.5),
plot.subtitle = element_text(size = rel(1.5), hjust = 0.5),
#axis.text = element_text( size = rel(1.5)),
axis.text.x = element_text(size =rel(1.5),angle = 0),
axis.text.y = element_text(size =rel(1.5),angle = 0),
axis.title = element_text( size = rel(1.55)),
axis.line.x = element_line(size = 1.5, colour = "#c94a38"),
panel.background = element_rect(fill = NA))
### Colours that will be used for the plots
Company_XXX_blue = "#377DA9"
Company_XXX_maroon = "#BB523A"
Company_XXX_yellow = "#E79435"
## Avoidance of scientific numbers
options(scipen = 999)
## Printing function
pr_func<-function(data,cnames){
datatable(data,colnames = cnames,
extensions = 'Buttons', options = list(
dom = 'Bfrtip',
buttons = c('copy', 'print')
)
)
}
### 0.2 Read in the datasets
mc_df <- readRDS("../../../../../PersonalDevelopment/marketing_campaigns2.rds")
sessions_df <- readRDS("../../../../../PersonalDevelopment/session_data.rds")
Task 1: Marketing Campaigns
Give an overview of entire market’s development and the different campaigns. Please prepare 3-5 charts and summarize the most important findings. See 1.2 - 1.8 below
How would you assess the development of the quality of traffic, e.g. in terms of revenue per visitor. How is the overall development and how does each campaign evolve? See 1.2 - 1.8 below
You are talking with the responsible business developer for the market who wants to spend an additional 250€ per week from week 31 onwards. Please help him out with the following questions:
- What is your advice in which campaign to invest and why? See 1.6 below
- How do you expect this to impact the overall performance in the market from week 31 onwards? See 1.6 below
1.1. Clean the dataset, and generate new variables
## Convert the Campaign variable to factor
mc_df <- mc_df %>%
mutate(Campaign = fct_relevel(Campaign,"Aldebaran","Bartledan","Cottington"))
## Remove duplicates
mc_df <- mc_df %>%
unique()
## Generate a profit variable
mc_df <- mc_df %>%
mutate(Profit = Revenue - Cost)
## Weekly_RPV
mc_df <- mc_df %>%
mutate(Weekly_RPV = Revenue /Visits)
1.2 Exploring the trend of visits for each of the campaigns
The Aldebaran campaign seems to have done really well in terms of attracting visitors to the site, all through the campaign period. As much as the number of visits was quite low in the beginning (as compared to the other two campaigns), and with very few dips in the number of visits here and there, there was a good increasing trend overall.
The Bartledan campaign started off at a steady rate, until week 14, where the number of visits to the site picked up a bit till the end.
The Cottington campaign maintained a low but steady state in the number of visits all through the campaign period.
graph <- mc_df %>%
ggplot(aes(x = as.factor(Week), y=Visits, group = Campaign,color = Campaign))+
geom_line(size = 1.1)+
Company_XXX_theme+
scale_color_manual(values = c(Company_XXX_maroon, Company_XXX_yellow, Company_XXX_blue))+
labs(title = "Distribution of the Number of Visits",
x = "Week", y="Number of Visits",color = "Campaign")
graph
1.3 Revenue Per Visitor
RPV is the average revenue per visitor to your website.
Here, we are assuming that a visit represents a unique visitor.
RPV is calculated by dividing the total income by the number of visitors during a specific time period.
We can see that as much the Cottington campaign maintained a low but steady state in the number of visits all through the campaign period (as shown in the previous section), the RPV was the highest, amongst all the three campaigns.
This means that the low number of visitors actually generated higher revenue as opposed to the revene that was generated by the higher number of visitors on the other two campaigns.
graph <- mc_df %>%
ggplot(aes(x = as.factor(Week), y=Weekly_RPV, group = Campaign,color = Campaign))+
geom_line(size = 1.1)+
Company_XXX_theme+
scale_color_manual(values = c(Company_XXX_maroon, Company_XXX_yellow, Company_XXX_blue))+
labs(title = "Weekly Revenue Per Visitor",
x = "Week", y="Revenue Per Visitor",color = "Campaign")
graph
## Generating ROMI
RPV_df <- mc_df %>%
group_by(Campaign) %>%
summarise(RPV = round(sum(Revenue)/ sum(Visits),1))
## Generate the plot
graph <- ggplot(data = RPV_df,
mapping = aes(x = Campaign, y = RPV, fill = Campaign))+
geom_bar(stat = "identity")+
geom_text(aes(label = RPV),vjust = -0.25, size = 5)+
Company_XXX_theme+
scale_fill_manual(values = c(Company_XXX_maroon, Company_XXX_yellow, Company_XXX_blue))+
labs(title = "Overall Revenue Per Visitor",
x = "Campaign", y="Revenue Per Visitor",color = "Campaign")
graph
1.4 Assessing profitability of each of the campaigns over the weeks
Here, Profit = Revenue - Cost
In terms of profitability, the Bartledan campaign was the worst performer as it never generated any profit.
The Cottington Campaign was doing well, until Week 20, when it started generating losses.
Coincidentally, Week 20 is the same week that Aldebaran came out of the red, and started generating profits.
graph <- mc_df %>%
ggplot(aes(x = as.factor(Week), y=Profit, group = Campaign,color = Campaign))+
geom_line(size = 1.1)+
geom_hline(yintercept = 0,color="black", linetype = "dashed")+
geom_vline(xintercept = 20,color="red", linetype = "dashed")+
Company_XXX_theme+
scale_color_manual(values = c(Company_XXX_maroon, Company_XXX_yellow, Company_XXX_blue))+
labs(title = "Assessing Weekly Campaign Profitability",
x = "Week", y="Profitability",color = "Campaign")
graph
1.5 Assessing Overall Campaign Profitability
Throughout the campaign period, the Cottington campaign is the only one that made a significant amount of profit.
abs_df <- mc_df %>%
select(Week, Campaign, Revenue, Cost) %>%
pivot_longer(cols = Revenue:Cost, names_to ="Revenue_Cost" ,values_to = "Value") %>%
group_by(Campaign,Revenue_Cost) %>%
summarise(Value = round(sum(Value),1))
## Generate the plot
graph <- ggplot(data = abs_df,
mapping = aes(x = Campaign, y = Value, fill = Revenue_Cost))+
geom_bar(stat = "identity", position = "dodge")+
geom_text(aes(label = Value),vjust = -0.25, size = 5, position = position_dodge(width = 1))+
Company_XXX_theme+
scale_fill_manual(values = c("#377DA9","#E79435"))+
labs(title = "Assessing Overall Campaign Profitability",
x = "Campaign", y="Revenue/Cost",color = "Measure")
graph
1.6 Return on Marketing Investment (ROMI)
ROMI is an indication of return on investment in marketing.
ROMI = [Total sales - marketing campaign costs / marketing campaign costs]
There is a larger ROMI on the Cottington campaign, as compared to the Aldebaran campaign. The Bartledan resulted into a negative ROMI, even though the number of visits to the site kept on increasing, as the weeks flew by.
I would advise the business developer for the market to invest in the Cotington Campaign. This is because as much as the campaign generally attracts a smaller number of visitors, as compared to the other campaigns, the ROMI is high, and the Revenue per Visitor is also high.
## Generating ROMI
ROMI_df <- mc_df %>%
group_by(Campaign) %>%
summarise(ROMI_abs = (sum(Revenue)-sum(Cost)) / sum(Cost),
ROMI_perc = round(ROMI_abs * 100,2))
## Generate the plot
graph <- ggplot(data = ROMI_df,
mapping = aes(x = Campaign, y = ROMI_perc, fill = Campaign))+
geom_bar(stat = "identity")+
geom_text(aes(label = ROMI_perc),vjust = -0.25, size = 5)+
Company_XXX_theme+
scale_fill_manual(values = c(Company_XXX_maroon, Company_XXX_yellow, Company_XXX_blue))+
labs(title = "Assessing Campaign Profitability \n (ROMI)",
x = "Campaign", y="ROMI (%)",color = "Campaign")+
ylim(-15,5)
graph
1.7 Does the type of campaign predict profit?
Company_XXX is likely to obtain a significant profit of 11.2 for an additional investment on the Cottington campaign, as opposed to the investment being made on the Aldebaran campaign.
Company_XXX would make a huge loss (27.6) if it invested cash on the Bartledan campaign.
model1 <- lm(Profit ~ Campaign, data = mc_df)
summary(model1)
Call:
lm(formula = Profit ~ Campaign, data = mc_df)
Residuals:
Min 1Q Median 3Q Max
-56.473 -9.864 1.269 14.199 46.555
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.9631 3.6803 0.262 0.7942
CampaignBartledan -27.5866 5.2047 -5.300 0.000000864 ***
CampaignCottington 11.2372 5.2047 2.159 0.0336 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.16 on 87 degrees of freedom
Multiple R-squared: 0.4038, Adjusted R-squared: 0.3901
F-statistic: 29.47 on 2 and 87 DF, p-value: 0.0000000001693
## Generating a tidy table
model1_tidy <- broom::tidy(model1)
basevalues <- c("CampaignAldebaran",0, 5.204708, 0, 0)
model1_tidy <- rbind(model1_tidy,basevalues)
model1_tidy$term <- gsub("Campaign","",model1_tidy$term)
model1_tidy <- model1_tidy%>% filter(term !="(Intercept)")
model1_tidy <- model1_tidy%>% mutate(estimate = round(as.numeric(estimate),1))
model1_tidy <- model1_tidy%>% rename(Campaign = term)
#model1_tidy$estimate <- round(model1_tidy$estimate)
## Generate the plot
graph <- ggplot(data = model1_tidy,
mapping = aes(x = Campaign, y = estimate, fill = Campaign))+
geom_bar(stat = "identity")+
geom_text(aes(label = estimate),vjust = -0.25, size = 5)+
Company_XXX_theme+
scale_fill_manual(values = c(Company_XXX_maroon, Company_XXX_yellow, Company_XXX_blue))+
labs(title = "Profit ~ Campaign",
x = "Campaign", y="Regression Estimate",color = "Campaign")+
ylim(-30,13)
graph
1.8 Does the type of campaign predict number of visits?
model2 <- lm(Visits ~ Campaign, data = mc_df)
summary(model2)
Call:
lm(formula = Visits ~ Campaign, data = mc_df)
Residuals:
Min 1Q Median 3Q Max
-293.667 -33.933 0.233 20.067 292.333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 320.67 19.14 16.752 < 0.0000000000000002 ***
CampaignBartledan -147.73 27.07 -5.457 0.0000004493 ***
CampaignCottington -169.90 27.07 -6.276 0.0000000131 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 104.8 on 87 degrees of freedom
Multiple R-squared: 0.3486, Adjusted R-squared: 0.3336
F-statistic: 23.28 on 2 and 87 DF, p-value: 0.000000007978
## Generating a tidy table
model2_tidy <- broom::tidy(model2)
basevalues <- c("CampaignAldebaran",0, 5.204708, 0, 0)
model2_tidy <- rbind(model2_tidy,basevalues)
model2_tidy$term <- gsub("Campaign","",model2_tidy$term)
model2_tidy <- model2_tidy%>% filter(term !="(Intercept)")
model2_tidy <- model2_tidy%>% mutate(estimate = round(as.numeric(estimate),1))
model2_tidy <- model2_tidy%>% rename(Campaign = term)
#model2_tidy$estimate <- round(model2_tidy$estimate)
## Generate the plot
graph <- ggplot(data = model2_tidy,
mapping = aes(x = Campaign, y = estimate, fill = Campaign))+
geom_bar(stat = "identity")+
geom_text(aes(label = estimate),vjust = -0.25, size = 5)+
Company_XXX_theme+
scale_fill_manual(values = c(Company_XXX_maroon, Company_XXX_yellow, Company_XXX_blue))+
labs(title = "Number of Visits ~ Campaign",
x = "Campaign", y="Regression Estimate",color = "Campaign")
graph
Task 2: Session data
Test to see if there are any connections between the booking data and any other given information.
2.1 Create additional variables
duration: the length of time taken on the session
start_hour: the hour when the session started.
time_of_day: the time of day i.e Early Morning, Morning, Afternoon, Evening
## Session duration
sessions_df <- sessions_df%>%
mutate(duration = difftime(session_end_text, session_start_text, units = "secs",tz = "EAT"),
duration = ifelse(duration <0, (24*60*60)+duration, duration))
## Start hour
sessions_df <- sessions_df%>%
mutate(start_hour = hour(session_start_text))
## hour_of_day
sessions_df <- sessions_df %>%
mutate(start_hour = as.numeric(start_hour)) %>%
mutate(time_of_day = ifelse(start_hour >=0 & start_hour <=5,"Early Morning",
ifelse(start_hour >=6 & start_hour <=11,"Morning",
ifelse(start_hour >=12 & start_hour <=18,"Afternoon",
ifelse(start_hour >=19 & start_hour <=23,"Evening","")))))
2.2 Is there a difference in means of booking, between the different times of day?
H0: The mean of the booking variable, for all the different times = 0
Ha: At least one of the means is not 0
The P-value is very large (>0.05) meaning that the means are not really different from each other, and that this variable is not predictive of the instance of booking.
## Generate anova results
anova_test <- aov(booking ~ time_of_day, data = sessions_df)
summary(anova_test)
Df Sum Sq Mean Sq F value Pr(>F)
time_of_day 3 0.1 0.03623 0.415 0.742
Residuals 9996 873.4 0.08737
TukeyHSD(anova_test)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = booking ~ time_of_day, data = sessions_df)
$time_of_day
diff lwr upr p adj
Early Morning-Afternoon 0.001126106 -0.01967051 0.02192273 0.9990396
Evening-Afternoon 0.007688625 -0.01407098 0.02944823 0.8006404
Morning-Afternoon 0.006476426 -0.01419107 0.02714393 0.8520366
Evening-Early Morning 0.006562519 -0.01597362 0.02909866 0.8774553
Morning-Early Morning 0.005350320 -0.01613322 0.02683386 0.9190753
Morning-Evening -0.001212199 -0.02362924 0.02120484 0.9990434
2.3 What is the correlation between the continuous variables?
There is no correlation between the ‘booking variable’ and any othe variables. Meaning none of the variables can predict booking.
## Generate the correlation matrix
corr_mat <- cor(sessions_df %>% select(booking, clickouts, duration, start_hour))
corr_mat
booking clickouts duration start_hour
booking 1.0000000000 -0.049811677 0.01044032 0.0009995598
clickouts -0.0498116772 1.000000000 0.03979617 -0.0077694714
duration 0.0104403163 0.039796170 1.00000000 0.0154922952
start_hour 0.0009995598 -0.007769471 0.01549230 1.0000000000
## Generate the p-values of this correlation matrix
p.mat <- cor_pmat(corr_mat)
p.mat
booking clickouts duration start_hour
booking 0.0000000 0.5628733 0.6811493 0.6836681
clickouts 0.5628733 0.0000000 0.7530925 0.6518106
duration 0.6811493 0.7530925 0.0000000 0.6746880
start_hour 0.6836681 0.6518106 0.6746880 0.0000000
## Generate the correlation plot.
ggcorrplot(corr_mat,
outline.col = "white",lab = TRUE,
ggtheme = Company_XXX_theme,
colors = c(Company_XXX_maroon, Company_XXX_yellow, Company_XXX_blue),
title = "Correlation Plot")