
Advanced web scraping techniques in R – or how to obtain data from the web

Web scraping has become one of a data scientist's essential skills. In this short tutorial I will teach you some awesome web scraping techniques that you can use for your academic research.

My R-Code can be accessed here.
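To give you a flavour of the core idea, here is a minimal sketch using the rvest package; the URL and the extracted elements are only illustrative placeholders that you would swap for your own target page:

# minimal web scraping sketch with rvest (URL and selectors are illustrative)
require(rvest)

# parse the page once and reuse the document
url <- "https://en.wikipedia.org/wiki/Swiss_Market_Index"
page <- read_html(url)

# extract all HTML tables on the page as data frames
tables <- html_table(page, fill = TRUE)
head(tables[[1]])

# extract text elements via a CSS selector
headings <- html_text(html_nodes(page, "h2"))
headings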

Leave a like if this short tutorial helped you extract the desired data.


Financial markets in Switzerland: A network analysis

The term network is often associated with social networks, the Internet or the human brain. More recently, however, network analysis has become an interdisciplinary field of research and can also be applied to model the interdependence of financial markets. In this short blog post, I show you how you can implement your own network in R.

As always, you first need some data to work with. I provide a dataset from Eikon containing total return prices of listed stocks in Switzerland (you find the data here: SIX-Total Return Prices).

Step 1: Load the data and calculate returns – this is easily done in R.

# clean global environment
rm(list = ls())

# packages
require(networkD3)
require(igraph)
require(visNetwork)

# set working directory and load the data
setwd("C:/Users/lliebi/Dropbox/Daten Luca/PhD/Research/Network Analysis/Data")
prices <- read.csv("SwitzerlandStocks.csv", sep = ";")

# clean the data: the first column holds the dates, convert it to Date format
prices$Name <- as.Date(as.character(prices$Name), format = "%d.%m.%Y")

# function to calculate simple returns from a price series
return.calculation <- function(price){
  returns <- rep(NA, length(price))
  for(i in 2:length(price)){
    returns[i] <- (price[i] - price[i-1]) / price[i-1]
  }
  return(returns)
}

# create a new data frame with the returns of all stocks
returns <- as.data.frame(apply(prices[, 2:ncol(prices)], 2, return.calculation))
returns <- cbind(prices$Name, returns)
colnames(returns)[1] <- "Date"

# drop the first row (no return exists for the first observation)
returns <- returns[-1, ]

# keep only stocks with complete observations
# (delete columns that contain missing values)
final <- returns[colSums(!is.na(returns)) == nrow(returns)]

Step 2: Now you can start with your network analysis

There are several methodologies to model the interdependence of stocks (e.g. Granger causality, spillover tables, correlations, …).
I use a simple and intuitive measure introduced by Diebold and Yilmaz. Conveniently, there is a helpful R package so that you don't have to calculate the spillover table yourself.
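In a nutshell (my notation, sketching the Diebold-Yilmaz idea): fit a VAR model, compute the H-step forecast error variance decomposition, and let $\tilde{\theta}_{ij}(H)$ denote the share of stock $i$'s forecast error variance that is due to shocks in stock $j$. The total spillover index then aggregates all cross-variance shares:

$$ S(H) = 100 \cdot \frac{\sum_{i \neq j} \tilde{\theta}_{ij}(H)}{\sum_{i,j} \tilde{\theta}_{ij}(H)} $$

The off-diagonal entries of the spillover table computed below are exactly these pairwise shares (in percent).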

require(frequencyConnectedness)
require(vars)     # provides VARselect() and VAR()
require(stringr)

number.stocks <- 50
# shorten the column names to the first word of the ticker
colnames(final)[2:number.stocks] <- word(colnames(final)[2:number.stocks], 1, sep = "\\.")

# Step 1: Find the correct VAR model
VARselect(final[, 2:number.stocks], lag.max = 2, type = "const")
# the lowest AIC information criterion is found at lag 2,
# therefore specify a VAR(2) model

# Step 2: Implement a VAR model for all the stocks in the sample
# (columns 2:number.stocks skip the Date column)
var.lag2 <- VAR(final[, 2:number.stocks], p = 2, type = "const")
# with this model you can also predict 10-days-ahead returns
var.f10 <- predict(var.lag2, n.ahead = 10, ci = 0.95)

# Step 3: Calculate the spillovers
# here use the function in the frequencyConnectedness package
spillover <- spilloverDY09(var.lag2, n.ahead = 10, no.corr = FALSE)
# get the spillover table (in percent)
solution <- as.data.frame(spillover$tables) * 100

Step 3: Get the net spillovers and visualize the network


# get net pairwise spillovers: spillover from j to i minus spillover from i to j
# (row and column names are preserved by the subtraction)
net.spillovers <- as.matrix(solution) - t(as.matrix(solution))

# keep only positive net spillovers, so each pair yields one directed edge
net.spillovers <- ifelse(net.spillovers > 0, net.spillovers, 0)

# Step 4: Create your own network from the (transposed) net spillover matrix
m <- t(net.spillovers)
net <- graph.adjacency(m, mode = "directed", weighted = TRUE, diag = FALSE)
set.seed(1)
plot.igraph(net,
            vertex.label.color = "black",
            edge.color = "darkgrey",
            edge.arrow.size = 0.2,
            layout = layout.fruchterman.reingold,
            edge.curved = FALSE,
            edge.lty = 1,
            frame = FALSE,
            vertex.size = 5,
            vertex.color = rainbow(vcount(net)),  # one colour per vertex
            vertex.label.dist = 0.0)

# number of incoming and outgoing edges per stock
degree(net, mode = "in")
degree(net, mode = "out")


If you wish to do another visualization, you can use the "network" package together with ggnet2() from the GGally package.

require(network)
require(GGally)   # provides ggnet2()

# build a network object from the igraph edge list
links <- as.data.frame(get.edgelist(net))
net2 <- network(links, directed = TRUE)

# network plot
ggnet2(net2, alpha = 0.75, size = 4, edge.alpha = 0.5, color = "black",
       label = TRUE, label.size = 1.5, label.color = "darkgrey")


A 3D visualization can be found here: Network Analysis.
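As an aside, the networkD3 package loaded at the beginning is one way to build such an interactive visualization directly in R; a minimal sketch, assuming the igraph object net from above (whether the linked visualization was built exactly this way is my assumption):

# minimal interactive network sketch with networkD3 (uses the net object from above)
require(networkD3)
links3d <- as.data.frame(get.edgelist(net))
simpleNetwork(links3d, Source = 1, Target = 2, zoom = TRUE)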


Option valuation using Black-Scholes

Financial options have an intrinsic value and a time value. The intrinsic value of a call option is simply the spot price (S) minus the strike price (X), floored at zero. The price of the option can be derived using the Black-Scholes formula; the option price minus the intrinsic value then gives the time value of the option.
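For reference, the Black-Scholes price of a European call with spot $S$, strike $X$, risk-free rate $r$, volatility $\sigma$ and time to maturity $T$ is

$$ C = S\,N(d_1) - X e^{-rT} N(d_2), \qquad d_1 = \frac{\ln(S/X) + (r + \sigma^2/2)\,T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}, $$

where $N(\cdot)$ is the standard normal cumulative distribution function.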

The following graph illustrates the intrinsic value (red line), the price of the option (grey line) and the time value of the option (dark grey area).

Using the following code you can replicate the figure:

spot <- seq(1, 100, by = 1)
strike <- 50
riskfree <- 0
time <- 1
standarddev <- 0.2

# Black-Scholes d1 and d2
d1 <- (log(spot/strike) + (riskfree + standarddev^2/2)*time) / (standarddev*sqrt(time))
d2 <- d1 - standarddev*sqrt(time)

# Black-Scholes price of the call
value.call <- pnorm(d1)*spot - pnorm(d2)*strike*exp(-riskfree*time)

# intrinsic value: max(S - X, 0)
inner.value <- pmax(spot - strike, 0)

require(ggplot2)

# draw the ribbon first so both lines stay visible on top of it
ggplot() +
  geom_ribbon(aes(spot, ymin = inner.value, ymax = value.call), fill = "darkgrey") +
  geom_line(aes(spot, value.call)) +
  geom_line(aes(spot, inner.value), colour = "red")


Efficient frontier in Finance

In every finance class, one of the first topics students are confronted with is the efficient frontier. The intuition behind the efficient frontier is easy to grasp, and I show you that it is also very easy to derive – even with multiple assets.
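The code below rests on the standard two-asset portfolio formulas: with weights $\omega_x + \omega_y = 1$, the portfolio's expected return and variance are

$$ E[r_p] = \omega_x E[r_x] + \omega_y E[r_y], \qquad \sigma_p^2 = \omega_x^2 \sigma_x^2 + \omega_y^2 \sigma_y^2 + 2\,\omega_x \omega_y\, \sigma_{xy}, $$

and the three-asset case simply adds the corresponding variance and covariance terms.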

Using the following code you can replicate the image:


require(PerformanceAnalytics)
require(quantmod)

stocks <- c("TSLA", "AAPL", "FB")
getSymbols(stocks)

x <- dailyReturn(TSLA)
y <- dailyReturn(AAPL)
z <- dailyReturn(FB)

# covariance matrix of the three return series over a subsample
g <- as.data.frame(cbind(x, y, z))
a <- as.data.frame(cov(g[2000:2620, ]))


library(data.table)
library(scales)
library(ggplot2)

link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/fin_data.csv"
dt <- data.table(read.csv(link))
dt[, date := as.Date(date)]

# create indexed values
dt[, idx_price := price/price[1], by = ticker]

# plot the indexed values
ggplot(dt, aes(x = date, y = idx_price, color = ticker)) +
  geom_line() +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Price Developments") +
  xlab("Date") + ylab("Price\n(Indexed 2000 = 1)") +
  scale_color_discrete(name = "Company")


# calculate the arithmetic returns
dt[, ret := price / shift(price, 1) - 1, by = ticker]

# summary table: take only non-NA values
tab <- dt[!is.na(ret), .(ticker, ret)]

# calculate the expected returns (historical mean of returns)
# and the volatility (standard deviation of returns)
tab <- tab[, .(er = round(mean(ret), 4),
               sd = round(sd(ret), 4)),
           by = "ticker"]

ggplot(tab, aes(x = sd, y = er, color = ticker)) +
  geom_point(size = 5) +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Risk-Return Tradeoff") +
  xlab("Volatility") + ylab("Expected Returns") +
  scale_y_continuous(label = percent, limits = c(0, 0.03)) +
  scale_x_continuous(label = percent, limits = c(0, 0.1))

##### Two assets #####################

link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))

# calculate the necessary values:
# I) expected returns for the two assets
er_x <- mean(df$x)
er_y <- mean(df$y)

# II) risk (standard deviation) as a risk measure
sd_x <- sd(df$x)
sd_y <- sd(df$y)

# III) covariance
cov_xy <- cov(df$x, df$y)

# create 1000 portfolio weights (omegas)
x_weights <- seq(from = 0, to = 1, length.out = 1000)

# create a data.table that contains the weights for the two assets
two_assets <- data.table(wx = x_weights,
                         wy = 1 - x_weights)

# calculate the expected returns and standard deviations for the 1000 possible portfolios
two_assets[, ':=' (er_p = wx * er_x + wy * er_y,
                   sd_p = sqrt(wx^2 * sd_x^2 +
                               wy^2 * sd_y^2 +
                               2 * wx * wy * cov_xy))]

# lastly plot the values
ggplot() +
  geom_point(data = two_assets, aes(x = sd_p, y = er_p, color = wx)) +
  geom_point(data = data.table(sd = c(sd_x, sd_y), mean = c(er_x, er_y)),
             aes(x = sd, y = mean), color = "red", size = 3, shape = 18) +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Possible Portfolios with Two Risky Assets") +
  xlab("Volatility") + ylab("Expected Returns") +
  scale_y_continuous(label = percent, limits = c(0, max(two_assets$er_p) * 1.2)) +
  scale_x_continuous(label = percent, limits = c(0, max(two_assets$sd_p) * 1.2)) +
  scale_color_continuous(name = expression(omega[x]), labels = percent)

##### Three assets ###################

# load the data
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))

# calculate the necessary values:
# I) expected returns for the three assets
er_x <- mean(df$x)
er_y <- mean(df$y)
er_z <- mean(df$z)

# II) risk (standard deviation) as a risk measure
sd_x <- sd(df$x)
sd_y <- sd(df$y)
sd_z <- sd(df$z)

# III) covariances
cov_xy <- cov(df$x, df$y)
cov_xz <- cov(df$x, df$z)
cov_yz <- cov(df$y, df$z)

# create portfolio weights (omegas)
x_weights <- seq(from = 0, to = 1, length.out = 1000)

# create a data.table that contains all combinations of weights for the three assets
three_assets <- data.table(wx = rep(x_weights, each = length(x_weights)),
                           wy = rep(x_weights, length(x_weights)))
three_assets[, wz := 1 - wx - wy]

# calculate the expected returns and standard deviations for all weight combinations
three_assets[, ':=' (er_p = wx * er_x + wy * er_y + wz * er_z,
                     sd_p = sqrt(wx^2 * sd_x^2 +
                                 wy^2 * sd_y^2 +
                                 wz^2 * sd_z^2 +
                                 2 * wx * wy * cov_xy +
                                 2 * wx * wz * cov_xz +
                                 2 * wy * wz * cov_yz))]

# take out cases where we have negative weights (short selling)
three_assets <- three_assets[wx >= 0 & wy >= 0 & wz >= 0]
three_assets

# lastly plot the values
ggplot() +
  geom_point(data = three_assets, aes(x = sd_p, y = er_p, color = wx - wz)) +
  geom_point(data = data.table(sd = c(sd_x, sd_y, sd_z), mean = c(er_x, er_y, er_z)),
             aes(x = sd, y = mean), color = "red", size = 3, shape = 18) +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Possible Portfolios with Three Risky Assets") +
  xlab("Volatility") + ylab("Expected Returns") +
  scale_y_continuous(label = percent, limits = c(0, max(three_assets$er_p) * 1.2)) +
  scale_x_continuous(label = percent, limits = c(0, max(three_assets$sd_p) * 1.2)) +
  scale_color_gradientn(colors = c("red", "blue", "yellow"),
                        name = expression(omega[x] - omega[z]), labels = percent)
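Strictly speaking, these plots show the feasible set of portfolios; the efficient frontier is its upper-left boundary. As a small sketch (assuming the two_assets table from above), you can extract it by keeping only portfolios that earn at least the return of the minimum-variance portfolio:

# sketch: extract the efficient frontier from the two-asset feasible set
mvp <- two_assets[sd_p == min(sd_p)]       # minimum-variance portfolio
frontier <- two_assets[er_p >= mvp$er_p]   # upper branch = efficient frontier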


Create your own Batman logo in R

This was a fun project to replicate the Batman logo using a mathematical function (the Batman curve).

Using the following code you can replicate the image:

require(ggplot2)
require(dplyr)

f1 <- function(x) {
  y1 <- 3*sqrt(1-(x/7)^2)
  y2 <- -3*sqrt(1-(x/7)^2)
  y <- c(y1, y2)
  d <- data.frame(x = x, y = y)
  d <- d[d$y > -3*sqrt(33)/7, ]
  return(d)
}

x1 <- c(seq(3, 7, 0.001), seq(-7, -3, 0.001))
d1 <- f1(x1)
p1 <- ggplot(d1,aes(x,y)) + geom_point(color="green")

x2 <- seq(-4,4, 0.001)
y2 <- abs(x2/2)-(3*sqrt(33)-7)*x2^2/112-3 + sqrt(1-(abs(abs(x2)-2)-1)^2)

# only works with ggplot2 <= 0.8.9:
#p2 <- p1 + geom_point(aes(x=x2,y=y2), color="yellow")

# from ggplot2 0.9.0 onwards, it should be:
d2 <- data.frame(x2=x2, y2=y2)
p2 <- p1 + geom_point(data=d2, aes(x=x2,y=y2), color="green")


x3 <- c(seq(0.75,1,0.001), seq(-1,-0.75,0.001))
y3 <- 9-8*abs(x3)
#p3 <- p2+geom_point(aes(x=x3,y=y3), color="green")
d3 <- data.frame(x3=x3, y3=y3)
p3 <- p2+geom_point(data=d3, aes(x=x3,y=y3), color="green")


x4 <- c(seq(0.5,0.75,0.001), seq(-0.75,-0.5,0.001))
y4 <- 3*abs(x4)+0.75
#p4 <- p3+geom_point(aes(x=x4,y=y4), color="steelblue")
d4 <- data.frame(x4=x4,y4=y4)
p4 <- p3+geom_line(data=d4, aes(x=x4,y=y4), color="green")


x5 <- seq(-0.5,0.5,0.001)
y5 <- rep(2.25,length(x5))
#p5 <- p4+geom_point(aes(x=x5,y=y5))
d5 <- data.frame(x5=x5,y5=y5)
p5 <- p4+geom_line(data=d5, aes(x=x5,y=y5),color="green")

x6 <- c(seq(-3,-1,0.001), seq(1,3,0.001))
y6 <- 6 * sqrt(10)/7 +
(1.5 - 0.5 * abs(x6)) * sqrt(abs(abs(x6)-1)/(abs(x6)-1)) -
6 * sqrt(10) * sqrt(4-(abs(x6)-1)^2)/14
#p6 <- p5+geom_point(aes(x=x6,y=y6), colour="blue")
d6 <- data.frame(x6=x6,y6=y6)
p6 <- p5+geom_line(data=d6,aes(x=x6,y=y6), colour="green")

p <- p6

theme_black <- function(base_size = 16, base_family = ""){
  theme_minimal() %+replace%
    theme(line = element_line(colour = "black", size = 0.5, linetype = 1,
                              lineend = "butt"),
          rect = element_rect(fill = "black",
                              colour = "black", size = 0.5, linetype = 1),
          plot.background = element_rect(colour = "black", fill = "black"),
          plot.title = element_text(size = rel(1.2)),
          panel.border = element_rect(fill = NA, colour = "black"),
          axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          axis.title.y = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank()
    )
}


p+theme_black()


Heteroskedasticity-robust standard errors in R

Writing a financial paper often involves an OLS regression model. One major issue can be heteroskedasticity: the variance of the error terms is not constant. I show you how you can detect heteroskedasticity and how to implement robust standard errors in R.

Step 1: Implement a regression model in R

model1 <- lm(dist ~ speed, data = cars)  # initial model using the built-in cars dataset

Step 2: Detect heteroskedasticity using the Breusch-Pagan test. When the p-value is below the 10% significance level, we can reject the null hypothesis that the variance of the residuals is constant.

require(lmtest)    # provides bptest() and coeftest()
require(sandwich)  # provides the NeweyWest() covariance estimator

lmtest::bptest(model1)

	studentized Breusch-Pagan test

data:  model1
BP = 3.2149, df = 1, p-value = 0.07297

The p-value of 0.07297 is below the critical value of 10%, so we reject the null hypothesis of homoskedasticity.

Step 3: Use the Newey-West (HAC) estimator to correct the standard errors

model2 <- lm(dist ~ speed, data = cars)
# replace the coefficient table with one based on Newey-West standard errors
model2$coefficients <- unclass(coeftest(model2, vcov. = NeweyWest))
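Note that Newey-West is a HAC estimator, i.e. it is robust to both heteroskedasticity and autocorrelation. If you only want heteroskedasticity-robust (White) standard errors, a minimal sketch using vcovHC() from the same sandwich package:

# White heteroskedasticity-robust standard errors (HC1 variant)
coeftest(model1, vcov. = vcovHC(model1, type = "HC1"))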

Step 4: Compare the results

summary(model1)

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-29.069  -9.525  -2.272   9.215  43.201

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791     6.7584  -2.601   0.0123 *
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared:  0.6511,  Adjusted R-squared:  0.6438
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

model2

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
              Estimate  Std. Error     t value  Pr(>|t|)
(Intercept) -1.758e+01   7.018e+00  -2.505e+00 1.570e-02
speed        3.932e+00   5.509e-01   7.138e+00 4.526e-09

Notice that the estimated coefficients are the same, but the standard errors of the two models differ – and this affects the statistical significance of the results.