
## Advanced web scraping techniques in R – or how to obtain data from the web

Web scraping has become one of the data scientist's essential skills. In this short tutorial I will show you some useful web scraping techniques that you can use for your academic research.
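As a small taste of what the linked code covers, here is a minimal, self-contained sketch using the rvest package. The HTML snippet and the CSS selectors are my own illustration, not the tutorial's actual target page; in practice you would pass a URL to `read_html()`.

```r
library(rvest)

# an inline HTML snippet keeps the example self-contained;
# in practice: page <- read_html("https://example.com/some-page")
page <- read_html('
  <html><body>
    <table id="prices">
      <tr><th>Ticker</th><th>Price</th></tr>
      <tr><td>NESN</td><td>105.3</td></tr>
      <tr><td>ROG</td><td>271.8</td></tr>
    </table>
  </body></html>')

# CSS selectors pick out exactly the nodes you need
tickers <- page |> html_elements("td:first-child") |> html_text2()

# html_table() converts an HTML table straight into a data frame
prices <- page |> html_element("#prices") |> html_table()
```

The same two steps (select nodes, then extract text or tables) cover most scraping tasks.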

My R code can be accessed here.

Leave a like if this short tutorial helped you extract the desired data.


## Financial markets in Switzerland: A network analysis

The term network is often associated with social networks, the Internet, or the human brain. More recently, however, network analysis has become an interdisciplinary field of research and can also be applied to model the interdependence of financial markets. In this short blog post, I show you how to implement your own network in R.

As always, you first need some data to work with. I provide data from Eikon on all listed stocks in Switzerland (you can find the data here: SIX-Total Return Prices). The data contains total return prices of listed Swiss stocks.

Step 1: Load the data and calculate returns – this can be easily done in R.

```r
# clean global environment
rm(list = ls())

# packages
require(networkD3)
require(igraph)
library(visNetwork)

# set working directory and load the data
setwd("C:/Users/lliebi/Dropbox/Daten Luca/PhD/Research/Network Analysis/Data")
prices <- read.csv("SwitzerlandStocks.csv", sep = ";")

# clean the data
prices$Name <- as.Date(as.character(prices$Name), format = "%d.%m.%Y")

# function to calculate simple returns from a price series
return.calculation <- function(price){
  returns <- rep(NA, length(price))
  for(i in 2:length(price)){
    returns[i] <- (price[i] - price[i-1]) / price[i-1]
  }
  return(returns)
}

# create a new data frame with all the returns
returns <- as.data.frame(apply(prices[, 2:ncol(prices)], 2, return.calculation))
returns <- cbind(prices$Name, returns)
colnames(returns)[1] <- "Date"

# only stocks with complete observations
returns <- returns[-1, ]

# drop columns that contain missing values
final <- returns[colSums(!is.na(returns)) == nrow(returns)]
```

There are several methodologies to model the interdependence of stocks (e.g. Granger causality, spillover tables, correlations, …).
I use a simple and intuitive measure introduced by Diebold and Yilmaz. Conveniently, an R package is available so that you do not have to calculate the spillover table yourself.

```r
require(frequencyConnectedness)
library(stringr)

number.stocks <- 50
colnames(final)[2:number.stocks] <- word(colnames(final)[2:number.stocks], 1, sep = "\\.")

# Step 1: Find the correct VAR model
VARselect(final[, 2:number.stocks], lag.max = 2, type = "const")
# the lowest AIC information criterion is found at lag 2,
# therefore specify a VAR(2) model

# Step 2: Implement a VAR model for all the stocks in the sample
# (column 1, the date, is excluded)
var.lag2 <- VAR(final[, 2:number.stocks], p = 2, type = "const")

# with this model you can also predict returns 10 days ahead
var.f10 <- predict(var.lag2, n.ahead = 10, ci = 0.95)

# Step 3: Calculate the spillovers
# using the function in the frequencyConnectedness package
spillover <- spilloverDY09(var.lag2, n.ahead = 10, no.corr = F)

# get the spillover table
solution <- as.data.frame(spillover$tables) * 100
```

Step 3: Calculate the net spillovers and visualize the network

```r
# get net spillovers
net.spillovers <- matrix(NA, nrow = number.stocks - 1, ncol = number.stocks - 1)
colnames(net.spillovers) <- colnames(solution)
rownames(net.spillovers) <- rownames(solution)
net.spillovers[lower.tri(net.spillovers)] <- solution[lower.tri(solution)] - solution[upper.tri(solution)]
net.spillovers[upper.tri(net.spillovers)] <- solution[upper.tri(solution)] - solution[lower.tri(solution)]
net.spillovers <- ifelse(net.spillovers > 0, net.spillovers, 0)

# Step 4: Create your own network
m <- t(net.spillovers)
net <- graph.adjacency(m, mode = "directed", weighted = TRUE, diag = F)
set.seed(1)
plot.igraph(net,
            vertex.label.color = "black",
            edge.color = "darkgrey",
            edge.arrow.size = 0.2,
            layout = layout.fruchterman.reingold,
            edge.curved = F,
            edge.lty = 1,
            frame = F,
            vertex.size = 5,
            vertex.color = rainbow(number.stocks),
            vertex.label.dist = 0.0)
degree(net, mode = "in")
degree(net, mode = "out")
```

If you wish to create another visualization, you can use the "network" package together with `ggnet2()` from the GGally package.

```r
require(network)
require(GGally)  # provides ggnet2()

links <- as.data.frame(get.edgelist(net))
net <- network(links, directed = TRUE)

# network plot
ggnet2(net, alpha = 0.75, size = 4, edge.alpha = 0.5,
       color = "black", label = T, label.size = 1.5,
       label.color = "darkgrey")
```

A 3D visualization can be found here: Network Analysis.
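An interactive visualization along those lines can be built with the networkD3 package, which is already loaded at the top of this post. The edge list below is a placeholder for illustration; in this post you would pass `as.data.frame(get.edgelist(net))` from the spillover network instead.

```r
library(networkD3)

# placeholder edge list; replace with the edges of your spillover network
edges <- data.frame(from = c("NESN", "NESN", "ROG"),
                    to   = c("ROG",  "UBSG", "UBSG"))

# simpleNetwork() builds an interactive D3 force-directed graph
widget <- simpleNetwork(edges, Source = "from", Target = "to",
                        zoom = TRUE, fontSize = 12)
widget  # renders in the RStudio viewer or a browser
```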


## Option valuation using Black-Scholes

Financial options have an intrinsic value and a time value. The intrinsic value of a call option is the spot price (S) minus the strike price (X), floored at zero. The price of the option can be derived using the Black-Scholes formula; subtracting the intrinsic value from this price yields the time value of the option.

The following graph illustrates the intrinsic value (red line), the price of the option (grey line) and the time value of the option (dark grey area).

Using the following code you can replicate the figure:

```r
require(ggplot2)

spot <- seq(1, 100, by = 1)
strike <- 50
riskfree <- 0
time <- 1
standarddev <- 0.2

# Black-Scholes d1 and d2 (note the sqrt(time) in both terms)
d1 <- (log(spot / strike) + (riskfree + standarddev^2 / 2) * time) / (standarddev * sqrt(time))
d2 <- d1 - standarddev * sqrt(time)

# price of the call option
value.call <- pnorm(d1, 0, 1) * spot - pnorm(d2, 0, 1) * strike * exp(-riskfree * time)

# intrinsic (inner) value, floored at zero
inner.value <- pmax(spot - strike, 0)

ggplot() +
  geom_line(aes(spot, value.call)) +
  geom_line(aes(spot, inner.value), colour = "red") +
  geom_ribbon(aes(spot, ymin = inner.value, ymax = value.call), fill = "darkgrey")
```
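To make the price = intrinsic value + time value decomposition explicit for a single spot price, the formula can be wrapped in a small function. This is my own sketch, not part of the original post; the parameter values are arbitrary.

```r
# Black-Scholes price of a European call
bs_call <- function(spot, strike, riskfree, time, sigma) {
  d1 <- (log(spot / strike) + (riskfree + sigma^2 / 2) * time) / (sigma * sqrt(time))
  d2 <- d1 - sigma * sqrt(time)
  pnorm(d1) * spot - pnorm(d2) * strike * exp(-riskfree * time)
}

price      <- bs_call(spot = 55, strike = 50, riskfree = 0, time = 1, sigma = 0.2)
intrinsic  <- max(55 - 50, 0)
time_value <- price - intrinsic  # the part of the price above the red line
```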


## Efficient frontier in Finance

In every finance class, one of the first topics students encounter is the efficient frontier. Even though the intuition behind it is easy to grasp, I show you that deriving the efficient frontier is also straightforward – even with multiple assets.

Using the following code you can replicate the image:

```r
require(PerformanceAnalytics)
require(quantmod)
library(data.table)
library(scales)
library(ggplot2)

# daily returns and covariance for three example stocks
stocks <- c("TSLA", "AAPL", "FB")
getSymbols(stocks)
x <- dailyReturn(TSLA)
y <- dailyReturn(AAPL)
z <- dailyReturn(FB)
g <- as.data.frame(cbind(x, y, z))
a <- as.data.frame(cov(g[2000:2620, ]))

link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/fin_data.csv"
dt <- data.table(read.csv(link))
dt[, date := as.Date(date)]

# create indexed values
dt[, idx_price := price / price[1], by = ticker]

# plot the indexed values
ggplot(dt, aes(x = date, y = idx_price, color = ticker)) +
  geom_line() +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Price Developments") +
  xlab("Date") + ylab("Price\n(Indexed 2000 = 1)") +
  scale_color_discrete(name = "Company")

# calculate the arithmetic returns
dt[, ret := price / shift(price, 1) - 1, by = ticker]

# summary table: take only non-NA values
tab <- dt[!is.na(ret), .(ticker, ret)]

# expected returns (historical mean of returns) and
# volatility (standard deviation of returns)
tab <- tab[, .(er = round(mean(ret), 4),
               sd = round(sd(ret), 4)),
           by = "ticker"]

ggplot(tab, aes(x = sd, y = er, color = ticker)) +
  geom_point(size = 5) +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Risk-Return Tradeoff") +
  xlab("Volatility") + ylab("Expected Returns") +
  scale_y_continuous(label = percent, limits = c(0, 0.03)) +
  scale_x_continuous(label = percent, limits = c(0, 0.1))

##### Two assets #####
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))

# calculate the necessary values:
# I) expected returns for the two assets
er_x <- mean(df$x)
er_y <- mean(df$y)
# II) risk (standard deviation) as a risk measure
sd_x <- sd(df$x)
sd_y <- sd(df$y)
# III) covariance
cov_xy <- cov(df$x, df$y)

# create 1000 portfolio weights (omegas)
x_weights <- seq(from = 0, to = 1, length.out = 1000)
two_assets <- data.table(wx = x_weights,
                         wy = 1 - x_weights)

# expected returns and standard deviations for the 1000 possible portfolios
two_assets[, ':=' (er_p = wx * er_x + wy * er_y,
                   sd_p = sqrt(wx^2 * sd_x^2 +
                               wy^2 * sd_y^2 +
                               2 * wx * (1 - wx) * cov_xy))]

# lastly plot the values
ggplot() +
  geom_point(data = two_assets, aes(x = sd_p, y = er_p, color = wx)) +
  geom_point(data = data.table(sd = c(sd_x, sd_y), mean = c(er_x, er_y)),
             aes(x = sd, y = mean), color = "red", size = 3, shape = 18) +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Possible Portfolios with Two Risky Assets") +
  xlab("Volatility") + ylab("Expected Returns") +
  scale_y_continuous(label = percent, limits = c(0, max(two_assets$er_p) * 1.2)) +
  scale_x_continuous(label = percent, limits = c(0, max(two_assets$sd_p) * 1.2)) +
  scale_color_continuous(name = expression(omega[x]), labels = percent)

##### Three assets #####
df <- data.table(read.csv(link))

# calculate the necessary values:
# I) expected returns for the three assets
er_x <- mean(df$x)
er_y <- mean(df$y)
er_z <- mean(df$z)
# II) risk (standard deviation) as a risk measure
sd_x <- sd(df$x)
sd_y <- sd(df$y)
sd_z <- sd(df$z)
# III) covariances
cov_xy <- cov(df$x, df$y)
cov_xz <- cov(df$x, df$z)
cov_yz <- cov(df$y, df$z)

# create portfolio weights (omegas)
x_weights <- seq(from = 0, to = 1, length.out = 1000)
three_assets <- data.table(wx = rep(x_weights, each = length(x_weights)),
                           wy = rep(x_weights, length(x_weights)))
three_assets[, wz := 1 - wx - wy]

# expected returns and standard deviations for the possible portfolios
three_assets[, ':=' (er_p = wx * er_x + wy * er_y + wz * er_z,
                     sd_p = sqrt(wx^2 * sd_x^2 +
                                 wy^2 * sd_y^2 +
                                 wz^2 * sd_z^2 +
                                 2 * wx * wy * cov_xy +
                                 2 * wx * wz * cov_xz +
                                 2 * wy * wz * cov_yz))]

# take out cases where we have negative weights (short selling)
three_assets <- three_assets[wx >= 0 & wy >= 0 & wz >= 0]
three_assets

# lastly plot the values
ggplot() +
  geom_point(data = three_assets, aes(x = sd_p, y = er_p, color = wx - wz)) +
  geom_point(data = data.table(sd = c(sd_x, sd_y, sd_z), mean = c(er_x, er_y, er_z)),
             aes(x = sd, y = mean), color = "red", size = 3, shape = 18) +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Possible Portfolios with Three Risky Assets") +
  xlab("Volatility") + ylab("Expected Returns") +
  scale_y_continuous(label = percent, limits = c(0, max(three_assets$er_p) * 1.2)) +
  scale_x_continuous(label = percent, limits = c(0, max(three_assets$sd_p) * 1.2)) +
  scale_color_gradientn(colors = c("red", "blue", "yellow"),
                        name = expression(omega[x] - omega[z]), labels = percent)
```


## Replicating the Batman logo in R

This was a fun project to replicate the Batman logo using a mathematical function (the Batman equation).

Using the following code you can replicate the image:

```r
require(ggplot2)
require(dplyr)

# upper and lower arcs of the wings
f1 <- function(x) {
  y1 <- 3 * sqrt(1 - (x/7)^2)
  y2 <- -3 * sqrt(1 - (x/7)^2)
  y <- c(y1, y2)
  d <- data.frame(x = x, y = y)
  d <- d[d$y > -3 * sqrt(33)/7, ]
  return(d)
}
x1 <- c(seq(3, 7, 0.001), seq(-7, -3, 0.001))
d1 <- f1(x1)
p1 <- ggplot(d1, aes(x, y)) + geom_point(color = "green")

# bottom edge of the wings
x2 <- seq(-4, 4, 0.001)
y2 <- abs(x2/2) - (3 * sqrt(33) - 7) * x2^2/112 - 3 + sqrt(1 - (abs(abs(x2) - 2) - 1)^2)
d2 <- data.frame(x2 = x2, y2 = y2)
p2 <- p1 + geom_point(data = d2, aes(x = x2, y = y2), color = "green")

# outer edges of the ears
x3 <- c(seq(0.75, 1, 0.001), seq(-1, -0.75, 0.001))
y3 <- 9 - 8 * abs(x3)
d3 <- data.frame(x3 = x3, y3 = y3)
p3 <- p2 + geom_point(data = d3, aes(x = x3, y = y3), color = "green")

# inner edges of the ears
x4 <- c(seq(0.5, 0.75, 0.001), seq(-0.75, -0.5, 0.001))
y4 <- 3 * abs(x4) + 0.75
d4 <- data.frame(x4 = x4, y4 = y4)
p4 <- p3 + geom_line(data = d4, aes(x = x4, y = y4), color = "green")

# top of the head
x5 <- seq(-0.5, 0.5, 0.001)
y5 <- rep(2.25, length(x5))
d5 <- data.frame(x5 = x5, y5 = y5)
p5 <- p4 + geom_line(data = d5, aes(x = x5, y = y5), color = "green")

# shoulders
x6 <- c(seq(-3, -1, 0.001), seq(1, 3, 0.001))
y6 <- 6 * sqrt(10)/7 +
  (1.5 - 0.5 * abs(x6)) * sqrt(abs(abs(x6) - 1)/(abs(x6) - 1)) -
  6 * sqrt(10) * sqrt(4 - (abs(x6) - 1)^2)/14
d6 <- data.frame(x6 = x6, y6 = y6)
p6 <- p5 + geom_line(data = d6, aes(x = x6, y = y6), colour = "green")
p <- p6

# all-black theme for the plot
theme_black <- function(base_size = 16, base_family = "") {
  theme_minimal() %+replace%
    theme(line = element_line(colour = "black", size = 0.5, linetype = 1, lineend = "butt"),
          rect = element_rect(fill = "black", colour = "black", size = 0.5, linetype = 1),
          plot.background = element_rect(colour = "black", fill = "black"),
          plot.title = element_text(size = rel(1.2)),
          panel.border = element_rect(fill = NA, colour = "black"),
          axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          axis.title.y = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank())
}

p + theme_black()
```


## Heteroskedasticity-robust standard errors in R

Writing a financial paper often involves an OLS regression model. One major issue can be heteroskedasticity: the variance of the error terms varies. I show you how to detect heteroskedasticity and how to implement robust standard errors in R.

Step 1: Implement a regression model in R

`model1 <- lm(dist ~ speed, data=cars) # initial model using the cars dataset`

Step 2: Detect heteroskedasticity using the Breusch-Pagan test. When the p-value is below 10%, we can reject the null hypothesis that the variance of the residuals is constant.

```r
require(lmtest)    # load the package for bptest() and coeftest()
require(sandwich)  # provides the NeweyWest() estimator used below
require(vars)
```

The p-value is below the critical value of 10%.

`lmtest::bptest(model1)`
```
	studentized Breusch-Pagan test

data:  model1
BP = 3.2149, df = 1, p-value = 0.07297
```

Step 3: Use Newey-West to correct the standard errors

```r
model2 <- lm(dist ~ speed, data = cars)
model2$coefficients <- unclass(coeftest(model2, vcov. = NeweyWest))
```

Step 4: Compare the results

`summary(model1)`
```
Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.069  -9.525  -2.272   9.215  43.201 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -17.5791     6.7584  -2.601   0.0123 *  
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared:  0.6511,	Adjusted R-squared:  0.6438 
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
```
`model2`
```
Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
             Estimate    Std. Error  t value     Pr(>|t|)  
(Intercept)  -1.758e+01   7.018e+00  -2.505e+00   1.570e-02
speed         3.932e+00   5.509e-01   7.138e+00   4.526e-09
```

Notice that the estimated coefficients are identical, but the standard errors of the two models differ. This in turn affects the statistical significance of the coefficients.
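Newey-West additionally corrects for autocorrelation. If you only need heteroskedasticity-robust (White) standard errors, the sandwich package also offers `vcovHC()`; here is a sketch using the same `cars` data:

```r
require(lmtest)
require(sandwich)

model1 <- lm(dist ~ speed, data = cars)

# White / HC1 heteroskedasticity-robust covariance matrix
robust <- coeftest(model1, vcov. = vcovHC(model1, type = "HC1"))
robust
```

As with Newey-West, only the standard errors change; the point estimates stay the same.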