::p_load(sf, tmap, kableExtra, tidyverse, sfdep, readxl, plyr, Kendall, plotly) pacman
Take Home Exercise 2
Take-home Exercise 2: Spatio-temporal Analysis of COVID-19 Vaccination Trends at the Sub-district Level, DKI Jakarta
Setting the Scene
Since late December 2019, an outbreak of a novel coronavirus disease (COVID-19; previously known as 2019-nCoV) was reported in Wuhan, China, which had subsequently affected 210 countries worldwide. In general, COVID-19 is an acute resolved disease but it can also be deadly, with a 2% case fatality rate.
The COVID-19 vaccination in Indonesia is an ongoing mass immunisation in response to the COVID-19 pandemic in Indonesia. On 13 January 2021, the program commenced when President Joko Widodo was vaccinated at the presidential palace. In terms of total doses given, Indonesia ranks third in Asia and fifth in the world.
According to wikipedia, as of 5 February 2023 at 18:00 WIB (UTC+7), 204,266,655 people had received the first dose of the vaccine and 175,131,893 people had been fully vaccinated; 69,597,474 of them had been inoculated with the booster or the third dose, while 1,585,164 had received the fourth dose. Jakarta has the highest percentage of population fully vaccinated with 103.46%, followed by Bali and Special Region of Yogyakarta with 85.45% and 83.02% respectively.
Despite its compactness, the cumulative vaccination rate are not evenly distributed within DKI Jakarta. The question is where are the sub-districts with relatively higher number of vaccination rate and how they changed over time.
Objectives
Exploratory Spatial Data Analysis (ESDA) hold tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate Local Indicators of Spatial Association (LISA) and Emerging Hot Spot Analysis (EHSA) to undercover the spatio-temporal trends of COVID-19 vaccination in DKI Jakarta.
f## The Task
The specific tasks of this take-home exercise are as follows:
Choropleth Mapping and Analysis
Compute the monthly vaccination rate from July 2021 to June 2022 at sub-district (also known as kelurahan in Bahasa Indonesia) level,
Prepare the monthly vaccination rate maps by using appropriate tmap functions,
Describe the spatial patterns revealed by the choropleth maps (not more than 200 words).
Local Gi* Analysis
With reference to the vaccination rate maps prepared in ESDA:
Compute local Gi* values of the monthly vaccination rate,
Display the Gi* maps of the monthly vaccination rate. The maps should only display the significant (i.e. p-value < 0.05)
With reference to the analysis results, draw statistical conclusions (not more than 250 words).
Emerging Hot Spot Analysis (EHSA)
With reference to the local Gi* values of the vaccination rate maps prepared in the previous section:
Perform Mann-Kendall Test by using the spatio-temporal local Gi* values,
Select three sub-districts and describe the temporal trends revealed (not more than 250 words), and
Prepared a EHSA map of the Gi* values of vaccination rate. The maps should only display the significant (i.e. p-value < 0.05).
With reference to the EHSA map prepared, describe the spatial patterns revealed. (not more than 250 words).
The Data
Aspatial data
For the purpose of this assignment, data from Riwayat File Vaksinasi DKI Jakarta will be used. Daily vaccination data are provides. You are only required to download either the first day of the month or last day of the month of the study period.
Geospatial data
For the purpose of this study, DKI Jakarta administration boundary 2019 will be used. The data set can be downloaded at Indonesia Geospatial portal, specifically at this page.
Note
The national Projected Coordinates Systems of Indonesia is DGN95 / Indonesia TM-3 zone 54.1.
Exclude all the outer islands from the DKI Jakarta sf data frame, and
Retain the first nine fields in the DKI Jakarta sf data frame. The ninth field JUMLAH_PEN = Total Population.
Reference was taken from the senior sample submissions for the code for this section, with credit to Megan - https://is415-msty.netlify.app/posts/2021-09-10-take-home-exercise-1/
1. Installing and Loading R packages
In this take home exercise 2, 9 packages will used and loaded using pacman.
1.1 Importing the Geospatial Data
The code chunk below uses st_read() of sf package to import BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA shapefile into R. The imported shapefile will be simple features Object of sf. As we can see, the assigned coordinates system is WGS 84, the ‘World Geodetic System 1984’. In the context of this dataset, this isn’t appropriate: as this is an Indonesian-specific geospatial dataset, we should be using the national CRS of Indonesia, DGN95, the ‘Datum Geodesi Nasional 1995’, ESPG code 23845. st_transform will be used to rectify the coordinate system
<- st_read(dsn = "Data/Geospatial",
bd_jakarta layer = "BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA")
From the output message we can see that there are 269 features and 161 fields. The assigned CRS is WGS 84, the ‘World Geodetic System 1984’. This is not right, and will be rectify that later.
1.2 Data Pre-Processing
1.2.1 Check for Missing Values
Now lets check if there are any missing values
rowSums(is.na(bd_jakarta))!=0,] bd_jakarta[
There are 2 rows containing ‘NA’ values. However, the data is big, we need to find columns with missing NA values to remove it.
names(which(colSums(is.na(bd_jakarta))>0))
We can see that there are two particular rows with missing values for KAB_KOTA (City), KECAMATAN (District) and DESA_KELUR (Village).
Hence, we remove rows with NA value in DESA_KELUR. There are other columns with NA present as well, however, since we are only looking at the sub-district level, it is most appropriate to remove DESA_KELUR.
<- na.omit(bd_jakarta,c("DESA_KELUR")) bd_jakarta
To double check if the rows with missing values are removed
rowSums(is.na(bd_jakarta))!=0,] bd_jakarta[
1.2.2 Transforming Coordinates
Previously as mentioned it uses the WGS 84 coordinate system. The data is using a Geographic projected system, however, this is system is not appropriate since we need to use distance and area measures.
st_crs(bd_jakarta)
Therefore, we use st_transform() and not st_set_crs() as st_set_crs() assigns the EPSG code to the data frame. And we need to transform the data frame from geographic to projected coordinate system. We will be using crs=23845 (found from the EPSG for Indonesia).
<- st_transform(bd_jakarta, 23845) bd_jakarta
Check if CRS has been assigned
st_crs(bd_jakarta)
1.2.3 Removal of the Outer Island
We have done our basic pre-processing, lets quickly visualize the data
plot(st_geometry(bd_jakarta))
As we can see from the diagram, bd_jakarta includes both mainland and outer islands. And since we don’t require the outer islands (as per the requirements), we can remove them.
We know that the date is grouped by KAB_KOTA (City), KECAMATAN (Sub-District) and DESA_KELUR (Village). Now, lets plot the map and see how we can use KAB_KOTA to remove the outer islands.
tm_shape(bd_jakarta) +
tm_polygons("KAB_KOTA")
From the map, we can see that all the cities in Jakarta start with ‘Jakarta’ as their prefix and hence, ‘Kepulauan Seribu’ are the other outer islands. When translated in English, the name means ‘Thousand Islands’. Now we know what to remove, and we shall proceed with that.
<- filter(bd_jakarta, KAB_KOTA != "KEPULAUAN SERIBU") bd_jakarta
Now, lets double check if the outer islands have been removed.
tm_shape(bd_jakarta) +
tm_polygons("KAB_KOTA")
1.2.4 To retain the first 9 columns as requested
<- bd_jakarta[, 0:9] bd_jakarta
1.2.5 Renaming columns to English
<- bd_jakarta %>%
bd_jakarta ::rename(
dplyrObject_ID=OBJECT_ID,
Village_Code=KODE_DESA,
Village=DESA,
Code=KODE,
Province=PROVINSI,
City=KAB_KOTA,
District=KECAMATAN,
Sub_District=DESA_KELUR,
Total_Population=JUMLAH_PEN
)
2. Data Wrangling for Aspatial Data
2.1 Importing EDA
For this take home exercise 2, we will be working on data from July 2021 to June 2022, as a result we will be having several excel files.
<- read_xlsx("Data/Aspatial/Data Vaksinasi Berbasis Kelurahan (31 Juli 2021).xlsx")
jul2021
glimpse(jul2021)
From opening up the excel file till February 2022, the number of columns is 27. However, from March 2022 the number of columns is 34. Upon identifying the difference between the number of columns, the data files from March 2022 has a separate column for 3rd dosage, where has all the data files before that don’t have 3rd dosage column.
2.2 Creating Aspatial Data Pre-Processing Function
For take home exercise 2, we don’t require all the columns. Only the following columns are required -
KODE KELURAHAN (Sub-District Code)
KELURAHAN (Sub-District)
SASARAN (Target)
BELUM VASKIN (Yet to be vaccinated / Not yet vaccinated)
This solves the issue of some months having extra columns. However, we need to create an ‘Date’ column that shows the month and year of the observation, which is originally the file name. Each file has the naming convention ’Data Vaksinasi Berbasis Keluarahan (DD Month YYYY).
We will be combining the mentioned steps into a function
# takes in an aspatial data filepath and returns a processed output
<- function(filepath){
aspatial_preprocess # We have to remove the first row of the file (subheader row) and hence, we use [-1,] to remove it.
<- read_xlsx(filepath)[-1,]
result_file
# We then create the Date Column, the format of our files is: Data Vaksinasi Berbasis Kelurahan (DD Month YYYY)
# While the start is technically "(", "(" is part of a regular expression and leads to a warning message, so we'll use "Kelurahan" instead. The [[1]] refers to the first element in the list.
# We're loading it as DD-Month-YYYY format
# We use the length of the filepath '6' to get the end index (which has our Date)
# as such, the most relevant functions are substr (returns a substring) and either str_locate (returns location of substring as an integer matrix) or gregexpr (returns a list of locations of substring)
# reference https://stackoverflow.com/questions/14249562/find-the-location-of-a-character-in-string
<- gregexpr(pattern="Kelurahan", filepath)[[1]] + 11
startpoint
$Date <- substr(filepath, startpoint, nchar(filepath)-6)
result_file
# Retain the Relevant Columns
<- result_file %>%
result_file select("Date",
"KODE KELURAHAN",
"KELURAHAN",
"SASARAN",
"BELUM VAKSIN")
return(result_file)
}
2.3 Feed files into Aspatial function
Instead of manually feeding the files, line by line, we will be using the function list.files() and lapply() to get our process done quicker.
# in the folder 'data/aspatial', find files with the extension '.xlsx' and add it to our fileslist
# the full.names=TRUE prepends the directory path to the file names, giving a relative file path - otherwise, only the file names (not the paths) would be returned
# reference: https://stat.ethz.ch/R-manual/R-devel/library/base/html/list.files.html
<-list.files(path = "data/aspatial", pattern = "*.xlsx", full.names=TRUE)
fileslist
# afterwards, for every element in fileslist, apply aspatial_process function
<- lapply(seq_along(fileslist), function(x) aspatial_preprocess(fileslist[x])) dflist
We will then convert the dflist into an actual dataframe with ldply() using the below code
<- ldply(dflist, data.frame) vaccination_jakarta
Now, lets take a look into our data
glimpse(vaccination_jakarta)
2.4 Formatting Date Column
The Dates are in Bahasa Indonesia, and hence, we need to translate them to English for ease of use. However, since the values in Date column were derived from sub-strings, they are in a string format and thus, first need to be converted to datetime.
# parses the 'Date' column into Month(Full Name)-YYYY datetime objects
# reference: https://stackoverflow.com/questions/53380650/b-y-date-conversion-gives-na
# locale="ind" means that the locale has been set as Indonesia
Sys.setlocale(locale="ind")
$Date <- c(vaccination_jakarta$Date) %>%
vaccination_jakartaas.Date(vaccination_jakarta$Date, format ="%d %B %Y")
glimpse(vaccination_jakarta)
2.5 Rename columns into English
# renames the columns in the style New_Name = OLD_NAME
<- vaccination_jakarta %>%
vaccination_jakarta ::rename(
dplyrDate=Date,
Sub_District_Code=KODE.KELURAHAN,
Sub_District=KELURAHAN,
Target=SASARAN,
Not_Yet_Vaccinated=BELUM.VAKSIN
)
glimpse(vaccination_jakarta)
2.6 Further data processing
Further perform any pre-processing to check out for anything we might have missed.
rowSums(is.na(vaccination_jakarta))!=0,] vaccination_jakarta[
From the output, we can see there are no missing values.
3. Geospatial Data Integration
3.1 Initial Exploratory Data Analysis
We have both our Geospatial and Aspatial data, we need to join them. However, we need to first find a common header to join them.
colnames(bd_jakarta)
colnames(vaccination_jakarta)
We can see that both have Sub_District and hence we can join them by the Sub_District and Sub_District_Code.
# joins vaccination_jakarta to jakarta based on Sub_District and Sub_District_Code
<- left_join(bd_jakarta, vaccination_jakarta,
combined_jakarta by=c(
"Village_Code"="Sub_District_Code",
"Sub_District"="Sub_District")
)
Subcategorize the data into ‘Target population to be Vaccinated’ , ‘Not Yet Vaccinated Population’ and ‘Total Population’
= tm_shape(combined_jakarta)+
target tm_fill("Target") +
tm_borders(alpha = 0.5) +
tm_layout(main.title="Target Count")
= tm_shape(combined_jakarta)+
not_yet_vaccinated tm_fill("Not_Yet_Vaccinated") +
tm_borders(alpha = 0.5) +
tm_layout(main.title="Not Yet Vaccinated Count")
= tm_shape(combined_jakarta)+
total_population tm_fill("Total_Population") +
tm_borders(alpha = 0.5) +
tm_layout(main.title="Total Population")
tmap_arrange(target, not_yet_vaccinated, total_population)
There seems to be still be a ‘Missing’ value in the Target and Not_Yet_Vaccinated maps. Even though, when we had previously checked for missing values, it didn’t show any missing values. However, we shall double check again.
rowSums(is.na(bd_jakarta))!=0,] bd_jakarta[
rowSums(is.na(vaccination_jakarta))!=0,] vaccination_jakarta[
There are no missing values in our dataframes. Therefore, the most likely reasons for the missing values must be due to mismatched values when we perform the left-join of the Geospatial and Aspatial data.
3.2 Finding mismatched sub-district records
Since, we had conducted left-join using the Sub-District, there must be a mismatch in the naming of the subdistricts. Lets check it by looking at the unique subdistrict names in both bd_jakarta and vaccination_jakarta
<- c(bd_jakarta$Sub_District)
jakarta_subdistrict <- c(vaccination_jakarta$Sub_District)
vaccination_subdistrict
unique(jakarta_subdistrict[!(jakarta_subdistrict %in% vaccination_subdistrict)])
unique(vaccination_subdistrict[!(vaccination_subdistrict %in% jakarta_subdistrict)])
From above there are same names in both but are just written in different ways. However, there are 6 words in the vaccination_subdistrict which are not in the jakarta_subdistrict. We need to take a look into that after we first correct the mismatched values.
# initialise a dataframe of our cases vs bd subdistrict spelling
<- data.frame(
spelling Aspatial_Cases=c("BALE KAMBANG", "HALIM PERDANA KUSUMAH", "JATI PULO", "KAMPUNG TENGAH", "KERENDANG", "KRAMAT JATI", "PAL MERIAM", "PINANG RANTI", "RAWA JATI"),
Geospatial_BD=c("BALEKAMBAG", "HALIM PERDANA KUSUMA", "JATIPULO", "TENGAH", "KRENDANG", "KRAMATJATI", "PALMERIAM", "PINANGRANTI", "RAWAJATI")
)
# with dataframe a input, outputs a kable
library(knitr)
library(kableExtra)
kable(spelling, caption="Mismatched Records") %>%
kable_material("hover", latex_options="scale_down")
As we can see these records have the same name, except that there is no standardization. Therefore, there is a mismatch between them. Let’s correct this mismatch
# We are replacing the mistmatched values in jakarta with the correct value
$Sub_District[bd_jakarta$Sub_District == 'BALEKAMBANG'] <- 'BALE KAMBANG'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'HALIM PERDANA KUSUMA'] <- 'HALIM PERDANA KUSUMAH'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'JATIPULO'] <- 'JATI PULO'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'KALI BARU'] <- 'KALIBARU'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'TENGAH'] <- 'KAMPUNG TENGAH'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'KRAMATJATI'] <- 'KRAMAT JATI'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'KRENDANG'] <- 'KERENDANG'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'PALMERIAM'] <- 'PAL MERIAM'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'PINANGRANTI'] <- 'PINANG RANTI'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'RAWAJATI'] <- 'RAWA JATI' bd_jakarta
There are 6 subdistrict names that we say in vaccination_jakarta which were not present in jakarta. This ideally suggests that these districts are not a part of Jakarta, Therefore we need to remove them.
<- vaccination_jakarta[!(vaccination_jakarta$Sub_District=="PULAU HARAPAN" | vaccination_jakarta$Sub_District=="PULAU KELAPA" | vaccination_jakarta$Sub_District=="PULAU PANGGANG" | vaccination_jakarta$Sub_District=="PULAU PARI" | vaccination_jakarta$Sub_District=="PULAU TIDUNG" | vaccination_jakarta$Sub_District=="PULAU UNTUNG JAWA"), ] vaccination_jakarta
3.3 Rejoin Exploratory Data Analysis
# joins vaccination_jakarta to bd_jakarta based on Sub_District and Sub_District_Code
<- left_join(bd_jakarta, vaccination_jakarta,
combined_jakarta by=c(
"Village_Code"="Sub_District_Code",
"Sub_District"="Sub_District")
)
Check if there are any further NA values
rowSums(is.na(combined_jakarta))!=0,] combined_jakarta[
Relook the data into ‘Target population to be Vaccinated’ , ‘Not Yet Vaccinated Population’ and ‘Total Population’
= tm_shape(combined_jakarta)+
target tm_fill("Target") +
tm_borders(alpha = 0.5) +
tm_layout(main.title="Target Count")
= tm_shape(combined_jakarta)+
not_yet_vaccinated tm_fill("Not_Yet_Vaccinated") +
tm_borders(alpha = 0.5) +
tm_layout(main.title="Not Yet Vaccinated Count")
= tm_shape(combined_jakarta)+
total_population tm_fill("Total_Population") +
tm_borders(alpha = 0.5) +
tm_layout(main.title="Total Population")
tmap_arrange(target, not_yet_vaccinated, total_population)
4. Calculation for monthly vaccination rate
We need to compute the monthly vaccination rate in % at the sub-district level.
We use ‘Target’ -SASARAN instead of Population because the government excludes people aged 14 and below for vaccination.
# grouping based on the sub-district and date
<- vaccination_jakarta %>%
vaccination_rate inner_join(bd_jakarta, by=c("Sub_District" = "Sub_District")) %>%
group_by(Sub_District, Date) %>%
::summarise(`vaccination_rate` = ((Target-Not_Yet_Vaccinated)/Target)*100) %>%
dplyr
#afterwards, pivots the table based on the Dates, using the cumulative case rate as the values
ungroup() %>% pivot_wider(names_from = Date,
values_from = vaccination_rate)
Let us look into how the computation is
vaccination_rate
4.1 Convert dataframe to SF
<- st_as_sf(combined_jakarta)
combined_jakarta
# need to join our previous dataframes with the geospatial data to ensure that geometry column is present
<- vaccination_rate%>% left_join(bd_jakarta, by=c("Sub_District"="Sub_District"))
vaccination_rate <- st_as_sf(vaccination_rate) vaccination_rate
5 Choropleth Mapping and performing Analysis
There are a few ways to classify data in Choropleth maps such as using Equal Interval, Quantile or Jenks.
For this take home exercise 2, I will be using Jenks as this method uses statistical algorithm to group data into classes in the distribution of values. In addition it suits low variance. As a result it will accurately reflects the distribution of values in the data.
5.1 Jenks Choropleth Mapping
# using the jenks method, with 6 classes for human eye
tmap_mode("plot")
tm_shape(vaccination_rate)+
tm_fill("2021-07-31",
n= 6,
style = "jenks",
title = "Vaccination Rate") +
tm_layout(main.title = "Vaccination Rate in July 2021",
main.title.position = "center",
main.title.size = 1,
legend.height = 0.5,
legend.width = 0.4,
frame = TRUE) +
tm_borders(alpha = 0.5)
Plot for all 12 months. Adopt a helper function to help us do it.
# input: the dataframe and the variable name - in this case, the month
# with style="jenks" for the jenks classification method
<- function(df, varname) {
jenks_plot tm_shape(vaccination_rate) +
tm_polygons() +
tm_shape(df) +
tm_fill(varname,
n= 6,
style = "jenks",
title = "Vaccination Rate") +
tm_layout(main.title = varname,
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5)
}
tmap_mode("plot")
tmap_arrange(jenks_plot(vaccination_rate, "2021-07-31"),
jenks_plot(vaccination_rate, "2021-08-31"),
jenks_plot(vaccination_rate, "2021-09-30"),
jenks_plot(vaccination_rate, "2021-10-31"))
tmap_mode("plot")
tmap_arrange(jenks_plot(vaccination_rate, "2021-11-30"),
jenks_plot(vaccination_rate, "2021-12-31"),
jenks_plot(vaccination_rate, "2022-01-31"),
jenks_plot(vaccination_rate, "2022-02-27"))
tmap_mode("plot")
tmap_arrange(jenks_plot(vaccination_rate, "2022-03-31"),
jenks_plot(vaccination_rate, "2022-04-30"),
jenks_plot(vaccination_rate, "2022-05-31"),
jenks_plot(vaccination_rate, "2022-06-30"))
Observations from the plotted map
Each map has its own relative vaccination rate: the ranges gradually grow larger over time with the greater number of people getting vaccinated. By comparing the increasing rates over the months, there are a number of observations we can make
In the early stages (July 2021 ~ October 2021), there is a visible darkly-coloured cluster around the north of Jakarta. In the section below, we learned that this is the KAMAL MUARA and HALIM PERDANA KUSUMAH sub-district with the highest vaccination rate.
Between (November 2021 ~ February 2022, other sub districts have darken in colour and the HALIM PERDANA KUSUMAH still remains the sub-district with the highest vaccination rate.
In the later stages of vaccination from March 2022, based on the observation there were more sub-districts with lower vaccination rate (lighter colour). Especially the majority of sub-districts in the North and West seems to have a low vaccination rate in comparison to others. However, HALIM PERDANA KUSUMAH still remains the sub-district with the highest vaccination rate.
Checking for sub-districts with highest vaccination rate according to month
$Sub_District[which.max(vaccination_rate$`2021-07-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-08-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-09-30`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-10-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-11-30`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-12-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-01-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-02-27`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-03-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-04-30`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-05-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-06-30`)] vaccination_rate
6. Local GI* Analysis
A Local Gi* Analysis will be conducted. It is also known as local spatial autocorrelation which will be used to identify ideas sub-districts in Jakarta with high or low vaccination rate. Time-series analysis will be conducted to understand the evolution of spatial hot spots and cold spots across time.
Interpretation of Gi* values
Gi∗ > 0 which indicates sub-districts with higher vaccination rate than average
Gi∗ < 0 which indicates sub-districts with higher vaccination rate than average
6.1 Calculation of Local GI* of monthly vaccination rate
# Make new vaccination attribute table with Date, Sub_District, Target, Not_Yet_Vaccinated
<- combined_jakarta %>% select(10, 8, 11, 12) %>% st_drop_geometry()
vaccination_table
# Adding a new field for Vaccination_Rate
$Vaccination_Rate <- ((vaccination_table$Target - vaccination_table$Not_Yet_Vaccinated) / vaccination_table$Target) *100
vaccination_table
# Vaccination attribute table with just Date, Sub_District, Vaccination_Rate
<- tibble(vaccination_table %>% select(1,2,5)) vaccination_table
6.2 Create Time Series Cube
<- spacetime(vaccination_table, bd_jakarta,
vaccination_rate_st .loc_col = "Sub_District",
.time_col = "Date")
Verify if vaccination_rate_st is indeed a space-time cube by using the is_spacetime_cube() of sfdep package.
is_spacetime_cube(vaccination_rate_st)
6.3 Deriving Spatial Weights
Calculation of local Gi* weights will be done. However, before that we need derive the spatial weights. The below code chunk is used to identify neighbors and derive an inverse distance weights.
<- vaccination_rate_st %>%
vaccination_rate_nb activate("geometry") %>%
mutate(nb = include_self(st_contiguity(geometry)),
wt = st_inverse_distance(nb, geometry,
scale=1,
alpha=1),
.before=1) %>%
set_nbs("nb") %>%
set_wts("wt")
Note that
activate()
is used to activate the geometry contextmutate()
is used to create two new columns nb and wt.Then we will activate the data context again and copy over the nb and wt columns to each time-slice using
set_nbs()
andset_wts()
- row order is very important so do not rearrange the observations after using
set_nbs()
orset_wts()
.
- row order is very important so do not rearrange the observations after using
The dataset provided has neighbours and weights for each time slicing
head(vaccination_rate_nb)
set.seed() will be use before performing simulation to ensure that the computation is reproducible. When a random number generator is used, the results can be different each time the code is run, which makes it difficult to reproduce results. By setting the seed to a specific value (e.g., set.seed(1234)), the same random numbers will be generated each time the code is run, making the results reproducible and consistent.
set.seed(1234)
6.4 Calculation of GI* value
The calculation of the Gi* value for each sub-district where we group by date
<- vaccination_rate_nb |>
gi_values group_by(Date) |>
mutate(gi_values = local_gstar_perm(
nsim=99)) |>
Vaccination_Rate, nb, wt, ::unnest(gi_values)
tidyr
gi_values
6.5 Visualise the monthly values of GI*
To be able to visualise the Gi* values of the monthly vaccination rate, we need to join it with combined_jakarta, to be able to plot the Gi* values on the map. As the gi_values do not have any coordinates
<- combined_jakarta %>%
jakarta_gi_values left_join(gi_values)
jakarta_gi_values
We will proceed with visualizing the first month (July 2021). We will be plotting both the Gi* value and the p-value of Gi* for the Vaccination Rates.
As per take home exercise 2 requirement, we will only be plotting the significant p-value < 0.05
<- function(date, title) {
gi_value_plot = tm_shape(filter(jakarta_gi_values, Date == date)) +
gi_star_map tm_fill("gi_star") +
tm_borders(alpha=0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = paste("Gi* values for vaccination rates in", title), main.title.size=0.8)
= tm_shape(filter(jakarta_gi_values, Date == date)) +
p_value_map tm_fill("p_value",
breaks = c(0, 0.001, 0.01, 0.05, 1),
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_borders(alpha=0.5) +
tm_layout(main.title = paste("p-values of Gi* for vaccination rates in", title), main.title.size=0.8)
tmap_arrange(gi_star_map, p_value_map)
}
Plotting Gi* for all 12 months
tmap_mode("plot")
gi_value_plot("2021-07-31", "July 2021")
tmap_mode("plot")
gi_value_plot("2021-08-31", "August 2021")
tmap_mode("plot")
gi_value_plot("2021-09-30", "September 2021")
tmap_mode("plot")
gi_value_plot("2021-10-31", "October 2021")
tmap_mode("plot")
gi_value_plot("2021-11-30", "November 2021")
tmap_mode("plot")
gi_value_plot("2021-12-31", "December 2021")
tmap_mode("plot")
gi_value_plot("2022-01-31", "January 2022")
tmap_mode("plot")
gi_value_plot("2022-02-27", "February 2022")
tmap_mode("plot")
gi_value_plot("2022-03-31", "March 2022")
tmap_mode("plot")
gi_value_plot("2022-04-30", "April 2022")
tmap_mode("plot")
gi_value_plot("2022-05-31", "May 2022")
tmap_mode("plot")
gi_value_plot("2022-06-30", "June 2022")
Statistical conclusion
The p-value represents the probability of observing a clustering. A significant p-value < 0.05 suggests that an observed pattern is unlikely to have occurred by chance and may indicate the presence of a spatial process. When Gi* value > 0, it indicates sub-districts with a higher vaccination rate than average. We can view the number of sub districts with p-value < 0.05 with the code below
= filter(jakarta_gi_values, p_sim < 0.05)
no_of_subdistricts_freq as.data.frame(table(no_of_subdistricts_freq$Sub_District))
From the table above, there are 128 sub-districts who have a significant vaccination rate p-value < 0.05 at least once during the period of 12 months. Those sub-districts that have double digits frequency have a significant p value throughout the entire 12 months.
7. Emerging Hot Spot Analysis (EHSA)
Emerging Hot Spot Analysis (EHSA) is a spatio-temporal analysis method for revealing and describing how hot spot and cold spot areas evolve over time. Previously we have already built a space-time cube and calculated the Gi. Therefore, we can directly conduct the Mann-Kendall trend test to evaluate 3 sub-districts for a trend. The 3 sub-districts chosen would be HALIM PERDANA KUSUMAH, TUGU UTARA and ULUJAMI.
7.1 Mann-Kendall Test
7.1.1 HALIM PERDANA KUSUMAH
<- gi_values |>
halim ungroup() |>
filter(Sub_District == "HALIM PERDANA KUSUMAH") |>
select(Sub_District, Date, gi_star)
Plotting the result by using ggplotly
<- ggplot(data = halim,
p aes(x = Date,
y = gi_star)) +
geom_line() +
theme_light()
ggplotly(p)
%>%
halim summarise(mk = list(
unclass(
::MannKendall(gi_star)))) %>%
Kendall::unnest_wider(mk) tidyr
The p-value is 0.086 which is > 0.05 hence p-value is not significant. Therefore, this is an upward but insignificant trend.
7.1.2 TUGU UTARA
<- gi_values |>
tugu ungroup() |>
filter(Sub_District == "TUGU UTARA") |>
select(Sub_District, Date, gi_star)
Plotting the result by using ggplotly
<- ggplot(data = tugu,
p aes(x = Date,
y = gi_star)) +
geom_line() +
theme_light()
ggplotly(p)
%>%
tugu summarise(mk = list(
unclass(
::MannKendall(gi_star)))) %>%
Kendall::unnest_wider(mk) tidyr
The p-value is 0.149 which is > 0.05 hence p-value is not significant. Initially, the line chart is a downtrend but from January 2022 onward it begins its upward trend but it is still insignificant.
7.1.3 ULUJAMI
<- gi_values |>
ulujami ungroup() |>
filter(Sub_District == "ULUJAMI") |>
select(Sub_District, Date, gi_star)
Plotting the result by using ggplotly
<- ggplot(data = ulujami,
p aes(x = Date,
y = gi_star)) +
geom_line() +
theme_light()
ggplotly(p)
%>%
ulujami summarise(mk = list(
unclass(
::MannKendall(gi_star)))) %>%
Kendall::unnest_wider(mk) tidyr
The p-value is 0.003 which is < 0.05 hence p-value is significant. Therefore, this is an upward but significant trend.
7.2 EHSA map of the Gi* value
For us to find the significant hot and cold spots, there is a need to conduct the Mann Kendall test on all the subdistricts out there. Therefore, the group_by() function will be used for all subdistricts.
<- gi_values %>%
ehsa group_by(Sub_District) %>%
summarise(mk = list(
unclass(
::MannKendall(gi_star)))) %>%
Kendall::unnest_wider(mk) tidyr
Show significant top 10 emerging hot/cold spots area
<- ehsa %>%
emerging arrange(sl, abs(tau)) %>%
slice(1:10)
emerging
emerging_hotspot_analysis() of sfdep package will be used to perform EHSA analysis. It takes a spacetime object x (i.e. vaccination_rate_st), and the quoted name of the variable of interest (i.e. Vaccinaton Rate) for .var argument. The k argument is used to specify the number of time lags which is set to 1 by default.
<- emerging_hotspot_analysis(
ehsa x = vaccination_rate_st,
.var = "Vaccination_Rate",
k = 1,
nsim = 99
)
Visualisation of distribution
ggplot(data = ehsa,
aes(x=classification, fill=classification)) +
geom_bar()
The barchart above shows that sporadic hot spots class has the highest numbers.
Left join of combine jakarta and ehsa together
<- bd_jakarta %>%
jakarta_ehsa left_join(ehsa, by = c("Sub_District" = "location"))
Visualisation of classification using tmap
# We use the filter to filter out values with p-value < 0.05
<- jakarta_ehsa %>%
jakarta_ehsa_sig filter(p_value < 0.05)
tmap_mode("plot")
tm_shape(jakarta_ehsa) +
tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(jakarta_ehsa_sig) +
tm_fill("classification") +
tm_borders(alpha = 0.4)
The maps shows spordiac coldspot is spread evenly out in Jakarta. For oscillating hotspot it is lesser than spordiac coldspot. Oscilating coldspot can be found to be more around the border and in the central of Jakarta. There is also a large number of oscilating hotspot spread out evenly around Jakarta. From the map, there is no obvious pattern and lastly, sub districts shaded in grey are of p value > 0.05 which represents that the sub districts are insignificant.
End of take home exercise 2
*Note to Professor*
Some of the code chunks above are referenced from
Megan’s IS415 Journey https://is415-msty.netlify.app/posts/2021-09-10-take-home-exercise-1/
Professor In Class Exercises 7 https://is415-gaa-tskam.netlify.app/in-class_ex/in-class_ex07/in-class_ex07-glsa and https://is415-gaa-tskam.netlify.app/in-class_ex/in-class_ex07/in-class_ex07_ehsa