Take Home Exercise 2

Author

Kwek Ming Rong

Published

February 20, 2023

1. Installing and Loading R packages

In this take home exercise 2, 9 packages will used and loaded using pacman.

pacman::p_load(sf, tmap, kableExtra, tidyverse, sfdep, readxl, plyr, Kendall, plotly)

1.1 Importing the Geospatial Data

The code chunk below uses st_read() of sf package to import BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA shapefile into R. The imported shapefile will be simple features Object of sf. As we can see, the assigned coordinates system is WGS 84, the ‘World Geodetic System 1984’. In the context of this dataset, this isn’t appropriate: as this is an Indonesian-specific geospatial dataset, we should be using the national CRS of Indonesia, DGN95, the ‘Datum Geodesi Nasional 1995’, ESPG code 23845. st_transform will be used to rectify the coordinate system

bd_jakarta <- st_read(dsn = "Data/Geospatial", 
                 layer = "BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA")

From the output message we can see that there are 269 features and 161 fields. The assigned CRS is WGS 84, the ‘World Geodetic System 1984’. This is not right, and will be rectify that later.

1.2 Data Pre-Processing

1.2.1 Check for Missing Values

Now lets check if there are any missing values

bd_jakarta[rowSums(is.na(bd_jakarta))!=0,]

There are 2 rows containing ‘NA’ values. However, the data is big, we need to find columns with missing NA values to remove it.

names(which(colSums(is.na(bd_jakarta))>0))

We can see that there are two particular rows with missing values for KAB_KOTA (City), KECAMATAN (District) and DESA_KELUR (Village).

Hence, we remove rows with NA value in DESA_KELUR. There are other columns with NA present as well, however, since we are only looking at the sub-district level, it is most appropriate to remove DESA_KELUR.

bd_jakarta <- na.omit(bd_jakarta,c("DESA_KELUR"))

To double check if the rows with missing values are removed

bd_jakarta[rowSums(is.na(bd_jakarta))!=0,]

1.2.2 Transforming Coordinates

Previously as mentioned it uses the WGS 84 coordinate system. The data is using a Geographic projected system, however, this is system is not appropriate since we need to use distance and area measures.

st_crs(bd_jakarta)

Therefore, we use st_transform() and not st_set_crs() as st_set_crs() assigns the EPSG code to the data frame. And we need to transform the data frame from geographic to projected coordinate system. We will be using crs=23845 (found from the EPSG for Indonesia).

bd_jakarta <- st_transform(bd_jakarta, 23845)

Check if CRS has been assigned

st_crs(bd_jakarta)

1.2.3 Removal of the Outer Island

We have done our basic pre-processing, lets quickly visualize the data

plot(st_geometry(bd_jakarta))

As we can see from the diagram, bd_jakarta includes both mainland and outer islands. And since we don’t require the outer islands (as per the requirements), we can remove them.

We know that the date is grouped by KAB_KOTA (City), KECAMATAN (Sub-District) and DESA_KELUR (Village). Now, lets plot the map and see how we can use KAB_KOTA to remove the outer islands.

tm_shape(bd_jakarta) + 
  tm_polygons("KAB_KOTA")

From the map, we can see that all the cities in Jakarta start with ‘Jakarta’ as their prefix and hence, ‘Kepulauan Seribu’ are the other outer islands. When translated in English, the name means ‘Thousand Islands’. Now we know what to remove, and we shall proceed with that.

bd_jakarta <- filter(bd_jakarta, KAB_KOTA != "KEPULAUAN SERIBU")

Now, lets double check if the outer islands have been removed.

tm_shape(bd_jakarta) + 
  tm_polygons("KAB_KOTA")

1.2.4 To retain the first 9 columns as requested

bd_jakarta <- bd_jakarta[, 0:9]

1.2.5 Renaming columns to English

bd_jakarta <- bd_jakarta %>% 
  dplyr::rename(
    Object_ID=OBJECT_ID,
    Village_Code=KODE_DESA, 
    Village=DESA,
    Code=KODE,
    Province=PROVINSI, 
    City=KAB_KOTA, 
    District=KECAMATAN, 
    Sub_District=DESA_KELUR,
    Total_Population=JUMLAH_PEN
    )

2. Data Wrangling for Aspatial Data

2.1 Importing EDA

For this take home exercise 2, we will be working on data from July 2021 to June 2022, as a result we will be having several excel files.

jul2021 <- read_xlsx("Data/Aspatial/Data Vaksinasi Berbasis Kelurahan (31 Juli 2021).xlsx")

glimpse(jul2021)

From opening up the excel file till February 2022, the number of columns is 27. However, from March 2022 the number of columns is 34. Upon identifying the difference between the number of columns, the data files from March 2022 has a separate column for 3rd dosage, where has all the data files before that don’t have 3rd dosage column.

2.2 Creating Aspatial Data Pre-Processing Function

For take home exercise 2, we don’t require all the columns. Only the following columns are required -

KODE KELURAHAN (Sub-District Code)

KELURAHAN (Sub-District)

SASARAN (Target)

BELUM VASKIN (Yet to be vaccinated / Not yet vaccinated)

This solves the issue of some months having extra columns. However, we need to create an ‘Date’ column that shows the month and year of the observation, which is originally the file name. Each file has the naming convention ’Data Vaksinasi Berbasis Keluarahan (DD Month YYYY).

We will be combining the mentioned steps into a function

# takes in an aspatial data filepath and returns a processed output
aspatial_preprocess <- function(filepath){
  # We have to remove the first row of the file (subheader row) and hence, we use [-1,] to remove it.
  result_file <- read_xlsx(filepath)[-1,]
  
  # We then create the Date Column, the format of our files is: Data Vaksinasi Berbasis Kelurahan (DD Month YYYY)
  # While the start is technically "(", "(" is part of a regular expression and leads to a warning message, so we'll use "Kelurahan" instead. The [[1]] refers to the first element in the list.
  # We're loading it as DD-Month-YYYY format
  # We use the length of the filepath '6' to get the end index (which has our Date)
  # as such, the most relevant functions are substr (returns a substring) and either str_locate (returns location of substring as an integer matrix) or gregexpr (returns a list of locations of substring)
  # reference https://stackoverflow.com/questions/14249562/find-the-location-of-a-character-in-string
  startpoint <- gregexpr(pattern="Kelurahan", filepath)[[1]] + 11
  
  result_file$Date <- substr(filepath, startpoint, nchar(filepath)-6)
  
  # Retain the Relevant Columns
  result_file <- result_file %>% 
    select("Date", 
           "KODE KELURAHAN", 
           "KELURAHAN", 
           "SASARAN", 
           "BELUM VAKSIN")
  return(result_file)
}

2.3 Feed files into Aspatial function

Instead of manually feeding the files, line by line, we will be using the function list.files() and lapply() to get our process done quicker.

# in the folder 'data/aspatial', find files with the extension '.xlsx' and add it to our fileslist 
# the full.names=TRUE prepends the directory path to the file names, giving a relative file path - otherwise, only the file names (not the paths) would be returned 
# reference: https://stat.ethz.ch/R-manual/R-devel/library/base/html/list.files.html
fileslist <-list.files(path = "data/aspatial", pattern = "*.xlsx", full.names=TRUE)

# afterwards, for every element in fileslist, apply aspatial_process function
dflist <- lapply(seq_along(fileslist), function(x) aspatial_preprocess(fileslist[x]))

We will then convert the dflist into an actual dataframe with ldply() using the below code

vaccination_jakarta <- ldply(dflist, data.frame)

Now, lets take a look into our data

glimpse(vaccination_jakarta)

2.4 Formatting Date Column

The Dates are in Bahasa Indonesia, and hence, we need to translate them to English for ease of use. However, since the values in Date column were derived from sub-strings, they are in a string format and thus, first need to be converted to datetime.

# parses the 'Date' column into Month(Full Name)-YYYY datetime objects
# reference: https://stackoverflow.com/questions/53380650/b-y-date-conversion-gives-na

# locale="ind" means that the locale has been set as Indonesia
Sys.setlocale(locale="ind")
vaccination_jakarta$Date <- c(vaccination_jakarta$Date) %>% 
  as.Date(vaccination_jakarta$Date, format ="%d %B %Y")

glimpse(vaccination_jakarta)

2.5 Rename columns into English

# renames the columns in the style New_Name = OLD_NAME
vaccination_jakarta <- vaccination_jakarta %>% 
  dplyr::rename(
    Date=Date,
    Sub_District_Code=KODE.KELURAHAN,
    Sub_District=KELURAHAN, 
    Target=SASARAN, 
    Not_Yet_Vaccinated=BELUM.VAKSIN
    )
glimpse(vaccination_jakarta)

2.6 Further data processing

Further perform any pre-processing to check out for anything we might have missed.

vaccination_jakarta[rowSums(is.na(vaccination_jakarta))!=0,]

From the output, we can see there are no missing values.

3. Geospatial Data Integration

3.1 Initial Exploratory Data Analysis

We have both our Geospatial and Aspatial data, we need to join them. However, we need to first find a common header to join them.

colnames(bd_jakarta)
colnames(vaccination_jakarta)

We can see that both have Sub_District and hence we can join them by the Sub_District and Sub_District_Code.

# joins vaccination_jakarta to jakarta based on Sub_District and  Sub_District_Code
combined_jakarta <- left_join(bd_jakarta, vaccination_jakarta,
                              by=c(
                                "Village_Code"="Sub_District_Code", 
                                "Sub_District"="Sub_District")
                              )

Subcategorize the data into ‘Target population to be Vaccinated’ , ‘Not Yet Vaccinated Population’ and ‘Total Population’

target = tm_shape(combined_jakarta)+
  tm_fill("Target") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title="Target Count")

not_yet_vaccinated = tm_shape(combined_jakarta)+
  tm_fill("Not_Yet_Vaccinated") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title="Not Yet Vaccinated Count")

total_population = tm_shape(combined_jakarta)+
  tm_fill("Total_Population") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title="Total Population")

tmap_arrange(target, not_yet_vaccinated, total_population)

There seems to be still be a ‘Missing’ value in the Target and Not_Yet_Vaccinated maps. Even though, when we had previously checked for missing values, it didn’t show any missing values. However, we shall double check again.

bd_jakarta[rowSums(is.na(bd_jakarta))!=0,]
vaccination_jakarta[rowSums(is.na(vaccination_jakarta))!=0,]

There are no missing values in our dataframes. Therefore, the most likely reasons for the missing values must be due to mismatched values when we perform the left-join of the Geospatial and Aspatial data.

3.2 Finding mismatched sub-district records

Since, we had conducted left-join using the Sub-District, there must be a mismatch in the naming of the subdistricts. Lets check it by looking at the unique subdistrict names in both bd_jakarta and vaccination_jakarta

jakarta_subdistrict <- c(bd_jakarta$Sub_District)
vaccination_subdistrict <- c(vaccination_jakarta$Sub_District)

unique(jakarta_subdistrict[!(jakarta_subdistrict %in% vaccination_subdistrict)])
unique(vaccination_subdistrict[!(vaccination_subdistrict %in% jakarta_subdistrict)])

From above there are same names in both but are just written in different ways. However, there are 6 words in the vaccination_subdistrict which are not in the jakarta_subdistrict. We need to take a look into that after we first correct the mismatched values.

# initialise a dataframe of our cases vs bd subdistrict spelling
spelling <- data.frame(
  Aspatial_Cases=c("BALE KAMBANG", "HALIM PERDANA KUSUMAH", "JATI PULO", "KAMPUNG TENGAH", "KERENDANG", "KRAMAT JATI", "PAL MERIAM", "PINANG RANTI", "RAWA JATI"),
  Geospatial_BD=c("BALEKAMBAG", "HALIM PERDANA KUSUMA", "JATIPULO", "TENGAH", "KRENDANG", "KRAMATJATI", "PALMERIAM", "PINANGRANTI", "RAWAJATI")
  )

# with dataframe a input, outputs a kable
library(knitr)
library(kableExtra)
kable(spelling, caption="Mismatched Records") %>%
  kable_material("hover", latex_options="scale_down")

As we can see these records have the same name, except that there is no standardization. Therefore, there is a mismatch between them. Let’s correct this mismatch

# We are replacing the mistmatched values in jakarta with the correct value
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'BALEKAMBANG'] <- 'BALE KAMBANG'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'HALIM PERDANA KUSUMA'] <- 'HALIM PERDANA KUSUMAH'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'JATIPULO'] <- 'JATI PULO'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'KALI BARU'] <- 'KALIBARU'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'TENGAH'] <- 'KAMPUNG TENGAH'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'KRAMATJATI'] <- 'KRAMAT JATI'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'KRENDANG'] <- 'KERENDANG'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'PALMERIAM'] <- 'PAL MERIAM'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'PINANGRANTI'] <- 'PINANG RANTI'
bd_jakarta$Sub_District[bd_jakarta$Sub_District == 'RAWAJATI'] <- 'RAWA JATI'

There are 6 subdistrict names that we say in vaccination_jakarta which were not present in jakarta. This ideally suggests that these districts are not a part of Jakarta, Therefore we need to remove them.

vaccination_jakarta <- vaccination_jakarta[!(vaccination_jakarta$Sub_District=="PULAU HARAPAN" | vaccination_jakarta$Sub_District=="PULAU KELAPA" | vaccination_jakarta$Sub_District=="PULAU PANGGANG" | vaccination_jakarta$Sub_District=="PULAU PARI" | vaccination_jakarta$Sub_District=="PULAU TIDUNG" | vaccination_jakarta$Sub_District=="PULAU UNTUNG JAWA"), ]

3.3 Rejoin Exploratory Data Analysis

# joins vaccination_jakarta to bd_jakarta based on Sub_District and  Sub_District_Code
combined_jakarta <- left_join(bd_jakarta, vaccination_jakarta,
                              by=c(
                                "Village_Code"="Sub_District_Code", 
                                "Sub_District"="Sub_District")
                              )

Check if there are any further NA values

combined_jakarta[rowSums(is.na(combined_jakarta))!=0,]

Relook the data into ‘Target population to be Vaccinated’ , ‘Not Yet Vaccinated Population’ and ‘Total Population’

target = tm_shape(combined_jakarta)+
  tm_fill("Target") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title="Target Count")

not_yet_vaccinated = tm_shape(combined_jakarta)+
  tm_fill("Not_Yet_Vaccinated") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title="Not Yet Vaccinated Count")

total_population = tm_shape(combined_jakarta)+
  tm_fill("Total_Population") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title="Total Population")

tmap_arrange(target, not_yet_vaccinated, total_population)

4. Calculation for monthly vaccination rate

We need to compute the monthly vaccination rate in % at the sub-district level.

We use ‘Target’ -SASARAN instead of Population because the government excludes people aged 14 and below for vaccination.

# grouping based on the sub-district and date
vaccination_rate <- vaccination_jakarta %>%
  inner_join(bd_jakarta, by=c("Sub_District" = "Sub_District")) %>%
  group_by(Sub_District, Date) %>%
  dplyr::summarise(`vaccination_rate` = ((Target-Not_Yet_Vaccinated)/Target)*100) %>%
  
  #afterwards, pivots the table based on the Dates, using the cumulative case rate as the values
  ungroup() %>% pivot_wider(names_from = Date,
              values_from = vaccination_rate)

Let us look into how the computation is

vaccination_rate

4.1 Convert dataframe to SF

combined_jakarta <- st_as_sf(combined_jakarta)

# need to join our previous dataframes with the geospatial data to ensure that geometry column is present
vaccination_rate <- vaccination_rate%>% left_join(bd_jakarta, by=c("Sub_District"="Sub_District"))
vaccination_rate <- st_as_sf(vaccination_rate)

5 Choropleth Mapping and performing Analysis

There are a few ways to classify data in Choropleth maps such as using Equal Interval, Quantile or Jenks.

For this take home exercise 2, I will be using Jenks as this method uses statistical algorithm to group data into classes in the distribution of values. In addition it suits low variance. As a result it will accurately reflects the distribution of values in the data.

5.1 Jenks Choropleth Mapping

# using the jenks method, with 6 classes for human eye
tmap_mode("plot")
tm_shape(vaccination_rate)+
  tm_fill("2021-07-31", 
          n= 6,
          style = "jenks", 
          title = "Vaccination Rate") +
  tm_layout(main.title = "Vaccination Rate in July 2021",
            main.title.position = "center",
            main.title.size = 1,
            legend.height = 0.5, 
            legend.width = 0.4,
            frame = TRUE) +
  tm_borders(alpha = 0.5)

Plot for all 12 months. Adopt a helper function to help us do it.

# input: the dataframe and the variable name - in this case, the month 
# with style="jenks" for the jenks classification method
jenks_plot <- function(df, varname) {
  tm_shape(vaccination_rate) +
    tm_polygons() +
  tm_shape(df) +
    tm_fill(varname, 
          n= 6,
          style = "jenks", 
          title = "Vaccination Rate") +
    tm_layout(main.title = varname,
          main.title.position = "center",
          main.title.size = 1.2,
          legend.height = 0.45, 
          legend.width = 0.35,
          frame = TRUE) +
    tm_borders(alpha = 0.5)
}
tmap_mode("plot")
tmap_arrange(jenks_plot(vaccination_rate, "2021-07-31"),
             jenks_plot(vaccination_rate, "2021-08-31"),
             jenks_plot(vaccination_rate, "2021-09-30"),
             jenks_plot(vaccination_rate, "2021-10-31"))
tmap_mode("plot")
tmap_arrange(jenks_plot(vaccination_rate, "2021-11-30"),
             jenks_plot(vaccination_rate, "2021-12-31"),
             jenks_plot(vaccination_rate, "2022-01-31"),
             jenks_plot(vaccination_rate, "2022-02-27"))
tmap_mode("plot")
tmap_arrange(jenks_plot(vaccination_rate, "2022-03-31"),
             jenks_plot(vaccination_rate, "2022-04-30"),
             jenks_plot(vaccination_rate, "2022-05-31"),
             jenks_plot(vaccination_rate, "2022-06-30"))

Observations from the plotted map

Each map has its own relative vaccination rate: the ranges gradually grow larger over time with the greater number of people getting vaccinated. By comparing the increasing rates over the months, there are a number of observations we can make

In the early stages (July 2021 ~ October 2021), there is a visible darkly-coloured cluster around the north of Jakarta. In the section below, we learned that this is the KAMAL MUARA and HALIM PERDANA KUSUMAH sub-district with the highest vaccination rate.

Between (November 2021 ~ February 2022, other sub districts have darken in colour and the HALIM PERDANA KUSUMAH still remains the sub-district with the highest vaccination rate.

In the later stages of vaccination from March 2022, based on the observation there were more sub-districts with lower vaccination rate (lighter colour). Especially the majority of sub-districts in the North and West seems to have a low vaccination rate in comparison to others. However, HALIM PERDANA KUSUMAH still remains the sub-district with the highest vaccination rate.

Checking for sub-districts with highest vaccination rate according to month

vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-07-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-08-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-09-30`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-10-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-11-30`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2021-12-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-01-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-02-27`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-03-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-04-30`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-05-31`)]
vaccination_rate$Sub_District[which.max(vaccination_rate$`2022-06-30`)]

6. Local GI* Analysis

A Local Gi* Analysis will be conducted. It is also known as local spatial autocorrelation which will be used to identify ideas sub-districts in Jakarta with high or low vaccination rate. Time-series analysis will be conducted to understand the evolution of spatial hot spots and cold spots across time.

Interpretation of Gi* values

Gi∗ > 0 which indicates sub-districts with higher vaccination rate than average

Gi∗ < 0 which indicates sub-districts with higher vaccination rate than average

6.1 Calculation of Local GI* of monthly vaccination rate

# Make new vaccination attribute table with Date, Sub_District, Target, Not_Yet_Vaccinated
vaccination_table <- combined_jakarta %>% select(10, 8, 11, 12) %>% st_drop_geometry()

# Adding a new field for Vaccination_Rate
vaccination_table$Vaccination_Rate <- ((vaccination_table$Target - vaccination_table$Not_Yet_Vaccinated) / vaccination_table$Target) *100

# Vaccination attribute table with just Date, Sub_District, Vaccination_Rate
vaccination_table <- tibble(vaccination_table %>% select(1,2,5))

6.2 Create Time Series Cube

vaccination_rate_st <- spacetime(vaccination_table, bd_jakarta,
                          .loc_col = "Sub_District",
                          .time_col = "Date")

Verify if vaccination_rate_st is indeed a space-time cube by using the is_spacetime_cube() of sfdep package.

is_spacetime_cube(vaccination_rate_st)

6.3 Deriving Spatial Weights

Calculation of local Gi* weights will be done. However, before that we need derive the spatial weights. The below code chunk is used to identify neighbors and derive an inverse distance weights.

vaccination_rate_nb <- vaccination_rate_st %>%
  activate("geometry") %>%
  mutate(nb = include_self(st_contiguity(geometry)),
         wt = st_inverse_distance(nb, geometry,
                                  scale=1,
                                  alpha=1),
         .before=1) %>%
  set_nbs("nb") %>%
  set_wts("wt")

Note that

  • activate() is used to activate the geometry context

  • mutate() is used to create two new columns nb and wt.

  • Then we will activate the data context again and copy over the nb and wt columns to each time-slice using set_nbs() and set_wts()

    • row order is very important so do not rearrange the observations after using set_nbs() or set_wts().

The dataset provided has neighbours and weights for each time slicing

head(vaccination_rate_nb)

set.seed() will be use before performing simulation to ensure that the computation is reproducible. When a random number generator is used, the results can be different each time the code is run, which makes it difficult to reproduce results. By setting the seed to a specific value (e.g., set.seed(1234)), the same random numbers will be generated each time the code is run, making the results reproducible and consistent.

set.seed(1234)

6.4 Calculation of GI* value

The calculation of the Gi* value for each sub-district where we group by date

gi_values <- vaccination_rate_nb |>
  group_by(Date) |>
  mutate(gi_values = local_gstar_perm(
    Vaccination_Rate, nb, wt, nsim=99)) |>
      tidyr::unnest(gi_values)

gi_values

6.5 Visualise the monthly values of GI*

To be able to visualise the Gi* values of the monthly vaccination rate, we need to join it with combined_jakarta, to be able to plot the Gi* values on the map. As the gi_values do not have any coordinates

jakarta_gi_values <- combined_jakarta %>%
  left_join(gi_values)

jakarta_gi_values

We will proceed with visualizing the first month (July 2021). We will be plotting both the Gi* value and the p-value of Gi* for the Vaccination Rates.

As per take home exercise 2 requirement, we will only be plotting the significant p-value < 0.05

gi_value_plot <- function(date, title) {
  gi_star_map = tm_shape(filter(jakarta_gi_values, Date == date)) +
    tm_fill("gi_star") +
    tm_borders(alpha=0.5) +
    tm_view(set.zoom.limits = c(6,8)) +
    tm_layout(main.title = paste("Gi* values for vaccination rates in", title), main.title.size=0.8)

  p_value_map = tm_shape(filter(jakarta_gi_values, Date == date)) +
    tm_fill("p_value",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) +
    tm_borders(alpha=0.5) + 
    tm_layout(main.title = paste("p-values of Gi* for vaccination rates in", title), main.title.size=0.8)

  tmap_arrange(gi_star_map, p_value_map)
}

Plotting Gi* for all 12 months

tmap_mode("plot")
gi_value_plot("2021-07-31", "July 2021")
tmap_mode("plot")
gi_value_plot("2021-08-31", "August 2021")
tmap_mode("plot")
gi_value_plot("2021-09-30", "September 2021")
tmap_mode("plot")
gi_value_plot("2021-10-31", "October 2021")
tmap_mode("plot")
gi_value_plot("2021-11-30", "November 2021")
tmap_mode("plot")
gi_value_plot("2021-12-31", "December 2021")
tmap_mode("plot")
gi_value_plot("2022-01-31", "January 2022")
tmap_mode("plot")
gi_value_plot("2022-02-27", "February 2022")
tmap_mode("plot")
gi_value_plot("2022-03-31", "March 2022")
tmap_mode("plot")
gi_value_plot("2022-04-30", "April 2022")
tmap_mode("plot")
gi_value_plot("2022-05-31", "May 2022")
tmap_mode("plot")
gi_value_plot("2022-06-30", "June 2022")

Statistical conclusion

The p-value represents the probability of observing a clustering. A significant p-value < 0.05 suggests that an observed pattern is unlikely to have occurred by chance and may indicate the presence of a spatial process. When Gi* value > 0, it indicates sub-districts with a higher vaccination rate than average. We can view the number of sub districts with p-value < 0.05 with the code below

no_of_subdistricts_freq = filter(jakarta_gi_values, p_sim < 0.05)
as.data.frame(table(no_of_subdistricts_freq$Sub_District))

From the table above, there are 128 sub-districts who have a significant vaccination rate p-value < 0.05 at least once during the period of 12 months. Those sub-districts that have double digits frequency have a significant p value throughout the entire 12 months.

7. Emerging Hot Spot Analysis (EHSA)

Emerging Hot Spot Analysis (EHSA) is a spatio-temporal analysis method for revealing and describing how hot spot and cold spot areas evolve over time. Previously we have already built a space-time cube and calculated the Gi. Therefore, we can directly conduct the Mann-Kendall trend test to evaluate 3 sub-districts for a trend. The 3 sub-districts chosen would be HALIM PERDANA KUSUMAH, TUGU UTARA and ULUJAMI.

7.1 Mann-Kendall Test

7.1.1 HALIM PERDANA KUSUMAH

halim <- gi_values |>
  ungroup() |>
  filter(Sub_District == "HALIM PERDANA KUSUMAH") |>
  select(Sub_District, Date, gi_star)

Plotting the result by using ggplotly

p <- ggplot(data = halim, 
       aes(x = Date, 
           y = gi_star)) +
  geom_line() +
  theme_light()

ggplotly(p)
halim %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)

The p-value is 0.086 which is > 0.05 hence p-value is not significant. Therefore, this is an upward but insignificant trend.

7.1.2 TUGU UTARA

tugu <- gi_values |>
  ungroup() |>
  filter(Sub_District == "TUGU UTARA") |>
  select(Sub_District, Date, gi_star)

Plotting the result by using ggplotly

p <- ggplot(data = tugu, 
       aes(x = Date, 
           y = gi_star)) +
  geom_line() +
  theme_light()

ggplotly(p)
tugu %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)

The p-value is 0.149 which is > 0.05 hence p-value is not significant. Initially, the line chart is a downtrend but from January 2022 onward it begins its upward trend but it is still insignificant.

7.1.3 ULUJAMI

ulujami <- gi_values |>
  ungroup() |>
  filter(Sub_District == "ULUJAMI") |>
  select(Sub_District, Date, gi_star)

Plotting the result by using ggplotly

p <- ggplot(data = ulujami, 
       aes(x = Date, 
           y = gi_star)) +
  geom_line() +
  theme_light()

ggplotly(p)
ulujami %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)

The p-value is 0.003 which is < 0.05 hence p-value is significant. Therefore, this is an upward but significant trend.

7.2 EHSA map of the Gi* value

For us to find the significant hot and cold spots, there is a need to conduct the Mann Kendall test on all the subdistricts out there. Therefore, the group_by() function will be used for all subdistricts.

ehsa <- gi_values %>%
  group_by(Sub_District) %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>%
  tidyr::unnest_wider(mk)

Show significant top 10 emerging hot/cold spots area

emerging <- ehsa %>% 
  arrange(sl, abs(tau)) %>% 
  slice(1:10)
emerging

emerging_hotspot_analysis() of sfdep package will be used to perform EHSA analysis. It takes a spacetime object x (i.e. vaccination_rate_st), and the quoted name of the variable of interest (i.e. Vaccinaton Rate) for .var argument. The k argument is used to specify the number of time lags which is set to 1 by default.

ehsa <- emerging_hotspot_analysis(
  x = vaccination_rate_st,
  .var = "Vaccination_Rate",
  k = 1,
  nsim = 99
)

Visualisation of distribution

ggplot(data = ehsa,
       aes(x=classification, fill=classification)) + 
  geom_bar()

The barchart above shows that sporadic hot spots class has the highest numbers.

Left join of combine jakarta and ehsa together

jakarta_ehsa <- bd_jakarta %>%
  left_join(ehsa, by = c("Sub_District" = "location"))

Visualisation of classification using tmap

# We use the filter to filter out values with p-value < 0.05
jakarta_ehsa_sig <- jakarta_ehsa  %>%
  filter(p_value < 0.05)
tmap_mode("plot")
tm_shape(jakarta_ehsa) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(jakarta_ehsa_sig) +
  tm_fill("classification") + 
  tm_borders(alpha = 0.4)

The maps shows spordiac coldspot is spread evenly out in Jakarta. For oscillating hotspot it is lesser than spordiac coldspot. Oscilating coldspot can be found to be more around the border and in the central of Jakarta. There is also a large number of oscilating hotspot spread out evenly around Jakarta. From the map, there is no obvious pattern and lastly, sub districts shaded in grey are of p value > 0.05 which represents that the sub districts are insignificant.

End of take home exercise 2

*Note to Professor*

Some of the code chunks above are referenced from

  • Megan’s IS415 Journey https://is415-msty.netlify.app/posts/2021-09-10-take-home-exercise-1/

  • Professor In Class Exercises 7 https://is415-gaa-tskam.netlify.app/in-class_ex/in-class_ex07/in-class_ex07-glsa and https://is415-gaa-tskam.netlify.app/in-class_ex/in-class_ex07/in-class_ex07_ehsa