Monash University
2023-04-26
A 3rd year PhD student in the Department of Econometrics and Business Statistics, Monash University, Australia
My research centers on exploring multivariate spatio-temporal data with data wrangling and visualisation tool.
Find me on
huizezhangsh
,huizezhang-sherry
, andPeople can talk about a whole range of differnt things when they only refer to their data as spatio-temporal!
The focus of today will be on vector data
Physical sensors that measure the temperature, rainfall, wind speed & direction, water level, etc
A recent blog post on Vector Data Cubes by Edzer Pebesma
stations_dt
)# A tibble: 30 × 6
id long lat elev name wmo_id
<chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 ASN00060139 153. -31.4 4.2 port macquarie airport aws 94786
2 ASN00068228 151. -34.4 10 bellambi aws 94749
# … with 28 more rows
sf
(stations_sf
)Simple feature collection with 30 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 114.0967 ymin: -41.8739 xmax: 152.8655 ymax: -11.6502
Geodetic CRS: GDA94
# A tibble: 30 × 5
id elev name wmo_id geometry
<chr> <dbl> <chr> <dbl> <POINT [°]>
1 ASN00060139 4.2 port macquarie airport aws 94786 (152.8655 -31.4336)
2 ASN00068228 10 bellambi aws 94749 (150.9291 -34.3691)
# … with 28 more rows
Simple feature collection with 2 features and 6 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 150.9291 ymin: -34.3691 xmax: 152.8655 ymax: -31.4336
Geodetic CRS: GDA94
# A tibble: 2 × 7
id lat long elev name wmo_id geometry
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <POINT [°]>
1 ASN00060139 -31.4 153. 4.2 port macquarie… 94786 (152.8655 -31.4336)
2 ASN00068228 -34.4 151. 10 bellambi aws 94749 (150.9291 -34.3691)
Geometry set for 2 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 150.9291 ymin: -34.3691 xmax: 152.8655 ymax: -31.4336
Geodetic CRS: GDA94
POINT (152.8655 -31.4336)
POINT (150.9291 -34.3691)
POINT (152.8655 -31.4336)
The point sfg is a paired vector with special class labels
POINT (152.8655 -31.4336)
[1] "double"
$class
[1] "XY" "POINT" "sfg"
[1] 152.8655 -31.4336
The POINT (152.8655 -31.4336)
format is called well-known text, which is a human-readable encoding used by sf
The sfc is a list of sfg with special attributes
Geometry set for 1 feature
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 152.8655 ymin: -31.4336 xmax: 152.8655 ymax: -31.4336
Geodetic CRS: GDA94
POINT (152.8655 -31.4336)
POINT (150.9291 -34.3691)
[1] "list"
[1] "class" "precision" "bbox" "crs" "n_empty"
[[1]]
POINT (152.8655 -31.4336)
[[2]]
POINT (150.9291 -34.3691)
The linestring sfg is a matrix of paired doubles
LINESTRING (149.2317 -35.222, 149.2716 -35.2708, 149.3153 -35.27623, 149.3972 -35.32425, 149.3363 -35.33988, 149.2493 -35.33013, 149.2045 -35.34761, 149.1464 -35.4153, 149.1352, ...)
[1] "double"
[,1] [,2]
[1,] 149.2317 -35.22200
[2,] 149.2716 -35.27080
[3,] 149.3153 -35.27623
[4,] 149.3972 -35.32425
[5,] 149.3363 -35.33988
[6,] 149.2493 -35.33013
[7,] 149.2045 -35.34761
[8,] 149.1464 -35.41530
[9,] 149.1352 -35.45369
[10,] 149.1516 -35.49320
[11,] 149.1325 -35.54950
[12,] 149.1405 -35.58829
[13,] 149.0781 -35.58642
[14,] 149.0950 -35.67682
[15,] 149.1094 -35.69659
[16,] 149.0907 -35.76562
[17,] 149.1011 -35.80506
[18,] 149.0488 -35.92041
[19,] 149.0122 -35.89970
[20,] 148.9584 -35.89179
[21,] 148.9127 -35.85546
[22,] 148.8894 -35.72272
[23,] 148.8347 -35.74051
[24,] 148.7884 -35.69781
[25,] 148.7985 -35.66698
[26,] 148.7689 -35.60426
[27,] 148.7628 -35.49550
[28,] 148.7858 -35.40891
[29,] 148.8087 -35.38246
[30,] 148.7935 -35.33912
[31,] 148.8101 -35.30745
[32,] 149.1209 -35.12442
[33,] 149.1891 -35.16552
[34,] 149.1903 -35.19435
[35,] 149.2317 -35.22200
POLYGON sfg is a list where each element is a matrix of paired vectors
POLYGON ((149.2317 -35.222, 149.2716 -35.2708, 149.3153 -35.27623, 149.3972 -35.32425, 149.3363 -35.33988, 149.2493 -35.33013, ...))
[1] "list"
[[1]]
[,1] [,2]
[1,] 149.2317 -35.22200
[2,] 149.2716 -35.27080
[3,] 149.3153 -35.27623
[4,] 149.3972 -35.32425
[5,] 149.3363 -35.33988
[6,] 149.2493 -35.33013
[7,] 149.2045 -35.34761
[8,] 149.1464 -35.41530
[9,] 149.1352 -35.45369
[10,] 149.1516 -35.49320
[11,] 149.1325 -35.54950
[12,] 149.1405 -35.58829
[13,] 149.0781 -35.58642
[14,] 149.0950 -35.67682
[15,] 149.1094 -35.69659
[16,] 149.0907 -35.76562
[17,] 149.1011 -35.80506
[18,] 149.0488 -35.92041
[19,] 149.0122 -35.89970
[20,] 148.9584 -35.89179
[21,] 148.9127 -35.85546
[22,] 148.8894 -35.72272
[23,] 148.8347 -35.74051
[24,] 148.7884 -35.69781
[25,] 148.7985 -35.66698
[26,] 148.7689 -35.60426
[27,] 148.7628 -35.49550
[28,] 148.7858 -35.40891
[29,] 148.8087 -35.38246
[30,] 148.7935 -35.33912
[31,] 148.8101 -35.30745
[32,] 149.1209 -35.12442
[33,] 149.1891 -35.16552
[34,] 149.1903 -35.19435
[35,] 149.2317 -35.22200
MULTIPOLYGON sfg is a nested list where each list element can contain multiple matrices of paired vectors
MULTIPOLYGON (((145.2859 -38.3962, 145.2903 -38.28367, 145.4109 -38.37078, 145.2859 -38.3962)), ((140.9657 -38.05599, 140.9739 -37.46209, 140.9693 -36.79305, 140.9631 -35.74853, 140.9656 -35.00826, 140.9617 -34.09582, ...)))
[1] "list"
[1] 2
[[1]]
[[1]][[1]]
[,1] [,2]
[1,] 145.2859 -38.39620
[2,] 145.2903 -38.28367
[3,] 145.4109 -38.37078
[4,] 145.2859 -38.39620
[[2]]
[[2]][[1]]
[,1] [,2]
[1,] 140.9657 -38.05599
[2,] 140.9739 -37.46209
[3,] 140.9693 -36.79305
[4,] 140.9631 -35.74853
[5,] 140.9656 -35.00826
[6,] 140.9617 -34.09582
[7,] 141.0089 -34.04466
[8,] 141.2610 -34.07898
[9,] 141.3576 -34.11069
[10,] 141.5176 -34.20645
[11,] 141.7049 -34.09569
[12,] 141.8424 -34.13253
[13,] 141.8991 -34.11149
[14,] 142.0745 -34.13049
[15,] 142.0821 -34.17299
[16,] 142.2203 -34.18167
[17,] 142.2385 -34.30782
[18,] 142.3337 -34.33930
[19,] 142.3684 -34.52709
[20,] 142.4347 -34.57615
[21,] 142.5262 -34.75981
[22,] 142.6968 -34.72016
[23,] 142.6995 -34.60315
[24,] 142.7635 -34.59674
[25,] 143.0400 -34.70251
[26,] 143.1041 -34.69400
[27,] 143.3515 -34.79600
[28,] 143.3196 -34.96870
[29,] 143.3470 -35.10478
[30,] 143.3964 -35.19311
[31,] 143.5639 -35.20263
[32,] 143.5637 -35.31612
[33,] 143.6423 -35.40104
[34,] 143.7465 -35.38860
[35,] 143.8666 -35.48032
[36,] 144.0479 -35.55617
[37,] 144.1175 -35.62593
[38,] 144.3757 -35.80121
[39,] 144.4082 -35.90422
[40,] 144.5833 -36.03903
[41,] 144.8058 -36.12384
[42,] 144.9236 -35.98895
[43,] 144.9750 -35.88343
[44,] 145.0729 -35.82776
[45,] 145.2465 -35.83041
[46,] 145.3524 -35.86580
[47,] 145.5340 -35.80529
[48,] 145.6847 -35.93075
[49,] 145.7934 -35.98282
[50,] 145.9080 -35.95356
[51,] 145.9891 -36.01308
[52,] 146.2618 -36.01580
[53,] 146.3877 -36.03527
[54,] 146.4143 -35.97545
[55,] 146.5179 -35.96006
[56,] 146.6870 -36.04082
[57,] 146.8363 -36.08219
[58,] 147.0162 -36.08950
[59,] 147.1372 -36.02862
[60,] 147.2043 -36.04989
[61,] 147.3522 -36.03229
[62,] 147.4043 -35.94356
[63,] 147.4986 -35.94664
[64,] 147.5552 -36.00321
[65,] 147.7084 -35.92854
[66,] 147.9140 -35.99795
[67,] 147.9976 -36.04777
[68,] 148.0289 -36.14008
[69,] 148.0383 -36.36556
[70,] 148.1247 -36.46484
[71,] 148.1244 -36.55098
[72,] 148.2174 -36.59885
[73,] 148.1851 -36.69222
[74,] 148.1949 -36.79625
[75,] 149.1998 -37.20119
[76,] 149.9763 -37.50503
[77,] 149.7393 -37.58451
[78,] 149.6747 -37.68584
[79,] 149.4948 -37.76847
[80,] 149.2516 -37.78179
[81,] 148.8087 -37.78642
[82,] 148.6260 -37.80814
[83,] 148.3662 -37.80395
[84,] 148.1792 -37.83164
[85,] 147.7884 -37.95525
[86,] 147.4672 -38.15915
[87,] 147.0672 -38.47512
[88,] 146.9320 -38.59602
[89,] 146.7857 -38.64632
[90,] 146.4438 -38.69492
[91,] 146.2675 -38.68791
[92,] 146.1856 -38.74344
[93,] 146.2656 -38.80531
[94,] 146.2919 -38.90428
[95,] 146.4170 -38.84892
[96,] 146.4873 -38.90615
[97,] 146.4367 -38.97907
[98,] 146.4209 -39.12585
[99,] 146.2739 -38.99612
[100,] 146.1842 -38.86951
[101,] 146.0632 -38.81177
[102,] 145.9189 -38.90688
[103,] 145.8715 -38.77944
[104,] 145.7308 -38.64813
[105,] 145.5647 -38.65419
[106,] 145.4237 -38.53648
[107,] 145.4329 -38.42871
[108,] 145.5525 -38.35457
[109,] 145.4883 -38.23615
[110,] 145.3442 -38.21445
[111,] 145.2550 -38.23743
[112,] 145.1299 -38.39045
[113,] 145.0257 -38.47967
[114,] 144.9244 -38.49740
[115,] 144.8369 -38.37040
[116,] 144.9750 -38.32697
[117,] 145.1213 -38.13339
[118,] 145.0906 -38.01495
[119,] 145.0307 -37.99351
[120,] 144.9638 -37.85576
[121,] 144.8085 -37.88033
[122,] 144.6502 -38.00202
[123,] 144.4874 -38.07828
[124,] 144.3979 -38.07382
[125,] 144.3561 -38.13840
[126,] 144.4807 -38.16449
[127,] 144.6318 -38.10734
[128,] 144.7197 -38.14612
[129,] 144.6142 -38.29214
[130,] 144.4357 -38.28130
[131,] 144.2516 -38.38841
[132,] 144.0365 -38.47578
[133,] 143.9731 -38.56636
[134,] 143.8421 -38.67908
[135,] 143.6854 -38.73536
[136,] 143.5557 -38.85494
[137,] 143.3569 -38.75435
[138,] 143.2203 -38.76029
[139,] 143.0217 -38.62625
[140,] 142.8611 -38.60563
[141,] 142.5633 -38.42200
[142,] 142.3722 -38.34942
[143,] 142.1482 -38.38969
[144,] 141.8849 -38.26629
[145,] 141.7420 -38.25311
[146,] 141.5992 -38.32107
[147,] 141.5575 -38.42235
[148,] 141.4067 -38.36934
[149,] 141.3574 -38.27460
[150,] 141.2278 -38.17065
[151,] 140.9657 -38.05599
The sf
, sfc
, and sfg
objects have informative header prints but they can be boiled down to basic data structures that we’re already familiar with
There are more than just introduced geometry types: MULTIPOINTS, MULTILINESTRING, etc
In practice, you don’t have to decompose/ manipulate these vectors or matrices manually, existing geometrical operations (st_*()
) and visualisation tools (geom_sf()
) will do that for you.
First cast the data into a tsibble, before using tsibble::count_gaps()
for data quality check
# A tibble: 10,632 × 5
id date prcp tmax tmin
<chr> <date> <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01 0 36.7 26.9
2 ASN00003057 2020-01-02 41 34.2 24
3 ASN00003057 2020-01-03 0 35 25.4
4 ASN00003057 2020-01-04 40 29.1 25.4
5 ASN00003057 2020-01-05 1640 27.3 24.3
6 ASN00003057 2020-01-06 1832 29.2 23.2
7 ASN00003057 2020-01-07 42 36 21
8 ASN00003057 2020-01-08 300 31.5 25
9 ASN00003057 2020-01-09 90 32.2 25.3
10 ASN00003057 2020-01-10 0 34 25.9
# … with 10,622 more rows
# A tsibble: 10,632 x 5 [1D]
# Key: id [30]
id date prcp tmax tmin
<chr> <date> <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01 0 36.7 26.9
2 ASN00003057 2020-01-02 41 34.2 24
3 ASN00003057 2020-01-03 0 35 25.4
4 ASN00003057 2020-01-04 40 29.1 25.4
5 ASN00003057 2020-01-05 1640 27.3 24.3
6 ASN00003057 2020-01-06 1832 29.2 23.2
7 ASN00003057 2020-01-07 42 36 21
8 ASN00003057 2020-01-08 300 31.5 25
9 ASN00003057 2020-01-09 90 32.2 25.3
10 ASN00003057 2020-01-10 0 34 25.9
# … with 10,622 more rows
# A tibble: 112 × 4
id .from .to .n
<chr> <date> <date> <int>
1 ASN00003057 2020-02-17 2020-02-18 2
2 ASN00003057 2020-02-25 2020-03-01 6
3 ASN00003057 2020-03-07 2020-03-09 3
4 ASN00003057 2020-03-11 2020-03-13 3
5 ASN00003057 2020-03-25 2020-03-27 3
# … with 107 more rows
# A tsibble: 271 x 5 [1D]
# Key: id [1]
id date prcp tmax tmin
<chr> <date> <dbl> <dbl> <dbl>
1 ASN00003057 2020-02-15 0 33.7 25.5
2 ASN00003057 2020-02-16 0 33.5 28.5
3 ASN00003057 2020-02-19 20 34 24.4
4 ASN00003057 2020-02-20 0 34.1 26.5
5 ASN00003057 2020-02-21 0 34.5 25.7
# … with 266 more rows
# A tsibble: 321 x 5 [1D]
# Key: id [1]
id date prcp tmax tmin
<chr> <date> <dbl> <dbl> <dbl>
1 ASN00003057 2020-02-15 0 33.7 25.5
2 ASN00003057 2020-02-16 0 33.5 28.5
3 ASN00003057 2020-02-17 NA NA NA
4 ASN00003057 2020-02-18 NA NA NA
5 ASN00003057 2020-02-19 20 34 24.4
# … with 316 more rows
# A tibble: 144 × 5
id name long lat altitude
<dbl> <chr> <dbl> <dbl> <dbl>
1 13401 KAPFENBERG-FLUGFELD 15.3 47.5 515
2 2117 HORN - WASSERWERK 15.6 48.7 308
3 1416 ROHRBACH 14.0 48.6 613
4 7956 ANDAU 17.0 47.8 117
5 500 LITSCHAU 15.0 49.0 558
6 905 RETZ/WINDMUEHLE 15.9 48.8 320
7 1415 ROHRBACH 14.0 48.6 597
# … with 137 more rows
# A tibble: 103,660 × 4
station time tmax id
<dbl> <dttm> <dbl> <dbl>
1 100 2021-01-01 00:00:00 1.7 20123
2 100 2021-01-02 00:00:00 0.4 20123
3 100 2021-01-03 00:00:00 2.5 20123
4 100 2021-01-04 00:00:00 1.4 20123
5 100 2021-01-05 00:00:00 3.4 20123
6 100 2021-01-06 00:00:00 2.9 20123
7 100 2021-01-07 00:00:00 1.2 20123
# … with 103,653 more rows
Cubble is a nested object built on tibble that allow easy pivoting between spatial and temporal form.
(cb <- as_cubble(
list(spatial = stations, temporal = ts),
key = id, index = time, coords = c(long, lat)
))
# cubble: id [144]: nested form
# bbox: [9.6, 46.44, 17.04, 48.96]
# temporal: station [dbl], time [dttm], tmax [dbl]
id name long lat altitude ts
<dbl> <chr> <dbl> <dbl> <dbl> <list>
1 500 LITSCHAU 15.0 49.0 558 <tibble [730 × 3]>
2 905 RETZ/WINDMUEHLE 15.9 48.8 320 <tibble [730 × 3]>
3 1401 KOLLERSCHLAG 13.8 48.6 714 <tibble [730 × 3]>
4 1415 ROHRBACH 14.0 48.6 597 <tibble [287 × 3]>
5 1416 ROHRBACH 14.0 48.6 613 <tibble [443 × 3]>
6 1802 WEITRA 14.9 48.7 572 <tibble [730 × 3]>
7 1906 ALLENTSTEIG 15.4 48.7 599 <tibble [730 × 3]>
# … with 137 more rows
(cb_tm <- cb_space %>%
face_temporal() %>%
group_by(month = lubridate::month(time)) %>%
summarise(tmax = mean(tmax, na.rm = TRUE))
)
# cubble: month, id [140]: long form
# bbox: [9.6, 46.44, 16.85, 48.96]
# spatial: name [chr], long [dbl], lat [dbl], altitude [dbl]
id month tmax
<dbl> <dbl> <dbl>
1 500 1 2.53
2 500 2 5.63
3 500 3 8.67
4 500 4 11.2
5 500 5 18.0
6 500 6 25.0
7 500 7 24.9
# … with 1,673 more rows
# cubble: month, id [140]: long form
# bbox: [9.6, 46.44, 16.85, 48.96]
# spatial: name [chr], long [dbl], lat [dbl], altitude [dbl]
id month tmax long lat
<dbl> <dbl> <dbl> <dbl> <dbl>
1 500 1 2.53 15.0 49.0
2 500 2 5.63 15.0 49.0
3 500 3 8.67 15.0 49.0
4 500 4 11.2 15.0 49.0
5 500 5 18.0 15.0 49.0
6 500 6 25.0 15.0 49.0
7 500 7 24.9 15.0 49.0
# … with 1,673 more rows
Modified from Glyph-maps for Visually Exploring Temporal Patterns in Climate Data and Models (Wickham, 2012)
https://huizezhang-sherry.github.io/cubble/articles/glyph.html
library(cubble)
library(tidyverse)
ts_raw <- read_csv(here::here("data/TAG Datensatz_20210101_20221231.csv"))
ts <- ts_raw %>% set_names(c("station", "time", "tmax", "id")) %>% filter(!is.na(id))
stations_raw <- read_csv(here::here("data/TAG Stations-Metadaten.csv"))
stations <- stations_raw %>% select(c(1,3:6)) %>% set_names(c("id", "name", "long", "lat", "altitude")) %>% filter(id %in% unique(ts$id))
cb <- as_cubble(
list(spatial = stations, temporal = ts),
key = id, index = time, coords = c(long, lat)
)
cb_glyph <- cb %>%
filter(nrow(ts) == 365 * 2) %>%
face_temporal() %>%
group_by(month = lubridate::month(time)) %>%
summarise(tmax = mean(tmax, na.rm = TRUE)) %>%
unfold(long, lat, altitude)
austria <- rnaturalearth::ne_countries(returnclass = "sf", scale = "medium") %>% filter(name == "Austria") %>% pull(geometry)
cb_glyph %>%
ggplot(aes(x_major = long, x_minor = month, y_major = lat, y_minor = tmax, color = altitude ),) +
geom_sf(data = austria, fill = "grey90", color = "white", inherit.aes = FALSE) +
geom_glyph_box(width = 0.2, height = 0.05) +
geom_glyph(width = 0.2, height = 0.05) +
ggthemes::theme_map()
The slides are made with Quarto
All the materials used to prepare the slides are available at https://sherryzhang-rladiesvienna2023.netlify.app