Migration as System Stabilizer

Categories: Migration, Climate Migration, Spatial Analysis, GEOG 544

Author: Sechang Kim
Published: May 27, 2026
Modified: May 30, 2026

Research question. How are cumulative and recurrent flood-inundation damages spatially distributed across Seoul’s administrative dongs, and how might these patterns relate to housing-market conditions and residential mobility?

This research notebook begins by preparing the flood-damage point records and visualizing them against Seoul’s administrative boundary layers. The purpose of this first section is exploratory: to inspect the annual spatial distribution of reported flood-inundation damage before constructing dong-year summaries, recurrence measures, or cluster typologies.

1. Basic visualization and data preparation

The raw flood-damage data are stored as annual point shapefiles. The seoul_EMD and seoul_SGG shapefiles are polygon boundary layers used as map context: EMD boundaries provide the finer administrative-dong geography, while SGG boundaries provide the higher-level district geography. The flood-damage attributes are read from the annual point files only.

The resulting object, flood_points, is a long-form sf point layer that combines all available annual flood-damage records. It retains the original raw flood fields, including ADM_CD, F_ZONE_NM, F_YR, F_AREA, F_SHIM, and F_AVG_HGT. The EMD and SGG polygon layers are retained separately as contextual boundary layers for visualization.
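The stacking step described above can be sketched in base R. This is only an illustration under stated assumptions: two toy data frames stand in for the annual point shapefiles, and the object names flood_2010, flood_2011, and flood_points_toy, the zone names, and all values are hypothetical. In the notebook itself the annual layers would be sf objects (read, for example, with sf::st_read()), so the geometry column would travel with the attributes.

```r
# Toy stand-ins for two annual flood-damage layers (all values hypothetical).
flood_2010 <- data.frame(
  ADM_CD    = c("11200630", "11220520"),
  F_ZONE_NM = c("zone_a", "zone_b"),
  F_YR      = 2010L,
  F_AREA    = c(120.5, 80.0),
  F_SHIM    = c(0.4, 0.2),
  F_AVG_HGT = c(12.1, 15.3),
  stringsAsFactors = FALSE
)
flood_2011 <- data.frame(
  ADM_CD    = "11200630",
  F_ZONE_NM = "zone_a",
  F_YR      = 2011L,
  F_AREA    = 45.0,
  F_SHIM    = 0.3,
  F_AVG_HGT = 12.4,
  stringsAsFactors = FALSE
)

# Raw fields named in the text; each annual layer is reduced to these columns
# so that the layers can be row-bound into one long-form table.
raw_fields <- c("ADM_CD", "F_ZONE_NM", "F_YR", "F_AREA", "F_SHIM", "F_AVG_HGT")
annual_layers <- list(flood_2010, flood_2011)

flood_points_toy <- do.call(
  rbind,
  lapply(annual_layers, function(x) x[, raw_fields])
)

# Quick diagnostic: number of records per reported flood year.
table(flood_points_toy$F_YR)
```

Selecting the same set of columns from every annual layer before binding guards against years whose files carry extra or differently ordered fields.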

For the interactive map, the substantive time unit is the reported flood year, F_YR. Rather than using leaflet.extras2::addTimeslider(), which can behave poorly when many points share the same timestamp, this map uses a small custom JavaScript slider with one step per reported year. Moving the slider filters the displayed points directly by F_YR.

This map is intended as an initial diagnostic visualization. The lighter EMD boundaries show the fine-grained neighborhood geography, while the thicker SGG boundaries make the broader district structure easier to read. Later sections will use these point records to construct administrative-dong summaries and develop measures of cumulative damage and recurrence.

2. Administrative-dong flood indicators for clustering

The following indicators are dong-specific cumulative measures for later cluster analysis. The unit of analysis is the 2025 administrative dong, so flood records are first assigned to the fixed 2025 boundary framework with a spatial point-in-polygon join rather than an attribute join on historical ADM_CD. The final object has one row per dong and retains the 2025 dong geometry for mapping. These are not dong-year indicators: annual records are used only as inputs for cumulative, recurrence, and concentration summaries.
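The cumulative, recurrence, and concentration summaries can be illustrated with a small base R sketch. Several parts are assumptions rather than the notebook's confirmed definitions: the per-record severity is taken to be F_AREA multiplied by F_SHIM, evenness is computed as Shannon entropy of the annual shares divided by the log of the number of damage years, and the toy records are treated as already assigned to 2025 dongs (the point-in-polygon join is not shown). The log columns printed below are consistent with log(1 + x); for example, log1p(99) is 4.60517.

```r
# Toy records already assigned to 2025 dongs; dong names and values are hypothetical.
records <- data.frame(
  dong   = c("A", "A", "B", "B"),
  F_YR   = c(2010L, 2011L, 2011L, 2011L),
  F_AREA = c(100, 200, 40, 60),
  F_SHIM = c(0.5, 0.25, 0.5, 0.5)
)

# Assumption: severity of one record = inundated area x inundation depth.
records$area_depth <- records$F_AREA * records$F_SHIM

summarise_dong <- function(d) {
  # Annual totals of area-depth, then each year's share of the dong total.
  annual <- as.numeric(tapply(d$area_depth, d$F_YR, sum))
  shares <- annual / sum(annual)
  shares <- shares[shares > 0]
  # Assumed evenness: Shannon entropy scaled to [0, 1]; 0 when only one year.
  evenness <- if (length(shares) > 1) {
    -sum(shares * log(shares)) / log(length(shares))
  } else {
    0
  }
  data.frame(
    total_damage_count        = nrow(d),
    log_total_damage_count    = log1p(nrow(d)),
    total_F_AREA              = sum(d$F_AREA),
    total_area_depth          = sum(d$area_depth),
    n_damage_years            = length(shares),
    area_depth_evenness       = evenness,
    max_year_share_area_depth = max(shares)
  )
}

# One row per dong, as in the final indicator table.
indicators <- do.call(rbind, lapply(split(records, records$dong), summarise_dong))
indicators
```

In this toy case dong A splits its damage equally across two years (evenness 1, maximum annual share 0.5), while dong B has all damage in one year (evenness 0, maximum annual share 1), which mirrors the contrast between recurrent and concentrated profiles.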

  n_dongs dongs_with_damage max_damage_count max_total_F_AREA
1     426               401             1785          2831216
  max_total_area_depth
1             847584.7
   ADM_CD_2025     ADM_NM SGG_CD total_damage_count log_total_damage_count
1     11090750     인수동  11090                 99               4.605170
2     11110760 상계3·4동  11110                 44               3.806662
3     11200630    사당1동  11200               1669               7.420579
4     11220520    서초2동  11220                215               5.375278
5     11220540    서초4동  11220                 90               4.510860
6     11200690  신대방1동  11200                459               6.131226
7     11220630    방배2동  11220                877               6.777647
8     11200680     대방동  11200                445               6.100319
9     11200730    사당2동  11200                498               6.212606
10    11210680     신사동  11210               1361               7.216709
   total_F_AREA log_total_F_AREA total_area_depth log_total_area_depth
1     2831215.9         14.85622         847584.7             13.65015
2     1716937.5         14.35605         345918.9             12.75396
3      531707.4         13.18385         304299.8             12.62577
4      528623.4         13.17803         297445.5             12.60299
5      213061.2         12.26934         200316.3             12.20766
6      250675.9         12.43192         168366.3             12.03390
7      416249.7         12.93904         165722.5             12.01808
8      236825.1         12.37508         146514.6             11.89489
9      299649.5         12.61037         145654.0             11.88900
10     373840.3         12.83159         140894.8             11.85578
   n_damage_years area_depth_evenness max_year_share_area_depth
1               6          0.02176862                 0.9948816
2               6          0.05416919                 0.9833642
3               4          0.42679072                 0.8169489
4               4          0.60653848                 0.6784982
5               3          0.34724541                 0.8777714
6               4          0.09228015                 0.9766383
7               5          0.57172648                 0.4853826
8               5          0.09224268                 0.9714274
9               3          0.46548574                 0.8562141
10              3          0.80861441                 0.6379073

3. Exploratory hierarchical clustering

The following chunk uses the dong-level flood indicators to run an exploratory hierarchical clustering analysis. All indicators are standardized before calculating distances because they are measured on different scales. The clustering result is not yet interpreted or mapped here; cluster profiles and spatial visualization will be handled in the next step.
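A minimal sketch of the standardize-then-cluster step follows, using base R only. The text does not state the distance metric or linkage, so Euclidean distance and Ward linkage ("ward.D2") are assumptions here, and the indicator values are random toy numbers rather than the Seoul data.

```r
set.seed(544)
n <- 30  # toy number of dongs

# Random stand-ins for the six dong-level indicators (values are hypothetical).
toy_indicators <- data.frame(
  log_total_damage_count    = runif(n, 0, 7),
  log_total_F_AREA          = runif(n, 0, 15),
  log_total_area_depth      = runif(n, 0, 14),
  n_damage_years            = sample(0:6, n, replace = TRUE),
  area_depth_evenness       = runif(n, 0, 1),
  max_year_share_area_depth = runif(n, 0, 1)
)

# Standardize each column to mean 0 and sd 1 so that no indicator
# dominates the distance calculation because of its scale.
z <- scale(toy_indicators)

# Assumed choices: Euclidean distance and Ward linkage.
d  <- dist(z, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# The dendrogram can then be inspected before choosing k, e.g. plot(hc).
```

Without the scale() call, the log area columns (range roughly 0 to 15) would outweigh the evenness and share columns (range 0 to 1) in every pairwise distance.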

4. Cluster-specific flood-damage profiles

The following section fixes the number of clusters at k = 5, joins the cluster membership back to the 2025 administrative-dong layer, summarizes the six clustering indicators by cluster, and assigns interpretable labels based on the hypothesized flood-damage typology.
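Cutting the tree at k = 5 and profiling the clusters can be sketched as follows. The data are random toy values with only two of the six indicators, the Ward linkage is an assumption, and the label assignment is omitted because labels depend on inspecting the real cluster means.

```r
set.seed(544)

# Toy indicator table with two of the six indicators (values are hypothetical).
toy <- data.frame(
  n_damage_years       = sample(0:6, 40, replace = TRUE),
  log_total_area_depth = runif(40, 0, 13)
)

# Assumed linkage: Ward's method on standardized indicators.
hc <- hclust(dist(scale(toy)), method = "ward.D2")

# Fix the number of clusters at k = 5 and attach membership to each row.
toy$cluster_id <- cutree(hc, k = 5)

# Cluster profile: mean of each indicator by cluster, plus cluster size.
profile <- aggregate(
  cbind(n_damage_years, log_total_area_depth) ~ cluster_id,
  data = toy,
  FUN  = mean
)
profile$n_dongs <- as.vector(table(toy$cluster_id))
profile
```

Because cutree() numbers clusters by order of first appearance in the data, the numeric cluster_id carries no ranking; this is why the profile table pairs each id with a separately assigned descriptive label.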

Mean values of the six flood-damage indicators by final k = 5 cluster.

cluster_id  cluster_label                            n_dongs  log_total_damage_count  log_total_F_AREA  log_total_area_depth  n_damage_years  area_depth_evenness  max_year_share_area_depth
1           3. Moderate or low damage but recurrent      121                   2.720             7.827                 6.604           3.347                0.707                      0.651
2           4. Low damage and episodic                    45                   1.030             5.206                 3.327           1.111                0.024                      0.996
3           5. Very low or no reported damage             25                   0.000             0.000                 0.000           0.000                0.000                      0.000
4           2. High damage but concentrated               91                   4.452            10.727                 9.632           3.066                0.205                      0.926
5           1. High damage and recurrent                 144                   4.683            10.342                 9.134           4.583                0.724                      0.543

5. Spatial visualization of the final clusters

The final k = 5 cluster number and label are attached to each 2025 administrative dong. The map is drawn in EPSG:5179. Administrative-dong polygons are filled by cluster label with thin boundaries, while district boundaries are overlaid without fill and with thicker lines.


Call:
glm(formula = Magnitude ~ log_AtRisk + GM_Rate + Frgn_HH_R + 
    NonHous_R + Youth_R + SINGLE_HH_:Elderly_R, family = binomial(link = "logit"), 
    data = magnitude_data)

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -1.3311681  0.4800705  -2.773  0.00556 ** 
log_AtRisk            0.0892443  0.0391974   2.277  0.02280 *  
GM_Rate               0.0247485  0.0127948   1.934  0.05308 .  
Frgn_HH_R             0.0497226  0.0260452   1.909  0.05625 .  
NonHous_R             0.0506738  0.0068887   7.356 1.89e-13 ***
Youth_R              -0.0298441  0.0172342  -1.732  0.08333 .  
SINGLE_HH_:Elderly_R -0.0007586  0.0004457  -1.702  0.08872 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 586.01  on 425  degrees of freedom
Residual deviance: 496.71  on 419  degrees of freedom
AIC: 510.71

Number of Fisher Scoring iterations: 4
# A tibble: 7 × 7
  term                 estimate std.error statistic  p.value conf.low conf.high
  <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)             0.264  0.480        -2.77 5.56e- 3    0.101     0.666
2 log_AtRisk              1.09   0.0392        2.28 2.28e- 2    1.01      1.18 
3 GM_Rate                 1.03   0.0128        1.93 5.31e- 2    1.00      1.05 
4 Frgn_HH_R               1.05   0.0260        1.91 5.63e- 2    1.00      1.11 
5 NonHous_R               1.05   0.00689       7.36 1.89e-13    1.04      1.07 
6 Youth_R                 0.971  0.0172       -1.73 8.33e- 2    0.938     1.00 
7 SINGLE_HH_:Elderly_R    0.999  0.000446     -1.70 8.87e- 2    0.998     1.00 
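
The estimate column in the tibble above is on the odds-ratio scale, that is, the exponentiated logit coefficient from the glm summary. The short check below reproduces two of those values from the printed coefficients and standard errors. The Wald intervals computed here are an approximation chosen for illustration; the printed conf.low and conf.high may come from a different method (for example, profile likelihood), so small differences are possible.

```r
# Coefficients and standard errors copied from the Magnitude model summary.
est <- c(log_AtRisk = 0.0892443, NonHous_R = 0.0506738)
se  <- c(log_AtRisk = 0.0391974, NonHous_R = 0.0068887)

# Odds ratio = exp(logit coefficient).
odds_ratio <- exp(est)

# Approximate 95% Wald interval, computed on the logit scale and then exponentiated.
z_crit  <- qnorm(0.975)
wald_ci <- exp(cbind(lower = est - z_crit * se, upper = est + z_crit * se))

round(cbind(odds_ratio, wald_ci), 2)
```

Read this way, a one-unit increase in NonHous_R multiplies the odds of the Magnitude outcome by about 1.05, holding the other predictors fixed.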

Call:
glm(formula = Repeatability ~ GM_Rate * SINGLE_HH_ + Frgn_HH_R + 
    NonHous_R + Elderly_R, family = binomial(link = "logit"), 
    data = repeatability_data)

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)  
(Intercept)         0.6757322  1.4214470   0.475   0.6345  
GM_Rate            -0.0385913  0.0407222  -0.948   0.3433  
SINGLE_HH_         -0.0313933  0.0325346  -0.965   0.3346  
Frgn_HH_R          -0.0577300  0.0293362  -1.968   0.0491 *
NonHous_R           0.0143303  0.0076996   1.861   0.0627 .
Elderly_R           0.0421844  0.0389746   1.082   0.2791  
GM_Rate:SINGLE_HH_  0.0008805  0.0008798   1.001   0.3169  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 313.72  on 234  degrees of freedom
Residual deviance: 299.68  on 228  degrees of freedom
AIC: 313.68

Number of Fisher Scoring iterations: 4
# A tibble: 7 × 7
  term               estimate std.error statistic p.value conf.low conf.high
  <chr>                 <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 (Intercept)           1.97   1.42         0.475  0.635     0.125    34.4  
2 GM_Rate               0.962  0.0407      -0.948  0.343     0.884     1.04 
3 SINGLE_HH_            0.969  0.0325      -0.965  0.335     0.907     1.03 
4 Frgn_HH_R             0.944  0.0293      -1.97   0.0491    0.887     0.997
5 NonHous_R             1.01   0.00770      1.86   0.0627    0.999     1.03 
6 Elderly_R             1.04   0.0390       1.08   0.279     0.967     1.13 
7 GM_Rate:SINGLE_HH_    1.00   0.000880     1.00   0.317     0.999     1.00 

      5. Very low or no reported damage            1. High damage and recurrent 
                                     25                                     144 
        2. High damage but concentrated 3. Moderate or low damage but recurrent 
                                     91                                     121 
             4. Low damage and episodic 
                                     45 
Call:
nnet::multinom(formula = cluster_label ~ log_AtRisk + GM_Rate + 
    Frgn_HH_R + NonHous_R + Youth_R + single_elderly_proxy, data = multi_data, 
    trace = FALSE)

Coefficients:
                                        (Intercept) log_AtRisk     GM_Rate
1. High damage and recurrent              -2.839474 0.16093809  0.01026523
2. High damage but concentrated           -2.701559 0.08678431  0.01470461
3. Moderate or low damage but recurrent   -1.320426 0.05596986 -0.02523799
4. Low damage and episodic                -2.897001 0.02203776  0.00930180
                                           Frgn_HH_R NonHous_R    Youth_R
1. High damage and recurrent            -0.006688156 0.2487036 0.03912836
2. High damage but concentrated          0.057193670 0.2359755 0.05182756
3. Moderate or low damage but recurrent -0.023372992 0.2023860 0.09319679
4. Low damage and episodic              -0.022069328 0.1879522 0.04352842
                                        single_elderly_proxy
1. High damage and recurrent                   -0.0004407848
2. High damage but concentrated                -0.0007682506
3. Moderate or low damage but recurrent        -0.0002420736
4. Low damage and episodic                      0.0015052847

Std. Errors:
                                        (Intercept) log_AtRisk    GM_Rate
1. High damage and recurrent              0.4305650 0.07321919 0.02593154
2. High damage but concentrated           0.4433316 0.07293323 0.02654979
3. Moderate or low damage but recurrent   0.3802653 0.06435673 0.02471549
4. Low damage and episodic                0.5229202 0.07418008 0.02692299
                                         Frgn_HH_R  NonHous_R    Youth_R
1. High damage and recurrent            0.09458413 0.05918477 0.04290933
2. High damage but concentrated         0.09365739 0.05926535 0.04306223
3. Moderate or low damage but recurrent 0.09459085 0.05899376 0.04196476
4. Low damage and episodic              0.09681799 0.05940105 0.04315988
                                        single_elderly_proxy
1. High damage and recurrent                    0.0008034619
2. High damage but concentrated                 0.0008505752
3. Moderate or low damage but recurrent         0.0007421762
4. Low damage and episodic                      0.0007026179

Residual Deviance: 1077.068 
AIC: 1133.068 
# A tibble: 28 × 8
   y.level        term  estimate std.error statistic  p.value conf.low conf.high
   <chr>          <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
 1 1. High damag… (Int…   0.0585  0.431      -6.59   4.26e-11   0.0251     0.136
 2 1. High damag… log_…   1.17    0.0732      2.20   2.79e- 2   1.02       1.36 
 3 1. High damag… GM_R…   1.01    0.0259      0.396  6.92e- 1   0.960      1.06 
 4 1. High damag… Frgn…   0.993   0.0946     -0.0707 9.44e- 1   0.825      1.20 
 5 1. High damag… NonH…   1.28    0.0592      4.20   2.64e- 5   1.14       1.44 
 6 1. High damag… Yout…   1.04    0.0429      0.912  3.62e- 1   0.956      1.13 
 7 1. High damag… sing…   1.00    0.000803   -0.549  5.83e- 1   0.998      1.00 
 8 2. High damag… (Int…   0.0671  0.443      -6.09   1.10e- 9   0.0281     0.160
 9 2. High damag… log_…   1.09    0.0729      1.19   2.34e- 1   0.945      1.26 
10 2. High damag… GM_R…   1.01    0.0265      0.554  5.80e- 1   0.963      1.07 
# ℹ 18 more rows
Multinomial logistic regression results predicting flood-damage cluster membership

Outcome                                  Predictor                    Odds ratio  95% CI        p-value
1. High damage and recurrent             Log at-risk households       1.17*       [1.02, 1.36]  0.028
                                         Gross migration rate         1.01        [0.96, 1.06]  0.692
                                         Foreign household share      0.99        [0.83, 1.20]  0.944
                                         Non-housing residence share  1.28***     [1.14, 1.44]  <0.001
                                         Youth population share       1.04        [0.96, 1.13]  0.362
                                         Single elderly proxy         1.00        [1.00, 1.00]  0.583
2. High damage but concentrated          Log at-risk households       1.09        [0.95, 1.26]  0.234
                                         Gross migration rate         1.01        [0.96, 1.07]  0.580
                                         Foreign household share      1.06        [0.88, 1.27]  0.541
                                         Non-housing residence share  1.27***     [1.13, 1.42]  <0.001
                                         Youth population share       1.05        [0.97, 1.15]  0.229
                                         Single elderly proxy         1.00        [1.00, 1.00]  0.366
3. Moderate or low damage but recurrent  Log at-risk households       1.06        [0.93, 1.20]  0.384
                                         Gross migration rate         0.98        [0.93, 1.02]  0.307
                                         Foreign household share      0.98        [0.81, 1.18]  0.805
                                         Non-housing residence share  1.22***     [1.09, 1.37]  <0.001
                                         Youth population share       1.10*       [1.01, 1.19]  0.026
                                         Single elderly proxy         1.00        [1.00, 1.00]  0.744
4. Low damage and episodic               Log at-risk households       1.02        [0.88, 1.18]  0.766
                                         Gross migration rate         1.01        [0.96, 1.06]  0.730
                                         Foreign household share      0.98        [0.81, 1.18]  0.820
                                         Non-housing residence share  1.21**      [1.07, 1.36]  0.002
                                         Youth population share       1.04        [0.96, 1.14]  0.313
                                         Single elderly proxy         1.00*       [1.00, 1.00]  0.032

Note: Reference category is the first level of the outcome factor (5. Very low or no reported damage). Values are odds ratios. 95% confidence intervals are shown in brackets. † p < 0.10, * p < 0.05, ** p < 0.01, *** p < 0.001.
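
The multinom summary reports only coefficients and standard errors, so the p-values in the table have to be derived. A common approach, assumed here rather than confirmed by the source, is a two-sided Wald z-test. The check below uses the log_AtRisk coefficient for the "1. High damage and recurrent" outcome, copied from the output above.

```r
# Coefficient and standard error copied from the multinom output
# (outcome "1. High damage and recurrent", predictor log_AtRisk).
coef_est <- 0.16093809
std_err  <- 0.07321919

# Wald statistic and two-sided p-value under a standard normal reference.
z_stat  <- coef_est / std_err
p_value <- 2 * pnorm(-abs(z_stat))

# Odds ratio relative to the reference cluster.
odds_ratio <- exp(coef_est)

round(c(z = z_stat, p = p_value, odds_ratio = odds_ratio), 3)
```

Each odds ratio compares membership in the named cluster against the reference cluster, so the same predictor can have a different ratio for each of the four non-reference outcomes.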