6  Toxic Homes: Exploring Mold Exposure Complaint and Domestic Violence Report Trends in NYC

Author

Shannon Joyce

6.1 Toxic Homes: Exploring Mold Exposure Complaint and Domestic Violence Report Trends in NYC

Housing conditions are an important factor in public health and well-being. Poor residential conditions, such as mold, can contribute to physical health problems and increased stress. Domestic violence is also influenced by environmental and social stressors, which makes housing conditions a relevant area of study.

This project examines the relationship between residential mold complaints and domestic violence (DV) reports in New York City from 2010 to 2024. I use two datasets from NYC Open Data: 311 Service Requests from 2010 to Present, from which I extract residential mold complaints, and NYPD Complaint Data Historic, from which I extract domestic violence reports. I explore whether these two types of reports follow similar patterns over time. The goal is not to determine causation, but to understand whether mold complaints and DV reports tend to rise and fall together.

The analysis focuses on monthly aggregated data and includes exploratory summaries, correlation analyses, and regression models. I also explore delayed (or lagged) relationships and mold complaint resolution time to better understand how timing may play a role.

6.2 Loading, Prepping, Cleaning, & Aggregating

6.2.1 Data Preparation & Cleaning

library(tidyverse)
library(readxl)
library(ggplot2)
library(mosaic)
library(AICcmodavg)
library(knitr)
library(arrow)
mold_data <- read_parquet("311_Service_Requests_from_2010_to_Present_20251215.parquet")
dv_data <- read_parquet("NYPD_Complaint_Data_Historic_20251218.parquet")
# 1. Lookup table mapping 311 City values to boroughs
# (joining on this table replaces a long case_when and is easier to maintain)
city_lookup <- data.frame(
  City = c("NEW YORK", "BROOKLYN", "ARVERNE", "BRONX", "JAMAICA", "STATEN ISLAND", 
           "SPRINGFIELD GARDENS", "FLUSHING", "RICHMOND HILL", "ASTORIA", "HOLLIS", 
           "RIDGEWOOD", "FOREST HILLS", "ELMHURST", "MASPETH", "SOUTH RICHMOND HILL", 
           "JACKSON HEIGHTS", "BAYSIDE", "FAR ROCKAWAY", "SAINT ALBANS", "CORONA", 
           "WOODSIDE", "QUEENS VILLAGE", "REGO PARK", "ROSEDALE", "SUNNYSIDE", 
           "OZONE PARK", "EAST ELMHURST", "MIDDLE VILLAGE", "WOODHAVEN", 
           "SOUTH OZONE PARK", "ROCKAWAY PARK", "KEW GARDENS", "FRESH MEADOWS", 
           "COLLEGE POINT", "LONG ISLAND CITY", "OAKLAND GARDENS", "WHITESTONE", 
           "HOWARD BEACH", "CAMBRIA HEIGHTS", "BELLEROSE", "LITTLE NECK", 
           "BREEZY POINT", "GLEN OAKS", "FLORAL PARK", "PELHAM", "NEW HYDE PARK", 
           "QUEENS", "MANHATTAN"),
  Borough_Mapped = c("MANHATTAN", "BROOKLYN", "QUEENS", "BRONX", "QUEENS", "STATEN ISLAND", 
                     "QUEENS", "QUEENS", "QUEENS", "QUEENS", "QUEENS", 
                     "QUEENS", "QUEENS", "QUEENS", "QUEENS", "QUEENS", 
                     "QUEENS", "QUEENS", "QUEENS", "QUEENS", "QUEENS", 
                     "QUEENS", "QUEENS", "QUEENS", "QUEENS", "QUEENS", 
                     "QUEENS", "QUEENS", "QUEENS", "QUEENS", 
                     "QUEENS", "QUEENS", "QUEENS", "QUEENS", 
                     "QUEENS", "QUEENS", "QUEENS", "QUEENS", 
                     "QUEENS", "QUEENS", "QUEENS", "QUEENS", 
                     "QUEENS", "QUEENS", "QUEENS", "BRONX", "QUEENS", 
                     "QUEENS", "MANHATTAN")
)

# 2. Process Mold Data
mold_data_clean <- mold_data %>%
  # Filter Location Types efficiently
  filter(`Location Type` %in% c("RESIDENTIAL BUILDING", "Residential Building", "Loft Residence", 
                                "Mixed Use Building", "Apartment", "3+ Family Apartment Building", 
                                "1-2 Family Dwelling", "1-2 Family Mixed Use Building", 
                                "3+ Family Mixed Use Building", "Single Room Occupancy (SRO)")) %>%
  filter(Descriptor != "Unsafe Mold Cleanup") %>%
  mutate(
    created_date = as.Date(`Created Date`),
    closed_date  = as.Date(`Closed Date`),
    # Force Year to numeric to prevent Join errors
    Year  = as.numeric(lubridate::year(created_date)),
    # Clean Month formatting
    Month = sprintf("%02d - %s", 
                    lubridate::month(created_date), 
                    lubridate::month(created_date, label = TRUE, abbr = FALSE)),
    resolution_days = as.numeric(closed_date - created_date),
    `Complaint Type` = case_when(
      `Complaint Type` %in% c("UNSANITARY CONDITION", "Unsanitary Condition", "MOLD", "GENERAL", "GENERAL CONSTRUCTION") ~ "Mold",
      TRUE ~ `Complaint Type`
    )
  ) %>%
  # Use the lookup table for City -> Borough
  left_join(city_lookup, by = "City") %>% 
  mutate(Borough = coalesce(Borough_Mapped, "UNKNOWN")) %>%
  filter(Borough != "UNKNOWN", Year != 2025)

# 3. Process DV Data
dv_data_clean <- dv_data %>% 
  rename(
    "complaint_number" = `CMPLNT_NUM`,
    "inc_occur_date" = `CMPLNT_FR_DT`,
    "inc_occur_time" = `CMPLNT_FR_TM`,
    "inc_end_date" = `CMPLNT_TO_DT`,
    "inc_end_time" = `CMPLNT_TO_TM`,
    "precinct_occur" = `ADDR_PCT_CD`,
    "report_date" = `RPT_DT`,
    "key_code" = `KY_CD`,
    "offense_type" = `OFNS_DESC`,
    "class_code" = `PD_CD`,
    "class_code_desc" = `PD_DESC`,
    "attempt_completion" = `CRM_ATPT_CPTD_CD`,
    "offense_level" = `LAW_CAT_CD`,
    "borough" = `BORO_NM`,
    "suspect_age" = `SUSP_AGE_GROUP`,
    "suspect_race" = `SUSP_RACE`,
    "suspect_sex" = `SUSP_SEX`,
    "victim_age" = `VIC_AGE_GROUP`,
    "victim_race" = `VIC_RACE`,
    "victim_sex" = `VIC_SEX`
  ) %>%
  mutate(
    inc_date_converted = as.Date(inc_occur_date),
    # Force Year to numeric to match Mold data
    Year  = as.numeric(lubridate::year(inc_date_converted)),
    Month = sprintf("%02d - %s", 
                    lubridate::month(inc_date_converted), 
                    lubridate::month(inc_date_converted, label = TRUE, abbr = FALSE)),
    complaint = "DV"
  ) %>%
  filter(Year >= 2010 & Year < 2025)

mold_data_clean <- mold_data_clean %>% mutate(Year = as.character(Year))
dv_data_clean   <- dv_data_clean   %>% mutate(Year = as.character(Year))

# 4. Final memory cleanup (optional but helpful for cloud publishing)
rm(mold_data, dv_data)
kable(head(mold_data_clean, 3),
      caption = "Preview of cleaned 311 residential mold complaints (first 3 rows) after filtering to residential location types and creating date/resolution fields.")
Table 6.1: Preview of cleaned 311 residential mold complaints (first 3 rows) after filtering to residential location types and creating date/resolution fields.
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location created_date closed_date Year Month resolution_days Borough_Mapped
63573695 2024-12-31T23:06:14.000 2025-01-19T11:58:36.000 HPD Department of Housing Preservation and Development Mold MOLD RESIDENTIAL BUILDING 10451 283 EAST 149 STREET EAST 149 STREET NA NA NA NA ADDRESS BRONX NA NA Closed NA HPD conducted an inspection of this complaint. The conditions observed by the inspector did not violate the housing laws enforced by HPD. The complaint has been closed. 2025-01-19T00:00:00.000 01 BRONX 2023310072 BRONX 1005910 236991 PHONE Unspecified BRONX NA NA NA NA NA NA NA 40.81713 -73.92175 (40.81713468822815, -73.9217467475281) 2024-12-31 2025-01-19 2024 12 - December 19 BRONX
63583460 2024-12-31T21:19:39.000 2025-01-03T16:24:36.000 HPD Department of Housing Preservation and Development Mold MOLD RESIDENTIAL BUILDING 10468 2719 MORRIS AVENUE MORRIS AVENUE NA NA NA NA ADDRESS BRONX NA NA Closed NA HPD inspected this condition so the complaint has been closed. Violations were issued. The law provides the property owner time to correct the condition(s). Violation descriptions and the dates for the property owner to correct any violations are available at HPDONLINE. If the owner has not corrected the condition by the date provided, you may wish to bring a case in housing court seeking the correction of these conditions.To find out more about how to start a housing court case, visit HPD’s w 2025-01-03T00:00:00.000 07 BRONX 2033170043 BRONX 1013153 255563 PHONE Unspecified BRONX NA NA NA NA NA NA NA 40.86809 -73.89550 (40.86808863174485, -73.89549921281306) 2024-12-31 2025-01-03 2024 12 - December 3 BRONX
63583408 2024-12-31T20:55:28.000 2025-01-10T16:55:45.000 HPD Department of Housing Preservation and Development Mold MOLD RESIDENTIAL BUILDING 11435 85-15 139 STREET 139 STREET NA NA NA NA ADDRESS JAMAICA NA NA Closed NA HPD conducted an inspection of this complaint. The conditions observed by the inspector did not violate the housing laws enforced by HPD. The complaint has been closed. 2025-01-10T00:00:00.000 08 QUEENS 4097100002 QUEENS 1034989 197495 PHONE Unspecified QUEENS NA NA NA NA NA NA NA 40.70861 -73.81699 (40.7086094282564, -73.8169884642595) 2024-12-31 2025-01-10 2024 12 - December 10 QUEENS
kable(head(dv_data_clean, 5),
      caption = "Preview of cleaned NYPD complaint records used to identify domestic violence reports (first 5 rows) after renaming fields and extracting date components.")
Table 6.2: Preview of cleaned NYPD complaint records used to identify domestic violence reports (first 5 rows) after renaming fields and extracting date components.
complaint_number inc_occur_date inc_occur_time inc_end_date inc_end_time precinct_occur report_date key_code offense_type class_code class_code_desc attempt_completion offense_level borough LOC_OF_OCCUR_DESC PREM_TYP_DESC JURIS_DESC JURISDICTION_CODE PARKS_NM HADEVELOPT HOUSING_PSA X_COORD_CD Y_COORD_CD suspect_age suspect_race suspect_sex TRANSIT_DISTRICT Latitude Longitude Lat_Lon PATROL_BORO STATION_NAME victim_age victim_race victim_sex inc_date_converted Year Month complaint
298690828 2024-12-31T00:00:00.000 13:00:00 2024-12-31T00:00:00.000 13:10:00 113 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE-HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1046104 187464 <18 BLACK F NA 40.68101 -73.77699 (40.681014, -73.776991) PATROL BORO QUEENS SOUTH (null) <18 BLACK M 2024-12-31 2024 12 - December DV
298698016 2024-12-31T00:00:00.000 08:00:00 2024-12-31T00:00:00.000 09:00:00 116 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE-HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1048028 178970 25-44 BLACK M NA 40.65769 -73.77013 (40.657687, -73.770132) PATROL BORO QUEENS SOUTH (null) 18-24 WHITE HISPANIC F 2024-12-31 2024 12 - December DV
298704508 2024-12-31T00:00:00.000 16:50:00 2024-12-31T00:00:00.000 16:56:00 107 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE - APT. HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1050645 203097 UNKNOWN BLACK F NA 40.72389 -73.76046 (40.723891, -73.760464) PATROL BORO QUEENS SOUTH (null) 65+ WHITE M 2024-12-31 2024 12 - December DV
298678676 2024-12-31T00:00:00.000 07:00:00 2024-12-31T00:00:00.000 07:30:00 113 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE-HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1051478 189936 25-44 BLACK M NA 40.68776 -73.75759 (40.687762, -73.757589) PATROL BORO QUEENS SOUTH (null) 45-64 BLACK M 2024-12-31 2024 12 - December DV
298672417 2024-12-31T00:00:00.000 02:50:00 2024-12-31T00:00:00.000 02:55:00 101 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE-HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1054075 157436 45-64 BLACK M NA 40.59854 -73.74856 (40.598536, -73.74856) PATROL BORO QUEENS SOUTH (null) 45-64 BLACK F 2024-12-31 2024 12 - December DV

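This chunk does the heavy cleaning of two large datasets. For the mold data, I keep only residential location types, drop "Unsafe Mold Cleanup" complaints, and standardize the Complaint Type values. I then derive each complaint's borough from the City column using the lookup table. This replaces the original Borough value; rows whose City is not in the table are labeled UNKNOWN and dropped, along with 2025 records. For the NYPD data, I rename the columns to readable names and keep 2010 to 2024. In both datasets I extract Year and Month from the date fields so they can be used individually, and for mold complaints I also compute resolution time in days.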
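The City-to-Borough step is a lookup join. Below is a minimal base R sketch of the same idea, run on a few hypothetical rows. The `complaints` data frame and its values are illustrative, not taken from the 311 data. The sketch shows that matching is exact: a City spelled or capitalized differently from the lookup table ends up as UNKNOWN and is dropped. `dplyr::left_join()` matches keys exactly in the same way. This makes unmatched City values worth checking whenever a borough's counts fall off suddenly.

```r
# Hypothetical lookup table: a few City values from the chapter's city_lookup
city_lookup <- data.frame(
  City           = c("NEW YORK", "BROOKLYN", "JAMAICA", "PELHAM"),
  Borough_Mapped = c("MANHATTAN", "BROOKLYN", "QUEENS", "BRONX"),
  stringsAsFactors = FALSE
)

# Hypothetical complaints; "Jamaica" (mixed case) and "YONKERS" are not in the table
complaints <- data.frame(
  City = c("JAMAICA", "NEW YORK", "Jamaica", "YONKERS"),
  stringsAsFactors = FALSE
)

# match() returns each City's row in the lookup table, or NA if there is no exact match
idx <- match(complaints$City, city_lookup$City)
complaints$Borough <- city_lookup$Borough_Mapped[idx]

# Same effect as coalesce(Borough_Mapped, "UNKNOWN")
complaints$Borough[is.na(complaints$Borough)] <- "UNKNOWN"

# Drop unmatched rows, as filter(Borough != "UNKNOWN") does
kept <- complaints[complaints$Borough != "UNKNOWN", ]
print(kept)
```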

6.2.2 Aggregating Mold Data & DV Data

aggregated_dv_data <- dv_data_clean %>%
  group_by(Year, Month, borough) %>%
  summarise(
    `complaint` = n(),
    .groups = "drop"
  )
aggregated_dv_data <- aggregated_dv_data %>%
  filter(borough != "(null)")

aggregated_dv_data <- aggregated_dv_data %>%
  mutate(Year = as.character(Year))

aggregated_dv_data<- aggregated_dv_data %>% rename(
  "Borough" = "borough")

aggregated_dv_data<- aggregated_dv_data %>% rename(
  "DV Reports" = "complaint"
)

aggregated_mold_data <- mold_data_clean %>%
  group_by(Year, Month, Borough) %>%
  summarise(
    `Complaint Type` = n(),
    .groups = "drop"
  )

aggregated_mold_data<- aggregated_mold_data %>% rename(
  "Mold Complaints" = "Complaint Type"
)

aggregated_dv_mold_data <- left_join(
  aggregated_dv_data,
  aggregated_mold_data,
  by = c("Borough", "Year", "Month")
)

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  mutate(
    year_month = paste(Year, Month)
  )

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  group_by(Borough) %>%
  mutate(time_index = row_number()) %>%
  ungroup()

kable(head(aggregated_dv_mold_data, 15),
      caption = "Monthly borough-level dataset combining domestic violence report counts and residential mold complaint counts (first 15 rows).")
Table 6.3: Monthly borough-level dataset combining domestic violence report counts and residential mold complaint counts (first 15 rows).
Year Month Borough DV Reports Mold Complaints year_month time_index
2010 01 - January BRONX 910 954 2010 01 - January 1
2010 01 - January BROOKLYN 1306 779 2010 01 - January 1
2010 01 - January MANHATTAN 541 410 2010 01 - January 1
2010 01 - January QUEENS 791 315 2010 01 - January 1
2010 01 - January STATEN ISLAND 154 58 2010 01 - January 1
2010 02 - February BRONX 818 738 2010 02 - February 2
2010 02 - February BROOKLYN 938 651 2010 02 - February 2
2010 02 - February MANHATTAN 424 338 2010 02 - February 2
2010 02 - February QUEENS 603 273 2010 02 - February 2
2010 02 - February STATEN ISLAND 151 29 2010 02 - February 2
2010 03 - March BRONX 969 941 2010 03 - March 3
2010 03 - March BROOKLYN 1149 870 2010 03 - March 3
2010 03 - March MANHATTAN 500 415 2010 03 - March 3
2010 03 - March QUEENS 758 395 2010 03 - March 3
2010 03 - March STATEN ISLAND 147 55 2010 03 - March 3

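In this chunk, I count DV reports and mold complaints by year, month, and borough. I then left-join the mold counts onto the DV counts, so each row is one borough-month: 900 rows in total (5 boroughs × 180 months). Borough-months that have DV reports but no mold complaints get NA for Mold Complaints. I also add a year_month label and a time_index that numbers the months from 1 to 180 within each borough, which makes trends easy to analyze.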
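Below is a minimal base R sketch of the same count-then-join logic, using a handful of hypothetical records. The `dv` and `mold` data frames are illustrative. `aggregate()` counts rows per Year/Month/Borough, and `merge(..., all.x = TRUE)` keeps every DV row. Where a borough-month has no mold complaints, the result is NA rather than 0.

```r
# Hypothetical record-level data: one row per DV report or mold complaint
dv <- data.frame(
  Year    = c("2014", "2014", "2014", "2014"),
  Month   = c("01 - January", "01 - January", "01 - January", "02 - February"),
  Borough = c("QUEENS", "QUEENS", "BRONX", "QUEENS"),
  stringsAsFactors = FALSE
)
mold <- data.frame(
  Year    = c("2014", "2014"),
  Month   = c("01 - January", "01 - January"),
  Borough = c("BRONX", "BRONX"),
  stringsAsFactors = FALSE
)

keys <- c("Year", "Month", "Borough")

# Count rows per key combination, like group_by() + summarise(n())
dv_counts   <- aggregate(list(DV_Reports = rep(1, nrow(dv))), by = dv[keys], FUN = sum)
mold_counts <- aggregate(list(Mold_Complaints = rep(1, nrow(mold))), by = mold[keys], FUN = sum)

# Keep every DV row, like left_join(); unmatched borough-months get NA, not 0
combined <- merge(dv_counts, mold_counts, by = keys, all.x = TRUE)
print(combined)
```

In the real data, these NA rows are what `lm()` later reports as observations "deleted due to missingness".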

6.3 Exploring the Data

6.3.1 Domestic Violence Data

6.3.1.1 Summary Stats

dv_summary <- aggregated_dv_data %>%
  summarise(
    total_reports = sum(`DV Reports`),
    start_year = min(as.numeric(Year)),
    end_year = max(as.numeric(Year)),
    boroughs = n_distinct(Borough),
    avg_monthly = mean(`DV Reports`)
  )

kable(dv_summary,
      caption = "Summary of domestic violence reports in NYC (2010–2024), aggregated from NYPD complaint data.")
Table 6.4: Summary of domestic violence reports in NYC (2010–2024), aggregated from NYPD complaint data.
total_reports start_year end_year boroughs avg_monthly
669136 2010 2024 5 743.4844

From January 2010 through December 2024, there were a total of 669,136 domestic violence reports across all five boroughs of New York City. Each row of the aggregated data is one borough-month, so avg_monthly is an average per borough per month: roughly 743 reports. Citywide, that works out to about 3,717 reports per month.
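The difference between the per-borough and citywide monthly figures is plain arithmetic on the Table 6.4 total, with 180 months for each of the five boroughs:

```r
total_dv   <- 669136   # total_reports from Table 6.4
n_months   <- 15 * 12  # January 2010 through December 2024
n_boroughs <- 5

per_borough_month  <- total_dv / (n_months * n_boroughs)
citywide_per_month <- total_dv / n_months

round(per_borough_month, 2)   # 743.48, matching avg_monthly
round(citywide_per_month, 2)  # 3717.42
```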

6.3.1.2 Borough/Year Distribution

dv_by_year_borough <- aggregated_dv_data %>%
  group_by(Year, Borough) %>%
  summarise(total_reports = sum(`DV Reports`)) %>%
  pivot_wider(names_from = Year,
    values_from = total_reports)

kable(dv_by_year_borough,
      caption = "Total domestic violence reports by borough and year (2010–2024).")
Table 6.5: Total domestic violence reports by borough and year (2010–2024).
Borough 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
BRONX 11274 11441 12207 12630 12924 12594 12916 12569 13662 12943 12013 12499 14193 14149 14905
BROOKLYN 14094 14344 15232 15032 15022 14101 13771 13052 13039 12250 11256 12362 13286 13180 13652
MANHATTAN 6081 6128 6471 6409 6627 6555 6791 6586 6550 6864 6117 7160 7194 6880 7215
QUEENS 8726 9182 9128 9425 9467 8906 8697 8733 9047 9324 8778 9368 10205 10484 12071
STATEN ISLAND 2015 2056 2414 2242 2364 2178 2112 2036 2048 1868 1687 1871 2099 2126 2259

From 2010 to 2024, Brooklyn and the Bronx consistently had the most domestic violence reports in NYC. Brooklyn had the most reports every year from 2010 through 2017. From 2018 onward, the Bronx has had the most each year. Staten Island had the fewest reports every year.

Overall, domestic violence reports rose from 2010 to 2024 in most boroughs, though not steadily: every borough had fewer reports in 2020 than in 2019. Brooklyn is the only borough with fewer reports in 2024 than in 2010. (This could be something interesting to look into!)
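Both claims can be checked directly against Table 6.5. This sketch copies four of the table's columns (2010, 2019, 2020, 2024) into R. It computes each borough's percent change from 2010 to 2024 and checks whether every borough dipped in 2020:

```r
boroughs <- c("BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND")

# Yearly DV report totals copied from Table 6.5
dv_2010 <- setNames(c(11274, 14094, 6081, 8726, 2015), boroughs)
dv_2019 <- setNames(c(12943, 12250, 6864, 9324, 1868), boroughs)
dv_2020 <- setNames(c(12013, 11256, 6117, 8778, 1687), boroughs)
dv_2024 <- setNames(c(14905, 13652, 7215, 12071, 2259), boroughs)

# Percent change from 2010 to 2024 for each borough
pct_change <- round(100 * (dv_2024 - dv_2010) / dv_2010, 1)
print(pct_change)  # BRONX 32.2, BROOKLYN -3.1, MANHATTAN 18.6, QUEENS 38.3, STATEN ISLAND 12.1

# Did every borough have fewer reports in 2020 than in 2019?
all(dv_2020 < dv_2019)  # TRUE
```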

6.3.1.3 Heat Map

library(ggplot2)

dv_plot_data <- aggregated_dv_data %>%
  group_by(Year, Borough) %>%
  summarise(total_reports = sum(`DV Reports`))

ggplot(dv_plot_data, aes(x = Year, y = Borough, fill = total_reports)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkred") +
  labs(
    title = "DV Reports by Borough and Year",
    x = "Year",
    y = "Borough",
    fill = "DV Reports"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))
Heat map showing yearly domestic violence report counts for each NYC borough from 2010 to 2024. Color intensity increases as report totals increase.
Figure 6.1: Domestic violence reports by borough and year (2010–2024).

Above is a heat map of domestic violence reports from 2010 to 2024. It reflects what the previous table showed: the Bronx and Brooklyn tend to have higher report volumes, and Staten Island stays well below the other boroughs.

6.3.2 Mold Exposure Data

6.3.3 Summary Stats

mold_summary <- aggregated_mold_data %>%
  summarise(
    total_complaints = sum(`Mold Complaints`),
    start_year = min(as.numeric(Year)),
    end_year = max(as.numeric(Year)),
    boroughs = n_distinct(Borough),
    avg_monthly = mean(`Mold Complaints`)
  )

kable(mold_summary,
      caption = "Summary of residential mold complaints in NYC (2010–2024), aggregated from 311 service requests.")
Table 6.6: Summary of residential mold complaints in NYC (2010–2024), aggregated from 311 service requests.
total_complaints start_year end_year boroughs avg_monthly
361603 2010 2024 5 445.3239

From 2010 to 2024, there were a total of 361,603 residential mold complaints to 311 across all five boroughs of New York City. As with the DV data, avg_monthly is an average per borough-month: about 445 complaints. Unlike the DV data, it is computed over only the borough-months that had at least one mold complaint.
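Because avg_monthly is the total divided by the number of rows, the two values in Table 6.6 also reveal how many borough-months have any mold data at all. A quick sketch:

```r
total_mold  <- 361603    # total_complaints from Table 6.6
avg_monthly <- 445.3239  # avg_monthly from Table 6.6

# avg_monthly = total / number of rows, so the row count can be recovered
n_rows <- round(total_mold / avg_monthly)
n_rows        # 812 borough-months with at least one mold complaint
900 - n_rows  # 88 of the 900 borough-months (5 boroughs x 180 months) have none
```

Those 88 borough-months without mold data match the 88 observations that `lm()` drops in the regression models.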

6.3.4 Borough/Year Distributions

mold_by_year_borough <- aggregated_mold_data %>%
  group_by(Year, Borough) %>%
  summarise(total_complaints = sum(`Mold Complaints`)) %>%
  pivot_wider(names_from = Year,
    values_from = total_complaints)

kable(mold_by_year_borough,
      caption = "Total residential mold complaints by borough and year (2010–2024).")
Table 6.7: Total residential mold complaints by borough and year (2010–2024).
Borough 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
BRONX 7973 8726 7384 7778 8295 8434 7946 7510 9510 6460 5953 9378 9513 11438 12847
BROOKLYN 8244 10404 8373 9299 9280 9074 7700 7908 9747 6360 5397 8199 8265 10389 11092
MANHATTAN 3977 4403 3683 3840 4787 5062 4395 4031 4942 3119 2985 5027 5347 6660 6757
QUEENS 3258 3907 2414 2 NA 2 3 1 3 12 1 872 3347 4229 4736
STATEN ISLAND 615 852 730 757 747 688 720 691 864 529 504 777 721 960 770

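From 2010 to 2024, Brooklyn and the Bronx had the most residential mold complaints and traded first place. Brooklyn led in most years through 2018, and the Bronx led in 2016 and in every year from 2019 on. Staten Island had the fewest complaints in most years. The exception is 2013 through 2020, when Queens recorded almost no complaints, and none at all in 2014. Queens went from 2,414 complaints in 2012 to 2 in 2013. A drop that sharp most likely reflects a gap in how Queens complaints were recorded or mapped to a borough, not a real change in housing conditions. It is worth keeping in mind for everything that follows.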

Across all 5 boroughs, there are more mold complaints in 2024 than there were in 2010, with the overall trend being an increase in 311 complaints for residential mold.
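One simple way to flag gaps like the Queens one is to compare each year against the borough's own largest yearly total. Below is a sketch using the Queens row of Table 6.7. The 10% cutoff is an arbitrary, illustrative threshold and is not used elsewhere in this analysis.

```r
# Queens row of Table 6.7 (NA = no complaints recorded that year)
queens <- c(`2010` = 3258, `2011` = 3907, `2012` = 2414, `2013` = 2,
            `2014` = NA,   `2015` = 2,    `2016` = 3,    `2017` = 1,
            `2018` = 3,    `2019` = 12,   `2020` = 1,    `2021` = 872,
            `2022` = 3347, `2023` = 4229, `2024` = 4736)

# Illustrative rule: flag missing years and years below 10% of the borough's peak
threshold <- 0.10 * max(queens, na.rm = TRUE)
flagged <- names(queens)[is.na(queens) | queens < threshold]
flagged  # "2013" "2014" "2015" "2016" "2017" "2018" "2019" "2020"
```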

6.3.5 Heat Map

mold_plot_data <- aggregated_mold_data %>%
  group_by(Year, Borough) %>%
  summarise(total_complaints = sum(`Mold Complaints`))

ggplot(mold_plot_data, aes(x = Year, y = Borough, fill = total_complaints)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "wheat", high = "darkgreen") +
  labs(
    title = "Mold Complaints by Borough and Year",
    x = "Year",
    y = "Borough",
    fill = "Mold Complaints"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))
Heat map showing yearly residential mold complaint totals for each NYC borough from 2010 to 2024. Darker shading indicates higher complaint counts.
Figure 6.2: Residential mold complaints by borough and year (2010–2024).

Above is a heat map of residential mold complaints to 311 from 2010 to 2024. The pattern resembles the domestic violence heat map: the Bronx and Brooklyn have higher complaint volumes, and Staten Island stays low. The Queens row sits at the bottom of the color scale from 2013 to 2020, with no tile at all for 2014, reflecting the near-zero counts in Table 6.7.

6.3.6 Preliminary Correlation

cor.test(aggregated_dv_mold_data$`DV Reports`, 
         aggregated_dv_mold_data$`Mold Complaints`)

    Pearson's product-moment correlation

data:  x and y
t = 42.034, df = 810, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8051174 0.8485104
sample estimates:
      cor 
0.8280499 
  • Strength: r = 0.83 (very strong)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

I ran a Pearson correlation between monthly borough-level DV reports and residential mold complaints to see whether there was a substantial relationship between them, and there is! The relationship is positive and very strong. Borough-months with more DV reports tend to have more mold complaints, and those with fewer DV reports tend to have fewer. In other words, the two variables move together. Note that this correlation pools all five boroughs, so part of it may simply reflect borough size: boroughs with more residents would be expected to produce more of both kinds of report.
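cor.test() converts r into a t statistic with n - 2 degrees of freedom: t = r * sqrt(n - 2) / sqrt(1 - r^2). Plugging in the reported r and df reproduces the t value above, which makes a quick sanity check on the output. This sketch uses only base R (`pt()` is the t distribution function):

```r
# Values reported by cor.test() above
r  <- 0.8280499
df <- 810            # n - 2, with n = 812 borough-months

# Test statistic for Pearson's r: t = r * sqrt(n - 2) / sqrt(1 - r^2)
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
round(t_stat, 3)     # 42.034

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)
p_value < 2.2e-16    # TRUE
```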

6.3.6.0.1 Let’s visualize this:
ggplot(aggregated_dv_mold_data, aes(x = `Mold Complaints`, y = `DV Reports`, color = Borough)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "gray18") +
  labs(title = "DV Reports vs Mold Complaints by Borough",
       x = "Mold Complaints",
       y = "DV Reports") +
  theme_minimal()
Scatter plot of monthly borough-level observations with mold complaints on the x-axis and domestic violence reports on the y-axis. Points are colored by borough and a single linear trend line summarizes the overall relationship.
Figure 6.3: Association between monthly residential mold complaints and domestic violence reports by borough (2010–2024).

However, this only tells us that domestic violence reports and mold complaints co-occur, and it does not tell us anything about causality.

Let’s dive into how domestic violence reports and mold complaints develop over time!
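The lagged analysis below uses an object, dv_mold_lagged, whose construction (including the DV_next_month column) is not shown in this section. Below is a minimal sketch of one plausible construction. It assumes DV_next_month is the DV report count shifted up by one month within each borough, the equivalent of `dplyr::lead()` after `group_by(Borough)`. The six rows are the Bronx and Queens values for January to March 2010 from Table 6.3.

```r
# Bronx and Queens, January to March 2010, from Table 6.3
monthly <- data.frame(
  Borough    = rep(c("BRONX", "QUEENS"), each = 3),
  time_index = rep(1:3, times = 2),
  DV         = c(910, 818, 969, 791, 603, 758),
  Mold       = c(954, 738, 941, 315, 273, 395),
  stringsAsFactors = FALSE
)

# Sort by month within borough, then shift DV up by one row inside each borough
monthly <- monthly[order(monthly$Borough, monthly$time_index), ]
monthly$DV_next_month <- ave(monthly$DV, monthly$Borough,
                             FUN = function(x) c(x[-1], NA))
print(monthly)

# The last month of each borough has no following month, so it drops out
lagged <- monthly[!is.na(monthly$DV_next_month), ]
nrow(lagged)  # 4
```

Losing one row per borough matches the lagged test's degrees of freedom: 805 instead of 810, or five fewer rows for five boroughs.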

6.5 Statistical Analysis

6.5.0.1 Lagged Data Correlation Analysis

cor.test(dv_mold_lagged$DV_next_month, dv_mold_lagged$`Mold Complaints`)

    Pearson's product-moment correlation

data:  x and y
t = 40.376, df = 805, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7940136 0.8397855
sample estimates:
      cor 
0.8181918 

I conducted a Pearson’s correlation test to examine the relationship between mold complaints in one month and DV reports the following month.

  • Strength: r = 0.82 (very strong)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

The results show a very strong positive correlation: months with more mold complaints tend to be followed by months with more domestic violence reports. However, this result is nearly identical to the same-month correlation computed earlier (r = 0.83). It also does not account for other factors that might influence the relationship, such as borough.

To better understand how additional variables (such as borough and average resolution time) affect this association, I conducted regression analyses.

6.6 Regression Models

6.6.0.1 DV ~ Mold

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  mutate(Borough = factor(Borough)) %>%
  arrange(Year, Month)

lm_dv_mold <- lm(`DV Reports` ~ `Mold Complaints`,
  data = aggregated_dv_mold_data)

summary(lm_dv_mold)

Call:
lm(formula = `DV Reports` ~ `Mold Complaints`, data = aggregated_dv_mold_data)

Residuals:
   Min     1Q Median     3Q    Max 
-920.1 -164.7  -41.6  150.6  626.1 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       285.96840   13.20783   21.65   <2e-16 ***
`Mold Complaints`   1.02460    0.02438   42.03   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 214.4 on 810 degrees of freedom
  (88 observations deleted due to missingness)
Multiple R-squared:  0.6857,    Adjusted R-squared:  0.6853 
F-statistic:  1767 on 1 and 810 DF,  p-value: < 2.2e-16
AIC(lm_dv_mold)
[1] 11025.83

This linear regression model tests the association between monthly residential mold complaints and domestic violence reports across all 5 boroughs and all time periods.

  • Strength: strong (R^2 = 0.69)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

  • AIC: 11025.83

Results suggest a strong and statistically significant positive association between mold complaints and DV reports. Each additional mold complaint in a borough-month is associated with about 1.02 more DV reports. The model is fit on the 812 borough-months that have mold data; the other 88 are dropped because Mold Complaints is missing. However, this model does not account for differences across boroughs or for temporal patterns.
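To read the coefficients concretely, the model predicts DV reports as 285.97 + 1.0246 * Mold Complaints. The sketch below evaluates that at 500 complaints, an arbitrary example value. It also confirms that with a single predictor, R^2 is just the square of the Pearson correlation from earlier:

```r
# Coefficients reported by summary(lm_dv_mold)
intercept <- 285.96840
slope     <- 1.02460

# Predicted DV reports for a borough-month with 500 mold complaints
pred_500 <- intercept + slope * 500
round(pred_500, 1)  # 798.3

# With one predictor, R-squared equals the squared Pearson correlation
r <- 0.8280499
round(r^2, 4)       # 0.6857, matching "Multiple R-squared: 0.6857"
```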

6.6.0.2 DV ~ Mold + Borough

lm_borough <- lm(
  `DV Reports` ~ `Mold Complaints` + Borough,
  data = aggregated_dv_mold_data
)

summary(lm_borough)

Call:
lm(formula = `DV Reports` ~ `Mold Complaints` + Borough, data = aggregated_dv_mold_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-257.74  -41.60   -3.30   37.16  396.94 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           933.14621   15.35068  60.789  < 2e-16 ***
`Mold Complaints`       0.19321    0.01945   9.935  < 2e-16 ***
BoroughBROOKLYN        59.11542    9.04790   6.534 1.14e-10 ***
BoroughMANHATTAN     -453.73897   11.13873 -40.735  < 2e-16 ***
BoroughQUEENS        -164.57858   14.30008 -11.509  < 2e-16 ***
BoroughSTATEN ISLAND -770.56770   15.65323 -49.227  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 85.83 on 806 degrees of freedom
  (88 observations deleted due to missingness)
Multiple R-squared:  0.9499,    Adjusted R-squared:  0.9496 
F-statistic:  3055 on 5 and 806 DF,  p-value: < 2.2e-16
AIC(lm_borough)
[1] 9543.053

This linear regression model adds borough as a factor. Each borough gets its own baseline level of DV reports, so the mold coefficient reflects variation within boroughs rather than differences between them.

  • Strength: very strong (R^2 = 0.95)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

  • AIC: 9543.05

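The association between mold complaints and DV reports remains positive and statistically significant, but the mold coefficient shrinks from about 1.02 to 0.19 DV reports per complaint. Including borough raises R^2 from 0.69 to 0.95, showing that much of the variation in DV reports is explained by differences between boroughs rather than by mold alone. The borough coefficients compare each borough to the Bronx, which is the reference level, and show that DV report levels differ substantially across boroughs. For example, Staten Island averages about 771 fewer DV reports per month than the Bronx at the same number of mold complaints.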
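Why does the mold coefficient shrink so much once borough is added? A toy example with made-up numbers (not the NYC data) shows the mechanism. If one borough has higher levels of both measures, the pooled slope mixes that between-borough difference with the within-borough relationship. A borough factor isolates the within-borough slope.

```r
# Made-up data: within each borough, DV rises 0.2 per mold complaint,
# but borough B has much higher levels of both
toy <- data.frame(
  Borough = rep(c("A", "B"), each = 4),
  Mold    = c(100, 110, 120, 130, 500, 510, 520, 530),
  DV      = c(200, 202, 204, 206, 900, 902, 904, 906)
)

pooled <- lm(DV ~ Mold, data = toy)            # like lm_dv_mold
fixed  <- lm(DV ~ Mold + Borough, data = toy)  # like lm_borough

coef(pooled)["Mold"]  # about 1.745: inflated by the between-borough gap
coef(fixed)["Mold"]   # 0.2: the within-borough slope
```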

6.6.0.3 DV ~ Mold + Borough + Average Resolution Days

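The object mold_monthly_resolution is used here but not created in this section. Because it is joined on Year and Month only, it must hold one value per month that all boroughs share. Below is a minimal sketch, assuming it is the mean of resolution_days from the cleaned mold data grouped by Year and Month, with complaints that have no closed date (NA) left out. The rows are hypothetical.

```r
# Hypothetical cleaned complaints, with resolution_days as created in 6.2.1
mold_clean <- data.frame(
  Year            = c("2010", "2010", "2010", "2010"),
  Month           = c("01 - January", "01 - January", "02 - February", "02 - February"),
  Borough         = c("BRONX", "QUEENS", "BRONX", "BRONX"),
  resolution_days = c(10, 20, 5, NA),  # NA: complaint has no closed date
  stringsAsFactors = FALSE
)

# Mean resolution time per Year/Month across all boroughs;
# the formula interface drops NA rows by default (na.action = na.omit)
mold_monthly_resolution <- aggregate(resolution_days ~ Year + Month,
                                     data = mold_clean, FUN = mean)
names(mold_monthly_resolution)[names(mold_monthly_resolution) == "resolution_days"] <-
  "avg_resolution_days"
print(mold_monthly_resolution)
```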
aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  left_join(mold_monthly_resolution, by = c("Year", "Month"))

lm_resolution_borough <- lm(
  `DV Reports` ~ `Mold Complaints` + Borough + avg_resolution_days,
  data = aggregated_dv_mold_data
)

summary(lm_resolution_borough)

Call:
lm(formula = `DV Reports` ~ `Mold Complaints` + Borough + avg_resolution_days, 
    data = aggregated_dv_mold_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-257.59  -42.77   -2.31   38.85  376.84 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           993.32749   17.99077  55.213  < 2e-16 ***
`Mold Complaints`       0.17433    0.01928   9.041  < 2e-16 ***
BoroughBROOKLYN        59.17692    8.85276   6.685 4.33e-11 ***
BoroughMANHATTAN     -460.04883   10.94783 -42.022  < 2e-16 ***
BoroughQUEENS        -177.24615   14.14608 -12.530  < 2e-16 ***
BoroughSTATEN ISLAND -782.97335   15.45107 -50.674  < 2e-16 ***
avg_resolution_days    -2.88188    0.47425  -6.077 1.89e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 83.98 on 805 degrees of freedom
  (88 observations deleted due to missingness)
Multiple R-squared:  0.9521,    Adjusted R-squared:  0.9517 
F-statistic:  2665 on 6 and 805 DF,  p-value: < 2.2e-16
AIC(lm_resolution_borough)
[1] 9508.634

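This linear regression model adds the average resolution time of mold complaints. Because mold_monthly_resolution is joined on Year and Month only, this is a citywide monthly average: all five boroughs share the same value in a given month.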

  • Strength: very strong (R^2 = 0.95)

  • Direction: positive for mold complaints (0.17); negative for average resolution days (-2.88)

  • Significance: statistically significant (p<0.05)

  • AIC: 9508.63

Even after controlling for borough and resolution time, mold complaints remain a statistically significant predictor of DV reports, at about 0.17 more DV reports per additional complaint. Average resolution days show a statistically significant negative association: each additional day of average resolution time is associated with about 2.9 fewer DV reports per borough-month. This may feel like the opposite of what one would expect, and it could reflect many factors. One possibility, which this analysis cannot test, is the relationship between the city and its communities. In periods when the city was slower to respond to mold complaints, residents may have had less confidence in city services. Out of a lack of perceived resources, they may then have been less likely to report domestic violence.

Overall, the final model fits the data best, with the highest R^2 (0.95) and the lowest AIC of the three models (9508.63). However, most of its explanatory power comes from differences between boroughs. The model supports an association between mold complaints and DV reports in the same borough and month; it does not show that mold complaints drive DV reports.
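All three models are fit to the same 812 borough-months, so their AIC values can be compared directly. The sketch below computes each model's AIC difference from the best model, plus the corresponding Akaike weights, using the AIC values printed above. The model names are labels chosen here.

```r
# AIC values printed by AIC() for the three models above
aic <- c(mold_only               = 11025.83,
         mold_borough            = 9543.053,
         mold_borough_resolution = 9508.634)

delta  <- aic - min(aic)                          # difference from the best model
weight <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights (sum to 1)
round(delta, 2)  # 1517.20, 34.42, 0.00
signif(weight, 3)
```

A gap of about 34 between the second and third models is large by common rules of thumb, so the resolution-time model is clearly preferred on AIC. The AICcmodavg package loaded at the start provides aictab() for the same comparison from fitted model objects, though it reports AICc by default.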

6.7 Discussion & Insights

Overall, the results show a consistent positive association between residential mold complaints and domestic violence reports. Simple correlations show that months with more mold complaints tend to have more DV reports as well. The yearly tables show a similar borough ranking for both measures, with the Bronx and Brooklyn highest.

When borough differences are accounted for in the regression models, the relationship between mold complaints and DV reports remains statistically significant but much weaker: the mold coefficient drops from about 1.02 to 0.19. This suggests that while borough-level differences explain much of the variation, mold complaints still have an independent association with DV reports.

Adding average mold resolution time shows that longer resolution delays are associated with lower DV report counts. This may be due to reporting behavior or service engagement rather than a direct effect.

Lagged analyses tested whether mold complaints in one month are related to DV reports in the following month. The lagged correlation (r = 0.82) is nearly identical to the same-month correlation (r = 0.83), which suggests that a one-month delay does not meaningfully change the relationship.

In summary, the analysis suggests a consistent positive relationship between residential mold complaints and domestic violence reports. However, borough-level differences and other contextual factors appear to drive much of the variation, highlighting the complexity of environmental and social influences on public health outcomes. In the future, I would like to look at neighborhood-specific trends. I would also like to compare DV and mold rates rather than counts, since population varies across boroughs.