The study consisted of geospatial analysis of patients with firearm injuries, including both exploratory spatial analysis and spatial cluster analysis. Geospatial analysis was performed using neighborhoods, which were estimated using data from the crowdsourced map platform OpenStreetMap [7]. This was done to improve the resolution of the geospatial analysis beyond the level of communal sections, which is the smallest administrative boundary in the city.
SettingThis study took place in Port-au-Prince, the capital and most populous city in Haiti. The population of the city is estimated to be 987,311 people, with 2,618,894 people in the larger metropolitan area [8]. Haiti is classified as a middle-income country [9]. Despite this, due to its colonial history, frequent natural disasters, and political instability, Haiti has the lowest GDP of the Latin America and Caribbean region [10]. Within Haiti, levels of income inequality are some of the highest in the region, with most extreme poverty occurring in rural regions [10].
Data sources and variablesClinical data, crowdsourced maps, and satellite-derived population estimates constituted the data sources which were combined for this study to facilitate conducting research in a setting with a low penetration of mapping. Information for firearm injury patients and non-firearm injury patients was obtained from the emergency department logbooks from Hôpital de l’Université d’Etat d’Haïti (HUEH), the largest public hospital in Haiti. Scanned handwritten HUEH logbooks were transcribed by the PROTRA Haiti Group. Data were quality checked (< 1% error rate), cleaned, grouped, and joined using the R Statistical Software [11] through an iterative process. Variables extracted for use in geospatial analysis included the emergency department visit date (giving the resulting dataset a temporal resolution of 1 day), patient diagnosis (coded as firearm injury or not a firearm injury), and a free text patient home address field. Prior research has demonstrated that home addresses can serve as a proxy for the location of injury for trauma patients [12].
The address field was geocoded using Google’s geocoding application programming interface (API), [13] which was executed via an R script utilizing the ggmap package, [14] and then reviewed by Dr. Jean Pierre, who has local expertise in the topography of the city through prior work in urban planning. Results from the geocoding API output were not filtered based on API output parameters, given the uncertainty of the APIs performance in Port-au-Prince. Additionally, filtration was unnecessary since each output entry from the API was reviewed manually by Dr. Jean Pierre to ensure accuracy. Manual review by Dr. Jean Pierre consisted of three steps. First, the free text fields containing address information were reviewed to ensure sufficient information was available for inclusion in the analysis at the neighborhood level. Given variable completeness of addresses in the free text field, addresses which could not be confidently located at the neighborhood level were excluded. Second, for addresses which were included, Dr. Jean Pierre then reviewed the matched GPS coordinates and compared this with the free text address. To facilitate this comparison, Dr. Jean Pierre referenced a large-scale map of the generated neighborhoods, as well as digital maps including Google Maps and OpenStreetMap. For data points which matched correctly, this completed the manual review. For complete addresses which matched incorrectly, there was an additional third step. For these locations, Dr. Jean Pierre manually appended a GPS coordinate for a point corresponding to the address. This was done by Dr. Jean Pierre manually placing a pin on the correct location using Google Maps, and then appending the GPS coordinate for the pin to the dataset. In these instances, the corrected, manually appended GPS coordinate was used for all data analysis. This process of manual review allowed the correction of GPS coordinates with enough precision to be accurately aggregated at the neighborhood level, without utilizing additional field work to geocode address fields. Furthermore, manual review and correction of GPS coordinates by Dr. Jean Pierre was needed to reduce systematic exclusion of patients from informal settlements, for which it was observed that the API performed poorly. Given safety concerns in many of the areas most impacted by firearm injuries, field coding of addresses using a GPS enabled device would not have been possible.
Data which were used to construct the neighborhood estimations was derived from OpenStreetMap, [7] and was determined to be the most accurate data source available for information on neighborhoods. In Port-au-Prince, neighborhoods were encoded in OpenStreetMap as a tag of the place parameter called “suburb”, defined as “a part of a town or city with a well-known name and often a distinct identity” [15]. Suburbs were encoded as nodes in OpenStreetMap. A list of suburbs in Haiti was downloaded from OpenStreetMap using the Overpass Query Service [16] as a geojson file. The query parameters were developed with the assistance of the R osmdata package, [17] and the resulting geojson file was imported using the R sf package [18]. As recent census data were not available, population estimates for Haiti derived from satellite imagery were obtained from WorldPop. Specifically, the 2020 constrained, top-down dataset was used [19]. The 2020 dataset was selected due to the best overlap with the study period. The constrained dataset was selected over the unconstrained as WorldPop reports that the unconstrained dataset tends to underestimate the population in urban areas. Lastly, the top-down dataset was chosen to facilitate comparison with other literature that may rely on United Nations reported population estimates. Furthermore, the use of remote sensing population data enabled the construction of neighborhoods that were smaller than available administrative boundaries in Haiti. The constrained population count format from WorldPop was obtained as a raster image stored as a geotif file with a resolution of 100 m. Population data in the raster format was summed and aggregated in vector format at the neighborhood level using the function raster::extract available in the R package raster [20]. Lastly, a shapefile for existing administrative boundaries in Haiti was downloaded from the Humanitarian Data Exchange [21].
Inclusion criteria and biasAll patients which presented to the emergency department of HUEH from November 22nd, 2019, through December 31st, 2020, were considered eligible for the study. Additional inclusion criteria for the geospatial analysis were diagnosis of firearm injury and a home address within the study area. The study area was defined as 16 communal sections (the smallest administrative boundary in Haiti) selected by Dr. Jean Pierre to encompass the urban and densely populated parts of Port-au-Prince which would be susceptible to firearm violence. In contrast, use of the official definition for the Port-au-Prince metropolitan area would include a number of less densely populated communal sections encompassing suburban and rural areas. Local expertise suggests less densely populated communities are less susceptible to firearm injuries. Therefore, including less densely populated regions would falsely inflate clustering in urban areas. This is consistent with prior research in the United States, which shows that firearm assaults occur at a higher per capita rate in urban areas [22]. Another advantage of manually selecting urban communal sections for inclusion is that it avoids the exclusion of densely populated areas which do not adhere to formal definitions of the city proper, such as the densely populated informal settlements on the hillsides surrounding the city. The resulting study area is summarized in Fig. 1.
Fig. 1Definition of the study area and demonstration of the three administrative boundaries in Haiti. Boundaries within the study area reflect communal sections. The department of Ouest is shaded dark gray and shows boundaries between communes. The light blue area represents Haiti’s other departments, the highest administrative level. The red dot marks the location of HUEH. Graphic by author
Patients lacking complete address information were excluded. Additionally, there was an implicit exclusion criterion of patients who were not present in the dataset, due to a lack of recording in the logbook or a loss of the logbook page prior to scanning. For the study period (406 days), there were only 273 days with any patient data recorded. Therefore, we can estimate that the records for the study period are 67% complete. Missing data was observed primarily in consecutive periods, ranging from two weeks to four weeks in duration. These periods represent a combination of times when the hospital was closed or when there were missing logbooks. Missing data does not follow any obvious pattern to the research team; however, the frequency of missing data does appear to increase as the study period progresses. It is possible that missing logbook pages will introduce bias—for example extremely high-volume days could be less likely to have a completed logbook. Additionally, the exclusion of missing or incomplete addresses may have preferentially excluded patients with life-threatening injuries, patients from informal settlements, or patients with lower levels of formal education.
Construction of neighborhoodsTogether, the use of satellite-derived population estimates, and crowdsourced mapping allowed the construction of neighborhood estimations. Neighborhoods were estimated by converting point data representing OpenStreetMap “suburbs” to boundaries using Thiessen polygons (also known as Voronoi diagrams), which were then spatially joined to the boundaries of the study area. This is a reasonable application of Thiessen polygons, as the OpenStreetMap “suburb” point is placed in the center of the neighborhood area [15]. Additionally, Dr. Jean Pierre reviewed the maps to ensure they provided reasonable estimations. This was done by comparing existing maps and local knowledge to the generated neighborhoods, to ensure they provided a realistic representation of boundaries within the city. It was noted that the generated neighborhoods were most accurate in the center of the city, as the use of Thiessen polygons creates greater distortion near the periphery of the study area. This is due to inherent distortion near outer boundaries due to the mathematics underlying the generation of Thiessen polygons, as well as the reduced density of the “suburb” points further from the center of the city. These neighborhoods offered several advantages over existing administrative boundaries. First, they permitted analysis at a higher spatial resolution. Using neighborhood estimates, the included 16 communal sections were able to be converted to 106 neighborhood estimates. The point data obtained from OpenStreetMap was converted to Thiessen polygons in R by using functions in the sf package [18]. Specifically, the Thiessen polygons were generated using the st_voronoi function, and were split using the st_cast function. The resulting polygons were then trimmed to the study area by using the st_union function to perform a spatial join. This resolution allowed for operationally relevant results of this study. Secondly, the use of neighborhoods allowed for an increased statistical power, permitting the use of geospatial cluster analysis. Lastly, these-crowdsourced derived neighborhoods may better capture the reality of the divisions within Port-au-Prince. This is especially true for the many informal settlements of the city, which disproportionately face a high burden of firearm injuries, and often cross arbitrary administrative boundaries. The estimated neighborhoods are summarized below in Fig. 2.
Fig. 2Thiessen polygon estimation of neighborhoods using nodes from OpenStreetMap. [7] Graphic by author
AnalysisGeospatial analysis included both exploratory and cluster analysis. Exploratory analysis included presentation of a kernelled surface, prepared using the R package ggplot2’s stat_density_2d function [23]. Next, exploratory analysis presented firearm case counts and population adjusted rate for each of the communal sections included in the study area. After this point, all additional analysis was performed at the neighborhood level, which were constructed using Thiessen polygons as previously detailed. The patient dataset contained addresses which had been converted to the spatial resolution of point data as described previously. This was then aggregated to the neighborhood level using a spatial join. The patient dataset used had a temporal resolution of 1 day; however, for purely geospatial analysis, such as Moran’s tests, all cases during the study period were included in the neighborhood case total. Cluster analysis began with a global Moran’s I test for spatial autocorrelation, followed by local Moran’s I test. The global Moran’s I test was performed using the R spdep package [24,25,26]. Local Moran’s I testing was performed with an alpha of 0.10, with three different levels of correction for multiple testing (in order of least to most conservative: unadjusted, performed in the R spdep package; manually corrected for a false discovery rate (FDR) [27] corresponding for the 106 neighborhoods, performed in the R spdep package and corrected using the base R stats package[11]; corrected with a FDR corresponding to the spatial weights matrix, as implemented in the R rgeoda package. [28] Three levels of adjustment were performed due to concerns for statistical power, as well as to allow comparison with evolving statistical standards of geospatial analysis. Lastly, cluster analysis was performed in the SaTScan software [29]. While both the local Moran’s I test and SaTScan can be used to describe local patterns and detect hotspots, both were included for two reasons. First, exploratory geospatial analysis was performed in a sequential manner. The local Moran’s I test was performed first, after which the findings of spatial autocorrelation were further explored using the SaTScan analysis. Given the time and computing resources required to perform SaTScan analysis, this was felt to be a worthwhile step. Secondly, and more importantly, the local Moran’s I test and SaTScan provided subtly different insights into the underlying epidemiology—namely, that the local Moran’s I test also detects high-low and low-low spatial autocorrelation, while SaTScan was only used to report positive spatial autocorrelation of cases (which could be comparable to high-high clusters in the local Moran’s I test). Models were run using a discrete Poisson probability model scanning for areas of high rates only [30]. Clusters were limited in size to 50% of the at-risk population. The model was run twice, once using a geospatial analysis (purely spatial) and once using a geospatial-temporal analysis. Time aggregation was performed at the day level, with a limit of temporal clusters to 50% of the study period. Clusters which were purely temporal were not permitted in the SaTScan analysis. SaTScan performed 999 replications, and the threshold for statistically significant clusters was set at < 0.001. All graphics included in this paper were generated by the author by using the R Statistical Software [11]. The following R packages were used in the generation of the figures: ggplot2 [23], ggspatial [31], sf [18], egg [32], tmap [33], tmaptools [34], wesanderson [35], and basemaps [36].
Comments (0)