Methodology

Data Collection

Our automated pipeline collects news articles from major Irish news publications every 6 hours via Google News RSS feeds. We search for articles containing keywords related to road fatalities in Ireland.
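
As a rough illustration of this step, the sketch below builds a Google News RSS search URL and parses the returned feed with Python's standard library. The query string and the function names are hypothetical; the actual keywords and code used by our pipeline are not listed here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical keyword query; the real search terms are not given on this page.
QUERY = '"fatal crash" OR "road death" Ireland'


def build_feed_url(query: str) -> str:
    """Build a Google News RSS search URL for the Irish English edition."""
    params = {"q": query, "hl": "en-IE", "gl": "IE", "ceid": "IE:en"}
    return "https://news.google.com/rss/search?" + urlencode(params)


def parse_feed(xml_text: str) -> list:
    """Return title, link and publication date for each RSS <item>."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return articles
```

A scheduler would fetch `build_feed_url(QUERY)` on each run and pass the response body to `parse_feed`; fetching is left out so the example runs without network access.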

Data Extraction

Each article is processed with an AI model (Claude) to extract the following structured information:

  • Date of incident
  • Location (county, town, road)
  • Number of fatalities
  • Vehicle types involved
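
The fields above could be represented and validated roughly as follows. This is a sketch under assumptions: the JSON key names, the `Incident` class, and the `parse_extraction` helper are illustrative and do not describe the exact schema or prompt used with the model.

```python
import json
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Incident:
    # Field names are hypothetical; they mirror the bullet list above.
    incident_date: date
    county: str
    town: Optional[str]
    road: Optional[str]
    fatalities: int
    vehicle_types: List[str] = field(default_factory=list)


def parse_extraction(raw_json: str) -> Incident:
    """Turn the model's JSON reply into a validated Incident record."""
    data = json.loads(raw_json)
    fatalities = int(data["fatalities"])
    if fatalities < 0:
        # A negative count is never valid, so reject the record outright.
        raise ValueError("fatalities must not be negative")
    return Incident(
        incident_date=date.fromisoformat(data["incident_date"]),
        county=data["county"],
        town=data.get("town"),    # may be missing if the article is vague
        road=data.get("road"),
        fatalities=fatalities,
        vehicle_types=list(data.get("vehicle_types") or []),
    )
```

Parsing into a typed record early means malformed dates or counts fail at extraction time rather than surfacing later during geocoding or deduplication.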

Geocoding

Locations are geocoded using OpenStreetMap's Nominatim service to place incidents on the map. When exact locations cannot be determined, we fall back to town centroids or county centroids.
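
The fallback described above might be sketched like this. The query formats, the order of the fallbacks (road, then town, then county), the `countrycodes=ie` restriction, and the User-Agent value are all assumptions made for illustration; only the Nominatim search endpoint and its `q`, `format`, `limit` and `countrycodes` parameters are taken from the public API.

```python
import json
from typing import Callable, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen

Coordinates = Tuple[float, float]


def nominatim_lookup(query: str) -> Optional[Coordinates]:
    """Query Nominatim's search endpoint; return (lat, lon) or None."""
    params = urlencode(
        {"q": query, "format": "jsonv2", "limit": 1, "countrycodes": "ie"}
    )
    request = Request(
        "https://nominatim.openstreetmap.org/search?" + params,
        # Nominatim's usage policy asks for an identifying User-Agent.
        # This value is a placeholder, not the one our site sends.
        headers={"User-Agent": "example-road-incident-map/0.1"},
    )
    with urlopen(request, timeout=10) as response:
        results = json.load(response)
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(
    road: Optional[str],
    town: Optional[str],
    county: str,
    lookup: Callable[[str], Optional[Coordinates]] = nominatim_lookup,
) -> Optional[Tuple[Coordinates, str]]:
    """Try the most specific query first, then fall back to coarser ones."""
    candidates = []
    if road and town:
        candidates.append((f"{road}, {town}, County {county}, Ireland", "exact"))
    if town:
        candidates.append((f"{town}, County {county}, Ireland", "town"))
    candidates.append((f"County {county}, Ireland", "county"))
    for query, precision in candidates:
        coords = lookup(query)
        if coords is not None:
            # Record which level matched so the map can show the precision.
            return coords, precision
    return None
```

Passing the lookup function as a parameter lets the fallback logic be tested with a stub instead of live requests, which also helps respect Nominatim's rate limits.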

Deduplication

Multiple news articles often cover the same incident, sometimes days or even months apart. To avoid double-counting, we use a weighted scoring system that compares incidents across five signals: date proximity, fatality count, location (town, road, and coordinates), summary text similarity, and vehicle types involved.

Incidents scoring above a high-confidence threshold are automatically merged, while borderline cases are flagged for manual review. When incidents are merged, all linked source articles are preserved and the most complete data from each record is retained.
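
To make the scoring idea concrete, here is a minimal sketch of a weighted comparison over the five signals. The weights, both thresholds, the 3-day and 10 km decay windows, and the use of `difflib` for text similarity are assumptions chosen for illustration, and the location signal is simplified to coordinates only; none of these values are the ones used in production.

```python
import math
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher

# Hypothetical weights (summing to 1.0) and thresholds.
WEIGHTS = {"date": 0.30, "fatalities": 0.15, "location": 0.30,
           "summary": 0.15, "vehicles": 0.10}
MERGE_THRESHOLD = 0.85   # at or above: merge automatically
REVIEW_THRESHOLD = 0.60  # at or above: flag for manual review


@dataclass
class Record:
    incident_date: date
    fatalities: int
    lat: float
    lon: float
    summary: str
    vehicle_types: frozenset


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def similarity(a: Record, b: Record) -> float:
    """Weighted sum of per-signal scores, each in the range 0..1."""
    day_gap = abs((a.incident_date - b.incident_date).days)
    distance = haversine_km(a.lat, a.lon, b.lat, b.lon)
    union = a.vehicle_types | b.vehicle_types
    signals = {
        "date": max(0.0, 1.0 - day_gap / 3.0),        # zero after 3 days
        "fatalities": 1.0 if a.fatalities == b.fatalities else 0.0,
        "location": max(0.0, 1.0 - distance / 10.0),  # zero beyond 10 km
        "summary": SequenceMatcher(
            None, a.summary.lower(), b.summary.lower()).ratio(),
        # Jaccard overlap; zero when neither record lists any vehicles.
        "vehicles": len(a.vehicle_types & b.vehicle_types) / len(union)
                    if union else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def classify(score: float) -> str:
    """Map a similarity score to merge, review, or distinct."""
    if score >= MERGE_THRESHOLD:
        return "merge"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "distinct"
```

Keeping each signal on a 0 to 1 scale means the weights alone decide how much a signal contributes, so thresholds can be tuned without rewriting the individual comparisons.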

Despite these measures, some duplicates may slip through, and distinct incidents with similar characteristics may occasionally be merged by mistake. Our figures should be treated as best-effort estimates rather than exact counts.

Limitations

  • Not all fatalities are reported in news media
  • There may be a delay between an incident and news coverage
  • Location accuracy varies based on news report detail
  • This is not official data; refer to the Road Safety Authority (RSA) for official statistics

Updates

Data is collected automatically every 6 hours. The site rebuilds and deploys whenever new data is available.