Challenge
Horse Sport Ireland (HSI), the national governing body for equestrian sports in Ireland, was facing significant challenges with their legacy system for managing passport registrations, endorsements, importation, and ownership applications.
The outdated, 15-year-old custom-built platform relied on manual data input with limited internal controls. This led to serious data integrity issues, such as duplicate records for animals and owners and inaccurate progeny information, which compromised the quality and reliability of the data.
These data quality problems became a critical roadblock as HSI prepared to migrate to a new, modern database platform, which required a clean and accurate dataset for a successful transition.
Solution
The team applied advanced machine learning techniques to identify duplicate records for both the animal portfolio and accounts table. This initial analysis revealed numerous inconsistencies, including mismatched progeny data and incorrect parentage information, which were flagged for further review.
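The case study does not name the machine learning techniques that were used, so the following is only a minimal sketch of how candidate duplicates might be surfaced. It swaps in plain string similarity from Python's standard library rather than a trained model. The field names, sample records, and the 0.9 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical animal records; field names and values are illustrative only.
animals = [
    {"id": 1, "name": "Cruising Star", "dob": "2010-04-12", "sex": "M"},
    {"id": 2, "name": "Cruising  star", "dob": "2010-04-12", "sex": "M"},
    {"id": 3, "name": "Clover Lass", "dob": "2012-05-01", "sex": "F"},
    {"id": 4, "name": "Clover Lad", "dob": "2015-06-20", "sex": "M"},
]

def normalise(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())

def name_similarity(a, b):
    """Similarity ratio between 0 and 1 for two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def candidate_duplicates(records, threshold=0.9):
    """Pair up records that share date of birth and sex and have similar names."""
    pairs = []
    for left, right in combinations(records, 2):
        # Cheap exact checks first, so fuzzy matching only runs on plausible pairs.
        if left["dob"] != right["dob"] or left["sex"] != right["sex"]:
            continue
        score = name_similarity(left["name"], right["name"])
        if score >= threshold:
            pairs.append((left["id"], right["id"], round(score, 2)))
    return pairs

pairs = candidate_duplicates(animals)
print(pairs)
```

Comparing every pair is quadratic in the number of records, so a real data set of any size would need the records grouped first (for example by date of birth) before pairs are scored.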
A process was developed to isolate and validate candidate duplicate records, ensuring no critical information would be lost during the clean-up. The validated candidates were then presented to HSI so that they could be reviewed and removed from the data table.
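One way to check that nothing is lost before a duplicate is removed is to compare it field by field with the record that will be kept. The sketch below is an assumption about how such a check could look, not a description of the process that was actually built. The function name, field names, and sample values are hypothetical.

```python
def information_at_risk(keep, drop):
    """List fields where the record to be dropped holds a value that the
    kept record is missing or disagrees with."""
    at_risk = {}
    for field, value in drop.items():
        # Ignore the primary key and empty values on the dropped record.
        if field == "id" or value in (None, ""):
            continue
        if keep.get(field) in (None, ""):
            at_risk[field] = ("missing_on_kept", value)
        elif keep[field] != value:
            at_risk[field] = ("conflict", value)
    return at_risk

# Hypothetical candidate pair: record 1 is kept, record 2 is the duplicate.
keep = {"id": 1, "name": "Cruising Star", "microchip": "", "owner_id": 501}
drop = {"id": 2, "name": "Cruising Star", "microchip": "CHIP-0001", "owner_id": 502}

report = information_at_risk(keep, drop)
print(report)
```

An empty report suggests the duplicate can be removed safely. A non-empty one gives the reviewer the specific fields to resolve first.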
To prevent linking errors, the team cross-referenced progeny data with verified parent records, re-mapping incorrect relationships and making sure that all associated records were updated accurately.
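The re-mapping step can be pictured as follows. This is a sketch under stated assumptions: it presumes a mapping from each removed duplicate to its surviving record (`merge_map`) and a set of verified parent ids (`verified_ids`), both hypothetical names, and uses made-up sample rows.

```python
# Hypothetical inputs: duplicate id -> surviving id, and the ids of verified parents.
merge_map = {2: 1}
verified_ids = {1, 3}

progeny = [
    {"id": 10, "sire_id": 2, "dam_id": 3},   # sire points at a removed duplicate
    {"id": 11, "sire_id": 1, "dam_id": 3},   # already correct
    {"id": 12, "sire_id": 1, "dam_id": 99},  # dam is not a verified record
]

def remap_parents(rows, merge_map, verified_ids):
    """Point parent links at surviving records and set aside rows whose
    parents still cannot be matched to a verified record."""
    remapped, unresolved = [], []
    for row in rows:
        fixed = dict(row)  # copy, so the input rows are left untouched
        for field in ("sire_id", "dam_id"):
            fixed[field] = merge_map.get(fixed[field], fixed[field])
        if fixed["sire_id"] in verified_ids and fixed["dam_id"] in verified_ids:
            remapped.append(fixed)
        else:
            unresolved.append(fixed)
    return remapped, unresolved

remapped, unresolved = remap_parents(progeny, merge_map, verified_ids)
print(remapped)
print(unresolved)
```

Rows that land in `unresolved` are not corrected automatically. They are kept aside for manual review.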
Validation rules (business rules) were implemented across the whole data set to highlight anomalies that may indicate incorrect links between records, for example where progeny are older than their parents or where a male horse is recorded as the Dam. This ensured that HSI had a clean data set to migrate to their new platform.
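The two example rules mentioned above can be expressed as simple checks. The following is a minimal sketch, assuming each horse record carries a date of birth, a sex, and optional sire and dam ids. The record layout, rule names, and sample data are illustrative and not taken from HSI's system.

```python
from datetime import date

# Hypothetical records keyed by horse id.
horses = {
    1: {"name": "Sire A", "sex": "M", "dob": date(2005, 3, 1)},
    2: {"name": "Dam B", "sex": "F", "dob": date(2006, 4, 1)},
    3: {"name": "Foal C", "sex": "F", "dob": date(2012, 5, 1), "sire_id": 1, "dam_id": 2},
    4: {"name": "Foal D", "sex": "M", "dob": date(2004, 1, 1), "sire_id": 1, "dam_id": 2},
    5: {"name": "Foal E", "sex": "F", "dob": date(2015, 6, 1), "sire_id": 2, "dam_id": 1},
}

def validate(horses):
    """Return (horse_id, rule_name) for every rule a record breaks."""
    issues = []
    for horse_id, horse in sorted(horses.items()):
        for role in ("sire", "dam"):
            parent = horses.get(horse.get(f"{role}_id"))
            if parent is None:
                continue  # no parent recorded, or parent not in the data set
            # Rule 1: progeny must be born after each parent.
            if horse["dob"] <= parent["dob"]:
                issues.append((horse_id, f"progeny_older_than_{role}"))
            # Rule 2: the Dam must not be a male horse.
            if role == "dam" and parent["sex"] == "M":
                issues.append((horse_id, "dam_is_male"))
    return issues

issues = validate(horses)
for horse_id, rule in issues:
    print(horse_id, rule)
```

Each rule only flags a record for review. Deciding whether the date of birth, the sex, or the parent link is the faulty value still needs someone who knows the data.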
Impact
Through the data cleansing project, the team identified that 20% of the population consisted of duplicate records. Once these were removed, HSI had a smaller data set to review, which made the project more efficient for their team.
The project also implemented validation rules that not only guard against erroneous data going forward but also set the foundations of clean data standards for migration to the new system.
What had once been seen as a cumbersome project was greatly reduced once the team took control of the data set.