Technical Report: Geospatial Analysis and Outlier Detection
Methodology
The methodology consisted of four steps:
Data Loading: The dataset was loaded from the provided Excel file using pandas.
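A minimal sketch of the loading step is given here for illustration. The column names (PU-Code, Latitude, Longitude and the four party columns) are assumptions, since the report does not list the actual headers, and the cleaning rules shown are one plausible choice rather than the ones actually applied.

```python
import pandas as pd

# Hypothetical column names; the report does not list the real ones.
ID_COL, LAT_COL, LON_COL = "PU-Code", "Latitude", "Longitude"
PARTIES = ["APC", "LP", "PDP", "NNPP"]


def clean_polling_units(df: pd.DataFrame) -> pd.DataFrame:
    """Validate columns and drop rows that cannot be placed on a map."""
    required = [ID_COL, LAT_COL, LON_COL, *PARTIES]
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = df.copy()
    # Coerce coordinates to numbers; unparseable values become NaN.
    for col in (LAT_COL, LON_COL):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Rows without usable coordinates are dropped (an assumption).
    return out.dropna(subset=[LAT_COL, LON_COL]).reset_index(drop=True)


def load_polling_units(path: str) -> pd.DataFrame:
    """Read the first sheet of an Excel workbook and clean it."""
    return clean_polling_units(pd.read_excel(path))
```

Reading .xlsx files with pandas typically requires the openpyxl package to be installed.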
Geospatial Analysis:
Latitude and longitude coordinates were extracted to calculate distances between polling units.
The geopy library was used to compute the geodesic distance matrix between each pair of polling units.
Two polling units were treated as neighbors when the geodesic distance between them was within 1 km.
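The neighbor search can be sketched as follows. This is an illustration under stated assumptions, not the code used for the report: the unit IDs and coordinates are made up, the function names are hypothetical, and the sketch compares each pair directly instead of storing the full distance matrix. It uses geopy's geodesic distance when geopy is installed and otherwise falls back to the haversine formula on a spherical Earth, which differs from the geodesic distance by a fraction of a percent.

```python
import math

try:  # Use geopy's geodesic distance when it is installed.
    from geopy.distance import geodesic

    def distance_km(a, b):
        """Geodesic distance in km between two (lat, lon) pairs."""
        return geodesic(a, b).km
except ImportError:  # Fallback: haversine on a spherical Earth.
    def distance_km(a, b):
        """Great-circle distance in km between two (lat, lon) pairs."""
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2)
             * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0088 * math.asin(math.sqrt(h))


def find_neighbors(coords, radius_km=1.0):
    """Map each unit ID to the IDs of other units within radius_km."""
    ids = list(coords)
    neighbors = {uid: [] for uid in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:  # distance is symmetric: visit each pair once
            if distance_km(coords[a], coords[b]) <= radius_km:
                neighbors[a].append(b)
                neighbors[b].append(a)
    return neighbors


# Made-up coordinates for illustration only.
coords = {
    "PU-A": (10.2791, 11.1731),
    "PU-B": (10.2831, 11.1731),  # about 0.44 km north of PU-A
    "PU-C": (10.3291, 11.1731),  # about 5.5 km north of PU-A
}
neighbors = find_neighbors(coords)
```

The pairwise loop is quadratic in the number of units, which is acceptable for a few thousand polling units; a spatial index would be a reasonable alternative for larger datasets.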
Outlier Detection:
For each polling unit, the votes of its neighboring units were identified.
Outlier scores were calculated for each party (APC, LP, PDP, NNPP) by computing the absolute difference between the unit's votes and the average votes of its neighbors.
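The score described above is the absolute difference between a unit's votes and the mean votes of its neighbors. A minimal sketch with made-up vote counts follows. The function name and data layout are hypothetical, and returning None for a unit with no neighbors is an assumption, since the report does not say how isolated units were handled.

```python
from statistics import mean

PARTIES = ["APC", "LP", "PDP", "NNPP"]


def outlier_scores(votes, neighbors, parties=PARTIES):
    """Return {unit: {party: |own votes - mean of neighbors' votes|}}.

    A unit without neighbors gets None for every party (an assumption).
    """
    scores = {}
    for unit, own in votes.items():
        nearby = neighbors.get(unit, [])
        scores[unit] = {
            party: (abs(own[party] - mean(votes[n][party] for n in nearby))
                    if nearby else None)
            for party in parties
        }
    return scores


# Made-up vote counts and neighbor lists for illustration only.
votes = {
    "PU-A": {"APC": 100, "LP": 5, "PDP": 60, "NNPP": 2},
    "PU-B": {"APC": 20, "LP": 7, "PDP": 50, "NNPP": 0},
    "PU-C": {"APC": 40, "LP": 3, "PDP": 70, "NNPP": 4},
    "PU-D": {"APC": 10, "LP": 1, "PDP": 15, "NNPP": 0},
}
neighbors = {
    "PU-A": ["PU-B", "PU-C"],
    "PU-B": ["PU-A"],
    "PU-C": ["PU-A"],
    "PU-D": [],
}
scores = outlier_scores(votes, neighbors)
```

In this toy data, PU-A has 100 APC votes and its neighbors average 30, so its APC score is 70. Because the score is an absolute difference, it does not indicate whether the unit is above or below its neighbors.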
Sorting and Saving Results:
The dataset was sorted in descending order of outlier score, separately for each party, to identify the top outliers.
The results were saved to separate sheets of an Excel file for further analysis.
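The sorting and saving step might look like the sketch below. The score column names, the sheet names, the one-sheet-per-party layout and the output file name are all assumptions made for illustration, and the scores are made up.

```python
import pandas as pd

PARTIES = ["APC", "LP", "PDP", "NNPP"]


def rank_by_party(df, parties=PARTIES):
    """Return {party: rows sorted by that party's score, highest first}."""
    return {
        party: df.sort_values(f"{party}_outlier_score", ascending=False,
                              na_position="last").reset_index(drop=True)
        for party in parties
    }


def save_rankings(ranked, path):
    """Write each party's ranking to its own sheet of one workbook."""
    with pd.ExcelWriter(path) as writer:
        for party, table in ranked.items():
            table.to_excel(writer, sheet_name=f"{party}_outliers", index=False)


# Made-up scores for illustration only.
df = pd.DataFrame({
    "PU-Code": ["PU-A", "PU-B", "PU-C"],
    "APC_outlier_score": [70.0, 80.0, None],
    "LP_outlier_score": [0.0, 2.0, 5.0],
    "PDP_outlier_score": [0.0, 10.0, 3.0],
    "NNPP_outlier_score": [1.0, 2.0, 0.5],
})
ranked = rank_by_party(df)
# save_rankings(ranked, "outliers_sorted.xlsx")  # hypothetical file name
```

Units with no score (for example, units with no neighbors) are placed last in each ranking. Writing .xlsx files with pandas typically requires the openpyxl package.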
Summary of Findings
Sorting the polling units by outlier score for each party showed that some units' vote counts differed substantially from those of their neighbors. The top outliers for each party are summarized below:
Top 3 APC Outliers:
JAURO HAMMA, K.J. JUNGUDO had the highest APC outlier score of 79.
TUKULMA, PRI. SCH. TUKULMA and TUKULMA MAGAJI II, TUKULMA PRIMARY SCH each had an APC outlier score of 70.
Top 3 LP Outliers:
Multiple units, including JAURO HAMMA, K.J. JUNGUDO; JUNIOR SEC. SCH GARIN SARKI; and MAKERA, KOFAR MAI BURODI, had an LP outlier score of 500.
Top 3 PDP Outliers:
MAKERA, KOFAR MAI BURODI had the highest PDP outlier score of 172.
Other significant outliers were USMAN TELA, STATE LIBRARY and JAN GERAWO - G.J. SHEHU, K. J. SHEHU.
Top 3 NNPP Outliers:
MAKERA, KOFAR MAI BURODI had the highest NNPP outlier score of 11.
Other significant outliers included JUNIOR SEC. SCH GARIN SARKI and TUKULMA, PRI. SCH. TUKULMA.
Detailed Examples of Top 3 Outliers
APC Outlier: JAURO HAMMA, K.J. JUNGUDO
Location: Latitude 10.279142, Longitude 11.173061
APC Outlier Score: 79
Neighbors: ['15-06-06-027']
Explanation: This polling unit had a significantly higher number of APC votes compared to its neighbors, indicating a potential anomaly.
LP Outlier: JUNIOR SEC. SCH GARIN SARKI
Location: Latitude 10.279142, Longitude 11.173061
LP Outlier Score: 500
Neighbors: ['15-06-06-027']
Explanation: This polling unit had an unusually high number of LP votes compared to its neighbors, making it a significant outlier.
PDP Outlier: MAKERA, KOFAR MAI BURODI
Location: Latitude 10.279142, Longitude 11.173061
PDP Outlier Score: 172
Neighbors: ['15-06-06-027']
Explanation: This unit's PDP vote count differed from that of its neighbor by 172 votes, the highest PDP outlier score recorded.
Conclusion
The geospatial analysis and outlier detection identified several polling units whose vote counts differed substantially from those of nearby units. The method flags a unit when its votes for a party deviate from the average votes of its neighbors within 1 km. Key insights include:
Certain polling units recorded markedly higher or lower vote counts for specific parties than their neighbors, suggesting potential irregularities.
Combining geodesic distances with a neighbor-based outlier score provided a systematic way to identify anomalies in the dataset.