Technical Report: Geospatial Analysis and Outlier Detection

Methodology

The analysis proceeded in four steps:

  1. Data Loading: The dataset was loaded from the provided Excel file using pandas.

  2. Geospatial Analysis:

    • Latitude and longitude coordinates were extracted to calculate distances between polling units.

    • The geopy library was used to compute the geodesic distance matrix between each pair of polling units.

    • Two polling units were treated as neighbors if they lay within 1 km of each other.

  3. Outlier Detection:

    • For each polling unit, the votes of its neighboring units were identified.

    • Outlier scores were calculated for each party (APC, LP, PDP, NNPP) by computing the absolute difference between the unit's votes and the average votes of its neighbors.

  4. Sorting and Saving Results:

    • For each party, the dataset was sorted by that party's outlier score to identify the top outliers.

    • The results were saved into different sheets of an Excel file for further analysis.
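
The neighbor search and scoring in steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for this report: it substitutes the haversine great-circle formula for geopy's geodesic distance so that it runs with the standard library alone, the unit IDs, coordinates, and vote counts are made-up sample values, and the function names (haversine_km, find_neighbors, outlier_score) are hypothetical. Scoring units with no neighbors as 0 is also an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
RADIUS_KM = 1.0              # neighbor radius stated in the report


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (stand-in for geopy's geodesic distance)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_neighbors(units, radius_km=RADIUS_KM):
    """Map each unit ID to the IDs of all other units within radius_km."""
    neighbors = {uid: [] for uid in units}
    ids = list(units)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = haversine_km(units[a]["lat"], units[a]["lon"],
                             units[b]["lat"], units[b]["lon"])
            if d <= radius_km:
                neighbors[a].append(b)
                neighbors[b].append(a)
    return neighbors


def outlier_score(units, neighbors, uid, party):
    """Absolute difference between a unit's votes and its neighbors' mean votes."""
    nbrs = neighbors[uid]
    if not nbrs:
        return 0.0  # assumption: a unit with no neighbors gets a score of 0
    mean = sum(units[n][party] for n in nbrs) / len(nbrs)
    return abs(units[uid][party] - mean)


# Made-up sample data: three units close together and one far away.
units = {
    "A": {"lat": 10.2791, "lon": 11.1731, "APC": 90},
    "B": {"lat": 10.2800, "lon": 11.1740, "APC": 10},
    "C": {"lat": 10.2785, "lon": 11.1725, "APC": 20},
    "D": {"lat": 10.4000, "lon": 11.3000, "APC": 50},
}
neighbors = find_neighbors(units)
scores = {uid: outlier_score(units, neighbors, uid, "APC") for uid in units}
```

In this sample, unit A's neighbors B and C average 15 APC votes, so A scores |90 - 15| = 75, while the isolated unit D has no neighbors and scores 0. The pairwise loop is quadratic in the number of units, which is acceptable for a sketch but may call for a spatial index on a large dataset.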

Summary of Findings

The sorted list of polling units by outlier scores for each party revealed significant discrepancies in votes for some units compared to their neighbors. Below is a summary:

  • Top 3 APC Outliers:

    • JAURO HAMMA, K.J. JUNGUDO had the highest APC outlier score of 79.

    • TUKULMA, PRI. SCH. TUKULMA and TUKULMA MAGAJI II, TUKULMA PRIMARY SCH both had an outlier score of 70.

  • Top 3 LP Outliers:

    • Multiple units including JAURO HAMMA, K.J. JUNGUDO, JUNIOR SEC. SCH GARIN SARKI, and MAKERA, KOFAR MAI BURODI had an LP outlier score of 500.

  • Top 3 PDP Outliers:

    • MAKERA, KOFAR MAI BURODI had the highest PDP outlier score of 172.

    • Other significant outliers were USMAN TELA, STATE LIBRARY and JAN GERAWO - G.J. SHEHU, K. J. SHEHU.

  • Top 3 NNPP Outliers:

    • MAKERA, KOFAR MAI BURODI had the highest NNPP outlier score of 11.

    • Other significant outliers included JUNIOR SEC. SCH GARIN SARKI and TUKULMA, PRI. SCH. TUKULMA.
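
Because several units share the same score in the lists above, a plain "first three rows" cut can silently drop tied units. The sketch below shows one possible way to select the top three while keeping ties, using the pandas nlargest method with keep="all". The column names and values are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

# Illustrative scores only; U2, U3, and U4 are tied for second place.
df = pd.DataFrame({
    "unit": ["U1", "U2", "U3", "U4", "U5"],
    "APC_outlier_score": [40, 30, 30, 30, 5],
})

# keep="all" retains every row tied with the last selected value,
# so the result can contain more than three rows.
top3 = df.nlargest(3, "APC_outlier_score", keep="all")
```

Here the result contains four rows (U1 through U4), since the three units scoring 30 are tied and none is dropped arbitrarily.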

Detailed Examples of Top 3 Outliers

  1. APC Outlier: JAURO HAMMA, K.J. JUNGUDO

    • Location: Latitude 10.279142, Longitude 11.173061

    • APC outlier score: 79

    • Neighbors: ['15-06-06-027']

    • Explanation: This polling unit had a significantly higher number of APC votes compared to its neighbors, indicating a potential anomaly.

  2. LP Outlier: JUNIOR SEC. SCH GARIN SARKI

    • Location: Latitude 10.279142, Longitude 11.173061

    • LP outlier score: 500

    • Neighbors: ['15-06-06-027']

    • Explanation: This polling unit had an unusually high number of LP votes compared to its neighbors, making it a significant outlier.

  3. PDP Outlier: MAKERA, KOFAR MAI BURODI

    • Location: Latitude 10.279142, Longitude 11.173061

    • PDP outlier score: 172

    • Neighbors: ['15-06-06-027']

    • Explanation: This unit showed a significant deviation in PDP votes, indicating it stands out from its neighboring units.

Conclusion

The geospatial analysis and outlier detection identified significant discrepancies in the voting patterns of several polling units. The methodology highlighted units whose votes deviated from the average of neighboring units within 1 km. Key insights include:

  • Certain polling units had markedly higher or lower votes for specific parties than their neighbors, suggesting potential irregularities.

  • Combining geodesic distances with neighbor-based outlier scores provided a systematic way to flag anomalies in the dataset.