In telecom, one of the most fundamental challenges is getting accurate, up-to-date addresses and keeping them in sync across the tech stack. The problem is harder than it looks, because address data is riddled with inconsistencies and gaps across multiple sources. Even when a team obtains accurate data through manual fieldwork or third-party datasets, matching the “right” records and pushing corrections to every system is a challenge. BBI has built matching algorithms to address this, and a key component has been the geohash.
The Problem with Address Data
Addresses can vary significantly across databases due to differing formats, outdated information, or simply human error. Even within the same dataset, there might be inconsistencies in how addresses are recorded (e.g. 123 Fake Street vs 123 fake st), making it difficult to match them accurately. This lack of reliability poses a major problem in the telecom sector, where precise address data is critical for deploying services, managing infrastructure, and delivering exceptional customer experiences.
A Solution: Leveraging Geohash
At BBI, we’ve tackled this challenge by leveraging an innovative tool: the geohash.
A geohash is a hierarchical spatial encoding that turns geographic coordinates (latitude and longitude) into a short string of letters and digits. Each string identifies one cell in a grid laid over the Earth’s surface, which enables efficient storage, retrieval, and comparison of location data. Every additional character subdivides the cell, so longer geohashes describe smaller, more precise areas.
For example:
- A geohash of “dr5ru” represents a large area in New York City.
- Extending it to “dr5ru7” zooms in on a smaller area, refining the location further.
If you want to try the algorithm out yourself, there are a few online calculators, such as the one on the Movable Type Scripts site.
This hierarchical structure makes geohashes well suited to proximity-based matching and verification. With enough characters, the cell is small enough that each location, such as a home, gets its own geohash, and that geohash is easy to store as a single value.
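To make the encoding concrete, here is a minimal sketch of the standard geohash algorithm in plain Python. It is an illustration, not BBI's production code, and the function name `geohash_encode` is our own choice. The algorithm repeatedly halves the longitude and latitude ranges, records one bit per split (starting with longitude), and maps every five bits to one character of the geohash alphabet.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (omits a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0         # bits collected for the current character
    value = 0        # integer value of those bits
    use_lon = True   # bits alternate between longitude and latitude

    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1   # upper half: emit a 1 bit
            rng[0] = mid
        else:
            value <<= 1                # lower half: emit a 0 bit
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:                  # five bits make one base32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


print(geohash_encode(40.7034503, -74.0158587, 7))  # dr5re9z
print(geohash_encode(40.7034503, -74.0158587, 6))  # dr5re9
```

Because each character only refines the previous cell, the 6-character result is always a prefix of the 7-character one.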
How We Use Geohash for Address Verification
We’ve developed a system that uses geohashes to match and verify addresses between systems within specific distance thresholds. Here’s how it works:
- Threshold Matching: We begin by encoding the geographic coordinates of an address into a geohash. Then, we compare geohashes within a defined distance threshold (e.g., 50 feet, 100 feet, and 500 feet).
- Address Name Checks: If the geohashes match, we compare the address text itself, including street name, street suffix, and ZIP code. This resolves differences such as “Road” vs “RD”, which seem minor but become large problems at scale.
- Data Cross-Verification: If the match is still uncertain, we cross-verify using additional data points when they are available, such as a homeowner's last name or email. For example, if a customer took a survey for an upcoming fiber build using a demand aggregation tool and we later want to match that address in the OSS/BSS, we can attempt to match on last name. This layered approach improves accuracy while reducing false positives.
- Iterative Refinement: If no match is found within the smallest threshold, the search expands to larger geohash areas, trading some precision for a better chance of finding the matching record.
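The layered steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not our production matcher: the `Record` type, the `SUFFIXES` table, and the function names are hypothetical, and a direct haversine distance check stands in for the geohash comparison so the example stays self-contained. The thresholds are the 50, 100, and 500 feet mentioned above.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical abbreviation table; a real one would be much longer.
SUFFIXES = {"street": "st", "road": "rd", "avenue": "ave", "drive": "dr", "lane": "ln"}


@dataclass
class Record:
    address: str
    lat: float
    lon: float
    last_name: Optional[str] = None


def normalize(address: str) -> str:
    """Lowercase, strip punctuation, and abbreviate common street suffixes."""
    tokens = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(SUFFIXES.get(t, t) for t in tokens)


def distance_feet(a: Record, b: Record) -> float:
    """Great-circle (haversine) distance between two records, in feet."""
    radius_feet = 20_902_231  # mean Earth radius (6,371 km) expressed in feet
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_feet * math.asin(math.sqrt(h))


def find_match(query: Record, candidates: List[Record],
               thresholds=(50, 100, 500)) -> Optional[Record]:
    """Try each distance threshold in turn, smallest first."""
    for limit in thresholds:
        # Step 1: keep only candidates within the current threshold.
        nearby = [c for c in candidates if distance_feet(query, c) <= limit]
        # Step 2: compare normalized address text.
        by_text = [c for c in nearby if normalize(c.address) == normalize(query.address)]
        if len(by_text) == 1:
            return by_text[0]
        # Step 3: fall back to last name when the text check is ambiguous or empty.
        pool = by_text or nearby
        if query.last_name:
            by_owner = [c for c in pool
                        if c.last_name and c.last_name.lower() == query.last_name.lower()]
            if len(by_owner) == 1:
                return by_owner[0]
        # Step 4: no unique match yet, so widen the search on the next pass.
    return None


survey = Record("123 Fake Street", 40.7034503, -74.0158587, "Smith")
oss = [Record("125 Fake St", 40.70350, -74.01590, "Jones"),
       Record("123 fake st.", 40.70346, -74.01586)]
print(find_match(survey, oss).address)  # 123 fake st.
```

Returning a match only when exactly one candidate survives a step is a deliberately conservative choice: an ambiguous result falls through to the next check or the next threshold rather than guessing.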
Why Geohash Works
Geohash provides a robust framework for simplifying and standardizing the way location data is handled. Unlike traditional address matching methods that rely solely on text-based comparisons, geohash works with spatial data, allowing us to account for minor inconsistencies in address formats while focusing on the actual geographic location.
We also get the benefit of GIS without the complexity: a lat/lon pair such as 40.7034503,-74.0158587 is harder to handle than the single string dr5re9z. The geohash can serve as a primary key in a database, and dropping trailing characters (e.g. dr5re9) lets us coarsen the precision as needed. No GIS degree required.
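As a rough illustration of how dropping characters changes the precision, the sketch below computes the cell size in degrees for a given geohash length and groups geohashes by a shared prefix. The helper names are our own, and apart from dr5re9z the sample geohashes are made up for the example.

```python
from collections import defaultdict


def cell_size_degrees(precision: int):
    """Return (latitude span, longitude span) in degrees for one geohash cell."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    return 180 / 2 ** lat_bits, 360 / 2 ** lon_bits


def bucket_by_prefix(geohashes, precision: int):
    """Group geohashes that share their first `precision` characters."""
    buckets = defaultdict(list)
    for gh in geohashes:
        buckets[gh[:precision]].append(gh)
    return dict(buckets)


# Illustrative geohashes; only dr5re9z comes from the example above.
points = ["dr5re9z", "dr5re9y", "dr5ru7x"]
print(bucket_by_prefix(points, 6))
# {'dr5re9': ['dr5re9z', 'dr5re9y'], 'dr5ru7': ['dr5ru7x']}
print(cell_size_degrees(2))  # (5.625, 11.25)
```

One caveat with prefix grouping: two points that sit close together but on opposite sides of a cell boundary will not share a prefix, so a prefix lookup is best paired with a check of neighboring cells or a direct distance calculation.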
By combining geohashes with GIS tools and supplementary customer data, we’ve created a solution that is not only accurate but also scalable—essential for handling the vast and dynamic datasets required in broadband analytics.
Conclusion
Accurate address data may be one of the toughest nuts to crack in the broadband industry, but innovative solutions like geohash are helping us make strides. At BBI, we’re committed to pushing the boundaries of technology to solve these challenges and deliver data-driven insights that power better decisions.
Have questions or want to learn more about our approach? Reach out to us—we’re here to help!