Introduction
When you query an IP address, you expect a precise location in return. But "precision" in IP geolocation is a spectrum, not a binary. At HostInfo, we process over 5 million geolocation queries daily — and accuracy is the metric we obsess over.
In this post we walk through the architecture of our geolocation pipeline: the data sources we ingest, how we cross-reference them, the confidence scoring we apply, and why the "last mile" (city-level accuracy) is still genuinely hard.
The Data Sources
Our geolocation engine draws from four primary streams:
- RIR allocation data — ARIN, RIPE, APNIC, LACNIC, and AFRINIC publish raw IP-to-org mappings. These are authoritative but coarse (country or ASN level only).
- BGP routing tables — Team Cymru's BGP feeds tell us which ASN announces a prefix. Combined with RIR data, this gives us the network boundary.
- Commercial GeoIP databases — MaxMind GeoLite2 and IP2Location give city-level estimates based on historical probing and user-reported data.
- Active crawl signals — Our own crawlers collect PTR records, TLS certificate SANs, and HTML metadata that anchor domains (and thus IPs) to physical locations.
No single source is authoritative below the country level. Accuracy comes from consensus, not from trusting any one vendor.
Cross-Reference Pipeline
For each IP block we receive, the pipeline runs four parallel lookups and feeds results into a weighted voter:
- RIR country code (weight: 0.9)
- BGP AS country (weight: 0.7)
- GeoIP city estimate (weight: 0.5)
- Crawl-derived anchor (weight: 0.8)
The voter produces a confidence_score (0–1) per field. Country scores above 0.8 are served directly. City scores below 0.5 are suppressed to avoid misleading users.
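As a rough illustration of how a weighted voter like this could work, here is a minimal sketch. The weights and the two thresholds come from the text above. The voting rule (agreeing weight divided by total reporting weight), the source keys, and the function names are assumptions made for the example, not a description of the production pipeline.

```python
from collections import defaultdict

# Weights as listed above. The short source keys are hypothetical names.
WEIGHTS = {"rir": 0.9, "bgp": 0.7, "geoip": 0.5, "crawl": 0.8}

# Thresholds quoted in the text.
COUNTRY_SERVE_THRESHOLD = 0.8
CITY_SUPPRESS_THRESHOLD = 0.5


def vote(observations):
    """Return (winning_value, confidence_score) for a single field.

    `observations` maps a source name to the value it reported, or None
    when that source had nothing to say about the field.
    Assumed rule: confidence is the weight backing the winning value
    divided by the total weight of all sources that reported a value.
    """
    totals = defaultdict(float)
    for source, value in observations.items():
        if value is not None:
            totals[value] += WEIGHTS[source]
    if not totals:
        return None, 0.0
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


def city_is_suppressed(confidence_score):
    """City values scoring below 0.5 are withheld from the response."""
    return confidence_score < CITY_SUPPRESS_THRESHOLD


# Three sources agree on DE, one says US: ("DE", ~0.83), above 0.8.
country, score = vote({"rir": "DE", "bgp": "DE", "geoip": "US", "crawl": "DE"})
```

Under this assumed rule, a single dissenting low-weight source lowers the score without flipping the result, which is the behavior a consensus approach is after.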
Why City-Level is Hard
The fundamental problem is that IP blocks are assigned to organizations, not buildings. A CDN node in Frankfurt and the same CDN's headquarters in San Jose may sit in adjacent IP ranges, and both can be announced from the same ASN. Without active probing or user-consent signals, registry and routing data alone cannot tell them apart.
We use four signals to improve city accuracy:
- PTR records: hostnames like fra01.cloudflare.net encode datacenter location.
- TLS SANs: certificates issued per-region leak geography.
- RDAP remarks: some RIRs include city in structured RDAP data.
- Latency triangulation: active pings from known PoPs can localize an IP within ~200 km.
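To make the PTR signal concrete, here is a small sketch of pulling a location hint out of a hostname. It assumes the leftmost label follows an "airport code plus digits" convention, as in the fra01 example. The lookup table, the regular expression, and the function name are hypothetical, and many operators use other naming schemes that this would miss.

```python
import re

# Tiny illustrative table. A real mapping would cover far more codes.
CODE_TO_CITY = {"fra": "Frankfurt", "sjc": "San Jose", "lhr": "London"}

# Leftmost label made of three letters followed by digits, e.g. "fra01".
LABEL_RE = re.compile(r"^([a-z]{3})\d+$")


def city_from_ptr(hostname):
    """Return a city hint derived from a PTR hostname, or None."""
    label = hostname.lower().rstrip(".").split(".")[0]
    match = LABEL_RE.match(label)
    if match is None:
        return None
    # Unknown codes yield None rather than a guess.
    return CODE_TO_CITY.get(match.group(1))


hint = city_from_ptr("fra01.cloudflare.net")  # "Frankfurt"
```

A hint like this is only one vote among several. A hostname can outlive a move to another datacenter, so it makes sense to weigh it against the other signals rather than trust it alone.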
Confidence Scores in the API
Every HostInfo geolocation response includes a geo.confidence field. Values above 0.85 indicate city-level certainty; 0.5–0.85 means country-level only; below 0.5 means "best guess, use with caution."
This transparency is intentional. We'd rather you know the limits than build systems that fail silently because an IP moved datacenters.
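On the client side, the bands described above could be handled along these lines. This is a sketch only: the nested response shape and the function name are assumptions, and only the geo.confidence field and the 0.5 and 0.85 cut-offs come from the text.

```python
def confidence_tier(confidence):
    """Map a geo.confidence value to the band described above."""
    if confidence > 0.85:
        return "city"
    if confidence >= 0.5:
        return "country"
    return "best_guess"


# Hypothetical response payload; only geo.confidence is documented above.
response = {"geo": {"confidence": 0.62}}
tier = confidence_tier(response["geo"]["confidence"])  # "country"
```

Branching on the tier, rather than on the raw number, keeps downstream code from treating a country-level answer as if it were a city-level one.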
Conclusion
IP geolocation will never be GPS-precise — the internet wasn't designed to broadcast physical location. But with layered data sources, honest confidence scores, and continuous crawling, we get to 95%+ country accuracy and ~80% city accuracy for commercial IP blocks.
Have questions about how we handle edge cases? Reach out — we love talking infrastructure.
Maria Chen
Staff writer at HostInfo. Specializes in distributed systems and network intelligence.