How Do I Hide The User’s IP Address In SGTM But Still Get Geo Data In GA4?

In this article, you'll learn how to override the user's IP address with a generic IP address from their country. This can be done with the help of a Transformation in server-side Google Tag Manager.

The question we’re going to be looking at today is inspired by our course, Server-side Tagging In Google Tag Manager.

How do I get geolocation data in Google Analytics 4, even if I choose to hide the visitor’s IP address in server-side Google Tag Manager?

Platforms like Google Analytics 4 offer details about each visitor’s geographic location, often to the accuracy of the visitor’s city.

This information is derived from the user’s device IP address.

However, due to privacy and security reasons, you might want to avoid forwarding your visitors’ IP addresses to analytics vendors.

This, then, leads to the disconnect: how would you be able to retain the benefits of IP-based geolocation without exposing the user’s actual IP address to the service?

In this article (and the accompanying video), I’ll show you how to do this by taking a leaf out of the “differential privacy” playbook.

Rather than forwarding the visitor’s actual IP address to the analytics server, you forward a generic IP address from that country’s IP address ranges. That way the user is still geolocated correctly at the country level, but their actual IP address is never forwarded.

Video walkthrough

If you prefer this walkthrough in video format, you can check out the video below.

Don’t forget to subscribe to the Simmer YouTube channel for more content like this.


How does it work?

When you collect hits to your server-side Google Tag Manager environment, chances are that there’s a load balancer or a reverse proxy in front of the setup. If you use the default App Engine installation, this is automatically handled for you. If you use Cloud Run, you can easily configure a Google Cloud Load Balancer to sit in front of the setup.

These “buffer” services often come with built-in functionality to automatically geolocate the visitor based on the IP address in the HTTP request headers.

This information is then added to the incoming request headers, and it’s available in server-side GTM, too.

Geolocation headers in an HTTP request to an App Engine service
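
As a rough illustration of what reading these headers amounts to, here is a minimal Python sketch (outside of GTM, where a Request Header variable does this for you). The function name and the header priority order are assumptions for illustration. X-Gclb-Country only exists if you have added it as a custom header on the load balancer backend.

```python
from typing import Mapping, Optional

# Header names mentioned in this article. X-Gclb-Country is only present if
# it has been configured as a custom header on the load balancer backend.
COUNTRY_HEADERS = ("x-appengine-country", "x-gclb-country")


def country_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the two-letter country code from the request headers, if any."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    for name in COUNTRY_HEADERS:
        value = normalized.get(name, "").strip().upper()
        # App Engine reports "ZZ" when it cannot determine the country.
        if len(value) == 2 and value.isalpha() and value != "ZZ":
            return value
    return None
```

Treating "ZZ" as "no country" means an unknown location falls through to the next header, or to no match at all, instead of being looked up as if it were a real country code.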

Within server-side GTM, you can then build a transformation that takes the user’s geolocated country code (e.g. FI for Finland) and maps it to a generic IP address from that same country.

This IP address is then used instead of the user’s actual IP address when dispatching data to vendor services that support IP overrides – Google Analytics 4 is such a service.

Country to IP architecture diagram

To do all this, the biggest question is how to build the map that takes the user’s (assumed) country and returns a generic IP address from that country.

There are many different ways to do this, and I discuss alternatives at the end of this article. Here, I’ll share with you a Google Tag Manager Lookup Table variable that you can use to do the mapping (with some caveats!).

Download the container export

First of all, download this container export. Right-click the link and choose Save Link As… to download the file.

country-to-ip-export

Ideally, import this in a new workspace with the Merge option. For more instructions on how to import a container file, see this guide.

This container file includes two variables and one transformation:

  • Request – X-Appengine-Country (Variable) – a Request Header variable that retrieves the two-letter country code (e.g. FI for Finland) from App Engine headers.
  • Lookup – countryCode to countryIp (Variable) – a Lookup Table variable that matches the value of the X-Appengine-Country variable with a generic IP address from that country.
  • Augment – countryCode to countryIp (Transformation) – an Augment Event transformation that overrides the ip_override event data value with the generic IP address from the lookup.
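
Conceptually, the three pieces above boil down to a dictionary lookup followed by an overwrite of one event data key. The Python sketch below mimics that logic outside of GTM. The function and constant names are hypothetical, and the addresses are placeholders from the reserved documentation ranges, not the real per-country IPs in the container export.

```python
from typing import Any, Dict, Optional

# Placeholder addresses from the RFC 5737 documentation ranges. They are NOT
# real country IPs; the real values live in the imported Lookup Table.
COUNTRY_TO_IP = {
    "FI": "192.0.2.10",
    "US": "198.51.100.10",
}


def augment_event(event_data: Dict[str, Any], country_code: Optional[str]) -> Dict[str, Any]:
    """Return a copy of event_data with ip_override set to the generic country IP."""
    augmented = dict(event_data)
    generic_ip = COUNTRY_TO_IP.get((country_code or "").upper())
    if generic_ip is not None:
        augmented["ip_override"] = generic_ip
    return augmented
```

Note that this sketch leaves the event untouched when the country has no match. Whether you would rather clear the IP address entirely in that case is a policy decision, and the sketch makes no claim about how the Lookup Table variable itself behaves without a match.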

Once you’ve imported these to your container, you are technically done! You can preview and test the setup immediately (see below for instructions on how to do this).

Preview the import/export

If you are using a service other than App Engine, you need to make adjustments so that the {{Request – X-Appengine-Country}} variable returns the country code from the correct HTTP request header (or other source!).

For example, if you’re using a Google Cloud Load Balancer with Cloud Run, and you’ve added the extra headers to the backend service(s), you’d need to configure the Request Header variable to reference X-Gclb-Country instead.

If the information is not available in HTTP headers, you need to figure out some other way to geolocate the user’s actual IP address within the Server container. You could utilize a commercial API for this, for example.

Test that the transformation works

Take the Server container to preview mode and send a GA4 request to it.

In this article, I’m assuming you already have a Google Analytics 4 setup running through the Server container. You can still test the setup using a dummy request, but you do need the client to generate an event, and you need a tag to fire. Otherwise you won’t be able to preview the transformation. If you’re using GA4, make sure you do not redact the visitor’s IP address in the GA4 tag settings.

First, make sure that the Incoming HTTP Request has the automatically generated geolocation headers (or whatever you decided to use as the input for the lookup table). Make note of the user’s IP address, too.

Preview the test GA4 request

In the screenshot above:

  1. The event name generated by the GA4 client
  2. The Incoming HTTP Request details under the Request tab
  3. The X-Appengine-Country header with the two-letter country code representing the user’s geolocated country
  4. The X-Forwarded-For header with the user’s (assumed) IP address and any proxy/gateway addresses that are added to this header
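
If you want to pick the user’s (assumed) IP address out of an X-Forwarded-For value programmatically, a minimal Python sketch could look like the following. It assumes the common convention that the left-most entry is the client and later entries are proxies. The function name is hypothetical.

```python
import ipaddress
from typing import Optional


def client_ip_from_xff(header_value: str) -> Optional[str]:
    """Return the left-most valid IP address in an X-Forwarded-For value."""
    for part in header_value.split(","):
        candidate = part.strip()
        try:
            # ip_address() accepts both IPv4 and IPv6 literals.
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # Skip entries such as "unknown" or empty strings.
    return None
```

Keep in mind that the left-most entry can be set by the client itself, which is one more reason to treat it as the assumed address and not as a verified one.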

Next, open the Tags tab in Preview mode, and select a tag that fired on the event. Scroll down to its Transformations and click open the Augment – countryCode to countryIp transformation. You should see the user’s actual IP address replaced with a generic country IP address:

IP address replaced in the transformation

Finally, open the Request tab in Preview mode again, and look at the outgoing request to Google Analytics 4. The value of the &_uip URL parameter should match the transformed value.

Value of the _uip parameter matches the transformed IP
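
The same check can be scripted if you copy the outgoing request URL from Preview mode. This is a minimal Python sketch; the function name is hypothetical, and it only assumes that the override travels in the _uip query parameter as described above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def uip_from_request_url(url: str) -> Optional[str]:
    """Return the value of the _uip query parameter, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("_uip")
    return values[0] if values else None
```

Compare the returned value with the generic IP address shown in the transformation. If they differ, or if the function returns None, the override did not reach the outgoing request.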

If you see this, then the setup works. Your visitors will now be geolocated to the “correct” country, but you are never sharing their real IP address with Google Analytics.

If you want to skip the technical bits, scroll down to the Summary chapter to read about the caveats with this approach as well as some brainstorming on how the solution could be improved.

Technical details

To build the Lookup Table, I did the following things.

First, I needed to find a list of country IP ranges. I downloaded the full database dump from the Country IP Database hosted on cable.ayra.ch. The geolocation data in this database is sourced from IP2Location.

This database dump is huge, as a single country (like the USA) can have more than 10,000 IP ranges associated with it. For my purposes, I only needed one single IP range per country.

The easy approach would have been to just take the first IP for each country. However, different geolocation services can locate the same IP to a different country, for some reason! I had to pick a service (I chose ip-api.com because it has a free API) and then pick, from the list of country IPs, an IP that resolves to the same country in the ip-api.com database.

For this purpose, I wrote a Python script that iterates over the huge JSON file, sending requests to ip-api.com, and pulling in the first IP per country that returns a geolocation match from the API.

Python script for building the JSON

Python is great for handling huge source files!
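
The original script is only shown as a screenshot, so the following is not that script but a minimal sketch of the same idea. It assumes the source data has already been reduced to a mapping from country code to a list of candidate IP addresses (the real dump’s structure may differ), and it queries the ip-api.com JSON endpoint. All function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Mapping, Optional


def geolocate_with_ip_api(ip: str) -> Optional[str]:
    """Ask ip-api.com for the country code of an IP (network call, rate limited)."""
    url = f"http://ip-api.com/json/{ip}?fields=status,countryCode"
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    return payload.get("countryCode") if payload.get("status") == "success" else None


def pick_ip_per_country(
    candidates: Mapping[str, Iterable[str]],
    geolocate: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Keep the first candidate IP per country that the service maps to that country."""
    chosen: Dict[str, str] = {}
    for country, ips in candidates.items():
        for ip in ips:
            if geolocate(ip) == country:
                chosen[country] = ip
                break  # One IP per country is enough.
    return chosen
```

Passing the geolocation function in as an argument keeps the selection logic testable offline and makes it easy to swap in a different service. A real run against the free API would also need to pause between requests to stay within its rate limits.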

Next, I wrote another simple script that mapped the country code and the IP address to a Lookup Table variable in the container export JSON. The end result of this effort is in the container export you downloaded earlier.
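
That second script is not reproduced here either. As a hedged sketch of what such a mapping step could look like, the Python below turns a country-to-IP dictionary into a Lookup Table variable object. The field names ("smm", "input", "map", and so on) are an assumption about the container export schema, so compare the output with a real export before relying on it. The function name is hypothetical.

```python
from typing import Any, Dict, Mapping


def build_lookup_variable(
    name: str, input_variable: str, mapping: Mapping[str, str]
) -> Dict[str, Any]:
    """Build a Lookup Table variable object from a country-to-IP mapping."""
    # One row per country, sorted by country code for a stable export.
    rows = [
        {
            "type": "MAP",
            "map": [
                {"type": "TEMPLATE", "key": "key", "value": code},
                {"type": "TEMPLATE", "key": "value", "value": ip},
            ],
        }
        for code, ip in sorted(mapping.items())
    ]
    return {
        "name": name,
        "type": "smm",  # Assumed type identifier for Lookup Table variables.
        "parameter": [
            {"type": "BOOLEAN", "key": "setDefaultValue", "value": "false"},
            {"type": "TEMPLATE", "key": "input", "value": "{{" + input_variable + "}}"},
            {"type": "LIST", "key": "map", "list": rows},
        ],
    }
```

The resulting dictionary can be serialized with json.dumps and merged into the variable list of an export file.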

Summary

Let’s start with the obvious conclusion: I don’t know what geolocation service Google uses. I don’t know if it matches the results of ip-api.com. It’s possible that what I think is a United States IP (1.32.232.0) is instead a Canadian IP or a Vietnamese IP in Google Analytics.

In the video, I also send a custom dimension of the user’s geolocated country in App Engine. That way I can compare what the App Engine country is versus what Google Analytics resolves the generic IP address to.

Even this isn’t perfect. What if the geolocation service that App Engine uses is better than what Google Analytics uses? Or vice versa? Or what if they’re both incorrect?

So then we arrive at the second, just as obvious conclusion: IP address-based geolocation is inaccurate. It really is. And it gets worse with things like VPNs and proxies and Apple’s IP address obfuscation.

Furthermore, it seems like IP address ranges can shift from one location to another over time. The Lookup Table variable you download in this article might not be accurate any more in a month’s time, or a year’s time.

One huge improvement would be to build a custom API service for this mapping. You could do this easily with a Google Cloud Function that you communicate with through server-side Google Tag Manager, or even with a Firestore document.

That service should have a built-in validator, which queries the range of IP addresses against a geolocation service once a week, for example. That way it would replace outdated or incorrect IP addresses with information that’s more up-to-date.
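
The core of such a validator could be as small as the Python sketch below. It assumes the lookup is available as a country-to-IP dictionary and that you supply a geolocation function of your choice. Both the function name and that interface are hypothetical.

```python
from typing import Callable, Dict, Mapping, Optional


def find_stale_entries(
    lookup: Mapping[str, str],
    geolocate: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Return {country: resolved_country} for entries that no longer match."""
    stale: Dict[str, Optional[str]] = {}
    for country, ip in lookup.items():
        resolved = geolocate(ip)
        # A None result means the service could not place the IP at all.
        if resolved != country:
            stale[country] = resolved
    return stale
```

A weekly job could run this against the current table and replace each stale entry with a fresh IP address from the same country’s ranges.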

But, returning to our second conclusion above, I’m not sure if this is worth the hassle. If you go through all the trouble to build something like this and the majority of your users hide their IP addresses at the source, then your efforts will have been in vain.

However, as a technological exercise that utilizes the power of server-side Google Tag Manager’s transformations, I think this solution is pretty darn neat. Would you agree?

4 Responses

  1. I didn’t understand the necessity of this practice. Why should we hide the user’s IP address and mislead analysis platforms and ourselves?

    Example: The user entered from city x, Los Angeles, in the USA, and GA4 analyzed it and reported it to us.

    In this application: The user entered from city x, Los Angeles, in the USA; we took the user’s IP address and replaced it with the general IP address. We no longer know which city or state the user entered from?

    Also, how does this prevent methods such as VPN or proxy and enable us to find the real IP?

    Example: The user is in the USA country -> connected to Turkey via VPN -> visited our site. In this method, how can you understand that the user is entering from the USA?

    1. The purpose is to completely anonymize the user’s source. IP address is one vector that is frequently used for fingerprinting, even if the last octet is cleared (as GA4 does).

      VPNs and similar do the same job of hiding the original IP address of the user. This solution adds another layer of protection on top of that.

      “Why should we hide the user’s IP address and mislead analysis platforms and ourselves?” Because some of us prioritize data protection & security over the data hoarding of analytics platforms.

  2. Hi Simo, very interesting video/blog post. I was thinking about augmenting geolocation with additional hints from the client side. E.g. a country selector on a website could push a custom dimension with the corresponding TLD; or a URL parameter based on the targeted country in Google Ads (if there’s only one per campaign) could also send TLD info via GTM to GA4 via event parameters or user properties, and so on. It also depends on what makes more sense (e.g. geographic location versus search-based intent or user choice). Especially because of the increase of spoofing and obfuscation methods at different levels of the device/browser/network stack as you’ve pointed out the latter may become a sensible approach for the future?

    1. Hi!

      I think country selector / locator vs. the user’s country of origin is probably a weak correlation to work with, if you want to measure the user’s geographic location. As a search intent / browsing preference it’s of course quite good.

      Geolocating users without their explicit permission (for example through the browser’s own location API or by bluntly asking the user where they’re from) is probably going to have less and less significance over time, as the final IP addresses visible to vendors are getting unlinked from the user’s actual network location.
