Here at Arrikto, our customers often bring us interesting issues that can touch many parts of the stack in a cloud native platform.
In this blog post we describe the problems a customer ran into when accessing a web server that parses the X-Forwarded-* headers (in this particular case RStudio) when it is served behind multiple proxies.
The bugs we'll describe are specific to RStudio, but similar bugs can occur with any server that parses the X-Forwarded headers. While I'll be talking from the perspective of an engineer focused on Kubeflow, this could happen on any platform with multiple proxies.
Readers will get a better understanding of the X-Forwarded headers, how to debug them, and how to successfully run RStudio behind multiple proxies, in this case in a production Kubeflow cluster.
The problem
We initially noticed that we suddenly couldn't access the RStudio servers in our cluster, with cryptic errors like the following:
The first question that came up on seeing this was: what kind of URL is this? After poking around in the browser's dev tools we saw the following 302, which was the beginning of our debugging journey.
From the above, we immediately see that the Location header has a weird value: it contains two hosts separated by a comma! This means that RStudio picks up some host-related information from the request and uses it to build the redirect it sends to the client.
At this point I was scratching my head a little bit. Why would RStudio suddenly decide to give me such a redirect in my fresh new cluster? But of course, in the cloud world, nothing happens "suddenly". The only remotely relevant mechanism I knew of that could affect the generated host was the X-Forwarded-Host header. So at least there was a next step to check, even if it was a hunch.
So I quickly created an echo server in my namespace, just to check which headers actually reach the workloads. And voila!
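If you want to run the same kind of check yourself, a minimal stand-in for such an echo server could look like the sketch below. To be clear, this is not the echo server I deployed in the cluster. It is a small Python illustration built on the standard library, and the names `EchoHeadersHandler` and `fetch_echoed_headers` are made up for this example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Replies to any GET with the request headers it received, as JSON."""

    def do_GET(self):
        # Group values by lower-cased header name so repeated headers survive.
        received = {}
        for name, value in self.headers.items():
            received.setdefault(name.lower(), []).append(value)
        body = json.dumps(received, indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; the default implementation logs to stderr.
        pass


def fetch_echoed_headers(extra_headers):
    """Start the echo server on a free local port, send one GET, return the echo."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeadersHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        request = urllib.request.Request(url, headers=extra_headers)
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            return json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    # Hypothetical header value, only to show what the workload would see.
    echoed = fetch_echoed_headers(
        {"X-Forwarded-Host": "kubeflow.example.com,rstudio.internal"}
    )
    print(echoed["x-forwarded-host"])
```

In a real cluster you would deploy something like this behind the same proxies as the application, then open it from the browser to see exactly which headers the proxies add along the way.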
Then I tried sending requests to my RStudio Pod from another Pod in the cluster, playing around with the X-Forwarded-Host header. Indeed, when I left X-Forwarded-Host unset, or set it to a single value rather than a list, the URL was correct.
So this means 2 things:
- There are intermediate proxies between my browser and the RStudio Pod that handle the X-Forwarded headers.
- RStudio fails to properly handle X-Forwarded-Host if it contains multiple values.
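To make the second point concrete, here is a small sketch of how a server could end up with a comma in its redirect. This is not RStudio's actual code. It only illustrates the general failure mode, and the function names and host values are hypothetical.

```python
def naive_redirect(headers, path):
    """Build a redirect URL assuming X-Forwarded-Host holds exactly one host."""
    host = headers.get("X-Forwarded-Host", headers["Host"])
    return f"https://{host}{path}"


def tolerant_redirect(headers, path):
    """Same idea, but keep only the first (client-facing) host of a list."""
    raw = headers.get("X-Forwarded-Host", headers["Host"])
    host = raw.split(",")[0].strip()
    return f"https://{host}{path}"


# Hypothetical headers, as a Pod behind two proxies might receive them.
headers = {
    "Host": "rstudio.internal.svc",
    "X-Forwarded-Host": "kubeflow.example.com,rstudio.internal.svc",
}

print(naive_redirect(headers, "/auth-sign-in"))
# https://kubeflow.example.com,rstudio.internal.svc/auth-sign-in
print(tolerant_redirect(headers, "/auth-sign-in"))
# https://kubeflow.example.com/auth-sign-in
```

The first output has the same shape as the broken Location header we saw in the browser: two hosts glued together with a comma.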
Intermediate proxies? X-Forwarded headers?
Before diving into the next steps of our journey with RStudio, let's first take a closer look at the nature of the X-Forwarded headers.
In the cloud, and in Kubernetes in particular, all of our applications are deployed behind proxies. No Pods are exposed directly to the outside world. But this means that the final Pods never get access to information such as:
- The IP of the client that made the request
- The host the client used when making the request
- The protocol the client used when making the request
The most common case where this information is needed is when an app has to generate location-dependent content or links. An example is a 302 response that redirects users to authenticate. If the app doesn't use a relative path for the redirect, it needs to know the host that the client is using to reach the server.
To mitigate this, there is a set of non-standard headers that aim to preserve this information when a request is forwarded by proxies. These are the X-Forwarded headers.
While these headers are the de facto standard for relaying this information, it's important to note that they are not part of any current specification. The standardized version is the Forwarded header, defined in RFC 7239.
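For comparison, the standardized Forwarded header packs the same information into one header: each proxy hop contributes a comma-separated element made of `for`, `by`, `host` and `proto` parameters separated by semicolons. The sketch below parses such a value. It is deliberately simplified (it does not handle commas or semicolons inside quoted strings), and the function name and example value are assumptions for illustration only.

```python
def parse_forwarded(header_value):
    """Parse a Forwarded header value into one dict per proxy hop.

    Simplified sketch: commas or semicolons inside quoted strings
    are not supported.
    """
    hops = []
    for element in header_value.split(","):
        params = {}
        for pair in element.split(";"):
            if "=" not in pair:
                continue
            key, value = pair.split("=", 1)
            # Parameter names are case-insensitive; values may be quoted.
            params[key.strip().lower()] = value.strip().strip('"')
        if params:
            hops.append(params)
    return hops


# Hypothetical value: the first element describes the original client request.
value = "for=192.0.2.60;host=kubeflow.example.com;proto=https, for=198.51.100.17"
hops = parse_forwarded(value)
print(hops[0]["host"], hops[0]["proto"])
# kubeflow.example.com https
```

Because every hop gets its own element, there is no ambiguity about whether a list means "the original host" or "every host along the way".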
RStudio is one of the apps that rely on these headers to recover that client information when it is exposed behind proxies.
Where's the catch?
So at this point we've identified:
- The information that gets lost when there are proxies between the client and the server
- Why this information could be useful to the server
- How to successfully pass this information to the server
With this we took a look at RStudio and saw that it has support for running behind proxies. So if we have all the pieces, why did RStudio fail to reconstruct the correct URL?
The catch is that these X-Forwarded headers are currently not part of a standard, which means that people can deviate a little in how they use them. The most common deviation, and the one that bit us here, is to have X-Forwarded-Host contain a list of hosts rather than only the original host requested by the client.
The usual reason for doing this is to be able to trace the chain of hosts used during routing. For example, an edge proxy might use a different internal host when routing the request inside the internal infrastructure.
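The difference between the two conventions is easier to see in code. Below is a toy model of a proxy hop, not the configuration of any real proxy. The `trace_hosts` flag, the function names and all hosts and IPs are hypothetical. With `trace_hosts=False` the proxy only records the original host, and with `trace_hosts=True` every hop appends the host it was addressed with.

```python
def append_value(headers, name, value):
    """Append value to a comma-separated header, creating it if absent."""
    existing = headers.get(name)
    headers[name] = f"{existing}, {value}" if existing else value


def proxy_hop(headers, peer_ip, upstream_host, trace_hosts):
    """Return the headers a (toy) proxy would forward to its upstream."""
    forwarded = dict(headers)
    # Appending the peer address to X-Forwarded-For is the usual convention.
    append_value(forwarded, "X-Forwarded-For", peer_ip)
    # Record the host either once (original only) or at every hop (tracing).
    if trace_hosts or "X-Forwarded-Host" not in forwarded:
        append_value(forwarded, "X-Forwarded-Host", headers["Host"])
    # The proxy addresses the upstream with its internal host name.
    forwarded["Host"] = upstream_host
    return forwarded


client_request = {"Host": "kubeflow.example.com"}

for trace in (False, True):
    at_gateway = proxy_hop(client_request, "203.0.113.7", "gateway.internal", trace)
    at_pod = proxy_hop(at_gateway, "10.0.0.5", "rstudio.internal", trace)
    print(trace, "->", at_pod["X-Forwarded-Host"])
# False -> kubeflow.example.com
# True -> kubeflow.example.com, gateway.internal
```

An application that expects the first convention but sits behind proxies using the second one receives a list where it expected a single host.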
And this is what triggered the first bug with RStudio. I would guess the RStudio developers expected the X-Forwarded-Host header to contain only a single value, which is very fair considering the common understanding around these non-standard headers.
Luckily there was an existing issue for this exact problem:
https://github.com/rstudio/rstudio/issues/10965
The final obstacle
At this point I wanted to verify that if I manually sent a request with a correct X-Forwarded-Host, everything would work as expected. That would confirm my understanding and let me respond to my ticket in a timely manner. But deadlines exist to be missed.
So I spun up an RStudio instance in a container and hit it with the following request:
To my surprise, it returned a 302 with Location: /auth-sign-in?appUri=%2F, whereas I was expecting a URL with the correct prefix:
Location: https://localhost:8787/rstudio/kubeflow-user/kimwnasptd/auth-sign-in?appUri=%2F
At this point, being used to the nature of the X-Forwarded headers, I tried setting a single value for X-Forwarded-Proto. And it worked. So it's the same story as with X-Forwarded-Host. The "common" understanding of these non-standard headers is that they only carry the first value, the one used by the client. But in this case our proxies were appending the different protocols as well, for tracing purposes, just like with X-Forwarded-Host.
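Putting both headers together, a server that wants to be tolerant of list-valued headers could rebuild the client-facing URL along the lines of the sketch below. Again, this is not how RStudio implements it. The `root_path` parameter is a hypothetical way to pass the prefix the app is served under, and the header values are made up for the example rather than taken from the request I sent.

```python
def first_value(raw, default):
    """Return the first non-empty element of a comma-separated header value."""
    for part in (raw or "").split(","):
        part = part.strip()
        if part:
            return part
    return default


def external_url(headers, path, root_path=""):
    """Rebuild the URL the client used, tolerating list-valued headers."""
    proto = first_value(headers.get("X-Forwarded-Proto"), "http").lower()
    if proto not in ("http", "https"):
        # Ignore unexpected schemes instead of echoing them into a redirect.
        proto = "http"
    host = first_value(
        headers.get("X-Forwarded-Host"), headers.get("Host", "localhost")
    )
    return f"{proto}://{host}{root_path.rstrip('/')}{path}"


# Hypothetical headers with a list-valued X-Forwarded-Proto.
headers = {
    "Host": "localhost:8787",
    "X-Forwarded-Host": "localhost:8787",
    "X-Forwarded-Proto": "https,http",
}
print(
    external_url(
        headers,
        "/auth-sign-in?appUri=%2F",
        root_path="/rstudio/kubeflow-user/kimwnasptd",
    )
)
# https://localhost:8787/rstudio/kubeflow-user/kimwnasptd/auth-sign-in?appUri=%2F
```

Taking the first element assumes that proxies append to the list, so the leftmost value is the one closest to the client.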
So another issue:
https://github.com/rstudio/rstudio/issues/11010
The verdict
Thankfully the RStudio community was very responsive and fixed the issues for both headers. We've also updated the RStudio images in Kubeflow 1.7 with the above fixes: https://github.com/kubeflow/kubeflow/pull/6890.
The goal of this post was mostly to introduce readers to the world of proxying, the information that can be lost between proxies, and the tools we have to preserve this information. In this case, those tools are the non-standard X-Forwarded headers.
One more lesson is that if a feature is not based on a well-defined standard, then it's bound to be used in unexpected ways. Unfortunately, the X-Forwarded headers are such a case.
So hopefully at this point you'll have a better understanding of when and how these headers can be used, as well as what to look at in case you bump into weird URLs returned by a server in the cloud native world.