# Buggy Use of `urllib.parse.urljoin`

## Tags

* type: bug
* priority: low
* assigned: fredm
* status: closed, completed
* keywords: url

## Description

This issue was circumvented by ensuring all the configuration settings that are URIs end with a trailing slash. This ensures that the function can continue working as documented and expected.

----

The
=> https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin `urllib.parse.urljoin` function
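resolves its second argument relative to the first. If the first argument (the base URL) does not end with a trailing slash, its last path segment is dropped before joining. This leads to subtle errors if the configurations are not set up to include the trailing slash.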

For example, if you call
=> https://github.com/genenetwork/genenetwork3/blob/ab354ac46bf7f84ed2504c6e0061ede808ab6ee1/gn3/authentication.py#L75-L100 this function
with the arguments
> get_highest_user_access_role("123", "456", gn_proxy_url="https://genenetwork.org/gn3-proxy")
the function does not actually access
> https://genenetwork.org/gn3-proxy/available?resource=123&user=456
as one might expect. Instead, it accesses
> https://genenetwork.org/available?resource=123&user=456

If you compare the two URLs, you see that the "gn3-proxy" part of the URL is dropped. If you include the trailing slash as follows
> get_highest_user_access_role("123", "456", gn_proxy_url="https://genenetwork.org/gn3-proxy/")
then it accesses
> https://genenetwork.org/gn3-proxy/available?resource=123&user=456
as expected.
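
The behaviour can be reproduced with `urljoin` alone. This is a minimal sketch that assumes the relative part passed to `urljoin` is "available?resource=123&user=456"; the exact string built inside the linked function may differ.

```python
from urllib.parse import urljoin

# Assumed relative reference, for illustration only.
relative = "available?resource=123&user=456"

# Base without a trailing slash: "gn3-proxy" is treated as the last
# path segment and is replaced by the relative reference.
without_slash = urljoin("https://genenetwork.org/gn3-proxy", relative)

# Base with a trailing slash: "gn3-proxy/" is treated as a directory
# and the relative reference is appended to it.
with_slash = urljoin("https://genenetwork.org/gn3-proxy/", relative)

print(without_slash)
# https://genenetwork.org/available?resource=123&user=456
print(with_slash)
# https://genenetwork.org/gn3-proxy/available?resource=123&user=456
```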

This failure mode is subtle, and time is lost troubleshooting it. We need a more robust way to join the URIs, such that the system always does the expected thing regardless of whether one remembers to add the trailing slash.
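
One possible shape for such a helper is sketched below. The name `join_url` is hypothetical and is not part of genenetwork3. It assumes the base is always meant to be treated as a directory and that the base carries no query string or fragment.

```python
from urllib.parse import urljoin


def join_url(base: str, path: str) -> str:
    """Join `path` onto `base`, always treating `base` as a directory.

    Hypothetical helper, for illustration only.
    """
    # Normalise the base so that it ends with exactly one slash.
    normalised_base = base.rstrip("/") + "/"
    # Drop leading slashes so the path stays relative to the base
    # instead of being resolved against the host root.
    return urljoin(normalised_base, path.lstrip("/"))


print(join_url("https://genenetwork.org/gn3-proxy", "available?resource=123&user=456"))
print(join_url("https://genenetwork.org/gn3-proxy/", "/available?resource=123&user=456"))
```

Both calls produce the same URL under "gn3-proxy/". The trade-off is that a leading slash in the path no longer means "relative to the host root", which may or may not be wanted elsewhere.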