
Fingerprinting systems with TCP source-port selection

By Jonathan Corbet
October 6, 2022
Back in May 2022, a mysterious set of patches titled insufficient TCP source port randomness crossed the mailing lists and was subsequently merged (at -rc6) into the 5.18 kernel. Little information was available at the time about why significant changes to the networking stack needed to be made so late in the development cycle. That situation has finally changed with the publication of this paper by Moshe Kol, Amit Klein, and Yossi Gilad. It seems that the way the kernel chose port numbers for outgoing network connections made it possible to uniquely fingerprint users.

Selecting a source port

A TCP connection can be described as a four-tuple consisting of the source and destination IP addresses and the source and destination port numbers. The addresses and destination port number will all be fixed by the specific connection needed, but the originating side can choose any number for the source port number. It has long been understood that there is value in making those numbers unpredictable; to do otherwise would make connections more vulnerable to hazards like reset attacks or even data injection. So the Linux kernel has, since this patch by Eric Dumazet was merged for 5.12, duly implemented source-port randomization as described in RFC 6056.

The randomization algorithm needs to be difficult to predict and also fast; the Linux implementation meets those goals. But it turns out that there are other reasons to choose source-port numbers correctly. To understand why, it's worth looking quickly at how the Linux implementation, prior to 5.18, worked.

In short, the kernel calculates two hashes, which the paper calls F and G, from the three given parts of the four-tuple (the addresses and the destination port number). To ensure that different systems produce different hashes for the same tuples, it also mixes in a 32-bit random key that is generated at boot time. F can be an arbitrarily large number, but G is constrained to the size of an array of counters. The port number is chosen with a calculation that looks approximately like:

    port = (counter_table[G] + F) % port_number_range;

The given counter is then incremented. Naturally, there are a number of complications, including checks for whether the port number is already in use.
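To make the scheme concrete, here is a minimal Python model of the pre-5.18 selection algorithm. Everything in it is an invention of this sketch: SHA-256 stands in for the kernel's keyed hash functions, the names and constants are made up, and the in-use-port checks and other complications mentioned above are omitted.

```python
# Toy model of the pre-5.18 source-port selection scheme (in the style of
# RFC 6056's "algorithm 4").  Illustrative only: names, hash choice, and
# constants are inventions of this sketch, not kernel code.
import hashlib

TABLE_SIZE = 256                      # counter-table size in 5.12..5.17
PORT_MIN, PORT_RANGE = 32768, 28232   # a typical local-port range

def _hash(key: bytes, *fields, label: bytes = b"") -> int:
    """Keyed hash of the connection fields; F and G differ only by label."""
    data = label + key + b"|".join(str(f).encode() for f in fields)
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

class PortSelector:
    def __init__(self, secret_key: bytes):
        self.key = secret_key                  # per-boot random key
        self.counters = [0] * TABLE_SIZE

    def pick_port(self, saddr: str, daddr: str, dport: int) -> int:
        f = _hash(self.key, saddr, daddr, dport, label=b"F")
        g = _hash(self.key, saddr, daddr, dport, label=b"G") % TABLE_SIZE
        port = PORT_MIN + (self.counters[g] + f) % PORT_RANGE
        self.counters[g] += 1                  # increment after use
        return port
```

Note that two consecutive connections to the same three-tuple land on the same counter, so their source ports differ by exactly one; that property is central to the attack.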

A key aspect of this algorithm is the sizing of the counter table. As the 5.17 source (just prior to the fixes) notes, RFC 6056 suggests a ten-entry table, but Dumazet decided to go with 256 entries instead "to really give more isolation and privacy".

The attack

Kol and company were able to come up with an interesting attack on this algorithm. A hostile web page (otherwise known as almost any page on today's Internet) could load a JavaScript fragment that, through a series of iterations, creates a mapping between destination port numbers and the counter-table entries used to assign source-port numbers. It is, in other words, looking for hash-table collisions on the counter table. This table, remember, has only 256 entries, so hash collisions will not be rare or hard to find.

Specifically, the attack initiates a series of outgoing connections, all to the same remote address, but each to a different destination port. It then looks at the assigned source-port number for each connection attempt (note that the connection need not actually be established). Since any given counter-table entry is incremented after being used to generate a source-port number, two connection attempts that hit that counter-table entry will result in source-port numbers that differ by one — if the source and destination addresses are the same. So the attack looks for connection attempts that resulted in sequential source-port numbers and concludes that the destination-port numbers used in those attempts map to the same counter-table entry.

The optimal number of outgoing connections for one iteration of this attack is said to be one less than the size of the counter table, or 255. A single iteration of this algorithm will produce at most a small number of collisions, which do not tell an attacker much, but it can be run over and over again to come up with more of them. So the above process is repeated until collisions have been found for each entry in the counter table. Once that is done, a second phase uses a similar technique, but mixing connections to a loopback address with connections to the remote-server destination ports found in the first phase. The purpose here is to find which destination ports, when used with a loopback destination, map to the same table cell as one of those remote-server port pairs. This second phase generates pairs of destination port numbers that, when used with the loopback address, generate collisions in the counter table; these port-number pairs are independent of any remote address.

Each pair of colliding loopback port numbers, in effect, tells the attacker a little more about the secret key that the kernel generated at boot time. The key itself is never disclosed, but there is no need for that; a sufficient number of colliding port-number pairs is enough to uniquely identify the system involved. The key point is that these port-number pairs are a function of the secret key — which is different for every system — and can thus be used to create a unique device identifier.

It evidently takes about 40 connection attempts per counter-table entry to generate enough collisions, so about 10,000 attempts to identify a system. (The paper describes how to calculate "enough" but doesn't give a number). The time required to carry out this attack is about ten seconds, and the resources used are small enough that chances are good that it will go undetected. (Naturally, this discussion has passed over a lot of important details and is almost certainly wrong somewhere along the way; see the paper for the full story).

This unique identifier has some interesting characteristics. It is entirely independent of the software being run, so it will remain the same even if the user switches browsers, or just fetches a page with a tool like curl. It is also the same regardless of the site that is being connected to, so it works well for tracking users across multiple sites. Different containers running on the same host will all have the same identifier. Even systems with identical hardware and software configurations will produce different identifiers.

In other words, this ability to identify a system looks like a gift to the surveillance capitalists out there. It does have a few limitations, though. It does not work through networks like Tor, since connections are terminated within the Tor network and initiated anew at an exit node. Network-address translation (NAT) systems, which reassign port numbers, can also interfere with the identification. As the authors point out, though, increasing use of IPv6 is likely to reduce the use of NAT, making NAT interference less of a problem.

The identifier will also change when a system reboots. There is, however, a widespread class of devices — those running Android — that tend not to reboot frequently. The threat to Android seems to have been of special concern to the authors. It is not described as an immediate threat; Android devices are running, at the latest, 5.10 kernels, while the vulnerable port-selection code was added in 5.12. That said, the authors may have overlooked the fact that the "improved" port-selection code was also part of the 5.10.119 stable update, and may well be running on some Android systems.

The fix

The patch set addressing the problem, posted by Willy Tarreau, makes a number of changes. One of these is to change the hash calculation to mix in yet another number that changes every ten seconds (it is, in fact, jiffies / (10 * HZ)). That will perturb the selection of which counter to use and, as a result, will disrupt any identification attempt that is underway at the time. Another change is to increment the chosen counter by a random number (between zero and seven) rather than by one, as a way of adding more noise to the chosen port numbers.
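Applied to a toy Python model of the selector (invented names; SHA-256 standing in for the kernel's keyed hash; no in-use-port checks), those two changes, together with the enlarged table discussed below, might look like this sketch:

```python
# Toy model of port selection with the 5.18 countermeasures applied.
# Illustrative only: names and hash choices are inventions of this
# sketch, and the real kernel code differs in many details.
import hashlib
import secrets
import time

TABLE_SIZE = 65536                    # enlarged counter table
PORT_MIN, PORT_RANGE = 32768, 28232

def _hash(key, *fields, label=b""):
    data = label + key + b"|".join(str(f).encode() for f in fields)
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

class FixedPortSelector:
    def __init__(self, secret_key):
        self.key = secret_key
        self.counters = [0] * TABLE_SIZE

    def pick_port(self, saddr, daddr, dport, now=None):
        # Mix in a value that changes every ten seconds, so any collision
        # map an attacker builds goes stale before it can be completed.
        epoch = int((time.monotonic() if now is None else now) // 10)
        f = _hash(self.key, saddr, daddr, dport, label=b"F")
        g = _hash(self.key, saddr, daddr, dport, epoch,
                  label=b"G") % TABLE_SIZE
        port = PORT_MIN + (self.counters[g] + f) % PORT_RANGE
        # Advance the counter by a random amount in 0..7 rather than
        # always one, blurring the "differ by one" collision signal.
        self.counters[g] += secrets.randbelow(8)
        return port
```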

Those changes might be sufficient to thwart the described attack, but only barely. The core of the response is, instead, to increase the size of the counter table to 65,536 entries. That bloats the table from 1KB in size to 256KB, but it also makes collisions much more uncommon and, thus, much harder to find, significantly increasing the time required to carry out a successful identification. The end result is a set of defenses that prevent the identification of systems via the source-port-number selection mechanism.

The kernel's policy regarding security problems is normally to require disclosure shortly after the report is made. A brief embargo can be allowed while a fix is developed, but that is the extent of it. In this case, though, the fixes were initially posted in April, with no description of the problem that was motivating them. And, when your editor inquired into the issue at the time, the answer was that the explanation would not be forthcoming for several months.

In this case, the lengthy period of secrecy seemingly had nothing to do with security. The fixes were public and were quickly incorporated into any kernel that is being maintained with an eye toward security problems. Instead, this delay was entirely created by the requirements of the journal publishing the article describing the vulnerability. That journal's demand for exclusivity, in a way that was convenient for its own publication schedule, prohibited the posting of an explanation of the vulnerability elsewhere.

As a result, few developers were able to review the patches with regard to whether they actually fixed the problem they were targeted at. The kernel community had to rely on its trust of the developers involved (Dumazet had a hand in their creation). That is not really how the process is supposed to work. The kernel community has little patience with distributors seeking lengthy embargoes; it's not clear that academic journals merit more deference.

Be that as it may, the problem appears to be well solved, and we now have an explanation of why those patches, first posted nearly six months ago, were needed. Whether it will ever be possible to eliminate all of the ways in which individual systems can be fingerprinted is an open question, but at least one readily available mechanism has been closed off.

Index entries for this article
Kernel: Networking/Security
Kernel: Security/Vulnerabilities
Security: Linux kernel/Networking



Fingerprinting systems with TCP source-port selection

Posted Oct 6, 2022 20:16 UTC (Thu) by amarao (subscriber, #87073) [Link] (2 responses)

Why not use a PRNG for selecting a new port?

Fingerprinting systems with TCP source-port selection

Posted Oct 6, 2022 20:50 UTC (Thu) by mfuzzey (subscriber, #57966) [Link]

The RFC linked explains this (and gives the example of using a RNG).

The two desired properties are
1) Minimize the port reuse frequency
2) Be unpredictable

The simple classic counter does 1 but not 2 whereas a RNG does 2 but not 1.
The algorithm used does both.

Fingerprinting systems with TCP source-port selection

Posted Oct 7, 2022 21:54 UTC (Fri) by flussence (guest, #85566) [Link]

In a sense, this is what IPv6 with privacy extensions enabled would accomplish: 64 extra bits of randomness, and as every connection uses a different IP the port number can also be fully randomised without risk of collisions.

We'd still need this algorithm though because IPv4 isn't going away any time soon.

Fingerprinting systems with TCP source-port selection

Posted Oct 6, 2022 22:39 UTC (Thu) by unixbhaskar (guest, #44758) [Link] (1 responses)

Well, this stands out ...

"In this case, the lengthy period of secrecy seemingly had nothing to do with security. The fixes were public and were quickly incorporated into any kernel that is being maintained with an eye toward security problems. Instead, this delay was entirely created by the requirements of the journal publishing the article describing the vulnerability. That journal's demand for exclusivity, in a way that was convenient for its own publication schedule, prohibited the posting of an explanation of the vulnerability elsewhere."

Bad practices mar all the good work and importantly kill the enjoyment of solving "real problems"...

Fingerprinting systems with TCP source-port selection

Posted Oct 13, 2022 3:53 UTC (Thu) by gdt (subscriber, #6284) [Link]

Academics' continued employment depends upon publication in academic journals. When universities consider employment and promotion, academics are not assessed on the quality of their interaction with the Linux kernel community, on how much urgent hassle they cause for Linux distributors, or the risk their work creates for Linux users; they are assessed on the number and impact of their academic publications.

A university employer would see no problem with an academic preferring full publication of the fault in an academic journal over following some 'Linux community responsible disclosure' process which precludes such publication.

That in turn means that if the Linux kernel community wants pre-disclosure of faults, then they have to provide a process which does not create unenviable choices for academics.

You can argue that academic publishing is broken, and that academics should be evaluated using broader criteria. Neither of those arguments is new, and the Linux community isn't going to be the group which successfully corrects either of those issues.

Fingerprinting systems with TCP source-port selection

Posted Oct 7, 2022 10:49 UTC (Fri) by scientes (guest, #83068) [Link] (6 responses)

Why not use a balanced rb-tree instead of a hash table that is now so friggen big, because balanced trees are immune to collision attacks.

You could also just use a balanced tree WHEN there is a collision, which also breaks the O(n^2) pathological case of hash collisions.

Fingerprinting systems with TCP source-port selection

Posted Oct 7, 2022 10:50 UTC (Fri) by scientes (guest, #83068) [Link] (4 responses)

Nobody less than Donald E Knuth says that balanced trees and not hash tables must be used for security considerations in many places.

Fingerprinting systems with TCP source-port selection

Posted Oct 7, 2022 13:20 UTC (Fri) by wittenberg (subscriber, #4473) [Link] (3 responses)

Could you give a reference please? I'd like to see his reasoning.

Fingerprinting systems with TCP source-port selection

Posted Oct 7, 2022 13:59 UTC (Fri) by Wol (subscriber, #4433) [Link]

My immediate reaction is that a hash tree relies on *pseudo*randomness. As such, it is always vulnerable to being cracked.

If you use a drunken walk to walk a balanced tree, then you both avoid re-using values you've already used, and you end up in a genuinely random new place every time. And as the tree grows, the number of random numbers used to get a new value grows - after 1000 values a tree with 2 nodes per branch will require a ten-step drunken walk ... if your RNG truly is random then no way is an attacker going to predict where you'll end up.

Cheers,
Wol

Fingerprinting systems with TCP source-port selection

Posted Oct 7, 2022 15:23 UTC (Fri) by epa (subscriber, #39769) [Link] (1 responses)

I don't know what Knuth wrote, but it's undeniable that hashing depends to some extent on "luck". If you are very "unlucky" then you will get lots of hash collisions, and performance degrades. In other words, while the average case performance is fine, the worst case is poor. If your input data may be chosen by an attacker, you have to worry about the worst case performance, even though in a benign environment it is so unlikely you can dismiss it. Or you have to be certain your hash function is secure enough that an attacker won't find a way to make it degrade.

Fingerprinting systems with TCP source-port selection

Posted Oct 8, 2022 23:00 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

IMHO there are a few exceptions here:

1. The hashing algorithm is salted with a CSPRNG value generated at startup. But this requires you to know what you are doing, because there are a variety of side-channel attacks that might leak this value or allow an attacker to make educated guesses about it. For example, if a collision happens, a request might take slightly longer to process, and if an attacker can observe collisions, they may be able to try different keys and figure out the possible salt values. Or maybe not, as this is probably infeasible for very large keyspaces and salts.
2. A "perfect" hashing algorithm (i.e. an algorithm that never collides - only possible if there are at least as many hash buckets as valid keys, or if you can somehow prove that no two valid keys that collide will ever be used simultaneously, so you can't do this in the general case).
3. You have hard-realtime requirements, you absolutely need O(1) performance, and it is acceptable to drop requests that cause collisions. I'm not sure why you would want that, but it is theoretically a valid combination of requirements.

Fingerprinting systems with TCP source-port selection

Posted Oct 8, 2022 12:08 UTC (Sat) by wtarreau (subscriber, #51152) [Link]

This is completely unrelated. It's not a lookup hash table, it's a hashing function, which converts a 5 tuple to an index. Nothing more. The table that is mentioned is a mapping that breaks the relation between the input and output. It can be completely random.

Fingerprinting systems with TCP source-port selection

Posted Oct 7, 2022 15:38 UTC (Fri) by bostjan (guest, #118664) [Link] (1 responses)

How can curl be affected by this? Curl does not execute any code fetched from a remote location.

Curl

Posted Oct 7, 2022 15:55 UTC (Fri) by corbet (editor, #1) [Link]

You're correct, that was a poor example, apologies for the confusion.

Fingerprinting systems with TCP source-port selection

Posted Oct 11, 2022 20:46 UTC (Tue) by meyert (subscriber, #32097) [Link]

So this attack needs to run some code locally on my machine to work, right? Another reason to disable Javascript?

Fingerprinting systems with TCP source-port selection

Posted Oct 14, 2022 0:43 UTC (Fri) by developer122 (guest, #152928) [Link]

>It has long been understood that there is value in making those numbers unpredictable; to do otherwise would make connections more vulnerable to hazards like reset attacks or even data injection.

Can someone provide background on how correct port selection specifically helps prevent these issues?

Fingerprinting systems with TCP source-port selection

Posted Oct 19, 2022 13:42 UTC (Wed) by scientes (guest, #83068) [Link]

As only TCP and UDP, and a few ICMP packets actually can be relied on to make it through the internet, the 48-bit effective address of a UDP packet is actually a good reason to not implement ipv6. And there really is only one protocol that is firmly attached to a port number: http and TLS + http, with 80 and 443, respectively; with the eSNI protocol that the same creator of RFC 7250 wrote there is no loss in using a proxy, except that you have to trust TLS==http back-end proxies, of course. DNS is irrelevant because it is a tree, and not a web. And there are simple and standard ways to map dns to ip addresses AND port numbers. IPv6 is simply not needed, and a waste of time. Just like WiFi 6, and "5G" (a technology so stupid that it generally should be avoided even mentioning it).


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
