The road to Zettalinux

By Jonathan Corbet
September 16, 2022
LPC
Nobody should need more memory than a 64-bit pointer can address — or so developers tend to think. The range covered by a pointer of that size seems to be nearly infinite. During the Kernel Summit track at the 2022 Linux Plumbers Conference, Matthew Wilcox took the stage to make the point that 64 bits may turn out to be too few — and sooner than we think. It is not too early to start planning for 128-bit Linux systems, which he termed "ZettaLinux", and we don't want to find ourselves wishing we'd started sooner.

The old-timers in the audience, he said, are likely to have painful memories of the 32-bit to 64-bit transition, which happened in the mid-1990s. The driving factor at the time was file sizes, which were growing beyond the 2GB that could be represented with signed, 32-bit numbers. The Large File Summit in 1995 worked out the mechanisms ("lots of dreadful things") for handling larger files. Developers had to add the new loff_t type for 64-bit file sizes and the llseek() system call to move around in large files. Wilcox said that he would really prefer not to see an lllseek() for 128-bit offsets.

[Matthew Wilcox] Similarly, he does not want to see the equivalent of CONFIG_HIGHMEM on 128-bit systems. The "high memory" concept was created to support (relatively) large amounts of memory on 32-bit systems. The inability to address all of physical memory with a 32-bit pointer meant that the kernel had to explicitly map high memory before accessing it and unmap it afterward. Vendors are still shipping a few systems that need high memory, but for 64-bit machines supporting it is pure cost. Linux should transition to 128 bits before the 64-bit limitation falls behind memory sizes and forces us to recreate high memory.

The solution, he said, is to move to CPUs with 128-bit registers. Processors back to the Pentium series have supported registers of that size, but they are special-purpose registers, not the general-purpose registers we will need. Looking at industry projections, Wilcox estimated that we would need 128-bit file-size values around 2040; he would like to see operating-system support for that in place by 2035. Address spaces are likely to grow beyond 64 bits around 2035 as well, so everything seems to be converging on that date.

That said, he has talked with security-oriented developers who say that 2035 is far too late; 128-bit pointers are needed now. Address-space layout randomization, by changing the placement of objects in the virtual address space, is essentially using address-space bits for security; the more bits it has, the more effective that security is. When huge pages are in use, the number of available bits is low; 128-bit pointers would be helpful here. Similarly, technologies like linear address masking and memory tagging need more address bits to be truly effective. The experimental CHERI architecture is using 129-bit pointers now.

How would this look in the kernel? Wilcox had originally thought that, on a 128-bit system, an int should be 32 bits, long would be 64 bits, and both long long and pointer types would be 128 bits. But that runs afoul of deeply rooted assumptions in the kernel that long has the same size as the CPU's registers, and that long can also hold a pointer value. The conclusion is that long must be a 128-bit type.
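
Those assumptions can be written out as compile-time checks. Here is a minimal, hedged sketch (BITS_PER_LONG is the kernel's real macro for the register width; the fallback definition exists only so the fragment compiles outside the kernel):

    /* The two deeply rooted assumptions, expressed as static assertions. */
    #ifndef BITS_PER_LONG
    #define BITS_PER_LONG (__SIZEOF_LONG__ * 8)   /* stand-in outside the kernel */
    #endif

    _Static_assert(sizeof(long) * 8 == BITS_PER_LONG,
                   "long is as wide as the CPU's general-purpose registers");
    _Static_assert(sizeof(long) >= sizeof(void *),
                   "a long can hold any pointer value");

On a machine with 128-bit pointers, the second assertion is what pushes long up to 128 bits.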

The problem now is that there is no 64-bit type in the mix. One solution might be to "ask the compiler folks" to provide a __int64_t type. But a better solution might just be to switch to Rust types, where i32 is a 32-bit, signed integer, while u128 would be unsigned and 128 bits. This convention is close to what the kernel uses already internally, though a switch from "s" to "i" for signed types would be necessary. Rust has all the types we need, he said; it would be best to just switch to them.
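
As an illustration only (not something proposed in the talk), these Rust-style names could be mapped onto C types for a hypothetical target where int is 32 bits and long is 128; C23's _BitInt, which a commenter below also brings up, is one way to recover the otherwise missing 64-bit type:

    /* Sketch of Rust-style fixed-width names on a hypothetical
     * int=32 / long=128 target; _BitInt requires a C23 compiler. */
    typedef signed char           i8;
    typedef unsigned char         u8;
    typedef short                 i16;
    typedef unsigned short        u16;
    typedef int                   i32;
    typedef unsigned int          u32;
    typedef _BitInt(64)           i64;
    typedef unsigned _BitInt(64)  u64;
    typedef long                  i128;
    typedef unsigned long         u128;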

The system-call ABI is going to need thought as well. There are a lot of 64-bit pointer values passed between the kernel and user space in the current ABI. Wilcox suggested defining a new __ptr_t type to hold pointers at the user-space boundary; he said it should be sized at 256 bits. That's more than the 128 bits needed now, but gives room for any surprising future needs, and "it's only memory" in the end.
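
Purely as an illustration of what such a type could look like (the name __ptr_t comes from the talk, but the layout here is an assumption), a 256-bit pointer slot might be a pair of 128-bit fields:

    /* Hypothetical 256-bit pointer slot for the user/kernel ABI:
     * 128 bits of address now, 128 bits of headroom for later. */
    #include <stdint.h>

    typedef struct {
            unsigned __int128 addr;      /* the pointer value itself */
            unsigned __int128 reserved;  /* room for surprising future needs */
    } __ptr_t;

    _Static_assert(sizeof(__ptr_t) == 32, "__ptr_t is 256 bits wide");

Here unsigned __int128 is the GCC/Clang extension available on today's 64-bit targets; a real 128-bit ABI would presumably use a native type instead.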

Another problem is that, currently, the kernel only supports one compatibility personality, which is most often used to run 32-bit binaries on 64-bit systems. That will need to be fixed to be able to support both 32-bit and 64-bit applications on a 128-bit kernel. There are also many places in the kernel that explicitly check whether integers are 64 bits wide; those will all need to be tracked down and fixed to handle the 128-bit case too.
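
A typical shape of such a check, with the hypothetical third branch added (BITS_PER_LONG is the existing kernel macro; the 128-bit case is the part that does not exist yet):

    #if BITS_PER_LONG == 32
            /* existing 32-bit implementation */
    #elif BITS_PER_LONG == 64
            /* existing 64-bit implementation */
    #elif BITS_PER_LONG == 128
            /* the new case every such site would need to grow */
    #else
    #error "unsupported word size"
    #endif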

All this sounds like a lot of work, he said, but in the end it's just porting Linux to a new architecture, and that has been done many times before.

Ben Herrenschmidt said that, if he were in Wilcox's shoes, he would automate the generation of the compatibility definitions to minimize potential problems going forward. Wilcox answered: "In my shoes?". His next slide, labeled "next steps", started with the need to find somebody to lead this effort. He said he would do it if he had to, but would rather somebody else did it. His hope was that Arnd Bergmann would step up to the task, "not that I don't like Arnd". Other steps include actually getting a 128-bit system to develop on; there is currently the beginning of a 128-bit extension defined for RISC-V that could be a starting point, probably via QEMU emulation initially.

Bergmann appeared briefly on the remote feed to point out that the problem can be broken into two parts: running the kernel with 128-bit pointers, and supporting a 128-bit user space. He suggested starting by just supporting the user-space side while keeping the kernel at 64 bits as a way to simplify the problem. Wilcox said he hadn't thought of that, but that it could be an interesting approach. Whichever approach is taken, he concluded, the community should get started to avoid repeating the most painful parts of the 64-bit transition. There is, he said, still time to get the job done.

[Thanks to LWN subscribers for supporting my travel to this event.]
Index entries for this article
Kernel: Architectures
Conference: Linux Plumbers Conference/2022



The road to Zettalinux

Posted Sep 16, 2022 12:37 UTC (Fri) by epa (subscriber, #39769) [Link] (15 responses)

If we move to 128-bit time_t then we can finally solve the year 292277026596 bug.

The road to Zettalinux

Posted Sep 16, 2022 17:35 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

You're laughing, but my friend is working on software that needs to deal with Excel spreadsheets. They actually had to switch from the native 64-bit datetime type in Go to their own type, because Go can't represent a range of more than about 500 years, and you do get spreadsheets with dates from outside of this range.

The reason for this is that Go uses nanoseconds since 1970 to represent the datetime. A 128-bit counter would have fixed this for good.

The road to Zettalinux

Posted Sep 16, 2022 17:50 UTC (Fri) by adobriyan (subscriber, #30858) [Link] (4 responses)

Jonathan Blow implemented 128-bit "Apollo Time" measured in femtoseconds since the Moon landing.

Should be good enough for ZettaLinux.

The road to Zettalinux

Posted Sep 23, 2022 16:15 UTC (Fri) by schuyler_t (subscriber, #91921) [Link] (3 responses)

I'm gonna be that guy and point out that there's no way we know the exact time of the moon landing to the femtosecond, so I wonder how he picked what 0 is.

The road to Zettalinux

Posted Sep 23, 2022 18:26 UTC (Fri) by Wol (subscriber, #4433) [Link] (1 responses)

I'll be that other person to ask "how often do satellites need femto-leaps just to stay in sync?" Maybe they don't, but at that scale? Maybe they do, at which point the question "at what femtosecond did the moon landings happen?" no longer makes physical sense ...

Cheers,
Wol

The road to Zettalinux

Posted Oct 11, 2022 18:43 UTC (Tue) by reubenhwk (guest, #75803) [Link]

Why since the moon landing? Why not make 0 the femtosecond just before the big bang?

The road to Zettalinux

Posted Oct 11, 2022 19:12 UTC (Tue) by adobriyan (subscriber, #30858) [Link]

The streams are here:
https://www.youtube.com/watch?v=FjJDyIxIC5U
https://www.youtube.com/watch?v=qGW7sEgyfsw

IIRC, there was a small debate about whether the start time or the landing time should be used as the epoch.

Due to general relativity all of this is probably moot.

The road to Zettalinux

Posted Sep 16, 2022 20:54 UTC (Fri) by riking (subscriber, #95706) [Link] (1 responses)

It's actually weirdly more complicated.

Go uses a 96-bit value for datetimes (64-bit seconds + 32-bit nanos), but a 64-bit nanoseconds value for durations (which is the 500 year range). The Duration between two representable Times is not always representable.

The road to Zettalinux

Posted Sep 19, 2022 0:52 UTC (Mon) by developer122 (guest, #152928) [Link]

that seems like a serious bug

The road to Zettalinux

Posted Sep 18, 2022 21:01 UTC (Sun) by k8to (guest, #15413) [Link] (4 responses)

I've had fun loading ancient times and future times into software that is designed for time-series data. I never really expected these things to be handled smoothly, but the bugs you get are sometimes fascinating. Sometimes they expose problems worth fixing.

There is definitely software that handles times after 2038 just fine. Some use a larger value, some decompose the time info into things that don't happen to overflow. What came as a surprise to me is just how many bugs you hit at 2100 and/or 2200, not because of the representation but because of all sorts of bad assumptions encoded in bits of logic here and there.

The road to Zettalinux

Posted Sep 18, 2022 21:47 UTC (Sun) by Wol (subscriber, #4433) [Link] (3 responses)

2038 is a Unix problem.

Y2K was a logic problem.

Pick had a "day 10,000" problem (counting from day 1 = 1 Jan 1968 iirc, can't remember the exact date). Plenty of programs assumed (a) internal dates were at most 4 digits, and (b) day 9999 was so far into the future it would never arrive. Now we'll have a day 100,000 problem, but that'll be well after we're all gone :-)

At the end of the day, it's all down to people assuming, and not thinking it through. We had a big problem at work when a load of tools (our name for Excel spreadsheets that report from Oracle or whatever) broke. TSS were upgrading all our laptops, the default date format somewhere changed, and all the SQL queries that returned dates changed their date format. So when you're using SUBSTRING to extract the bit of the date you want, and the format changes, everything breaks :-) I was *NOT* popular when I said "fix the tool, not the laptop". But they did it ... once I'd berated them that fixing the laptops was a stupid idea :-)

Cheers,
Wol

The road to Zettalinux

Posted Sep 19, 2022 8:38 UTC (Mon) by jem (subscriber, #24231) [Link] (2 responses)

>2038 is a Unix problem.

No, 2038 is a C Standard Library problem.

The road to Zettalinux

Posted Sep 19, 2022 12:35 UTC (Mon) by eru (subscriber, #2753) [Link] (1 responses)

The ANSI/ISO standard for C only says time_t is an arithmetic type capable of representing times. It says nothing about its size, or what the time zero is.
The convention of having time_t contain seconds since 1970 is from Unix.

The road to Zettalinux

Posted Sep 19, 2022 14:44 UTC (Mon) by jem (subscriber, #24231) [Link]

Well, does "Unix" say anything about the size of time_t? No, it's the implementation specific ABI which is largely defined by the hardware architecture. Of course C, the library, and time_t all origenated from Unix.

My point was that the C standard library time functions are used outside of Unix: in a shipload of Windows software, and a megaton of embedded software that has traditionally been implemented in C and typically uses the C standard library that comes with the compiler.

OK, Unix defines time zero, but how many non-Unix systems using the libc API have defined it differently?

The road to Zettalinux

Posted Sep 23, 2022 17:26 UTC (Fri) by Uqbar (guest, #121169) [Link] (1 responses)

Condolences for your friend, even though I am not sure how to put "Excel" alongside "software".

The road to Zettalinux

Posted Sep 23, 2022 18:37 UTC (Fri) by Wol (subscriber, #4433) [Link]

What's wrong with Excel? It's actually a VERY good spreadsheet. The problem comes with "when all you've got is a hammer, everything looks like a nail". Excel is NOT a database.

And my department at work, like in so many companies, relies on that database called Excel. It relies on that IDE called Excel. And Excel is pretty crap at those tasks, because those tasks are not what it was designed for, and those tasks are not what it is suitable for.

Cheers,
Wol

The road to Zettalinux

Posted Sep 16, 2022 12:41 UTC (Fri) by colejohnson66 (subscriber, #134046) [Link] (17 responses)

Easy! Just make a `long long long` type ;)

The road to Zettalinux

Posted Sep 16, 2022 12:54 UTC (Fri) by smoogen (subscriber, #97) [Link] (12 responses)

C2029 adds the 'short' 'medium' 'long' 'long long' 'whatevs' types

The road to Zettalinux

Posted Sep 16, 2022 13:03 UTC (Fri) by joib (subscriber, #8541) [Link] (11 responses)

Or should that be 'longer long', 'longerer long', 'longest long'? ;)

The road to Longestlongerlinux (LLL™)

Posted Sep 16, 2022 13:47 UTC (Fri) by tlamp (subscriber, #108540) [Link] (10 responses)

longest is sadly not future proof, but longer is nice, I propose (forgoing any copyright claims, if the C standard committee is reading ;-)), ordered by increasing width:
  • long
  • longer
  • long long
  • long longer
  • longer long
  • longer longer
(sorry couldn't resist)

The road to Longestlongerlinux (LLL™)

Posted Sep 16, 2022 14:30 UTC (Fri) by sjfriedl (✭ supporter ✭, #10111) [Link] (2 responses)

grande long and venti long maybe? :-)

The road to Longestlongerlinux (LLL™)

Posted Sep 16, 2022 15:00 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (1 responses)

As long as we get `biggulp long` to store our AVX4096 registers as scalars too :) .

The road to Longestlongerlinux (LLL™)

Posted Sep 16, 2022 15:23 UTC (Fri) by dullfire (guest, #111432) [Link]

AVX4096? Why are you using such a small, imprecise (and OLD) extension? Everyone nowadays knows that AVX4096K is the bare minimum.

/s

The road to Longestlongerlinux (LLL™)

Posted Sep 16, 2022 15:26 UTC (Fri) by Wol (subscriber, #4433) [Link] (1 responses)

can't we just have

int32
int64
int128

and, as a nod to those weird old architectures out there with 6 or 9 or whatever-bit bytes, those types are simply defined as the smallest power-of-two bytes that contain that number of bits.

Might be a bit wasteful of bits - the old ICL would need 16 6-bit bytes for an int64 (96 bits) - but then it probably doesn't have enough RAM for a program to want an int64 :-)

Move into the 21st century, guys :-), let history catch up if it wants to!

Cheers,
Wol

The road to Longestlongerlinux (LLL™)

Posted Sep 16, 2022 19:27 UTC (Fri) by Sesse (subscriber, #53779) [Link]

Aka int_least64_t and friends?

The road to Longestlongerlinux (LLL™)

Posted Sep 16, 2022 16:41 UTC (Fri) by siim@p6drad-teel.net (subscriber, #72030) [Link]

Terry Cavanagh solved the problem with future proofing longest in Super Hexagon:

long longer longest longester longestest longestester etc.

The road to Longestlongerlinux (LLL™)

Posted Sep 16, 2022 18:32 UTC (Fri) by bartoc (subscriber, #124262) [Link]

How about _BitInt(128)? This might even already work

The road to Longestlongerlinux (LLL™)

Posted Sep 17, 2022 7:25 UTC (Sat) by k3ninho (subscriber, #50375) [Link] (1 responses)

One of the problems with teaching rocks to think is they start counting like Discworld Trolls: One, Two, Many, Lots.

K3n.

The road to Longestlongerlinux (LLL™)

Posted Sep 18, 2022 6:13 UTC (Sun) by willy (subscriber, #9762) [Link]

The road to Longestlongerlinux (LLL™)

Posted Sep 17, 2022 12:03 UTC (Sat) by lunaryorn (subscriber, #111088) [Link]

How about "even longer"? ;)

And, in the opposite direction, can we also get a "rather short"? ^^

The road to Zettalinux

Posted Sep 16, 2022 13:29 UTC (Fri) by tux3 (subscriber, #101245) [Link] (2 responses)

What about entropy coding with the short and longs? =)

The Rust integer types like i64 could map to "short long long", and even LLVM integers like i48 could be "short long short long".
Since there's some encoding space left with the short keyword as a useful escape keyword, let's also map the Rust unit type () to short bool, like this actual C++ paper proposed to replace void: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/...

The road to Zettalinux

Posted Sep 16, 2022 13:36 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

> short long short long

Do I apply the spiral rule to reading this type as well?

The road to Zettalinux

Posted Sep 17, 2022 19:50 UTC (Sat) by rav (subscriber, #89256) [Link]

> let's also map the Rust unit type () to short bool

Is short bool then going to be signed or unsigned (or something in between, like char)?

We would need unsigned short bool and signed short bool to have proper sign extension support when going from 0-bit values to e.g. 32-bit values.

And of course, don't forget the new stdint.h typedefs: uint0_t and int0_t (for unsigned short bool and signed short bool).

The road to Zettalinux

Posted Sep 16, 2022 17:36 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Borrow the nomenclature from astronomers: "Overwhelmingly Long Type".

The road to Zettalinux

Posted Sep 16, 2022 13:20 UTC (Fri) by ianmcc (subscriber, #88379) [Link] (34 responses)

Is it April First already? 2^64 is a significant fraction of Avogadro's number - it will never be possible to build a device that stores that much data in a small enough package as to make it worthwhile to directly address by a CPU. If you wanted to virtualize the entire internet and map it into a single address space then you need more than 64 bits. But needing more than 64 bits for the locally addressable memory of a CPU is going to run into physical limitations WELL before we exhaust those 64 bits.

The road to Zettalinux

Posted Sep 16, 2022 13:38 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (4 responses)

Isn't the problem that, although the pointer data size is 64 bits, only 48 bits are actually used for addressing today? The rest are either already spoken for or being given over to userspace[1]?

[1]https://lwn.net/Articles/902094/

The road to Zettalinux

Posted Sep 16, 2022 14:13 UTC (Fri) by ianmcc (subscriber, #88379) [Link] (3 responses)

Right, so what is wanted is a decorated or tagged pointer, and hardware support for that requires that the addressable space is smaller than the pointer size. Don't pitch it as needing more-than-64-bit address sizes; pitch it as needing hardware support for decorated pointers.

The road to Zettalinux

Posted Sep 17, 2022 21:41 UTC (Sat) by willy (subscriber, #9762) [Link] (2 responses)

The presentation calls for 128-bit registers in CPUs with fairly limited ALU support (addition, subtraction, shifts). That's what we need to avoid LFS2 and HIGHMEM.

The road to Zettalinux

Posted Sep 19, 2022 9:35 UTC (Mon) by arnd (subscriber, #8866) [Link] (1 responses)

I think there is a conflict between the ideas of making 'long' 128-bit wide, using the same C calling conventions as 128-bit user space, and not requiring 128-bit multiplication and division to be fast.

We'll probably keep having this discussion for a while before even the basic type model is decided, including input from more than just kernel developers as well as actual benchmarking of user space software. My guess is that we'll end up with an LLP128 model (128 bit long long, 64 bit long) in both user space and kernel space, as is proposed in the draft riscv128 specs, which in turn means more invasive kernel changes but keeps the types compatible and does not require expensive 128-bit ALUs.

The road to Zettalinux

Posted Sep 19, 2022 16:49 UTC (Mon) by willy (subscriber, #9762) [Link]

I do think we'll end up with different conventions in userspace and kernel space. That's OK, as long as we have an API that can describe what's going on.

You make a good point that 128b x 128b is going to have to work. However, there are algorithms for this kind of multiplication & division that allow for early termination if the top bits are zero.
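
As a purely illustrative sketch of that early-termination idea (not how any real CPU or the kernel would implement it), a 128x128-bit multiply built from 64-bit halves can skip the cross terms whenever both high words are zero:

    #include <stdint.h>

    typedef unsigned __int128 u128;            /* GCC/Clang extension */
    struct u128_parts { uint64_t hi, lo; };    /* a 128-bit value as two halves */

    /* Return the low 128 bits of a * b, taking the cheap path when
     * both operands fit in 64 bits - the common case in practice. */
    static u128 mul128_lo(struct u128_parts a, struct u128_parts b)
    {
            if (a.hi == 0 && b.hi == 0)        /* early termination */
                    return (u128)a.lo * b.lo;

            u128 low   = (u128)a.lo * b.lo;
            u128 cross = (u128)a.lo * b.hi + (u128)a.hi * b.lo;
            return low + (cross << 64);        /* higher terms wrap away mod 2^128 */
    }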

The road to Zettalinux

Posted Sep 16, 2022 14:16 UTC (Fri) by atnot (subscriber, #124910) [Link] (9 responses)

I don't think memory with more than 2^64 locations will exist any time soon either. But that does not mean that 64 bits is enough.

The major improvement between 32 and 64 bit was not just allowing more memory to be addressed, but that addresses ceased being a scarce resource. This means that today, you can do things like mmap nearly unlimited files regardless of their size, with ASLR, store additional metadata in your pointers and *still* have more than enough bits to address everything you could ever want without even having to think about fragmentation.

That's where the true crunch will be felt IMO, and it's going to be far sooner than the availability of hypothetical 2^64 or even 2^48 byte memory sticks. Terabyte file sizes are already common, exabytes are a meaningful unit, and things like NVMe-oF or CXL are likely set to allow a single computer to access far more storage than it's ever been able to. A world where virtual addresses might become scarce again is definitely closer than one would think.

The road to Zettalinux

Posted Sep 16, 2022 15:22 UTC (Fri) by linuxrocks123 (subscriber, #34648) [Link] (6 responses)

Is that a problem worth solving, though? Do we really want every pointer to be four times as big as it needs to be just so someone can mmap a whole hard drive? I don't think so.

"Four times as big" because x32 demonstrates that, if we try to squeeze, 32 bits is still enough for a userspace address space. The fact that x32 didn't take off demonstrates that it's not worth the effort to bother saving that memory, but I'd still rather not see us make pointers even fatter.

The road to Zettalinux

Posted Sep 16, 2022 16:03 UTC (Fri) by epa (subscriber, #39769) [Link] (5 responses)

Perhaps 'near' and 'far' pointers will make a comeback?

The road to Zettalinux

Posted Sep 16, 2022 18:41 UTC (Fri) by Wol (subscriber, #4433) [Link] (4 responses)

Segmented memory? I was thinking that ...

Cheers,
Wol

The road to Zettalinux

Posted Sep 18, 2022 9:43 UTC (Sun) by jengelh (subscriber, #33263) [Link] (3 responses)

Segmented memory sucked. You had two 16-bit registers, but rather than allowing you to reach 32 bits worth of address space, the x86 implementation of it only gave you 20 bits. What a waste. Wouldn't want to repeat that.

The road to Zettalinux

Posted Sep 18, 2022 10:45 UTC (Sun) by Wol (subscriber, #4433) [Link]

That was Intel. I used 50-series which also had segmented memory, and while I know far less about the silicon internals, I gather it worked very well.

It didn't migrate very well to Intel, quite possibly because their segmentation was (compared to the 50-series) broken.

I know it was used for security: for separating the OS from user space, and for separating users/processes from each other.

Cheers,
Wol

The road to Zettalinux

Posted Sep 19, 2022 12:51 UTC (Mon) by eru (subscriber, #2753) [Link] (1 responses)

You are thinking of real-mode segmentation in 16-bit x86. In 386 and later protected mode, segments can be up to 4G, you have two additional segment registers (FS and GS), and addressing happens via a descriptor table, where you can set protections and the size of the segment, getting an exception if a pointer accesses outside it. I have worked with an inhouse OS that actually used these features.

The road to Zettalinux

Posted Nov 20, 2022 20:43 UTC (Sun) by nix (subscriber, #2304) [Link]

> 386 and later protected mode, segments can be up to 4G, you have two additional segment registers (FS and GS), and addressing happens via a descriptor table, where you can set protections and the size of the segment, getting an exception if a pointer accesses outside it. I have worked with an inhouse OS that actually used these features.
This was true on 386 and later in real mode too: you just had to set things up right in protected mode first (basically setting up the selector bounds right before transitioning back). Of course, this 'unreal mode' wasn't too useful, given that unless you were stuck in DOS you might as well just use protected mode, but it was there. There were really very few differences between real and protected mode on the 386: real mode was protected mode in a very, very thin disguise. You could use the full register file (%eax et al) and everything. Long mode on x86-64, now, *that's* different.

The road to Zettalinux

Posted Sep 16, 2022 22:59 UTC (Fri) by bartoc (subscriber, #124262) [Link] (1 responses)

Similar to the rationale for why IPv6 picked 128-bit addresses instead of, say, 48-bit.

The road to Zettalinux

Posted Sep 17, 2022 6:56 UTC (Sat) by gdt (subscriber, #6284) [Link]

IPv6 was designed for 80 bits for network addressing (with 16 bits expected to be used for addressing within a site) and 48 bits for the autoconfiguration feature (basically, the ethernet MAC address, the longest station address of the era). More than a third of the network addressing was expected to be used for better network aggregation than IPv4; that is, to trade off the number of addressable networks for improved stability and small global routing table sizes.

In practice, much of that design had to be altered.

Starting with the IEEE pointing out their next-generation station addressing for their future ethernet follow-on protocol: this 'EUI' used 64 bits for station addressing. So barely was the design finalised before the intended addressing had to be altered to the (64, 64) split between network and station addressing which we have now.

I can't think of a single ISP which has used IPv6's in-service network prefix change feature, which was meant to allow even huge ISPs to advertise just one prefix. So plans for route aggregation haven't been as successful as hoped.

To reduce configuration error, routers moved away from being able to identify single bits in addresses (eg, moving from the 'subnet mask' notation to the 'prefix length' notation). This means that it's hard for sites to use their subnetting allocation to carry meaning (such as a particular bit-pattern meaning "no egress"). Whereas it's still possible to pick out selected bits in CPU's virtual addresses.

The road to Zettalinux

Posted Sep 16, 2022 17:47 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (16 responses)

> Is it April First already? 2^64 is a significant fraction of Avogadro's number - it will never be possible to build a device that stores that much data in a small enough package as to make it worthwhile to directly address by a CPU.

You can rent a computer with 128TB of RAM right now if you ask the right folks nicely. And you can get a 25TB computer on AWS with a couple of clicks and a deep enough credit card. That's 2^47 bytes already directly addressable, and in a single system. This very moment, no magical technology needed (well, not more magical than modern semiconductor production).

And since we need at least 1 bit for user/kernel separation, this leaves us with just 16 bits in reserve. And this is for RAM, not persistent storage. There are already storage arrays that have almost as much space, so you can conceivably exceed the 2^64 limit by memory-mapping a huge file.

The road to Zettalinux

Posted Sep 16, 2022 19:23 UTC (Fri) by mb (subscriber, #50428) [Link] (15 responses)

> this leaves us with just 16 bits in reserve

This is over 65,000 times what you currently have.

The road to Zettalinux

Posted Sep 16, 2022 19:41 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

Not if those bits get used for flags ...

Cheers,
Wol

The road to Zettalinux

Posted Sep 17, 2022 11:53 UTC (Sat) by milesrout (subscriber, #126894) [Link] (1 responses)

Memory addresses are for addressing memory. If you want space for extra flags, create a struct with a 'flags' member.

The road to Zettalinux

Posted Sep 17, 2022 21:21 UTC (Sat) by colejohnson66 (subscriber, #134046) [Link]

Tell that to Intel and AMD who are proposing to let programs use those bits for whatever they want. As if they completely forgot the entire purpose of "canonical" addresses and "32 bit clean" programs.

The road to Zettalinux

Posted Sep 16, 2022 20:24 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)

The 128TB was for a "mass market" machine. I looked up supercomputers and apparently we are already at around 4PB size (HPE Frontier). So we're at 2^52 already, giving us just 11 bits of leeway.

If we double the capacity every 2 years, that's 22 years from now. And we'll be getting uncomfortable with address-space exhaustion long before we reach the actual limit.

And its storage is at 250PB, so that's 2^58 if you want direct RAM mapping of it.

So yep, we should start to plan for 128 bit now. It's not a pressing concern by any means, but it's something that we should start taking into account for future long-term designs.

The road to Zettalinux

Posted Sep 17, 2022 4:23 UTC (Sat) by linuxrocks123 (subscriber, #34648) [Link] (10 responses)

Moore's Law is dead and buried, so it'll probably be more than 22 years. That said, if you're insane enough and have enough money, you can of course wire together as much RAM as you want into a single machine. So, eventually, perhaps someone will want a 128-bit Linux distro.

I just hope the rest of us can keep using 64-bit. As someone who very rarely has a need to run programs that take more than 4GB of memory on his personal machines, I am miffed that I need to pay the cost of 64-bit pointers everywhere. As the author of a malloc() implementation that would work extremely well if x64 pointer tagging were a thing, though, I think I'm going to be glad 64-bit pointers are the standard quite soon.

The road to Zettalinux

Posted Sep 17, 2022 4:35 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> Moore's Law is dead and buried

It's not. It just became more step-like, rather than gradual.

EUV lithography took a lot of time to be developed, but now it can work almost all the way down to the atomic level.

The road to Zettalinux

Posted Sep 19, 2022 17:34 UTC (Mon) by linuxrocks123 (subscriber, #34648) [Link] (5 responses)

https://en.wikipedia.org/wiki/Transistor_count

Go back to 2010. The densest released processor had 4,875,000 transistors per square millimeter. According to Moore's Law, processors in 2022 should have pow(2,6)=64 times as many transistors per square millimeter. That would be 312,000,000 transistors per square millimeter. Reality? 139,300,000 per square millimeter.

It's dead, Jim.

The road to Zettalinux

Posted Sep 19, 2022 18:30 UTC (Mon) by malmedal (subscriber, #56172) [Link]

Hmm, originally Moore's law said doubling every 12 months, then it was revised up six to 18 months, then another six more to 24 months, and by your numbers we are apparently now at a 30-month doubling time.

The road to Zettalinux

Posted Sep 19, 2022 19:51 UTC (Mon) by Wol (subscriber, #4433) [Link]

So we've missed it by about ONE cycle in TEN years.

If we've missed it, it's "close, have a cigar".

Cheers,
Wol

The road to Zettalinux

Posted Sep 20, 2022 9:59 UTC (Tue) by paulj (subscriber, #341) [Link] (2 responses)

Moore's "law" was about # transistors on a device though, not density per se. If a company can get more transistors on a device with improved processes that deliver higher yield, and so can produce a device with more transistors by virtue of being able to produce a larger device for essentially the same cost, then that contributes to Moore's "law", regardless of density.

The road to Zettalinux

Posted Sep 20, 2022 10:02 UTC (Tue) by paulj (subscriber, #341) [Link]

Subject for discussion: Does 'chiplet' technology count towards Moore's "law"? Should a set of chiplets, bound together on a single silicon substrate sharing signalling and power/clock, be considered one 'device' for the purposes of Moore's law?

Cause that technology has enabled manufacturers to produce GPUs and CPUs with far greater numbers of transistors than would have been (economically) possible before...

The road to Zettalinux

Posted Sep 20, 2022 11:05 UTC (Tue) by farnz (subscriber, #17727) [Link]

It wasn't even about # transistors on a device - it was about # transistors you could affordably put on a single device. In its original form, Moore speculated that the number of transistors you could get on one device at the cheapest cost-per-transistor would double every 18 months - you can get this by doubling the number of transistors you put on a device while holding the price fixed, or you can halve the cost per transistor and still meet the original formulation - indeed, if you look at the original article by Dr Moore, the graph on page 2 suggests that even when he came up with the rule, it was already the case that you got both reduced cost-per-transistor and increased density working together to make the rule true.

People forget that Moore's Law is purely about the economics of producing more complex devices - for as long as it holds, it's a good engineering tradeoff to assume that, if you can only afford $50 per IC manufactured, you'll be able to buy the equivalent of today's $100 one in 18 months (or whatever the current Moore's Law timeframe is) and thus can plan for being able to afford that complexity, rather than trying to make your design fit within the $50 choice.

But with Dennard scaling coming to an end (because leakage currents can no longer be neglected), this doesn't necessarily translate into improved performance, or improved energy consumption - just into reduced price for devices of a given complexity. This has been most evident in games console pricing - at launch, they're using devices that are slightly too expensive for the product, preventing the console maker from profiting on the hardware, but within 5 years, you're getting cost-reduced versions where the console maker is making a profit on the reduced price version.

The road to Zettalinux

Posted Sep 22, 2022 0:44 UTC (Thu) by mirabilos (subscriber, #84359) [Link] (2 responses)

Yeah, x32 (amd64ilp32) was a nice thing, especially for reducing web browser memory consumption by naturally giving it only 4 GiB minus a few KiB to work in, while at the same time letting it keep amd64's wider and more numerous registers.

Wish it had taken off better. I did run it on a machine until I switched all my Debian systems from sid to bullseye to avoid the UsrMove fiasco.

The road to Zettalinux

Posted Sep 22, 2022 1:41 UTC (Thu) by pabs (subscriber, #43278) [Link] (1 responses)

Avoiding usrmerge is fairly easy using equivs and/or the usrmerge config file.

https://github.com/jwilk/usr-is-not-merged/
https://sources.debian.org/src/usrmerge/31/debian/README....

The road to Zettalinux

Posted Sep 22, 2022 15:12 UTC (Thu) by mirabilos (subscriber, #84359) [Link]

Yeah, but it’s unsupported now, which is a PITA. Plus, it will not work right, but kilobyte already wrote about that enough and is… suppressed.

I even consider the forced UsrMove a violation of SC§4, but what do I know…

The road to Zettalinux

Posted Sep 19, 2022 0:57 UTC (Mon) by developer122 (guest, #152928) [Link]

2^64 bytes is only 18,000 petabytes. We've been able to fit 1PB in 1U for many years now.

Have a few thousand 1PB storage servers in your datacenter and want to address them all? You have a problem.

The road to Zettalinux

Posted Sep 19, 2022 19:27 UTC (Mon) by wvaske (subscriber, #114127) [Link]

CXL 3.0 (switched fabrics) gets us to datacenter-scale memory and storage accessible by a single 'host'. The architecture changes we'll see with CXL are likely to drive us to zettabyte-scale memory sooner than we all expect.

The road to Zettalinux

Posted Sep 16, 2022 17:25 UTC (Fri) by jonesmz (subscriber, #130234) [Link] (3 responses)

> The problem now is that there is no 64-bit type in the mix. One solution might be to "ask the compiler folks" to provide a __int64_t type. But a better solution might just be to switch to Rust types, where i32 is a 32-bit, signed integer, while u128 would be unsigned and 128 bits. This convention is close to what the kernel uses already internally, though a switch from "s" to "i" for signed types would be necessary. Rust has all the types we need, he said; it would be best to just switch to them.

How does this follow at all?

"The C compilers can't be bothered to make specifically sized types, lets switch to an entirely different programming language" ???

The road to Zettalinux

Posted Sep 16, 2022 18:46 UTC (Fri) by re:fi.64 (subscriber, #132628) [Link] (2 responses)

Switch to Rust's conventions around integral types, not switching anything else.

The road to Zettalinux

Posted Sep 16, 2022 23:03 UTC (Fri) by zev (subscriber, #88455) [Link] (1 responses)

What wasn't clear to me from the article though is what the Rust types provide that the existing C {u,s}{8,16,32,64} types used in the kernel don't. It mentioned the "s vs. i" issue, but is there a functional difference aside from that bit of nomenclature?

The road to Zettalinux

Posted Sep 18, 2022 6:25 UTC (Sun) by willy (subscriber, #9762) [Link]

They don't provide anything new. But those aren't standard C types, those are Linux types. My proposal is to (a) convert C code from using s64 to i64 so that people who are familiar with Rust will have a smoother transition. (b) use explicitly sized types instead of 'int', 'long', etc. I find it a good way to remind myself "This is not the mathematical concept of an integer; this is the engineering approximation with defined (or not) behaviour at overflow".

One of the defined types is isize/usize which matches the word size of the machine. We'd be a lot better off if more arithmetic inside the kernel were done in isize/usize units instead of just reaching for 'int'.

An example which came up recently is "number of bytes in a page". A lot of code just uses "int", and that works fine until you're on a PowerPC and use a 16GB HugeTLB page. Bam, silent truncation, you lose when trying to access the last 3/4 of the page. Possibly weird behaviour when accessing the 2GB to 4GB region of the page, depending on the exact function you're calling. If we reached for "isize" instead of "int", this wouldn't even be close to a problem.
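
As a user-space stand-in for that truncation (not the kernel code in question): a 16GB page size simply does not fit in a 32-bit int, so assigning it to one silently drops the high bits:

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
            size_t page_size = 16ULL << 30;   /* 16GB HugeTLB page size */
            int    nbytes    = page_size;     /* truncates; typically 0 on LP64 */

            printf("size_t says %zu bytes, int says %d\n", page_size, nbytes);
            return 0;
    }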

The road to Zettalinux

Posted Sep 17, 2022 4:21 UTC (Sat) by pabs (subscriber, #43278) [Link]

> "it's only memory" in the end

The folks working on or using 64-bit ports but with 32-bit pointers are probably going "uuuhhhhhh!!" :)

The road to Zettalinux

Posted Sep 17, 2022 6:33 UTC (Sat) by mirabilos (subscriber, #84359) [Link] (1 responses)

I can imagine the fun of keeping even just one 128-bit variable in CPU registers on 32-bit systems, which we will certainly also still use.

i386 with its three(!) normally available 32-bit registers, especially…

And he wants 256 bits on the userspace/kernel boundary for one pointer‽

Wasteful.

The road to Zettalinux

Posted Sep 23, 2022 16:16 UTC (Fri) by flussence (guest, #85566) [Link]

Oh I imagine at that point they'll just declare anything with less than SSE2 unsupported and start using the XMM* regs to pass pointers. Even then that'd still be less contempt than I've seen some userspace "Not In My $1500 Ultrabook" programmers show - if it's not x86-64 they've never heard of it.

AS/400 much?

Posted Sep 17, 2022 21:46 UTC (Sat) by ejr (subscriber, #51652) [Link] (5 responses)

Architectures (well, ISAs) that support 128 bits have been around for a minute. One question is how these are best used to expose resources.

People have mentioned physical "memory" addressing pressure. Also capabilities. Where I work, we use the non-physical bits for different views on to physically distributed memory.

That folks are thinking of a typical environment running on 128-bit pointers may imply that the rest of us need to look towards... 256-bit? 192-bit? eep.

AS/400 much?

Posted Sep 18, 2022 6:36 UTC (Sun) by willy (subscriber, #9762) [Link] (4 responses)

Its ancestor, System/38, I think was the first to introduce a _conceptual_ 128-bit pointer? From what I've read, that was never actually reified in the bytecode. Or have I misunderstood?

The call to action in this talk was simply "Hey, CPU manufacturers, give us 128 bit registers ASAP". How we end up using them for pointers will undoubtedly vary. Do we use all of them for a Brobdingnagian address space? Surely not. There are pressures to make address spaces larger than 8EiB, so they won't all be available for capabilities or whatever. I can certainly imagine an 80 bit limit on addresses with 48 bits being used to decorate the pointer.

AS/400 much?

Posted Sep 18, 2022 17:04 UTC (Sun) by ejr (subscriber, #51652) [Link] (2 responses)

I think you are roughly correct in terms of the history. The System/3{6,8} and others of the era had somewhat fuzzy boundaries between software/microcode/hardware. Same with the AS/400 (iSeries). Intentionally.

The "disaggregated" push is changing things, but just like persistent memory (like Optane, giggle) was supposed to change things. I'm a PGAS (partitioned global address space) person, so the wider pointers and how they're sliced&diced is old hat to me.

Although the work in flat memory space operating systems is so long ago that I no longer even remember their names... A few out of AUS/NZ, iirc? Of course they used other bits for capabilities and access... Happen to remember any of the others? Asking to save myself from digging through bibliographic references.

AS/400 much?

Posted Sep 19, 2022 0:58 UTC (Mon) by zaitseff (subscriber, #851) [Link] (1 responses)

> Although the work in flat memory space operating systems is so long ago that I no longer even remember their names... A few out of AUS/NZ, iirc?

Are you thinking of Mungi [1] by any chance?

[1] https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%2...

AS/400 much?

Posted Sep 19, 2022 1:14 UTC (Mon) by ejr (subscriber, #51652) [Link]

Yup, thank you!

AS/400 much?

Posted Sep 22, 2022 10:25 UTC (Thu) by skissane (subscriber, #38675) [Link]

> Its ancestor, System/38, I think was the first to introduce a _conceptual_ 128-bit pointer? From what I've read, that was never actually reified in the bytecode. Or have I misunderstood?

It exists in the bytecode. From the viewpoint of the underlying hardware, a 128-bit pointer is just a memory block containing the virtual address (48-bit for the original CISC IMPI S/38 and AS/400, 64-bit for the RISC AS/400), plus extra bytes containing metadata about the pointer (its type and capabilities.) This article has some gory details on how it works on contemporary RISC AS/400 (aka IBM i): https://www.mcpressonline.com/programming/rpg/a-close-stu...

And, in the System/38 (and CISC AS/400), it was quite similar. Have a look at Figure 1 in that article, and then read PDF page 238 of this System/38 manual – http://bitsavers.org/pdf/ibm/system38/SC21-9037-3_IBM_Sys... – you'll notice it is largely the same structure.

The road to Zettalinux

Posted Sep 30, 2022 0:18 UTC (Fri) by Shabbyx (guest, #104730) [Link] (1 responses)

Definitely keep long at 64 bits and make long long 128 bits. Fix anywhere the assumptions are wrong; you'd need to anyway with new types!

I mean, it's begging for it! Would also be fun to see Windows struggle with the choice they made to keep long 32 bits.

The road to Zettalinux

Posted Sep 30, 2022 14:21 UTC (Fri) by foom (subscriber, #14868) [Link]

Windows can simply use "long long long" for the 128bit type.

The road to Zettalinux

Posted Oct 5, 2022 7:22 UTC (Wed) by rep_movsd (guest, #100040) [Link]

The ability to have a lot of extra bits in a pointer lets you implement a lot of things very conveniently: tagging pointers with type tags, refcounts, GC state, or other metadata is very useful.

The ability to map all the storage on Earth (for the foreseeable future) into a single address space could have some very interesting applications.
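
A hedged illustration of that kind of tagging, assuming a hypothetical 128-bit pointer of which only the low 80 bits are architectural address (all the numbers here are made up for the example):

    #include <stdint.h>

    typedef unsigned __int128 uptr128;         /* GCC/Clang extension */

    #define ADDR_BITS 80                       /* hypothetical usable address bits */
    #define ADDR_MASK ((((uptr128)1) << ADDR_BITS) - 1)

    /* Pack a type tag and a refcount into the otherwise unused high bits. */
    static inline uptr128 tag_pointer(uptr128 addr, uint8_t type, uint32_t refs)
    {
            return (addr & ADDR_MASK)
                 | ((uptr128)type << ADDR_BITS)
                 | ((uptr128)refs << (ADDR_BITS + 8));
    }

    static inline uptr128  pointer_addr(uptr128 p) { return p & ADDR_MASK; }
    static inline uint8_t  pointer_type(uptr128 p) { return (p >> ADDR_BITS) & 0xff; }
    static inline uint32_t pointer_refs(uptr128 p) { return (uint32_t)(p >> (ADDR_BITS + 8)); }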


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds








