Kernel stack hygiene

In modern (as in, anything less than 40 to 50 years old) computer architectures, you need some form of memory organized as a stack: a scratch area where you can push data, and pop (retrieve or discard) that data later.

The main use of that stack is to store arguments to functions before calling them, the return address, and local variables of said function while it runs. This allows function calls to be nested, and every function to be able to have some local storage.

That is, until your so-called call stack grows too deep and you exhaust your stack limit, something known as a stack overflow, which gave its name to a popular community-driven help site for all kinds of computing problems.

When you are running an application, say, the popular kitchen sink with a text editor module, it has its own stack, subject to the resource limits set in its parent shell with the ulimit command (ulimit -s for the stack usage), which wraps the setrlimit system call.
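
For readers who would rather look at that limit from C than from the shell, here is a minimal sketch built on getrlimit, the read-side counterpart of setrlimit. The function names stack_limits and print_limit are made up for this illustration; RLIMIT_STACK, RLIM_INFINITY and struct rlimit are the standard POSIX names.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Fetch the soft and hard stack limits of the calling process, in bytes.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit).
 */
int
stack_limits(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_STACK, &rl) == -1)
		return -1;
	*soft = rl.rlim_cur;
	*hard = rl.rlim_max;
	return 0;
}

/* Print one limit in kilobytes, the unit most shells use for ulimit -s. */
void
print_limit(const char *name, rlim_t value)
{
	if (value == RLIM_INFINITY)
		printf("%s: unlimited\n", name);
	else
		printf("%s: %llu KB\n", name,
		    (unsigned long long)(value / 1024));
}
```

Calling stack_limits and then print_limit on the soft value should report the same figure as ulimit -s in the shell that started the program, since the limits are inherited from the parent.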

But in the kernel, things are a bit more complicated than this.

During the early start of the kernel, a temporary boot stack is used, either in the kernel image itself, or set up by the boot loader machinery. Then, once kernel initialization has reached a state where it can spawn processes, every single process gets a memory area known as the u-area. The letter u here stands for "user" or "userland" (as opposed to "kernel"), and it was abbreviated to that single letter because, in the early Unix versions, the pointer to that area was accessed as a global kernel variable simply called u.

The u-area serves two purposes:

It contains various per-process data, such as register contents, at its start.
The rest of the u-area serves as a stack when the process enters the kernel.

What I mean by "the process enters the kernel" is that, normally, the process will use its own stack; but if an interrupt occurs, or the process performs a system call (which is another form of interrupt), control gets transferred to the kernel, and the kernel needs to pick a stack before it can transfer control to its own C code to handle the interruption.

If the process was running in userland (i.e. this is the first interruption causing it to enter kernel mode), then the kernel will pick the top of the u-area as a temporary stack. If the process was already in kernel land (for example in the case of nested interrupts), the kernel will keep using the current stack.
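
The stack selection just described can be sketched in a few lines of C. Everything here is hypothetical and only meant to illustrate the idea: the names uarea_sketch, UAREA_SIZE and pick_kernel_stack, the 16KB size and the 18 saved registers are assumptions of mine, and the sketch assumes a stack which grows towards lower addresses. Real kernels do this in assembly, in their exception entry code.

```c
#include <stddef.h>
#include <stdint.h>

#define UAREA_SIZE	16384	/* assumed size, for illustration only */

/* Hypothetical per-process data living at the start of the u-area. */
struct uarea_sketch {
	uint32_t saved_regs[18];
};

/* The stack starts at the top of the u-area and grows downwards. */
static inline uintptr_t
uarea_stack_top(uintptr_t uarea_base)
{
	return uarea_base + UAREA_SIZE;
}

/* Bytes the stack may use before reaching the per-process data. */
static inline size_t
uarea_stack_room(void)
{
	return UAREA_SIZE - sizeof(struct uarea_sketch);
}

/*
 * Choose the stack pointer on kernel entry: a fresh stack at the top of
 * the u-area when coming from userland, the current one when the
 * interruption is nested inside the kernel.
 */
uintptr_t
pick_kernel_stack(int from_userland, uintptr_t current_sp,
    uintptr_t uarea_base)
{
	if (from_userland)
		return uarea_stack_top(uarea_base);
	return current_sp;
}
```

Note how little room uarea_stack_room leaves: whatever the per-process data does not use is all the stack the process will ever get while in the kernel.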

Now, regardless of whether the interruption has been caused by a device needing attention, or by a system call request, the kernel will invoke a function, which may in turn invoke a few others, but - hopefully quickly - will return to the userland process, restoring its register state and stack pointer.

If the system call requires the userland process to wait until a condition is fulfilled, or if a clock interrupt tells the kernel it's time to schedule another process in order to give the impression of a multi-tasking system (a context switch), then the kernel will select another process to run, switch to its u-area and stack, and resume it, until the interrupted process gets a chance to run again at a future context switch.

Although the u-area name comes straight from Unix, that concept of process area with its own stack exists in all modern multitasking operating systems, and for example OS/2 and Windows NT keep such stacks as part of their Thread Control Blocks, which is the name they use for their own u-areas.

Now, there are also some variants around that scheme, for example in IRIX, some interrupt handlers can be registered with their own, dedicated, stacks, which, in addition to being able to use more stack space than the (fixed) u-area size, also sort-of makes them processes (because a u-area and thus a process can be made out of that stack), allowing them to sleep and to have their own processor time accounting.

But the general model stands: when in the kernel, you are running on a fixed size stack, and that size is not that large, because Unix started on horribly constrained systems (by today's criteria) and also because, well, keeping the stacks small did work.

Ok, now that I've set the scene, time for the horror story.

When you're running in userland and hit the bottom of the stack, this will cause a page fault, but the kernel will be a nice lad and will allocate your process some more memory, so that you can dig your stack deeper. Until you hit your stack limit, in which case it will send you a deadly SIGSEGV signal and your process will dump core and be sorry for trespassing over its limits.

But in the kernel, there is no inner kernel to handle stack faults. Well, there could be, but so far, as far as I know, nobody ever wrote code to do so. So while the kernel code tries to be small and fast and as self-contained as possible, with a reasonable level of nested function calls, if one were to reach the bottom of the u-area, memory corruption would inevitably occur because the few non-stack bits of the u-area would get overwritten. If one were to increase the stack usage further, it would expand beyond the u-area, and either hit unmapped memory (instant kernel panic) or, worse, start corrupting the nearby memory, which could be another process' u-area, and more weird behaviour would occur (but you can reasonably bet a kernel panic would happen eventually). On some systems and/or platforms, there is sometimes an unmapped page intentionally put at the end of the u-area, in order to "catch" this situation before it causes more damage.
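
The guard page trick can be tried from userland, which is a lot safer than experimenting in a kernel. The sketch below is only an analogue: it uses mmap and mprotect to reserve a region whose lowest page is inaccessible, so that a downward-growing stack placed there would fault instead of silently scribbling over its neighbour. The function names are made up, and a page marked PROT_NONE is mapped but inaccessible, which is not exactly the same thing as a page left unmapped.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANON on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve stack_pages usable pages with one inaccessible guard page
 * below them. Returns the lowest usable address and stores the usable
 * size in *usable, or returns NULL on failure.
 */
void *
alloc_guarded_stack(size_t stack_pages, size_t *usable)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	size_t total;
	char *base;

	if (pgsz <= 0 || stack_pages == 0)
		return NULL;
	total = (stack_pages + 1) * (size_t)pgsz;
	base = mmap(NULL, total, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (base == MAP_FAILED)
		return NULL;
	/* Any access to the lowest page now raises a fault. */
	if (mprotect(base, (size_t)pgsz, PROT_NONE) == -1) {
		munmap(base, total);
		return NULL;
	}
	*usable = stack_pages * (size_t)pgsz;
	return base + pgsz;
}

/* Release a region obtained from alloc_guarded_stack. */
int
free_guarded_stack(void *usable_base, size_t usable)
{
	long pgsz = sysconf(_SC_PAGESIZE);

	if (pgsz <= 0)
		return -1;
	return munmap((char *)usable_base - pgsz, usable + (size_t)pgsz);
}
```

The price is one extra page of address space per stack, which is the kind of cost that mattered a lot on the small machines this story is about.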

Because kernel developers (hopefully) know their kernel and the platforms it runs on well, the size of the u-area is chosen with care to be large enough to avoid stack overflows, and small enough to avoid using too much memory (since every single process running on your Unix system needs its own u-area, in addition to its own memory footprint.)

This care unfortunately doesn't mean stack overflows can't happen. And when they happen, they are often deterministic - once you figure out what combination of events and/or requests can cause them, you can reproduce them with ease.

Yet, fortunately, such stack overflows are rare. And they are difficult to recognize...

You can trust me on this - because it took me years to recognize one.

Among the various old platforms OpenBSD was running on in the first years of its existence, were the old, Motorola 68000-based, Apple Macintosh computers, running OpenBSD/mac68k inherited from NetBSD/mac68k.

These systems were quite unpopular among OpenBSD developers, because of the small screen resolution, one-button mouse, and slow 68020 and 68030 processors.

But then, the last Macs, before the switch to PowerPC processors, were based upon 68040 processors, and their interrupt system had been revised in order to make it better suited to timesharing systems, such as Apple's A/UX Unix clone, or BSD.

However, one really annoying thing about these machines is that there was no way to run BSD on them without some help from MacOS. This situation would improve with the PowerPC-based models, but the m68k-based models had no documented (nor stable) firmware interface, and the only way to run BSD was to load the kernel from a MacOS booter application.

(Of course, the Macintosh computers were never designed with the intent of being able to run anything other than Mac OS on them. If you are interested in stories from the early Macintosh development, I strongly recommend reading Andrew Hertzfeld's Folklore stories, the best of which have been put in print by O'Reilly as "Revolution in the Valley". These OpenBSD stories are the small equivalent, at my level, of the Folklore stories.)

In fact, there were two MacOS programs written for NetBSD/mac68k: one installer program, which would create the BSD filesystems on disk and extract the operating system archive, and one booter program, which would load the kernel from either MacOS or the root BSD partition, and transfer control to it.

If you wanted to make changes to these programs, you would need access to a MacOS developer environment, with a properly licensed C compiler (probably either Think C, MPW or Metrowerks CodeWarrior), something very few BSD developers had (or could afford.)

One particular feature of OpenBSD is that all supported platforms could be installed by running a dedicated kernel, bsd.rd, which, as the .rd extension suggests, would embed a ramdisk containing the installer script and various base utilities.

For those who are not familiar with OpenBSD, the bsd.rd kernel uses the embedded miniroot as its root filesystem, and the /etc/rc script run by init asks you whether you want to perform an installation, or an upgrade, or simply get a shell. If you ask for a shell, you'll be in a single-user environment with a basic set of binaries available, allowing you to configure the network, fix and mount your filesystems, and more. In fact, as part of its recovery abilities, one can boot that kernel, then mount the regular disk, and complete a multiuser boot almost as usual, after issuing the proper chroot command.

This is really a Swiss Army knife, and has helped more than one OpenBSD user, which is why you'll find a bsd.rd file installed in your root filesystem by default.

Yet there was no such bsd.rd kernel for the mac68k port.

However, the state of the mac68k port wasn't really my problem, as I owned no such hardware. Yet it really was my problem, because every change affecting all m68k-based platforms I would work on would need to be tested on all these platforms, and I could not test on mac68k unless I had such a machine.

Of course, there were other developers willing to help, who could test my changes, but quite often, when you're doing kernel changes, it's way easier to test them yourself, so that when things break, you can debug them without having to email people commands to run in the kernel debugger, and wait for them to answer (assuming they did not reboot on a working kernel because you did not react quickly enough after they reported kernel panics.)

So I had no real choice but to get a Macintosh. I really didn't want to own one, but getting a mac68k was the wisest thing to do. So I spent a ridiculously small amount of money (I don't even remember how much, maybe about EUR 200) in March 2002 to buy a used Macintosh Quadra 650 - a small desktop system with onboard video and Ethernet, some NuBus slots, internal SCSI disk and floppy drive, and - if I remember correctly - 32MB of memory. Truly a high-end mac68k system (with a 33MHz 68040 processor), despite horrible disk performance due to the lack of a DMA controller.

That particular Macintosh has since earned a well-deserved retirement in a recycling center, in 2016. It hurts a bit to write this, but I miss it. I cursed a lot while adding memory or replacing the internal disk, because the case is densely populated and has a lot of sharp metal edges, and (due to the lack of DMA) my other 68040 systems (running either OpenBSD/hp300 or OpenBSD/mvme68k) ran circles around it, yet it was a nice and small machine.

Fortunately that feeling does not last long - it's a Macintosh we're talking about, after all.

So I had bought that machine and painfully installed OpenBSD on it. And the net result was that I had yet another slow machine, and no way to easily upgrade it, short of recompiling everything, because there was no working bsd.rd yet and the Mac OS-based installer did not do upgrades (and was dog slow anyway).

Therefore, one of the first things I did on that system was to do the necessary plumbing in order to build a bsd.rd kernel. A few hours later, I had a binary; booting it, I got the usual device probes, then the familiar

(I)nstall, (U)pgrade, (S)hell?

prompt, to which I chose U for upgrade. A bit later, the upgrade procedure asked which disk was the root disk, I answered sd0, and got a kernel panic as a reward.

When you're a kernel developer, kernel panics are part of your daily diet, but some panics are harder to digest than others. This particular panic was definitely inedible, because the stack traceback didn't make any sense.

(Unfortunately, I can't show you that nonsensical traceback - I was not running that machine with a serial console, and never thought I would ever tell that story years later, otherwise I would have spent some time writing it down.)

Of course, you, the reader, having read the introduction to this story, must already have a good idea of what is going on. But I was in the play without having been given the script, and all I had was this awful traceback I couldn't make sense of, and a proof by example that bsd.rd did not work on mac68k.

Mind you, I wasn't the first developer to be awarded that unusable traceback. A few years earlier, Jason Downs (not the actor) had also tried to build a mac68k bsd.rd, for the same reasons as me: he was the OpenBSD/hp300 portmaster at that time, a role that I ended up taking over after Jason had taken some distance from OpenBSD. (He was working on losing weight and documenting this on livejournal.com back then, and I am glad to tell you that he succeeded in going from being seriously overweight to having a normal BMI in less than two years.)

Over a period of 10 months, from April 2002 (shortly after I got that Quadra 650) to February 2003, I gave bsd.rd the occasional try, only to always get that kernel panic whose traceback made absolutely no sense, and I started to seriously consider giving up on mac68k.

Yet at that time, developer Martin Reindl was spending a lot of time catching up with the NetBSD improvements to the mac68k port, giving the OpenBSD port better performance and stability.

On my side, it was obvious the bsd.rd behaviour was a software bug, albeit a difficult one to investigate, and should it get fixed, with the work Martin had been doing, we could get some momentum and get more developers interested in that not-sucking-as-much-as-it-used-to platform; also, on the second-hand market, Macintoshes were much easier to find (and way cheaper) than hp300 or mvme68k systems.

Excerpt from the OpenBSD developer chatroom on February 20th, 2003:

<miod> mac68k bsd.rd does not work. period. it dies during mount or fsck,
       at your option.
[...]
<fries> during mount/fsck of vnd device or you create bsd.rd, but the boot'ing of
        it when you're using the install script, it fails to mount/fsck ?
<miod> you create bsd.rd, you boot it, choose (I)nstall or (U)pgrade, then it fsck
       or mounts filesystem, and tada! panic in vfs.
<fries> k. just checking.
[...]
<miod> really, if I get bsd.rd on mac68k working, i'll spend some serious
       time on this port.

Unsurprisingly, no progress happened on the mac68k bsd.rd front until December 2004.

For some reason, I tried to run bsd.rd again on December 1st, with the intent to investigate the kernel panic more thoroughly and try to make sense of it.

And then it dawned on me. It was a stack overflow.

<miod> wow. mac68k has so many disk problems because reading a disklabel eats
       a bit more than 16KB on the stack for a temporary array.
<miod> this oflows the kernel stack right on the vector table.
<miod> inconceivable.
<toby> ouch!
<miod> getting closer to working bsd.rd...
<martin> was just about to ask what this means to bsd.rd :)
<miod> better. i still oflow the stack for now but I die in a much more
       friendly way...

The best description of the problem and its root cause can be found in the email I sent Jason Downs later in the evening:

Date: Wed, 1 Dec 2004 22:29:27 +0000
From: Miod Vallat
To: Jason Downs
Subject: OpenBSD/mac68k bsd.rd solved

Hello,

  I don't know if you still care about this, but I figured you would be
interested anyway.

Do you remember you had mac68k bsd.rd dying very early? I had the same 
problem, did not manage to debug it, and eventually gave up.

Well, I have been looking back recently, and I finally found the
problem this evening!

I had noticed that bsd.rd would work fine if I asked the booter to ask
for root/swap devices, and selecting sd0. Also, when dying, the pc would
always be very low, and always the same value.

After enough thinking I came to the conclusion that this pc was part of
the vectors table, which had been overwritten at some point. So at the   
next clock interrupt, kaboom, illegal instruction and panic with no
useful traceback.

Then I eventually noticed that in mac68k locore, the vector table was
located just _below_ the kernel stack. So the odds were that the kernel
stack was overwritten. But by what?

Tinkering again from the bsd.rd shell, I quickly noticed that the usual
crash, which was after fscking sd0a, was easily reproducable with a   
simple "disklabel sd0" as well at the prompt. Which pointed the
disklabel code as the culprit...

And indeed, somewhere in disksubr.c, we would use a temporary array of
MacOS partition headers. A mere 32 entries, each one being 512 bytes.
Total, 16KB. Four pages. Exactly the size of the kernel stack...   

This explained everything! The kernel stack was trashed indeed, and this
would only occur when reading a disklabel for the first time after the
stack has been used. Since the disklabel of the boot disk would be run
using the boot temporary stack high in memory, this would never happen.
But read the disklabel of a new disk (i.e. not cached), and kaboom.
Either manually with disklabel, or fdisk accessing sd0a as instructed
to by the user.

And of course this was a timebomb, nothing would explode until it was
time to go back to cpu_switch()...

Right now I have a functional bsd.rd. Time to clean the code and put it
in...

Cheers,
Miod

A fix was committed the same evening to sys/arch/mac68k/mac68k/disksubr.c, revision 1.25, with the following log message:

Much, much, much less stack pressure when reading a disklabel.

This is a temporary workaround which might live longer than initially
expected.
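
The arithmetic from the email is worth spelling out. The sketch below is not the code from disksubr.c, and what revision 1.25 actually changed is not shown here; the struct and function names are invented, and the 4KB page size is inferred from "16KB. Four pages". It merely compares the cost of keeping the whole partition array as a local variable with the cost of looking at one entry at a time, which is one plausible way to relieve the stack.

```c
#include <stddef.h>

#define PART_ENTRIES		32	/* "a mere 32 entries" */
#define PART_ENTRY_SIZE		512	/* "each one being 512 bytes" */
#define PAGE_SIZE_SKETCH	4096	/* assumed: 16KB is four pages */
#define KSTACK_SIZE		(4 * PAGE_SIZE_SKETCH)

/* Hypothetical stand-in for one on-disk MacOS partition header. */
struct part_header_sketch {
	unsigned char raw[PART_ENTRY_SIZE];
};

/* Bytes needed to hold the whole temporary array as a stack local. */
size_t
whole_array_bytes(void)
{
	return sizeof(struct part_header_sketch) * PART_ENTRIES;
}

/* Bytes needed when entries are examined one at a time instead. */
size_t
single_entry_bytes(void)
{
	return sizeof(struct part_header_sketch);
}

/* Does a local of that size fit in what is left of the kernel stack? */
int
fits_on_kstack(size_t local_bytes, size_t already_used)
{
	return already_used < KSTACK_SIZE &&
	    local_bytes <= KSTACK_SIZE - already_used;
}
```

With these numbers, whole_array_bytes returns 16384, the entire kernel stack, so the array cannot fit as soon as a single byte of stack is already in use, while a single 512-byte entry fits comfortably.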

The next day, I could make progress on bsd.rd on mac68k, and this was quite well-received among developers:

<martin> yes, gone are the times when a mac68k install took 4 days and
         nights
<claudio> wow. bsd.rd for mac68k? should i uncover the dust from mine?
<drahn> the install... ugh
<Nick> YEEEEEEEA---HAAAAAAA!!!
<henning> now it is only one day and night!
[...]
<miod> yes. but at least we eventually reached the point where people like
       me consider the machine manageable and will consider hacking on it. not
       having a bsd.rd is a major PITA
<martin> indeed
<mickey> mmmm pita

The remaining changes to get bsd.rd built as part of the regular release process were committed later during the night.

<miod> mac68k bsd.rd in (except for docs updates)

And this work made it into the OpenBSD 3.7 release.

This was my first serious exposure to a kernel stack overflow. I had eventually spotted one and fixed it, but there could be many others left.

This probably should have prevented me from sleeping at night, but, truth be told, it didn't.

Yet, from time to time, we got reports of odd kernel failures or straight kernel panics on bugs@. Often, the stack traceback would go through some layers of device driver code, and then an ioctl routine.

What if we had some drivers using too much stack to process a legitimate ioctl request?

Could we estimate the stack usage of an ioctl request?

And if we couldn't, how could we defend against excessive stack usage?

Well, I don't pretend to have the answer to these questions. But it got me thinking.

In the middle of July 2006, I realized I could instrument the C compiler to warn when functions would use too much stack space, with the proper value of "too much" to be determined by experiment.

<miod> sigh... touch flags.h in gcc, and it recompiles everything
<miod> i am adding a warning to gcc, which you'll first love, and then
       loathe once I add it to the kernel makefiles.
<miod> -Wstack-usage-larger-than-%d

The same day, after midnight, this threat materialized as this message:

Date: Sun, 16 Jul 2006 00:56:17 +0000
From: Miod Vallat
To: private OpenBSD mailinglist
Subject: worst kernel stack offenders

You might remember that, some time ago, pascoe@ and art@ had a look at
the per-function stack usage in the kernel, and fixed them to use
dynamically allocated memory instead (or different logic needing fewer
stack memory).

I have been thinking recently on having these checks automatically done
by the compiler for us, i.e. having the compiler warn if the function it
compiles uses too much room on the stack. I have implemented this as a
proof of concept on alpha, and will (over the next few days) port it to
the other platforms and share the diff.

In the meantime, here are the results of compiling with a threshold of
1024 bytes, sorted by size. Feel free to pick an item from this list and
address it.

Miod

/usr/src/sys/dev/ic/ibm561.c: In function `ibm561_cninit':
/usr/src/sys/dev/ic/ibm561.c:193: warning: stack usage is 4768 bytes
  (no kidding. this routine has a >4KB struct on the stack)
/usr/src/sys/dev/usb/usb_subr.c: In function `usbd_probe_and_attach':
/usr/src/sys/dev/usb/usb_subr.c:1013: warning: stack usage is 2224 bytes
/usr/src/sys/dev/ic/bt485.c: In function `bt485_cninit':
/usr/src/sys/dev/ic/bt485.c:187: warning: stack usage is 1904 bytes
/usr/src/sys/dev/ic/bt463.c: In function `bt463_cninit': 
/usr/src/sys/dev/ic/bt463.c:248: warning: stack usage is 1744 bytes
/usr/src/sys/dev/ic/twe.c: In function `twe_attach':
/usr/src/sys/dev/ic/twe.c:396: warning: stack usage is 1744 bytes
/usr/src/sys/dev/isa/isa.c: In function `isascan':  
/usr/src/sys/dev/isa/isa.c:279: warning: stack usage is 1712 bytes
/usr/src/sys/dev/ic/twe.c: In function `twe_intr':
/usr/src/sys/dev/ic/twe.c:1051: warning: stack usage is 1648 bytes
/usr/src/sys/net/zlib.c: In function `huft_build':
/usr/src/sys/net/zlib.c:3909: warning: stack usage is 1600 bytes
/usr/src/sys/net/if_spppsubr.c: In function `sppp_params':
/usr/src/sys/net/if_spppsubr.c:4131: warning: stack usage is 1568 bytes
/usr/src/sys/net/pfkeyv2.c: In function `pfkeyv2_send':  
/usr/src/sys/net/pfkeyv2.c:1790: warning: stack usage is 1472 bytes
/usr/src/sys/net/pf_table.c: In function `pfr_ina_define':
/usr/src/sys/net/pf_table.c:1572: warning: stack usage is 1472 bytes
/usr/src/sys/net/pf_table.c: In function `pfr_add_tables':
/usr/src/sys/net/pf_table.c:1228: warning: stack usage is 1456 bytes
/usr/src/sys/net/pf_table.c: In function `pfr_set_tflags':
/usr/src/sys/net/pf_table.c:1439: warning: stack usage is 1456 bytes
/usr/src/sys/net/pf_table.c: In function `pfr_del_tables': 
/usr/src/sys/net/pf_table.c:1268: warning: stack usage is 1424 bytes
/usr/src/sys/net/pf_table.c: In function `pfr_clr_tstats':
/usr/src/sys/net/pf_table.c:1385: warning: stack usage is 1424 bytes
/usr/src/sys/arch/alpha/alpha/netbsd_machdep.c: In function `netbsd_sendsig':   
/usr/src/sys/arch/alpha/alpha/netbsd_machdep.c:211: warning: stack usage is 1392 bytes
/usr/src/sys/dev/ic/if_wi.c: In function `wi_ioctl':
/usr/src/sys/dev/ic/if_wi.c:2036: warning: stack usage is 1360 bytes
/usr/src/sys/dev/ccd.c: In function `ccdinit':
/usr/src/sys/dev/ccd.c:472: warning: stack usage is 1344 bytes
/usr/src/sys/net/zlib.c: In function `inflate_trees_fixed':
/usr/src/sys/net/zlib.c:4065: warning: stack usage is 1344 bytes
/usr/src/sys/dev/isa/wdc_isa.c: In function `wdc_isa_probe':
/usr/src/sys/dev/isa/wdc_isa.c:138: warning: stack usage is 1344 bytes
/usr/src/sys/arch/alpha/alpha/netbsd_machdep.c: In function `netbsd_sys___sigreturn14':
/usr/src/sys/arch/alpha/alpha/netbsd_machdep.c:273: warning: stack usage is 1344 bytes
/usr/src/sys/nfs/nfs_serv.c: In function `nfsrv_rename':
/usr/src/sys/nfs/nfs_serv.c:1873: warning: stack usage is 1312 bytes   
/usr/src/sys/xfs/xfs_syscalls-common.c: In function `lookup_node':
/usr/src/sys/xfs/xfs_syscalls-common.c:342: warning: stack usage is 1264 bytes
/usr/src/sys/uvm/uvm_swap.c: In function `sys_swapctl':
/usr/src/sys/uvm/uvm_swap.c:884: warning: stack usage is 1264 bytes
/usr/src/sys/dev/usb/ugen.c: In function `ugen_do_read':
/usr/src/sys/dev/usb/ugen.c:667: warning: stack usage is 1264 bytes   
/usr/src/sys/dev/isa/if_lc_isa.c: In function `lemac_isa_probe':
/usr/src/sys/dev/isa/if_lc_isa.c:198: warning: stack usage is 1200 bytes
/usr/src/sys/scsi/ses.c: In function `ses_make_sensors':
/usr/src/sys/scsi/ses.c:490: warning: stack usage is 1152 bytes
/usr/src/sys/dev/ic/aic7xxx.c: In function `ahc_print_register':
/usr/src/sys/dev/ic/aic7xxx.c:6582: warning: stack usage is 1136 bytes
/usr/src/sys/net/pf_table.c: In function `pfr_attach_table':
/usr/src/sys/net/pf_table.c:2076: warning: stack usage is 1136 bytes
/usr/src/sys/dev/usb/ugen.c: In function `ugen_do_write':
/usr/src/sys/dev/usb/ugen.c:766: warning: stack usage is 1136 bytes
/usr/src/sys/dev/usb/uhidev.c: In function `uhidev_attach':
/usr/src/sys/dev/usb/uhidev.c:284: warning: stack usage is 1136 bytes
/usr/src/sys/dev/ic/if_wi.c: In function `wi_set_nwkey': 
/usr/src/sys/dev/ic/if_wi.c:3017: warning: stack usage is 1120 bytes
/usr/src/sys/nfs/nfs_boot.c: In function `nfs_boot_getfh':
/usr/src/sys/nfs/nfs_boot.c:313: warning: stack usage is 1104 bytes
/usr/src/sys/arch/alpha/alpha/disksubr.c: In function `writedisklabel':
/usr/src/sys/arch/alpha/alpha/disksubr.c:580: warning: stack usage is 1104 bytes
/usr/src/sys/dev/usb/usb_subr.c: In function `usbd_print':
/usr/src/sys/dev/usb/usb_subr.c:1254: warning: stack usage is 1088 bytes
/usr/src/sys/dev/ic/if_wi.c: In function `wi_media_status':
/usr/src/sys/dev/ic/if_wi.c:2956: warning: stack usage is 1072 bytes
/usr/src/sys/kern/kern_exec.c: In function `sys_execve':
/usr/src/sys/kern/kern_exec.c:724: warning: stack usage is 1072 bytes
/usr/src/sys/dev/isa/fd.c: In function `fdioctl':
/usr/src/sys/dev/isa/fd.c:1056: warning: stack usage is 1072 bytes
/usr/src/sys/dev/ic/if_wi.c: In function `wi_scan_timeout':
/usr/src/sys/dev/ic/if_wi.c:2075: warning: stack usage is 1056 bytes
/usr/src/sys/nfs/nfs_serv.c: In function `nfsrv_link':
/usr/src/sys/nfs/nfs_serv.c:1969: warning: stack usage is 1024 bytes

(The changes made by Artur Grabowski and Christopher Pascoe I was referring to in that mail are a series of changes in August 2005 intended to reduce stack usage in the kernel, following various reports of instability when doing some specific operations.)

Given the large number of functions being reported, the limit of 1KB I had chosen was too small to be useful, especially with many functions in the 1100-1300 range, and a larger threshold would be preferable.

I shared the compiler diff the next day:

Date: Sun, 16 Jul 2006 21:17:18 +0000
From: Miod Vallat
To: private OpenBSD mailinglist
Subject: Re: worst kernel stack offenders, the diff

Here is the diff which adds the warning to gcc. I only implemented it on
the platform we currently run on, for the gcc version they are running
with - it will not be hard to extend this over time, and the warning
machinery is MI, so one can use the warning even if it is not really
supported.

The performance impact should be minimal if the warning is not enabled
(one comparison and one jump per function being compiled).

Note that the warning only checks against the (rounded) size of the
stack locals. Extra stack usage, for callee-saved registers, ABI
requirements, etc, are NOT taken into account, but I am not strongly
opposed to using the full computation, and would like to know your
opinions about this.

After running kernel compiles on all platforms with this, I think we can
settle to a 2KB limit for now (after a 3-4 weeks window with the gcc
changes in but no use of the warning, so that people can upgrade their
compiler).

There were (fortunately) very few > 2KB stack functions in the tree;
besides the ibm561.c and usb mentioned yesterday, for which fixes are on
their way, there is one last offender: opl_{yds,sb,ess}_match on i386
use 2720 bytes on the stack.

Miod

[...]

There was a legitimate concern that the compiler was not the right place to perform this check.

<dlg> could the stack usage check go into lint rather than gcc?

But apart from that comment, there was mostly silence, which convinced me to commit this check on the 19th.

<miod> noone commented on the gcc stack usage warning... i'm considering
       sneaking it in.
<miod> (ok, a few of you commented privately, you know who you are)
<deraadt> i liked it.

The commit of the compiler changes took place on the 20th:

Introduce a new compiler warning, -Wstack-larger-than-N, to report
functions which are too greedy in stack variables.

This is intended to be used for kernel compiles, where this warning will
be enabled for a reasonable size (after a few weeks grace period so that
people can upgrade their compiler).

Please note that this warning relies upon md code, and as such is only
available on platforms OpenBSD runs on; also, the stack size being warned
on is only the local variables size, regardless of the ABI stack usage
requirements and the callee-saved registers; which means a function may
be warning-clean yet need more stack space than meets the eye; the
actual size being checked on may change to include these extras in the
future.

One week later, this commit enabled the use of that new warning for kernel builds.

Compile all kernels with -Wstack-usage-larger-than-2047, now that all
offending code has been taken out and shot. ok deraadt@

Although I had considered making the check against the complete stack usage of a function (including saved registers, frame pointers, and anything else required by the ELF ABI on that particular platform), I ended up doing no such thing. I would have preferred more accurate sizes, but this would end up with a developer making a change and testing it on platforms A, B and C, only to cause a build error later on platform D because of a minor variation in the stack usage caused by a hardware or ABI difference. This would only cause developer frustration, and it was better to keep the code as it was. Of course, there would be situations where the threshold would nevertheless be hit on 64-bit platforms but not on 32-bit platforms, due to the different size of pointer types, but this would be a less annoying problem as most, if not all, developers had access to 64-bit hardware (alpha, amd64, sgi or sparc64 at that time).

I had not submitted that change to the gcc developers, because the compiler versions being used in OpenBSD were no longer the latest and greatest, and gcc developers were only interested in diffs against the latest version, which I could not easily test on all supported architectures.

But it turned out not to be a problem. Seong-Bae Park, a former Sun employee who had been contributing to gcc, had the same idea as me, and submitted a proposal for a -Wframe-larger-than warning in February 2008, roughly a year and a half later.

After some quick feedback and minor changes, his proposal was accepted and committed to the gcc repository; that warning first became available with gcc 4.4.0, released in April 2009. This option was also picked up by Clang some time later.

After OpenBSD started to use Clang on some platforms, it made sense to allow "my" warning to be controlled with -Wframe-larger-than instead of -Wstack-larger-than, and eventually the kernel build machinery was changed to use the portable name in October 2014.
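
To see the portable form of the warning at work, here is a small, made-up example with two functions doing the same job: one keeps a 4KB scratch buffer among its local variables, the other gets it from malloc. The function names and the buffer size are mine. Compiling this file with something like cc -Wframe-larger-than=2047 -c should get a compiler supporting the option to flag the first function and not the second; the exact wording of the warning differs between compilers, and an optimizing build may behave differently if the buffer gets optimized away.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE	4096	/* deliberately above a 2047-byte threshold */

/* Add up len bytes of buf. */
static size_t
sum_bytes(const unsigned char *buf, size_t len)
{
	size_t i, sum = 0;

	for (i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/* Greedy version: the scratch buffer lives on the stack. */
int
sum_greedy(const unsigned char *src, size_t len, size_t *sum)
{
	unsigned char scratch[SCRATCH_SIZE];	/* 4KB of stack locals */

	if (len > sizeof(scratch))
		len = sizeof(scratch);
	memcpy(scratch, src, len);
	*sum = sum_bytes(scratch, len);
	return 0;
}

/* Frugal version: same work, with the buffer allocated dynamically. */
int
sum_frugal(const unsigned char *src, size_t len, size_t *sum)
{
	unsigned char *scratch;

	if (len > SCRATCH_SIZE)
		len = SCRATCH_SIZE;
	scratch = malloc(SCRATCH_SIZE);
	if (scratch == NULL)
		return -1;
	memcpy(scratch, src, len);
	*sum = sum_bytes(scratch, len);
	free(scratch);
	return 0;
}
```

The frugal version trades stack space for an allocation which can fail, so it needs an error path the greedy one does without; that is the usual price of this kind of fix.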

To this day, this warning remains seldom used in operating system kernels. OpenBSD uses it on all its kernels, of course, since this is where that warning originated in the first place.

I also see it used in the OpenZFS code, and for a few files in the Linux kernel, as well as Linux kernels for a few selected systems (those with CONFIG_FRAME_WARN present in their kernel configuration.)

None of the other *BSD projects appear to use it at the time of writing (2026.)

It is a bit disappointing that this warning, which is a cheap and fine addition to one's toolbox for building better code, remains so neglected. Of course, it will not find all possible cases of stack overflow - if you nest enough calls to functions using less than 1KB of stack, you'll exhaust it eventually anyway - but it helps prevent the introduction of changes which would make the stack usage grow too much.
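
That limitation is easy to put in numbers. The two helpers below are purely illustrative (the names are mine): one adds up the frames along a call chain, the other tells how many frames of a given size fit in a stack.

```c
#include <stddef.h>

/* Total stack consumed by a chain of nested calls, one frame each. */
size_t
chain_stack_usage(const size_t *frames, size_t depth)
{
	size_t i, total = 0;

	for (i = 0; i < depth; i++)
		total += frames[i];
	return total;
}

/* How many frames of frame_size bytes fit in stack_size bytes. */
size_t
max_nesting(size_t stack_size, size_t frame_size)
{
	if (frame_size == 0)
		return 0;
	return stack_size / frame_size;
}
```

With the 16KB stack of the mac68k story and frames of 1000 bytes each, all of them quiet as far as a 1KB threshold is concerned, max_nesting(16384, 1000) yields 16: the seventeenth nested call goes over the edge.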

And if that warning had existed when Jason Downs first tried to build a mac68k bsd.rd sometime in 1997 or 1998, our lives would have been much simpler... but I would not have this story to tell.