Some time ago (a little before 2018-03-10, based on what evidence I can find), I started work on a userland-level SPARC emulator: something that emulates the SPARC ISA, but only the userland portions of it. Anything that involves privileged mode in any way is handled directly by the emulator.
In the first three months of 2019, I made a bunch more progress on it, getting it to the point where it could almost do a build of the NetBSD/sparc 1.4T+mouseisms world. But it had a nasty tendency to occasionally crash with weird corruption, usually a return from signal delivery apparently restoring garbage.
The last few days, I've been staring at it, adding assorted debugging, trying to figure out what is the matter with it.
Today, I think I may have finally found it. I certainly found a bug, and the problematic behaviour it should have produced matches far too well with the mystery crashes for me to think it's coincidence.
(This paragraph won't make much sense unless you know the SPARC ISA at least a little.) The bug struck when I delivered a signal between a delayed control transfer that annulled its delay-slot instruction and the annulled instruction itself. Because of the way I was tracking "next instruction annulled" state, when that happened, the first instruction of the signal handler (usually a save to create a stack frame) got annulled, leaving the handler one register window out of sync with where it should be. (It also meant that, when the signal handler returned, the instruction that was supposed to be annulled was not, but I suspect that was less symptomatic.) There were multiple things I could have done to fix it; the way I picked was to suppress signal delivery while the "next instruction annulled" bit is set. I could probably have saved the annulment state somewhere, but it seemed simpler and more reliable to me to do it this way.
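To make the failure mode concrete, here is a toy model of it in Python. This is not the emulator's code, and every name in it (run, execute, PROGRAM, HANDLER, defer_when_annulled) is made up for illustration. It assumes a drastically simplified machine: instructions are just (name, annuls-next) pairs executed in a straight line, the signal handler runs to completion in place, and the only state carried between instructions is the "next instruction annulled" bit.

```python
# Toy model of the annul/signal interaction described above.
# All names are hypothetical; this is a sketch, not the real emulator.

def execute(insn, annul_next, trace, where):
    """Execute one (name, annuls_next) instruction.

    Returns the new value of the "next instruction annulled" bit.
    """
    name, annuls_next = insn
    if annul_next:
        # The pending annul bit eats this instruction, whatever it is.
        trace.append((where, name, "annulled"))
        return False
    trace.append((where, name, "executed"))
    return annuls_next


def run(program, handler, signal_at, defer_when_annulled):
    """Run program; a signal becomes pending just before index signal_at.

    With defer_when_annulled=False the signal is delivered immediately
    (the buggy behaviour). With True, delivery waits until the annul
    bit is clear (the fix chosen in the text).
    """
    trace = []
    annul_next = False
    pending = False
    for pc, insn in enumerate(program):
        if pc == signal_at:
            pending = True
        if pending and not (defer_when_annulled and annul_next):
            pending = False
            # Simplification: the handler runs inline and then "returns".
            for h in handler:
                annul_next = execute(h, annul_next, trace, "handler")
        annul_next = execute(insn, annul_next, trace, "main")
    return trace


# A branch that annuls its delay slot, followed by that delay slot.
PROGRAM = [("add", False), ("ba,a", True), ("delay_slot", False), ("sub", False)]
# A handler whose first instruction creates a stack frame.
HANDLER = [("save", False), ("work", False), ("restore", False)]

if __name__ == "__main__":
    for label, defer in (("buggy", False), ("fixed", True)):
        print(label)
        for entry in run(PROGRAM, HANDLER, signal_at=2, defer_when_annulled=defer):
            print("   ", entry)
```

In the buggy run, the handler's save is annulled and the main program's delay-slot instruction executes when it should not have; in the fixed run, the delay slot is annulled first and the handler then starts with a working save. Deferring delivery by one instruction keeps the annul bit out of the saved signal context entirely, which is the appeal of that fix over saving and restoring the bit.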
I am glad to get that bug squished. It's been annoying me for a long time, and I've had this uneasy feeling that it would all be obvious once I finally found it; sure enough, it looks as though I was right in that feeling. I've got a NetBSD build-of-the-world running now; if it completes as well as I hope, I'll be rebuilding my own stuff and exercising it a bit more. Eventually I want to experiment with turning the malloc-family calls into traps to the emulator, as a preliminary attempt at something along the lines of valgrind's memcheck tool. (I looked at approaching it much the way valgrind does, by running modified machine code, but that looks difficult on the SPARC; dealing with a restore in the delay slot of a jmpl, something that is a part of most function epilogues, is a mess.)