As usual, Fogelström disappeared for a while, and we thought again that the SGI port would never materialize unless someone else took the lead (and likely ported the existing NetBSD code).
But on July 30th, it looked like things would happen for real.
<pefo> sooon a snapshot for you guys with SGI O2s will be ready to play with.
R5K required for now though.
<miod> thanks pefo.
<miod> i still have to check the O2 here, but I think I have an R5k module on
one of the i2 or indy.
<miod> thought they might not be suit for o2, come to think of it...
<pefo> no, O2 and Indy modules are not compatible.
<miod> damn. then i hope it's an r5k.
The promised snapshot appeared two days later.
<pefo> ALLRIGHT!! Everyone with SGI O2s R5K, line up and dust them off. Snapshot
ready for ftp in 2 hrs!
<miod> pefo, I'll line up once I'm back home
<pefo> tsk, tsk... ;)
<miod> if you can pay the bills and the food I'll gladly stop working and spend
my time home (-:
[...]
<pefo> OK, the SGI O2 snap is up and ready for download. Have fun!
Date: Mon, 2 Aug 2004 15:57:51 +0200
From: Per Fogelström
To: private OpenBSD mailing list
Subject: First SGI O2 snap now available!!!
OK, for all those who waited a long time, the moment has come!!!
This snap runs on O2s with R5K and i think R5K2 cpus and at least 64Mb of sdram. You will need a NIC, preferably a fxp. Integrated ethernet support is on the way.
Read the README file for more info.
Get it at ftp.opsycon.se/pub/OpenBSD/3.5+/sgimips when it's fresh!
Have fun and report back to me!
Per
Theo de Raadt seized the opportunity to ask for hardware donations (it never hurts to try...).
Date: Wed, 04 Aug 2004 08:43:57 +0000
From: Theo de Raadt
To: openbsd-misc
Subject: SGI O2
A developer is about to import an SGI O2 codebase. Anyone want that?
There's a catch. We need at least 3 machines in Calgary. There's a little known rule in OpenBSD -- the "to make an official snapshot" rule -- which says roughtly [sic]: in addition to the main developers of a new architecture, at least Theo and Peter must have machines. Otherwise all of us get scared of the possibility of a new architecture languishing and creating an unsightly pit of dead code in the tree.
We could use more than 3, since developers visit here once in a while, and we like to send them on their way with full suitcases.
Is there anyone out there that wants to take care of this? ie. Finding some machines, and getting them to Calgary. It's a mission. I'm serious. Sometimes we ask and nothing happens. It is kind of like I am asking our user community to invest some time in getting access to some damn cheap (just check Ebay) machines and simply post them to Calgary :)
People seem to ask how they can help quite often; here's an example.
Why add a new architecture? SGI's mips machines are dead, aren't they? Well our expierence [sic] has been that every architecture we add has helped us find bugs in shared code that affects other architectures. (As long as it receives developer [attention] and does not rot like NetBSD's architectures do). Some very scary and major bugs have been dredged out almost automatically. In some cases, the effort is not worthwhile. In this case, I judge it to be valuable. In particular, the SGI developer will probably peek out quite a few busdma bugs, which will affect driver support on any busdma architecture.
ps. m88k has been the most worthwhile example, since some insane parts of that architecture permits Miod to find a MI bug weekly.
And now that there were some concrete bits, we could start addressing one of the most important problems in computing: naming things.
<deraadt> So, should it be called "sgimips" or "sgi". Comments?
<pval> will we have any other mips in foreseeable future?
<pval> guess it doesn't matter for the name actually
<deraadt> Well I doubt we'll be supporting the sgi68k or sgi88k or sgisparc
machines.
<miod> to be fair, I have been looking for an sgi68k for some years now, they
are very very very rare.
<deraadt> I dislike typing long names, so I think it should just be sgi.
<miod> well, it was supposed to be sgi initally.
<deraadt> sgi was very successful at buying them back to give to their
sledgehammer guy.
<deraadt> I met their California sledgehammer guy.
<miod> yup
<deraadt> I was introduced :)
<miod> did he hit you with his hammer?
<deraadt> Was told that they spent about 1 million dollars over 10 years keeping
them off ebay and out of the hands of other resellers.
<miod> heh.
<miod> am not surprised.
<millert> why bother?
<deraadt> To kill the reseller market.
<deraadt> SGI machines had low value on buyback.
<deraadt> Unlike Sun, who put high value on buyback.
<miod> sgi tried to be very vocal on the theme "2nd hand sgi are bought as
refurbished from sgi".
After one good night of thought...
<deraadt> so, sgi or sgimips? I think sgi.
<drahn> sgi sounds good to me.
<kettenis> sgi also sells ia64 systems isn't it?
<deraadt> Yes, but those a standard ia64 systems.
<deraadt> Nothing SGI about them.
<kettenis> ah, ok
<kettenis> Hmm, NetBSD calls their SGI port sgimips though
<deraadt> Yes, and they call their amd64 port x86_64
[...]
<mickey> it seems to be more of an issue of hpcmips vs sgimips rather than sgi
making any other than mips machines
<mickey> same as macppc vs mvmeppc
[...]
<deraadt> I like simpler.
[...]
<deraadt> sgi68k is not going to happen.
<drahn> right so sgi makes sense.
<mickey> on he other hand there is no catsarm
<deraadt> cats have legs, not arms, dummy
<mickey> paws you fluffy you
<miod> and tails.
[...]
<matthieu> but are sgimips and sgimips64 the same port or 2 different ones?
<miod> isn't sgimips 32bit only so far?
<deraadt> I think it is 64bit only.
<miod> NetBSD/sgimips is 32bit only as far as I understand.
[...]
<matthieu> so we want 2 ports sgi and sgi64
<miod> matthieu, likely.
<deraadt> We want to support the 32 bit machines? Or just say forget it?
<miod> i'd like to. but then I have too much time on my hands as everyone knows.
<deraadt> so maybe sgi64?
<miod> or sgi for 64 bits, and sgi32 later (-:
<deraadt> sure
<deraadt> if. when.
<miod> hell, you said you wanted to only have to type "sgi".
<deraadt> i prefer sgi :)
<todd> my vote is "sgi" == 64bits, all cpu's I own atm are supposedly 64bit
capable
<art> it's "sg<TAB>" anyway. So who cares?
(catsarm in the discussion above is a reference to the ARM-based CATS board, which was used to bring back ARM support in OpenBSD in order to have a solid foundation before starting the SHARP Zaurus port.)
Fogelström came up with a nice summary the next day.
<pefo> hey, while i have a good night sleep you people can decide wether you
want to call it sgimips, sgi, sgi64, pamela, or whatever! ;)
Eventually there was general agreement that ``sgi'' was the better name, and the source code was added to the OpenBSD source tree on August 6th.
Michael Shalayeff ported the NetBSD O2 on-board Ethernet driver a few days later, and a few other developers started working on code cleanup.
Date: Sat, 14 Aug 2004 15:40:01 +0200
From: Per Fogelström
To: private OpenBSD mailing list
Subject: New SGI snapshot available.
New snap available in ~pefo/sgisnap-0814
This snap has mec ethernet driver and a lot of other fixes.
When using mec for network be aware that there is a snag somewhere we are searching for which makes the driver hang the system. Although i have done complete make builds via nfs source using mec there is something hiding in there. fxp is reliable though.
MACHINE is now sgi instead of sgimips, MACHINE_ARCH=mips64. A little confusing since code is still LP32. As a consequence of that cc and as does not agree on things and -mips1 or -mips2 has to be explicitly given as option. I will try to fix that in the next snap coming in a couple of days. Alternatively pick up the comp35 tar from ftp.opsycon.se and use binutils from there.
No disk boot yet. Code is ready but not tested yet.
Heading for LP64 now!!!
Per
A few days later:
Date: Tue, 17 Aug 2004 07:56:16 +0200
From: Per Fogelström
To: private OpenBSD mailing list
Subject: SGI snap update
The SGI snap in ~pefo on cvs is now updated. The toolchain should now be working (nm(1)). The only tar which is updated is the comp36.tgz.
nm(1) doesn't work with binutils 2.14 and mips. It does not say 'T', 'U' etc on shared lib symbols. 'W' works though. If someone could take a look at this i would appreciate it since perl does no longer build because of that.
The snap dir also contains a diff against the tree with the patches currently needed to do a make build. Many of these will be obsolete when the toolchain is fixed, among them the GOT separation and alignment. Basically all MI fixes will be gone and only some MD will be left.
I will be on the mainland for a couple of day, back Thursday, so until then have fun. :)
Per
Before performing the switch from 32 to 64 bit, there were a few less important things Fogelström wanted to address, which took the rest of August: a working standalone bootloader, and the switch from gcc 2.95 as the system compiler to gcc 3.3, which was a requirement for reliable enough 64-bit code generation.
<pefo> SGI: diskbooting now works. code is not yet committed, i have to run. a
new snap will be put up later today or tomorrow.
...
<pefo> ok, a new SGI snap is in ~pefo/sgisnap. the kernel does not yet
autodetect the boot device but the one in the next snap will. it's
building right now.
<pefo> don't forget to 'setenv OSLoader boot' otherwise it will try to start
sash.
<pefo> (for you who don't read the install doc ;) )
On August 31st, things were ready for the 64-bit port to start.
<pefo> ahh! sgi fully migrated to gcc3 now.
<deraadt> oh really? in mips32 or mips64?
<pefo> mips32. now going 64.
<deraadt> neat.
Date: Tue, 31 Aug 2004 22:03:46 +0200
From: Per Fogelström
To: private OpenBSD mailing list
Subject: gcc3 based sgisnap available
As usual in ~pefo at cvs.
This is probably the last snap before going full 64 bit.
Two days later, a 64-bit kernel was working.
<pefo> Loading ELF64 file
<pefo> 0x0:0xffffffff, Zero 0x339bb0:0xffffffff, 0x347bd0:0xffffffff, start at 0x801001d0
<pefo> Found SGI-IP32, setting up.
<pefo> Initial setup done, switching console.
<pefo> -Copyright (c) 1982, 1986, 1989, 1991, 1993
<pefo> The Regents of the University of California. All rights reserved.
<pefo> Copyright (c) 1995-2004 OpenBSD. All rights reserved. http://www.OpenBSD.org
<pefo> OpenBSD 3.6 (GENERIC64) #24: Thu Sep 2 13:24:13 CEST 2004
<pefo> root@moosehead.opsycon.se:/usr/src/sys/arch/sgi/compile/GENERIC64
<pefo> real mem = 134217728
<pefo> rsvd mem = 7020544
<pefo> avail mem = 108924928
<pefo> using 1638 buffers containing 6709248 bytes of memory
<pefo> mainbus0 (root)
...
<pefo> TADA!!!
<otto> hip hip hurray!
<pefo> well this was the easy part, migrating the kernel to 64 bits. now comes
userland...
<pefo> i'm cheating a little though... not running on full address space yet.
<pefo> and don't tell theo, the same code is used to build a 64 or 32 bit kernel.
just feed the compiler -32 or -64 and everything is taken care of. ;)
<otto> /msg deraadt did you know pefo cheats? he uses the same code to build
64 and 32 bit kernel!
<otto> oops ;-)
Userland took four more days, with some setbacks.
<pefo> wow! just went multiuser in 64 bit mode on sgi. userland is static since
ld.so needs some fixing. but this is looking promising!
[...]
<pefo> ssh craps out on mips64
<pefo> RSA_public_decrypt failed: error:0407006A:rsa routines:RSA_padding_check_PKCS1_type_1:block type is not 01
<pefo> is there something in the libs that needs to be set to 64?
<drahn> my guess would be libssl/crypto/arch/mips64/opensslconf.h
<pefo> oh! thanks! it's a great timesaver when someone knows the answer or where
to look!
<grange> dale was there with arm ;-)
<pefo> :)
This 64-bit mips work also led to unexpected discoveries.
<pefo> heh! found a new lever on my chair!
<pefo> this one adjusts the y-position of the seat.
<pefo> and i've had it for almost 3 years!
<miod> next week, you'll find the instructions manual!
The 64-bit adaptation work was committed on September 9th.
Kernel moves to 64 bit. A few more tweaks when binutils is updated.
And then we faced our first severe bug.
<pefo> oh crap!
<pefo> panic: pool_get(mbpl): free list modified: magic=e291a1a; page 0xffffffffc332f000; item addr 0xffffffffc332f880
[...]
<pefo> panic: pool_get(mbpl) happens everytime i try to ssh to the O2.
<miod> smells like alignment issue.
<miod> i.e. you allocate @0 but access @0+4 onwards...
<pefo> what is mbpl
<markus> mbuf pool
<pefo> ok. perhaps i should try using the old trusty fxp to see if it may be
an mec driver problem.
<pefo> when i think about it's very possible. that driver have never run in
64 bit mode before since netbsd have no mips64 yet.
[...]
<miod> do you still get the mbpl panic?
<pefo> haven't had time to check that any further. i switched to a fxp though
but that one crashes in the driver with a messed up mbuf chain. funny
thing is that dong nfs from the O2 works fine. but as soon as i try to ssh to
<pefo> the box it crashes.
<pefo> same sypthimh taht is with the fxp. either fxp or mec works fine nfs-client.
<pefo> sympthom that...
<pefo> never saw this problem with the 32bit kernel.
[...]
<miod> panic: pool_get(mbpl): free list modified: magic=56617018; page 0xffffffffc332f000; item addr 0xffffffffc332f580
<miod> still not triggered early by your diff Todd )-;
<millert> Oh well
<miod> still a nice idea.
<miod> it's from the MGET in m_prepend().
<millert> Yeah, I see
<pefo> miod, do you have a trace so you can see where pool_get is called ?
<miod> _pool_get+0x644 (1ffffce7,ffffffff803caab0,56617018,ffffffffc332f000) sp ffffffffc6c6b770 ra ffffffff801d3f6c, sz 64
<miod> m_prepend+0xb4 (1ffffce7,ffffffff803caab0,56617018,ffffffffc332f000) sp ffffffffc6c6b7b0 ra ffffffff80271c80, sz 48
<miod> udp_output+0x130 (1ffffce7,ffffffffc3293008,0,0) sp ffffffffc6c6b7e0 ra ffffffff80272608, sz 144
<miod> udp_usrreq+0x638 (1ffffce7,ffffffffc3293008,0,0) sp ffffffffc6c6b870 ra ffffffff801d933c, sz 64
<miod> sosend+0x62c (1ffffce7,0,0,ffffffffc332f480) sp ffffffffc6c6b8b0 ra ffffffff802b7d00, sz 128
<miod> nfs_send+0x88 (1ffffce7,0,0,ffffffffc332f480) sp ffffffffc6c6b930 ra ffffffff802b9430, sz 32
<miod> nfs_request+0x9e8 (ffffffffc3304de0,ffffffffc332f000,3c6c6bdd0,ffffffffc332f480) sp ffffffffc6c6b950 ra ffffffff802cdf10, sz 288
<miod> nfs_lookup+0x400 (c3304de0,ffffffffc332f000,3c6c6bdd0,ffffffffc332f480) sp ffffffffc6c6ba70 ra ffffffff801f4308, sz 464
<miod> VOP_LOOKUP+0x60 (ffffffffc3304de0,ffffffffc6c6be10,ffffffffc6c6be38,ffffffffc332f480) sp ffffffffc6c6bc40 ra ffffffff801e8c2c, sz 48
<miod> ddb> show pool mbpool
<miod> POOL mbpl: size 128, align 8, ioff 0, roflags 0x00000018
<miod> alloc 0xffffffff80434538
<miod> minitems 16, minpages 1, maxpages 8, npages 1
<miod> itemsperpage 31, nitems 20, nout 11, hardlimit 4294967295
<miod> nget 2638, nfail 0, nput 2627
<miod> npagealloc 1, npagefree 0, hiwat 1, nidle 0
<miod> currently entered from file /data/src/sys/kern/uipc_mbuf.c line 236
<miod> (sorry for flood)
<espie> flood expected for a pool overflow, you know.
<miod> go drown yourself.
<espie> ENOMEM: pool not big enough.
<miod> QUI EST GROS ?
<pefo> stfu, anything interesting scrolls off!
<pefo> ;)
<miod> echo -n "heh"
<pefo> you get this in nfs, eh?
<miod> ssh -1 machine, but the private key is on an nfs mounted share.
<miod> gonna reboot and try ssh -lroot...
<miod> but then nfs when logged on the console works.
<miod> (and this ssh works ~ 1 time out of 3)
<matthieu> I'm using nfs from the console too for my source tree.
<pefo> ah, ok. i'm strting to think it has to do with fragmentation. i get my
crash in mec_start when it figures that the packet can't be send as is
but must be revuilt.
<miod> pefo, but you had this with fxp as well, right?
<pefo> it crashes with the fxp, but it looks different. may be something else
although related since nfs works fine but ssh to box crashes.
<pefo> to bad the R5K doesn't have the watch register feature. could have nailed
this in less than an hour... :(
<miod> i thought you had an RM7k o2 too?
<pefo> no, r10k, and a r12k cpu module on its way.
<pefo> the r7k seems to be very rare...
<pefo> i have several other mips systems/boards with rm7k's though but they
don't run mips64 yet.
[...]
<pefo> R10K have the watch register. looks like i'm going to add support for it
a little earlier than i planned...
(``Qui est gros?'' above in all caps is an Obelix reference.)
It took several days and many people's brains to figure this out. The bug was caused by a machine-dependent constant in the network stack, which ought to have been enlarged during the switch to 64 bit but had been left unchanged, so that assumptions made by the network stack no longer held.
The machine-independent nature of the bug was confirmed by testing with an incorrect value on other platforms and experiencing similar network memory corruption.
This was fixed on the 17th:
Crank MSIZE and NMBCLUSTERS, per other 64bit arches.
Further investigation of the causes of that bug also revealed that an earlier change in the network stack had been subtly incorrect, and that change was reverted as well.
Another good side-effect of that bug hunt was that Fogelström worked on R10000 processor family support earlier than initially intended.
<pefo> OpenBSD/sgi (moosehead12k.opsycon.se) (tty00)
<pefo> login:
<pefo> mickey! want a new kernel?
<miod> how come it sez (tty00) instead of (console)
<pefo> prolly something in conf.c? actually have no idea. my mind have been busy
with other things. :)
<miod> probably your /etc/ttys
<miod> console "/usr/libexec/getty std.9600" unknown off secure
<miod> tty00 "/usr/libexec/getty std.9600" unknown on secure
<miod> oh, i did not notice this earlier.
<pefo> go ahead and fix it if it's wrong.
<miod> no, actually it's ok until we get video console
<pefo> and that will be??? ;)
<miod> hopefully soon
<pefo> you have the stuff from glaurung's?
<miod> i have looked at it, yes
<miod> and then my hair turned white.
<pefo> miod, but you didn't lose your hair! it could have fallen off! ;)
<wvdputte> now we know why Miod wears a hat
<pefo> i'm running the 12K with dirty speculative still on in kernel mode, eg
like the 10K will do. building a kernel right now and so far it seems to
work OK.
<miod> actually, i should get a haircut soon.
<pefo> just powered up the Origin 200. pretty beefy machine. 225Mhz, 512Meg.
<miod> only 225 MHz?
<miod> that's a scam!
<pefo> for a R10K that is pretty good.
<miod> but my fastest r4.4k runs at 250MHz!
<miod> (ok, I'm sounding like an old record again here)
<pefo> you will be outrun anyway ;)
<pefo> building a kernel with the R12K was a little more than 3 times faster. i
had hoped for about 4, but anyway.
<miod> wait till you have smp code working! (-:
<pefo> oh yeah! and 1 128 cpu Origin 2000 cluster! Only $10K on ebay! ;)
(glaurung in the discussion above is Vivien Chappelier.)
(Also note the name of Fogelström's system - Moosehead was the SGI code name for the O2, and 12k obviously is the processor type, a MIPS R12000.)
The mips64 codebase would then enter another zone of turbulence, and this time I was the one to blame. The machine-dependent part of the virtual memory subsystem, known as the pmap module, still had some parts inherited from the OpenBSD/arc code of years before, which had fallen behind many changes; in particular, the data structures tracking the modified and referenced state of the virtual memory pages (which has to be maintained by software on MIPS) could be improved. While working on these changes, I tried to kill too many birds with one stone (my first mistake) but did not test well enough (my second mistake), and introduced several bugs which caused, among other things, random segmentation faults in userland binaries.
I should have reverted my changes and done them again in smaller steps, but I was so sure this would be fixed by minor changes (a ``one-liner'' or two) that I did not want to do it (my third mistake); it took pressure from several developers and a heated discussion with more curse words than should have been needed before I reverted the troublesome parts, by which time we had lost three days.
But this allowed a much more stable snapshot to be built and released the next day.
Date: Fri, 24 Sep 2004 13:30:18 +0200
From: Per Fogelström
To: private OpenBSD mailing list
Subject: New sgi snap available.
OK, after much bug digging a new snap is put up in ~pefo@cvs.
The kernel now seems stable wrt random core dumps. Problems were found and fixed in pmap code and in ld.so.
This snap still lacks sendmail and friends. gcc still barfs on it.
binutils is a moving target and seems to have bugs which manifest themselfs as failed linking of certain programs. it's being looked at. however it means that a few programs fail to link or cores. Most of these can be linked static but the major mess is gcc. don't try to rebuild it unless you really need (wanna fix bugs?). in that case contact me and i will explain how to build it. however if something cores, try to build it -static. groff for example must be linked static.
There are two extra files in the snap dir. One is the emulparam that is going into ld. You only beed this if you plan to rebuild binutils and especially ld. The other file contains the diffs i have wrt head in my build tree. mostly binutils but also a "gross" hack to ahc.c to achive full disk speed. A better fix for this is coming. Note that this diff is not MI safe so be careful.
known problems beside those in the toolchain is that the mec ethernet chip sometimes get stuck with its interrupt asserted. a power cycle or a reboot from the maintenance console fixes it.
ahc craps out now and then, it seems. i'm not sure if this may be related to the R10K speculative dirty problem. it would be nice if people could test on both r10k's _and_ other machines to see if the problem occurs over the entire line.
(The "R10k speculative dirty problem" is a reference to the speculative execution behaviour on this processor. Refer to the technical note I wrote earlier about it for details. In both NetBSD and OpenBSD, the cache invalidation discipline in the drivers, done as part of the bus_dma layer, turned out to be good enough to not suffer from speculative execution side effects. Linux, on the other hand, never was that lucky, and support for R10000 O2 has never been considered stable, to the point that you need to go out of your way in order to be able to build a kernel supporting that particular hardware configuration.)
Bugfixes kept coming, but we had to disable the stack-protector code on a few binaries (the libsmutil part of sendmail) as it would cause internal compiler errors.
Eventually Theo de Raadt was able to take over the snapshot builds in early November.
<deraadt> latest sgi snap is by me :)
<miod> theo, you mean the latest sgi snap is unreliable (-:
<deraadt> ?
<miod> you said you built it yourself!
Later that month, the O2 I had been using (courtesy of Wim Vandeputte, who had bought that machine and lent it to me earlier that year) failed.
<miod> damn! looks like the o2 here died
<miod> amber light, no startup sound...
<pval> no cereal?
<pval> mine did that, but it wasn't dead - there's a jumper you should try out
<pval> it's pretty close to the cpu when you take the board out, towards the
edge, a single jumper used to reset everything to defaults
<deraadt> no, peter, that was because you did a setenv of a variable wrong
<deraadt> miod, there is a table that says what colours at boot mean what
<miod> i know
<deraadt> for amber, i have managed to let it sit off for an hour, and it worked
again
<pval> yeah, i'm getting old as i forgot this so quickly
<deraadt> kind of scary
[...]
<kettenis> well, my o2 arrived completely dead. After cleaning the motherboard
<-> chassis connector even the disk works now.
<miod> unfortunately it is clean. i had cleaned the box when it was DOA, but here
it just doesn't want to restart after being shut down one more time...
<deraadt> it's that 10mbit crap you are hooking it up to
<miod> no, it says "cpu board failure"
<miod> i'll let it rest the night.
<miod> wim?
<wvdputte> yo
<miod> do you remember who is the guy who lent the O2 which is at my place at the
moment?
<wvdputte> yes, that would be me
<miod> no, you told me you got it as a lent [sic] from someone else.
<wvdputte> no, I bought it last year
<miod> oh.
<miod> want to attend the funeral?
<wvdputte> you broke it?
<miod> it died on me, apparently.
<pefo> amber led?
<miod> yes. not blinking.
<wvdputte> Open it up, try to fix it. Otherwise, I'll take it back and send it to
my O2 doctor in .nl
<miod> according to http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=hdwr&db=bks&fname=/SGI_EndUser/books/O2_OG/sgi_html/ch04.html&srch=o@%20troubleshooting
it's the cpu board.
<miod> wim, been there, done that.
<miod> and it's not the first time this happens, but today nothing seems to bring
it back to life.
<miod> err, i said "cpu board failure" it's "system board failure" actually.
<wvdputte> RIP
Fortunately, I had connections at the local university, and I was able to get another O2 motherboard loaned to me one week later, thanks to François Delobel and David Delon.
| SGI model | common name | Linux | NetBSD | OpenBSD |
| IP12 | Indigo (R3000) | complete distribution, no graphics | | |
| IP20 | Indigo (R4000) | complete distribution, no graphics | | |
| IP22 | Indigo2 | complete distribution, XL (newport) graphics only | complete distribution, no graphics | |
| IP24 | Indy | complete distribution, XL (newport) graphics only | complete distribution, XL (newport) graphics only, no X server | |
| IP27 | Origin 200, Origin 2000 | complete distribution | | |
| IP28 | POWER Indigo2 R10000 | not-yet-integrated kernel patches, otherwise same as IP22 | | |
| IP30 | Octane | not-yet-integrated kernel patches, X server on Impact only | | |
| IP32 | O2 | not-yet-integrated kernel patches, no R10000 support | complete distribution, no graphics | complete distribution, no graphics, not public yet |
I spent some time trying to get the O2 frame buffer to work, with no success. At some point I vented my frustration.
Date: Fri, 18 Feb 2005 21:40:02 +0000
From: Miod Vallat
To: private OpenBSD mailinglist
Subject: O2 video
Ok, this is a nightmare.
It turns out the Linux driver has been written after noticing the O2 ``GBE'' hardware is close to the SGI ``DBE'' found in the expensive x86-based ``Visual Workstations'' they produced in '99 or so.
These chips are supposed to be smart because they have no frame buffer memory, but instead do DMA blits from the main memory, laid out in ``tiles'', in order to allow the memory to be discontiguous in practice. Just like on Zaurus (-: (except the Z uses a single contiguous area)
So the Linux guy tinkered a bit, got something close to working, tinkered more, and tada, it was deemed working.
It is obvious, though, that he never looked at the contents of all the ``DBE'' registers first - because the GBE layout is slightly different. In particular, a few things in the Linux driver are clearly wrong, but apparently they are lucky enough to not suffer from putting apples in the pumpkins registers.
Too bad I have not been as lucky, so either with ``more correct'' code or with the exact same sequence of operations as the Linux code, I lose - either an unstable image or a nice black screen. Or an interleaved madness which I can not make any use of )-: (not to mention spurious interrupts, but this part is solved now).
I am trying to find more documentation about the GBE, and will probably disassemble some IRIX code to help... Maybe there are different revisions of this piece of hardware as well...
Anyway, knowing that breakthroughs usually happen *after* I send mails about non-working code or problems I am stuck with, I wrote this mail only to relax my mind and shuffle my ideas, in the hope of getting the damn thing to work.
If you have read till this sentence, you may discard this mail and resume your regular slack^Wwork. Thanks for reading!
Miod
PS: Several versions of the non-working code are available upon request if you want to play this game, too!
One month later, I had to return the O2 motherboard and was unable to do further tinkering. An Octane was lent to the OpenBSD project, and ended up at my place in May, but I had no time to tinker with it.
I relocated across the country in autumn, and thanks to the help of Matthieu Herrb, I was lent another O2 and could resume bug hunting on that platform.
Near the end of the year, I stumbled upon a funny bug introduced during the switch from 32 to 64 bits: in userland, there were implicit memory aliases every 2 GB. In other words, if you had a variable at address A, accessing memory at address (2GB + A) would not only not cause a segmentation fault, but would return the value of the variable. I wrote a simple program demonstrating this fact, and shared it along with the appropriate kernel fix.
Date: Fri, 16 Dec 2005 07:37:59 +0000
From: Miod Vallat
To: Per Fogelstrom, private OpenBSD mailinglist
Subject: [mips64] userland space aliases
The current mips64 codebase has an oddity introduced when switching from
32 to 64 bit: in userland, all addresses between 0 and
7fff.ffff.ffff.ffff are aliased every 2 GB.
Let's consider the following test program:
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
jmp_buf cheat;
void
segv(int signo)
{
	/*
	 * Since the SEGV we're going to trigger can't be recovered from,
	 * pretend it's safe to use non-signal-safe functions here.
	 */
	longjmp(cheat, 0x666);
}
void
test(char *address)
{
	printf("accessing %p... ", address);
	if (setjmp(cheat) != 0)
		printf("segfaulted!\n");
	else
		printf("%d\n", *address);
}
int
main()
{
	vaddr_t va;
	char *c;
	signal(SIGSEGV, segv);
	/* pick a valid address on stack... */
	c = (char *)&c;
	va = (vaddr_t)c;
	test(c);
	/* try 1GB later */
	c = (char*)((1ULL << 30) + va);
	test(c);
	/* now 2GB */
	c = (char*)((2ULL << 30) + va);
	test(c);
	/* now 3GB */
	c = (char*)((3ULL << 30) + va);
	test(c);
	/* now 4GB */
	c = (char*)((4ULL << 30) + va);
	test(c);
	/* now 160GB */
	c = (char*)((40ULL << 32) + va);
	test(c);
	return (0);
}
On mips64, here is what you will get:
$ ./obj/amazing
accessing 0x7ffe0720... 0
accessing 0xbffe0720... segfaulted!
accessing 0xfffe0720... 0
accessing 0x13ffe0720... segfaulted!
accessing 0x17ffe0720... 0
accessing 0x287ffe0720... 0
$
Yet the userland address space (so far) is supposed to be restricted to
2GB...
The reason behind this is that the XTLB refill handler has been cloned
from the 32bit TLB refill handler, but needs more bounds checking.
In the 32bit world, the current pmap scheme makes sure that all virtual
addresses (0 to ffff.ffff) are managed in the pmap's pm_segtab. But in
the 64bit world, there is a hole between the 2GB userland limit (at
0000.0000.8000.0000) and the kernel space (at 8000.0000.0000.0000),
which is not checked by the TLB handler. Since the code does a logical
and operation to narrow the pm_segtab access, this means the upper bits
of the logical address are ignored.
In practice, this means any access to a particular address would end up
using the mapping for this address modulo 2GB.
The diff below is a suggested fix to this - we simply add this bounds
check in the XTLB refill handler. Note that the TLB refill handler does
not need to be modified, as it will only get invoked for faults in the
32-bit address space, where we are always within our bounds.
The test program will thus behave as expected:
$ ./obj/amazing
accessing 0x7fff4af0... 0
accessing 0xbfff4af0... segfaulted!
accessing 0xffff4af0... segfaulted!
accessing 0x13fff4af0... segfaulted!
accessing 0x17fff4af0... segfaulted!
accessing 0x287fff4af0... segfaulted!
$
Comments?
Miod
[...]
The fix was committed shortly afterwards.
There was not much visible sgi-related activity in NetBSD in 2006.
During the second half of the year, Steve Rumble added support for some Fast Ethernet GIO expansion cards for Indigo, Indy and Indigo2, as well as for the LG (Indigo entry-level) frame buffer.
2006 was a quiet year for OpenBSD on the sgi front. I had noticed some odd things in the kernel and stumbled upon worse problems every time I tried to clean or fix them.
At the end of October, given the stability issues on the rise on the O2, Theo de Raadt considered pulling the plug.
<deraadt> ready to give up on sgi.