OpenBSD on SGI, 4/6: Tinkering on big iron

(Follow this link to go back to the main SGI page, and this link to go back to the previous part.)

2007, NetBSD

There was not much sgi-related activity in NetBSD in 2007.

The only notable accomplishment was the addition of a driver for the O₂ frame buffer in April, thanks to Jared McNeill's hard work.

Date: 04/12/2007 09:25:47
From: Jared D.McNeill
To: port-sgimips
Subject: Testers wanted: SGI O2 framebuffer console and keyboard controller in -current

Heyas folks --

I've added wscons support to the GENERIC32_IP3x kernel config in -
current. Feel free to bang away at it and let me know how it works
out for you :)

The framebuffer driver will configure your display using whatever
your firmware setup by default. It will also steal 4MB of physical
memory from you -- I'm looking into ways of reducing this. As for
XFree86, if you manage to get the wsfb driver built, the display
driver will support it. It does seem that bus_dmamem_mmap ignores any
hints passed to it, and the result in X is quite interesting :)

Cheers,
Jared

Michael Lorenz completed that work in July, providing a true colour X server.

Date: 07/25/2007 23:56:16
From: Michael Lorenz
To: port-sgimips
Subject: X on O2

Hello,

I just committed a few changes to crmfb and wsfb - now you can run
XFree86 with the wsfb driver in 24bit colour. Still no acceleration but
since the framebuffer is in main memory it's not horribly slow either,
in fact I've seen worse performance with hardware acceleration on some
machines...
There's one problem though - gtk2 apps won't work because the
underlying graphics library - Cairo - apparently doesn't support the
pixel format used by the O2's graphics hardware. And apparently nobody
expects to encounter something like that so Cairo passes a NULL pointer
to Xrender which doesn't check it either so there's your SEGFAULT.

KDE, gtk1, Motif etc. work fine though.

have fun
Michael

2007, OpenBSD

No one could have realized it at the time, but the events that would eventually lead to OpenBSD/sgi supporting the Fuel started on March 12th.

<miod> friend of mine bought an SGI Fuel (r14000-based), asked me if there was
       a chance of BSD running on it...

The next month, Saâd Kadhi gave me his O₂. He told me he had never been able to get OpenBSD to run on it, and I first suspected an incompatibility in its PROM, which was a much more recent version than those of the O₂ machines we were used to running OpenBSD on. But once I could tinker with his machine, it turned out that he had set up his partitions incorrectly and ended up overwriting the boot loader code.

That machine was powered by a 300MHz RM5200 processor with 1MB of L2 cache, a welcome performance improvement over the 180MHz R5000SC with only 512KB of L2 cache I had been using until then. It also came with 640MB of memory, but OpenBSD would only report 256MB. This was a known problem, as Mark Kettenis' O₂, with 448MB, behaved the same way, but nobody had done any serious work to address this limitation.

With such a machine at my fingertips, I had no excuse not to give this task a try. But first, I needed to work on the overall stability of the system.

Thankfully, I identified and corrected a horrible bug in the way page table memory was managed, which could be blamed for many of the instability problems seen recently.

<deraadt> oh sgi is way more stable now?
<miod> yes
<deraadt> well we will see when i get home if it still crashes for me ;)
<miod> both jsg's r5k and the new rm5200 here are happy

A few days later, while working on supporting more than 256MB of memory, I realized the kernel code had yet another horrible bug: an important processor register, the status register, would not keep the value carefully programmed at kernel startup.

Date: Mon, 23 Apr 2007 20:48:39 +0000
From: Miod Vallat
To: Theo de Raadt, Mark Kettenis, Jonathan Gray, Per Fogelström
Subject: [mips] to psr or not to psr

The current code does not maintain a consistent value for the ``status
register'' accross the life of the kernel. Because of a couple issues,
we end up having the SR being zero (or, well, with all the nice bits
clear) most of the time.

This goes in the way of two things:
- r12k systems are supposed to run with the DSD bit set all the time.
- my work towards a real 64bit (well, 40) address space requires SR_KX
  to be set for the large addresses to be processed correctly (guess how
  I found out this problem).

The following diff tries to keep SR at a better value all the time:
- when saving FP context, ``or'' values into it, rather than ``set''
  them.
- when forking a new process, make sure it will restore the SR from its
  parent in proc_trampoline().
- initialize proc0 SR to what it was in the kernel after early
  initialization.

This diff should apply cleanly whether you run with some of my pmap
diffs or not.

This works for me (I am running a kernel with all direct mappings but
proc0 stack in XKPHYS at the moment, which means I can enable more than
512MB of memory - that's tomorrow's work).

Miod
[...]
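To make the first item of that diff concrete, here is a minimal C sketch, not the actual OpenBSD code: the helper names are hypothetical, and only the standard MIPS64 bit values for UX (bit 5), KX (bit 7) and CU1 (bit 29) are assumed. Storing a fresh value when enabling the FPU wipes KX, while OR-ing the FPU bit into the current value preserves it.

```c
#include <stdint.h>

/* Standard MIPS64 status register bits. */
#define SR_UX   0x00000020u  /* 64-bit user addressing */
#define SR_KX   0x00000080u  /* 64-bit kernel addressing (needed for XKPHYS) */
#define SR_CU1  0x20000000u  /* coprocessor 1 (FPU) usable */

/*
 * Hypothetical model of the old behaviour: a fresh value is stored,
 * so every bit other than CU1 is lost.
 */
static uint32_t sr_enable_fpu_set(uint32_t sr)
{
    (void)sr;  /* the current value is ignored */
    return SR_CU1;
}

/*
 * Hypothetical model of the fix: the FPU bit is OR'ed into the
 * current value, so KX (and any other bit) survives.
 */
static uint32_t sr_enable_fpu_or(uint32_t sr)
{
    return sr | SR_CU1;
}
```

Starting from a status register with KX and UX set, sr_enable_fpu_set() returns a value with KX cleared, while sr_enable_fpu_or() keeps both bits.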

After a few more days of work, building on a memory detection diff written by Mark Kettenis, which queried the memory controller instead of asking ARCBios for that information, I could apparently use all the memory of the new O₂.

<miod> kettenis, your december 2005 diff was not in vain:
<miod> OpenBSD 4.1-current (GENERIC) #58: Wed Apr 25 20:19:57 GMT 2007
<miod>     miod@santoire.gentiane.org:/usr/src/sys/arch/sgi/compile/GENERIC
<miod> real mem = 671088640
<miod> rsvd mem = 7020544
<miod> avail mem = 587165696
<miod> using 5734 buffers containing 33554432 bytes of memory
<miod> mainbus0 (root)
<miod> cpu0 at mainbus0: PMC-Sierra RM52X0 CPU rev 10.0 300 MHz with RM52X0 FPC rev 10.0
<otto> it took you almost one and half year to test?
<Nick> three digit clock speed.  he overcompensated. :)
<drahn> is he cross compiling from the 68k box again?
<miod> no, it took me one and a half year to write the infrastructure for > 256MB to work.
[...]
<kettenis> wow, you have more memory than I in your sgi
<miod> actually i thought it had 768MB, to be honest.
<miod> i'll have to recheck
<miod> and matthieu's has 448 or 512
<miod> borrowing memory from his, i could reach the max 1GB
<wvdputte> heresy!

I shared my work two days later.

Date: Fri, 27 Apr 2007 18:36:05 +0000
From: Miod Vallat
To: private OpenBSD mailinglist
Subject: sgi > 256MB

The following diff lets your O2 use the whole memory you blessed it
with.

This is a small starting step, where the virtual memory layout is not
modified yet: kernel is limited to 1GB KVM, userland to 2GB address
space. This will improve over time.

Also, to keep the changes minimal, I did not replace all instances of
KSEG0 or KSEG1 operations with XKPHYS operations. These will be replaced
on a ``i need to make other changes to this file'' basis. Note that
there is nothing wrong with cache flushes using KSEG0 addresses, since
the cache aliasing mask is much smaller than the 256MB KSEG0 and KSEG1
let you access.

Special care is necessary in mem.c to allow /dev/kmem access to physical
memory with either KSEG[01] or XKPHYS.

And of course, it is necessary to set a few more flags in the status
register (which is why I had to fix its propagation accross userland a
few days ago).

The code checking the crime registers on O2 to find the memory above
256MB has been written by kettenis@ 16 months ago, and I only made minor
modifications to it.

Tested on a 640MB RM5200 and a 320MB R5000 machines.

Miod

PS: Do not attempt to recompile libkvm until you have installed new
includes.
[...]
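The difference between the KSEG0 and XKPHYS ways of reaching physical memory can be sketched in C. This is an illustration under stated assumptions, not OpenBSD's macros: it assumes 40-bit physical addresses, and the helper names are hypothetical. KSEG0 is a fixed, sign-extended 32-bit window that only covers the first 512MB of physical space, while an XKPHYS address is built by setting the top two bits to 10 and placing a 3-bit cache coherency attribute (CCA) in bits 61 to 59.

```c
#include <assert.h>
#include <stdint.h>

/* KSEG0, the cached 32-bit compatibility segment, sign-extended. */
#define KSEG0_BASE   0xffffffff80000000ull
#define KSEG_SIZE    0x20000000ull          /* KSEG0 (like KSEG1) covers 512MB */

/* XKPHYS: bits 63..62 are 10, bits 61..59 hold the cache attribute. */
#define XKPHYS_BASE  0x8000000000000000ull
#define CCA_UNCACHED 2u
#define CCA_CACHED   3u  /* cacheable, noncoherent */

/* Map any physical address, with the requested cache attribute. */
static uint64_t phys_to_xkphys(uint64_t pa, unsigned cca)
{
    assert(pa < (1ull << 40));  /* assumes 40-bit physical addresses */
    assert(cca < 8);
    return XKPHYS_BASE | ((uint64_t)cca << 59) | pa;
}

/* Map through KSEG0; returns 0 when pa lies beyond its reach. */
static uint64_t phys_to_kseg0(uint64_t pa)
{
    return pa < KSEG_SIZE ? KSEG0_BASE | pa : 0;
}
```

As a sanity check, phys_to_xkphys(0x20020000, 5) yields 0xa800000020020000 and phys_to_xkphys(0x1f620178, CCA_UNCACHED) yields 0x900000001f620178, matching the kernel entry point and the console UART address printed in the Octane boot log later in this part.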

This work ended up in the OpenBSD source tree on May 3rd.

In the meantime, I had also spent some time trying to understand why Matthieu Herrb's O₂ would randomly crash or hang, while another O₂ with a similar configuration ran reliably. Eventually, I realized that the R5000 processor in his machine required a workaround for a known processor erratum, even though its revision was supposedly not affected.

Edge cases can trigger a TLB miss exception instead of an invalid TLB
exception on early R5000 revisions. Despite this bug being supposedly
fixed in R5000 revision 2 onwards, it nevertheless occurs quite frequently
on matthieu's revision 2.1 R5000.

Servicing the TLB miss exception would cause a duplicate TLB to be inserted,
which causes the processor operation to become unpredictable (but lethal to
the kernel, ten times out of nine).

More details about the problem can be found in:
  http://www.linux-mips.org/archives/linux-mips/2000-02/msg00040.html

We work around the issue by checking for an existing TLB entry, and handling
this as an invalid TLB exception (as it was intended to be), in this case.

Unfortunately this causes a measurable 1% slowdown on ``safe'' processors,
so we'll work on providing different tlb handler flavours in the near future
to recover from this.

The important part of the linux-mips message I was referring to is:

     Sometimes you get a utlbmiss exception when there is already matching
TLB entry.  If you then blindly drop in the TLB entry, you get a duplicate,
which leads to Bad Things (tm).  The workaround is to probe for a duplicate,
and skip the tlbwr if an entry already exists.  It should be enabled on any
real R5000.  

    This is from the R5000 Errata list of 30 October 1997:

----------------------------------------------------------------------
3.  An erroneous JTLB miss exception will be taken under
    these conditions. 

    a) An instruction which does not cause an exception or
       stall is 8 bytes away from the end of a page.
    b) A load or store instruction is the last instruction of that page.
    c) The load/store target address has a matching but invalid
       JTLB entry
    d) The next sequential page is not mapped in the JTLB

    In this situation, when the load/store instruction is executed,
    a JTLB invalid exception should be taken, instead a JTLB miss
    exception is incorrectly taken. If the exception handler
    does a random TLB write to resolve the exception, this will in 
    general insert a duplicate TLB entry for each erroneous exception.
    If the first instruction is a jump or branch, this will cause
    an infinite loop of JTLB miss exceptions to occur upon the return
    from the exception handler.  Otherwise, there will be only one
    erroneous exception, followed by a correct exception, leaving
    one duplicate entry in the TLB.

    A software fix is for the JTLB miss handler to detect this situation,
    by probing for a matching TLB entry (treating a hit as being this case),
    ignore the JTLB miss and treat the exception as an JTLB invalid exception.

    Errata 3 is fixed in Rev 2.0.
----------------------------------------------------------------------

      It is not clear to me that Errata 3 is fixed in all cases in Rev 2.*,
so IRIX has the workaround enabled for all R5000 revisions.

(emphasis mine.)
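Here is a minimal C model of that workaround, not the actual OpenBSD handler (which is a handful of assembly instructions around the tlbp and tlbwr instructions). The TLB is simulated as a small array holding one page per entry, whereas real entries map page pairs, and a rotating index stands in for the CP0 Random register; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NTLB 8

/* Hypothetical, heavily simplified TLB entry: one page per entry. */
struct tlb_entry {
    bool     used;
    uint64_t vpn;    /* virtual page number */
    bool     valid;  /* V bit; a present but invalid entry is the risky case */
};

struct tlb {
    struct tlb_entry e[NTLB];
    size_t   random;          /* stand-in for the CP0 Random register */
    unsigned invalid_faults;  /* times the "invalid" path was taken */
};

/* Equivalent of tlbp: index of a matching entry, or -1. */
static int tlb_probe(const struct tlb *t, uint64_t vpn)
{
    for (int i = 0; i < NTLB; i++)
        if (t->e[i].used && t->e[i].vpn == vpn)
            return i;
    return -1;
}

/* Equivalent of tlbwr: write at a pseudo-random slot. */
static void tlb_write_random(struct tlb *t, uint64_t vpn, bool valid)
{
    t->e[t->random] = (struct tlb_entry){ true, vpn, valid };
    t->random = (t->random + 1) % NTLB;
}

/*
 * TLB refill handler. With the workaround, a "miss" for a page that is
 * already present is treated as the invalid exception it should have
 * been, instead of inserting a duplicate entry.
 */
static void tlb_refill(struct tlb *t, uint64_t vpn, bool pte_valid,
                       bool workaround)
{
    if (workaround && tlb_probe(t, vpn) >= 0) {
        t->invalid_faults++;  /* hand over to the invalid handler */
        return;
    }
    tlb_write_random(t, vpn, pte_valid);
}

/* Count entries matching vpn; more than one is the lethal case. */
static int tlb_count(const struct tlb *t, uint64_t vpn)
{
    int n = 0;

    for (int i = 0; i < NTLB; i++)
        if (t->e[i].used && t->e[i].vpn == vpn)
            n++;
    return n;
}
```

Feeding the same erroneous miss twice leaves two matching entries without the workaround, which is the duplicate the errata warns about, and a single entry plus one counted invalid fault with it.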

At this point, the only remaining problem, apparently, was that on warm boot the kernel would sometimes get stuck at the end of device detection, unable to run userland, and no one had found out why yet; it looked like an interrupt was not being handled correctly, or not at all.

Yet, shortly afterwards, I started to experience strange, non-deterministic failures. Although in retrospect the "large memory" change looks like the obvious suspect, these failures did not appear immediately after it, so I suspected other kernel changes first, and lost time until I tried limiting memory to 256MB.

<miod> [all caps expletive deleted]
<miod> sgi works again if you disable memory above 256MB.
<kettenis> something got broken?
<miod> i don't know.
<miod> there must be something left which uses the compat direct mapped segments
       and which i did not see
<miod> so memory above 512MB logical loses
<miod> might explain why the sgi with 440 MB was happier than the sgi with 640MB.
<kettenis> hmm, mine has <512MB too IIRC
<miod> i'm compiling a kernel from scratch to confirm it is ok and i'll commit
       the workaround

And a few hours later:

<miod> oh that's just plain gross. on sgi, dma from pci devices to memory above
       256MB gets endianness-translated
<grange> to get buggy drivers work?
<miod> no, because the pci bridge has two windows to map physical memory, one
       untranslated, one translated. and the currently dumb bus_dma
       implementation does not know this, and translates 1:1, but it turns out
       physical addresses for the memory above 256MB end up in the translated
       window
<miod> i have a simple fix for this, i just need to polish it.
<deraadt> endianness-translated??????
<miod> well every 32 bit chunk gets swapped, apparently.
<miod> scary, eh?
[...]
<miod> ok, who wants to read an evil sgi bus_dma diff?

A better explanation of the problem can be found in the mail I sent:

Date: Sun, 17 Jun 2007 19:50:25 +0000
From: Miod Vallat
To: Jonathan Gray, Mark Kettenis, Dale Rahn, Jason Wright, Theo de Raadt, Per Fogelstrom
Subject: (sgi) bus_dma diff, comments sought

The following diff updates bus_dma on sgi, so that it can work with the
memory over 256MB - the current scheme is not good enough.

Even if you don't do mips stuff, please read on while I explain what the
diff does and why it does it, because I am interested in your opinion,
as a bus_dma expert, on this diff.

On O2, the situation is as follows:
- cpu accesses the first 256MB of memory at address 0 (this is an
  ArcBIOS requirement), and the rest of the memory at 1GB + 256MB
  onwards. Thus, we have two contiguous zones.
- non-pci onboard devices access physical memory at 0x40000000 (1GB)
  onwards, as a contiguous area.
- pci devices access physical memory at 0 onwards, as a contiguous area,
  unless you want endianness translation of your data, in which case you
  need to access at 0x40000000 (1GB) onwards.

The existing bus_dma code only copes with an offset between cpu and pci
view of the memory. It was using zero. As long as you do not have more
than 256MB in your machine, everything works fine (we don't have any
non-pci device doing dma).

When I started to play with memory above 256MB, I have been pretty lucky
since no I/O ever crossed that boundary. But with the change in buffers
allocation, this is no longer the case, so large compilations would hit
the > 256MB zone, and read reversed data. Kaboom.
[...]
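The address arithmetic described in this mail can be sketched in C. This is a hypothetical helper, not the committed bus_dma code, and it assumes the high cpu window is a plain alias of memory shifted by 1GB, i.e. the byte at cpu address 0x50000000 (1GB + 256MB) is memory offset 256MB. It also models the swapping of every 32-bit chunk mentioned on IRC.

```c
#include <stdbool.h>
#include <stdint.h>

/* O2 memory views, as described in the mail. */
#define O2_LOW_MEM_END    0x10000000u  /* first 256MB, seen by the cpu at 0 */
#define O2_HIGH_MEM_BASE  0x40000000u  /* 1GB: offset of the other views */
#define O2_PCI_SWAP_BASE  0x40000000u  /* PCI window with 32-bit swapping */

/*
 * Hypothetical helper: convert a cpu physical address to the address a
 * PCI device must use to reach the same byte without swapping. Low
 * memory is identity-mapped; memory above 256MB is seen by the cpu at
 * 1GB + offset but by PCI at offset, so 1GB must be subtracted.
 */
static bool o2_cpu_to_pci(uint32_t cpu_pa, uint32_t *pci_addr)
{
    if (cpu_pa < O2_LOW_MEM_END) {
        *pci_addr = cpu_pa;
        return true;
    }
    if (cpu_pa >= O2_HIGH_MEM_BASE + O2_LOW_MEM_END) {
        *pci_addr = cpu_pa - O2_HIGH_MEM_BASE;
        return true;
    }
    return false;  /* hole between the two cpu memory zones */
}

/* Does a PCI address fall in the byte-swapping window? */
static bool pci_addr_is_swapped(uint32_t pci_addr)
{
    return pci_addr >= O2_PCI_SWAP_BASE;
}

/* What the swapping window does to each 32-bit word of DMA data. */
static uint32_t swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
           ((w << 8) & 0x00ff0000u) | (w << 24);
}
```

With the old 1:1 translation, cpu address 0x50000000 was handed to the device unchanged, which lands in the swapping window, so a word 0x11223344 would arrive as 0x44332211; subtracting 1GB gives 0x10000000, in the untranslated window.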

After some tweaks, that fix was committed on June 21st, and from then on support for more than 256MB of physical memory was really reliable...

...at least on some machines. For some reason, Matthieu Herrb's O₂ would still not even boot multiuser without hitting random process failures. Investigating and tinkering, I identified issues in the interrupt handling code, but could not yet devise a reliable way to address them, and got lost in my changes.

<miod> sgi doing its third make install in a row, i think I narrowed the issue.
<deraadt> always the optimist? :)
<miod> sgi fails on matthieu's secondary r5k, but then both Matthieu and I are
       suspecting the hardware - before I introduced the R5k tlb errata fix it would
       not even boot multiuser, if at all.
<miod> well, i am not sure i have a decent grasp of the issue, but the changes I
       have made are consistent with the failure symptoms.
<miod> it's yet another case of, when you see the workaround, it becomes obvious
       when you look at the symptoms. except i only had the symptoms.
<miod> $ ll sgi-intr-wip*
<miod> -rw-r--r--  1 miod  dmg  45923 Jun 26 18:44 sgi-intr-wip.vari
<miod> -rw-r--r--  1 miod  dmg  46286 Jun 27 20:07 sgi-intr-wip2.vari
<miod> -rw-r--r--  1 miod  dmg  48189 Jun 30 16:37 sgi-intr-wip3.vari
<miod> -rw-r--r--  1 miod  dmg  48118 Jul  1 14:01 sgi-intr-wip4.vari
<miod> -rw-r--r--  1 miod  dmg  49533 Jul  2 05:09 sgi-intr-wip5.vari
<miod> -rw-r--r--  1 miod  dmg  63038 Jul  6 15:06 sgi-intr-wip6.vari
<miod> -rw-r--r--  1 miod  dmg  62231 Jul  8 19:37 sgi-intr-wip7.vari
<miod> -rw-r--r--  1 miod  dmg  60186 Aug  8 16:58 sgi-intr-wip8.vari
<miod> one of these is the real fix
<miod> problem is, 5 or 6 of this diffs cause clock issues
<miod> and i do not remember which ones are safe
<miod> so it's post 4.2
<miod> 4.2 will only have a workaround.
<miod> an ugly one.
<miod> i think #5 or #6 is the best. #7 and #8 are totally impredictable but safe
       - except for time handling.
<miod> #8 can cause time to go backwards and it's not really something you want (-:
<deraadt> how about the first half of 3 and the second half of 6? :)

I kept tinkering with no success, and voiced my frustration about this.

Date: Thu, 16 Aug 2007 16:13:08 +0000
From: Miod Vallat
To: a few OpenBSD developers
Subject: Re: sgi stability diffs. good ones. really.

Sorry guys, don't lose time on this, it does not work.

The same kernel can make a build and a release and whatnot and be rock
solid for days, then you reboot your machine with the same kernel and it
won't last 10 minutes.

That's so depressing I am considering changing the hostname of my O2 to
``marvin''.

Miod

(The ``marvin'' name is, of course, a reference to Marvin, the Paranoid Android from Douglas Adams' The Hitchhiker's Guide to the Galaxy.)

Due to this sorry state of affairs, OpenBSD/sgi was not part of the OpenBSD 4.2 release.

Once the release cycle was over, we took the opportunity to invite Joel Sing, who had been contributing good fixes to the O₂ support over the last few months, to join the project.

<deraadt> any new developers to setup?
<espie> ajacoutot should have mailed you about Andry Breuil (spelling ?)
<espie> and I think miod also backs this up.
<miod> I back Landry Breuil up
<deraadt> he's setup.  next?
<miod> i need to talk to Joel Sing next week
<miod> he has sent sgi diffs in the last few months and is helping testing stuff

By sheer luck, at the same time, Artur Grabowski was reworking the process context switching code in order to help future SMP improvements, and his change had the side effect of putting sgi back on track.

[=Topic=] miod changed the topic to "plz test switchto.9 okthxbye"
<deraadt> switchto.9 is in snaps for the fast 5 arch's so far.
<miod> excellent.
<miod> it looks like it unfscks sgi as well.
<deraadt> WHAT!
<miod> what i wrote.

Joel Sing accepted the invitation and, within days, contributed a fix for the interrupt storm at warm boot.

Disable timer/compare interrupts on the macebus. This prevents interrupt
storms from occurring on IRQ 6. ok miod@

By the end of the year, he had also written a working frame buffer driver for the O₂, committed on December 14th with a few more fixes on the 31st, allowing these systems to be used with a glass console.

fall 2007 status

SGI model	common name	Linux	NetBSD	OpenBSD
IP12	Indigo (R3000)	-	complete distribution	-
IP20	Indigo (R4000)	-	complete distribution	-
IP22	Indigo²	complete distribution; XL (newport) graphics only	complete distribution; XL (newport) graphics only, no X server	-
IP24	Indy	complete distribution; XL (newport) graphics only	complete distribution; XL (newport) graphics only, no X server	-
IP27	Origin 200, Origin 2000	complete distribution	-	-
IP28	POWER Indigo² R10000	not-yet-integrated kernel patches; otherwise same as IP22	-	-
IP30	Octane	not-yet-integrated kernel patches; X server on Impact only	-	-
IP32	O₂	complete distribution; no R10000 support	complete distribution	complete distribution (but no stable release at the moment)

2008, OpenBSD

A SCSI controller driver tweak, letting disks in the O₂ run faster than the synchronous 10MHz mode (the default wide SCSI speed, although the controller was capable of faster transfers), was ported from NetBSD in January.

Interestingly, the issue had been noticed in the very early days of the O₂ port but then completely forgotten; in fact, I only rediscovered the following conversation, which took place in September 2004, while working on this narrative.

<pefo> how fast can an aic7880 go in sync mode?
<pefo> hmm 40MB it seems. 20Mhz. Negotiates at only 10Mhz.
<pefo> the O2 bios is not setting up the scsi at full speed. tweaking setup in
       ahc_pci.c allows it to run in sync 20Mhz and disk speed is much better.
<pefo> so using bios setting is not a good replacement for an eeprom here.
<pefo> 267934720 bytes transferred in 20.183 secs (13274662 bytes/sec)
<pefo> and it is not a very fast drive.
<pefo> 268435456 bytes transferred in 7.424 secs (36157628 bytes/sec)
<pefo> soo.. how do we feed ahc_pci.c what we want it to set up with?
<mickey> what is it ?
<drahn> hmm, wonder if this is why ahc is not too fast on macppc. (slightly older
        controller/drive -> ~11MB/s 'ahc0: Host Adapter Bios disabled.  Using
        default SCSI device parameters'
<pefo> default pars seems to allow sync negotiations at least. but the 7880 falls
       back to 10Mhz in my case since it does not see "precision resistor
       termination".
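The figures in this conversation can be checked with a little arithmetic. This C sketch assumes a 16-bit wide SCSI bus moving one transfer per clock, which is how 20MHz translates into 40MB/s; the helper names are hypothetical.

```c
#include <stdint.h>

/* Average rate of a transfer, in bytes per second (as dd reports it). */
static double throughput(uint64_t bytes, double seconds)
{
    return (double)bytes / seconds;
}

/*
 * Peak synchronous rate of a SCSI bus, assuming one transfer per
 * clock cycle, each moving bus_width_bits / 8 bytes.
 */
static double scsi_peak_rate(unsigned bus_width_bits, double mhz)
{
    return (bus_width_bits / 8.0) * mhz * 1e6;
}
```

A 16-bit bus gives 20MB/s at 10MHz and 40MB/s at 20MHz. The two dd runs work out to about 13.3 and 36.2 million bytes per second, a speedup of roughly 2.7.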

With the O₂ situation much improved, and only occasional bugfixes as bugs were found, it was finally time to work on the Octane.

I initially thought that the Octane would be quite similar to the O₂, although with more processing power and a 64-bit firmware. On that specific topic, its 64-bit flavour of ARCBios, now called ARCS, did indeed appear to provide the familiar ARCBios interface, with 64-bit values. However, the fact that ARCBios on the O₂ restricted the hardware information it exposed to only 256MB of memory was a strong hint that ARCS information alone might not be enough to correctly recognize the onboard hardware.

(The ARCBios interface consists of a special area in memory at a fixed address, with a signature marker and a set of function pointers the operating system can invoke to get information about the system, and to perform some basic functions, such as console output, disk I/O, getting the current time, or restarting or powering down the machine.)
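The mechanism can be modelled in a few lines of C. This uses a hypothetical, heavily simplified layout rather than the real ARC System Parameter Block, whose fields are not reproduced here: a parameter block carries a signature and a pointer to a vector of firmware entry points, and the kernel validates the signature before calling through the vector. The firmware side is simulated in the same program, and the signature value is only illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PB_SIGNATURE 0x53435241u  /* illustrative value only */

/* Hypothetical, simplified firmware call vector. */
struct fw_vector {
    long (*write)(const char *buf, size_t len);  /* console output */
    void (*reboot)(void);
};

/* Hypothetical parameter block, found at a fixed address on real hw. */
struct param_block {
    uint32_t signature;
    const struct fw_vector *vector;
};

/* Simulated firmware: a console that records what it is given. */
static char console_buf[64];
static size_t console_len;

static long fake_write(const char *buf, size_t len)
{
    if (len > sizeof(console_buf) - console_len)
        len = sizeof(console_buf) - console_len;
    memcpy(console_buf + console_len, buf, len);
    console_len += len;
    return (long)len;
}

static void fake_reboot(void) { }

static const struct fw_vector fake_vector = { fake_write, fake_reboot };
static const struct param_block fake_block = { PB_SIGNATURE, &fake_vector };

/* Kernel side: validate the block before trusting its pointers. */
static const struct fw_vector *fw_lookup(const struct param_block *pb)
{
    if (pb == NULL || pb->signature != PB_SIGNATURE)
        return NULL;
    return pb->vector;
}

static long fw_puts(const struct param_block *pb, const char *s)
{
    const struct fw_vector *v = fw_lookup(pb);

    return v != NULL ? v->write(s, strlen(s)) : -1;
}
```

Calling fw_puts(&fake_block, "boot\n") ends up in fake_write(); a block with a wrong signature, or no block at all, is rejected instead of jumping through garbage pointers.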

Another interesting thing, specific to the Octane, is that physical memory does not start at physical address zero, but at address 0x2000.0000, i.e. 512MB. Due to the way the MIPS Memory Management Unit works, a 32-bit kernel cannot run unless some physical memory is addressable within the first 512MB, since the unmapped KSEG0 and KSEG1 segments only reach that far; therefore there was no choice but to load a 64-bit kernel. Since OpenBSD/sgi used 64-bit binaries, this would not be a problem, except possibly for the bootloader, as our bootblocks were specifically built as 32-bit binaries.
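The two 32-bit limitations mentioned here can be illustrated with a small C sketch; the helper names are hypothetical. The first function checks whether a physical address is within the 512MB that the unmapped KSEG0/KSEG1 segments can reach, which fails for all of the Octane's memory. The second models a loader built as 32-bit code keeping only the low 32 bits of an address, which 64-bit code then sign-extends: a KSEG0 address survives the round trip, but an XKPHYS address such as the 0xa800000020020000 kernel entry point shown in the Octane boot log later in this part does not.

```c
#include <stdbool.h>
#include <stdint.h>

#define KSEG_SIZE 0x20000000ull  /* KSEG0/KSEG1 reach the first 512MB only */

/* Can this physical address be reached through KSEG0/KSEG1? */
static bool kseg_reachable(uint64_t pa)
{
    return pa < KSEG_SIZE;
}

/*
 * Model a 32-bit loader keeping only the low 32 bits of a 64-bit
 * address, which 64-bit code then sign-extends (bit 31 copied upwards).
 */
static uint64_t truncate_and_sign_extend(uint64_t va)
{
    uint32_t lo = (uint32_t)va;

    return (lo & 0x80000000u) ? (0xffffffff00000000ull | lo) : (uint64_t)lo;
}
```

The truncated XKPHYS entry point comes back as 0x20020000, a plain user-space address rather than a kernel one, which is why the load address either has to stay in KSEG or the bootloader has to handle real 64-bit addresses.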

I don't remember whether it was Joel Sing or Jasper Lievisse Adriaanse who obtained early Octane and Origin 200 code from Per Fogelström (code which, like his initial OpenBSD/sgi port of 1997, had been lingering and kept private for years).

But this got us (well, at least Joel and me) in the mood to spend some time working towards Octane support.

Date: Fri, 15 Feb 2008 18:44:25 +0000
From: Miod Vallat
To: Joel Sing
Cc: Jasper Lievisse Adriaanse
Subject: Re: o200 + octane code

[...]
I have started to look at pefo's bits, linux bits, and a few other
things.

First, pefo's tarball does not contain any M (all modified files have
been commited since), so what matters are the six new files:

        include/mnode.h
        localbus/xbowmux.[ch]
        pci/xiopci{bridge.c,brvar.h}
        sgi/sginode.c

xiopcibridge is not important at the moment, because it's a
search'n'replace of macepcibridge with the O2 specific parts removed
(and actually, due to a typo, it won't attach).

These files are a very humble start, they give us knowledge of some of
the xbow structures in memory, and how to parse them.

So I gathered some information... and I am sharing the important bits.

You might want to have a look at that document:
http://techpubs.sgi.com/library/manuals/3000/007-3410-001/pdf/007-3410-001.pdf
to get more familiar with the xbow.

Skipping not-so-important details, you can consider that the system is
built of boards connected to the xbow. A board can be either a cpu board
with up to two processors, or a xio i/o board.

Let's concentrate on a single-mobo origin or octane for now. The system
will sport a mobo, and at least one xio board: the pci bridge. To the
pci bridge is connected at least an IOC3 chip, which drives the serial
ports, the keyboard and mouse ports, the parallel port, and the onboard
ethernet.

Fortunately, the IOC3 has all of these devices memory-mapped, which
allows us to access the serial ports before the xio stuff initializes.

Well, that's good, because this is all we've got now, nothing else.
Pefo's code will be able to attach two serial ports and nothing else.

What needs to be done:
1. bring the new files in and get a kernel to be loaded by the machine
2. add quick ip30 bits to match the ip27 bits so that I can work on
Octane.
3. xbow enumeration code (find xbow ``widgets'', decide what to do with
them).
4. early pci work
5. early ioc3 work, so that com attaches to ioc, not to xbow
6. xbow interrupt code. there are 2x64 interrupt bits per cpu (two
levels of hardware interrupts with a 64 bit interrupt mask), so this
will be adapted from mace.
7. fix bugs until isp@pci attaches.
8. built-in ethernet.
9. snapshot.
10. smp (-:

I'll try to work on the first five points this weekend (while I'm in the
mood).

My initial idea of the device tree for these machines would be:

  xbow0 at mainbus0
  xio* at xbow0         # xio connections
  xiopcibr0 at xio?     # pci bridge
  pci* at xiopcibr?
  isp* at pci?          # on-boart scsi
  ioc* at pci?          # IOC3 superio chip
  com* at ioc?          # serial ports
  radone* at pci?       # audio

Oh, and I forgot:
5b. get and set date and time.

Later,
Miod
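Item 6 of this plan, two levels of 64 interrupt bits per processor, boils down to a mask-and-dispatch loop. The C sketch below is hypothetical and not taken from OpenBSD; it models a single level, the second level being another instance of the same structure, and services pending sources from the highest bit down.

```c
#include <stddef.h>
#include <stdint.h>

#define NINTR 64

typedef void (*intr_handler_t)(void *arg);

/* Hypothetical state for one interrupt level: enable mask, handlers. */
struct intr_level {
    uint64_t       mask;  /* bit n set: source n enabled */
    intr_handler_t handler[NINTR];
    void          *arg[NINTR];
};

/* Example handler: counts how many times it ran. */
static void count_handler(void *arg)
{
    ++*(unsigned *)arg;
}

/*
 * Service every source both pending and enabled, highest bit first.
 * Returns the number of handlers invoked.
 */
static unsigned intr_dispatch(const struct intr_level *lvl, uint64_t pending)
{
    uint64_t todo = pending & lvl->mask;
    unsigned n = 0;

    for (int bit = NINTR - 1; bit >= 0 && todo != 0; bit--) {
        uint64_t b = 1ull << bit;

        if ((todo & b) == 0)
            continue;
        todo &= ~b;
        if (lvl->handler[bit] != NULL) {
            lvl->handler[bit](lvl->arg[bit]);
            n++;
        }
    }
    return n;
}
```

Pending sources whose mask bit is clear are simply left alone, so a disabled device cannot trigger its handler even if its status bit is set.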

This was followed, the next day, by:

Date: Sat, 16 Feb 2008 21:54:30 +0000
From: Miod Vallat
To: Jasper Lievisse Adriaanse, Joel Sing
Subject: octane diff of the day

Not much success today, unfortunately.

There is indeed a kernel load address issue - I expected I could cheat
and find some address outside of already used memory areas on
ip27/ip30/ip32, but it appears there isn't any, so we'll be stuck with
different kernel configurations, unfortunately.

This diff is made of a few files from pefo coerced into compiling, and
some preliminary address work, to let uncached mappings be correct on
xbow-based machines.

The current hurdle is the inability to tell a machine is an IP30
(Octane). However, since IP27 will return a NULL pointer to the system
description call, while an IP30 won't, we can tell them apart. But the
code which finds an IP27 fails on Octane, so I am currently stuck in
arcbios.c.

I have been picking at Linux a bit (not enough), and there are indeed
significant differences between IP27 and IP30 (different clock chip,
different interrupt handling, and probably more).

I'd like to know if a kernel with the following diff boots (only to find
a serial console and nothing else) on an Origin 200 machine. Jasper, can
you give this a try?

Miod

PS: for reference, here is the hinv -t output on the Octane:

system ARC SGI-IP30 key 0
  processor CPU MIPS-R10000 key 0
    processor FPU MIPS-R10010FPC key 0
    cache primary icache 32 Kbytes (block 2 lines, line 64 bytes)
    cache primary dcache 32 Kbytes (block 2 lines, line 32 bytes)
    cache secondary cache 1024 Kbytes (block 2 lines, line 128 bytes)
  processor CPU MIPS-R10000 key 1
    processor FPU MIPS-R10010FPC key 1
    cache primary icache 32 Kbytes (block 2 lines, line 64 bytes)
    cache primary dcache 32 Kbytes (block 2 lines, line 32 bytes)
    cache secondary cache 1024 Kbytes (block 2 lines, line 128 bytes)
  memory main 128 Mbytes
  adapter XTalk heart key 0
    adapter PCI key 15
      adapter multi function ioc3 key 0
        controller network ef0 key 0
          peripheral network ef0 ethernet (100/10 base-T) key 0
        controller serial IP30 tty key 0
          peripheral line key 0
        controller serial IP30 tty key 1
          peripheral line key 0
        controller pointer pcms key 0
          peripheral pointer key 0
      adapter SCSI QLISP1040 key 0
        controller disk SEAGATE ST118202LC key 1
          peripheral disk unit 0
      adapter SCSI QLISP1040 key 1
      controller audio RAD Audio Processor key 0
    controller display SGI-SI key 0
[...]

A few days later, I had managed to get a kernel running on the Octane, trying its luck at identifying devices.

Date: Wed, 20 Feb 2008 22:32:54 +0000
From: Miod Vallat
To: Jasper Lievisse Adriaanse, Joel Sing, Martin Reindl
Subject: Octane status and diff

I have reached the point where PCI device enumeration almost works
(PCI configuration space access). bus_space for PCI devices is probably
completely wrong, so actual PCI drivers will not work yet.

This means that, right now, it boots into:

>> bootp()/bsd.octane
Setting $netaddr to 10.0.1.158 (from server )
Obtaining /bsd.octane from server
2900544+401224 entry: 0xa800000020020000
ARCS64 Firmware Version 64.0
memory descriptor: type 0 start 0x0 count 0x1
memory descriptor: type 1 start 0x1 count 0x1
memory descriptor: type 7 start 0x2 count 0x2
memory descriptor: type 3 start 0x20004 count 0x1c
memory descriptor: type 5 start 0x20020 count 0x327
memory descriptor: type 3 start 0x20347 count 0xbb9
memory descriptor: type 6 start 0x20f00 count 0x100
memory descriptor: type 3 start 0x21000 count 0x7000
SR=340050a0
Found SGI-IP30, setting up.
memory bank 0: 80430000
memory from 0x20000000000 to 0x28000000000
memory bank 1: 00000000
memory bank 2: 00000000
memory bank 3: 00000000
memory bank 4: 00000000
memory bank 5: 00000000
memory bank 6: 00000000
memory bank 7: 00000000
tentative console uart address @0x900000001f620178
Initial setup done, switching console.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2008 OpenBSD. All rights reserved.  http://www.OpenBSD.org

OpenBSD 4.2-current (OCTANE) #66: Wed Feb 20 21:50:54 GMT 2008
    miod@santoire.gentiane.org:/usr/src/sys/arch/sgi/compile/OCTANE
real mem = 134217728 (128MB)
rsvd mem = 1064960 (1MB)
avail mem = 122413056 (116MB)
mainbus0 at root
cpu0 at mainbus0: MIPS R10000 CPU rev 2.7 174 MHz with R10000 FPU rev 0.0
cpu0: cache L1-I 32KB D 32KB 2 way, L2 1024KB 2 way
xbow0 at mainbus0: XBow revision 4
xheart0 at xbow0 widget 8: Heart revision 4
"HQ4 / ImpactSR" revision 2 at xbow0 widget 9 not configured
xbridge0 at xbow0 widget 15: Bridge revision 3
pci0 at xbridge0 bus 0
isp0 at pci0 dev 0 function 0 "QLogic ISP1020" rev 0x05: irq not
implemented yet
isp0: invalid NVRAM header
scsibus0 at isp0: 16 targets
isp0: Polled Mailbox Command (0x15) Timeout
isp0: Polled Mailbox Command (0x15) Timeout

... and spins writing this message every few seconds.

However, if I disable isp:

>> bootp()/bsd.octane -c
[...]
OpenBSD 4.2-current (OCTANE) #66: Wed Feb 20 21:50:54 GMT 2008
    miod@santoire.gentiane.org:/usr/src/sys/arch/sgi/compile/OCTANE
real mem = 134217728 (128MB)
rsvd mem = 1064960 (1MB)
avail mem = 122413056 (116MB)
User Kernel Config
UKC> disable isp
 17 isp* disabled
UKC> quit
Continuing...
mainbus0 at root
cpu0 at mainbus0: MIPS R10000 CPU rev 2.7 174 MHz with R10000 FPU rev
0.0
cpu0: cache L1-I 32KB D 32KB 2 way, L2 1024KB 2 way
xbow0 at mainbus0: XBow revision 4
xheart0 at xbow0 widget 8: Heart revision 4
"HQ4 / ImpactSR" revision 2 at xbow0 widget 9 not configured
xbridge0 at xbow0 widget 15: Bridge revision 3
pci0 at xbridge0 bus 0
"QLogic ISP1020" rev 0x05 at pci0 dev 0 function 0 not configured
"QLogic ISP1020" rev 0x05 at pci0 dev 1 function 0 not configured
"SGI Rad1" rev 0xc0 at pci0 dev 3 function 0 not configured
clock0 at mainbus0 ticker on int5 using count register
/dev/ksyms: Symbol table not valid.
softraid0 at root
boot device: lookup 'sd0a' failed.
root device:

You'll find below the diff I have been using to reach this state. There
are ugly cheats to get com to get compiled in (so that we can use the
com console routines although we can't attach it yet because we lack an
IOC3 driver at the moment.

Anyway, here are my plans with this code:
- I plan to put a large part of the interrupt knowledge into the xheart
  (for ip30) and xhub (for ip27) drivers, which are empty shells for
  now. There will be various function pointers which will be filled by
  either xheart or xhub as it attaches.
- I need to work on the clock attachment since right now it's @mainbus
  because there is no other place to put it. But this needs a basic IOC3
  driver first.
- I need to complete the pci bridge code. You may notice in the previous
  dmesg that the IOC3 at pci0 dev 2 has not been found at all, and that
  the isp driver went nuts quickly.
- I need to find out a better way to find out the serial console
  address. Joel found its address but we do not understand (yet) why it
  is there.

The short term work will be mainly on xbridge.c and xbridgevar.h, if
only to describe more registers and initialize them. And I'll need to
give the ip27 a try this weekend (I borrowed an O200 from Matthieu's
lab today).

My plan is still to get as much as possible done by sunday, since
afterwards I'll concentrate more on release tasks. I don't think this
code will make 4.3 anyway...

Miod

PS: The kernel address issue. This can really be fixed at the
bootloader. Right now we start our kernel at a location low in physical
memory, making sure there is room for the ARCBios reserved pages in low
memory and the bootloader code itself. What can be done is linking the
kernel at an XKPHYS address, and have the bootloader handle this
correctly. Right now the bootloader truncates the load address to 32
bits, which is why it needs to be in KSEG, for sign-extension to provide
a correct 64 bit pointer. By fixing it to use the real 64 bit address,
the existing load address converted to XKPHYS should work on O2, O200
and Octane - I need to check the CCA space we use on r10k is ok on r5k
though.

Another possibility is to stick to the KSEG address, but then we'll need
to provide several boot loaders because the Octane prom will not load a
kernel in KSEG. And bsd.rd would need to be linked to an XKPHYS address
too... Oh well, it will need some tinkering.

And don't worry about installation scripts, they can decide which
bootloader to use from this:
        sysctl hw.model | sed 's,.*(\(IP.*\)),\1,g'

PPS: rebooting and powering off from ddb work nicely.

[...]
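The PS above depends on how 64-bit MIPS addresses survive a 32-bit truncation. Below is a small sketch that assumes the standard MIPS64 layout: KSEG0 is the sign-extended 32-bit window 0xffffffff80000000, and XKPHYS uses bits 63..62 = 0b10 with the cache coherency attribute (CCA) in bits 61..59. The helper names are hypothetical. Decoded this way, the entry point printed by the IP27 bootloader further down, 0xa800000000020000, is CCA 5 and physical address 0x20000.

```c
#include <assert.h>
#include <stdint.h>

/* MIPS64 XKPHYS window: bits 63..62 are 0b10, bits 61..59 hold the cache
   coherency attribute (CCA), the low bits are the physical address. */
#define XKPHYS_BASE 0x8000000000000000ULL
/* 32-bit KSEG0 (unmapped, cached), sign-extended to 64 bits. */
#define KSEG0_BASE  0xffffffff80000000ULL

static uint64_t
phys_to_xkphys(uint64_t pa, unsigned cca)
{
    assert(cca < 8);              /* CCA is a 3-bit field */
    return XKPHYS_BASE | ((uint64_t)cca << 59) | pa;
}

static uint64_t
phys_to_kseg0(uint32_t pa)
{
    assert(pa < 0x20000000);      /* KSEG0 only reaches the first 512MB */
    return KSEG0_BASE | pa;
}

/* What a loader keeping only 32 bits of the address effectively hands
   over: the low word, sign-extended back to 64 bits. */
static uint64_t
truncate_and_sign_extend(uint64_t va)
{
    uint64_t lo = va & 0xffffffffULL;
    return (lo & 0x80000000ULL) ? (lo | 0xffffffff00000000ULL) : lo;
}
```

A KSEG0 pointer survives the truncation unchanged. An XKPHYS pointer collapses to a low user-space address, which is why a loader that is not 64-bit clean confines the kernel to KSEG.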

Joel Sing started working on a driver for the IOC3 chip and shared his code on the 23rd. I sent him back a new diff merging both of our work-in-progress changes.

Date: Sat, 23 Feb 2008 15:22:29 +0000
From: Miod Vallat
To: Joel Sing
Subject: Re: Introducing ioc(4)

Current code with ioc changes, com@ioc, and new dsrtc attachment. I'm
losing hair on the onewire stuff at the moment.
[...]

The onewire comment refers to the fact that, on Octane and, to a lesser extent, on Origin 200, some system information (mainly serial numbers, as well as the Ethernet address of the onboard interface) was stored on Dallas iButton chips. These were small write-once memory devices connected to a 1-Wire bus: it was possible to append data, for example to record a system component upgrade, but not to erase existing data.
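Every 1-Wire device identifies itself with a 64-bit ROM ID: an 8-bit family code, a 48-bit serial number and an 8-bit CRC, sent least significant byte first. The `sn` values in the Octane dmesg further down have twelve hex digits, which is consistent with the 48-bit field. The following is a minimal sketch of decoding and validating such an ID, assuming the usual Dallas/Maxim CRC-8 (polynomial x^8 + x^5 + x^4 + 1, processed LSB first). The structure and function names are hypothetical, not OpenBSD's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dallas/Maxim 1-Wire CRC-8: polynomial x^8 + x^5 + x^4 + 1, bits taken
   LSB first (reflected constant 0x8c), initial value 0, no final xor. */
static uint8_t
ow_crc8(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = buf[i];
        for (int bit = 0; bit < 8; bit++) {
            uint8_t mix = (crc ^ byte) & 1;
            crc >>= 1;
            if (mix)
                crc ^= 0x8c;
            byte >>= 1;
        }
    }
    return crc;
}

/* A ROM ID split into its fields; bytes are in bus order. */
struct ow_romid {
    uint8_t  family;   /* byte 0 */
    uint64_t serial;   /* bytes 1..6, little-endian, 48 bits */
    uint8_t  crc;      /* byte 7 */
};

/* Fill *id from the 8 raw bytes; return 1 if the CRC matches. */
static int
ow_decode_romid(const uint8_t rom[8], struct ow_romid *id)
{
    id->family = rom[0];
    id->serial = 0;
    for (int i = 6; i >= 1; i--)
        id->serial = (id->serial << 8) | rom[i];
    id->crc = rom[7];
    return ow_crc8(rom, 7) == rom[7];
}
```

A convenient property of this CRC is that running it over all eight bytes, CRC included, yields zero for a valid ID.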

Fortunately, OpenBSD already supported some 1-Wire devices, thanks to the work of Alexander Yurchenko, so all I had to do to get the Ethernet address was to understand how to drive the 1-Wire controller found in the Octane.
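When several chips share one bus, as on the Octane IOC3 where the dmesg further down lists three EPROMs, they have to be enumerated before they can be addressed. The standard Dallas/Maxim approach is the ROM search. On every bit, the master reads the bit and its complement from all still-participating devices over a wired-AND line. Whenever both reads come back 0, the devices disagree, so the master picks a branch and remembers the fork. The sketch below simulates that search in software. The bus model and the names are hypothetical and are not OpenBSD's onewire(4) interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_MAXDEV 16

/* Simulated bus: each device's 64-bit ROM ID, bit 0 transmitted first. */
struct sim_bus {
    const uint64_t *rom;
    size_t ndev;
};

struct ow_search_state {
    uint64_t last_rom;      /* ROM ID found by the previous pass */
    int last_discrepancy;   /* 1-based bit position of last 0-branch, 0 = none */
    int done;               /* set once the last device has been found */
};

/* One pass of the ROM search; returns 1 and stores a ROM ID in *found,
   or 0 when there is nothing (left) to find. */
static int
ow_search_next(const struct sim_bus *bus, struct ow_search_state *s,
    uint64_t *found)
{
    int active[SIM_MAXDEV];
    uint64_t rom = 0;
    int last_zero = 0;

    assert(bus->ndev <= SIM_MAXDEV);
    if (s->done || bus->ndev == 0)
        return 0;
    for (size_t d = 0; d < bus->ndev; d++)
        active[d] = 1;

    for (int pos = 1; pos <= 64; pos++) {
        int idx = pos - 1, id_bit = 1, cmp_bit = 1, dir;

        /* Wired-AND: any device with a 0 in a slot pulls that slot low. */
        for (size_t d = 0; d < bus->ndev; d++) {
            if (!active[d])
                continue;
            if ((bus->rom[d] >> idx) & 1)
                cmp_bit = 0;
            else
                id_bit = 0;
        }
        if (id_bit && cmp_bit)
            return 0;                           /* nobody answered */
        if (id_bit != cmp_bit)
            dir = id_bit;                       /* all devices agree */
        else if (pos < s->last_discrepancy)
            dir = (int)((s->last_rom >> idx) & 1); /* replay earlier choice */
        else
            dir = (pos == s->last_discrepancy); /* 1 at the last fork, else 0 */
        if (id_bit == 0 && cmp_bit == 0 && dir == 0)
            last_zero = pos;
        rom |= (uint64_t)dir << idx;
        /* Devices whose bit differs from the chosen one drop out. */
        for (size_t d = 0; d < bus->ndev; d++)
            if (active[d] && (int)((bus->rom[d] >> idx) & 1) != dir)
                active[d] = 0;
    }
    s->last_rom = rom;
    s->last_discrepancy = last_zero;
    if (last_zero == 0)
        s->done = 1;
    *found = rom;
    return 1;
}
```

Calling `ow_search_next()` until it returns 0 visits every device once. Each pass costs 64 bit-pairs plus 64 direction writes on a real bus.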

Between frustrating kernel tests on the Octane, I started tinkering with an Origin 200 system.

It turned out these systems had a stripped-down ARCS firmware, with only enough data and function pointers to load and run a bootloader, and not much else. All the system configuration data was laid out in memory by the PROM during its initialization, as a tree of complex structures called the KLCONFIG. I don't know what the letters KL are supposed to mean, but until I got familiar with it, it sounded like KLingon to me.

Technical details you may skip!

There is a short description of these structures in the IRIX /usr/include/sys/SN/klconfig.h header file:

/*
 * The KLCONFIG structures store info about the various BOARDs found
 * during Hardware Discovery. In addition, it stores info about the
 * components found on the BOARDs.
 */
[...]
/*
 * The KLCONFIG area is organized as a LINKED LIST of BOARDs. A BOARD
 * can be either 'LOCAL' or 'REMOTE'. LOCAL means it is attached to
 * the LOCAL/current NODE. REMOTE means it is attached to a different
 * node.(TBD - Need a way to treat ROUTER boards.)
 *
 * There are 2 different structures to represent these boards -
 * lboard - Local board, rboard - remote board. These 2 structures
 * can be arbitrarily mixed in the LINKED LIST of BOARDs. (Refer
 * Figure below). The first byte of the rboard or lboard structure
 * is used to find out its type - no unions are used.
 * If it is a lboard, then the config info of this board will be found
 * on the local node. (LOCAL NODE BASE + offset value gives pointer to
 * the structure.
 * If it is a rboard, the local structure contains the node number
 * and the offset of the beginning of the LINKED LIST on the remote node.
 * The details of the hardware on a remote node can be built locally,
 * if required, by reading the LINKED LIST on the remote node and
 * ignoring all the rboards on that node.
 *
 * The local node uses the REMOTE NODE NUMBER + OFFSET to point to the
 * First board info on the remote node. The remote node list is
 * traversed as the local list, using the REMOTE BASE ADDRESS and not
 * the local base address and ignoring all rboard values.
 *
 *
 KLCONFIG

 +------------+      +------------+      +------------+      +------------+
 |  lboard    |  +-->|   lboard   |  +-->|   rboard   |  +-->|   lboard   |
 +------------+  |   +------------+  |   +------------+  |   +------------+
 | board info |  |   | board info |  |   |errinfo,bptr|  |   | board info |
 +------------+  |   +------------+  |   +------------+  |   +------------+
 | offset     |--+   |  offset    |--+   |  offset    |--+   |offset=NULL |
 +------------+      +------------+      +------------+      +------------+


 +------------+
 | board info |
 +------------+       +--------------------------------+
 | compt 1    |------>| type, rev, diaginfo, size ...  |  (CPU)
 +------------+       +--------------------------------+
 | compt 2    |--+
 +------------+  |    +--------------------------------+
 |  ...       |  +--->| type, rev, diaginfo, size ...  |  (MEM_BANK)
 +------------+       +--------------------------------+
 | errinfo    |--+
 +------------+  |    +--------------------------------+
                 +--->|r/l brd errinfo,compt err flags |
                      +--------------------------------+

 *
 * Each BOARD consists of COMPONENTs and the BOARD structure has
 * pointers (offsets) to its COMPONENT structure.
 * The COMPONENT structure has version info, size and speed info, revision,
 * error info and the NIC info. This structure can accomodate any
 * BOARD with arbitrary COMPONENT composition.
 *
 * The ERRORINFO part of each BOARD has error information
 * that describes errors about the BOARD itself. It also has flags to
 * indicate the COMPONENT(s) on the board that have errors. The error
 * information specific to the COMPONENT is present in the respective
 * COMPONENT structure.
 *
 * The ERRORINFO structure is also treated like a COMPONENT, ie. the
 * BOARD has pointers(offset) to the ERRORINFO structure. The rboard
 * structure also has a pointer to the ERRORINFO structure. This is
 * the place to store ERRORINFO about a REMOTE NODE, if the HUB on
 * that NODE is not working or if the REMOTE MEMORY is BAD. In cases where
 * only the CPU of the REMOTE NODE is disabled, the ERRORINFO pointer can
 * be a NODE NUMBER, REMOTE OFFSET combination, pointing to error info
 * which is present on the REMOTE NODE.(TBD)
 * REMOTE ERRINFO can be stored on any of the nearest nodes
 * or on all the nearest nodes.(TBD)
 * Like BOARD structures, REMOTE ERRINFO structures can be built locally
 * using the rboard errinfo pointer.
[...]
 */
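To make the list walk described in that comment concrete, here is a toy model. Each record starts with a type byte, records are chained by offsets from their node's base, and an rboard carries a remote node number plus the offset of that node's first board. The remote list is walked while its own rboards are ignored. The record layout, type values and names are all hypothetical; the real structures in klconfig.h are much richer. Records are copied out with memcpy() so the sketch does not depend on alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type codes: the real values are defined in klconfig.h. */
enum { TOY_LBOARD = 1, TOY_RBOARD = 2 };

#define TOY_NNODES   2
#define TOY_AREASIZE 256

/* Every record starts with its type byte, followed by the offset of the
   next record from the node's base (0 ends the list). */
struct toy_hdr {
    uint8_t  type;
    uint8_t  pad[3];
    uint32_t next;
};

struct toy_lboard {
    struct toy_hdr h;
    uint32_t ncompts;   /* components (CPU, MEM_BANK...) on this board */
};

struct toy_rboard {
    struct toy_hdr h;
    uint32_t node;      /* remote node number */
    uint32_t first;     /* offset of the first board on that node */
};

/* Per-node configuration areas: node base + offset locates a record. */
static uint8_t toy_area[TOY_NNODES][TOY_AREASIZE];

/* Count local boards in the list starting at `off` on `node`. With
   follow_remote set, each rboard is followed into the remote node's
   list, which is walked while ignoring its own rboards. */
static unsigned
toy_count_boards(uint32_t node, uint32_t off, int follow_remote)
{
    unsigned n = 0;

    while (off != 0) {
        struct toy_hdr h;

        assert(node < TOY_NNODES &&
            off + sizeof(struct toy_rboard) <= TOY_AREASIZE);
        memcpy(&h, &toy_area[node][off], sizeof h);
        if (h.type == TOY_LBOARD) {
            n++;
        } else if (h.type == TOY_RBOARD && follow_remote) {
            struct toy_rboard rb;
            memcpy(&rb, &toy_area[node][off], sizeof rb);
            n += toy_count_boards(rb.node, rb.first, 0);
        }
        off = h.next;
    }
    return n;
}

/* Helpers to lay out the toy areas. */
static void
toy_put_lboard(uint32_t node, uint32_t off, uint32_t next, uint32_t ncompts)
{
    struct toy_lboard lb = { { TOY_LBOARD, {0}, next }, ncompts };
    memcpy(&toy_area[node][off], &lb, sizeof lb);
}

static void
toy_put_rboard(uint32_t node, uint32_t off, uint32_t next,
    uint32_t rnode, uint32_t rfirst)
{
    struct toy_rboard rb = { { TOY_RBOARD, {0}, next }, rnode, rfirst };
    memcpy(&toy_area[node][off], &rb, sizeof rb);
}
```

Ignoring rboards on the remote side matters because the remote node's list usually points back at the local node, and following it again would count boards twice.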

Date: Wed, 27 Feb 2008 05:59:48 +0000
From: Miod Vallat
To: Jasper Lievisse Adriaanse
Cc: Martin Reindl, Joel Sing
Subject: Re: O2/O200/Octane: we'll need three kernels

[...]
I have fixed a few things and I've been able to panic in uvm bowels
shortly after the copyright message:

>> bootp()/bsd.ip27
Setting $netaddr to 10.0.1.145 (from server )
Obtaining bsd.ip27 from server
2908192+400344 entry: 0xa800000000020000
ARCS64 Firmware Version 64.0
memory descriptor: type 0 start 0x0 count 0x1
memory descriptor: type 1 start 0x1 count 0x1
memory descriptor: type 3 start 0x19 count 0x7
memory descriptor: type 5 start 0x20 count 0x328
memory descriptor: type 3 start 0x348 count 0xfb8
memory descriptor: type 6 start 0x1300 count 0x100
memory descriptor: type 3 start 0x1400 count 0x100
memory descriptor: type 6 start 0x1500 count 0x300
memory descriptor: type 6 start 0x1800 count 0x200
memory descriptor: type 7 start 0x1a00 count 0x600
SR=246000a0
CFG=6c0aa985
Found SGI-IP27, setting up.
config @0x9600000000004000
magic 0xbeedbabe version 0
console 0x9200000008620178 baud 9600
Machine is in M mode.
Region present 0xffffffffffffffff.
Calias size 0x2.
board type 11 components 4
cpu type 934 225Mhz cache 2MB speed 225Mhz
cpu type 934 225Mhz cache 2MB speed 225Mhz
hub port -1 speed 90MHz
memory 1024MB, select 0
   bank 1 256MB
   bank 2 512MB
   bank 3 128MB
   bank 4 128MB
board type 41 components 0
component type 26
board type 21 components 4
component type 5
component type 11
component type 11
component type 6
sys_config.console_io.bus_base = 0x9200000008000000
sys_config.cons_ioaddr = 0x620178
kernel: 0x20-0x348
page 0: 0x19-0x20
uvm_page_physload(0x19,0x20)
page 1: 0x348-0x1300
uvm_page_physload(0x348,0x1300)
page 2: 0x1400-0x1500
uvm_page_physload(0x1400,0x1500)
page 3: 0x2000-0x40000
uvm_page_physload(0x2000,0x40000)
Initial setup done, switching console.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2008 OpenBSD. All rights reserved.  http://www.OpenBSD.org

Trap cause = 7 Frame 0xffffffff8001edb8
Trap PC 0xa8000000001ffb3c RA 0xa8000000001ffb20 fault 0xffffffffc93c9a70
0xa8000000001ffb3c ra 0xa8000000001ffb20 sp 0xffffffff8001ef10 (0xffffffffca311000,0x100027,0x1000,0xffffffff8001ee64)
0xa8000000001ffb20 ra 0x0 sp 0xffffffff8001ef10
User-level: pid 0
stopped on non ddb fault
Stopped at      0xa8000000001ffb3c:     sd      v1,0(v0)
0xa8000000001ff9d8 (ffffffffca311000,100027,1000,ffffffff8001ee64) sp ffffffff8001ef10 ra a8000000001ffb20, sz 0
0xa8000000001ff9d8 (ffffffffca311000,100027,1000,ffffffff8001ee64) sp ffffffff8001ef10 ra 0, sz 0
User-level: pid 0
ddb>

That's progress! Although not much.
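The bsd.ip27 log above shows how the ARCS memory descriptors, expressed in pages as a start and a count, turn into physical memory ranges. Each type 3 descriptor becomes a `uvm_page_physload(start, start + count)` range: for example 0x348 + 0xfb8 = 0x1300. The type 5 descriptor matches the kernel at 0x20-0x348. The last range, 0x2000-0x40000, does not come from a descriptor but from the node information. With 4KB pages it covers 32MB to 1GB, which agrees with "memory 1024MB". The sketch below redoes that arithmetic. Keeping only type 3 mirrors this particular log and is an assumption, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One ARCS memory descriptor as printed in the log; units are pages. */
struct memdesc {
    unsigned type;
    uint64_t start;   /* first page */
    uint64_t count;   /* number of pages */
};

/* A page range: first page and one past the last (start + count). */
struct pgrange {
    uint64_t first;
    uint64_t last;
};

/* In the log, type 3 descriptors are the ones handed to UVM. */
#define DESC_FREE 3

static size_t
free_ranges(const struct memdesc *d, size_t nd, struct pgrange *out,
    size_t nout)
{
    size_t n = 0;

    for (size_t i = 0; i < nd; i++) {
        if (d[i].type != DESC_FREE)
            continue;
        assert(n < nout);
        out[n].first = d[i].start;
        out[n].last = d[i].start + d[i].count;
        n++;
    }
    return n;
}

/* Pages to bytes, assuming 4KB pages. */
static uint64_t
pages_to_bytes(uint64_t pages)
{
    return pages << 12;
}
```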

With the 1-Wire side quest completed, it was time to address interrupt handling.

Date: Thu, 28 Feb 2008 20:04:00 +0000
From: Miod Vallat
To: Jasper Lievisse Adriaanse, Joel Sing, Martin Reindl
Subject: octane news

Bah. I'm busy working on my employer's answer to an European Space
Agency 18 months contract. Not much left to hack...

* Octane

  - there's a 1-Wire controller on the HEART widget, it gives us the
    mobo serial number.

  - reworked the 1-Wire stuff to share code, split the EEPROM driver
    into the Ethernet Address flavor and the serial number flavor, got
    rid of the intermediate iocow layer. Also, onewire attaches early
    so that (eventually) other devices can access the information they
    provide.

  - no work on the interrupt stuff )-:

  - no improvement on the isp side although I've experimented a lot of
    things (this is why I want minimal ip27 support so that I can test
    other pci devices, as I don't have the pci cardcage in ``my''
    Octane).

* Origin 200

  - fixed memory segment initialization: memory below 32MB gets filled
    by ARCBios with a bug workaround, memory above 32MB gets filled from
    the node information. This lets me past copyright.

  - unfortunately uvm_km_page_init() dies with a bus error, which I am
    currently investigating.

* Short-term plans

  - interrupt handling on IP30 (necessary so that ioc subdevices can use
    interrupts, to be confirmed with a working bsd.rd). This will allow
    Joel to work on the Ethernet driver.

  - Find out why IP27 is dying so early and fix it.

No up-to-date diff yet, although I have uploaded recent kernels. Here's
what I now get on Octane:

Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2008 OpenBSD. All rights reserved.  http://www.OpenBSD.org

OpenBSD 4.3-beta (GENERIC-IP30) #45: Thu Feb 28 19:45:09 GMT 2008
    miod@santoire.gentiane.org:/usr/src/sys/arch/sgi/compile/GENERIC-IP30
real mem = 134217728 (128MB)
rsvd mem = 1064960 (1MB)
avail mem = 122408960 (116MB)
mainbus0 at root
cpu0 at mainbus0: MIPS R10000 CPU rev 2.7 174 MHz with R10000 FPU rev 0.0
cpu0: cache L1-I 32KB D 32KB 2 way, L2 1024KB 2 way
clock0 at mainbus0: ticker on int5 using count register
xbow0 at mainbus0: XBow revision 4
xheart0 at xbow0 widget 8: Heart revision 4
onewire0 at xheart0
owserial0 at onewire0 "16kb EPROM" sn 00000012fd21
owserial0: "PM20175MHZ" serial 030-1208-002
"ImpactSR" revision 2 at xbow0 widget 9 not configured
xbridge0 at xbow0 widget 15: Bridge revision 3
pci0 at xbridge0 bus 0
isp0 at pci0 dev 0 function 0 "QLogic ISP1020" rev 0x05: irq 8
isp0: Polled Mailbox Command (0x8) Timeout
isp0: Polled Mailbox Command (0x0) Timeout
isp1 at pci0 dev 1 function 0 "QLogic ISP1020" rev 0x05: irq 9
isp1: Polled Mailbox Command (0x8) Timeout
isp1: Polled Mailbox Command (0x0) Timeout
ioc0 at pci0 dev 2 function 0 "SGI IOC3" rev 0x01
onewire1 at ioc0
owmac0 at onewire1 "1kb EPROM" sn 0000013f1552
owmac0: Ethernet Address 08:00:69:0b:f9:a1
owserial1 at onewire1 "16kb EPROM" sn 00000012034b
owserial1: "PWR.SPPLY.SR" serial 060-0028-003
owserial2 at onewire1 "16kb EPROM" sn 0000001e4edf
owserial2: "FP1" serial 030-0891-003
com0 at ioc0 base 0x00020178: ns16550a, 16 byte fifo
com0: console
com1 at ioc0 base 0x00020170: ns16550a, 16 byte fifo
dsrtc0 at ioc0: DS1687
"SGI Rad1" rev 0xc0 at pci0 dev 3 function 0 not configured
/dev/ksyms: Symbol table not valid.
softraid0 at root
boot device: lookup 'sd0a' failed.
root device:

Work on the Octane then stalled because of the OpenBSD 4.3 release process, during which I was the one building the OpenBSD/sgi release binaries.

I intended to resume working on it, but kept being distracted by other work. On April 7th, I committed the work-in-progress code to the OpenBSD CVS repository to make it public, so that other people could tinker with it if they wished.

In late April, I received a report in private mail about an O₂ system on which OpenBSD reported more memory than the PROM did. Thankfully, the reporter agreed to test a kernel with extra debug information, which allowed me to figure out the cause of the problem, and a fix was quickly devised.

Date: Tue, 29 Apr 2008 17:55:13 +0000
From: Miod Vallat
To: private OpenBSD mailinglist
Subject: better SGI O2 memory detection

I have received a report in private mail of bsd not reporting memory
correctly on an O2.

hinv would report 576MB:

              Memory size: 576 Mbytes

and the kernel would report this:

        real mem = 671088640 (640MB)
        rsvd mem = 7020544 (6MB)
        avail mem = 877572096 (836MB)

While having more available memory than existing is an impressive
achievement, there is no doubt the machine will misbehave when trying to
use memory which is not there.

The discrepency [sic] between real and available memory hinted that physmem
would not increase when memory banks were counted, which in turns hints
at overlapping banks.

So I experimented with many weird memory layouts in an O2 and some debug
information, and this was exactly what I saw. The logic in
crime_configure_memory() assumes empty banks are reported as address
zero.

This is wrong!!!

What really happens is that CRIME reports an overlapping memory
region - but not always the same!

Here's an example. The machine was populated with 2x32MB, then 4x64MB,
then 2x32MB, for a total of 384MB memory. CRIME happily reports:

        bank 0  32MB at 256MB
        bank 1  32MB at 288MB
        bank 2  128MB at 0MB
        bank 3  32MB at 256MB           -- empty due to 2x64MB merge
        bank 4  128MB at 128MB
        bank 5  32MB at 256MB           -- empty due to 2x64MB merge
        bank 6  32MB at 320MB
        bank 7  32MB at 352MB

An unmodified GENERIC kernel would not even boot with this setup - it
would die early in uvm initialization.

In this case the empty slots simply mimic the first one (32MB at 256MB).
But the value may change. Here is a setup with 2x32+2x64+2x32:

        bank 0  32MB at 128MB
        bank 1  32MB at 160MB
        bank 2  128MB at 0MB
        bank 3  32MB at 128MB           -- empty due to 2x64MB merge
        bank 4  32MB at 192MB
        bank 5  32MB at 224MB
        bank 6  32MB at 128MB           -- empty
        bank 7  32MB at 128MB           -- empty

Now let's remove all 64MB DIMMs and keep 4x32MB:

        bank 0  32MB at 0MB
        bank 1  32MB at 32MB
        bank 2  32MB at 64MB
        bank 3  32MB at 96MB
        bank 4  32MB at 0MB             -- empty
        bank 5  32MB at 0MB             -- empty
        bank 6  32MB at 0MB             -- empty
        bank 7  32MB at 0MB             -- empty

And finally, for more fun, 2x64+4x32:

        bank 0  128MB at 0MB
        bank 1  128MB at 0MB            -- empty due to 2x64MB merge
        bank 2  32MB at 128MB
        bank 3  32MB at 160MB
        bank 4  32MB at 192MB
        bank 5  32MB at 224MB
        bank 6  128MB at 0MB            -- empty
        bank 7  128MB at 0MB            -- empty

So basically:
- every time you'll put a pair of 64MB DIMMs, CRIME will merge them as a
  single 128MB bank, and the bank following it is empty.
- really empty banks (such as #6 and #7 in the last example) copy the
  value of the first bank.

If you follow SGI recommandation [sic] to put the larger DIMMs in the lower
banks, it will configure them at the lowest address (below 256MB), so
the copies of bank0 will be ignored. But since the machine won't mind
booting with the DIMMs in random order, we have to cope with this
nevertheless.

With this in mind, I cooked the following diff.

Please test this on your O2s and let me know if the kernel still reports
as much memory as hinv, and available memory lower [than] real memory.

Miod

After about a week of testing, this fix hit the tree on May 4th.
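The layouts listed in that mail suggest a simple rule: count a bank only if its address range does not overlap a bank already counted. This covers both cases described there, empty banks that copy bank 0 and the bank following a merged pair of 64MB DIMMs. The sketch below applies that rule to the four listed layouts. It is only an illustration consistent with those examples, not the committed crime_configure_memory() change, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CRIME_NBANKS 8   /* the listings show banks 0 to 7 */

/* One bank as reported by CRIME, in megabytes as in the listings. */
struct bank {
    unsigned base_mb;
    unsigned size_mb;
};

/* Sum bank sizes, skipping any bank whose [base, base + size) range
   overlaps a bank that has already been counted. */
static unsigned
usable_memory_mb(const struct bank *banks, size_t nbanks)
{
    struct bank kept[CRIME_NBANKS];
    size_t nkept = 0;
    unsigned total = 0;

    assert(nbanks <= CRIME_NBANKS);
    for (size_t i = 0; i < nbanks; i++) {
        unsigned lo = banks[i].base_mb, hi = lo + banks[i].size_mb;
        int overlaps = 0;

        for (size_t j = 0; j < nkept; j++) {
            unsigned klo = kept[j].base_mb;
            unsigned khi = klo + kept[j].size_mb;
            if (lo < khi && klo < hi) {
                overlaps = 1;   /* a copy of (part of) a counted bank */
                break;
            }
        }
        if (!overlaps) {
            kept[nkept++] = banks[i];
            total += banks[i].size_mb;
        }
    }
    return total;
}
```

For the first layout in the mail (2x32MB, 4x64MB, 2x32MB), naively adding all eight banks gives 448MB, while the overlap rule gives the expected 384MB.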

In late July, I borrowed a PCI WaveLAN card from a friend. That card was in fact a PCI-to-PCMCIA bridge with a PCMCIA WaveLAN card plugged into it.

<miod> wi0 at pci0 dev 3 function 0 "US Robotics WL11000P" rev 0x02: irq 11
<miod> wi0: "Lucent Technologies, WaveLAN/IEEE, Version 01.01"
<miod> wi0: Firmware 6.04 variant 1, address 00:60:1d:1d:21:61
<miod> on sgi.
<kettenis> O2 or something more interesting?
<miod> O2.
<deraadt> a pci one eh.
<miod> who wants the diff?
<deraadt> I do.  It needed a diff?  There was stuff wrong?
<miod> bus_space fixes
<deraadt> Oh.  sgi.  Right :)
<miod> then it worked out of the box
<deraadt> nice.
<dlg> miod has pci devices?
<miod> I borrow them.

Octane progress would still have to wait, however.

(Follow this link to go forward to the next part.)