OpenBSD stories


PART 2: A New Hope

On June 29th, 2003, I announced that I was taking a break from OpenBSD, which was "likely to last at least three months". I definitely needed time to cool off and step back.

But although I was stepping back, I could not stop tinkering with the OpenBSD source code, and kept writing (and sharing) minor bugfixes.

In mid-July, I decided to shorten my time off to two months and come back on September 1st.

I realized this was the low-pressure time I needed to try to fix the compiler bugs. Either I would succeed and the mvme88k port would have a future, or (more likely) I would fail miserably, and no one would ever know about it and be disappointed.

I started with the assertion failure in uvm triggered by the gcc 2.95-compiled kernel.

On July 15th, I was able to produce a simple test case from it.

Date: Tue, 15 Jul 2003 17:48:58 +0000
From: Miod Vallat
To: Marc Espie, Thierry Deval, Henning Brauer, Nick Holland
Subject: fun with gcc

No comments.

miod@arzon OpenBSD/mvme88k [/users/miod] $ uname -a
OpenBSD arzon 3.3 GENERIC#73 mvme88k
miod@arzon OpenBSD/mvme88k [/users/miod] $ gcc31 -v
Using builtin specs.
Reading specs from /usr/lib/gcc-lib/m88k-unknown-openbsd3.1/specs
gcc version 2.95.3 20010125 (prerelease)
miod@arzon OpenBSD/mvme88k [/users/miod] $ cat assert.c
#include <stdio.h>
#include <sys/types.h>

#define KASSERT(e)      ((e) ? (void) 0 : __assert( __FUNCTION__, #e))

void
__assert(const char *funcname, const char *error)
{
        printf("Assertion failed in %s: %s\n", funcname, error);
        /* exit(0); */
}

void *
uvm_pagealloc_strat(void *obj, u_int64_t off, void *anon, int flags, int strat,
    int free_list)
{
        KASSERT(anon == NULL);

        return obj;
}

void *
uvm_pagealloc(void *obj, u_int64_t off, void *anon, int flags)
{
        KASSERT(anon == NULL);

        return obj;
}

main()
{
        char *pg;
        char *kobj = "kobj";

        pg = uvm_pagealloc(kobj, 0, NULL, 0);
        pg = uvm_pagealloc(kobj, 0, NULL, 0);
        pg = uvm_pagealloc_strat(kobj, 0, NULL, 0, 0, 0);
        pg = uvm_pagealloc(kobj, 0, NULL, 0);
}
miod@arzon OpenBSD/mvme88k [/users/miod] $ gcc31 -O0 -o assert assert.c
miod@arzon OpenBSD/mvme88k [/users/miod] $ ./assert
Assertion failed in uvm_pagealloc: anon == NULL
miod@arzon OpenBSD/mvme88k [/users/miod] $ gcc -v
Reading specs from /usr/lib/gcc-lib/m88k-unknown-openbsd2.5/2.8.1/specs
gcc version 2.8.1
miod@arzon OpenBSD/mvme88k [/users/miod] $ gcc -O0 -o assert assert.c
miod@arzon OpenBSD/mvme88k [/users/miod] $ ./assert
miod@arzon OpenBSD/mvme88k [/users/miod] $

A few hours later, I had been able to shrink my test case even more, then understand the cause of the bug, and fix it.

Date: Tue, 15 Jul 2003 23:33:40 +0000
From: Miod Vallat
To: Marc Espie, Thierry Deval, Steve Murphree, Paul Weissmann, Theo de Raadt
Subject: One less gcc bug on m88k...

gcc 2.8:
$ grep FUNCTION_ARG_ADVANCE *
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, TYPE_MODE (type), type,
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, mode, (tree) 0, 1);
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, Pmode, (tree) 0, 1);
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, mode, (tree) 0, 1);
function.c:      FUNCTION_ARG_ADVANCE (args_so_far, promoted_mode,

gcc 2.95:
$ grep FUNCTION_ARG_ADVANCE *
calls.c:      FUNCTION_ARG_ADVANCE (*args_so_far, TYPE_MODE (type), type,
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, mode, (tree) 0, 1);
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, Pmode, (tree) 0, 1);
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, mode, (tree) 0, 1);
function.c:      FUNCTION_ARG_ADVANCE (args_so_far, promoted_mode,

Note how the first call uses a pointer dereference now? Can you already
guess what the bug description below is?


$ cd config/m88k
$ head -1069 m88k.h | tail -18
/* A C statement (sans semicolon) to update the summarizer variable
   CUM to advance past an argument in the argument list.  The values
   MODE, TYPE and NAMED describe that argument.  Once this is done,
   the variable CUM is suitable for analyzing the *following* argument
   with `FUNCTION_ARG', etc.  (TYPE is null for libcalls where that
   information may not be available.)  */
#define FUNCTION_ARG_ADVANCE(CUM, MODE, TYPE, NAMED)                    \
  do {                                                                  \
    enum machine_mode __mode = (TYPE) ? TYPE_MODE (TYPE) : (MODE);      \
    if ((CUM & 1)                                                       \
        && (__mode == DImode || __mode == DFmode                        \
            || ((TYPE) && TYPE_ALIGN (TYPE) > BITS_PER_WORD)))          \
      CUM++;                                                            \
    CUM += (((__mode != BLKmode)                                        \
             ? GET_MODE_SIZE (MODE) : int_size_in_bytes (TYPE))         \
            + 3) / 4;                                                   \
  } while (0)

Note how CUM is unprotected, especially in CUM++ ...

What did this produce in practice? Well, the m88k calling convention
mandates that the arguments are passed in registers r2-r9, and if this
is not enough, on the stack. It also mandates that, if an argument can
not fit in one register (float [I really meant to write "double" here], or int64_t), it gets put in two
consecutive registers starting at an even number (so that double word
load and store opcodes can be used) - the CUM++ test logic is to know
whether we have to waste an unused odd-numbered register to respect
this.

In my test case, I used the following function prototypes:

void even64(int oddmaker, int evenmaker, u_int64_t stamper, int value);
void odd64(int oddmaker, u_int64_t stamper, int value);

Their calling conventions would be...
even64:
  r2 - oddmaker
  r3 - evenmaker
  r4, r5 - stamper
  r6 - value

and for odd64:
  r2 - oddmaker
  r3 - unused
  r4, r5 - stamper
  r6 - value

Compiling a call to odd64() would trigger the CUM++ statement from the
call in calls.c using the pointer. One more bug caused by the
preprocessor on apparently correct code, especially since it had been
working in previous gcc versions...

As a result, CUM would end up with a very huge (semi-random) value,
which would cause the register allocator to consider it had exhausted
the r2-r9 range, and use this calling convention:

  r2 - oddmaker
  r3 - unused
  r4, r5 - stamper
  stack - value


When applied to the kernel, this would cause any kernel compiled by gcc
2.95 to die horribly in the first few uvm KASSERT macros...
[...]

(If you're not familiar with how C preprocessor macros work: they perform a direct textual substitution of their arguments when "invoked". So when CUM was replaced with *args_so_far in the first use in calls.c, the body of the first if statement in the macro expansion, meant to skip an odd-numbered register when the argument is passed in a register pair, became *args_so_far++. Since postfix ++ binds more tightly than unary *, this increments the pointer rather than the value it points to, instead of the intended (*args_so_far)++, which increments the value and leaves the pointer alone.

Because of this, not only would the value of CUM be incorrect, but the following CUM += ... statement would write through the now-advanced pointer, slowly corrupting the compiler's own memory. Apparently this was benign enough not to make it crash.)

This was, of course, trivial to fix, by making sure all the arguments of all the macros in the m88k backend were put in parentheses every time they were used, i.e. writing (CUM) instead of CUM. In fact, all other gcc backends had been fixed that way, but for whatever reason, the m88k backend hadn't.

Fixing this gave me a kernel which booted without failing assertions upon startup.

However, I was not yet willing to trust that compiler and that kernel, and rebooted into the old kernel, compiled with gcc 2.8.1, that I had been using.


Before I could start trusting anything built with gcc 2.95, I wanted it to be able to rebuild itself.

In fact, building gcc 2.95, with the macro fix, using gcc 2.8.1, would work. But attempting to build gcc 2.95 again, this time with itself, would fail quickly, with one of its freshly built helper programs, genrecog, dumping core.

The genrecog tool generates the insn-recog.c file from the machine-dependent backend description, in our case config/m88k/m88k.md. It shares a lot of code with the other genfoo programs, which extract various other information from the same m88k.md machine description file. I was able to track the breakage to genrecog.c very quickly: compiling every part of genrecog with gcc 2.95, except genrecog.c itself, which was compiled with gcc 2.8, would produce a working genrecog binary.

Also, apparently, that genrecog.c miscompilation was the only thing preventing gcc 2.95 from recompiling itself (with optimization disabled, at this point.) For a while, I procrastinated on this by using gcc 2.8.1 to compile that particular file, but there was nevertheless a bug waiting to be taken care of.

genrecog emits a lot of information, in fact C source code, on its standard output. When run manually, it would die very quickly with this output:

$ cd obj
$ ./genrecog /usr/src/gnu/egcs/gcc/config/m88k/m88k.md
/* Generated automatically by the program `genrecog'
from the machine description file `md'.  */

#include "config.h"
#include "system.h"
#include "rtl.h"
#include "insn-config.h"
#include "recog.h"
#include "real.h"
#include "output.h"
#include "flags.h"

extern rtx gen_split_1 ();
extern rtx gen_split_2 ();
Memory fault (core dumped)
$

It was simple enough to trace the segfault to this snippet from main():

  while (1)
    {
      c = read_skip_spaces (infile);
      if (c == EOF)
        break;
      ungetc (c, infile);

      desc = read_rtx (infile);
      if (GET_CODE (desc) == DEFINE_INSN)
        recog_tree = merge_trees (recog_tree,
                                  make_insn_sequence (desc, RECOG));
      else if (GET_CODE (desc) == DEFINE_SPLIT)
        split_tree = merge_trees (split_tree,
                                  make_insn_sequence (desc, SPLIT));
      if (GET_CODE (desc) == DEFINE_PEEPHOLE
          || GET_CODE (desc) == DEFINE_EXPAND)
        next_insn_code++;
      next_index++;
    }

make_insn_sequence might produce an insn sequence or whatever. To keep digging into genrecog, all we need to know is what kind of value it returns. A quick glance at the source shows:

static struct decision_head make_insn_sequence PROTO((rtx, enum routine_type));
...
static struct decision_head merge_trees PROTO((struct decision_head,
                                               struct decision_head));

So these functions work on decision_head structures, which are defined as:

/* Data structure for a listhead of decision trees.  The alternatives
   to a node are kept in a doublely-linked list so we can easily add nodes
   to the proper place when merging.  */

struct decision_head { struct decision *first, *last; };

Using the best debugging tools of the trade, also known as printf-based debugging, I added traces of the values of these decision_head structs.

Adding traces to genrecog is very easy: since it outputs C code, you just have to output your traces as C comments. Adding simple traces to merge_trees proved very quickly that it was given an invalid argument. But then traces in make_insn_sequence would prove that it would produce valid data! The output with my trace information would end like this:

[...]
extern rtx gen_split_1 ();
/* make_insn_sequence -> 2e000.2e000 */
/* merge_trees <- 0.0 8.4008 */
/* merge_trees -> 8.4008 */
extern rtx gen_split_2 ();
/* make_insn_sequence -> 2e380.2e380 */
/* merge_trees <- 8.4008 18.4018 */
Memory fault (core dumped)
$

This was enough to hint that the problem was related to the way small structures were being passed to and returned by functions. Mimicking the decision_head struct layout, I came up with this test program:

#include <stdio.h>

struct maze { const char *item1, *item2; };

struct maze
builder(void)
{
        struct maze m;

        m.item1 = "you are lost";
        m.item2 = "in the maze.";

        printf("builder: m.item1 = %p, item2 = %p\n", m.item1, m.item2);

        return m;
}

void
checker(struct maze m)
{
        printf("checker: m.item1 = %p, item2 = %p\n", m.item1, m.item2);
}

main()
{
        checker(builder());
}

When compiled using gcc 2.8, it would run nicely...

$ gcc28 -O0 -o maze maze.c
$ ./maze
builder: m.item1 = 0x10f8, item2 = 0x1108
checker: m.item1 = 0x10f8, item2 = 0x1108
$

...while, once rebuilt with gcc 2.95, it would misbehave.

$ gcc295 -O0 -o maze maze.c
$ ./maze
builder: m.item1 = 0x10f8, item2 = 0x1108
checker: m.item1 = 0x0, item2 = 0x0
$

Tinkering with the size of the struct used in this program showed that only structs whose size was between 8 and 32 bytes, inclusive, were not passed correctly.

Looking at the generated code, gcc 2.8 would produce this code for main (prologue and epilogue omitted):

        or      r12,r0,r31
        bsr     _builder
        bsr     _checker

But gcc 2.95 would emit two more instructions:

        ld.d    r24,r0,r31
        or      r12,r0,r31
        bsr     _builder
        st.d    r24,r0,r31
        bsr     _checker

Before we go further, we need a bit more information about the m88k calling convention. The standard is to never pass structures in registers, but always on the stack. And if a function returns a struct, the caller allocates the appropriate space and sets r12 to its address. This allows such function calls to be recursive, unlike the older PCC struct-return convention, where the space for the returned struct was a global anonymous variable in memory.

In the gcc 2.8 code, the compiler has optimized the call sequence so that the structure is built directly in the outgoing argument area of the stack frame, and is thus immediately usable by checker(). After r12 is set to point to that location, builder() and checker() are invoked in turn.

gcc 2.95, on the other hand, adds two extra instructions. The first one (ld.d) saves the stack area about to be used by builder() into registers r24 and r25. The second one (st.d) writes these saved contents back to that area right before invoking checker(), overwriting the structure builder() has just filled in, so that checker() effectively sees stale, uninitialized memory!

In fact, if I introduce a temporary variable to store the builder() result, like this:

main()
{
        struct maze m = builder();
        checker(m);
}

Then both gcc 2.8 and gcc 2.95 would produce the same, correct code:

        addu    r12,r30,8
        bsr     _builder
        or      r13,r0,r31
        addu    r11,r30,8
        ld      r12,r11,0
        ld      r11,r11,4
        st      r12,r13,0
        st      r11,r13,4
        bsr     _checker

In this case, local variable m lives at r30+8. After builder() returns, the address of the struct is computed again in r11 (remember, this is without any form of optimization), and its two fields are copied, through r12 and r11, to the outgoing argument area at r31+0 (whose address is in r13), where checker() expects to find its argument. (The contents of m need to be copied to a different place because C functions are allowed to modify the arguments they receive by value, but for their own use only; therefore they must receive a copy of that value.)


This time, I could not quickly figure out what made gcc 2.95 misbehave, so I asked people with better gcc skills than mine for help.

Date: Sun, 20 Jul 2003 00:09:32 +0000
From: Miod Vallat
To: Anil Madhavapeddy, Hiroaki Etoh, Marc Espie, Niklas Hallqvist
Subject: gcc help needed

Hi,

  I am finally seriously working on gcc/m88k, in order to give the
OpenBSD/mvme88k a new chance to exist. I have been finding and fixing a
few issues which makes gcc 2.95 almost working on this platform.

Unfortunately, I hit a showstopper bug, which I don't know how to hunt
right now... Since you guys know gcc internals much better than I do, I
figured I might ask for some of your time on this.

A description of the problem [...]

             [...] I have no idea where the two extra assembly
statements in gcc 2.95 come from... knowing where they are generated
would be a good start!

Thanks for your time,
Miod

However, I could not simply wait for outside assistance, and kept poking. In particular, the problem was specific to the m88k backend: my test program passed with flying colours on every hardware platform I could try it on, from alpha to vax. Investigating the differences between the m88k and the other processors was the next logical step.

Since function calls work differently across processors, with various calling conventions and stack layouts, not to mention register windows, the machine-independent part of gcc tries to offer as much flexibility as possible, letting architecture-dependent configuration files define the behaviours they provide or expect, through way too many macros.

Some of these macros must be defined for every architecture, while others are only defined when the architecture requires it. The m88k is fairly unusual here: it defines REG_PARM_STACK_SPACE and OUTGOING_REG_PARM_STACK_SPACE (as a few other architectures also do), but also STACK_PARMS_IN_REG_PARM_AREA.

What do these macros tell the compiler? Let's quote from the manual:

REG_PARM_STACK_SPACE (FNDECL)
Define this macro if functions should assume that stack space has been allocated for arguments even when their values are passed in registers.
The value of this macro is the size, in bytes, of the area reserved for arguments passed in registers for the function represented by FNDECL, which can be zero if GNU CC is calling a library function.
This space can be allocated by the caller, or be a part of the machine-dependent stack frame: OUTGOING_REG_PARM_STACK_SPACE says which.
OUTGOING_REG_PARM_STACK_SPACE
Define this if it is the responsibility of the caller to allocate the area reserved for arguments passed in registers.

Nothing really fancy here. The m88k-specific backend does indeed automagically allocate 32 bytes on the stack in the function prologue. Hey, wait: 32 bytes, exactly the structure size limit found earlier, above which structures were never clobbered!

A simple experiment is to comment out OUTGOING_REG_PARM_STACK_SPACE. Compiling gcc with this setting produces a stack-wasting compiler, because every function prologue now automagically allocates 64 bytes on the stack: 32 from the m88k-specific prologue, and 32 from the architecture-independent prologue code, since it has now been instructed to do so. However, with this compiler, the problem disappears completely, whatever the size of the structure.

Another path worth trying would be to remove this automatic stack allocation, and undefine REG_PARM_STACK_SPACE. However, I am afraid this could break some 88Open rule, or, even worse, some implicit assumption hidden in the m88k-specific backend code.

Now, time to look at STACK_PARMS_IN_REG_PARM_AREA... Quoting the manual again:

STACK_PARMS_IN_REG_PARM_AREA
Define this macro if REG_PARM_STACK_SPACE is defined, but the stack parameters don't skip the area specified by it.
Normally, when a parameter is not passed in registers, it is placed on the stack beyond the REG_PARM_STACK_SPACE area.
Defining this macro suppresses this behavior and causes the parameter to be passed on the stack in its natural location.

This is very interesting, and also very obscure (now you know what to blame for your next headache). No architecture but the m88k defines this. It means that the REG_PARM_STACK_SPACE area can be shared by function call parameters and local variables.

While gcc 2.8 did not seem to care much about this area being shared, and assumed we knew what we were doing, gcc 2.95 seems to be stricter, and explicitly saves any variable living in this shared area whenever it might be clobbered. And this is exactly what we had witnessed! The stack area saved before invoking builder() and restored after it returned is the implicit location of a temporary struct maze.

I could not figure out how to make gcc 2.95 handle this shared area the way gcc 2.8 did, and actually, there were probably very good reasons to change this behaviour. To remain on the safe side, I opted to stop defining STACK_PARMS_IN_REG_PARM_AREA.

The generated code, with gcc 2.95, became:

        addu    r12,r30,8
        bsr     _builder
        addu    r13,r31,32
        addu    r11,r30,8
        ld      r12,r11,0
        ld      r11,r11,4
        st      r12,r13,0
        st      r11,r13,4
        bsr     _checker

which is the same code as when the result of builder() was stored in an explicit temporary variable, except for the stack location: instead of being at the beginning of the REG_PARM_STACK_SPACE area, it is now past this area.

I could tell the people I had asked that their help was no longer needed after a mere 5 hours... and probably went to sleep immediately afterwards, given the timestamp of that mail.

Date: Sun, 20 Jul 2003 05:11:42 +0000
From: Miod Vallat
To: Anil Madhavapeddy, Hiroaki Etoh, Marc Espie, Niklas Hallqvist
Subject: Re: gcc help needed

[...]
I tracked this down to the machine-dependent backend, related to
REG_PARM_STACK_SPACE(), and I have a working, ugly, workaround until a
decent fix is ready.

Miod

I taunted the other developers the next day.

Date: Mon, 21 Jul 2003 08:03:15 +0000
From: Miod Vallat
To: private OpenBSD mailinglist
Subject: bliss

$ uname -a
OpenBSD arzon 3.3 GENERIC#73 mvme88k
$ head -39 /usr/share/mk/sys.mk | tail -5
#.if (${MACHINE_ARCH} == "m88k")
#CFLAGS?=       -O0 ${PIPE} ${DEBUG}
#.else
CFLAGS?=        -O2 ${PIPE} ${DEBUG}
#.endif
$ gcc -v
Reading specs from /usr/lib/gcc-lib/m88k-unknown-openbsd3.3/specs
gcc version 2.95.3 20010125 (prerelease, propolice)
$

and from then on, it was obvious to everyone that I would be coming back soon.


Switching the default compiler flags from -O0 (no optimization at all) to -O2 (the usual set of optimizations), as hinted in the email above, was premature.

First, I needed to make sure I could build a complete OpenBSD/mvme88k userland, with the modified gcc 2.95, and optimization disabled; then, I could start enabling optimization and see how well they would work.

I lost some time trying to get perl to build. For reasons I don't remember, I had not rebuilt perl on mvme88k recently, and still had perl 5.6.1 around. But in late October 2002, Todd Miller had updated perl to version 5.8. And perl would not build with the new compiler... nor with the old one.

Eventually I tracked this down to a bug in the m88k-specific parts of libc, and I had to vent a bit.

Date: Tue, 29 Jul 2003 19:04:08 +0000
From: Miod Vallat
To: "Todd C. Miller", Theo de Raadt
Subject: perl on m88k

Todd, since you know Perl very well, how many grumpyness points do you
get for tracking perl's build problems (miniperl exiting with "panic:
top_lev") to a bug in siglongjmp(3)?

Miod

[Now if I could fix that isakmpd regress/x509 link error, i could even
make build without NO_REGRESS]

I came back on August 1st, after only one month of break, and it was business as usual.

My first commit that day was the compiler update.

A working gcc 2.95/m88k compiler, for some low standard value of working.

Configuration settings mostly borrowed from the former gcc 2.8 configuration.
A few typos and fixes backported from gcc 3.3, and a hell lot of fixes from
my fingertips.

This is enough to yield a compiler which will produce correct code at -O0.
Optimization is slightly broken for some constructs, and more fixes are in
the pipeline.

ok deraadt@

The next one was the libc fix.

Fix the *longjmp() behaviour - it is legal to reuse a jmp_buf several times.
Gets us a working perl 5.8.

A few kernel fixes followed, and then people could start making fun of my engrish commit messages again.

Date: Fri, 1 Aug 2003 23:31:06 +0000
From: Miod Vallat
To: "Todd C. Miller"
Subject: Re: CVS: cvs.openbsd.org: src

> > CVSROOT:    /cvs
> > Module name:        src
> > Changes by: miod@cvs.openbsd.org    2003/08/01 17:15:31
> >
> > Modified files:
> >     sys/arch/mvme88k/mvme88k: pmap.c
> >
> > Log message:
> > The pmap potpourri du jour, while hunting for evil bugs:
>
> "potpourri du jour"?  I thought this was an English list ;-)

Like "pmap" is an english word! (-:

Date: Fri, 1 Aug 2003 23:56:19 +0000
From: Miod Vallat
To: Xavier Santolaria
Subject: Re: CVS: cvs.openbsd.org: src

> > Modified files:
> >     sys/arch/mvme88k/mvme88k: pmap.c
> >
> > Log message:
> > The pmap potpourri du jour, while hunting for evil bugs:
>            ^^^^^^^^^^^^^^^^^
> > - de-cretinize pmap_testbit() and pmap_page_protect()
>     ^^^^^^^^^^^^
>
> ca c'est du log!! :)
[now that's some log!]

Je ne vois pas ce qu'il a de particulier. C'est de l'anglais moderne
correct...
[I don't see anything special about it. It's correct modern English...]

I was back and feeling much better, but gcc 2.95 was still in a recovery state. Every time I tried to enable optimization to some level, I would eventually end up, after having gcc recompile itself no more than a couple times, with a compiler which would either mysteriously fail in surprising ways, or surprisingly fail in mysterious ways, or both.

After disabling every optimizing feature under the sun, I identified the culprit.

Date: Wed, 6 Aug 2003 21:35:33 +0000
From: Miod Vallat
To: Theo de Raadt, Marc Espie
Subject: m88k -O1 workaround

The big picture: libgcc, on m88k, contains supposedly optimal block move
functions for small (less than a few hundred bytes) structures. The m88k
gcc backend will then try to generate calls to these routines whenever
possible, and revert to regular bcopy or memcpy otherwise.

It turns out that, the way this is written, the logic responsible to
invoke those functions is flawed in gcc 2.95 (and probably since the
beginning, in fact), and will sometimes miscompute the destination
address, resulting in garbage and problems.

I think I can fix this in about 8-10 gcc recompilations. That's 20-25
hours. In the meantime, as I would like to make some progress, I have
decided to simply prevent these functions from being used, hence the
ugly #if 0 ... #endif change below.

My plans on the long turn, after fixing this, are to add a
machine-specific option to control whether these routines should be
used, or not. I plan to NOT use these routines for stand/ (so as to get
rid of libgcc.a in the link phase) and for the kernel (which comes with
a heavily optimized bcopy() routine.

But in the meantime, before I enable -O1 by default on m88k (which will
wait until a real make build finishes here, as I am still running some
-O0 binaries), I would like to get this workaround in.

Objections?

Miod

Index: m88k.c
===================================================================
RCS file: /cvs/src/gnu/egcs/gcc/config/m88k/m88k.c,v
retrieving revision 1.2
diff -u -p -r1.2 m88k.c
--- m88k.c      2003/08/01 07:40:19     1.2
+++ m88k.c      2003/08/06 19:02:46
@@ -515,6 +515,7 @@ expand_block_move (dest_mem, src_mem, op
     block_move_sequence (operands[0], dest_mem, operands[1], src_mem,
                         bytes, align, 0);

+#if 0  /* XXX */
   else if (constp && bytes <= best_from_align[target][align])
     block_move_no_loop (operands[0], dest_mem, operands[1], src_mem,
                        bytes, align);
@@ -523,6 +524,7 @@ expand_block_move (dest_mem, src_mem, op
     block_move_loop (operands[0], dest_mem, operands[1], src_mem,
                     bytes, align);

+#endif
   else
     {
 #ifdef TARGET_MEM_FUNCTIONS

I was actually way too optimistic in my estimate, as I have never been able to make these routines work correctly. Even though their constraints look correct to me, when they are enabled there eventually are situations (in large code blocks) where using them corrupts a register. (Years later, admitting defeat, I removed the ability to invoke those routines altogether.)

The workaround I suggested in that mail got committed the day after.


And then, after more than 5 years, I succeeded in building a complete OpenBSD/mvme88k snapshot! The port had gone from knocking on death's door to barely alive, but this was nevertheless an important milestone.

Date: Sun, 10 Aug 2003 01:29:21 +0000
From: Miod Vallat
To: Paul Weissmann
Subject: mvme88k

I am uploading a snapshot, right now. bsd.rd and bootblocks work for me,
so you should be able to create a bootable tape very soon. Probably on
monday (it will need to be put on the ftp site and then the mirrors will
carry it).

Miod

There was still a lot of work to do before the system could be considered stable and reliable, though, as can be seen from this follow-up mail.

Date: Mon, 11 Aug 2003 21:40:19 +0000
From: Miod Vallat
To: Paul Weissmann, "Luke Th. Bullock"
Subject: mvme88k snapshot howto

Hello guys,

 since you will probably be the only two people interested in the
OpenBSD/mvme88k new snapshot, here are roughly what I intended to post
to misc@, until i saw there was already something on deadly.org.

- installation notes on the snapshot are outdated. If you have an
  OpenBSD-current source tree,
    cd /usr/src/distrib/notes && make M=mvme88k
  will produce a better INSTALL.mvme88k file.

- this is not for production use. Definitely not. There are kernel bugs
  left to fix before this can be considered worth using beyond testing.

- only 187 works at the moment. 197 has never been really working (at
  least on my boards), and 188 is broken by a side effect of a commit
  made in april 2001 (yes, that's the truth), which I am currently
  debugging.

- killer bug #1: vi.recover and sendmail will not work. So, after
  installing the snapshot, boot -s on the next boot, mount filesystems
  by hand, set TERM to the correct value, and edit /etc/rc to comment
  out the vi.recover lines, and /etc/rc.conf to set sendmail_flags to
  NO. If you don't, the machine will never finish a multiuser boot, and
  will panic instead.

- killer bug #2: when a process exits, or a new one is started (I am not
  sure of which of both conditions is the killer), the system may
  completely freeze. Even the abort switch on the board will have no
  effect. The only thing to do is to use the reset switch or power
  cycle. This can happen as soon as during the multi user boot, or after
  one week uptime. I have no real clue at the moment to fix this bug.

- don't use gcc -O2. The default make settings will use -O1, which is
  known to work (with perhaps some subtle breakage I have not
  experienced yet). However, last time I tried -O2, I ran into a lot of
  problems. I am working on this.

Miod

The history of kernel changes around that time shows that I had a hard time figuring out the proper sequence in which to process DAE, short for Data Access Exceptions.

The exception model of the 88100 is quite rustic, and exposes a lot of the execution pipeline state. As a consequence, when an exception occurs, there may also be pending load or store operations. The kernel servicing the exception is expected to perform them itself as part of exception processing because, on return from the exception, the execution pipeline starts afresh and these pending operations are discarded.

The fun starts when performing these operations themselves causes exceptions (for example, an invalid pointer dereference), or if they are part of a data exception fault.

It took me quite some time to find the right order in which to service these. And since other bugs were interfering, fixing a bug in one area would sometimes suddenly expose another, completely unrelated one, and it was sometimes difficult to tell whether the latest changes were regressions or were merely exposing something new.

There were also a lot of overdue cleanup tasks, as shown in this mail, sent shortly before the 3.4 release cutoff.

Date: Tue, 9 Sep 2003 12:12:36 +0000
From: Miod Vallat
To: Theo de Raadt
Subject: mvme88k snapshot and todolist


I am uploading, right now, a more recent 3.4 snapshot to cvs.

Here is also my todolist for m88k, in case you're interested. I am
trying to keep [out of] sys/arch/mvme88k/include whenever possible.

Miod

URGENT (needed for 3.4)

* m8820x_cmmu_init()
  -- not (| ! foo) but (& ~foo). Will really help 188.
  -- may need an api change for m8820x_cmmu_set().
* test 188 and 188A with all the HyperModule configurations ASAP
  -- my 188@20 SYSCON needs a new NVRAM

URGENT TOO (won't make 3.4, but I wish it could have)

* libpthread and arla MD threading bits
  -- libpthread done, but with bugs

LESS URGENT

* Kill the bitfields, use constants
  -- m88100.h dmt_reg
  -- mmu.h cmmu_apr batc_entry_t
  -- psl.h psr
* vector_init()
  -- the 4 NOPs appear to be unnecessary
* db_machdep
  -- we need PC_REGS inst_return and inst_call in a separate file suitable
    for non-DDB kernels. See trap.c
  -- PC_REGS() is inlined in different places (m88k_pc) . Factorize
  -- NOISY2 and NOISY3 not used. NOISY used only once
  -- l_PC_REGS and pC_REGS are not used
  -- remove __GNUC__ test from db_machdep.h
* sigreturn()
  -- compare to sendsig()
* rewrite copystr() in assembly?
* only print master CPU# if more than one

ANY TIME

* remove TODO and syscall.stub
* put back M88100 and M88110 defines in the kernel configuration files, a la
  m68k. Will help files.conf, and make better dependencies...
* ... as we add an M88410 define to compile with 88410 support (197SP/DP only).
* kill locore_c_routines.c. Move DAE code to dae.c (depends: M88100), and the
  rest to machdep.c
* constify string tables

COSMETIC

* typos to fix (grep the whole tree)
  -- proccessor
  -- PAMP
  -- defintions
* no need to initialize cpuspeed
* g/c getscsiid()
* BROKEN_MMU_MASK
  -- is the behaviour correct? can it be determined at runtime?
* *_BREATHING_ROOM unused
* autoconf.h
  -- always_match() does not exist
* CLKF_INTR
  -- theoretically only if frame points to interrupt trap. intstack not reliable
* cpu_number.h
  -- provide short version ifndef MVME188, and KNF. But what about 197DP?
* cpus.h
  -- no need to include in cmmu.c and m88110.c
* locore.h
  -- fubail/subail not used anymore
  -- db_are_interrupts_disabled move to db_machdep.h?
* PATC_ENTRIES unused
* mvme*.h
  -- remove U(), we're in ansi world nowadays
* mvme197.h
  -- also 187->197
* KNF, KNF, KNF (ugh)
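Two items in this list are classic C pitfalls.

- The m8820x_cmmu_init() entry presumably refers to code trying to clear bits with `| !foo`. Since `!` is a logical negation yielding 0 or 1, that expression can at most set bit 0, and never clears anything.
- The "kill the bitfields" entry favours mask constants over C bitfields, whose layout is implementation-defined.

A minimal sketch of both points follows. The register bits are hypothetical, not the actual CMMU or PSR layout:

```c
#include <stdint.h>

/*
 * Hypothetical control-register bits, defined as mask constants instead
 * of a bitfield struct, so the layout does not depend on the compiler.
 */
#define CTL_ENABLE	0x00000001u
#define CTL_WRITETHRU	0x00000002u
#define CTL_INHIBIT	0x00000004u

/* Buggy: !bits is 0 or 1, so this can only ever set bit 0. */
static uint32_t
clear_bits_buggy(uint32_t reg, uint32_t bits)
{
	return reg | !bits;
}

/* Correct: AND with the one's complement clears exactly the given bits. */
static uint32_t
clear_bits(uint32_t reg, uint32_t bits)
{
	return reg & ~bits;
}
```

For example, clear_bits(0x7, CTL_WRITETHRU) yields 0x5, while clear_bits_buggy() leaves 0x7 unchanged. Worse, when bits is 0, the buggy version silently sets CTL_ENABLE.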

None of this prevented me from giving MVME197 boards a try.

Date: Wed, 10 Sep 2003 10:37:35 +0000
From: Miod Vallat
To: Theo de Raadt, Steve Murphree
Subject: 197LE status

Short version:

  Userland still does not work and I don't know how to make it
  work at this point.

Long version:

  With the help of an extremely verbose trap handler and syscall
  routines, I came to the following results:

* init gets correctly scheduled, and starts executing. The kernel will
  fault for every new page accessed, and uvm_fault() will do the
  necessary page shuffling to load the pages from disk.

* everything goes well until a syscall fails.

  The syscall code ends up, in userland like this:
     tb 0, r0, 128         ! invoke system call
     br _C_LABEL(_cerror)  ! handle error
     jmp r1                ! handle success (return)

  If the syscall is successful, exip is moved to point to the "jmp r1"
  instruction. If it is unsuccessful, exip is moved to point to the "br
  _cerror" instruction.

  The first instruction at _cerror puts the hi16 of the address of errno
  in a register. Since _cerror is in a text page which has not been
  accessed yet, I would expect to get an instruction access trap, reason
  page fault, at _cerror. However, I get a data access trap, reason page
  fault, wanted address 0 !!!

  I thought initially that returning from a trap to a br instruction
  would let the processor think that the br is safe, so it would not
  check the address and fault to let us pick the page. However, this is
  false, the br would execute and I would only get a trap later; plus I
  built a libc and init with the syscall trampoline changed to

    tb 0, r0, 128
    br 1f
    jmp r1
   1:
    br _C_LABEL(_cerror)

  and still got the same result.

  This may, of course, be a pmap problem; however, forcing the _cerror
  page to be fetched with

    extern void _cerror(int);
    _cerror(42);

  early in main() will cause the page to be fetched, and still the
  problem to occur.

  This is not a problem related to having fetched more than N pages, as
  commenting out syscalls known to fail can make me go much further in
  init.

  Yet, as soon as I let a syscall fail, I'm dead. I tried flushing
  caches, invalidating tlb, etc, to no avail.


If you have any idea on what to try or look at, in order to pass this
hurdle, I'd love to hear about it.

Miod
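The trampoline in this mail relies on the kernel choosing the resume address after the trap. On failure, execution resumes one instruction past the trap; on success, two. The hypothetical helper below expresses only that arithmetic. The real kernel has to update the processor's exception-time instruction pointers (exip in the mail), and this sketch collapses all of that into a single address.

```c
#include <stdint.h>

/* Every m88k instruction is 32 bits wide. */
#define M88K_INSN_SIZE 4u

/*
 * Hypothetical helper: given the address of the "tb 0, r0, 128" trap
 * instruction in the trampoline, return where userland resumes.
 *   tb_pc + 4: "br _C_LABEL(_cerror)"  (syscall failed)
 *   tb_pc + 8: "jmp r1"                (syscall succeeded)
 */
static uint32_t
syscall_resume_pc(uint32_t tb_pc, int error)
{
	if (error != 0)
		return tb_pc + 1 * M88K_INSN_SIZE;
	return tb_pc + 2 * M88K_INSN_SIZE;
}
```

In the failing case the mail describes, execution should have resumed at tb_pc + 4 and then, at worst, taken an instruction access fault on the _cerror text page. Instead, the board reported a data access fault at address 0.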

After the 3.4 release binaries were built (the first real OpenBSD/mvme88k release!), I resumed work on MVME188 support, since I knew it had worked at some point in the past.

The #1 reason it no longer worked was that clock interrupts were not being acknowledged correctly. This was fixed in a series of two commits on september 28th.


In november, Steve Murphree made an interesting discovery...

Date: Thu, 20 Nov 2003 02:03:49 -0800
From: Steve Murphree
To: Miod Vallat
Subject: mc88110

Miod,

I've been playing with a MVME197DP unit that I picked up.  It has SVR4 on it
with a complete development system on it.  Finally, I have found the 'filter'
they use to preprocess asm files in order to avoid the errata of the
mc88110!  The program is called siff.  There are 2 scripts that integrate
into the compiler environment, asfilter0 and asfilter1.  They run siff
with -fj and -fz, respectively.  So it seems that those 2 cases are the ones
that they were most worried about.  I have also provided the list of options
and descriptions from siff.  With this, we might be able to duplicate the
functionality of siff for OpenBSD.  Possibly, it can be integrated into the C
compiler itself.  Contrary to the option output of siff, there is no man
page.

I tried running eh.S through siff, but it only accepts SVR4 88k asm comment
syntax. :(  And, it is designed to be used after any preprocessor macros
have been expanded.  Basically, the .s output of a compiler.

Steve

The first script, asfilter0, had this content:

#!/bin/sh
#ident "@(#)asfilter0 4.1.1.1 13 Apr 1993 "
#
#     (C) COPYRIGHT 1993 MOTOROLA, INC.
#         ALL RIGHTS RESERVED
#
#     THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE OF MOTOROLA, INC.
#     The copyright notice above does not evidence any actual or
#     intended publication of such source code.
#
# This asfilter script is intended to be used to resolve
# the 88110, chip revision 3.2, Errata #14:
#
#  "A multicycle instruction with r1 as the destination which
#   is followed later by a store of r1 followed later by a
#   bsr/jsr may cause the store data to be incorrect if the
#   bsr/jsr executes before the writeback of the multicycle
#   instruction. The scoreboard bit for r1 is not checked in
#   all circumstances before the execution of the bsr/jsr.
#
#   Workaround: Ensure that the bsr cannot issue until a writeback
#   by the multicycle instruction is forced to complete. For
#   example, precede affected bsr/jsr instructions with
#   or r0,r1,r0 to force the multicycle instruction to complete
#   before the bsr can execute."
#
#
BINDIR=/usr/ccs/bin
SIFF=$BINDIR/siff    # Silicon Filter binary
SIFF_FLAGS="-q"     # Start with quiet mode
SIFF_FLAGS="$SIFF_FLAGS -fj"                    # Turn -M 4E93 filter on
#
$SIFF $SIFF_FLAGS -o $2 $1

(Let me laugh for a few seconds at the "UNPUBLISHED" part of the copyright notice, given that this file was apparently installed on every System V/88 system as part of the development tools package.)

The other script, asfilter1, was identical except that it invoked siff with the -fz option, and was documented as addressing this erratum:

# This asfilter script is intended to be used to resolve
# the 88110, chip revision 4.1, Errata #3:
#
#  "Under certain conditions, a bcnd instruction may resolve
#   incorrectly. Consider the following scenario: two single-
#   cycle integer operations are issued on the clock cycle
#   before the bcnd instruction is issued. This will result in
#   both integer operations writing data back when the bcnd is
#   executed. In this case, if the results of these two integer
#   instructions contain many "1" bits, it is possible that the
#   bcnd operation may be resolved incorrectly.
#
#   Workaround: Precede a bcnd[.n] instruction with a pair of
#   'or r0,r0,r0' instructions."

Murphree also pasted the siff help messages:

Options for siff :   ( Usage: siff [options] <Assembly File Name> )
==========================================================================
 -o <filename>  : Specify Output File. (Default: <filename - '.s'>.siff.s)
 -r             : Remove Comments.
 -V             : Print out the tool's name and version.
 -q             : Quiet Mode - No output messages
 -O             : List Options.
 -f<c>          : Set a specific filtering option On.
                  <c> should be set according to the 'Filter Options'
                  list below.
 -w<c>          : Set a specific warning option On.
                  <c> should be set according to the 'Warning Options'
                  list below.
 -G <Register>  : Specify a scratch general purpose register.
 -C <Register>  : Specify a scratch control register.
 -M <Mask Name> : Set filtering options On for a given mask set.
                  <Mask Name> should be set according to the 'Mask Settings'
                  list below.

Filter Options:
 -fd : NOP Insert - Single/Double Dest. Gen. Reg. of same even value.
 -fn : Keep stores out of the delay slots of bsr.n and jsr.n
 -fp : Repeat an ldcr rD, XPPU if not preceded by an ldcr rD, XPPL instruction.
 -f3 : Follow all ldcr instructions with a branch past 3 no-ops.
 -fs : Insert a nop between any combination of ldcr/stcr instructions.
 -fm : Insert a stcr r0, XSR before the stcr rS, XCMD part of a probe command.
 -fy : Pull all ldcr|stcr|xcr instr. out of delay slot of bsr|jmp|jsr & insert align before it.
 -ft : Insert an (or r0,r0,rD / mov x0,xD) before a store, where rD/xD is the store's dest reg.
 -fq : Insert a flush ICACHE instruction before an rte instruction.
 -fr : Insert a flush ICACHE instruction sequence before an rte instruction.
 -fi : Insert a flush load before all xmem instructions.
 -fg : Precede a load extended of rd with a load single of rd to the same address.
 -fl : Follow a ld.x ra,rb,rc with a ld.b r0,rb,rc (Touch Load).
 -ff : NOP Insert - Dest. Registers with = numbers but from diff. files
 -fe : Follow ld.b[.usr]/ld.h[.usr] instructions with a signed extract.
 -fx : Follow all xmem instructions with a trap not taken.
 -fa : Insert a trap not taken after a carry out that's followed by a carry in.
 -fc : Precede add.co/sub.co instructions with a trap not taken.
 -fb : Replace bcnd[.n] with cmp and bb0[.n].
 -fw : Insert nops after a st.wt to keep loads at least 2 instructions away.
 -fh : Insert nops after an alloc. load to keep stores >=2 instructions away.
 -fj : Insert an or r1,r1,r1 before all bsr[.n] instructions.
 -fz : Insert two or r0,r0,r0 before all bcnd[.n] instructions.
 -fk : Insure that the rD of the 2nd instr. preceeding a cond. branch != its rS1
 -fo : Insert a nop between a Arith. or logic instr. and store that both have rD=r0.

Warning Options:
 -wP : Issue warning if an ldcr rD, XPAR instruction is encountered.
 -wD : Issue warning if st.d w/ an odd general purpose dest. reg. is found.
 -wC : Issue warning if an add/sub[u].c[io] instruction is encountered.
 -wW : Issue warning if a store with the .wt option is encountered.
 -wX : Issue warning if st.x is encountered.
 -wF : Issue warning if an fmul with unequal source specifiers is encountered.

Mask Settings:

 -M 0d18w :
     Default Filters (-f?) : n, 3, x
     Non Default Filters (-f?) :
     Warnings (-w?) :

 -M 1d18w :
     Default Filters (-f?) : a, b, c, d, e, f, g, h, j, k, l, m, n, o, p, r, s, t, w, x, y, 3
     Non Default Filters (-f?) : i
     Warnings (-w?) : C, D, F, P, X

 -M 0e98b :
     Default Filters (-f?) : x, q, l, g, b, y, j, k, o
     Non Default Filters (-f?) :
     Warnings (-w?) : W, P

 -M 1e98b :
     Default Filters (-f?) : x, q, l, g, b, y, j, k, o
     Non Default Filters (-f?) :
     Warnings (-w?) : W, P

** Please refer to the man page for a more detailed description of siff **
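Judging from this help text, -M simply turns on a predefined group of -f filters. The table-driven sketch below shows that lookup. The default filter letters are transcribed from the "Mask Settings" list above, with non-default filters and warnings left out. The bit encoding and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Map a siff filter letter (a-z, 0-9) to a bit number, or -1. */
static int
filter_bit(char c)
{
	if (c >= 'a' && c <= 'z')
		return c - 'a';
	if (c >= '0' && c <= '9')
		return 26 + (c - '0');
	return -1;
}

/* Build a filter set from a string of option letters, e.g. "n3x". */
static uint64_t
filter_set(const char *letters)
{
	uint64_t set = 0;

	for (; *letters != '\0'; letters++) {
		int bit = filter_bit(*letters);

		if (bit >= 0)
			set |= UINT64_C(1) << bit;
	}
	return set;
}

/* Default filters per mask set, as listed under "Mask Settings". */
static const struct {
	const char *mask;
	const char *defaults;
} mask_settings[] = {
	{ "0d18w", "n3x" },
	{ "1d18w", "abcdefghjklmnoprstwxy3" },
	{ "0e98b", "xqlgbyjko" },
	{ "1e98b", "xqlgbyjko" },
};

/* Filters enabled by "-M <mask>"; an empty set for an unknown mask. */
static uint64_t
filters_for_mask(const char *mask)
{
	for (size_t i = 0; i < sizeof mask_settings / sizeof mask_settings[0]; i++)
		if (strcmp(mask_settings[i].mask, mask) == 0)
			return filter_set(mask_settings[i].defaults);
	return 0;
}

static bool
filter_on(uint64_t set, char letter)
{
	int bit = filter_bit(letter);

	return bit >= 0 && ((set >> bit) & 1) != 0;
}
```

In this table, -fj is a default filter for 0e98b, 1e98b and 1d18w, while -fz appears in no listed mask set. That may be why asfilter1 passes it explicitly.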

Despite what the last line said, and as mentioned in Murphree's email, there was no manual page for that binary to be found, at least on end-user systems. Maybe development environments internal to Motorola had a proper manual page installed.

Murphree then started implementing his own version of siff, based on the description of every option in the help output. He shared that code with me in late november.

Date: Thu, 27 Nov 2003 04:51:49 -0800
From: Steve Murphree
To: Miod Vallat
Subject: new siff

Miod,

I got out my green book and figured out some stuff.  Most things are
implemented now.  I hacked up a version of locore.S to look more like a .s
file and processed it.  All I can say is "wow".  This could lead to
something.  The more I looked at what some of the filters do, the more I
think we were in a losing situation before.

Steve
(The green book here refers to the 88110 user's manual.)

Manuals for various chips of the 88000 series.

I was quite worried about having to perform intrusive post-processing of the compiler output, and also about the impact, in code size and speed, it could have on the existing 88100 systems. Also, given how far the kernel would run on an 88110, I really thought that what prevented userland from working was a kernel bug rather than a processor erratum. The first script targeted an erratum of version 3 of the 88110. Yet all the errata information I had was for versions 4 and 5, and all my 88110 hardware was also version 4 or 5. This made me believe that these changes might be overkill, and that we had a chance to run without them.

(To this day, I still don't know whether some of these changes would have helped: 88110-based systems, at least the MVME197LE, run reliably, except that they sometimes suddenly freeze, without being able to enter the kernel debugger, and I don't know whether this is a software bug or a bad combination. Maybe I should experiment a bit with siff on a rainy day...)


While the quest towards stability kept me busy, I nevertheless started to work on other improvements to the system. In late december, I significantly improved the MVME376 Ethernet driver, adding support for all Motorola board configurations, and reusing as much of the machine-independent AMD Lance Ethernet code as possible.

In early 2004, since I still did not have a copy of the 88110 manual, and it had not been scanned by bitsavers yet, I asked Theo de Raadt multiple times for excerpts from it. He started taking pictures of pages of the book with a digital camera and sending them to me. That was better than nothing, but unfortunately did not allow me to make much progress on MVME197 support.

Date: Wed, 14 Jan 2004 23:22:53 +0000
From: Miod Vallat
To: Theo de Raadt
Subject: 197

It's better. Init does not die. It won't spawn process correctly either:

197-Bug>bo 6 0 -s
Booting from: VME328, Controller 6, Drive 0
Loading: -s

Volume: M88K

IPL loaded at: $009F0000
Boot: bug device: ctrl=6, dev=0

bootxx: first level bootstrap program [$Revision: 1.1 $]

\
>> OpenBSD/mvme88k bootsd [$Revision: 1.2 $]
2007040+139264+255920+[75840+91275]=0x27347f
Start @ 0x10020 ...
Controler Address @ ffff9000 ...
[ using 167115 bytes of bsd a.out symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2004 OpenBSD. All rights reserved.  http://www.OpenBSD.org

OpenBSD 3.4-current (GENERIC) #532: Wed Jan 14 23:00:16 GMT 2004
    miod@ramade.gentiane.org:/usr/src/sys/arch/mvme88k/compile/GENERIC
real mem  = 67104768
avail mem = 59047936 (14416 pages)
using 844 buffers containing 3457024 bytes of memory
mainbus0 (root): Motorola MVME197, 50MHz
cpu0: M88110 version 0x3
bussw0 at mainbus0 addr 0xfff00000: rev 1
pcctwo0 at bussw0 offset 0x42000: rev 0
clock0 at pcctwo0 ipl 5
nvram0 at pcctwo0 offset 0xc0000: MK48T08 len 8192
cl0 at pcctwo0 offset 0x45000 ipl 3: console
ssh0 at pcctwo0 offset 0x47000 ipl 2: version 2 target 7
scsibus0 at ssh0: 8 targets
vme0 at pcctwo0 offset 0x40000: vector base 0x80, system controller
vme0: using BUG parameters
vme0: 1phys 0x04000000-0xefff0000 to VME 0x04000000-0xefff0000
vme0: 2phys 0x00000000-0x00000000 to VME 0x00000000-0x00000000
vme0: 3phys 0x00000000-0x00000000 to VME 0x00000000-0x00000000
vme0: 4phys 0x00000000-0x00000000 to VME 0x00000000-0x00000000
vme0: vme to cpu irq level 1:1
vmes0 at vme0
vs0 at vmes0 addr 0xffff9000 vec 0x80 ipl 2: target 7
scsibus1 at vs0: 8 targets
sd0 at scsibus1 targ 0 lun 0: <COMPAQPC, DCAS-32160, S6CA> SCSI2 0/direct fixed
sd0: 2006MB, 8188 cyl, 3 head, 167 sec, 512 bytes/sec, 4110000 sec total
vmel0 at vme0
ie0 at pcctwo0 offset 0x46000 ipl 1: address 08:00:3e:22:db:21
boot device: sd0
root on sd0a
rootdev=0x400 rrootdev=0x800 rawdev=0x802
Enter pathname of shell or RETURN for sh:
Jan 14 23:18:55 Enter pathname of shell or RETURN for sh:
Jan 14 23:18:59 Enter pathname of shell or RETURN for sh:
Jan 14 23:19:01 Enter pathname of shell or RETURN for sh:
Jan 14 23:19:03 Enter pathname of shell or RETURN for sh:
Jan 14 23:19:06 Enter pathname of shell or RETURN for sh:
Jan 14 23:19:11 Enter pathname of shell or RETURN for sh:

Still unable to make progress on the MVME197, I turned my attention back to the compiler.

Date: Fri, 13 Feb 2004 09:43:02 +0000
From: Miod Vallat
To: Hiroaki Etoh
Subject: Help on a gcc problem

Hello,

  I am still fighting a code generation bug in gcc 2.95 on mvme88k,
which only happens at -O2.

  Unfortunately, I am quite stuck at the moment, so I figured I could
ask you for help, you always have good advice...


  Before I expose the problem, here is a quick reminder of the m88k
calling convention and register usage:

  r0            always zero
  r1            return address
                (i.e. you return from a routine with "jmp r1")
  r2-r9         function parameters (if there are more parameters,
                the remaining ones are passed on the stack)
  r10-r13       non-preserved registers
  r14-r25       preserved registers (callee must preserve them)
  r26-r29       reserved by the ABI (needed for PIC code, etc)
  r30           frame pointer
  r31           stack pointer


  The problem I see now, happens in a non-leaf function with a lot of
living local variables. In this case, the register allocator table
(REG_ALLOC_ORDER in config/m88k/m88k.h) will eventually suggest using
registers from the r2-r9 range, in my test case r8 and r9. This does not
cause a conflict, because this routine uses only r2-r4 as parameters.

  Unfortunately, gcc will not produce code to initialize these
registers!

  I was wondering if this was related to the fact that these registers
match FUNCTION_ARG_REGNO_P, contrary to the other registers; but then,
architectures such as arm always use unused parameter registers as
temporaries, and do not have such a problem.

  Compiling with -O1 does not trigger the problem, only because the
generated code uses less registers as temporaries, and especially not
the two uninitialized registers.

  You can find my current testcode on gentiane, in ~miod/test.c - I also
have gcc -S output with -O1 and -O2 as test.s1 and test.s2 respectively.
If you need to tinker with gcc on an mvme88k machine, you can login to
"arzon".

Thanks for any ideas you could have on this!

Miod
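The register roles listed in that mail explain why caller-saves matters here. The argument registers r2-r9 are not preserved across calls. So if the register allocator places a temporary in r8 or r9 and the value is still needed after a call, the caller has to save and restore it. A small classification sketch of that table follows. The enum and function names are hypothetical, and it assumes r1 is clobbered because bsr/jsr write the return address there, consistent with returning through "jmp r1".

```c
#include <stdbool.h>

/* Register roles, as summarized in the mail above. */
enum m88k_reg_role {
	ROLE_ZERO,	/* r0: always zero */
	ROLE_RETADDR,	/* r1: return address, written by bsr/jsr */
	ROLE_ARG,	/* r2-r9: function parameters */
	ROLE_SCRATCH,	/* r10-r13: not preserved across calls */
	ROLE_PRESERVED,	/* r14-r25: callee must preserve them */
	ROLE_RESERVED,	/* r26-r29: reserved by the ABI */
	ROLE_FP,	/* r30: frame pointer */
	ROLE_SP		/* r31: stack pointer */
};

/* Classify general register r (0 to 31). */
static enum m88k_reg_role
reg_role(int r)
{
	if (r == 0)
		return ROLE_ZERO;
	if (r == 1)
		return ROLE_RETADDR;
	if (r <= 9)
		return ROLE_ARG;
	if (r <= 13)
		return ROLE_SCRATCH;
	if (r <= 25)
		return ROLE_PRESERVED;
	if (r <= 29)
		return ROLE_RESERVED;
	return r == 30 ? ROLE_FP : ROLE_SP;
}

/*
 * A value living in a call-clobbered register must be saved by the
 * caller (which is what -fcaller-saves arranges) if it is still needed
 * after the call.
 */
static bool
m88k_call_clobbered(int r)
{
	enum m88k_reg_role role = reg_role(r);

	return role == ROLE_RETADDR || role == ROLE_ARG || role == ROLE_SCRATCH;
}
```

With this classification, r8 and r9 are call-clobbered, while r14 is not, so only the former need caller-side saving.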

(You might remember Hiroaki Etoh as the stack-protector author; he had also been an invaluable help with some gcc bugs.)

This was not enough for him to figure out the cause of the problem, but my experiments with optimization options eventually allowed me to narrow the scope a lot.

Date: Thu, 18 Mar 2004 17:19:01 +0000
From: Miod Vallat
To: Hiroaki Etoh
Subject: Re: Help on a gcc problem

Hello,

  remember my m88k gcc -O2 problem? It turns out it is caused by -O2
implying -fcaller-saves. Compiling with -O2 -fno-caller-saves produces
correct code.

  I'll have a closer look at the caller-saves logic soon...

Miod

This immediately rang a bell with him, which in turn allowed me to find the appropriate bugfix in the gcc 3.0 sources.

Date: Fri, 19 Mar 2004 10:17:13 +0000
From: Miod Vallat
To: Hiroaki Etoh
Subject: Re: Help on a gcc problem

> I remember the problem is caused by the bug of the register analysis on
> the architecture where a register is related to the other register. hmm, I
> can't explain well. let see register 8 and 9 on m88k.  DImode r8 contains
> SImode r8 and r9. So, r9 is damaged after the set of DImode 8.

Apparently, revision 1.32 of caller-save.c fixes this:

  * caller-save.c (mark_referenced_regs): Mark partially-overwritten
  multi-word registers.

this was committed between 2.95 and 3.0.

I will try to build a compiler with this change backported, and check
how it behaves.

In the meantime I'll force -fno-caller-saves on m88k so that we can
still benefit from other -O2 optimizations.

Miod

Date: Fri, 19 Mar 2004 22:48:02 +0000
From: Miod Vallat
To: "Luke Th. Bullock", Paul Weissmann, Kenji Aoyama
Subject: gcc/m88k -O2 fix you might want to play with...

Hello,

  With the help of Hiroaki Etoh, I eventually found the -O2 killer bug
on m88k. It turns out this is a genuine gcc 2.95 bug, which was fixed in
gcc 3.0, but is triggered more easily on m88k than other architectures.

  If you are tracking OpenBSD/mvme88k, you'll need to revert my last
gcc/config/m88k/m88k.h change, which forces -fno-caller-saves at all
optimization levels (-O2 and -Os imply -fcaller-saves). I committed it so
that the OpenBSD/mvme88k 3.5 release would ship with a compiler able to
produce correct -O2 code, though it will still use -O1 by default. If
you're not, your m88k.h is probably clean!

  Once the release is over, I'll revert this change, commit the
following diff, and enable -O2 on mvme88k by default.

  Here's the diff - a straight backport from gcc 3.0. Apply in
gnu/egcs/gcc/, and recompile gcc. Then you can safely use -O2 (until the
next subtle bug is found).

  If you're curious, I have also attached a simple test program,
gccregtest.c, derived from the libc's strtoll() code, which will dump
core on m88k when compiled at -O2 by an unfixed gcc. If you have tried
to compile your libc at -O2, recompiled sshd afterwards, and found you
could not login through ssh anymore, you know what I mean (-:

Have fun,
Miod

Index: caller-save.c
===================================================================
RCS file: /cvs/src/gnu/egcs/gcc/caller-save.c,v
retrieving revision 1.1.1.1
diff -u -p -r1.1.1.1 caller-save.c
--- caller-save.c       1999/05/26 13:34:01     1.1.1.1
+++ caller-save.c       2004/03/19 15:25:16
@@ -504,7 +504,14 @@ mark_referenced_regs (x)
       x = SET_DEST (x);
       code = GET_CODE (x);
       if (code == REG || code == PC || code == CC0
-         || (code == SUBREG && GET_CODE (SUBREG_REG (x)) == REG))
+         || (code == SUBREG && GET_CODE (SUBREG_REG (x)) == REG
+             /* If we're setting only part of a multi-word register,
+                we shall mark it as referenced, because the words
+                that are not being set should be restored.  */
+             && ((GET_MODE_SIZE (GET_MODE (x))
+                  >= GET_MODE_SIZE (GET_MODE (SUBREG_REG (x))))
+                 || (GET_MODE_SIZE (GET_MODE (SUBREG_REG (x)))
+                     <= UNITS_PER_WORD))))
       return;
     }
   if (code == MEM || code == SUBREG)
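Hiroaki Etoh's explanation and the patch both come down to register overlap. On m88k, a DImode value in r8 also occupies r9, so writing a narrower SUBREG of it leaves the other word untouched.

Before the fix, mark_referenced_regs() returned early for any SUBREG-of-REG destination, treating it as a full overwrite. The untouched word was therefore never marked as referenced. Roughly, that is how the caller-save code could fail to restore it.

The toy model below is plain C, not gcc's RTL API, and its function names are hypothetical. It restates the new condition. Only the UNITS_PER_WORD name mirrors gcc's target macro, assumed here to be 4 bytes for m88k.

```c
#include <stdbool.h>

#define UNITS_PER_WORD 4	/* m88k words are 32 bits */

/* Number of consecutive hard registers a value of `size` bytes occupies. */
static int
hard_regs_covered(int size)
{
	return (size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
}

/* Does a value of `size` bytes starting in register `regno` occupy `r`? */
static bool
value_covers_reg(int regno, int size, int r)
{
	return r >= regno && r < regno + hard_regs_covered(size);
}

/*
 * Negation of the patched early-return test: setting a SUBREG of
 * `subreg_size` bytes inside a register of `reg_size` bytes only writes
 * part of it when the SUBREG is narrower and the register spans more
 * than one word. In that case the register must be marked as referenced,
 * so that its other words get restored.
 */
static bool
partial_multiword_set(int subreg_size, int reg_size)
{
	return subreg_size < reg_size && reg_size > UNITS_PER_WORD;
}
```

For the case in Etoh's reply, a DImode value in r8 covers r9 too, and setting an SImode SUBREG of it is a partial set. Before the fix, gcc treated that set as if it wrote the whole register.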

For those interested, the test program was:

/*
 * This is a ``simple'' test program, derived from OpenBSD libc's strtoll()
 * function, which exposes a bug in gcc 2.95 -fcaller-saves feature on m88k.
 *
 * When compiled with -fcaller-saves, this program will dump core on m88k,
 * unless caller-save.c has the 1.32 revision fix, which was available in
 * gcc 3.0 onwards.
 */

#include <sys/types.h>

#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

u_quad_t __qdivrem(u_quad_t, u_quad_t, u_quad_t *);
quad_t my__divdi3(quad_t, quad_t);

/*
 * This is a slightly simplified version of strtoll(), as found in
 * /usr/src/lib/libc/stdlib/strtoll.c
 *
 * It will not always produce correct results anymore ! Its purpose
 * is _only_ to trigger the bug!
 *
 * Most of the code has been kept in order to:
 * - have a lot of _living_ local variables
 * - invoke __divdi3
 */

long long
mystrtoll(const char *nptr, char **endptr, int base)
{
        const char *s;
        long long acc;
        long long cutoff;
        int c;
        int neg, cutlim;

        s = nptr;
        do {
                c = (unsigned char) *s++;
        } while (isspace(c));
        if (c == '-') {
                neg = 1;
                c = *s++;
        } else {
                neg = 0;
                if (c == '+')
                        c = *s++;
        }

        cutoff = neg ? LLONG_MIN : LLONG_MAX;
        cutlim = cutoff % base;

        /*
         * The following statement will be invoked with incorrect value
         * in the second parameter when compiled with -O2
         */
        cutoff = my__divdi3(cutoff, (quad_t)base);

        for (acc = 0;; c = (unsigned char) *s++) {
                if (isdigit(c))
                        c -= '0';
                else
                        break;

                if (neg) {
                        if (acc < cutoff || (acc == cutoff && c > cutlim)) {
                                acc = LLONG_MIN;
                        } else {
                                acc *= base;
                                acc -= c;
                        }
                } else {
                        if (acc > cutoff || (acc == cutoff && c > cutlim)) {
                                acc = LLONG_MAX;
                        } else {
                                acc *= base;
                                acc += c;
                        }
                }
        }
        return (acc);
}

int
main(void)
{
        return mystrtoll("42", NULL, 10);
}

quad_t
my__divdi3(a, b)
        quad_t a, b;
{
        u_quad_t ua, ub, uq;

        ua = a;
        ub = b;
        /* printf("qdivrem(%lld, %lld)\n", ua, ub); */
        uq = __qdivrem(ua, ub, (u_quad_t *)NULL);
        return (uq);
}

(I guess it's sort of a good thing I keep archives of too many things, because, more than 20 years later, I have absolutely no recollection of that compiler bug at all.)

(Follow this link to go forward to the next part.)