the [...] MULU/MULS instructions are very slow, taking about 70 cycles. The exact number of cycles is data-dependent because the microcode uses an iterative algorithm.)

So, (assuming that pndc is correct and the 68000 is microcoded). What "iterative algorithm" is this? I can imagine something naive like

```
(let (mulu_impl (lambda (ct x y)
(if (eq? ct 0)
y
(mulu_impl (decrement ct) x (add x y))))))
(mulu (lambda (x y) (mulu_impl x x y))
(* untested; could be full of brainfarts *)
```

which basically counts x downward, adding the old x to y each time. But this would be naive and slow for large values of x. They probably had a better trick up their sleeves.

the comp.sys.m68k faq says about the 68060:

The chip is all hardwired - there is no microcode in it.

I find this quote rather hard to believe, since I think that if you implement something so complex without microcode you end up decreasing the clock speed to allow for signal propagation delays; I could be wrong though. But if it is the case, then my guess is that `MULU`

/`MULS`

now need to be implemented differently from what I did in Lisp up there.

**How was the multiplier(s) implemented in the 68000 and the 68060?**

Dr Sheldon 09/20/2018.

Although none of my databooks directly state how multiplication is done, we can infer several things from what they do say.

My copy of *M68000 8-/16-/32-bit Microprocessor User's Manual* shows the execution times of the MC68000 multiply instructions on page 8-4:

```
MULS 70(1/0)+*
MULU 70(1/0)+*
```

where 70 is the number of processor cycles, 1 means one read cycle, 0 means no write cycles, `+`

means additional cycles to fetch the effective address should be added, and `*`

means that the number 70 is the *maximum* number of cycles. Below this is exactly the text that @DroidW quotes in his answer.

On page 10-5 of the same databook, the execution times for the MC68010 are listed:

```
MULS 42(1/0)+*
MULU 40(1/0)*
```

Note how fewer clock cycles are needed for this processor. I'm not sure why this `MULU`

instruction is missing the `+`

for effective address calculation (perhaps a typo). Also, *this* table is missing the text that @DroidW quotes in his answer.

The MC68000 indeed uses microcode (and in some cases nanocode). The processor simply doesn't have enough combinational logic to perform a complete multiplication calculation in one cycle. Instead, it repeats some basic operations, with a loop controlled by the microcode. There are no barrel shifters, just single-bit shifts.

Let's call the multiplicand *D*, the multiplier *R*, and the product *P*.

To multiply **unsigned** numbers (MULU), we perform a long multiplication algorithm:

(1) Clear the product *P* = 0.

(2) Shift *P* one bit left.

(3) Shift *D* one bit left, and check the bit shifted out (the MSB). If it is zero, skip to step 5.

(4) Add *R* to the product *P*.

(5) Repeat steps 2 to 4 for each bit of *D*.

Because the branch in step 3 jumps ahead to step 5, it makes sense that the number of cycles will be based on 2x(number of ones in the multiplicand). This is confirmed in @DroidW's quote.

In later chips such as the 68010, we can add more logic to accomplish steps 2 to 4 in parallel. Also, instead of the branch in step 3, we can mask *R* with the bit from step 3, and then always add that result to *P*. These measures reduce the number of machine cycles.

In later processors, you can unroll the loop and use a large number of adders. It uses a lot more hardware and is harder to design, but is much faster. For example, to multiply binary numbers `1101`

and `0101`

:

```
1101 value of R
1101 value of R
0000 R is masked
0000 R is masked
=======
0100111
```

To multiply **signed** numbers (MULS), we use Booth's algorithm.

(1) Set *P* to *D*. An additional bit is added to the right (a new LSB), which is initialized to zero.

(2) Examine the two least significant bits of *P*. If 01, continue to step 3. If 10, skip to step 4. Otherwise, skip to step 5.

(3) Add *R* to the upper word of *P*. Jump to step 5.

(4) Subtract *R* from the upper word of *P*.

(5) Arithmetically shift *P* one bit right.

(6) Repeat steps 2 to 5 for all bits of *D*.

As you can see, there is more work to be done when bit patterns 01 and 10 are seen. This is confirmed by @DroidW's quote.

In advanced processors, you can also unroll this algorithm using hardware.

DroidW 09/19/2018.

Not an answer, but my contribution (found here, credit to witbrock@cs.cmu.edu):

```
DIVS,DIVU - The divide algorithm used by the MC68000 provides less
than 10% difference between the best and the worst case
timings.
MULS,MULU - The multiply algorithm requires 38+2n clocks where
n is defined as:
MULU: n = the number of ones in the <ea>
MULS: n = concatanate the <ea> with a zero as the LSB;
n is the resultant number of 10 or 01 patterns
in the 17-bit source; i.e., worst case happens
when the source is $5555
```

Kind regards.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

- What do the “byte-select signals” in the 68000 do?
- How does the 68060 branch predictor work?
- Why does the Motorola 68000 series require a kernel for Linux installation
- Why were 3D games on the Amiga not faster than on similar 16 bit systems like the Atari ST

- How do I regain my momentum in research after a long period of slacking off?
- What keeps most authors writing after receiving multiple rejections?
- Is it safe to use a laptop if the CPU fan is failing?
- What would be a logical reason to explain space based families having more children than an earth based one
- Did Statistics.com publish the wrong answer?
- Is it correct to say "I was mistaking it with something"?
- Was the use of "puddle jumper" in the Stargate SG-1 episode "Moebius" a continuity error?
- Help with this riddle please
- Force evaluation of user-defined function
- Why doesn't "Ralph Breaks the Internet" have "wreck" in the title?
- Is the Clinton Foundation currently under investigation?
- Parse the Bookworm dictionary format
- Why do I always catch Bus B instead of Bus A?
- How should I ask a potential advisor why they haven't published in the last 3 years?
- linux + files & folders cleanup under /tmp
- How is the change of moment handled in a Weight and Balance calculation with a swept wing aircraft?
- Is it game-breaking if creatures are allowed to be summoned inside other creatures?
- Why does Spain government want to increase minimum wage by such a large percentage at once?
- Why do we call it "combination lock"?
- Traffic light fails to give a green to one's movement even after several cycles -- what can a driver do?
- How best to maneuver inside a large room within a space station using only arm and leg motion?
- How can I convince my manager that I work better with a vertical monitor?
- Do we "open bottles" or we "open caps"?
- Sending transactions to mining nodes only?