QB ON ACID
ISSUE #4 December 21st, 1999
_________________________________________________________________
Infix -> ASM convertor plan, by logiclrd
For every sum-of-terms, reorder it so that the longest terms come
first, e.g.:
1 + 2*3 -> 2*3 + 1
1 + (4/2 - 6*8/3) - 3*4 -> (-6*8/3 + 4/2) - 3*4 + 1
Then, switch the order of dividends with their divisors (the reason
for this will become clear later), e.g.:
4/3 -> 3/4
(6+3)/(2*x) -> (2*x)/(6+3)
Look for terms in which the non-variable factors (ie, the
coefficients) can be simplified, e.g.:
2*x/4 -> x/2
3*6 -> 18
Now convert the string to an expression tree by recursively selecting
the last operation (by order of operations) of the current substring
to be the new subtree's head. e.g.:
4*2+6 -> + -> +
/ \ / \
4*2 6 * 6
/ \
4 2
Swap the left and right children of any node whose left child is a
numeric constant and whose right child is not, e.g.:
+ -> +
/ \ / \
6 * * 6
/ \ / \
3 x x 3
Swap the left and right children of any node whose left child is a
greater power of two than its right child, e.g.:
+ -> +
/ \ / \
8 * * 8
/ \ / \
4 x x 4
Swap the left and right children of any division node whose _LEFT_
child is a power of two (since we changed 4/3 to 3/4), and take the
logarithm base 2 of the new right side, changing the node into a
"shift right" operation:
( x/8 -> 8/x -> ) '/' -> shift right
/ \ / \
8 x x 8
Now, parse the tree in postfix order to generate an RPN expression,
e.g.:
+ -> (x, 4, *, 8, +)
/ \
* 8
/ \
x 4
We're halfway there!
Replace any constant followed by a + or - operator with a special code
"add ", e.g.:
(x, 4, *, 8, +) -> (x, 4, *, add 8)
Replace any constant followed by a shift or bitwise operator into a
single list entry in a similar manner, e.g.:
(x, 3, &, 2, <<) -> (x, and 3, shift left 2)
Replace any power of two followed by a * operator with a special code
"shift left" e.g.:
(x, 4, *) -> (x, shift left 2)
DO
Replace any "shift left" operation followed by a "shift right"
operation with the corresponding single shift, e.g.:
(x, shift left 3, shift right 1, shift left 4) -> (x, shift left 2, shift left
4)
Replace any "shift left/right" operation followed by another similar
"shift left/right" operation with the corresponding single shift,
e.g.:
(x, shift left 2, shift left 3, shift right 1, shift right 2) -> (x, shift lef
t 5, shift right 3)
LOOP WHILE (changes made)
Here's the part that'll most likely require the most code: parsing the
RPN string into ASM, keeping track of what registers are which stack
entries. Treat the registers as though they were the RPN stack, using
MOV EAX,[x] instead of PUSH [x]. This way, all PUSHes and POPs are
virtual (unless you run out of registers, in which case you are forced
to PUSH one of your registers anyway -- this is unlikely, though,
unless you're dealing with a maniac of a coder!), and must be kept
track of within your conversion routine. If you have an item which is
implicated in a subtraction or addition and there is a multiplication
or a division in between the term and it's operator (i.e., the '*'
between '2' and '+' in (4, 2, 3, *, +)), then avoid using EAX or EDX
for that term. If there is an item which has only shifts, special
additions and special subtractions (not '+' or '-', just 'add 3'-type
items) between it and the next multiplication or division, place that
term into EAX. Keep in mind that modulus values can only be placed
into EDX, while the actual division to be performed implicates EAX.
These 'in-between' checks can be performed by going through the RPN
list, keeping track of the stack height at every point. Also, watch
out for changes in the order of stack variables. These are pretty much
guaranteed to be the same for every operation of a given type. I've
documented them below, to give examples. For divisions, divide EAX by
the second-to-topmost item on the stack (this is why we changed 4/3 to
3/4: 3/4 becomes 3,4,/, 4 goes into EAX, and we can do DIV EBX). E.g.:
(x, y, z, add 4, *, +)
would become:
MOV EBX, [x] ; because there is a '*' between 'x' and its '+'
MOV ECX, [y] ; 'z' is between 'y' and '*', and ECX is next available register
MOV EAX, [z] ; only 'add 4' between 'z' and '*'
ADD EAX, 4 ; here's the special 'add 4' routine
MUL ECX ; before operation, stack is: EBX->ECX->EAX, after: EBX->EAX
ADD EAX, EBX ; before operation, stack is: EBX->EAX, after: EAX
And finally, if the resulting value is not in EAX, move it there
(unless you have intelligent condition parsing code). If your last
operation is a commutative operation which uses EAX as an operand but
not as the target, switch the operands around, e.g.:
ADD EBX,EAX
MOV EAX,EBX
would become:
ADD EAX,EBX
thereby avoiding copying the value back into EAX.
You're done! (but don't forget to keep track of which registers were
destroyed, and don't forget to PUSH all the registers with values you
need when doing function calls)
By this method, this expression:
(x / 8) & 63 + 64 * ((y / 8) & 63
would become:
MOV EAX,[y]
SHR EAX,3
AND EAX,63
SHL EAX,6
MOV EBX,[x]
SHR EBX,3
AND EBX,63
ADD EAX,EBX
which, I'm sure you'll agree, is pretty damn good output.
By logiclrd