QB CULT MAGAZINE
Issue #4 - June 2000

CODING A COMPILER SERIES #1

Writer: Gabriel Fernandez

(Check out COMPILER.BAS which is a part of this tutorial -ed)

Hi, and welcome to the first part of my compiler tutorial. I wrote this tutorial to help 'QB Cult Magazine', which I think is the best QBasic magazine.

You probably know Gabasic, a BASIC compiler I made in QBasic; you can download it from http://gab_soft.tripod.com. I hope this tutorial leaves you ready to create a compiler like Gabasic (or maybe a better one) in QB.

I'm going to teach you how to make a compiler that generates real-mode code, because real mode is the mode most people already know.

You need to know assembly language to follow this tutorial.

Well, let's start the tutorial.

Chapter 1. First Step

First, make:
- A string using DIM SHARED called Text$
- A variable using DIM SHARED called linen&

Here are the most important subs and functions (if you don't understand something, don't worry, because all the subs will be explained later):

- A sub to show program errors
'SUB ShowError (ErrorNumber)'
- A lot of subs for Keywords
'SUB Keyword.NAME (Parameters)'
Example: SUB Keyword.CLS ()
- Functions to return words, and full sentences
'FUNCTION GetWord$ ()'
'FUNCTION GetFullWord$ ()'
- The math Parser(more info later)
'SUB Parser.Math (Mode)'
- The string Parser
'SUB Parser.String ()' *
- A function that checks whether a text must be parsed with the math parser or the string parser. This is needed for IF, LOOP
'FUNCTION Gettexttype ()' *
More functions and subs will be added later.
* = These functions are not going to be explained in this part of the tutorial

1.1 The Main Program:

The main program should look like this:

DIM SHARED text$
DIM SHARED linen&

OPEN Inputfile$ FOR INPUT AS #1
OPEN Outasm$ FOR OUTPUT AS #2

DO WHILE NOT EOF(1)
  Line input #1, text$
  linen& = linen& + 1

  ' Here we will check keywords, subs, etc.
LOOP

CLOSE

The main program is very simple: it reads each line of Inputfile$ and adds one to linen& (which holds the number of the line just read) until the end of the file is reached. Of course, we are going to add a lot of stuff to this program.

1.2 The math Parser:

Now, I will explain what a 'Parser' is.

Parser: A parser analyzes text and generates code (assembler here, though it doesn't have to be assembler) for math or string operations. The parser is about 50% of the compiler.

A math parser reads a text that contains only math operations, like "1 + 2 / 3 + 4", and generates code for it.

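The math parser that we will create will parse a text and leave the result in AX. So, if I use the math parser with the following text:

"5 + 4 * 2 / 6"

it will leave the result ('3') in AX. The simple parser built in this part works strictly from left to right and ignores operator precedence: 5 + 4 = 9, 9 * 2 = 18, 18 / 6 = 3.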
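
A quick way to check the '3' is to model the evaluation outside the compiler. The Python sketch below assumes what the Chapter 3 parser does: integer math, words separated by spaces, strictly left to right, no operator precedence. The function name eval_left_to_right is made up for this illustration; it is not part of COMPILER.BAS.

```python
def eval_left_to_right(expr):
    """Evaluate space-separated integer math strictly left to right."""
    tokens = expr.split()
    result = int(tokens[0])          # first number, like MOV AX, n
    # Walk the remaining tokens as (operator, number) pairs.
    for op, number in zip(tokens[1::2], tokens[2::2]):
        value = int(number)
        if op == "+":
            result += value
        elif op == "-":
            result -= value
        elif op == "*":
            result *= value
        elif op == "/":
            result //= value         # integer division, like DIV
        else:
            raise ValueError("unknown operator: " + op)
    return result


print(eval_left_to_right("5 + 4 * 2 / 6"))   # 9, then 18, then 3
```

With normal operator precedence the same text would mean 5 + (4 * 2) / 6, which does not give 3.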

Our parser will support two modes of work: 16-bit mode (integer mode) and 32-bit mode (long mode). When the parser works in integer mode, the asm code will use 16-bit registers (AX, BX, ...); when it works in long mode, it will use 32-bit registers (EAX, EBX, ...).
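
The practical difference between the two modes is the range of the result. The Python sketch below models a register of a given width by keeping only its low bits. The helper name wrap_to_register is an assumption made for this illustration, and the values are treated as unsigned.

```python
def wrap_to_register(value, bits):
    """Keep only the low `bits` bits, as a CPU register of that width would."""
    return value & ((1 << bits) - 1)


total = 40000 + 40000
print(wrap_to_register(total, 16))   # 14464: the sum does not fit in AX
print(wrap_to_register(total, 32))   # 80000: the sum fits in EAX
```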

The parser is very useful for parsing keyword parameters: it leaves the result of the math operations in the CPU registers (AX, BX, CX, ...), ready for a call to an assembly routine. So if my text is "LOCATE 12 + 1, 2 - 1", I call the parser with the first parameter ("12 + 1") and it leaves the result in AX. Then I PUSH this value on the stack and call the parser with the second parameter ("2 - 1"), which again leaves its result in AX. Now I POP the pushed value into BX and call the "LOCATE" asm routine. Here's the code that will be generated for "LOCATE 12 + 1, 2 - 1":

Mov ax, 12
Add ax, 1
PUSH AX
Mov ax, 2
Sub ax, 1
POP BX
CALL Locate
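
To see what the Locate routine finds in the registers, the generated code can be stepped through by hand or with a tiny model. The Python sketch below is only such a model: run_asm is a made-up helper that understands MOV, ADD and SUB with a number as second operand, plus PUSH, POP and CALL, and it ignores 16-bit overflow.

```python
def run_asm(lines):
    """Simulate MOV/ADD/SUB with immediate operands, plus PUSH and POP."""
    regs = {"AX": 0, "BX": 0}
    stack = []
    for line in lines:
        mnemonic, _, rest = line.strip().partition(" ")
        mnemonic = mnemonic.upper()
        operands = [part.strip().upper() for part in rest.split(",") if part.strip()]
        if mnemonic == "MOV":
            regs[operands[0]] = int(operands[1])
        elif mnemonic == "ADD":
            regs[operands[0]] += int(operands[1])
        elif mnemonic == "SUB":
            regs[operands[0]] -= int(operands[1])
        elif mnemonic == "PUSH":
            stack.append(regs[operands[0]])
        elif mnemonic == "POP":
            regs[operands[0]] = stack.pop()
        elif mnemonic == "CALL":
            break                    # stop where the Locate routine would start
    return regs


generated = [
    "Mov ax, 12", "Add ax, 1", "PUSH AX",
    "Mov ax, 2", "Sub ax, 1", "POP BX",
    "CALL Locate",
]
print(run_asm(generated))   # {'AX': 1, 'BX': 13}
```

So when Locate is called, BX holds the first parameter (13) and AX holds the second (1).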

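Here's a first sketch of the Keyword.LOCATE sub (the working version is built in Chapter 3):

SUB Keyword.LOCATE ()
  a$ = getfullword$                 ' first parameter, e.g. "12 + 1"
  b$ = getfullword$                 ' second parameter, e.g. "2 - 1"
  IF a$ = "eof" OR b$ = "eof" THEN ShowError 1
  ' Sketch only: here Parser.Math is handed the text to parse
  Parser.Math a$                    ' leaves the first result in AX
  PRINT #2, "PUSH AX"
  Parser.Math b$                    ' leaves the second result in AX
  PRINT #2, "POP BX"                ' first result goes into BX
  PRINT #2, "CALL Locate"
END SUB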

I'm going to explain the string parser later, when we finish the math parser (this will take a long time).

1.3 ShowError SUB

The ShowError sub is called whenever an error is found in the program being compiled. Having one place for this makes programming the compiler a lot easier.

' ShowError sub
SUB ShowError (Errornumber)

  Print "An error was found in the line: ", linen&

  SELECT CASE Errornumber
    CASE 1: PRINT "Argument-count mismatch"
    CASE 2: PRINT "Unknown command"
    CASE 3: PRINT "Operator expected"   ' error 3 is raised by Parser.Math in Chapter 3
  END SELECT
  END
END SUB

1.4 Functions to get words and sentences

1.4.1. GetWord$

Getword$ is a function that returns the next word of Text$ each time it is called.

- Example of Getword$:

Text$ = "PRINT A$ + B$ + 'Hello'"

Each call to Getword$ will return the next word. Here is what Getword$ will return on each call:

1st call: Getword$ will return "PRINT"
2nd call: Getword$ will return "A$"
3rd call: Getword$ will return "+"
4th call: Getword$ will return "B$"
5th call: Getword$ will return "+"
6th call: Getword$ will return "'Hello'"
7th call: Getword$ will return 'eof'

1.4.2. GetFullWord$

GetFullWord$ returns a full sentence. A sentence is all the text up to the next comma (",").

So, if text$ = "LOCATE 13 + 2, 12 - col%", it will return on each call:

1st call: "LOCATE 13 + 2"
2nd call: "12 - col%"
3rd call: the text 'eof'

All these functions must return the text "eof" when the end of the line is reached. They must also return "eof" when they find a ' (the comment character), because that is where a comment starts.
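
The real Getword$ and GetFullWord$ are in COMPILER.BAS. As a rough model of the behaviour described above, here is a Python sketch. The class name LineReader and its pos field are made up for this illustration, and it assumes that words are separated by spaces and that strings in the source line use double quotes.

```python
class LineReader:
    """Model of Getword$ / GetFullWord$ reading from one source line."""

    def __init__(self, text):
        self.text = text
        self.pos = 0                 # read position inside the line

    def _at_end(self):
        # Skip blanks, then report end of line or start of a comment.
        while self.pos < len(self.text) and self.text[self.pos] == " ":
            self.pos += 1
        return self.pos >= len(self.text) or self.text[self.pos] == "'"

    def get_word(self):
        if self._at_end():
            return "eof"
        start = self.pos
        if self.text[start] == '"':
            # A quoted string counts as one word, spaces included.
            end = self.text.find('"', start + 1)
            self.pos = len(self.text) if end == -1 else end + 1
        else:
            while self.pos < len(self.text) and self.text[self.pos] not in " '":
                self.pos += 1
        return self.text[start:self.pos]

    def get_full_word(self):
        if self._at_end():
            return "eof"
        start = self.pos
        in_string = False
        while self.pos < len(self.text):
            ch = self.text[self.pos]
            if ch == '"':
                in_string = not in_string
            elif not in_string and ch in ",'":
                break                # a comma or a comment ends the sentence
            self.pos += 1
        sentence = self.text[start:self.pos].strip()
        if self.pos < len(self.text) and self.text[self.pos] == ",":
            self.pos += 1            # step over the comma
        return sentence


reader = LineReader("LOCATE 13 + 2, 12 - col%")
print(reader.get_full_word())   # LOCATE 13 + 2
print(reader.get_full_word())   # 12 - col%
print(reader.get_full_word())   # eof
```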

Chapter 2. Creating a very simple compiler

Well, now we are going to add the following commands to our compiler:

- CLS (Clears the screen)
- WAITKEY (Waits until a key is pressed)
- END (Finishes the program)

You must have the Getword$ and Getfullword$ functions finished; check 'compiler.bas' for these functions.

To add keywords, we will add code like this to our main program:

keyword$ = getword$

SELECT CASE UCASE$(Keyword$)
   CASE "ANYKEYWORD": Keyword.ANYKEYWORD
END SELECT

Here is the main program with the above code added:

DO WHILE NOT EOF(1)
  Line input #1, text$
  linen& = linen& + 1

  keyword$ = getword$

  SELECT CASE UCASE$(Keyword$)  ' Keywords list
    CASE "CLS": Keyword.CLS
    CASE "END": Keyword.END
    CASE "WAITKEY": Keyword.WAITKEY
  END SELECT

LOOP

Let's build the subs for CLS, END, WAITKEY:

SUB Keyword.CLS ()
  PRINT #2, "CALL CLS"
END SUB

SUB Keyword.END ()
  Print #2, "MOV AX, 4C00h"
  Print #2, "INT 21h"
END SUB

SUB Keyword.WAITKEY ()
  Print #2, "XOR AX, AX"
  Print #2, "INT 16h"
END SUB

Great! Now our compiler can do CLS, END and WAITKEY.
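
To check the idea without QB, the whole mini compiler so far can be modelled in a few lines. The Python sketch below is only a model: the names compile_lines and KEYWORDS are made up, the output is a list instead of the #2 file, and raising ValueError stands in for a call to ShowError 2.

```python
def keyword_cls():
    return ["CALL CLS"]


def keyword_end():
    return ["MOV AX, 4C00h", "INT 21h"]


def keyword_waitkey():
    return ["XOR AX, AX", "INT 16h"]


# Keywords list: plays the role of the SELECT CASE block.
KEYWORDS = {"CLS": keyword_cls, "END": keyword_end, "WAITKEY": keyword_waitkey}


def compile_lines(source_lines):
    """Translate each source line into the asm lines its keyword emits."""
    asm = []
    for line in source_lines:
        words = line.split()
        if not words:
            continue                          # empty line: nothing to emit
        handler = KEYWORDS.get(words[0].upper())
        if handler is None:
            raise ValueError("Unknown command: " + words[0])
        asm.extend(handler())
    return asm


for asm_line in compile_lines(["cls", "WAITKEY", "END"]):
    print(asm_line)
```

The output is CALL CLS, then the two WAITKEY lines, then the two END lines, in the same order as the source.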

Of course, you have to add the CLS asm routine yourself. A CLS routine will look like this:

CLS:          ; - CLS routine
  PUSH ES
  mov ax, 0B800h   ; segment of the colour text screen
  mov es, ax
  mov di, 0
  mov cx, 2000     ; 80 x 25 character cells
  mov ax, 0720h    ; space, light grey on black (attribute 0 would hide later text)
  CLD              ; make STOSW move forward
  REP STOSW
  POP ES
RET

Let's create a SUB called Addasmroutines. This sub will add the asm routines at the end of our Outasm$ file.

SUB Addasmroutines ()
  ' Cls routine
  PRINT #2, "CLS:"
  PRINT #2, "  PUSH ES"
  PRINT #2, "    mov ax, 0B800h"
  PRINT #2, "    mov es, ax"
  PRINT #2, "    mov di, 0"
  PRINT #2, "    mov cx, 2000"
  PRINT #2, "    mov ax, 0720h"
  PRINT #2, "    CLD"
  PRINT #2, "    REP STOSW"
  PRINT #2, "  POP ES"
  PRINT #2, "RET"
END SUB

We will call Addasmroutines when our main program ends: add 'Addasmroutines' just before the CLOSE that closes all the files in our main program.

It is very easy to add keywords that don't use parameters.

CHAPTER 3. Building a math parser

We are going to make a very simple math parser right now. It will support +, *, /, - and numbers; we aren't going to add variable support in this part of the tutorial.

Our parser will work in the following way:

* Load the first number in the text to AX, and set
CurrentOp(eration) to 1

[-- Loop

  _ Use a$ = Getword$
  _ If a$ returns 'eof', exit the sub

  * Compare CurrentOp with 1, if CurrentOp = 1 then
    -  Set CurrentOp to 2
    -  Get the operation type(+,-,/,*)
  * If CurrentOp is not 1
    -  Get number from a$
    -  Do the math operation
    -  Set CurrentOp to 1
  * END the IF block

Loop --]

Here is SUB Parser.Math. For now, this parser supports only integer mode.

' Operation codes used by Parser.Math; they must be defined at module level
CONST Add = 1, Subs = 2, Mul = 3, Div = 4

DEFINT A-Z
SUB Parser.Math ()
  a$ = getword$                       ' the first number goes straight into AX
  PRINT #2, "MOV AX, " + a$
  CurrentOp = 1                       ' 1 = expect an operator, 2 = expect a number
  DO
    a$ = getword$

    IF a$ = "eof" THEN EXIT SUB

    IF CurrentOp = 1 THEN
      CurrentOp = 2
      SELECT CASE a$
        CASE "+": MathOp = Add
        CASE "-": MathOp = Subs
        CASE "/": MathOp = Div
        CASE "*": MathOp = Mul
        CASE ELSE: ShowError 3        ' not an operator
      END SELECT
    ELSE
      SELECT CASE MathOp
        CASE Add:  PRINT #2, "ADD AX, " + a$
        CASE Subs: PRINT #2, "SUB AX, " + a$
        CASE Mul:  PRINT #2, "MOV DX, 0"
                   PRINT #2, "MOV BX, " + a$
                   PRINT #2, "MUL BX"
        CASE Div:  PRINT #2, "MOV DX, 0"
                   PRINT #2, "MOV BX, " + a$
                   PRINT #2, "DIV BX"
      END SELECT
      CurrentOp = 1
    END IF
  LOOP

END SUB
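
If you want to see what this parser emits without running the compiler, the same two-state loop can be modelled in Python. This is only a sketch under some assumptions: gen_math is a made-up name, the text is not empty, words are separated by spaces, and the asm lines are returned as a list instead of being printed to file #2.

```python
def gen_math(text):
    """Return the asm lines the two-state parser would emit for `text`."""
    words = text.split()
    asm = ["MOV AX, " + words[0]]          # first number goes into AX
    expect_operator = True                 # plays the role of CurrentOp = 1
    pending = None                         # plays the role of MathOp
    for word in words[1:]:
        if expect_operator:
            if word not in ("+", "-", "*", "/"):
                raise ValueError("operator expected, got " + word)
            pending = word
        elif pending == "+":
            asm.append("ADD AX, " + word)
        elif pending == "-":
            asm.append("SUB AX, " + word)
        else:
            # MUL and DIV cannot take a number directly, so it goes via BX.
            asm += ["MOV DX, 0", "MOV BX, " + word,
                    "MUL BX" if pending == "*" else "DIV BX"]
        expect_operator = not expect_operator
    return asm


for asm_line in gen_math("5 + 4 * 2 / 6"):
    print(asm_line)
```

For "5 + 4 * 2 / 6" this gives MOV AX, 5 and ADD AX, 4, then a MOV DX / MOV BX / MUL BX group for the 2 and a MOV DX / MOV BX / DIV BX group for the 6.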

That was a very simple math parser, but it does the work. Now, you can create keywords that use parameters.

Now, let's create the LOCATE keyword.

  1. Add 'CASE "LOCATE": Keyword.LOCATE' to the keywords list.
  2. Create a sub for locate:
    SUB Keyword.LOCATE ()
      parameter1$ = getfullword$
      parameter2$ = getfullword$
      Text$ = parameter1$
      Getcharpos = 0
      Parser.Math
      PRINT #2, "MOV [Textx], ax"
      Text$ = parameter2$
      Getcharpos = 0
      Parser.Math
      PRINT #2, "MOV [Texty], ax"
    END SUB
    

Don't worry about the Getcharpos variable; look at compiler.bas to understand it.

Here ends the first part of my tutorial. I hope you liked it. In the next part, we are going to create a great math parser, and maybe we are going to add variable support.