Kinem's 3D Graphics: Part 1

by kinem

In these lessons, I will explain the techniques I use to create 3d graphics engines in Basic. If you just want to start making games and applications that use 3-d graphics, this is not aimed at you. If you want to delve into the fundamentals of making your own 3-d engine from scratch, here is where I'll explain how I did it. There are other discussions of 3-d techniques in FreeBasic; think of this one not as a general tutorial, but as documentation of one particular approach.

The basics of 3d graphics will be covered, but in later installments the focus will shift towards practical programming issues and advanced buffering techniques for fairly fast, easy to use rendering of 3d objects, building towards a first person shooter / robot tactics game that I hope to create.

I'll use FreeBasic for the examples, because it's fairly easy to write fairly readable code in it and it gives me a chance to explore object-oriented features such as operator overloading, but you can use similar techniques in other languages, such as QB64 or QB 4.5.

I have written a 3d engine for FreeBasic / QB 4.5 / QB 64 but re-writing it from scratch for FreeBasic this way allows me both to make it more readable and to re-examine and streamline my algorithms.

Certain concepts from linear algebra will be introduced, such as vectors and matrices and various ways to manipulate them.

Vectors

In physics they are used to describe things such as position and velocity, which have direction. A vector indicates both 'how much' of the quantity there is (its magnitude) and its direction.

In 3d a vector is defined by 3 numbers: its components in the x,y,z directions.

It will be convenient to define a variable type called Vector to store coordinates of points in space.

Type Vector
	As double x, y, z
End Type

The components can be accessed using this notation:

dim p as vector: p.x = 10, p.y = 0, p.z = 0

The vector p now points in the x-direction, and has a magnitude of 10. If more than one component is nonzero, we'd use the Pythagorean theorem to find the magnitude.

But what direction is 'x'? The coordinate system I'll use is fairly standard:

the x direction will point towards the right side of the screen

the y direction will point towards the top of the screen

the z direction will point towards the person watching the screen

-	Projecting 3-d points to a 2-d screen:
                     wall     
                |------------(x,-z)         
                |           /
                |          /
                |         /
                |        /
                |screen / 
                |column/
                |-----o 
                |    /| 
                |   / | 
                |  /  | eye distance to screen = constant
                | /   |
                |/    |
----------------o-----------------
             (your eye) 
 
(screen column position - center column) / (eye distance to screen) = -x/z = tangent(Angle)

In the coordinate system to be used here, +x is to the right, -z is forward, +y is up, but note that the 'screen coordinate' in the y direction increases as you go _down_ the screen.

On screen 18, the horizontal coordinate ranges left to right from 0 to 639, and the vertical coordinate ranges from 0 (top) to 479 (bottom). I'll use that screen mode as screens with even more pixels tend to be too slow.

Given a point 'pt', we can find its coordinates on the screen based on the above formula:

#define scrx(pt) (centerx - scrscale * pt.x/pt.z)

Note: The - sign in the above is from -z being in front of you.

#define scry(pt) (centery + scrscale * pt.y/pt.z)

Note: A second - sign from screen y being down cancels the one from -z.

The centerx (=319.5) and centery (=239.5) values are where the middle of the screen is.

The scrscale is a scale factor that determines how things will look given the distance of your eye to the screen compared to the distance in pixels across the screen and the physical size of the screen. For a 640 x 480 pixel screen a scale factor of 500 looks ok, simulating a total viewing angle across the screen of about 65 degrees.

Note: It is important to distinguish between screen coordinates and the coordinates of a point in 3-d space. Often the letters x and y are used in both contexts, creating potential confusion. A good rule of thumb to remember is that if something is being drawn to the screen or to an image-related buffer, screen coordinates must be involved. I'll often use variables that start with s, such as sx, for screen coordinates; exceptions may occur because that's not always my habit.

Before we can show points on the screen, there's one more thing to take into account: the position and direction of the 'camera'. Instead of using the main coordinate system, we have to use a relative coordinate system, which moves along with the viewer.

To do this, subtract the viewpoint position from the point coordinates, and then rotate the coordinates to the correct angle using sine and cosine. This is not a 3-d specific topic so I will assume that you know how to do this:

	for i=1 to maxpt
	rel3d(i).x = (point3d(i).x - eye.x) * costh + (point3d(i).z - eye.z) * sinth
	rel3d(i).z = -(point3d(i).x - eye.x) * sinth + (point3d(i).z - eye.z) * costh
	rel3d(i).y = (point3d(i).y - eye.y): next

To save memory, we could use a single array to hold point data, which only has the relative coordinates. But it's better to use seperate arrays for absolute and relative coordinates. Otherwise, rotating over and over can introduce distortions in the positions. Seperate arrays will also be convenient later, when movable 3d objects are introduced.

We could now write a program to display 3d points on the screen, but a wireframe engine is a bit more interesting, and will be used for the the example program.

Projecting 3-d lines onto the screen

If the points are both in front of the viewer, it's easy to draw a line connecting them, because a straight line in 3-d space projects onto a flat screen as a straight line on the screen. That's obvious if you visualize it, but linearity of projection is the basis of most of what I'll be doing, so I'll look at the details.

Let the coordinates of the 3-d points p1 and p2 be (x1,y1,z1) and (x2,y2,z2) respectively.

The on-screen projections of these points are:

sx1 = centerx - scrscale * x1/z1
sy1 = centery + scrscale * y1/z1
 
sx2 = centerx - scrscale * x2/z2
sy2 = centery + scrscale * y2/z2

A quantity that will come in handy for drawing a line on screen is the (change in sx) / (change in sy), which is

dsx/dsy = (sx2 - sx1) / (sy2 - sy1) = -((x2/z2) - (x1/z1)) / ((y2/z2) - (y1/z1))

I may call this the "slope" - even though it's _not_ the usual kind of slope, which is

(change in horizontal) / (change in vertical)

Note: The reason I'm not using the usual slope has to do with the way screen memory is stored. When drawing a surface to a screen or buffer, it's faster to proceed using horizontal lines, because those memory locations are consecutive. To do this, we need the horizontal screen locations (sx) of the edges of the surface as a function of the vertical coordinate (sy).

For a straight line, the ratio of these changes is constant all along the line. Is that true here? Consider a point p somewhere along the 3-d line between p1 and p2. It will have 3-d coordinates

p = (x, y, z) = (x1 + a (x2 - x1),  y1 + a (y2 - y1),  z1 + a (z2- z1))

where 0 < a < 1. Its projection on screen will be

sx = centerx - scrscale * x/z = centerx - scrscale * (x1 + a (x2 - x1)) / (z1 + a (z2- z1))
sy = centery + scrscale * y/z = centery + scrscale * (y1 + a (y2 - y1)) / (z1 + a (z2- z1))

The ratio of changes in sx to changes in sy, after doing a bit of algebra and cancelling out what we can, is independent of a and equal to the previously found ratio, proving that the projection is a straight line:

dsx/dsy = (sx - sx1) / (sy - sy1) = -(z1 * x2 - z2 * x1) / (z1 * y2 - z2 * y1) 
                                  = -((x2/z2) - (x1/z1)) / ((y2/z2) - (y1/z1))

Line endpoints that can be in back of you or to the side

A more interesting question is what to do if one of the points is in back of the viewpoint (z > 0),

or to the side of it (z = 0).

In these cases, if the line appears on screen at all, the other point (call it point 1) must be in front of the viewpoint, so that gives us a place to start. Such a line will always go beyond the edges of the screen, since it leads to a point in back or to the side of the viewer. So if we know the direction of the line on the screen, we just have to draw a line starting at the projection of point 1, and going off-screen in the right direction.

The formula I derived above still works for the slope, since every point p somewhere along the line that appears on-screen will have 3-d coordinates and screen coordinates given by the above expressions. The form of the 'slope' expression which has z's being multiplied instead of divided by will come in handy, since the other endpoint could have z2 = 0.

The slope give us (change in x) / (change in y). Once we know the slope, that narrows it down to two possible directions, since it can go either way from point 1. If we use the original formula for change in x:

(sx2 - sx1) = -((x2/z2) - (x1/z1))

that was great when both points were in front of the viewer (z1 < 0 and z2 < 0) but it goes in the wrong direction when z2 > 0, and doesn't make sense at all when z2 = 0.

But if we muliply numerator and denominator of the slope by the z's, getting the form

dsx/dsy = -(z1 * x2 - z2 * x1) / (z1 * y2 - z2 * y1)

then it doesn't matter if z2 = 0. What's more, since z1 < 0, then if (and only if) z2 > 0, the numerator and denominator will have changed sign compared to the -((x2/z2) - (x1/z1)) form, meaning the numerator now gives us the right direction for (change in x) no matter what z2 is:

change in x is proportional to -(z1 * x2 - z2 * x1), change in y is proportional to (z1 * y2 - z2 * y1)

Now we know the starting point (sx1, sy1) for the on-screen line and the direction of the line, but what is the actual change in x (or y)?

If point 2 is in front of you, of course the line just goes to (sx2, sy2).

If the point is in back of you or to the side, the length of the line is however long it needs to be to go off-screen. We could find the exact point where it reaches the edge of the screen and stop there; we'll need to do that if we are finding the edges of a tile for use in drawing a surface. But for now, it's easier to just multiply the expressions for "change in x" (or y) by some large number; any number will do as long as the line goes off-screen, and the limits of the line drawing command are not exceeded.

In the example program this is implemented as follows:

sub wire(byval p1 as vector, byval p2 as vector, byref c as uinteger)
dim as integer num,den
 
	if cint(p1.z)>=0 and cint(p2.z)>=0 then exit sub 'both points to side or in back of you
 
	if cint(p1.z)<0 and cint(p2.z)<0 then
 
		line (scrx(p1),scry(p1))-(scrx(p2),scry(p2)),c 'both points in front
 
	else 'p1 or p2 is in back or to the side of you
    
		if cint(p1.z) >= 0 then swap p1,p2 'Now p1.z < 0, meaning it's in front of you.
		'This swap is why p1 and p2 were sent to the sub byval instead of byref
 
		num = -(p1.z * p2.x - p2.z * p1.x): den = p1.z * p2.y - p2.z * p1.y
 
		line (scrx(p1), scry(p1)) - (scrx(p1) + scrscale * num, scry(p1) + scrscale * den),c 
		'The line should be long enough to go off-screen
 
	end if
 
end sub

The tests for p1.z < 0 and p2.z < 0 was made after first converting them to integers because if the values are very close to 0, for example due to an imprecise rotation (which if done exactly might have made p2.z = 0 exactly, but instead p2.z = 10e-15 or something like that), then the points' projection on-screen could have been too far out. (Yes, in my examples, a coordinate value of 1 is small.) Even if both points were really in front of the plane of the viewpoint, there is no harm in using the second method to draw the line as long as it goes off-screen, which it will if one of the points has a really small z value.

Triangles vs. quadrilaterals

At this point, we have all the tools we need to draw wireframes in 3-d. So what to draw?

A cube is a good test shape for 3d graphics, because we know what it should look like, and graphics bugs would tend to ruin its symmetry.

It's common to use triangles to represent 3-d shapes. Any flat shape can be made from them, and even curved surfaces can be approximated using a lot of small triangles. It's easy and flexible.

Because 3 points determine a plane, any 3 points form a flat triangle as long as they are not co-linear.

I will not use triangles, but instead use convex quadrilaterals. The main reason is that most of the shapes I use are boxy, and it's faster to draw one square than to draw two triangles to form the same square.

It's easy to mix in actual triangles by designating one point - I use point 0 - as a placeholder. If the fourth point of a quadrilateral has a value of 0, then no lines are drawn to that point, and the tile becomes a triangle. A single array of tiles holds both quadrilaterals and triangles this way.

"Convex" means that any line segment drawn from one point on the shape to another will not go outside; for example, a U-shape is not convex, but a rectangle is. If the quadrilaterals are convex, then it is guaranteed that for any given horizontal slice on the screen there is at most one right edge and one left edge. If I were to allow non-convex (concave) quadrilaterals, then I would need to provide for up to two edges of each side.

3 points determine a plane, so if we're not careful, we could end up with 4 points that are not coplanar. That would result in distortion, making the tile look different from different angles. If they are nearly coplanar, the distortion will be small and probably not noticable.

type quad
    as integer p1,p2,p3,p4
    as uinteger col 'color
end type

Note that my quad type stores points as integers, not as vectors. That's because points are often shared by more than one face of an object. By giving the index of the point in the point3d/rel3d vector arrays, it ensures that once a point is rotated or moved, it affects all faces that share the point.

For the wireframe engine, I'll draw the 4 sides of each quadrilateral, as well as the 2 cross-lines connecting opposite corners, making it look a little more filled-in. For a triangle, I'll draw the 3 sides.

Frontfaces vs. backfaces

There's one more thing to consider: Tiles that face towards the viewer will normally be visible, while ones that face away (for example, on the far side of a cube) would be blocked from view if we were to fill in the tiles facing the viewer. If we were filling in everything, it would speed up the drawing process a lot if we didn't have to draw the backfaces which can never be seen anyway. Knowing which way a tile faces will also turn out to be convenient for telling us whether a line is a left edge or a right edge, which will help when filling in tiles.

For a wireframe engine, it's less of an issue - we'll see the wire backfaces in between the wires of the frontfaces. But it's still nice to be able to tell them apart. I eventually plan to incorporate a wireframe engine into a 3-d object editor, so it'll be important to know which way a face points. And it just looks a lot more sophisticated if we 'gray out' the backfaces, as I've done in the example.

To distinguish the backfaces, we could use a vector that points in the direction perpendicular to (normal to) each face, and rotate it along with the points as needed. Then we could check to see if that vector points to the half-space on the side of the viewer, or if it points away from the viewpoint.

It turns out that we don't need to put in vectors like that by hand. There is a commonly used and fairly simple way, called the cross product, to find a vector perpendicular to a given pair of vectors. The vectors I'll use are (point 2 - point 1) and (point 3 - point 1) of the quadrilateral, so both starting vectors lie along the plane of the tile, and their cross product will be perpendicular to it.

	'"a cross b" is perpendicular to both a and b
	'|a cross b| = |a| |b| sine(angle between them)
	function cross(byref a as vector, byref b as vector) as vector
	cross = Type(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x)
	end function
 
	'normal vector perpendicular to the plane a tile is in
	#define normal(p1,p2,p3) cross(p2 - p1, p3 - p1)

To find out whether the cross product points towards the eyepoint or away from it, another common vector product will be used: the dot product, which is a number that depends on the angle between two vectors. If you are not familiar with the dot and cross products, it's a good idea to look them up on Wikipedia.

	'a dot b = |a| |b| cosine(angle between them)
	function dot(byref a as vector, byref b as vector) as double
	dot = a.x*b.x + a.y*b.y + a.z*b.z
	end function

These products could have been defined using macros instead of functions, but this way it's easy to plug in other formulas going into the product, as in cross(p2 - p1, p3 - p1), which wouldn't work if cross(a,b) were just a #define'd macro.

The direction of the cross product as compared to the viewer will depend on whether the three points appear in clockwise or counterclockwise order when facing the viewer. I'll use the convention that a front-face has counterclockwise points while a backface has clockwise points. The four points of a quadrilateral must all be in either CCW or CW order to make this work.

	#define faceside(q) sgn(dot(normal(rel3d(q.p1),rel3d(q.p2),rel3d(q.p3)) , rel3d(q.p1)))

Not that rel3d(q.p1) is the vector from the eyepoint to a point on the tile q (in this case, p1). If p1 is straight ahead, in the middle of the screen, then rel3d(q.p1) points in the -z direction. If the "faceside" is < 0, then the face points towards the viewer, and its normal vector points in the half-space towards the eyepoint (such as in the +z direction).

	for i=1 to maxpt*6/8: f = faceside(qface(i))
	if f<0 then wcol = qface(i).col else wcol=gray
	wirequad(qface(i),wcol): next

The factor of 6/8 in the above code is due to the use of cubes, since a cube has 8 points and 6 faces.

Movement

Finally, we need a way to move the eyepoint and to rotate the view. Keyboard input can be taken using the multikey function. Since it's not always easy to remember which numbers mean which keys, I #define'd some names that make it clear. In the example program, side arrows rotate the view sideways; adding alt makes it strafe instead. Up/down arrows move you forward or back. The a and z keys move you up or down.

When the view rotates, directions like "forward" change compared to the original coordinate system, and this is dealt with by using a "forward" vector and transforming it using sine and cosine. For example, if the right arrow key is pressed, the view should rotate off to the left of the screen, while the forward direction rotates to the right in the original (absolute) coordinate system.

Future plans

I plan to continue this series, though not necessarily at regular intervals.

Next time, I'll introduce rotation matrices, used for more general rotations. Then I'll get into tile filling and hidden surface removal / blocking. Techniques to be discussed include 1/z buffers, my own variant front-buffer and f-x buffer, and potentially s-buffers and other advanced techniques.

A simple scheme for creating and animating 3-d objects will be introduced; I hope to create a 3-d object editor, using my own file format.

Texturing the tiles will be discussed, and other shading techniques may be as well.

2.5-d graphics will be introduced. Essentially, I want to make 3-d objects that can be placed on a largely 2-d map. The walls and floors need not be full-fledged 3-d objects, but using 2-d 'raycasting' techniques, will still look like a 3-d environment. Extra touches, like a sloping floor, can be tacked on using 3-d objects. The advantage of this is speed: When determining whether a wall is blocked by another wall, you just have to check once for every vertical stip across the screen. Similarly, only those 3-d objects that are not blocked by walls need to be drawn. This makes it easier to create a large game level without slowing things down too much. Use of portals and regions to speed things up further may be discussed.

As I continue to work on my project I'll try to document other aspects of it, such as AI.

Download the code:  graf-w.bas