DATA MANIPULATION TUTORIALS, PARTS 1 & 2 · Hacker@alphalink.com.au

Part 1 - Encryption

Disclaimer: Sorry, this is necessary due to the nature of this article. I assume NO responsibility for any damage(s) caused by this data. Some of the theory in this article is covered by certain laws, so you have been warned. If you are an American citizen you should send some letters to Congress asking them to support the bill that they are trying to have passed. For more information please consult www.crypto.com

Note: all the math and code here was worked out by hand (no calculator, and no help from another window with BASIC in it), meaning that there may be a few mistakes, so please bear with me on this one.. ok..

Intro: Here I'm going to introduce you to some basic encryption styles: Xor, Bit Shifting and Bit Switching.

Xor: This is the simplest form of encryption there is. Anyone wanting to crack an 8 bit Xor can do it in under 1 second on a low end PC, since there are only 256 possible passwords to try. That makes it useless without some modification. For all those that don't know what Xor is, it is a simple operator. It's on almost every scientific calculator. A standard Xor command looks like this:

done = dat XOR passwd

Now let's assume that dat is 32 (00100000) and passwd is 69 (01000101). Xor compares the two numbers bit by bit: 2 binary digits that have the same value give 0, whilst 2 with different values give 1. Here is another view of that:

+--+--+--+--+--+--+--+--+
| 0| 0| 1| 0| 0| 0| 0| 0| 32
+--+--+--+--+--+--+--+--+

Xor

+--+--+--+--+--+--+--+--+
| 0| 1| 0| 0| 0| 1| 0| 1| 69
+--+--+--+--+--+--+--+--+

Answer (Below)

+--+--+--+--+--+--+--+--+
| 0| 1| 1| 0| 0| 1| 0| 1| 101
+--+--+--+--+--+--+--+--+
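The table above can be checked with a couple of lines of BASIC. This is only a minimal sketch using the same dat and passwd values; it also shows that running the result through the same Xor a second time gives the original number back, which is what makes decryption possible.

```basic
' Minimal sketch: Xor with the same password twice returns the original
dat = 32
passwd = 69
done = dat XOR passwd
PRINT done               ' prints 101
PRINT done XOR passwd    ' prints 32
```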

Hope this is simple enough for everyone to understand. Now we can add to our simple Xor encryption by, let's say, adding 1 to the password for each byte, so for the first byte the password is 69, for the next byte it is 70, then 71, then 72 and so on. This will work fine until you reach 255, and then you have a small problem: the password no longer fits in 8 bits. You need to add a small 'line' to your encryption 'loop' to wrap it back to 0. Here is a code sample. Sorry about the HEAVY commenting, some people might not understand if I didn't.

' Xor Encryption Example

' Clear the screen
CLS

' The message about to be encrypted
dat$ = "Hey, Look. Im about to be encrypted"

' Password. Can be changed to any 8 bit number
passwd = 128

' Dump Data to screen
COLOR 4: PRINT "Original : "; : COLOR 1: PRINT dat$

' Move password into a temporary integer
passwdz = passwd

' Start a loop. increments X
FOR x = 1 TO LEN(dat$)
       
' Xor's the password with 1 byte from string. Adds to DONE$     
        done$ = done$ + CHR$(ASC(MID$(dat$, x, 1)) XOR passwdz)
       
' Adds 1 to temp integer
        passwdz = passwdz + 1
       
'if temp integer = 256 (no longer 8 bit) then make it 0
        IF passwdz = 256 THEN passwdz = 0

' loops back
NEXT x

' Dumps data to screen
COLOR 4: PRINT "Encrypted: "; : COLOR 1: PRINT done$

' Re-init data
dat$ = done$
passwdz = passwd
done$ = ""

' Start a loop. increments X
FOR x = 1 TO LEN(dat$)
      
' Xor's the password with 1 byte from string. Adds to DONE$   
        done$ = done$ + CHR$(ASC(MID$(dat$, x, 1)) XOR passwdz)
      
' Adds 1 to temp integer
        passwdz = passwdz + 1
      
'if temp integer = 256 (no longer 8 bit) then make it 0
        IF passwdz = 256 THEN passwdz = 0

' loops back
NEXT x

' Dumps data to screen
COLOR 4: PRINT "Decrypted: "; : COLOR 1: PRINT done$
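To see why the Xor section calls a plain 8 bit Xor useless, here is a rough sketch of a cracker. It is not part of the program above: it assumes the message was Xor'ed with one fixed password (no adding 1 per byte) and that the original text only holds letters, spaces and commas. The names secret$, allowed$ and guess$ are made up for this example.

```basic
' Hypothetical cracker sketch for a fixed 8 bit Xor password
plain$ = "Hey, Look"
secret$ = ""
FOR x = 1 TO LEN(plain$)
    secret$ = secret$ + CHR$(ASC(MID$(plain$, x, 1)) XOR 69)
NEXT x

' Assumption: readable text is made of these characters only
allowed$ = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,"

' Try all 256 passwords and print the ones that give readable text
FOR k = 0 TO 255
    guess$ = ""
    ok = 1
    FOR x = 1 TO LEN(secret$)
        c$ = CHR$(ASC(MID$(secret$, x, 1)) XOR k)
        IF INSTR(allowed$, c$) = 0 THEN ok = 0
        guess$ = guess$ + c$
    NEXT x
    IF ok = 1 THEN PRINT k; guess$
NEXT k
```

Password 69 is always among the lines printed, because it gives back the original text. Adding 1 per byte, as the sample program does, only means the cracker has to count along with it.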

 

Bit Shifting: This is a LOT simpler than it sounds (once you have the idea). The theory behind this is that if you add X bits in front of an entire string then it becomes unreadable, because it throws every byte boundary off whack. But you have to be careful: if X is divisible by 8 then all you have done is add whole bytes to the start of the data, and the rest stays perfectly readable. Below there is a function that converts a string of binary digits (8 per character) into ASCII characters.

FUNCTION bin2asc$ (bit$)
'*************************************
'** Function: bin2asc$              **
'** Author: Hacker@alphalink.com.au **
'** Info: Converts a string of 0/1  **
'**       digits (8 per byte) into  **
'**       ASCII characters.         **
'*************************************

FOR X = 1 TO LEN(bit$) STEP 8
    dat$ = MID$(bit$, X, 8)
            FOR y = 1 TO 8
                IF MID$(dat$, y, 1) = "1" AND y = 1 THEN num = num + 128
                IF MID$(dat$, y, 1) = "1" AND y = 2 THEN num = num + 64
                IF MID$(dat$, y, 1) = "1" AND y = 3 THEN num = num + 32
                IF MID$(dat$, y, 1) = "1" AND y = 4 THEN num = num + 16
                IF MID$(dat$, y, 1) = "1" AND y = 5 THEN num = num + 8
                IF MID$(dat$, y, 1) = "1" AND y = 6 THEN num = num + 4
                IF MID$(dat$, y, 1) = "1" AND y = 7 THEN num = num + 2
                IF MID$(dat$, y, 1) = "1" AND y = 8 THEN num = num + 1
            NEXT y
    b$ = b$ + CHR$(num)
    num = 0
NEXT X
bin2asc$ = b$
END FUNCTION

This should help with writing the finished file, and with writing a routine to do the opposite. I've got one that does what is required BUT it is part of HCRYPT and it does MUCH more than just that (it's part of the encrypt process). So.. err.. I don't think that I want to distribute that JUST yet. The above routine is also from HCRYPT, but it was separate from the save file procedure (I do a bit more processing before the file is saved). Hope this sheds some light on Bit Shifting, and on working with a BITSTREAM instead of a plain string.
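For anyone who wants to try the idea straight away, here is one possible sketch of the opposite routine plus a 3 bit shift. The function asc2bin$ is an assumed helper written for this example, NOT the HCRYPT routine, and the filler bits "101" are made-up values.

```basic
DECLARE FUNCTION asc2bin$ (txt$)

' Hypothetical demo: shift a message by 3 filler bits
bits$ = asc2bin$("Hi")
PRINT bits$                      ' prints 0100100001101001
shifted$ = "101" + bits$         ' 3 made-up filler bits at the front
' Pad the end with 0's so the length is a multiple of 8 again
DO WHILE LEN(shifted$) MOD 8 > 0
    shifted$ = shifted$ + "0"
LOOP
PRINT shifted$                   ' prints 101010010000110100100000

FUNCTION asc2bin$ (txt$)
' Assumed helper: turns each character into 8 binary digits
FOR x = 1 TO LEN(txt$)
    num = ASC(MID$(txt$, x, 1))
    FOR y = 7 TO 0 STEP -1
        IF (num AND 2 ^ y) > 0 THEN b$ = b$ + "1" ELSE b$ = b$ + "0"
    NEXT y
NEXT x
asc2bin$ = b$
END FUNCTION
```

The padded string can then be handed to bin2asc$ to get bytes that can be saved. To decrypt, convert back to digits, throw away the first 3 digits and the padding, and convert again.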

 

Bit Switching: This is a simple form of encryption that involves more than 1 bitstream. The first bitstream is the original data, and the second (and onwards) is a stream of password bits. Groups of bits in the original are switched for groups of password bits, making the original data completely garbled and hard to decipher. The downside to this is that the password becomes impossible for a human to remember, so we start relying on a "PASSWORD FILE" (HCRYPT does this). To decrypt we do the opposite of encrypting: we look for the 'password' bits and replace them with the old bits. All of this data would be stored in the password file. Here is a small example (all the data is random):

Original Bitstream:

0111011011011011100110110011010110011100110011001100110011011010

Password Bitstream TABLE:

Old New
0111 1010
0110 1100
0010 0101
1100 0011
1101 0111
1011 0110
1001 1011
0011 0000
0101 1111
1010 1000

(* having the same number on both sides is a good idea when the program is coded properly)

Encrypted Bitstream:

0111011011011011100110110011010110011100110011001100110011011010 (Old)
----====----====----====----====----====----====----====----====
1010110001110110101101100000111110110011001100110011001101111000 (New)
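The switching itself can be sketched in a few lines. This is only an illustration using the table and bitstream from the example above. The array names are made up, and letting 4 bit groups that are not in the table pass through unchanged is an assumption of this sketch.

```basic
' Sketch: swap every 4 bit group using the Old/New table, stored as DATA
DIM oldBits$(1 TO 10), newBits$(1 TO 10)
FOR i = 1 TO 10
    READ oldBits$(i), newBits$(i)
NEXT i

orig$ = "0111011011011011100110110011010110011100110011001100110011011010"
done$ = ""
FOR x = 1 TO LEN(orig$) STEP 4
    nib$ = MID$(orig$, x, 4)
    swapped$ = nib$          ' assumption: groups not in the table pass through
    FOR i = 1 TO 10
        IF nib$ = oldBits$(i) THEN swapped$ = newBits$(i)
    NEXT i
    done$ = done$ + swapped$
NEXT x
PRINT done$

DATA 0111,1010,0110,1100,0010,0101,1100,0011,1101,0111
DATA 1011,0110,1001,1011,0011,0000,0101,1111,1010,1000
```

Running it is a handy way to double check a hand-worked stream. Decrypting works the same way with the two columns swapped, which only works reliably if no value can be produced in two different ways.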

I've included both the encrypted and un-encrypted bit streams so you can check up on my work, and it's less confusing :). The ===='s and ----'s are there so you can see where each 4 bit section is. As you can see, the original is nothing like what we end up with, and the password file isn't too big. Then you can always find a nice way to hide the password file. In this case it's not TOO long (and it's in 4 bit nibbles), so you could write a little util that adds 64 to each of the binary numbers and displays their ASCII values (remember the capital letters start at 65, so 0 comes out as "@" and 1 to 15 come out as "A" to "O"). This makes it easy to carry a sophisticated password around in your head. BUT what happens when you need more powerful encryption than that? Well, who is stopping you from running the data through again after adding 4 'filler' bits (2 to the start and 2 to the end)? Or maybe making the binary 'password' bigger, or a combination of these. But when you make the encryption more complex, make sure that the decryption process is up to speed. It's no use writing an encryption program, converting your company's entire financial records, deleting the originals, then finding out your decryption process only follows 16 out of 96 steps.
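The "add 64" util could look something like this. It is a sketch only: newBits$ is simply the New column of the example table typed in as one string, and pass$ is a made-up name.

```basic
' Hypothetical helper: show the New column of the table as letters
newBits$ = "1010110001010011011101101011000011111000"
pass$ = ""
FOR x = 1 TO LEN(newBits$) STEP 4
    num = 0
    ' Build the value of one 4 bit group, most significant bit first
    FOR y = 1 TO 4
        num = num * 2 + VAL(MID$(newBits$, x + y - 1, 1))
    NEXT y
    pass$ = pass$ + CHR$(num + 64)
NEXT x
PRINT pass$                      ' prints JLECGFK@OH
```

Subtracting 64 from each character and writing the result as 4 bits turns the letters back into the table column.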

Laws concerned: The US government restricts the export from America of encryption that is stronger than 40 bit (last time I heard). According to BYTE magazine the NSA has a 'chip' that can crack 128 bit encryption in about 2 seconds. This is not too nice for all you budding young cryptographers out there, so here comes our next section.

Hiding Data: Almost every file format can have data 'added' somewhere without too much distortion or other problems. This technique is used by many people that want to send data without others necessarily knowing something is there. An alternate name for this is EMBEDDING. In issue 12 you are taught how to APPEND data to the end of a .EXE. This can be done to any format without much trouble, but it does kinda stick out (if you know what you are looking for you will find it in less than a minute). So why don't we replace data in another file instead? Let's take a WAVE file for example. You have the HEADER, then you have the DATASTREAM. The wave format doesn't particularly CARE if you replace data, and as long as you have a MONO, LOW QUALITY sound file the change is masked nearly completely. So let's say that we have a wave file and a simple un-encrypted message to put in it:

Magic Number (which byte to replace):
6 (ie 6th byte 12th byte 18th byte and so on)

Message:
HELLO

Wave Data stream (I picked lower case so you can see better):
ajwkhsds;lkjuhfasd;dj4w986yajhfjhz;djkasgj

Data stream with the message hidden in it:
ajwkhHds;lkEuhfasL;dj4wL86yajOfjhz;djkasgj

The upper case letters point out where our data is. Even with the knowledge of where the data is, you have a problem seeing it. If your data doesn't have a recognisable header or marking, no one will be able to notice. Do you know how many losers send Simpsons/South Park/some brand new TV show .WAVs via e-mail, instead of just mailing the URL to the other person? About 2^65535 (now let's see you work that one out on a calculator). So if someone is 'browsing' through a BBS's sound section they will see a normal file that plays normally without problems, but the people that do know will be downloading that hidden data with a large evil grin. Another thing to remember is to keep the magic number fairly high. I would personally keep it above 100, because the smaller it is the more distortion there is. 6 is an extremely LOW number, and the file may have a LOT of background static when it is played. Also, the .WAV format is the easiest format to 'hide' data in. Some other formats may require you to do a lot more processing. A .MP3, for example, would probably need you to understand the format enough to know where data can be kept without much distortion or detectability.
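The HELLO example can be tried out on a plain string before touching a real file. This is a minimal sketch under the same assumptions as the example (magic number 6, and the reader already knows the message is 5 bytes long). A real util would work on the data part of the .WAV file and skip the header.

```basic
' Sketch: hide a message in every 6th byte of a string, then recover it
magic = 6
msg$ = "HELLO"
wav$ = "ajwkhsds;lkjuhfasd;dj4w986yajhfjhz;djkasgj"

' Hide: overwrite every 6th byte with the next message byte
FOR x = 1 TO LEN(msg$)
    MID$(wav$, x * magic, 1) = MID$(msg$, x, 1)
NEXT x
PRINT wav$

' Recover: pick the same bytes back out
found$ = ""
FOR x = 1 TO 5
    found$ = found$ + MID$(wav$, x * magic, 1)
NEXT x
PRINT found$                     ' prints HELLO
```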

End Note: This article is huge, and without much code. Hmm, it's pretty big: more than 1 third of the size of the first BASIX Fanzine. The next article I do will be on storage. I will also have to brush up on my "3d" coding, because I'll be covering a 3d model and maybe its skin and a possible way of storing it, and the article will have a few DATAFILES (the model) that could also be sent along with it. Hopefully I will also post one of my 'LIBRARIES', but I will have to clean up the code a lot. While all you people are waiting for the article, go outside (it took me longer to figure out the concept of outside than how to code in Delphi) and buy a CD called Smells Like Children by Marilyn Manson.

- hacker@alphalink.com.au






 

Part 2 - Processing

Intro: Today we will have fun. I'm going to take you through the creation of your own file format, reading some other file formats, alternate ways of storing data, and various possible utils. But first, your own file format.

Creating a file format: Now, we have all been writing a program and wanted to store data, right? When we want to store information, be it anything from a high score table to a graphic image to even the source code to Unix, it has to be stored in something, usually a file. When creating a file format we first need to figure out what we want to store, why, and any restrictions. Let's say that we want to create a new image format for distribution over the internet. In this format we want to keep opening the file as user friendly as possible, so let's first give up any hope of re-inventing .JPG or .GIF, because they store their data in a linear way. Because we are transferring the data over the internet we have to be able to cater for the people with slow connections and short attention spans. So let's store the data in this way:

Each number represents which pixel is where in the file. (Note there is an error in there. I stuffed up the order somewhere. I only noticed when I got 3 numbers in a row and ended up at 101.)

+---+---+---+---+---+---+---+---+---+---+
|001|014|027|040|051|062|071|002|015|028|
+---+---+---+---+---+---+---+---+---+---+
|083|094|099|100|101|003|016|029|041|052|
+---+---+---+---+---+---+---+---+---+---+
|072|084|095|004|017|030|042|053|063|073|
+---+---+---+---+---+---+---+---+---+---+
|085|005|018|031|043|054|064|074|086|006|
+---+---+---+---+---+---+---+---+---+---+
|019|032|044|055|065|075|087|007|020|033|
+---+---+---+---+---+---+---+---+---+---+
|045|056|066|076|088|008|021|034|046|057|
+---+---+---+---+---+---+---+---+---+---+
|077|089|096|009|022|035|047|058|067|078|
+---+---+---+---+---+---+---+---+---+---+
|079|010|023|036|048|059|068|080|090|011|
+---+---+---+---+---+---+---+---+---+---+
|024|037|049|060|069|081|092|012|025|038|
+---+---+---+---+---+---+---+---+---+---+
|093|097|070|082|098|013|026|039|050|061|
+---+---+---+---+---+---+---+---+---+---+

[Ed's note - sorry about keeping that as an ASCII table, but I'll be buggered if I'm going to convert that lot to HTML! :) ]
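A storage order like this can also be generated by a small loop instead of by hand. The sketch below is an assumption-laden illustration, not the exact order in the table: it assumes a 10x10 image and a step of 8 pixels between samples, and STEPSIZE, pass and pixel are made-up names.

```basic
' Sketch: print the order the pixels would be stored in, pass by pass.
' Pixels are numbered 1 to 100, left to right and top to bottom.
CONST STEPSIZE = 8
count = 0
FOR pass = 1 TO STEPSIZE
    FOR pixel = pass TO 100 STEP STEPSIZE
        count = count + 1
        PRINT pixel;
    NEXT pixel
    PRINT
NEXT pass
PRINT count; "pixels stored"     ' prints 100 pixels stored
```

Each line of output is one pass over the image, so a viewer that draws the pixels in this order shows a rough picture after the first line and fills in the gaps with every line after that.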

But the theory behind this is that the pixels are stored in passes. Each pass holds pixels spread evenly over the whole image (13 of them in the first pass of the table above), so at first you start off with something very blurry, then less blurry, then less again, until you have the full image. Now when designing that format you have to remember to add some basics. First a header, let's say FRMT!>, which needs to have some meaning: the letters FRMT stand for FORMAT, the ! stands for the version number (its ASCII value), and the > might be a sign that lets the loader know where to stop reading the header. You also have to add 'UN-USED' space into the file for future expansion. And when re-designing the format (this does happen) make it as backwards compatible as possible, or in other words readable by an 'OLD' editor even though some new features are unavailable in that version. Compatibility is VERY important.
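Here is one way the FRMT!> header could be written and checked. It is a sketch only: the file name TEST.FMT and the 10 bytes of un-used space are assumptions made for this example.

```basic
' Sketch: write a FRMT!> style header, then read it back and check it
OPEN "TEST.FMT" FOR BINARY AS #1
hdr$ = "FRMT" + CHR$(33) + ">"      ' CHR$(33) is "!", so this is version 33
spare$ = STRING$(10, 0)             ' un-used space for future expansion
PUT #1, 1, hdr$
PUT #1, , spare$
CLOSE #1

OPEN "TEST.FMT" FOR BINARY AS #1
chk$ = SPACE$(4)                    ' GET reads as many bytes as the string holds
GET #1, 1, chk$
ver$ = SPACE$(1)
GET #1, , ver$
CLOSE #1
IF chk$ = "FRMT" THEN PRINT "Version"; ASC(ver$) ELSE PRINT "Not our format"
```

A loader written like this can refuse files with a version number higher than it knows about, which is one simple way of staying compatible.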

READING OTHER PEOPLE'S FILE FORMATS:
This may sound like you are trying to rip other people off, but there are many formats that you use today without feeling like you are stealing: .GIF, .PCX, .JPG, .TXT, .HTML. All these formats were invented by other people and are commonly used. I will discuss a common file format that is widely used and that I haven't discussed before: .WAV (the other most common format is .PCX, and my tutorial on it can be found in issue 12). The .WAV format comes in a few different flavours, but the standard un-compressed one will be covered here. One recommended FTP site is
ftp.uu.net/vendor/microsoft which contains a bit of info on Microsoft products and creations (including formats). Like all good formats, the .WAV format starts with a header, in this case RIFF. Further into the header, after the chunk names and sizes, comes the format information, and its first 12 bytes are organized in the following way:

WORD wFormatTag
WORD wChannels
DWORD dwSamplesPerSec
DWORD dwAvgBytesPerSec

WORD stands for word. It is used in asm programming and it means that the value is 2 bytes (16 bits) long.

DWORD stands for double word. It is also used in asm programming and it means that the value is 4 bytes (32 bits) long.
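In QBasic a WORD fits in an INTEGER and a DWORD fits in a LONG, so the four fields can be read directly. The sketch below rests on two assumptions: the file is called TEST.WAV, and it uses the usual un-compressed layout where the fields start at byte 21 (after "RIFF", a size, "WAVE", "fmt " and another size). Files with a different layout will give nonsense values.

```basic
' Sketch: read the four format fields from an un-compressed .WAV file
DIM wFormatTag AS INTEGER        ' WORD  = 2 bytes
DIM wChannels AS INTEGER         ' WORD  = 2 bytes
DIM dwSamplesPerSec AS LONG      ' DWORD = 4 bytes
DIM dwAvgBytesPerSec AS LONG     ' DWORD = 4 bytes

OPEN "TEST.WAV" FOR BINARY AS #1
id$ = SPACE$(4)
GET #1, 1, id$
IF id$ = "RIFF" THEN
    GET #1, 21, wFormatTag       ' assumed position, see the note above
    GET #1, , wChannels
    GET #1, , dwSamplesPerSec
    GET #1, , dwAvgBytesPerSec
    PRINT "Format tag:"; wFormatTag; " Channels:"; wChannels
    PRINT "Samples per second:"; dwSamplesPerSec
    PRINT "Average bytes per second:"; dwAvgBytesPerSec
ELSE
    PRINT "Not a RIFF file"
END IF
CLOSE #1
```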

You can ignore all this information quite safely. Take a look at this program taken from Basic Internet Fanzine Issue 1. If you're reading this, PETER COOPER, I'm sorry for mangling your code ;). Have a read of this code, and if you don't understand it, read the earlier fanzines.

13-wplay.bas

OK, that was probably over commented (sorry again), but with that code included you can read it to understand the format better. As the saying goes, 'A code snippet is worth a million words'.

ALTERNATE WAYS OF STORING DATA: Hmm, I just had an idea: why don't I take you through the process of writing a 3d game? (I'll have to brush up on my BSP algorithm info though.) OK, let's say that we have a 3d object. We will keep it simple; let's say that its corners are kept at:

( 4, 4, 4) ( 2, 4, 4) ( 2, 2, 4) ( 4, 2, 4)
( 4, 4, 2) ( 2, 4, 2) ( 2, 2, 2) ( 4, 2, 2)

This is a simple cube. Now we have many ways of storing it. We can keep it in text format, making it easier for people to modify but harder to write a loader for. Or we can store it in binary form (I will use CHR$(x) instead of the real characters), and the file would look like this:

CHR$(4)CHR$(4)CHR$(4)CHR$(2)CHR$(4)CHR$(4)CHR$(2)CHR$(2)CHR$(4)CHR$(4)CHR$(2)CHR$(4)
CHR$(4)CHR$(4)CHR$(2)CHR$(2)CHR$(4)CHR$(2)CHR$(2)CHR$(2)CHR$(2)CHR$(4)CHR$(2)CHR$(2)
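A binary file like that takes very little code to save and load. This is a minimal sketch: CUBE.DAT is a made-up file name, and the fourth corner is taken as (4, 2, 4) so that all 8 corners of the cube are different.

```basic
' Sketch: save the cube corners as one byte per coordinate, then load them
OPEN "CUBE.DAT" FOR BINARY AS #1
FOR i = 1 TO 24
    READ v
    b$ = CHR$(v)
    PUT #1, i, b$
NEXT i
CLOSE #1

OPEN "CUBE.DAT" FOR BINARY AS #1
b$ = SPACE$(3)                   ' one corner = 3 bytes (x, y, z)
FOR i = 1 TO 8
    GET #1, (i - 1) * 3 + 1, b$
    PRINT "("; ASC(MID$(b$, 1, 1)); ","; ASC(MID$(b$, 2, 1)); ","; ASC(MID$(b$, 3, 1)); ")"
NEXT i
CLOSE #1

DATA 4,4,4, 2,4,4, 2,2,4, 4,2,4
DATA 4,4,2, 2,4,2, 2,2,2, 4,2,2
```

One byte per coordinate limits every value to 0 to 255. A text file would not have that limit, but the loader would have to parse brackets and commas.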

Now that's without a texture. Where would you put that? In a separate file? Into the model file? Where in that? At the start, or at the end? These are some of the decisions faced by all those cutting edge programmers at Sierra that will soon be releasing the first beta copy of a new game called PAC-MAN (hmm, maybe not exactly our problems, but similar ones). If we were to keep the texture data behind the 3d wireframe it would be easier to code, BUT we are then faced with 'sorting' the data into an image. This is where we must map a texture onto the image, a painful process if you have never done any 3d coding before and still a major pain in the nads if you have. Another possible option is scrapping the idea of a wireframe and keeping the pixel color and position in the same object. The file would look like this (triangle):

( 0, 1.5, 1, 4)
( 1, 0, 1, 4)
( 1, 1, 1, 4)

The format is (X,Y,Z,COL). These are just the corner points, but every pixel being drawn would be given its own x,y,z that can be anywhere, making this easy to code and extremely easy to write a decoder for (earlier issues go into detail). A tip though: remember to cache. I got _FLAMED_ by someone who said my routine was not much faster than the in-built command, so here is a question to all those who flamed me before: who can "LOCK" at 32+ FPS? I'll put money on it that none of you can whilst only using PSET (160x100). This is a small routine that I will be referring to a lot in the near future, so take a look at it and learn what it does. Even run it.

DECLARE SUB VRT ()
DECLARE SUB PREloaddata ()

' Point POKE at the start of video memory
DEF SEG = &HA000
' precache& holds the pre-calculated screen offset of every pixel
DIM SHARED precache&(160, 100), endNPOKE#, startNPOKE#

SCREEN 13

PREloaddata
   
    startNPOKE# = TIMER
    CLS
        FOR col% = 0 TO 15
            VRT
                FOR x% = 1 TO 159
                    FOR y% = 1 TO 99
                        POKE precache&(x%, y%), col%
                    NEXT y%
                NEXT x%
        NEXT col%
    endNPOKE# = TIMER

    SCREEN 0: WIDTH 80
    PRINT 16 / (endNPOKE# - startNPOKE#); " FPS with New Routine "
    PRINT "Time taken with New Routine:"; endNPOKE# - startNPOKE#

SUB PREloaddata
    FOR x = 80 TO 240
        FOR y = 50 TO 150
            precache&(x - 80, y - 50) = ((320 * y + x))
        NEXT y
    NEXT x
END SUB

SUB VRT
' Waits for Vertical Retrace. Slows us down, but hey, it gets rid of white
' "static" and un-needed flicker. If we are rushed for speed then we
' remark this line OUT. For the benchmark's sake we will keep it in.
' First wait for any retrace in progress to end, then for the next to start.
    WAIT &H3DA, 8, 8: WAIT &H3DA, 8
END SUB
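For comparison, here is a sketch of the same fill done with the in-built PSET command, so the two numbers can be put side by side. It is written for this comparison only. It does not wait for the vertical retrace, so to be fair the VRT call in the routine above should be remarked out when comparing.

```basic
' Sketch for comparison: the same 159x99 area filled 16 times with PSET
SCREEN 13
startPSET# = TIMER
FOR col% = 0 TO 15
    FOR x% = 1 TO 159
        FOR y% = 1 TO 99
            PSET (x% + 80, y% + 50), col%
        NEXT y%
    NEXT x%
NEXT col%
endPSET# = TIMER

SCREEN 0: WIDTH 80
' TIMER only ticks about 18 times a second, so guard against a zero time
IF endPSET# > startPSET# THEN
    PRINT 16 / (endPSET# - startPSET#); " FPS with PSET"
ELSE
    PRINT "Too fast to time"
END IF
```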

There is still some optimization that could happen before this runs at the speed I would like. I would like to compile a list of speeds, so can you run this and mail me the results with info about your PC (CPU speed, memory, OS running at the time of the test)? Now that that is over I HAVE to explain about

CREATING UTILITIES: This step in creating a file format isn't necessary, but it is a hell of a lot easier to have a small program straight away that handles basic functions like viewing the file, editing the file and converting data from an existing format that is useful (a .PCX file can be added to a 3d model as a texture). Without at least something to view the file you can't really be sure that what you just coded will work (we are assuming that you coded a converter from an existing format). There is always a chance that you missed something, so also be sure to test it with more than one file, preferably with every usable file that you can imagine. This will hopefully ensure that everything is in working order. Also be sure to try to anticipate EVERYTHING before you release your format. You wouldn't like to give out utils and examples, build a web page and even sell your format to some company, only to find out that if a file is above a certain size it "accidentally" crashes. Or even worse, if you're writing some kind of "language", to find out it's possible to write directly to memory above what it is meant to.

End Note: Sorry about there being no model file. It's just that I couldn't find an example good enough to add, and another downside is I don't have the specs for Quake 2's .MD2 format :(. Once I can get that I'll give out a reader for it. Here is a challenge for everyone with a copy of QB4.5 (I've been trying to do this but I just can't do it):
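One cheap way to avoid the "above a certain size" surprise is to have every util check the file before loading it. The sketch below is an illustration only: MODEL.DAT and the 64000 byte limit are made-up values, and the limit should be whatever size the loader has actually been tested with.

```basic
' Sketch: refuse files the loader was never tested with
CONST MAXSIZE = 64000
OPEN "MODEL.DAT" FOR BINARY AS #1    ' note: creates an empty file if missing
size& = LOF(1)
CLOSE #1
IF size& = 0 THEN
    PRINT "File is empty or missing"
ELSEIF size& > MAXSIZE THEN
    PRINT "File too big for this version:"; size&; "bytes"
ELSE
    PRINT "Size OK:"; size&; "bytes"
END IF
```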

I have my main module compiled into a .EXE. It has the following routines in it

readdata
drawpixel
grid

The .EXE also uses a routine in the Library called

MAIN

In my library I have the following routines

MAIN
ITEMS
LIST
STORY

The routine Main uses the following functions

readdata (IN .EXE)
drawpixel (IN .EXE)
grid (IN .EXE)
ITEMS
LIST
STORY

I need to be able to compile the .EXE and distribute it without the source code, BUT I ALSO NEED to be able to distribute the source code to the library so people can compile it and redistribute it for themselves. The only problem with this is that the library won't compile without the routines in the .EXE being there. I'm not sure how to fix this, but it has become a problem. I would give out the source to the entire thing, BUT I will be selling the program to the public (it's HCRYPT) and I want to be able to let users create their own custom plugins. (All I really need to do is figure out how to declare a routine EXTERNAL.)

- hacker@alphalink.com.au

BTW please help with that survey for the new engine. I need to know what to set as a minimum requirement.





This article originally appeared in The BASIX Fanzine Issue 13 from December 1998.