Data Representation

From Modcrafter Wiki

Jump to: navigation, search

One thing normally overlooked when presenting information about game modification is a discussion on data representation. While some people know the very basics behind the concept of binary, sometimes this isn't enough. This is especially true with StarCraft, where some modifications start to change data contained in the engine. When that begins to happen, it is crucial to know how the computer is going to interpret the data.


There are 2 main schemes of data representation that need to be addressed - Binary and Hexadecimal. If you already have a basic understanding of this and want to get into the interesting stuff, you can move on to how data is stored in the MPQ Archives.


Binary

Binary is the simplest number system around, in that it only has 2 options. In each place, there is either a 1 or a 0. Each spot can be thought of like a switch. Each switch is used in combination with others to relay some sort of message. Now, the important thing is, how many different things can be said with a certain number of switches? Well, that's actually pretty easy to figure out. For example, if I have 1 switch, I can say 2 things. If I have 2, I can now think of 4 different sets of messages I can say. This doubles each time you add a switch.

Still, this doesn't tell you what the messages mean. Unless you define what they mean, they don't mean anything. Commonly binary numbering systems are use to represent just that - numbers. But which ones? Well, let's just do things logically. Zero is 0b(a b afterward means it is a binary number). Makes sense and it's something we already recognize. So, one is 1b. That's good too. However, what happens when we want to make 2? We can't use a 2, so we have to move a place over and restart the count in the previous position, just like if you were going from 9 to 10. So, two is 10b. If we keep going, we get the following:

Decimal Binary
0 0000b
1 0001b
2 0010b
3 0011b
4 0100b
5 0101b
6 0110b
7 0111b
8 1000b
9 1001b
10 1010b
11 1011b
12 1100b
13 1101b
14 1110b
15 1111b

One thing you might notice right away is that even though I have 16 different combinations, the highest number that I can reach is 15. The next thing you might notice is that there is no negative numbers. There is a special way that negative numbers are represented. More on that later. First we'll go over the way computers use binary.

In computers, these switches are called bits. Computers have a set of well defined sizes they use for representing numbers. First off is a byte. Bytes contain 8 bits. So, the number of combinations that a byte uses is 256. However, the highest number is 255. This may sound familiar to you if you've ever done any modding. When someone says something can't go any higher(like the number of upgrades), the reason is because only 1 byte has been set aside to hold the number.

An easy formula for the highest number that x number of bits can represent is 2^x - 1.

Negative numbers use a system called two's compliment. The reasons for doing this are a bit beyond the scope of this article. The basic idea is that half of the numbers that used to be positive will now be interpreted as negative. To get the lowest number in a two's compliment system, you take the combinations and divide it by 2, and then that is the lowest negative number. So, for 8 bits, it's 256 / 2 = 128. The lowest number a byte can be when treated as signed(same as two's compliment) is -128. That leaves 127 as the highest number. How the bits represent this is a little more technical than we can go into. If you want to know how the bits represent the numbers, a quick search on the web will link you to articles that explain what they are. For our purpose, the max and min numbers are the important thing, as they are the limits you have to work in.

Hexadecimal

(TO BE FINISHED)

Hexadecimal is a number system that uses 16 different digits for each place. Commonly, the digits above the standard 0-9 are A-F. Often, computers will display numbers as hex rather than binary. There simple explanation for this is that showing the data in hex makes it a lot more compact. With a 32, or even worse, a 64 bit number, that really makes a difference. For example, one billion in a 32 bit binary number is 00111011100110101100101000000000, but in hex, it is 0x3B9ACA00. As you start wanting to show larger amounts of data, hexadecimal is a very appealing alternative to binary.

One thing about it that makes it so useful is that each digit in hex represents 4 bits exactly. With 4 bits, you get 16 options. This property makes conversion between them rather easy. To expand the earlier table -

Decimal Binary Hexadecimal
0 0000b 0x00
1 0001b 0x01
2 0010b 0x02
3 0011b 0x03
4 0100b 0x04
5 0101b 0x05
6 0110b 0x06
7 0111b 0x07
8 1000b 0x08
9 1001b 0x09
10 1010b 0x0A
11 1011b 0x0B
12 1100b 0x0C
13 1101b 0x0D
14 1110b 0x0E
15 1111b 0x0F

Using this table, you can convert any hex number to binary and vice versa. For example, 0x1234 is 0001 0010 0011 0100 in binary(the spaces are to show how each digit is exactly 4 bits).

Hexadecimal is really just a shorter way of representing numbers, and for the most part modding utilities will go even further and change it to decimal for you.

Flags

Okay, one last bit, combining both hex and binary information together. Sometimes you want to store a lot of boolean values. Well, binary makes it really easy to do that. Each bit is a boolean. However, since computers show numbers in hex, it's important to know what hex values get which bits. Normally, you want to get a single bit, no matter if it's on or off. Far be it to delve into boolean algebra, but there's a nice little operation known as a bitwise-AND that can do it for you granted you tell it which bit you want. All you have to do is make a number with only the bit you want set to 1. So, if I want the second bit in a number, I need to give it 0010b or 0x02. The third bit would be 0100b or 0x04. Go up high enough, and you'll see it's always a power of 2 that is a single bit on. This is by far something that is much more useful to know when dealing with the flags themselves, but it is important enough in SC to mention here. Here's a table for reference. It's only for 16 bit numbers, but the pattern is easy enough to see:

Bit Hex MASK
1 0x0001
2 0x0002
3 0x0004
4 0x0008
5 0x0010
6 0x0020
7 0x0040
8 0x0080
9 0x0100
10 0x0200
11 0x0400
12 0x0800
13 0x1000
14 0x2000
15 0x4000
16 0x8000

Designed by poiuy_qwert