Big or Little

Every number has an ending. Sometimes it’s big, sometimes it’s little. When you are dealing with a number that is greater than what fits into 1 byte, you’ll need more space to store it. Then the order of the bytes matters. This post is about endianness: whether a number is stored big-endian or little-endian.

Luckily, the order of bits within a byte is not that relevant and can be ignored most of the time. So we usually don’t need to care about the order on the bit level. Although the number of bits that constitute a byte wasn’t always 8, today you can be sure that 1 byte has 8 bits. If you read protocol specifications like RFCs, you’ll encounter the word “octet”, which means a byte with exactly 8 bits and is used to avoid any ambiguity.

One byte can represent the decimal numbers 0 to 255. If we use a signed number, the range is -128 to 127. How many bytes do you need to store the decimal number 123456789? You can calculate the needed bits: n = log_2(123456789) = log_10(123456789)/log_10(2) = 26.879.... Rounding up, we need at least 27 bits to store this number, which means 4 bytes. The number written in hexadecimal form is 0x075BCD15. The data type we would need in a programming language would be a 32-bit integer.
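The same calculation can be done without logarithms by counting how often the number can be shifted right until nothing is left. Here is a minimal sketch in C; the function names bits_needed and bytes_needed are made up for this example.

```c
#include <stdint.h>

/* Count the bits needed to represent an unsigned value (at least 1 bit). */
unsigned bits_needed(uint32_t value) {
    unsigned bits = 1;
    while (value >>= 1) {
        bits++;
    }
    return bits;
}

/* Round the bit count up to whole bytes. */
unsigned bytes_needed(uint32_t value) {
    return (bits_needed(value) + 7) / 8;
}
```

For 123456789 this gives 27 bits and 4 bytes, matching the calculation above.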

Now, when we want to transmit the number over the wire, byte by byte, which byte comes first? Is it 0x07 or 0x15? This is exactly what big-endian and little-endian are about. The problem not only occurs when sending or receiving data through a byte stream like Java’s java.io.InputStream or java.io.OutputStream. It also occurs when receiving such a number via UART or from a sensor using the I2C protocol. In all these cases, single bytes are transmitted one after another, and the receiver needs to interpret the bytes and reconstruct the number. If it is not clear in which order the bytes are transmitted, the numbers will be wrong.

Big Endian

Let’s first look at the big endian case. This is also known as “most significant byte first”. With our sample number from above, this means that the byte 0x07 will be transmitted first. Interestingly, this is also the order we use when writing down a whole number: the most significant digit comes first, whether the number is written in decimal or in hexadecimal.

Here are all the bytes in the correct order for big endian:

07 5B CD 15
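To make this concrete, here is a minimal sketch in C that splits a 32-bit number into its bytes, most significant byte first, and puts them back together again. The function names put_u32_be and get_u32_be are made up for this example.

```c
#include <stdint.h>

/* Store a 32-bit value into out[0..3], most significant byte first. */
void put_u32_be(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Rebuild the value from four bytes received most significant byte first. */
uint32_t get_u32_be(const uint8_t in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the code only uses shifts on the value itself, it should behave the same on any host, regardless of how that host stores integers in memory.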

The big endian byte order is commonly used in the internet protocols and is therefore also referred to as network byte order. This means that an IPv4 address, which consists of 4 bytes, is transmitted as big endian. E.g. the address 192.168.0.1 becomes 0xC0A80001.
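The address example can be checked with a few shifts. This is only a sketch; the function name ipv4_to_u32 is an assumption made for this example and not part of any library.

```c
#include <stdint.h>

/* Pack the four parts of a dotted IPv4 address into one 32-bit value.
 * The first part ends up in the most significant byte. */
uint32_t ipv4_to_u32(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16)
         | ((uint32_t)c << 8)  |  (uint32_t)d;
}
```

Calling ipv4_to_u32(192, 168, 0, 1) yields 0xC0A80001, since 192 is 0xC0 and 168 is 0xA8.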

Big endian is also used for IBM z/Architecture, Atmel AVR32, Motorola 68k.

Little Endian

Little endian is also known as “least significant byte first”. The order of bytes with the example number from above becomes:

15 CD 5B 07

It is exactly the reverse of the big endian notation. Note that these 4 bytes still represent the number 0x075BCD15.
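The little endian counterpart might look like the following sketch in C. Again, the function names put_u32_le and get_u32_le are made up for this example.

```c
#include <stdint.h>

/* Store a 32-bit value into out[0..3], least significant byte first. */
void put_u32_le(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}

/* Rebuild the value from four bytes received least significant byte first. */
uint32_t get_u32_le(const uint8_t in[4]) {
    return  (uint32_t)in[0]        | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

Reading the bytes 15 CD 5B 07 with get_u32_le gives back 0x075BCD15, which shows that only the order of the bytes changed, not the number.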

Little endian is used by Intel’s x86 32-bit and 64-bit architectures and by the STM32 Cortex M4 microcontrollers. Note that some ARM chips also support a kind of dynamic endianness and can switch between little and big endian. See Bi-endianness.

Determining the endianness

If you don’t know whether your current architecture uses big or little endian, you can figure it out in C by using a union:

#include <stdint.h>
#include <stdio.h>

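int main(int argc, const char **argv) {
    // The union lets us look at the same 4 bytes of memory
    // either as one number or as individual bytes.
    union {
        uint32_t number;
        uint8_t bytes[4];
    } data;
    data.number = 123456789; // 0x075BCD15
    // On big endian, the most significant byte 0x07 is stored first.
    if (data.bytes[0] == 0x07) {
        printf("Big Endian\n");
    } else {
        printf("Little Endian\n");
    }
    return 0;
}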

This idea is actually from the German Wikipedia article.

Unicode uses a Byte Order Mark, or BOM, to indicate the byte order for UTF-16: it’s either UTF-16LE or UTF-16BE. The same applies to UTF-32LE/BE. UTF-8 doesn’t strictly require a BOM, since it is a sequence of single bytes and its byte order is therefore unambiguous. The value of the byte order mark is the Unicode code point U+FEFF. In UTF-16BE, this is encoded as FE FF, while in UTF-16LE it’s FF FE. For UTF-32BE it’s 00 00 FE FF and for UTF-32LE it’s FF FE 00 00. This means that, if a BOM is present, reading the first 4 bytes is usually enough to determine the Unicode encoding. For UTF-8, the byte order mark is always EF BB BF.
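The byte sequences listed above can be turned into a small detection routine. The following is a sketch in C, not a complete encoding detector; the function name detect_bom and the returned strings are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the encoding name suggested by a leading BOM, or NULL if none. */
const char *detect_bom(const uint8_t *data, size_t len) {
    static const uint8_t utf32be[] = {0x00, 0x00, 0xFE, 0xFF};
    static const uint8_t utf32le[] = {0xFF, 0xFE, 0x00, 0x00};
    static const uint8_t utf8[]    = {0xEF, 0xBB, 0xBF};
    static const uint8_t utf16be[] = {0xFE, 0xFF};
    static const uint8_t utf16le[] = {0xFF, 0xFE};

    /* Check the 4-byte marks first, because the UTF-32LE mark
     * starts with the same two bytes as the UTF-16LE mark. */
    if (len >= 4 && memcmp(data, utf32be, 4) == 0) return "UTF-32BE";
    if (len >= 4 && memcmp(data, utf32le, 4) == 0) return "UTF-32LE";
    if (len >= 3 && memcmp(data, utf8, 3) == 0)    return "UTF-8";
    if (len >= 2 && memcmp(data, utf16be, 2) == 0) return "UTF-16BE";
    if (len >= 2 && memcmp(data, utf16le, 2) == 0) return "UTF-16LE";
    return NULL;
}
```

One caveat: the bytes FF FE 00 00 could also be UTF-16LE text that starts with the character U+0000 right after the BOM. The sketch simply treats this case as UTF-32LE.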

Converting endianness

Sometimes, you might have a number of bytes in big endian format and want to convert it into little endian, or the other way round. There are some functions defined for this, such as htonl (host order to network order long) or ntohl (network order to host order long). For short (16-bit) numbers, similar functions exist: htons and ntohs. These functions always convert to or from the network order. If the program is compiled for an architecture that natively uses big endian, these functions do nothing.
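Here is a short sketch of how htonl and ntohl might be used to write a number into a buffer and read it back. It assumes a POSIX system, where these functions are declared in arpa/inet.h; the helper names write_network_u32 and read_network_u32 are made up for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Copy a host-order value into buf in network byte order (big endian). */
void write_network_u32(uint32_t host_value, uint8_t buf[4]) {
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Read four bytes in network byte order and return the host-order value. */
uint32_t read_network_u32(const uint8_t buf[4]) {
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

On both little and big endian hosts, the buffer should contain 07 5B CD 15 after writing 123456789.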

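Here’s an example for converting a 16-bit/2-byte number. Swapping the two bytes works the same in both directions:

uint16_t number = 0x1234;
// Swapping the two bytes converts between the byte orders: 0x1234 -> 0x3412.
uint16_t littleEndian = (uint16_t) (((number & 0x00ff) << 8) | ((number & 0xff00) >> 8));
// The same swap converts back again: 0x3412 -> 0x1234.
uint16_t bigEndian = (uint16_t) (((littleEndian & 0x00ff) << 8) | ((littleEndian & 0xff00) >> 8));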

See bits/byteswap.h for more examples.
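For 32-bit numbers the same idea applies with four bytes instead of two. This is a plain sketch with masks and shifts, not the implementation from that header; the function name swap_u32 is made up for this example.

```c
#include <stdint.h>

/* Reverse the order of the four bytes of a 32-bit value. */
uint32_t swap_u32(uint32_t v) {
    return ((v & 0x000000FFu) << 24)
         | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8)
         | ((v & 0xFF000000u) >> 24);
}
```

Applied to 0x075BCD15 this gives 0x15CD5B07, and applying it a second time returns the original number.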

Java has the java.nio.ByteBuffer class to help in interpreting bytes as little or big endian:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Endianness {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(4);
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.putInt(123456789);
        for (int i = 0; i < 4; i++) {
            System.out.printf("%02X ", buffer.get(i));
        }
        System.out.println("(little endian)");

        buffer.clear();
        buffer.order(ByteOrder.BIG_ENDIAN);
        buffer.putInt(123456789);
        for (int i = 0; i < 4; i++) {
            System.out.printf("%02X ", buffer.get(i));
        }
        System.out.println("(big endian)");
    }
}

The output is:

15 CD 5B 07 (little endian)
07 5B CD 15 (big endian)
Andreas Dangel | © Copyright 2017. adangel.org (27 October 2017)