Part 1: Reading A WAVE File

Before I start building the engine, I need to be able to read a .wav file into memory so we have some data to pass around. There are plenty of libraries that could handle this for me, but I'd prefer to do everything by hand here. Let's start off by looking at the format of a .wav file and how it's laid out.

The WAVE Format

The WAVE file format is built on Microsoft's RIFF container format. A file starts with a RIFF header, followed by a sequence of chunks: typically a fmt chunk that describes the format of the data, and a data chunk containing the actual audio data. Additional chunks may also be present to store things like album artwork, artist/album/track names, or tempo/MIDI data.

The RIFF header is 12 bytes, and each chunk after has a 4-byte chunkName, a 4-byte chunkSize, then the actual chunk's data.

       __
      | "RIFF"      (4 bytes)       // RIFF title
RIFF  |  File Size  (4 bytes)       // File size minus 8 bytes
      | "WAVE"      (4 bytes)       // File format
       ‾‾
       __
      | "fmt "      (4 bytes)       // Chunk name
      | Chunk Size  (4 bytes)       // Chunk size
      | PCM Flags   (2 bytes)       // 1, 2, 3 or 65534
      | Channels    (2 bytes)       // Channel count
      | Sample Rate (4 bytes)       // Samples/second
      | Byte Rate   (4 bytes)       // Bytes per second
FMT   | Block Align (2 bytes)       // One audio frame length
      | Bits Per Sample (2 bytes)   // Bit Depth
      |  __
      | | Extension Size     (2 bytes)  // Optional extension
      | | Valid Bits/Sample  (2 bytes)
      | | Channel Mask       (4 bytes)  // Surround config
      | | Sub Format GUID    (16 bytes) // If PCM Flags = 65534
      |  ‾‾
       ‾‾
       __
      | "data"      (4 bytes)      // Chunk name
DATA  | Chunk Size  (4 bytes)      // Chunk size
      | Audio Data  (Chunk Size bytes)
       ‾‾

Opening and Validating the File

First, we have to open the file in raw binary mode "rb", and validate the "RIFF" and "WAVE" headers.

FILE *file = fopen("./path_to_file.wav", "rb");
if (!file) {
  throw std::runtime_error("Error opening file");
}

std::string riff = readString(file, 4);    // (4 bytes) Riff Title
uint32_t fileSize = readU32(file);         // (4 bytes) File Size
std::string format = readString(file, 4);  // (4 bytes) File Format

if (riff != "RIFF" || format != "WAVE") {
  fclose(file);
  throw std::runtime_error("Only .wav files are supported");
}
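The snippet above leans on the readString and readU32 helpers, which are defined at the end of this post. One thing worth knowing is that WAVE files store their integers in little-endian byte order, so reading the raw bytes straight into a uint32_t only gives the right number on a little-endian machine. Here is a rough sketch of a byte-order-independent alternative. The names decodeU32LE, decodeU16LE, and readU32LE are made up for this example and are not part of the code in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: build a 32-bit value from 4 little-endian bytes.
uint32_t decodeU32LE(const unsigned char *b) {
  assert(b != nullptr);
  return (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16) |
         ((uint32_t)b[3] << 24);
}

// Hypothetical helper: build a 16-bit value from 2 little-endian bytes.
uint16_t decodeU16LE(const unsigned char *b) {
  assert(b != nullptr);
  return (uint16_t)(b[0] | (b[1] << 8));
}

// Hypothetical helper: read 4 bytes from the file and decode them.
// Returns false if the file ends before 4 bytes could be read.
bool readU32LE(FILE *file, uint32_t &out) {
  assert(file != nullptr);
  unsigned char bytes[4];
  if (fread(bytes, 1, 4, file) != 4) {
    return false;
  }
  out = decodeU32LE(bytes);
  return true;
}
```

Returning a bool also gives the caller a way to notice a truncated file, which the simpler helpers in this post don't report.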

Looping the Chunks

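We can loop over the chunks by first reading the chunkName and chunkSize, then seeking to the end of the chunk when we are done. Chunks are padded to an even number of bytes, so a chunk with an odd chunkSize is followed by one pad byte that is not counted in chunkSize. That is what the (chunkSize & 1) below accounts for.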

char chunkName[5] = {}; // 5 to include the string terminator \0
uint32_t chunkSize;
while (fread(&chunkName, 4, 1, file) && fread(&chunkSize, 4, 1, file)) {
  long chunkEnd = ftell(file) + chunkSize + (chunkSize & 1);
  // ...
  fseek(file, chunkEnd, SEEK_SET);
}
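To see the chunk walk and the pad byte in isolation, here is a small sketch that does the same thing over a buffer that is already in memory instead of a FILE. The ChunkInfo struct and the listChunks function are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record describing one chunk found in the buffer.
struct ChunkInfo {
  std::string name;  // 4-character chunk name, e.g. "fmt " or "data"
  uint32_t size;     // chunkSize as stored in the file (excludes padding)
  size_t dataOffset; // where the chunk's data starts in the buffer
};

// Walk the chunks starting at `start` (12 for a full file, just past
// the RIFF header) and collect their names, sizes, and data offsets.
std::vector<ChunkInfo> listChunks(const std::vector<uint8_t> &bytes,
                                  size_t start) {
  assert(start <= bytes.size());
  std::vector<ChunkInfo> chunks;
  size_t pos = start;
  while (pos + 8 <= bytes.size()) {
    std::string name(reinterpret_cast<const char *>(&bytes[pos]), 4);
    uint32_t size = (uint32_t)bytes[pos + 4] | ((uint32_t)bytes[pos + 5] << 8) |
                    ((uint32_t)bytes[pos + 6] << 16) |
                    ((uint32_t)bytes[pos + 7] << 24);
    chunks.push_back({name, size, pos + 8});
    // Skip the 8-byte chunk header, the data, and the pad byte if size is odd
    pos += 8 + (size_t)size + (size & 1);
  }
  return chunks;
}
```

With a 3-byte chunk followed by a data chunk, the second chunk header starts at byte 12 rather than byte 11, because of the pad byte.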

The Format Chunk

The fmt chunk holds the main metadata for the file.

if (std::strcmp(chunkName, "fmt ") == 0) {
  uint16_t pcmFlags = readU16(file);        // (2 bytes) PcmFlags
  uint16_t channels = readU16(file);        // (2 bytes) Channel Count
  uint32_t sampleRate = readU32(file);      // (4 bytes) Sample Rate
  uint32_t byteRate = readU32(file);        // (4 bytes) Byte Rate
  uint16_t blockAlign = readU16(file);      // (2 bytes) Block Align
  uint16_t bitsPerSample = readU16(file);   // (2 bytes) Bits Per Sample

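  // Extension if it exists (a plain PCM fmt chunk is exactly 16 bytes)
  if (chunkSize >= 18) {
    uint16_t extensionSize = readU16(file); // (2 bytes) Extension Size

    // If wave format extensible
    if (pcmFlags == 65534 && extensionSize >= 22) {
      uint16_t validBits = readU16(file);   // (2 bytes) Valid Bits/Sample
      uint32_t channelMask = readU32(file); // (4 bytes) Channel Mask
      // The sub format is a 16-byte GUID. Its first 4 bytes hold the
      // format code (1 = PCM, 3 = IEEE float), which is all we need.
      uint32_t subFormat = readU32(file);   // (first 4 of 16 bytes) Sub Format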
    }
  }
}
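Two of these fields are redundant for uncompressed PCM: blockAlign is the size of one frame (one sample for every channel), and byteRate is that frame size multiplied by the sample rate. That makes them handy for a sanity check on a file. A minimal sketch, assuming uncompressed PCM data; expectedBlockAlign and expectedByteRate are hypothetical helpers, not part of the reader in this post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes in one frame of uncompressed PCM audio,
// i.e. one sample for each channel.
uint32_t expectedBlockAlign(uint16_t channels, uint16_t bitsPerSample) {
  assert(channels > 0 && bitsPerSample > 0);
  uint32_t bytesPerSample = (bitsPerSample + 7u) / 8u; // round up to whole bytes
  return channels * bytesPerSample;
}

// Hypothetical helper: bytes of audio data per second of playback.
uint32_t expectedByteRate(uint32_t sampleRate, uint32_t blockAlign) {
  assert(blockAlign > 0);
  return sampleRate * blockAlign;
}
```

For 16-bit stereo at 44100 Hz this gives a blockAlign of 4 and a byteRate of 176400. If the values read from the fmt chunk disagree with these, the file is either compressed or malformed.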

The Data Chunk

The audio data is stored as a run of samples, one after another on disk. They are raw signed integers, 16, 24, or 32 bits wide. A 24-bit sample, for example, spans the range -8388608 to 8388607. The engine will be expecting 32-bit floats in the -1 to 1 range, so for each sample we need to copy it into a 32-bit container, fix the sign, and normalize it to a float. Now that we know bitsPerSample from the fmt chunk, we know where one sample ends and the next starts.

     16-bit sample       16-bit sample      16-bit sample
   ________________    ________________    ________________
  |                |  |                |  |                |
  00000000, 00000000, 00000000, 00000000, 00000000, 00000000


          24-bit sample                24-bit sample
   __________________________    __________________________
  |                          |  |                          |
  00000000, 00000000, 00000000, 00000000, 00000000, 00000000


                         32-bit sample
             ____________________________________
            |                                    |
            00000000, 00000000, 00000000, 00000000

Initializing Vectors and Looping

First, we need to create a vector of floats to store the final samples. We can calculate the total number of samples from bitsPerSample and chunkSize.

if (std::strcmp(chunkName, "data") == 0) {
  int bytesPerSample = bitsPerSample / 8;
  int numberOfSamples = chunkSize / bytesPerSample;
  auto samples = std::vector<float>(numberOfSamples);

  // ...
}
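Note that numberOfSamples counts every channel's samples together. In a WAVE file the channels are interleaved, so a stereo file alternates left and right samples, and one sample from each channel makes up a frame. Here is a small sketch of how the sample count relates to frames and playback time. DataLayout and describeData are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what a data chunk contains.
struct DataLayout {
  uint32_t samples; // total samples across all channels
  uint32_t frames;  // one frame = one sample per channel
  double seconds;   // playback duration
};

DataLayout describeData(uint32_t chunkSize, uint16_t channels,
                        uint16_t bitsPerSample, uint32_t sampleRate) {
  assert(channels > 0 && bitsPerSample >= 8 && sampleRate > 0);
  uint32_t bytesPerSample = bitsPerSample / 8;
  uint32_t samples = chunkSize / bytesPerSample;
  uint32_t frames = samples / channels;
  double seconds = (double)frames / sampleRate;
  return {samples, frames, seconds};
}
```

A 176400-byte data chunk of 16-bit stereo audio at 44100 Hz works out to 88200 samples, 44100 frames, and exactly one second of audio.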

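Now we can loop over each sample and process it. We read all the bytes from the file into memory first, so that we are not doing a file read for each individual sample. Then we copy each sample into a 4-byte integer. The memcpy puts the sample's bytes at the start of the integer, which is its low-order end on a little-endian machine, matching the byte order of the file.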

std::vector<uint8_t> rawBytes(chunkSize);
fread(rawBytes.data(), 1, chunkSize, file);

for (int i = 0; i < numberOfSamples; i++) {
  uint32_t rawValue = 0; // 4 bytes
  memcpy(&rawValue, rawBytes.data() + (i * bytesPerSample), bytesPerSample);
  // ...
}
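The memcpy approach depends on the machine being little-endian, which is true of most desktop and mobile CPUs today. If that assumption ever matters, the same raw value can be assembled one byte at a time with shifts. This is only a sketch of that alternative, and loadRawSample is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: assemble a little-endian sample of 1 to 4 bytes
// into the low bits of a 32-bit integer, regardless of host byte order.
uint32_t loadRawSample(const uint8_t *bytes, int bytesPerSample) {
  assert(bytes != nullptr);
  assert(bytesPerSample >= 1 && bytesPerSample <= 4);
  uint32_t rawValue = 0;
  for (int b = 0; b < bytesPerSample; b++) {
    // Byte 0 is the least significant, so each later byte moves up 8 bits
    rawValue |= (uint32_t)bytes[b] << (8 * b);
  }
  return rawValue;
}
```

Inside the loop above it would take the place of the memcpy, called with rawBytes.data() + (i * bytesPerSample).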

Copying the Sign

When we copy a 16-bit or 24-bit signed integer into a 32-bit container, the sign bit will end up in the wrong place. Computers represent negative numbers with two's complement, so when we copy 2 or 3 bytes into a 4-byte container, we need to fill the new empty leading bytes with copies of the sign bit.

             sign bit
                  ↓
                  00000000 00000000 10000000
                  |________________________|
                          3 bytes

          gets copied here
                 ↓
               ______
              |      |
              00000000 00000000 00000000 10000000
              |_________________________________|
                              4 bytes

We can do this with a pair of bit shifts. The << operator shifts every bit to the left by the given number of places. So we can shift everything left by the number of extra bits until the sign bit is all the way to the left, then shift it back right. When shifting right on a signed int, the new leftmost bits are filled with copies of the sign bit (an arithmetic shift). C++20 guarantees this behavior, and mainstream compilers did the same before it was standardized.

int numExtraBits = 32 - bitsPerSample;
int32_t sample = (int32_t)(rawValue << numExtraBits) >> numExtraBits;
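To convince ourselves the shift trick works, here is the same expression wrapped in a small function with a few known values. The signExtend name is just for this sketch. It relies on the arithmetic right shift described above.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a value that occupies the low `bitsPerSample` bits
// of a 32-bit container.
int32_t signExtend(uint32_t rawValue, int bitsPerSample) {
  assert(bitsPerSample >= 1 && bitsPerSample <= 32);
  int numExtraBits = 32 - bitsPerSample;
  // Shift left until the sample's sign bit is bit 31, then shift back
  // as a signed value so the vacated bits are filled with the sign.
  return (int32_t)(rawValue << numExtraBits) >> numExtraBits;
}
```

For 24-bit input, 0x7FFFFF stays 8388607, 0x800000 becomes -8388608, and 0xFFFFFF becomes -1, which matches the 24-bit range mentioned earlier.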

Normalization

Now we have to normalize the values to be within the -1 to 1 range instead of the raw integer representation. To make sure we end up with the same ratio between the values after conversion, we need to multiply each value by a normalizationScale that captures the ratio of newMax / oldMax.

The newMax will be 1. For the oldMax we use 2^(bitsPerSample - 1). We subtract one bit to account for the sign bit, which is only used to determine + or -. That leaves a largest positive value of 2^(bitsPerSample - 1) - 1, but the integer range has one more negative value than positive ones, so the largest magnitude is 2^(bitsPerSample - 1) itself. Dividing by that maps the most negative sample to exactly -1 and the largest positive sample to just under 1.

A convenient way to compute a power of 2 is with a bit shift as well. Since each << step shifts everything left by one binary digit, it's the same as multiplying by 2 each time. So 1 << 3 goes from 0001 to 1000, or 1 × 2 × 2 × 2 = 8.

uint32_t maxValue = 1u << (bitsPerSample - 1);
float normalizationScale = 1.0f / maxValue;

Finally we can apply the normalization, add it to the final vector, and return the samples.

float normalizedValue = sample * normalizationScale;
samples[i] = normalizedValue;
return samples;
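As a quick check on the output range, one common convention is to divide by 2^(bitsPerSample - 1), the magnitude of the most negative integer. The sketch below does that for a single sample. The normalizeSample name is hypothetical and only used for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Scale a signed integer sample into the -1 to 1 float range by
// dividing by 2^(bitsPerSample - 1).
float normalizeSample(int32_t sample, int bitsPerSample) {
  assert(bitsPerSample >= 2 && bitsPerSample <= 32);
  float scale = 1.0f / (float)(1u << (bitsPerSample - 1));
  return (float)sample * scale;
}
```

With this scale, the most negative 16-bit sample (-32768) lands on exactly -1, silence (0) stays at 0, and the largest positive sample (32767) comes out just below 1.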

Conclusion

That's the gist of it. There is also the possibility of audio already being stored as 32-bit floats, but it's not as common. It's easy enough to handle though: if the fmt chunk indicates pcmFlags == 3 (IEEE Float), we can just copy the samples straight across and return early. We can also return an AudioFile struct instead of just the samples so the engine has access to metadata like channels and sampleRate.
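For the float case, copying straight across really is just a byte copy, since the file already holds the bit patterns of the floats. Here is a rough sketch of that idea over an in-memory buffer, assuming 32-bit IEEE floats and a little-endian machine. The bytesToFloats name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reinterpret raw little-endian bytes as 32-bit floats.
// Assumes the host is little-endian and uses IEEE 754 floats.
std::vector<float> bytesToFloats(const std::vector<uint8_t> &rawBytes) {
  static_assert(sizeof(float) == 4, "this sketch expects 32-bit floats");
  // Sketch-level precondition: the data is a whole number of floats
  assert(rawBytes.size() % sizeof(float) == 0);
  std::vector<float> samples(rawBytes.size() / sizeof(float));
  if (!samples.empty()) {
    std::memcpy(samples.data(), rawBytes.data(),
                samples.size() * sizeof(float));
  }
  return samples;
}
```

No sign fix or normalization is needed here, because float WAVE data is normally already in the -1 to 1 range.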

Here is what the code ends up looking like at the end.

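#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Helpers (defined at the bottom)
std::string readString(FILE *file, uint32_t length);
uint32_t readU32(FILE *file);
uint16_t readU16(FILE *file);

// Final struct to return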
struct AudioFile {
  std::vector<float> samples;
  uint32_t sampleRate;
  uint16_t channels;
  uint16_t bitsPerSample;
};

// Main function
AudioFile readAudioFile(std::string path) {
  FILE *file = fopen(path.c_str(), "rb");
  if (!file) {
    throw std::runtime_error("Error opening file: " + path);
  }

  // Riff validation
  std::string riff = readString(file, 4);   // 4 bytes - Riff Title
  uint32_t fileSize = readU32(file) + 8;    // 4 bytes - File Size
  std::string format = readString(file, 4); // 4 bytes - File Format

  if (riff != "RIFF" || format != "WAVE") {
    fclose(file);
    throw std::runtime_error("Only .wav files are supported");
  }

  AudioFile audioFile{};
  bool isFloatData = false;

  // Chunks
  char chunkName[5] = {};
  uint32_t chunkSize;
  while (fread(&chunkName, 4, 1, file) && fread(&chunkSize, 4, 1, file)) {
    long chunkEnd = ftell(file) + chunkSize + (chunkSize & 1);

    // Format Chunk
    if (std::strcmp(chunkName, "fmt ") == 0) {
      uint16_t pcmFlags = readU16(file);   // 2 bytes - PcmFlags
      uint16_t channels = readU16(file);   // 2 bytes - Channel Count
      uint32_t sampleRate = readU32(file); // 4 bytes - SampleRate
      uint32_t byteRate = readU32(file);   // 4 bytes - ByteRate
      uint16_t blockAlign = readU16(file); // 2 bytes - BlockAlign
      uint16_t bitDepth = readU16(file);   // 2 bytes - BitDepth

      audioFile.bitsPerSample = bitDepth;
      audioFile.sampleRate = sampleRate;
      audioFile.channels = channels;

      if (pcmFlags == 3) {
        isFloatData = true;
      }

      if (chunkSize >= 18) {
        uint16_t extensionSize = readU16(file); // 2 bytes - Extension Size

        // Wave Format Extensible
        if (pcmFlags == 65534 && extensionSize >= 22) {
          uint16_t validBits = readU16(file);   // 2 bytes - Valid Bits
          uint32_t channelMask = readU32(file); // 4 bytes - Channel Mask
          // First 4 bytes of the 16-byte Sub Format GUID hold the format code
          uint32_t subFormat = readU32(file);

          // Keep the container bit depth from above: each sample is stored
          // bitDepth bits wide even when only validBits of them carry data
          if (subFormat == 3) {
            isFloatData = true;
          }
        }
      }
    }

    // Data Chunk
    if (std::strcmp(chunkName, "data") == 0) {
      uint16_t bitsPerSample = audioFile.bitsPerSample;

      // Final Vector
      int bytesPerSample = bitsPerSample / 8;
      int numberOfSamples = chunkSize / bytesPerSample;
      audioFile.samples = std::vector<float>(numberOfSamples);

      if (isFloatData) {
        fread(audioFile.samples.data(), sizeof(float), numberOfSamples, file);
      } else {

        // Normalization scale: divide by 2^(bitsPerSample - 1)
        float maxValue = 1u << (bitsPerSample - 1);
        float normalizationScale = 1.0f / maxValue;

        // All raw bytes
        std::vector<uint8_t> rawBytes(chunkSize);
        fread(rawBytes.data(), 1, chunkSize, file);

        // Construct each sample
        for (int i = 0; i < numberOfSamples; i++) {
          uint32_t rawValue = 0;
          memcpy(&rawValue, rawBytes.data() + (i * bytesPerSample),
                 bytesPerSample);

          // Shift signed bit
          int unusedBits = 32 - bitsPerSample;
          int32_t sample = (int32_t)(rawValue << unusedBits) >> unusedBits;

          // Normalize
          float normalizedValue = sample * normalizationScale;
          audioFile.samples[i] = normalizedValue;
        }
      }

      fclose(file);
      return audioFile;
    }
    fseek(file, chunkEnd, SEEK_SET);
  }

  fclose(file);
  throw std::runtime_error("Could not locate audio data from file: " + path);
}

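// Helpers
// These copy the file's little-endian bytes directly into the value,
// so they assume a little-endian host.
std::string readString(FILE *file, uint32_t length) {
  std::string name(length, '\0');
  fread(&name[0], length, 1, file);
  return name;
}

uint32_t readU32(FILE *file) {
  uint32_t value = 0; // stays 0 if the read fails
  fread(&value, 4, 1, file);
  return value;
}

uint16_t readU16(FILE *file) {
  uint16_t value = 0; // stays 0 if the read fails
  fread(&value, 2, 1, file);
  return value;
}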

Here is a link to the code in its current state.