So for those who don't know, most of the actual data is in cd.wad and the metadata that actually tells you where everything is is cd.hed, which consists of a bunch of 3-uint32_t tuples with the filename_hash, the offset_into_the_wad, and the length. The upside is it's nice and fast to look things up. The downside is it's a nuisance to rip as you lack the filenames.
Here's the hash algorithm. It's a broken version of CRC-32 which may have been "obfuscated"... but for some reason they still kept the usual 0xEDB88320 polynomial so it sticks out like a sore thumb anyway.
Code: Select all
uint32_t namehash(const char *s)
{
uint32_t csum = (uint32_t)(int32_t)-1;
for(int i = 0; s[i] != '\x00'; i++) {
uint32_t a1 = (uint32_t)(uint8_t)s[i];
// Convert to lower case
if(a1 >= 'A' && a1 <= 'Z') {
a1 += 32;
}
// Broken CRC-32 implementation
//
// Normal CRC-32:
// * Shift right, if carry then XOR by 0xEDB88320
//
// THPS2 Special! Wow! edition:
// * Rotate left, shift original XOR'd val right, if carry then XOR by 0xEDB88320
uint32_t v0 = (a1 ^ csum);
uint32_t bits = (v0 & 0xFF);
// Reference implementation
for(int j = 0; j < 8; j++)
{
// Rotate left
csum = ((csum>>31)&1)|(csum<<1);
// Then apply the polynomial
if((bits & 1) != 0) {
csum ^= 0xEDB88320;
}
// Shift along
bits >>= 1;
}
}
return csum;
}
EDIT: Oops, the comment said "rotate right" - it's a left rotate.