So How DO Companies Lose Game Source Code?

Jakartalado · Jun 21, 2019

I work on a very prestigious bank in Brazil. We lost 14 days of transactions that won't be ever recovered.

Shit happens.

Hampig · Jun 21, 2019

Jakartalado said:
I work on a very prestigious bank in Brazil. We lost 14 days of transactions that won't be ever recovered.

Shit happens.

Oof. Stuff like that should not happen though. You build in redundancy and backups. Especially if you're a bank.

It's possible there's a legitimate explanation and it was a freak accident that just couldn't be helped, but 99% of the time an issue like that or losing source code points to bad management.

Deleted member 12790 · Jun 21, 2019

MylesJackWasntDown said:
Dumb question, I know little about the technical side, but no shame:

What's the difference between source code and what actually ships on the disc and was read by the Playstation?

If source code is compiled into a code/language readable by a console, wouldn't said transformation follow a strict enough set of rules that it could be reversed back into the original source code.

The answer is obviously no, or else source code couldn't technically ever be lost, but just wondering about the differences between source and what ships on disc.

The way computer processors work is that, essentially, every type of processor has their own "language" that they speak. Most PCs run on x64 cpus this day, which share a "language" between intel and AMD processors, but there are also things like ARM processors that speak their own "language." These "languages" take the form of electrical currents that run to pins on the processors in specific orders. When a processor detects electrical currents running on certain pins, it interprets that order of electrical currents as a command, such as "add these two numbers together" or "divide this number by that number" or "move this piece of data from this part of RAM to that part of RAM." The commands a processor has is known as it's opcode set.

In the old days of computing, you would write programs to these processors directly by hand. The lexiconical representation of these electrical currents is known as binary, the "1's and 0's" people talk about CPUs reading. For example, the code to tell the Sega Genesis' 68000 CPU to add two numbers together is 1101-0010 1101-0000. This is interpreted as sending an electrical high signal to pins 1, 2, 4, 7, 9, 10, and 12. When the CPU gets this signal, it then adds two numbers together and dumps them into memory. This specific type of binary programming is called bytecode, it's the pure instruction set for a program.

Writing in bytecode is monstrously difficult, so most CPUs offer a higher level mnemonic that is more human readable. The lowest type of this mnemonic is called Assembler, every processor has it's own assembler language. Assembler is written in text files that can be read by humans. For example, instead of remembering that the add command for the 68000 CPU is 1101-0010 1101-0000, it is agreed upon that the mnemonic "add.w" will accomplish the same task. These human readable text files full of mnemonics are known as source codes. Now, text characters are, in actually, also strings of binary, which adhere to a "rule" or "standard" that everyone agrees upon. The most widely used standard is ascii. This means that, for example, the symbol for the character 'A' is actually agreed upon to be the number 64, which can be represented by the binary string 0100-0000. You might realize then that the mnemonics are not 1:1 representations of the CPU command they are intended to symbolize. In order to turn source code into the correct bytecode, you must compile the code using an external program called a compiler. A compiler is kinda like a translator, it scans the source code, reading the mnemonic binary strings, and interprets them. When it sees the specific binary strings that equals to the word "add.w", it replaces it with the appropriate bytecode string. This simplifies the process of writing code for a CPU greatly.

This same loop - interpreting a string of text as a mnemonic device, can be repeated. Since different processors have different commands, Assembler source codes don't tend to match up between processors. But you can move to a higher level language, like C, which is intended to be cross platform. Languages like C work in that they have an interface layer between the source code and the eventual bytecode, kind of like a "virtual" bytecode language. C syntax to add two numbers together might be something like so:

Code:

int number1 = 1, number2 = 1;
int result = number1 + number2;

C has a compiler as well, which will interpret the above code when compiled into assembler code, which then gets compiled into bytecode, which is the binary string a program runs. By writing compilers for all sorts of processors, you can dictate the conversion process from C to bytecode so that, for different CPUs, the same C code will turn into different end binaries to run for each appropriate processor.

Now going back to my code example above. You'll notice I assign some arbitrary variable identifiers to help me keep track of what I'm doing. I named one variable Number1, and one Number2, and stored them in a variable called result. This is done entirely for the benefit of people reading the source code, so they can quickly interpret what the code is supposed to do. The names people use to identify variables and functions and other stuff in their source codes are known as symbols. Symbols are largely meaningless to the actual CPU running the program, so when a source code is compiled, symbols are stripped out (removed) from the resulting binary. Without the correct symbols, what the code intends to do is more difficult to read at a glance. Take for example the same code, without symbols:

Code:

int i = 1; ii=2;
int iii = i + ii;

Now, since this is a super simple example of just adding two numbers together, it's still somewhat readable, but the arbitrarily chosen symbols we used to represent our variables makes seeing the logic a lot more difficult. The more complex your program, the more valuable these symbols are. Now, you are correct that a compiled binary in the end is indeed a string of instructions that represents the end result of a program, and you can totally reinterpret those instructions back into assembler. Doing this is called a disassembly. It's generally not possible to go from assembly back to, say, C or any higher level language except by hand, interpreting the code through a human. But even going from bytecode to assembler presents problem, because assembly source codes also have symbols to make them more readable. Consider the following, this is part of the source code for Sonic the Hedgehog. This is from a disassembly that has been worked on by community members for years, and thus the symbols have been slowly added back in:

Code:

; ||||||||||||||| S U B R O U T I N E |||||||||||||||||||||||||||||||||||||||

; sub_272E: PalLoad2:
PalLoad_Now:
    lea    (PalPointers).l,a1
    lsl.w    #3,d0
    adda.w    d0,a1
    movea.l    (a1)+,a2
    movea.w    (a1)+,a3

    move.w    (a1)+,d7
    dbf    d7,-

    rts
; End of function PalLoad_Now

And here is the same code as the result of a mechanical disassembly:

Code:

; ÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛ S U B    R O U T    I N E ÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛ


sub_2764:                ; CODE XREF: ROM:000042E4p
        lea   (a4).l,a1
        lsl.w    #3,d0
        adda.w    d0,a1
        movea.l    (a1)+,a2
        movea.w    (a1)+,a3
        suba.l    #$B00,a3
        move.w    (a1)+,d7

loc_277A:                ; CODE XREF: sub_2764+18j
        move.l    (a2)+,(a3)+
        dbf    d7,loc_277A
        rts   
; End of function sub_2764

Even with no experience at all, you should be able to deduce this code does something regarding loading a palette, because of the symbols in the annotated source code. The mechanically disassembled code lacks all symbols, replaced by arbitrary identifiers the disassembler chose mechanically, and thus it becomes indecipherable.

Additionally, modern compilers don't just spit out 1:1 code from the mnemonics anymore, they do automatic optimizations. Some of these kinds of optimizations can be complex to understand for humans, but intuitive for machines. This means that the disassembled code you get out might not even match the original source code put in. Now, they'll function the same, but that means even the original author of the source code might have problems following their disassembled code as it doesn't necessarily match what they originally wrote. And, because I said all CPUs basically speak their own "language," many times the disassembled code is largely useless for porting a game to a new platform anyways, because they don't disassemble into a higher enough level language like C to be portable in the first place.

As for why source codes got lost, back in the old days of computing, development standards like SVN didn't exist. Old programs were simple enough to be coded by 1 or 2 guys by themselves, and thus they didn't need to share their code with lots of people. A single programmer could understand how their entire code worked, every bit of it. A lot of source code subversion practices these days were born out of necessity, the need for multiple people to be able to work on one source code at once. So things like redundant backups and forks and such have emerged as logistical solutions. These problems are compounded in japan. In the west, game development is closely linked to computer science, but in the 80's and 90's, japanese developers were cowboy coders, not necessarily formally trained computer programmers. Comp Sci in the west has made developers approach writing code in a more scientific manner, with standardized practices for source distribution. In Japan in the 80's and 90's, every coder might have their own esoteric way of building code or maintaining their source. Some old japanese coders would just write their entire game in a single text file that they'd pass around from person to person, which is insane to think about regarding modern programmers. Additionally, japan has a problem with space. Japan is a small country, that has a limited amount of land. Storing source codes takes servers, disks, etc. Those take space. When you're working out of a tiny office, you don't necessarily have the space for proper backup and maintenance of source codes.

Jakartalado · Jun 21, 2019

Hampig said:
Oof. Stuff like that should not happen though. You build in redundancy and backups. Especially if you're a bank.

It's possible there's a legitimate explanation and it was a freak accident that just couldn't be helped, but 99% of the time an issue like that or losing source code points to bad management.

We had like 7 or 8 systems failing at the same time, backups were between maintenance, felt like doomsday at work. We got people literally screaming and crying in the office.

Hampig · Jun 21, 2019

Jakartalado said:
We had like 7 or 8 systems failing at the same time, backups were between maintenance, felt like doomsday at work. We got people literally screaming and crying in the office.

Well, there's that 1% freak accident. It's almost funny how terrible that sounds.

Jonscrambler · Jun 21, 2019

backups can fail, backups of backups can fail too

Shinobido Heart · Jun 21, 2019

I hope they didn't lose the game code for Brave Fencer Musashi. :(

Deleted member 12790 · Jun 21, 2019

Mugy said:
What are some cases of games with lose source code?

Virtually every game ever prior to about 2000-ish.

If a company doesn't have the source code of a title, but want to make a port, what options do they have?

You can hire people who are explicitly good about reading disassembled sources and transfering the logic, by hand, to a higher level language. Christian "Taxman" Whitehead and Simon "Stealth" Thomley of Sonic Mania fame are infamous for being able to do this. M2 is another company that is famous for hiring such experts.

Alternatively, you can build an emulator. Emulators don't attempt to make sense of the original logic of the source code as a human readable source. Rather, they run the compiled bytecode binary as-is, and instead recreate the original hardware in software. Think of emulators like instant translators, like a virtual piece of hardware, that maps opcode commands from old hardware, to their modern equivalents on the hardware they run on. I.e. letting an x64 CPU, run 68000 CPU opcodes.

The last way you can do this is by eye. Like, have the programmers play the shit out of the game, and just try to recreate it as best they can. These types of ports are notoriously awful, and aren't really done anymore. Sonic Genesis on the GBA is an example of such a port. Compare this:

To Simon "Stealth" Thomley's port to the GBA, which was done by manually reinterpreting the disassembly into something that runs natively on the GBA:

hikarutilmitt · Jun 21, 2019

Roy said:
By engaging in foolhardy decision making.

Hindsight, folks. Doesn't get much better than this.

The reality of a lot of already mentioned things like storage, archival necessity, business decisions etc being what they are, a lot of things are flat out lost in fires, earthquakes and other natural phenomena. It's not just source code to things, fans have been working for years to try and get a complete set of Doctor Who put together, whole seasons are gone, several of them are in shambles to the tune of only an episode or two.

It's insane, now, to think that people wouldn't want to preserve these things, but you just didn't do this because it didn't really matter as the post-release wasn't even thought about.

le-seb · Jun 21, 2019

Vault said:
Storing gigs of data in the 90s was a lot harder

Tape drives existed, though.

ReyVGM · Jun 21, 2019

MylesJackWasntDown said:
What's the difference between source code and what actually ships on the disc and was read by the Playstation?

Layman example:
Let's say you are an architect and someone wants you to rebuild a house they had when they were little, but they don't have the plans anymore, they just have a bunch of pictures.

The plans is the source code. The pictures is the ROM someone dumped on the Internet.

Rytheran · Jun 21, 2019

Take this example of when Beamdog was getting together the original assets for their Baldur's Gate remaster.

We had drafted the original deal in the context of a Baldur's Gate: HD. Our plan was simple: grab the original artwork, clean it up, re-render it at higher resolutions and with better materials, thus creating stunning versions of the areas everyone remembers. We planned to take the character models and re-render them with many more frames of animation and add new orientations to the movement to make the game smoother. We nailed down the core terms, got everyone on the same page and we got our first drop of the assets from BioWare.

A few days later we noticed a large hole where the source art should be -- stuff like 3DS Max files and texture images. "No problem," I said; I contacted Derek French over at BioWare and he dug further and sent us more data. We again dug through and failed to find the source art. I made arrangements to visit BioWare with a removable drive and work with Derek and the IS department to find the assets.

After two days of searching we came to the horrible realization that the source artwork was stored on a departmental drive and not a project drive, and as such was not frequently backed up. We dug through tape backups to no avail. The source art was lost.

source: https://gamasutra.com/view/feature/190432/postmortem_overhaul_games_.php?page=4

In this case it was as simple as part of the project being stored on the wrong drive and now part of it is lost forever.

Professor Beef · Jun 21, 2019

It always bugs me when people try and wag their fingers at companies for losing source codes from the 90s. Like yeah, hindsight is 20/20, but nobody could've properly predicted how tech and consoles would have evolved since then. The cloud was basically science fiction.

Komo · Jun 21, 2019

Mugy said:
What are some cases of games with lose source code? If a company doesn't have the source code of a title, but want to make a port, what options do they have?

Phantom Dust wasn't made with it's source. Was all hacked to make it work with Xbox One.

LakeEarth · Jun 21, 2019

"Why would someone in 2019 want to play old games?"

There was a lot of thinking like this. Like when in the PS1/N64 days, 2D games were thought to be obsolete. Thus, their value was underappreciated and so their archival was not given priority.

Edward850 · Jun 21, 2019

Krejlooc said:
Even with no experience at all, you should be able to deduce this code does something regarding loading a palette, because of the symbols in the annotated source code. The mechanically disassembled code lacks all symbols, replaced by arbitrary identifiers the disassembler chose mechanically, and thus it becomes indecipherable.

To expand on this for MylesJackWasntDown and others who are interested:
Even without code optimisations, disassembled code can produce some truly incomprehensible situations. With the advent of languages like ANSI C (and especially C++), the compiled result has a lot of indirect assumptions about what the actual code should be doing as well as things that make sense when you run the code directly, but trying to read it static loses vital context. This problem even affects well known CPU instruction sets like x86. A CPU doesn't really care what is or isn't code and will happily execute stuff that wasn't originally/explicitly written as code, or even rewrite what is code. There's no real difference between RAM used to store ingame variables and where the code exists to run it, except for OS based protections of where the executing code is stored. This is where arbitrary execution in old NES/SNES games come from that allow rewriting a game into a Twitch client. For this same reason, none of the assembly code can necessarily taken at face value because the code we say may never be executed, or the code we are looking for doesn't even look like code and doesn't exist where it should be as the runtime execution changes how the data is supposed to be interpreted.

On one of the disassemblies we've done, we were constantly hit with instructions that mathematically were impossible (such as bitshifts that were either too large or would always produce zero anyway) because they were either weird assortments of C code that depended on exacting undefined x86 behaviour or were just a combination of bytecode instructions that looked like one thing but produced a very different result due to the way they were organised.
You can also get a lot of problems with self modifying code. DRM does this a lot (this video is a good example, but may be a tad technical), though we also strike this a lot in old DOS games. Doom notably did this to speed up its plane drawing code, as the source code is released nobody really notices it, but if you are writing an x86 emulator and lack support for self modifying code the plane drawing ends up producing nothing but noise. This can make taking a disassembly code at face value basically impossible.

We strike this often with games designed for entirely different CPUs as well. The PS1 and N64 can sometimes produce disassemblies that are outright impossible, simply due to the not-fully documented nature of their instruction sets (this is usually easy to spot and resolve though after enough time, but the symptoms have a habit of changing per game).

Mage_of_Cinder · Jun 21, 2019

30 years ago, data archiving and preservation was not a priority. Who knew that long ago there would be a demand for it?

Rickenslacker · Jun 21, 2019

udivision said:
Physical Past, ladies and gentlemen

Lol

:)

Liam Allen-Miller · Jun 21, 2019

Hampig said:
Oof. Stuff like that should not happen though. You build in redundancy and backups. Especially if you're a bank.

It's possible there's a legitimate explanation and it was a freak accident that just couldn't be helped, but 99% of the time an issue like that or losing source code points to bad management.

Seriously though! Shit just happens sometimes. Archival is tough on the best of days, archiving 100% off all records that people in the future may need is honestly borderline impossible!

Borman · Jun 21, 2019

Because preservation is hard. Preserving source code is only part of it. You need the assets to go with it, tools, you need the dependencies, sometimes you need specific hardware or dongles, and then you need the knowledge to make it make sense.

The fact is, few companies have archivists working at them. Some do, particularly nowadays, but few do. It needs to be someone's job to preserve, it's not just something you do in a day. Writing documentation is hard too. And to top it all off, it changes often, sometimes multiple times a day. No one is going to take their time and make every change to a document as they go along, especially when it isn't something that they are even sure will work.

Even if you have literally everything, from the source, the assets, the tools, the knowledge in the form of documents, etc, it still isn't easy. It can be a multi-year project to revive a single game by a talented team.

And then, lets say you did that all once. Preservation is a lifelong problem. It doesn't stop, which is something the movie industry has found out real quickly after they did so much digitization. Who is going to keep migrating formats? Who is going to ensure that a bit hasn't flipped?

And what happens when the person doing all that leaves? Who is going to follow up next? And if the studio moves, who is staying on top of making sure that data makes it out properly? I can tell you from experience, the answer usually is no one, and I've occasionally gotten lucky to save things from the literal trash.

Finally, what happens when the studio shuts down? Many companies that seemed as if it would be impossible to lose have gone away. Are the assets being bought by another company? If so, look above for all the challenges that come even if they have data. If not, where does it go?

Then there are elements that are nearly impossible to preserve, APIs and libraries that only exist on servers that you only pay for access to. When they go, you are stuck having to rebuild something from scratch, and then the question is have you made something new, or is that preservation?

I'm lucky enough to be a curator at The Strong National Museum of Play, but I can tell you first hand that game preservation is difficult. It is our job to try to predict what someone today, and what someone possibly 100 years in the future wants to not only play but research and learn from. Even when a company as an archivist or even a team devoted to preservation, data is lost. Historians do their best to fill in the blanks of lost information regardless of the trade or industry, gaming is no different.

Dernhelm · Jun 21, 2019

Liam Allen-Miller said:
Seriously though! Shit just happens sometimes. Archival is tough on the best of days, archiving 100% off all records that people in the future may need is honestly borderline impossible!

Basically this. The question is less 'how do companies lose game source codes' and more 'how do companies manage to maintain virtually any and all data relevant to their clients in the long term'.

jb1234 · Jun 21, 2019

Krejlooc said:
The way computer processors work is that, essentially, every type of processor has their own "language" that they speak. Most PCs run on x64 cpus this day, which share a "language" between intel and AMD processors, but there are also things like ARM processors that speak their own "language." These "languages" take the form of electrical currents that run to pins on the processors in specific orders. When a processor detects electrical currents running on certain pins, it interprets that order of electrical currents as a command, such as "add these two numbers together" or "divide this number by that number" or "move this piece of data from this part of RAM to that part of RAM." The commands a processor has is known as it's opcode set.

In the old days of computing, you would write programs to these processors directly by hand. The lexiconical representation of these electrical currents is known as binary, the "1's and 0's" people talk about CPUs reading. For example, the code to tell the Sega Genesis' 68000 CPU to add two numbers together is 1101-0010 1101-0000. This is interpreted as sending an electrical high signal to pins 1, 2, 4, 7, 9, 10, and 12. When the CPU gets this signal, it then adds two numbers together and dumps them into memory. This specific type of binary programming is called bytecode, it's the pure instruction set for a program.

Writing in bytecode is monstrously difficult, so most CPUs offer a higher level mnemonic that is more human readable. The lowest type of this mnemonic is called Assembler, every processor has it's own assembler language. Assembler is written in text files that can be read by humans. For example, instead of remembering that the add command for the 68000 CPU is 1101-0010 1101-0000, it is agreed upon that the mnemonic "add.w" will accomplish the same task. These human readable text files full of mnemonics are known as source codes. Now, text characters are, in actually, also strings of binary, which adhere to a "rule" or "standard" that everyone agrees upon. The most widely used standard is ascii. This means that, for example, the symbol for the character 'A' is actually agreed upon to be the number 64, which can be represented by the binary string 0100-0000. You might realize then that the mnemonics are not 1:1 representations of the CPU command they are intended to symbolize. In order to turn source code into the correct bytecode, you must compile the code using an external program called a compiler. A compiler is kinda like a translator, it scans the source code, reading the mnemonic binary strings, and interprets them. When it sees the specific binary strings that equals to the word "add.w", it replaces it with the appropriate bytecode string. This simplifies the process of writing code for a CPU greatly.

This same loop - interpreting a string of text as a mnemonic device, can be repeated. Since different processors have different commands, Assembler source codes don't tend to match up between processors. But you can move to a higher level language, like C, which is intended to be cross platform. Languages like C work in that they have an interface layer between the source code and the eventual bytecode, kind of like a "virtual" bytecode language. C syntax to add two numbers together might be something like so:

Code:

int number1 = 1, number2 = 1; int result = number1 + number2;

C has a compiler as well, which will interpret the above code when compiled into assembler code, which then gets compiled into bytecode, which is the binary string a program runs. By writing compilers for all sorts of processors, you can dictate the conversion process from C to bytecode so that, for different CPUs, the same C code will turn into different end binaries to run for each appropriate processor.

Now going back to my code example above. You'll notice I assign some arbitrary variable identifiers to help me keep track of what I'm doing. I named one variable Number1, and one Number2, and stored them in a variable called result. This is done entirely for the benefit of people reading the source code, so they can quickly interpret what the code is supposed to do. The names people use to identify variables and functions and other stuff in their source codes are known as symbols. Symbols are largely meaningless to the actual CPU running the program, so when a source code is compiled, symbols are stripped out (removed) from the resulting binary. Without the correct symbols, what the code intends to do is more difficult to read at a glance. Take for example the same code, without symbols:

Code:

int i = 1; ii=2; int iii = i + ii;

Now, since this is a super simple example of just adding two numbers together, it's still somewhat readable, but the arbitrarily chosen symbols we used to represent our variables makes seeing the logic a lot more difficult. The more complex your program, the more valuable these symbols are. Now, you are correct that a compiled binary in the end is indeed a string of instructions that represents the end result of a program, and you can totally reinterpret those instructions back into assembler. Doing this is called a disassembly. It's generally not possible to go from assembly back to, say, C or any higher level language except by hand, interpreting the code through a human. But even going from bytecode to assembler presents problem, because assembly source codes also have symbols to make them more readable. Consider the following, this is part of the source code for Sonic the Hedgehog. This is from a disassembly that has been worked on by community members for years, and thus the symbols have been slowly added back in:

Code:

; ||||||||||||||| S U B R O U T I N E ||||||||||||||||||||||||||||||||||||||| ; sub_272E: PalLoad2: PalLoad_Now: lea (PalPointers).l,a1 lsl.w #3,d0 adda.w d0,a1 movea.l (a1)+,a2 movea.w (a1)+,a3 move.w (a1)+,d7 dbf d7,- rts ; End of function PalLoad_Now

And here is the same code as the result of a mechanical disassembly:

Code:

; ÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛ S U B R O U T I N E ÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛ sub_2764: ; CODE XREF: ROM:000042E4p lea (a4).l,a1 lsl.w #3,d0 adda.w d0,a1 movea.l (a1)+,a2 movea.w (a1)+,a3 suba.l #$B00,a3 move.w (a1)+,d7 loc_277A: ; CODE XREF: sub_2764+18j move.l (a2)+,(a3)+ dbf d7,loc_277A rts ; End of function sub_2764

Even with no experience at all, you should be able to deduce this code does something regarding loading a palette, because of the symbols in the annotated source code. The mechanically disassembled code lacks all symbols, replaced by arbitrary identifiers the disassembler chose mechanically, and thus it becomes indecipherable.

Additionally, modern compilers don't just spit out 1:1 code from the mnemonics anymore, they do automatic optimizations. Some of these kinds of optimizations can be complex to understand for humans, but intuitive for machines. This means that the disassembled code you get out might not even match the original source code put in. Now, they'll function the same, but that means even the original author of the source code might have problems following their disassembled code as it doesn't necessarily match what they originally wrote. And, because I said all CPUs basically speak their own "language," many times the disassembled code is largely useless for porting a game to a new platform anyways, because they don't disassemble into a higher enough level language like C to be portable in the first place.

As for why source codes got lost, back in the old days of computing, development standards like SVN didn't exist. Old programs were simple enough to be coded by 1 or 2 guys by themselves, and thus they didn't need to share their code with lots of people. A single programmer could understand how their entire code worked, every bit of it. A lot of source code subversion practices these days were born out of necessity, the need for multiple people to be able to work on one source code at once. So things like redundant backups and forks and such have emerged as logistical solutions. These problems are compounded in japan. In the west, game development is closely linked to computer science, but in the 80's and 90's, japanese developers were cowboy coders, not necessarily formally trained computer programmers. Comp Sci in the west has made developers approach writing code in a more scientific manner, with standardized practices for source distribution. In Japan in the 80's and 90's, every coder might have their own esoteric way of building code or maintaining their source. Some old japanese coders would just write their entire game in a single text file that they'd pass around from person to person, which is insane to think about regarding modern programmers. Additionally, japan has a problem with space. Japan is a small country, that has a limited amount of land. Storing source codes takes servers, disks, etc. Those take space. When you're working out of a tiny office, you don't necessarily have the space for proper backup and maintenance of source codes.

This is fascinating, thanks for writing!

GaimeGuy · Jun 23, 2019

Jakartalado said:
We had like 7 or 8 systems failing at the same time, backups were between maintenance, felt like doomsday at work. We got people literally screaming and crying in the office.

how does a financial institution even recover from something like that? You lose everything from people withdrawing 5 bucks to pay for candy at a gas station to security deposits, land acquisitions, Utility payments, and payroll distributions

sibarraz · Jun 23, 2019

Ardiloso said:
I think it's strange how this is a thing with japanese devs only. I've never heard a western dev talking about lost source code.

In Japanese culture people normally throw away things that they thing are no longer worthy of use because they dont want to waste space, last year I remeber that Falcom posted on Twotter that they were going to throw away to the trash dozens of games from the 80-90's, thankfully a vg museum saw this and asked them to gove them over.

Also thanks to this culture a friend who visited Japan bought a pc engine for a VERY low price in a flea market

Kieli · Jun 23, 2019

Roy said:
By engaging in foolhardy decision making.

I dunno if source control even existed back then. And even if it did, storage was at a premium. It isn't like it now where you have access to unlimited storage via cloud solutions like AWS Simple Storage Service or MS Azure Blob Storage.

Deleted member 12790 · Jun 23, 2019

le-seb said:
Tape drives existed, though.

I am what some might call a digital hoarder. I have disks and tapes and such from as far back as 1988 that I still have today. Literally, on a server in my closet, I have files that old, backed up redundantly, from their original mediums. I still have tape drives, I still have tapes. All these disks and tapes and tape drives and such take up a lot of space. I have an entire room dedicated to old storage mediums on shelves. In Japan, that space was a premium. It's not just that the data storage requirement was so high, the physical space required to store those mediums was also prohibitive.

Ragnorok64 · Jun 23, 2019

Just want to say thanks for all the responses. Some have been too in depth for me to parse yet, but this has been educational.

Sumio Mondo · Jun 23, 2019

Think of all of the lost movies from the 80s/90s that have only been released on VHS since theatrical release or music only released on vinyl. Some of it just gets lost to time. It's sad but it happens.

Deleted member 12790 · Jun 23, 2019

Edward850 said:
We strike this often with games designed for entirely different CPUs as well. The PS1 and N64 can sometimes produce disassemblies that are outright impossible, simply due to the not-fully documented nature of their instruction sets (this is usually easy to spot and resolve though after enough time, but the symptoms have a habit of changing per game).

This really goes beyond just the PS1 and N64, to clarify. When talking about old consoles, what made them so special were the unique and custom chips inside that were used to offload specific tasks. Going all the way back to the late 70's and early 80's, the main CPUs for virtually every video game console until about the arrival of the Playstation 2 were generally well known, off the shelf CPUs. Typically some sort of z80, 68000, or 6502 variant. But beyond the main CPU, consoles were a collection of bespoke ASICs and co-processors, and lots of those were proprietary and not well understood. Sega's Video Display Processor (a misnomer actually, since it doesn't run code; video controller would be more apt) has roots that go back to the Texas Instruments TMS9918, but had 4 generations of bespoke Sega specific functionality bolted on by the time the Sega Saturn arrived. If you are, for example, reading a disassembly or building an emulator to run the code of the main CPU, those types of processors are well understood and documented and thus slightly easier to interpret. These custom chips had their own bytecode "languages" just like the CPUs, spoken to through data ports or dedicated buses. It's often the code that interfaces with these special chips that cause the most problems. Any Panzer Dragoon Saga source code found, for example, would include lots of low level assembly meant to talk to the Saturn's dual VDPs through custom bytecode. Merely finding people today with enough expertise to comprehend that is difficult in its own right. The recent ports of Shenmue I and II actually ran into this very same problem, because Yu Suzuki's crew would write very, very low level Dreamcast assembly.

impingu1984 · Jun 23, 2019

The pinnacle station dlc source code for mass effect 1 was corrupted... This is why it's not included in the mass effect trilogy on PS3

https://web.archive.org/web/2016031...er.com/ryan__warden/status/264097809735749632

Shit happens

Ragnorok64 · Jun 23, 2019

impingu1984 said:
The pinnacle station dlc source code for mass effect 1 was corrupted... This is why it's not included in the mass effect trilogy on PS3

https://web.archive.org/web/2016031...er.com/ryan__warden/status/264097809735749632

Shit happens

Woah, that is surprisingly recent.

Akela · Jun 23, 2019

collige said:
Decompiling is technically possible in some programming languages, but it's generally a one way process and the decompiled code isn't going to be as human readable as the original source. That's just talking about a normal PC program, getting into extracting assets, compiling for consoles rather than dev kits, and other game-specific stuff makes it not a real solution for preservation.

Plus it's not just the source code in many cases, it's also the art assets that go missing - that can be catastrophic if you ever want to go back and remaster the game as the original source assets for textures/sound tend to be much higher quality (eg. Photoshop files with individually editable layers, high poly sculpts of assets that can be rebaked into higher res normal maps, textures authored at double or triple the resolution of that appears in the final game, etc.) So many games with pre-rendered backgrounds have not only lost the original scene files that could have allowed the backgrounds to be re-rendered, but they've even lost the original renders themselves, meaning the all that exists is the low-res backgrounds that appeared on the final discs. When you consider that so much time is spent creating fairly non-destructive workflows for the express purpose of making it easy to go back and make changes to art assets after the fact, the fact that in many cases that no longer possible is a massive shame. Unless a box of floppies is discovered in a store room sometime in the future, the original backgrounds for games such as Grim Fandango and Final Fantasy IX are lost to time.

Non-game example, but to see how much of a difference archiving this stuff can make, Pixar have been able to not only re-release Toy Story 1 & 2 at higher resolutions then the original 900p cinema release, but even re-render the films in 4K with (virtual) stereoscopic cameras, since they had the foresight to keep the original source assets for the entire film. They were even able to release a series of texture packs using assets straight from the original film projects for people to use (not that a pack of fairly low res tiling textures from the 90's is particularly useful, but still).

Here's the original DVD release:

a0%20John%20Lasseter%20Toy%20Story%20%20Woody%20%20Buzz%20Lightyear%20DVD%20Review%201812-o.jpg

Compared to the Blu-ray:

Imagine if all that existed of the original film was the fairly low res master on 35mm (or worse the DVD or VHS release). So many games today find themselves in that position, sadly.

Deleted member 17388 · Jun 23, 2019

Sometimes maybe like how Toy Story 2 got almost deleted? :v

https://www.youtube.com/watch?v=8dhp_20j0Ys

Andri · Jun 23, 2019

Ardiloso said:
I think it's strange how this is a thing with japanese devs only. I've never heard a western dev talking about lost source code.

I think that is more of a case that there are not that many western devs from that era left.

GameDev · Jun 23, 2019

You'd be surprised how many basic rules of project management get broken in the gaming industry.

I have worked on titles with massive budgets where we ended up spending more than a week trying to find a binary file required to run the game properly because it just got lost in the project repository.

There are a lot of incompetent people in important positions in the gaming industry.

Cucurbitacée · Jun 23, 2019

That reminds me that up to the early 80s printing the source code on paper was the normal way of archiving it. Imagine doing that with a modern project!

Deleted member 12790 · Jun 23, 2019

Andri said:
I think that is more of a case that there are not that many western devs from that era left.

This isn't true at all

Borman · Jun 23, 2019

Cucurbitacée said:
That reminds me that up to the early 80s printing the source code on paper was the normal way of archiving it. Imagine doing that with a modern project!

Yep, we (meaning the Museum of Play) have the source code to River Raid, for instance, and it is all printed code. And it takes a lot of pages just for an Atari game.

tommyv2 · Jun 23, 2019

Shift+Del

I'd be fired if I did that at work. I have no idea how this happens in the wild.

senj · Jun 23, 2019

whoops submitted too soon

Teh_Lurv · Jun 23, 2019

I recall reading predictions that pre-smartphone mobile video games are the next genre of games in danger of being lost forever.

justiceiro · Jun 23, 2019

Just like everything else, you just do. When you move, when people leave. It just happens.

Imran · Jun 23, 2019

Mugy said:
What are some cases of games with lose source code? If a company doesn't have the source code of a title, but want to make a port, what options do they have?

In some cases, Nintendo keeps an archive of source code for games on their system, but I believe this was only a practice until 1999 or so.

This is why Collection of Mana is only on Switch; Square Enix lost the code to the original game, so they had to get it from Nintendo. However, doing that means the game only releases on Nintendo systems, because of course Nintendo isn't going to give you the source code to a game to release it on the PS4.

In the case of Kingdom Hearts HD, some developers essentially just rebuilt the game from scratch without access to the original source code.

senj · Jun 23, 2019

MylesJackWasntDown said:
Dumb question, I know little about the technical side, but no shame:

What's the difference between source code and what actually ships on the disc and was read by the Playstation?

If source code is compiled into a code/language readable by a console, wouldn't said transformation follow a strict enough set of rules that it could be reversed back into the original source code.

The answer is obviously no, or else source code couldn't technically ever be lost, but just wondering about the differences between source and what ships on disc.

It's a lossy transformation for a lot of reasons.

Fundamentally our computers are mostly Van Neumann machines, which means that any given set of bytes in an executable could be instructions for the computer or could just be data for instructions to act on (concrete example — the value hexadecimal 07 could represent either the number 7, perhaps hitpoints for a low level enemy, OR the opcode instruction for popping the stack pointer), and there's no actual way to tell them apart unless and until while the program is executing it gets to a state where it wants to execute those bytes (they must represent instructions — or the program has really gone off the rails, and it's trying to execute data, a bad bug).

So there's that.

Decompilers use a lot of heuristics and simulated executions to make educated guesses at distinguishing instructions from data, so that helps. But fundamentally what they give you doesn't look much like the original source code. The structure of the code is largely lost or radically transformed when it's compiled — functions you wrote are optimized away, others you didn't write are introduced as optimizations. Names of variables are thrown out when the compiler is done with them, there's no relevant concept like that to the machine, just memory addresses, so what in the original source code was a variable named "mana_pool_size" the decompiler can only assign a random arbitrary name to "localVar4". In fact because of the way high level languages work, and the way decompilers have to guess at what variables existed in the original code, you'll end up with a lot of extraneous extra variables in the decompiled code that were originally just unnamed temporary intermediate values in longer expressions in the original source code.

In short, decompiled code is a bit of a mess, because much of the structure of code is lost during compilation.

Imran · Jun 23, 2019

Ardiloso said:
I think it's strange how this is a thing with japanese devs only. I've never heard a western dev talking about lost source code.

Oh, no, Western devs lose source all the time. There's a joke that in every former 1990s PC studio, there's probably a box containing source code hidden in the walls. Studios close, egos get in the way, different people think they own projects because it's their baby, files get taken home and never returned, etc.

GrayDock · Jun 23, 2019

I think that the objective at the time was the finished game and only the game. Everything else, like art assets, music sheets, the code itself was disposable.
The same thing happened with old cartoons where the storyboards were discarded after the cartoon was ready.

Dreams-Visions · Jun 23, 2019

Patitoloco said:
The cloud wasn't a thing for the most part. The source code (depending on the era) might be in a collection of floppies, cds or hard drives, all of them physical objects than can be lost or broken.

yep. hard drives go bad, CD's, floppies and DVDs can rot. to say nothing of simple mistakes. "oh I thought that HDD was just a spare" when it held the last remaining copy of something.

GMM · Jun 23, 2019

Generally they didn't have good options available for versioning their source code and there was a general lack of foresight that the code could be usable in the future.

It's easy to look back and ask those questions, but it's just another instance of learning from past missteps. Another great example is the BBC taping over episodes of Doctor Who thinking they didn't have a reason to ever broadcast them again and tapes were really expensive, so to this day I believe a fair amount of Doctor Who episodes are still missing because of poor archiving procedures stemming from a lack of experience.

Even today versioning for video games is not easy, especially when dealing with source assets that are not code, archiving every source asset and versions of those assets for something like textures, models, video and audio can be extremely costly and hard to do right since projects can easy be of terabytes or even petabytes in size.

Projects that require extreme amount of data to be archived like creating video games, tv or movies are prone for data loss simply because of the monetary costs associated with it, those type of projects need better solutions for smaller teams.

Dremorak · Jun 23, 2019

Step 1: Back up to tape
Step 2: Move buildings 5 times over 20 years and suddenly find you lost some of tapes
Step 2: Building floods and now all tapes are water damaged
Step 2: etc etc etc etc etc

The Spoony Hou · Jun 23, 2019

OP, so I suppose you've never lost any files on a pc? Nothing ever?

It seems easier to lose stuff than to save it for years.

So How DO Companies Lose Game Source Code?

User requested account closure

User requested account closure

Author - NES Endings Compendium

Official ResetEra™ Chao Puncher

Info Analyst

Software & Netcode Engineer at Nightdive Studios

Digital Games Curator at The Strong Museum

Very low key

Prophet of Regret - One Winged Slayer

Self-requested ban

User requested account closure

User requested account closure

User requested account closure

User requested account closure

Digital Games Curator at The Strong Museum