I work at a very prestigious bank in Brazil. We lost 14 days of transactions that will never be recovered.
Shit happens.
Dumb question, I know little about the technical side, but no shame:
What's the difference between source code and what actually ships on the disc and was read by the Playstation?
If source code is compiled into a code/language readable by a console, wouldn't said transformation follow a strict enough set of rules that it could be reversed back into the original source code?
The answer is obviously no, or else source code couldn't technically ever be lost, but just wondering about the differences between source and what ships on disc.
Oof. Stuff like that should not happen though. You build in redundancy and backups. Especially if you're a bank.
It's possible there's a legitimate explanation and it was a freak accident that just couldn't be helped, but 99% of the time an issue like that or losing source code points to bad management.
Well, there's that 1% freak accident. It's almost funny how terrible that sounds.
We had like 7 or 8 systems failing at the same time, backups were between maintenance, and it felt like doomsday at work. We had people literally screaming and crying in the office.
If a company doesn't have the source code of a title but wants to make a port, what options do they have?
Hindsight, folks. Doesn't get much better than this.
Tape drives existed, though.
We had drafted the original deal in the context of a Baldur's Gate: HD. Our plan was simple: grab the original artwork, clean it up, re-render it at higher resolutions and with better materials, thus creating stunning versions of the areas everyone remembers. We planned to take the character models and re-render them with many more frames of animation and add new orientations to the movement to make the game smoother. We nailed down the core terms, got everyone on the same page and we got our first drop of the assets from BioWare.
A few days later we noticed a large hole where the source art should be -- stuff like 3DS Max files and texture images. "No problem," I said; I contacted Derek French over at BioWare and he dug further and sent us more data. We again dug through and failed to find the source art. I made arrangements to visit BioWare with a removable drive and work with Derek and the IS department to find the assets.
After two days of searching we came to the horrible realization that the source artwork was stored on a departmental drive and not a project drive, and as such was not frequently backed up. We dug through tape backups to no avail. The source art was lost.
"What are some cases of games with lost source code? If a company doesn't have the source code of a title but wants to make a port, what options do they have?"
Phantom Dust wasn't made with its source. It was all hacked to make it work on Xbox One.
:)
Seriously though! Shit just happens sometimes. Archival is tough on the best of days; archiving 100% of all records that people in the future may need is honestly borderline impossible!
Basically this. The question is less 'how do companies lose game source codes' and more 'how do companies manage to maintain virtually any and all data relevant to their clients in the long term'.
To expand on this for MylesJackWasntDown and others who are interested:
The way computer processors work is that, essentially, every type of processor has its own "language" that it speaks. Most PCs run on x64 CPUs these days, which share a "language" between Intel and AMD processors, but there are also things like ARM processors that speak their own "language." These "languages" take the form of electrical signals that run to pins on the processor in specific patterns. When a processor detects a certain pattern of signals on its pins, it interprets that pattern as a command, such as "add these two numbers together" or "divide this number by that number" or "move this piece of data from this part of RAM to that part of RAM." The set of commands a processor understands is known as its instruction set, and the numeric code for each command is its opcode.
In the old days of computing, you would write programs for these processors directly by hand. The written representation of these electrical signals is known as binary, the "1's and 0's" people talk about CPUs reading. For example, a code that tells the Sega Genesis' 68000 CPU to add two numbers together is 1101-0010 1101-0000. You can picture this as sending an electrical high signal on lines 1, 2, 4, 7, 9, 10, and 12 out of 16. When the CPU gets this signal, it adds the two numbers together and stores the result. This kind of raw binary program is properly called machine code, though I'll loosely call it bytecode here; it's the pure instruction stream for a program.
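The line numbers listed above are just the positions of the 1 bits in that 16-bit pattern, counted from the left. Here is a minimal C sketch of that bookkeeping, assuming the pattern is handed over as a string of '0' and '1' characters. The names `high_positions` and `pattern_value` are made up for illustration, and real CPU pins are not literally numbered this way.

```c
#include <stddef.h>

/* Collect the 1-based positions (counted from the left) of every '1'
 * in a bit pattern such as "1101-0010 1101-0000". Dashes and spaces
 * are skipped so the pattern can be written the way the post writes it.
 * Returns how many positions were stored in out (at most max_out). */
size_t high_positions(const char *pattern, int *out, size_t max_out)
{
    size_t count = 0;
    int position = 0;
    for (const char *p = pattern; *p != '\0'; ++p) {
        if (*p != '0' && *p != '1')
            continue;                /* ignore separators */
        ++position;
        if (*p == '1' && count < max_out)
            out[count++] = position;
    }
    return count;
}

/* Read the same pattern as an unsigned number, most significant bit first. */
unsigned pattern_value(const char *pattern)
{
    unsigned value = 0;
    for (const char *p = pattern; *p != '\0'; ++p) {
        if (*p == '0' || *p == '1')
            value = (value << 1) | (unsigned)(*p - '0');
    }
    return value;
}
```

For the pattern in the post, this yields positions 1, 2, 4, 7, 9, 10 and 12, and the value 0xD2D0 in hexadecimal.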
Writing in bytecode is monstrously difficult, so most CPUs offer a higher level mnemonic notation that is more human readable. The lowest level of this is called assembly language (the program that translates it is the assembler), and every processor family has its own. Assembly is written in text files that can be read by humans. For example, instead of remembering that the add command for the 68000 CPU is 1101-0010 1101-0000, it is agreed upon that a mnemonic like "add.w" will accomplish the same task. These human readable text files full of mnemonics are known as source code. Now, text characters are, in actuality, also strings of binary, which adhere to a "rule" or "standard" that everyone agrees upon. The most widely used standard is ASCII. This means that, for example, the character 'A' is agreed upon to be the number 65, which can be represented by the binary string 0100-0001. You might realize then that the mnemonics are not 1:1 representations of the CPU commands they are intended to symbolize. In order to turn source code into the correct bytecode, you must translate it using an external program: an assembler for assembly, or more generally a compiler. A compiler is kinda like a translator: it scans the source code, reading the mnemonic binary strings, and interprets them. When it sees the specific binary string that spells the word "add.w", it replaces it with the appropriate bytecode string. This simplifies the process of writing code for a CPU greatly.
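To make that translation step concrete, here is a toy sketch in C, not a real 68000 assembler. The one-entry table just reuses the mnemonic and bit pattern from the example above. `TOY_TABLE` and `toy_assemble` are hypothetical names, and a real assembler also has to encode operands, labels and addressing modes.

```c
#include <stddef.h>
#include <string.h>

/* One row of a toy mnemonic table: the text a human writes and the
 * 16-bit pattern it stands for. */
struct toy_entry {
    const char *mnemonic;
    unsigned short pattern;
};

/* Hypothetical table holding only the pairing used in the post. */
static const struct toy_entry TOY_TABLE[] = {
    { "add.w", 0xD2D0 },   /* 1101-0010 1101-0000 */
};

/* Look up a mnemonic by comparing its characters, which are themselves
 * just numbers ('A' is 65 in ASCII, 'a' is 97). Stores the pattern in
 * *out and returns 1 on success; returns 0 if the mnemonic is unknown. */
int toy_assemble(const char *mnemonic, unsigned short *out)
{
    size_t n = sizeof TOY_TABLE / sizeof TOY_TABLE[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(mnemonic, TOY_TABLE[i].mnemonic) == 0) {
            *out = TOY_TABLE[i].pattern;
            return 1;
        }
    }
    return 0;
}
```

The text "add.w" is itself stored as the ASCII bytes 97, 100, 100, 46, 119. The lookup is what turns those five bytes of text into the two-byte pattern the CPU reads.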
This same process, interpreting a string of text as a mnemonic device, can be repeated. Since different processors have different commands, assembly source code doesn't tend to match up between processors. But you can move to a higher level language, like C, which is intended to be cross platform. Languages like C work in that they have an interface layer between the source code and the eventual bytecode, kind of like a "virtual" bytecode language. C syntax to add two numbers together might be something like so:
Code:
int number1 = 1, number2 = 1;
int result = number1 + number2;
C has a compiler as well, which translates the above code into assembly code, which then gets assembled into bytecode, the binary string a program actually runs as. By writing compilers for all sorts of processors, you can dictate the conversion process from C to bytecode so that the same C code turns into a different end binary for each processor.
Now going back to my code example above. You'll notice I assigned some arbitrary variable identifiers to help me keep track of what I'm doing. I named one variable number1 and one number2, and stored their sum in a variable called result. This is done entirely for the benefit of people reading the source code, so they can quickly interpret what the code is supposed to do. The names people use to identify variables and functions and other stuff in their source code are known as symbols. Symbols are largely meaningless to the actual CPU running the program, so when source code is compiled, symbols are stripped out (removed) from the resulting binary. Without the correct symbols, what the code intends to do is more difficult to read at a glance. Take for example the same code, without symbols:
Code:
int i = 1, ii = 1;
int iii = i + ii;
Now, since this is a super simple example of just adding two numbers together, it's still somewhat readable, but the arbitrarily chosen symbols we used to represent our variables make seeing the logic a lot more difficult. The more complex your program, the more valuable these symbols are.
Now, you are correct that a compiled binary in the end is indeed a string of instructions that represents the end result of a program, and you can totally reinterpret those instructions back into assembly. Doing this is called disassembly. It's generally not possible to go from assembly back to, say, C or any higher level language except by hand, interpreting the code through a human. But even going from bytecode to assembly presents a problem, because assembly source code also has symbols to make it more readable. Consider the following, which is part of the source code for Sonic the Hedgehog. This is from a disassembly that has been worked on by community members for years, and thus the symbols have been slowly added back in:
Code:
; ||||||||||||||| S U B R O U T I N E |||||||||||||||||||||||||||||||||||||||
; sub_272E: PalLoad2:
PalLoad_Now:
    lea (PalPointers).l,a1
    lsl.w #3,d0
    adda.w d0,a1
    movea.l (a1)+,a2
    movea.w (a1)+,a3
    move.w (a1)+,d7
    dbf d7,-
    rts
; End of function PalLoad_Now
And here is the same code as the result of a mechanical disassembly:
Code:
; ÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛ S U B R O U T I N E ÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛ
sub_2764: ; CODE XREF: ROM:000042E4p
    lea (a4).l,a1
    lsl.w #3,d0
    adda.w d0,a1
    movea.l (a1)+,a2
    movea.w (a1)+,a3
    suba.l #$B00,a3
    move.w (a1)+,d7
loc_277A: ; CODE XREF: sub_2764+18j
    move.l (a2)+,(a3)+
    dbf d7,loc_277A
    rts
; End of function sub_2764
Even with no experience at all, you should be able to deduce this code does something regarding loading a palette, because of the symbols in the annotated source code. The mechanically disassembled code lacks all symbols, replaced by arbitrary identifiers the disassembler chose mechanically, and thus it becomes indecipherable.
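Names like sub_2764 and loc_277A are not guesses at meaning. A disassembler typically builds them from nothing but the kind of thing it found and the address it found it at. Here is a rough C sketch of that naming scheme, assuming the prefix-plus-hex-address convention visible in the listing above. `auto_label` is a made-up helper, not any real tool's API.

```c
#include <stdio.h>

/* Build a placeholder label from a prefix ("sub" for a subroutine,
 * "loc" for a jump target) and the address where it was found.
 * Returns whatever snprintf returns: the length the full label needs. */
int auto_label(char *buf, size_t size, const char *prefix, unsigned long address)
{
    return snprintf(buf, size, "%s_%04lX", prefix, address);
}
```

Calling it with "sub" and 0x2764 gives sub_2764, which tells you where the routine lives but nothing about palettes. A name like PalLoad_Now has to come from a human.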
Additionally, modern compilers don't just spit out 1:1 code from the mnemonics anymore; they do automatic optimizations. Some of these optimizations can be complex for humans to understand, but straightforward for machines. This means that the disassembled code you get out might not even match the original source code put in. They'll function the same, but even the original author of the source code might have problems following their disassembled code, as it doesn't necessarily match what they originally wrote. And, because all CPUs basically speak their own "language," many times the disassembled code is largely useless for porting a game to a new platform anyways, because it doesn't disassemble into a high enough level language like C to be portable in the first place.
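One way to see why the original source can't be recovered uniquely: many different source texts describe the same computation, and an optimizing compiler is free to turn them into identical instructions. Here is a small illustrative sketch in C. The three functions are hypothetical examples, and whether a particular compiler actually emits the same code for all of them depends on the compiler and its settings.

```c
/* Three ways a programmer might write "double this number".
 * For unsigned values they give the same result for every input,
 * so nothing in the compiled output needs to record which one
 * the programmer actually typed. */
unsigned double_by_multiply(unsigned x) { return x * 2u; }
unsigned double_by_add(unsigned x)      { return x + x; }
unsigned double_by_shift(unsigned x)    { return x << 1; }
```

Someone reading the disassembly sees one sequence of instructions and has no way to tell which of the three was in the source file.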
As for why source code got lost: back in the old days of computing, version control tools like SVN didn't exist. Old programs were simple enough to be coded by 1 or 2 guys by themselves, and thus they didn't need to share their code with lots of people. A single programmer could understand how their entire code worked, every bit of it. A lot of source control practices these days were born out of necessity, the need for multiple people to be able to work on one codebase at once. So things like redundant backups and forks and such have emerged as logistical solutions.
These problems are compounded in Japan. In the West, game development is closely linked to computer science, but in the '80s and '90s, Japanese developers were cowboy coders, not necessarily formally trained computer programmers. Comp Sci in the West has made developers approach writing code in a more scientific manner, with standardized practices for source distribution. In Japan in the '80s and '90s, every coder might have their own esoteric way of building code or maintaining their source. Some old Japanese coders would just write their entire game in a single text file that they'd pass around from person to person, which is insane to think about for modern programmers.
Additionally, Japan has a problem with space. Japan is a small country with a limited amount of land. Storing source code takes servers, disks, etc. Those take space. When you're working out of a tiny office, you don't necessarily have the space for proper backup and maintenance of source code.
How does a financial institution even recover from something like that? You lose everything from people withdrawing 5 bucks to pay for candy at a gas station to security deposits, land acquisitions, utility payments, and payroll distributions.
I think it's strange how this is a thing with Japanese devs only. I've never heard a Western dev talking about lost source code.
We strike this often with games designed for entirely different CPUs as well. The PS1 and N64 can sometimes produce disassemblies that are outright impossible, simply due to the not-fully documented nature of their instruction sets (this is usually easy to spot and resolve though after enough time, but the symptoms have a habit of changing per game).
"The Pinnacle Station DLC source code for Mass Effect 1 was corrupted... This is why it's not included in the Mass Effect Trilogy on PS3."
Woah, that is surprisingly recent.
Shit happens
Decompiling is technically possible in some programming languages, but it's generally a one way process and the decompiled code isn't going to be as human readable as the original source. That's just talking about a normal PC program, getting into extracting assets, compiling for consoles rather than dev kits, and other game-specific stuff makes it not a real solution for preservation.
"I think it's strange how this is a thing with Japanese devs only. I've never heard a Western dev talking about lost source code."
I think that is more a case of there not being that many Western devs from that era left.
"That reminds me that up to the early 80s printing the source code on paper was the normal way of archiving it. Imagine doing that with a modern project!"
Yep, we (meaning the Museum of Play) have the source code to River Raid, for instance, and it is all printed code. And it takes a lot of pages just for an Atari game.
In some cases, Nintendo keeps an archive of source code for games on their system, but I believe this was only a practice until 1999 or so.
It's a lossy transformation for a lot of reasons.
Oh, no, Western devs lose source all the time. There's a joke that in every former 1990s PC studio, there's probably a box containing source code hidden in the walls. Studios close, egos get in the way, different people think they own projects because it's their baby, files get taken home and never returned, etc.
"The cloud wasn't a thing for the most part. The source code (depending on the era) might be in a collection of floppies, CDs or hard drives, all of them physical objects that can be lost or broken."
Yep. Hard drives go bad; CDs, floppies and DVDs can rot. To say nothing of simple mistakes: "oh, I thought that HDD was just a spare" when it held the last remaining copy of something.