I think it's important to keep in mind that the human mind is wired to look for clear, distinct patterns and to filter out extremely fine, messy details. We naturally try to draw parallels between our own perception and AI, but that's really not how these models work.
It's not actually that surprising that nobody knows why this is happening - the way deep learning works makes almost any of its behavior hard to explain. It's a tangled web of math that tunes itself through trial and error, and that process rarely produces patterns humans can read (see: interpretability).
A simple but powerful example is adversarial attacks. If you add "strategic noise" to an image - something that looks random but is actually carefully selected - you can dramatically change the results.
The images below show an image before and after adding that strategic noise (source), with what the AI "sees" printed above each image.
Why does one set of noise make that dog consistently "look" like red wine? Why does the other make it "look" like toilet paper? Even directly looking at the pixels responsible doesn't tell us much.
None of that needs to make sense to us. It just needs to (in some subtle way) make a tangled web of equations work out. Maybe it's using something we can perceive, maybe it's not. With millions of trainable parameters and enough training data, it has a lot of flexibility to fit inputs to outputs either way.
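The gradient-sign idea behind these attacks (FGSM) can be shown without a neural network at all. Here's a minimal sketch with a toy linear classifier and made-up weights - real attacks target deep models, but the mechanism is the same: nudge every input feature a tiny amount in whichever direction the gradient says hurts the model most.

```python
import numpy as np

# Toy linear "classifier": predicts class 1 if w @ x > 0, else class 0.
# (Hypothetical weights; real attacks do this against deep networks.)
w = np.array([0.5, -0.3, 0.8])

def predict(x):
    return int(w @ x > 0)

x = np.array([0.1, 0.2, 0.1])  # original input, classified as class 1
eps = 0.1                      # max change allowed per feature

# FGSM step: move each feature slightly in the direction that lowers the
# score. For a linear model the gradient of the score w.r.t. x is just w,
# so the perturbation is -eps * sign(w).
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))   # → 1 0
print(np.max(np.abs(x_adv - x)))    # → 0.1 (no feature moved more than eps)
```

Each individual feature barely changes, yet the prediction flips - and in a real image with millions of pixels, millions of such tiny aligned nudges add up, which is why the noise can be imperceptible to us and still decisive for the model.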
John von Neumann said: "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."
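The point of the quote is easy to demonstrate: give a model as many parameters as data points and it will "fit" anything, including pure noise. A quick sketch with a polynomial fit in numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 6)
y = rng.normal(size=6)  # pure noise - there is no pattern to learn

# 6 points, 6 polynomial coefficients: a degree-5 fit passes through
# every point exactly. Flexibility, not understanding.
coeffs = np.polyfit(x, y, deg=5)
residual = np.max(np.abs(np.polyval(coeffs, x) - y))
print(residual)  # ~0 (only floating-point error remains)
```

A deep network with millions of parameters has vastly more flexibility than this degree-5 polynomial, which is exactly why it can connect inputs to outputs in ways that never need to make sense to us.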