So I know that the PS1 uses affine texture mapping, that it's what causes that distinctive PS1 texture wobble, and that devs would break up their geometry into smaller and smaller triangles to work around it... but what is affine texture mapping, exactly?
"Affine" transforms are basically what SNES folks refer to as "mode 7", and which I've sometimes seen as "things you can do with a piece of thick cardboard". You can rotate it around (rotation), you can zoom it in and out (scaling), you can mirror it (reflection), and you can even hold it at an angle (skew). (And of course you can move it from side to side (translation), but most 2D systems support that so it's not very interesting.) And of course you can do all or most of this at once. For doing a 3D representation of a single texture, it's usually enough, and Sega's Super Scaler and similar hardware was doing this for a decade or so before the PS1 was a thing. Indeed, it's also what the Saturn's main method of 3D representation was, but for square images like bitmaps instead of triangles; Saturn didn't have texture mapping so much as it manipulated the shape of the textures themselves. Using quads instead of triangles meant that some objects tended to not have these issues of texture warping to the same degree on the Saturn, but it also meant that Saturn games had issues when either trying to represent games made for the PS1, or for stuff where the models weren't made around squares.
In effect, you're representing 3D surfaces by doing a few different 2D operations. Most of the time (and especially with simpler textures) this works well enough, and because it's fast and doesn't require as much math, it was one of the reasons the PS1 could be priced competitively against the N64 while still pushing more detailed textures and fairly elaborate geometry.
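To make "fast and doesn't require as much math" concrete, here's roughly what an affine-textured scanline loop looks like. This is a toy sketch, not the PS1 GPU's actual logic; `sample_texture` and `put_pixel` are hypothetical stand-ins:

```c
#include <stdio.h>

/* Hypothetical stand-ins for a real rasterizer's texture fetch and
   framebuffer write; just enough to make the loop runnable. */
static unsigned sample_texture(float u, float v)
{
    return ((unsigned)u / 8 + (unsigned)v / 8) & 1;  /* toy checkerboard */
}
static void put_pixel(int x, unsigned texel) { printf("x=%d texel=%u\n", x, texel); }

/* Affine texturing of one scanline: u and v step by a constant per
   pixel, interpolated linearly in screen space. No divides in the
   loop, and no knowledge of depth -- which is exactly the problem. */
static void draw_span_affine(int x0, int x1,
                             float u0, float v0, float u1, float v1)
{
    float step = 1.0f / (float)(x1 - x0);
    float du = (u1 - u0) * step, dv = (v1 - v0) * step;
    float u = u0, v = v0;
    for (int x = x0; x < x1; x++) {
        put_pixel(x, sample_texture(u, v));
        u += du;
        v += dv;
    }
}

int main(void)
{
    draw_span_affine(0, 8, 0.0f, 0.0f, 64.0f, 64.0f);
    return 0;
}
```

Two adds per pixel and you're done; that cheapness is the trade the PS1 made.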
While the basic idea of how the PS1 rendered its polygons is that it would scale them and then shear (skew) them, the N64's calculations effectively folded depth into the process by dividing by the z-coordinate per pixel, which meant that texture perspective was retained. That was already the standard way 3D rendering (rarely expected to run in real time) was done by that point. It was genuinely very expensive: the N64's GPU was based on the Silicon Graphics workstations of the era, and while those were extremely powerful, state-of-the-art machines, they were also priced in the tens of thousands of dollars. Getting that tech into something consumer-spec'd meant major compromises, which is why a lot of N64 games had relatively choppy frame rates, and why texture memory was compromised as well.
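For comparison, here's the perspective-correct version of the same scanline sketch. The trick (the standard technique; I'm not claiming this is the N64's literal implementation) is that u/z, v/z, and 1/z are the quantities that actually vary linearly in screen space, so you step those per pixel and divide back to recover u and v. Same hypothetical helpers as above:

```c
#include <stdio.h>

/* Same hypothetical stand-ins as the affine sketch. */
static unsigned sample_texture(float u, float v)
{
    return ((unsigned)u / 8 + (unsigned)v / 8) & 1;  /* toy checkerboard */
}
static void put_pixel(int x, unsigned texel) { printf("x=%d texel=%u\n", x, texel); }

/* Perspective-correct texturing of one scanline: step u/z, v/z, and
   1/z linearly, then divide back at each pixel. That per-pixel divide
   is exactly the work the affine path skips. */
static void draw_span_perspective(int x0, int x1,
                                  float u0, float v0, float z0,
                                  float u1, float v1, float z1)
{
    float step = 1.0f / (float)(x1 - x0);
    float uoz = u0 / z0, voz = v0 / z0, ooz = 1.0f / z0;
    float duoz = (u1 / z1 - uoz) * step;
    float dvoz = (v1 / z1 - voz) * step;
    float dooz = (1.0f / z1 - ooz) * step;
    for (int x = x0; x < x1; x++) {
        float z = 1.0f / ooz;                 /* the per-pixel divide */
        put_pixel(x, sample_texture(uoz * z, voz * z));
        uoz += duoz; voz += dvoz; ooz += dooz;
    }
}

int main(void)
{
    /* Same span as the affine example, but the far end is four times
       as deep, so texels bunch up toward it instead of spreading evenly. */
    draw_span_perspective(0, 8, 0.0f, 0.0f, 1.0f, 64.0f, 64.0f, 4.0f);
    return 0;
}
```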
You can do this yourself to the pieces of a triangle in most image manipulation tools: select the region, resize it, then skew it. That's why I find it useful to think of affine transforms as something done to the texture as a whole, rather than something done to the polygon before the texture is applied to it. By the same token, the more triangles you split a textured object into, the less shearing each individual triangle needs, and thus the less egregious the warping winds up being (the sketch below puts numbers on this).
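Here's a quick check of that subdivision claim, with arbitrarily chosen depths: split a span at its perspective-correct midpoint, interpolate affinely within each half, and the error at the middle collapses rather than merely halving.

```c
#include <stdio.h>

/* Perspective-correct u at parameter t along a span whose endpoints
   sit at depths z0 and z1 with texture coordinates u0 and u1. */
static float u_correct(float t, float u0, float z0, float u1, float z1)
{
    float uoz = (1.0f - t) * (u0 / z0) + t * (u1 / z1);
    float ooz = (1.0f - t) * (1.0f / z0) + t * (1.0f / z1);
    return uoz / ooz;
}

int main(void)
{
    /* Arbitrary example: near end at depth 1, far end at depth 4. */
    float u0 = 0.0f, z0 = 1.0f, u1 = 64.0f, z1 = 4.0f;

    /* Affine vs. correct u at the middle of the whole span. */
    float mid_correct = u_correct(0.5f, u0, z0, u1, z1);
    printf("whole span: affine=%5.1f correct=%5.1f\n",
           0.5f * (u0 + u1), mid_correct);          /* 32.0 vs 12.8 */

    /* Split at the correct midpoint, interpolate affinely inside the
       first half, and check its middle (t = 0.25 of the whole span). */
    printf("half span:  affine=%5.1f correct=%5.1f\n",
           0.5f * (u0 + mid_correct),
           u_correct(0.25f, u0, z0, u1, z1));       /*  6.4 vs  4.9 */
    return 0;
}
```

Each subdivided vertex gets a correct u,v, so the affine error is confined to ever-shorter spans; that's exactly what PS1 devs were buying with all those extra triangles.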
And as you may have surmised, if an emulator replaces the affine interpolation with the depth-divided formula the N64 used, you get a PS1 that produces perspective-correct texturing. It's more expensive, but without question a feasible change on modern hardware, as it's what GPUs have been designed to do for a long time now.