Based on my understanding, an emulator is necessary because the machine running the emulator (say, a Windows PC) doesn't execute the same machine code as the target platform (6502, for example). So an emulator reads the code instruction by instruction and executes each one itself, similar to how an interpreter works.
With this in mind, would it be possible to create the equivalent of a compiler, that would convert (for example) a .PCE file into a .exe file, so that it could run on Windows without an emulator?
Note: This answer mainly focuses on the NES, since that's what I'm most familiar with.
Yes; this is called static recompilation or static binary translation, and it is possible in practice -- jamulator by Andrew Kelley does it. However, recompilation can be incredibly difficult (to the point that falling back to interpretation at runtime may be required in some cases). Additionally, the system's non-CPU hardware (such as graphics and sound hardware) still must be emulated.
Statically recompiling machine code is incredibly difficult; determining the behavior of a program by analyzing its code ahead-of-time is in some cases provably impossible. Some of the problems faced when statically recompiling a retro video game ROM include:
The "obvious" way to recompile a NES ROM is to take a single pass, reading a stream of 6502 instructions and outputting a stream of x86 instructions. However, the ROM also includes data bytes, which will produce garbage if interpreted as instructions. These garbage instructions should never be executed, since the program would never jump to that address, but the presence of non-instruction data poses two problems:
1) The program must be able to read its own ROM. This isn't really a hard problem at all; we just need to include an uncompiled copy of the ROM in the compiled program binary to read data bytes from.
2) We can't write a simple single-pass recompiler. Instructions can be multiple bytes wide, so if we try to recompile a block of non-executable data, we might end up misaligned when recompiling subsequent instructions. (As a simple, somewhat contrived example: in 6502 machine code, the byte sequence 69 09 0A is ADC #$09; ASL A, while the byte sequence A5 69 09 0A is LDA $69; ORA #$0A. The point at which we start decoding drastically affects our results.)
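To make the misalignment concrete, here is a minimal sketch of a linear decoder that only knows the four opcodes from the example above. The table and the function name `disassemble` are made up for illustration; a real 6502 decoder needs the full opcode table and bounds checks.

```python
# Tiny 6502 decoder covering only the four opcodes used in the example above.
# opcode -> (output format, instruction length in bytes)
OPCODES = {
    0x69: ("ADC #${:02X}", 2),  # add with carry, immediate operand
    0x0A: ("ASL A", 1),         # arithmetic shift left on the accumulator
    0xA5: ("LDA ${:02X}", 2),   # load accumulator from a zero-page address
    0x09: ("ORA #${:02X}", 2),  # bitwise OR, immediate operand
}

def disassemble(data, start=0):
    """Linearly decode `data` from offset `start` and return a list of strings."""
    out = []
    pc = start
    while pc < len(data):
        fmt, size = OPCODES[data[pc]]
        operand = data[pc + 1] if size == 2 else 0
        out.append(fmt.format(operand))  # unused operand is ignored for 1-byte ops
        pc += size
    return out

rom = bytes([0xA5, 0x69, 0x09, 0x0A])
print(disassemble(rom, 0))  # ['LDA $69', 'ORA #$0A']
print(disassemble(rom, 1))  # ['ADC #$09', 'ASL A']
```

The same four bytes yield two unrelated instruction streams depending on where decoding starts, which is why a single pass over a ROM that mixes code and data can't be trusted.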
So this means we have to perform much more complex code analysis in order to determine the basic blocks of the program and compile them individually. jamulator does this by starting at the interrupt vectors and following all possible branches from there. However, this approach still has problems with:
One problem is indirect jumps, where the address to execute is computed at runtime. In some cases, the runtime-computed address can be looked up and the corresponding x86 code can be found. However, if the only way to reach a basic block is via an indirect jump, the recompiler would likely have assumed that block was not executable code and thus not compiled it.
jamulator mitigates this problem using a hack: hard-coded support in the recompiler to recognize jump table implementations in specific games. This works, but is clearly not a general-purpose solution.
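The runtime lookup for a computed address could look roughly like the sketch below. This is not how jamulator implements it; `make_dispatcher`, `compiled_blocks`, and `interpret` are hypothetical names, and Python callables stand in for blocks of generated native code.

```python
def make_dispatcher(compiled_blocks, interpret):
    """Build a handler for indirect jumps.

    compiled_blocks: dict mapping a 6502 address to a callable that stands in
                     for the recompiled native code of the block at that address.
    interpret:       fallback callable used when no compiled block exists.
    """
    def jump_indirect(target):
        block = compiled_blocks.get(target)
        if block is not None:
            return block()          # address was discovered ahead of time
        return interpret(target)    # block was missed: fall back at runtime
    return jump_indirect

# Usage with stand-in blocks: $8000 was recompiled, $9000 was not.
jump = make_dispatcher({0x8000: lambda: "native"}, lambda addr: "interpreted")
print(jump(0x8000), jump(0x9000))
```

The fallback branch is the part that keeps the program correct when static analysis missed a block, at the cost of shipping an interpreter alongside the recompiled code.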
Bank switching is another problem. Although jamulator only supports NROM games, many NES games with more complicated mappers could switch regions of code and data in and out of the accessible address space. This means that each jump could go to dozens of different locations in the ROM, depending on which bank is mapped to that address at runtime.
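As a rough illustration of why banking multiplies jump targets, the sketch below assumes a simple layout similar to the UxROM family of mappers: a switchable 16 KiB bank at $8000-$BFFF and the last bank fixed at $C000-$FFFF. The function name `rom_offset` is made up, and real mappers vary widely.

```python
BANK_SIZE = 0x4000  # 16 KiB per bank (assumed layout, see above)

def rom_offset(cpu_addr, selected_bank, bank_count):
    """Translate a CPU address to an offset into the PRG ROM image."""
    if 0x8000 <= cpu_addr < 0xC000:
        # Switchable window: the result depends on the current bank register.
        return selected_bank * BANK_SIZE + (cpu_addr - 0x8000)
    if 0xC000 <= cpu_addr <= 0xFFFF:
        # Fixed window: always the last bank in the ROM.
        return (bank_count - 1) * BANK_SIZE + (cpu_addr - 0xC000)
    raise ValueError("address is not in PRG ROM space")

# The same jump target, JMP $8123, lands on different code per bank:
for bank in range(3):
    print(bank, hex(rom_offset(0x8123, bank, bank_count=8)))
```

A static recompiler has to either prove which bank is selected at each jump or emit a dispatch on the bank register for every jump into the switchable window.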
Self-modifying code is a further problem. Although this was uncommon for NES games due to the small amount of RAM, if I remember correctly C64 software would occasionally generate and execute code at runtime, or modify code mid-execution. This would be nearly impossible to predict statically.
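The "start at the entry points and follow every branch" analysis described earlier can be sketched as a worklist traversal. This is a simplified illustration rather than jamulator's actual code: it knows only six opcodes, treats the image as starting at address 0, and does no bounds checking on operands.

```python
# (length in bytes, kind) for a small subset of 6502 opcodes
OPS = {
    0xA9: (2, "plain"),   # LDA #imm
    0xEA: (1, "plain"),   # NOP
    0xD0: (2, "branch"),  # BNE rel
    0x4C: (3, "jump"),    # JMP abs
    0x20: (3, "call"),    # JSR abs
    0x60: (1, "return"),  # RTS
}

def find_code(mem, entry_points):
    """Return the set of addresses at which a reachable instruction starts."""
    seen = set()
    work = list(entry_points)
    while work:
        pc = work.pop()
        while pc not in seen and 0 <= pc < len(mem) and mem[pc] in OPS:
            seen.add(pc)
            size, kind = OPS[mem[pc]]
            if kind == "branch":
                offset = mem[pc + 1]
                if offset >= 0x80:          # relative offsets are signed
                    offset -= 0x100
                work.append(pc + 2 + offset)  # taken path; fall-through continues
            elif kind in ("jump", "call"):
                work.append(mem[pc + 1] | (mem[pc + 2] << 8))  # little-endian target
                if kind == "jump":
                    break                   # nothing falls through a JMP
            elif kind == "return":
                break
            pc += size
    return seen

image = bytes([0xA9, 0x01,        # 0: LDA #$01
               0xD0, 0x03,        # 2: BNE to 7
               0x4C, 0x0A, 0x00,  # 4: JMP $000A
               0xEA,              # 7: NOP
               0x60,              # 8: RTS
               0xFF,              # 9: data byte, never executed
               0x60])             # 10: RTS
print(sorted(find_code(image, [0])))  # [0, 2, 4, 7, 8, 10]
```

Note what the traversal cannot see: a block whose only entry is a computed jump never gets added to the worklist, and code written to RAM at runtime is not in the image at all.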
The NES Picture Processing Unit and Audio Processing Unit still must be emulated. jamulator includes an emulator for the non-CPU hardware in a runtime library, and the generated x86 code calls a library function to handle writes to memory addresses mapped to I/O operations.
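A minimal sketch of such a write handler is shown below. The `Bus` class is hypothetical and only logs register writes instead of driving a real PPU or APU; the address ranges follow the standard NES CPU memory map.

```python
class Bus:
    """Hypothetical runtime-library bus that recompiled code calls on every store."""

    def __init__(self):
        self.ram = bytearray(0x800)  # 2 KiB of internal RAM
        self.ppu_writes = []         # stand-in for a PPU: just records writes
        self.apu_writes = []         # stand-in for the APU and I/O ports

    def write(self, addr, value):
        addr &= 0xFFFF
        value &= 0xFF
        if addr < 0x2000:
            self.ram[addr & 0x07FF] = value        # RAM is mirrored every 2 KiB
        elif addr < 0x4000:
            reg = 0x2000 + (addr & 0x0007)         # 8 PPU registers, mirrored
            self.ppu_writes.append((reg, value))
        elif addr < 0x4018:
            self.apu_writes.append((addr, value))  # APU and I/O registers
        # Cartridge space is ignored in this sketch.

bus = Bus()
bus.write(0x0801, 0x05)  # lands in RAM at $0001 through mirroring
bus.write(0x3FF9, 0x1E)  # mirror of PPU register $2001
print(bus.ram[1], bus.ppu_writes)
```

The recompiler can often tell at compile time that a store with a constant address targets plain RAM and emit a direct memory write, keeping the library call only for stores that may hit hardware registers.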
Many NES games relied on precise timing between the CPU and PPU, so the generated CPU code must count the number of 6502 clock cycles taken by the executed code. jamulator implements this relatively simply: after each NES instruction, it calls out to a runtime library which runs the rest of the hardware for a specific number of cycles. This approach is simple to implement, but has a few disadvantages:
A more efficient (but much more complex) approach is to predict the times at which accurate synchronization between the CPU and the rest of the hardware is needed, and use these predictions to switch between fast and cycle-accurate emulation modes as needed.
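The two synchronization strategies can be contrasted with a toy model. Everything here is illustrative: `Hardware` just counts PPU dots (3 per CPU cycle on an NTSC NES), and `sync_points` stands in for whatever prediction decides where exact synchronization is required.

```python
PPU_DOTS_PER_CPU_CYCLE = 3  # NTSC NES ratio

class Hardware:
    """Stand-in for the PPU/APU runtime: only tracks elapsed PPU dots."""
    def __init__(self):
        self.ppu_dots = 0
        self.sync_calls = 0

    def run(self, cpu_cycles):
        self.ppu_dots += cpu_cycles * PPU_DOTS_PER_CPU_CYCLE
        self.sync_calls += 1

def run_per_instruction(hw, instruction_cycles):
    """Simple approach: step the hardware after every CPU instruction."""
    for cycles in instruction_cycles:
        hw.run(cycles)

def run_batched(hw, instruction_cycles, sync_points):
    """Accumulate cycles and only catch the hardware up at predicted sync points."""
    pending = 0
    for index, cycles in enumerate(instruction_cycles):
        pending += cycles
        if index in sync_points:
            hw.run(pending)
            pending = 0
    if pending:
        hw.run(pending)

trace = [2, 3, 4, 2, 6]  # cycle cost of five consecutive instructions
a, b = Hardware(), Hardware()
run_per_instruction(a, trace)
run_batched(b, trace, sync_points={2})
print(a.ppu_dots, a.sync_calls, b.ppu_dots, b.sync_calls)
```

Both variants advance the hardware by the same total time; the batched one makes far fewer calls, but it is only correct if every point where the game can observe the hardware is in `sync_points`.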
Although static recompilation is often possible, it can be extremely difficult. jamulator sometimes falls back to interpretation when the game does something not properly handled by the static recompiler.
The accuracy of static recompilation could be improved by running the game inside of something like FCEUX's code-data logger, which emulates the game while recording the code paths it takes. Data from an actual run of the game can be used to greatly improve the accuracy of static recompilation. However, the "test run" recorded by the code-data logger must comprehensively exercise the possible code paths taken by the game in order to be useful.
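The idea behind a code-data log can be sketched in a few lines. This is not FCEUX's file format or API; the flag values and the `CodeDataLog` class are assumptions chosen for illustration.

```python
CODE = 0x01  # byte was fetched as part of an executed instruction
DATA = 0x02  # byte was read as data

class CodeDataLog:
    """Per-byte record of how an emulated run actually used the ROM."""

    def __init__(self, rom_size):
        self.flags = bytearray(rom_size)

    def on_execute(self, offset, length):
        for i in range(offset, offset + length):
            self.flags[i] |= CODE

    def on_read(self, offset):
        self.flags[offset] |= DATA

    def code_offsets(self):
        return [i for i, f in enumerate(self.flags) if f & CODE]

    def unvisited(self):
        # Bytes the test run never touched: still unknown to the recompiler.
        return [i for i, f in enumerate(self.flags) if f == 0]

log = CodeDataLog(6)
log.on_execute(0, 2)  # the emulator executed a 2-byte instruction at offset 0
log.on_read(4)        # and read one data byte at offset 4
print(log.code_offsets(), log.unvisited())
```

The `unvisited` list is the catch mentioned above: any byte the recorded run never reached gives the recompiler no information, so coverage of the test run directly limits the result.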
Emulators of newer systems, such as the Dolphin Emulator (which emulates the GameCube and Wii), frequently use just-in-time compilation, where the emulator recompiles sections of the game's code at runtime. This generally provides the best of both worlds: we get the performance improvements of recompiled code, and the improved insight of being able to analyze the game's code at runtime.
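The core of a JIT is a cache of translated blocks keyed by guest address. The sketch below is a toy, not Dolphin's design: `BlockCache` and `toy_translate` are hypothetical, and Python closures stand in for generated native code.

```python
class BlockCache:
    """Translate a block the first time it runs, then reuse the result."""

    def __init__(self, translate):
        self.translate = translate  # callable: guest address -> compiled block
        self.blocks = {}
        self.translations = 0

    def run(self, pc, state):
        block = self.blocks.get(pc)
        if block is None:
            # First visit: we know for certain this address is executed as code.
            block = self.translate(pc)
            self.blocks[pc] = block
            self.translations += 1
        return block(state)  # a block returns the next guest address

def toy_translate(pc):
    """Stand-in translator: every block increments A and loops to itself."""
    def block(state):
        state["a"] = (state["a"] + 1) & 0xFF
        return pc
    return block

cache = BlockCache(toy_translate)
state = {"a": 0}
pc = 0x8000
for _ in range(3):
    pc = cache.run(pc, state)
print(state["a"], cache.translations)
```

Because translation is triggered by actual execution, indirect jumps and bank switches stop being analysis problems; a real JIT additionally has to invalidate cached blocks when the underlying memory changes.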
The problem is that the emulator is emulating a LOT more than just the CPU. So in addition to transpiling the 6502 code to Intel code (and don't think that's simple - making the timing come out right would be a fascinating problem), you also need to provide code (analogous to the standard libraries that any program uses) that provides an emulated I/O environment for the 6502 code to manipulate.
It'd be much easier to just bundle the emulator and program into one binary and ship that. This is what Ian Bogost did with A Slow Year - the 6502 binary is just wrapped up into a binary with the emulator.
Whether or not it's possible isn't the only factor that goes into development. Keep in mind that the quality of an emulator is very much connected to its ability to support TAS creation, savestates, RAM watching, and so on, none of which would be possible if the ROM were converted to a .EXE file. With this in mind, developers haven't been interested in creating such a thing, which is the very reason why it isn't "possible" with the knowledge that the community is likely to have.
There are three things that make this much more complex than it seems:
First, you'd need to emulate the hardware other than the CPU, so the graphics system, the audio system, etc. Along with this you'd need to provide a pre-translated version of any firmware that's expected to be present on your target system so that it can be linked in to the code you're compiling.
Second, you need to identify what parts of the ROM are actually instructions to be executed, and what parts are data (e.g. sprites, audio tracks, game level maps, etc). The data parts need to be handled differently -- ideally, translated into a more easily accessed format (especially multi-byte numbers, either integers or floating point, which could easily be in a format that the host processor doesn't natively understand).
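As a small example of translating data into a host-friendly form, the sketch below pre-decodes a table of 16-bit little-endian values (the byte order the 6502 uses for addresses) into ordinary integers. The function name and the table contents are made up for illustration.

```python
import struct

def decode_u16_table(rom, offset, count):
    """Read `count` little-endian 16-bit values starting at `offset`."""
    return list(struct.unpack_from("<%dH" % count, rom, offset))

# Two entries as they would sit in ROM: low byte first, then high byte.
rom = bytes([0x34, 0x12, 0x00, 0x80])
print([hex(v) for v in decode_u16_table(rom, 0, 2)])  # ['0x1234', '0x8000']
```

This only works for data whose layout the translator already understands, which is the hard part: nothing in the ROM marks where such a table starts or how long it is.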
Third, and possibly more interestingly, all of this may in fact be somewhat less legal than just using interpreted emulation. Many software and hardware vendors have granted permission to allow their code (either firmware or games) to be used in emulators, as long as it remains unchanged (e.g. I know that this is the situation with the ROMs for the Sinclair ZX Spectrum, which have been released under terms like these by their current copyright holder, Amstrad PLC). But recompiling it for a new system does change it. Even when copyright permission hasn't been granted, there are a variety of exceptions to copyright that allow minimal copying to be performed (e.g. copying the contents of a cartridge ROM onto disk for easier access) if that's necessary to make it work -- but recompiling isn't necessary, so even where there's no explicit licence grant available it may be less legal to recompile for a new architecture than just to emulate the old one. (I'm not a lawyer, so take this with a very large pinch of salt, but I have spent a reasonable amount of time learning about copyright law, and I'm pretty confident in this position.)
Not quite a complete answer; however, many emulators of consoles such as the PlayStation do, in some sense, what you are asking about.
Instead of precisely executing the code and precisely emulating the hardware, some typical code pieces (mostly connected with 3D transformations and rendering) are recognized as a whole, and the end result is just inserted into the machine state or rendered using the host's 3D capabilities. The reason for doing so is probably a lack of detailed hardware information, and host CPUs not being fast enough to emulate everything precisely.
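One way such recognition could work is to fingerprint known routines and substitute a native implementation when a match is found. The sketch below is an assumption about the general shape of the technique, not a description of any particular emulator; all function names are hypothetical.

```python
import hashlib

def fingerprint(code):
    """Identify a routine by a hash of its machine-code bytes."""
    return hashlib.sha256(bytes(code)).hexdigest()

def build_hle_table(known_routines):
    """known_routines: iterable of (routine_bytes, native_function) pairs."""
    return {fingerprint(code): fn for code, fn in known_routines}

def run_routine(mem, start, length, hle_table, interpret, state):
    """Run a guest routine natively if it is recognized, else interpret it."""
    native = hle_table.get(fingerprint(mem[start:start + length]))
    if native is not None:
        native(state)  # write the end result straight into the machine state
        return "native"
    interpret(mem, start, length, state)
    return "interpreted"

# A made-up "clear A and return" routine (LDA #$00; RTS) and its native twin.
clear_a = bytes([0xA9, 0x00, 0x60])
table = build_hle_table([(clear_a, lambda s: s.update(a=0))])
mem = bytes([0xEA]) + clear_a
state = {"a": 7}
print(run_routine(mem, 1, 3, table, lambda *args: None, state), state)
```

The trade-off is accuracy: the native replacement produces the right end result but none of the intermediate timing or side effects, so it only works for routines whose internals the rest of the program never observes.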