I spend a lot of time running old OSes, mainly for personal amusement rather than for any practical work. I guess that lack of practicality is why I keep running into bugs and issues with qemu and older OSes.
Windows NT 4, released in 1996, does in fact support SMP. However, qemu with KVM acceleration will crash if you install the Multiprocessor HAL in Windows NT 4.
```
Stack trace of thread 2072708:
#0  0x00007f2e8a90ba7c __pthread_kill_implementation (libc.so.6 + 0x96a7c)
#1  0x00007f2e8a8b7476 __GI_raise (libc.so.6 + 0x42476)
#2  0x00007f2e8a89d7f3 __GI_abort (libc.so.6 + 0x287f3)
#3  0x000055cbf99545a2 do_patch_instruction (qemu-system-i386 + 0x2f85a2)
#4  0x000055cbf995b1d4 process_queued_cpu_work (qemu-system-i386 + 0x2ff1d4)
#5  0x000055cbf9d762f8 kvm_vcpu_thread_fn (qemu-system-i386 + 0x71a2f8)
#6  0x000055cbf9ef4560 qemu_thread_start (qemu-system-i386 + 0x898560)
#7  0x00007f2e8a909b43 start_thread (libc.so.6 + 0x94b43)
#8  0x00007f2e8a99ba00 __clone3 (libc.so.6 + 0x126a00)
```
do_patch_instruction lives in kvmvapic.c, the code qemu uses to patch guest instructions that access the local APIC's task priority register (TPR). It switches on the first opcode byte of the instruction being patched; if that byte matches none of the cases, execution lands in the default case, which calls libc's abort().
Whatever NT is doing, it ends up in this code with an instruction the switch doesn't recognize. I suppose the qemu devs had their reasons for calling abort().
Comment out the abort() call, recompile, and bam: Windows NT boots correctly now!
```c
static void do_patch_instruction(CPUState *cs, run_on_cpu_data data)
{
    X86CPU *x86_cpu = X86_CPU(cs);
    PatchInfo *info = (PatchInfo *) data.host_ptr;
    VAPICHandlers *handlers = info->handler;
    target_ulong ip = info->ip;
    uint8_t opcode[2];
    uint32_t imm32 = 0;

    cpu_memory_rw_debug(cs, ip, opcode, sizeof(opcode), 0);

    switch (opcode[0]) {
    case 0x89: /* mov r32 to r/m32 */
        patch_byte(x86_cpu, ip, 0x50 + modrm_reg(opcode[1]));  /* push reg */
        patch_call(x86_cpu, ip + 1, handlers->set_tpr);
        break;
    case 0x8b: /* mov r/m32 to r32 */
        patch_byte(x86_cpu, ip, 0x90);
        patch_call(x86_cpu, ip + 1, handlers->get_tpr[modrm_reg(opcode[1])]);
        break;
    case 0xa1: /* mov abs to eax */
        patch_call(x86_cpu, ip, handlers->get_tpr[0]);
        break;
    case 0xa3: /* mov eax to abs */
        patch_call(x86_cpu, ip, handlers->set_tpr_eax);
        break;
    case 0xc7: /* mov imm32, r/m32 (c7/0) */
        patch_byte(x86_cpu, ip, 0x68);  /* push imm32 */
        cpu_memory_rw_debug(cs, ip + 6, (void *)&imm32, sizeof(imm32), 0);
        cpu_memory_rw_debug(cs, ip + 1, (void *)&imm32, sizeof(imm32), 1);
        patch_call(x86_cpu, ip + 5, handlers->set_tpr);
        break;
    case 0xff: /* push r/m32 */
        patch_byte(x86_cpu, ip, 0x50); /* push eax */
        patch_call(x86_cpu, ip + 1, handlers->get_tpr_stack);
        break;
    default:
        // abort();  /* commented out: unrecognized opcodes are left unpatched */
        break;       /* a label cannot end a compound statement before C23 */
    }

    g_free(info);
}
```
If you'd rather not patch qemu, there are other workarounds: drop KVM and use qemu's TCG accelerator instead, which is likely slower, or keep KVM and limit the VM to a single CPU.
One last problem with Windows NT 4.0: unlike the uniprocessor HAL, the multiprocessor HAL does not execute a HLT instruction when a CPU is idle. That means that, as in the Windows 95 days, your CPU burns cycles doing nothing.
This person discusses the issue and solved it by disassembling hal.dll and adding the HLT instruction. He offers a replacement hal.dll, but I didn't have any luck with it: the system boots and executes HLT as expected, but it's extremely sluggish.
Maybe I’ll investigate further. Or maybe not, it’s not that important.