qemu + Windows NT 4 + Multiprocessor HAL = Crash. Fixed.

I spend a lot of time running old OSes, mainly for personal amusement and not for doing any practical work. I guess the lack of practicality is why I’m running into bugs and issues with qemu and older OSes.

Windows NT 4, released in 1996, does in fact support SMP. However, qemu with KVM acceleration will crash if you install the Multiprocessor HAL in Windows NT 4.

Stack trace of thread 2072708:
#0  0x00007f2e8a90ba7c __pthread_kill_implementation (libc.so.6 + 0x96a7c)
#1  0x00007f2e8a8b7476 __GI_raise (libc.so.6 + 0x42476)
#2  0x00007f2e8a89d7f3 __GI_abort (libc.so.6 + 0x287f3)
#3  0x000055cbf99545a2 do_patch_instruction (qemu-system-i386 + 0x2f85a2)
#4  0x000055cbf995b1d4 process_queued_cpu_work (qemu-system-i386 + 0x2ff1d4)
#5  0x000055cbf9d762f8 kvm_vcpu_thread_fn (qemu-system-i386 + 0x71a2f8)
#6  0x000055cbf9ef4560 qemu_thread_start (qemu-system-i386 + 0x898560)
#7  0x00007f2e8a909b43 start_thread (libc.so.6 + 0x94b43)
#8  0x00007f2e8a99ba00 __clone3 (libc.so.6 + 0x126a00)

do_patch_instruction is located in kvmvapic.c. It contains a switch block that evaluates a CPU instruction given to it, based on opcode, and if the opcode of the instruction does not match anything in the switch block, it will fall through to the default case and call libc’s abort().
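
To make the failure mode concrete, here is a minimal standalone sketch of that dispatch. It is not qemu code: is_patchable_opcode is a hypothetical helper I made up for illustration, and the opcode list is simply copied from the switch in do_patch_instruction as listed further down in this post. Any opcode for which it returns false is one that would land in the default case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of qemu. It only models which first
 * opcode bytes the switch in do_patch_instruction() has a case for.
 */
bool is_patchable_opcode(uint8_t opcode0)
{
    switch (opcode0) {
    case 0x89: /* mov r32 to r/m32 */
    case 0x8b: /* mov r/m32 to r32 */
    case 0xa1: /* mov abs to eax */
    case 0xa3: /* mov eax to abs */
    case 0xc7: /* mov imm32, r/m32 */
    case 0xff: /* push r/m32 */
        return true;
    default:
        /* In qemu this is the path that ends in abort(). */
        return false;
    }
}
```

I have not tracked down which instruction the NT multiprocessor HAL actually trips on, so this only shows the shape of the check, not the culprit.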

Whatever NT is doing, it goes through this code and feeds it an instruction it doesn’t understand. The qemu devs had their reasons for calling abort() I suppose.

Comment out the abort() call, recompile, and bam, Windows NT boots correctly now!

static void do_patch_instruction(CPUState *cs, run_on_cpu_data data)
{
    X86CPU *x86_cpu = X86_CPU(cs);
    PatchInfo *info = (PatchInfo *) data.host_ptr;
    VAPICHandlers *handlers = info->handler;
    target_ulong ip = info->ip;
    uint8_t opcode[2];
    uint32_t imm32 = 0;

    cpu_memory_rw_debug(cs, ip, opcode, sizeof(opcode), 0);

    switch (opcode[0]) {
    case 0x89: /* mov r32 to r/m32 */
        patch_byte(x86_cpu, ip, 0x50 + modrm_reg(opcode[1]));  /* push reg */
        patch_call(x86_cpu, ip + 1, handlers->set_tpr);
        break;
    case 0x8b: /* mov r/m32 to r32 */
        patch_byte(x86_cpu, ip, 0x90);
        patch_call(x86_cpu, ip + 1, handlers->get_tpr[modrm_reg(opcode[1])]);
        break;
    case 0xa1: /* mov abs to eax */
        patch_call(x86_cpu, ip, handlers->get_tpr[0]);
        break;
    case 0xa3: /* mov eax to abs */
        patch_call(x86_cpu, ip, handlers->set_tpr_eax);
        break;
    case 0xc7: /* mov imm32, r/m32 (c7/0) */
        patch_byte(x86_cpu, ip, 0x68);  /* push imm32 */
        cpu_memory_rw_debug(cs, ip + 6, (void *)&imm32, sizeof(imm32), 0);
        cpu_memory_rw_debug(cs, ip + 1, (void *)&imm32, sizeof(imm32), 1);
        patch_call(x86_cpu, ip + 5, handlers->set_tpr);
        break;
    case 0xff: /* push r/m32 */
        patch_byte(x86_cpu, ip, 0x50); /* push eax */
        patch_call(x86_cpu, ip + 1, handlers->get_tpr_stack);
        break;
    default:
        /* abort(); */  /* commented out: leave unknown opcodes unpatched */
        break;          /* a label must be followed by a statement */
    }

    g_free(info);
}
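
As a side note on how the patching in the 0x89 case works: the replacement byte is 0x50 plus the register number taken from the ModRM byte. Below is a small standalone sketch of that arithmetic. The function names are mine, and I am assuming qemu's modrm_reg() extracts the reg field (bits 5..3 of ModRM), which is how x86 encodes it.

```c
#include <stdint.h>

/*
 * Assumption: this mirrors what modrm_reg() does in kvmvapic.c.
 * In the x86 ModRM byte, bits 7..6 are mod, 5..3 are reg, 2..0 are r/m.
 */
unsigned modrm_reg_field(uint8_t modrm)
{
    return (modrm >> 3) & 7u;
}

/*
 * The one-byte "push r32" opcodes are 0x50..0x57, indexed by register
 * number (0 = eax, 1 = ecx, 2 = edx, 3 = ebx, ...).
 */
uint8_t push_opcode_for(uint8_t modrm)
{
    return (uint8_t)(0x50 + modrm_reg_field(modrm));
}
```

For example, a mov from ecx to an absolute address is encoded as 89 0D followed by the address. ModRM 0x0D has reg = 1 (ecx), so push_opcode_for(0x0D) gives 0x51, which is push ecx.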

There is another workaround for this: skip KVM entirely and use qemu’s TCG instead, which is likely slower. Otherwise, if you want KVM, you have to limit yourself to one CPU.

One last problem with Windows NT 4.0: the multiprocessor HAL, unlike the uniprocessor HAL, does not issue a HLT instruction on idle CPUs. Meaning, like back in the Windows 95 days, your CPU will burn cycles doing nothing.

This person discusses the issue and resolved it by disassembling hal.dll and adding the HLT instruction. He offers a replacement hal.dll, but I did not have luck with it (the system boots and issues the HLT as expected, but it’s extremely sluggish).

Maybe I’ll investigate further. Or maybe not, it’s not that important.
