The behavior you describe is consistent with ReFS scalability limits reported in high I/O, high-handle-count scenarios, particularly with VHD(X) containers. Being fully patched on Windows Server 2025 does not rule this out: the file system driver may still hit those limits in your specific environment.
The immediate and most effective solution is to migrate your FSLogix profile/container share to an NTFS volume. NTFS is more mature and robust under high concurrent file operations, especially with the random read/write patterns generated by FSLogix. Before migrating, ensure you have a validated backup of all VHD(X) files.
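As a rough sketch of the copy step, assuming the containers live under a hypothetical R:\Profiles on the ReFS volume and the new NTFS volume is mounted as N: (both paths and the log location are placeholders, not values from your environment), robocopy can copy the VHD(X) files while preserving ACLs. Run it in a maintenance window with all users signed out so that no container is mounted.

```powershell
# Hypothetical paths: replace with your own source and target.
$source      = 'R:\Profiles'
$destination = 'N:\Profiles'
$log         = Join-Path $env:TEMP 'fslogix-migration.log'

# Stop early if the target volume is not NTFS.
$targetFs = (Get-Volume -DriveLetter 'N').FileSystem
if ($targetFs -ne 'NTFS') {
    throw "Target volume is $targetFs, expected NTFS."
}

# /E       copy subfolders, including empty ones
# /COPYALL copy data, attributes, timestamps, ACLs, owner and auditing info
# /J       unbuffered I/O, intended for large files such as VHDX
# /R /W    limit retries so a locked file does not stall the run
robocopy $source $destination /E /COPYALL /DCOPY:DAT /J /R:2 /W:5 "/LOG:$log"

# robocopy exit codes of 8 or higher indicate at least one failure.
if ($LASTEXITCODE -ge 8) {
    Write-Warning "robocopy reported failures (exit code $LASTEXITCODE). Check $log."
}
```

Keep the ReFS copy untouched until users have signed in successfully against the NTFS share.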
If migrating to NTFS is not immediately feasible, implement the following mitigations for ReFS:
Disable ReFS integrity streams on the volume. This gives up data checksumming but reduces metadata overhead. Use the Set-FileIntegrity cmdlet from the Storage module in PowerShell:
Set-FileIntegrity -FileName "YourVolume:\" -Enable $false
Setting this on a folder changes the default for files created afterwards. Existing files keep their current setting until you run Set-FileIntegrity against them, so no reformat is required.
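Before changing anything, it may be worth checking whether integrity streams are enabled at all. The sketch below uses Get-FileIntegrity and Set-FileIntegrity from the Storage module; E:\Profiles is a placeholder for your container root, and the containers are assumed to be dismounted while it runs.

```powershell
# Placeholder path: replace with your FSLogix container root on the ReFS volume.
$root = 'E:\Profiles'

# Show the setting that new files under this folder will inherit.
Get-FileIntegrity -FileName $root

# Collect the existing VHD(X) containers.
$containers = Get-ChildItem -Path $root -Recurse -File -Include *.vhd, *.vhdx

# List only the containers that currently have integrity streams enabled.
$containers | Get-FileIntegrity | Where-Object Enabled

# Turn integrity streams off for the folder default and for each existing file.
Set-FileIntegrity -FileName $root -Enable $false
$containers | Set-FileIntegrity -Enable $false
```

If the second command returns nothing, integrity streams were never enabled on the containers and this mitigation is unlikely to change the behavior.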
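Adjust the ReFS volume format settings. When formatting, use the 64 KB allocation unit size (ReFS supports 4 KB and 64 KB clusters). ReFS does not generate 8.3 short names, so there is no short-name setting to disable. Changing the allocation unit size on an existing volume requires a reformat.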
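To see what the volume currently uses, and as a sketch of the reformat itself, something like the following should work. The drive letter and label are placeholders, and the Format-Volume call destroys all data on the volume, so it assumes the containers have already been moved off and backed up.

```powershell
# Placeholder drive letter for the ReFS profile volume.
$driveLetter = 'E'

# Report the current file system and cluster size in bytes (65536 = 64 KB).
Get-Volume -DriveLetter $driveLetter |
    Select-Object DriveLetter, FileSystem, AllocationUnitSize

# DESTRUCTIVE: reformats the volume. -Confirm forces a prompt before anything is erased.
Format-Volume -DriveLetter $driveLetter `
    -FileSystem ReFS `
    -AllocationUnitSize 65536 `
    -SetIntegrityStreams $false `
    -NewFileSystemLabel 'Profiles' `
    -Confirm
```

If AllocationUnitSize already reports 65536, reformatting for cluster size alone gains nothing.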
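Tune the SMB server settings on the host. In an elevated PowerShell session, run:
Set-SmbServerConfiguration -AsynchronousCredits 1024 -Force
Set-SmbServerConfiguration -MaxMpxCount 1024 -Force
AsynchronousCredits sets the maximum number of concurrent asynchronous SMB commands allowed on a single connection. The ServerHidden and AnnounceServer settings only control whether the server announces itself to network browse lists; they do not affect performance and can stay at their defaults.
Then restart the server.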
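Since these changes are hard to reason about after the fact, it can help to record the current SMB server configuration first. This is a minimal sketch; the backup file name is arbitrary, and the properties compared (AsynchronousCredits and MaxMpxCount) are simply examples of settings exposed by Get-SmbServerConfiguration.

```powershell
# Arbitrary backup location for the pre-change configuration.
$backup = Join-Path $env:TEMP 'smb-server-config-before.xml'

# Save the full configuration so every value can be restored later.
Get-SmbServerConfiguration | Export-Clixml -Path $backup

# Show the values of interest before the change.
Get-SmbServerConfiguration |
    Select-Object AsynchronousCredits, MaxMpxCount, ServerHidden, AnnounceServer |
    Format-List

# After applying changes, compare selected settings against the saved baseline.
$before = Import-Clixml -Path $backup
$after  = Get-SmbServerConfiguration
foreach ($name in 'AsynchronousCredits', 'MaxMpxCount') {
    [pscustomobject]@{
        Setting = $name
        Before  = $before.$name
        After   = $after.$name
    }
}
```

To roll back, pass the saved values to Set-SmbServerConfiguration again.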
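Do not try to raise the system's handle limit through the registry. The ObCaseInsensitive DWORD under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel controls case sensitivity for object names, already defaults to 1, and has no effect on handle limits. PendingFileRenameOperations is a REG_MULTI_SZ value under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager that lists file renames queued for the next reboot; it is not a numeric tuning value and should not be set to 0x1000.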
Monitor specific performance counters during high load to identify the bottleneck (ReFS counter names can differ between builds, so confirm them on your server first):
- ReFS\Metadata I/O Latency
- ReFS\Metadata Operations per Second
- System\File Read Operations/sec
- Process\Handle Count for the System and FsFilter processes
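A sketch of how the counters could be sampled with Get-Counter. The ReFS counter names are not hard-coded here because they may not match the list above on every build, so the first command only lists whatever ReFS counter sets the server actually exposes. The output path, sample interval, and sample count are arbitrary choices.

```powershell
# List the ReFS-related counters that exist on this server (may return nothing).
Get-Counter -ListSet '*ReFS*' -ErrorAction SilentlyContinue |
    Select-Object -ExpandProperty Counter

# Counters that exist on any Windows Server installation.
$counters = @(
    '\System\File Read Operations/sec',
    '\Process(System)\Handle Count'
)

# Arbitrary output file for later analysis.
$out = Join-Path $env:TEMP 'refs-load-sample.csv'

# Take 12 samples, 5 seconds apart (about one minute), and flatten them to CSV.
Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object Timestamp, Path, CookedValue |
    Export-Csv -Path $out -NoTypeInformation
```

Add any ReFS counter paths returned by the first command to $counters and rerun the capture during a logon storm.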
If the instability persists, engage Microsoft Support with a full memory dump of the server during a freeze and the ReFS operational logs (Event ID 129, 130, 131 from Microsoft-Windows-ReFS/Operational). There may be a known hotfix or driver update for your specific storage controller.
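To gather the event log portion for a support case, a sketch along these lines may help. The log name and event IDs are taken from the paragraph above and should be checked against what Event Viewer shows on your server; the export path is a placeholder.

```powershell
# Log name and event IDs as given above; verify them in Event Viewer.
$logName = 'Microsoft-Windows-ReFS/Operational'
$out     = Join-Path $env:TEMP 'refs-operational.evtx'

# Preview the most recent matching events. Without SilentlyContinue,
# Get-WinEvent raises an error when no events match the filter.
Get-WinEvent -FilterHashtable @{ LogName = $logName; Id = 129, 130, 131 } `
    -MaxEvents 200 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, LevelDisplayName, Message |
    Format-List

# Export the whole log to an .evtx file, overwriting any previous export.
wevtutil epl $logName $out /ow:true
```

Attach the exported .evtx file to the support request together with the memory dump.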
I hope this helps. If it gives you more insight into the issue, please consider accepting the answer. If you have more questions, feel free to leave a message. Have a nice day!
VP