Windows Performance

Trace Windows 7 boot/shutdown/hibernate/standby/resume issues
This is an updated version of the tutorial cluberti posted here.

To get started you need the Windows Performance Tools Kit. Read here how to install it:

http://www.msfn.org/...howtopic=146919

Now open a command prompt with admin rights and run the following commands:

For boot tracing:

xbootmgr -trace boot -traceFlags BASE+CSWITCH+DRIVERS+POWER -resultPath C:\TEMP

Attention: Some users reported getting a bugcheck (BSOD) when using the DRIVERS flag in the boot trace command. If this happens, use System Restore to go back to a working Windows and run the command without DRIVERS:

xbootmgr -trace boot -traceFlags BASE+CSWITCH+POWER -resultPath C:\TEMP

In that case, also change the .etl file name in the xperf command that generates the XML, because the trace file is named after the flags used.

I've sent some dumps to Microsoft; they are looking into the issue right now.

For shutdown tracing:

xbootmgr -trace shutdown -noPrepReboot -traceFlags BASE+CSWITCH+DRIVERS+POWER -resultPath C:\TEMP

For Standby+Resume:

xbootmgr -trace standby -traceFlags BASE+CSWITCH+DRIVERS+POWER -resultPath C:\TEMP

For Hibernate+Resume:

xbootmgr -trace hibernate -traceFlags BASE+CSWITCH+DRIVERS+POWER -resultPath C:\TEMP

Replace C:\TEMP with any temp directory on your machine, as necessary, to store the output files.

All of these will shut down, hibernate, or put your box into standby, and then reboot to finish tracing. Once Vista/Server 2008 (R2) or Windows 7 has rebooted, log back in as necessary; once the countdown timer finishes, you should have some trace files in C:\TEMP. If asked, upload the file(s) generated in C:\TEMP (or the directory you chose) to a download share for analysis.
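The commands above only differ in the trace type, the flag list and the optional -noPrepReboot switch. As a small sketch (the helper name build_xbootmgr_command and the two flag lists are my own, not part of the toolkit), you can assemble the command lines like this, for example to switch quickly to the variant without DRIVERS:

```python
def build_xbootmgr_command(trace_type, flags, result_path, no_prep_reboot=False):
    """Return an xbootmgr command line in the same shape as the ones in this post.

    Hypothetical helper: it only joins strings and does not run anything.
    """
    parts = ["xbootmgr", "-trace", trace_type]
    if no_prep_reboot:
        # Used for shutdown tracing to skip the preparatory reboot.
        parts.append("-noPrepReboot")
    parts += ["-traceFlags", "+".join(flags), "-resultPath", result_path]
    return " ".join(parts)


FULL_FLAGS = ["BASE", "CSWITCH", "DRIVERS", "POWER"]
# Fallback for machines that bugcheck when DRIVERS is enabled.
SAFE_FLAGS = [flag for flag in FULL_FLAGS if flag != "DRIVERS"]

if __name__ == "__main__":
    print(build_xbootmgr_command("boot", FULL_FLAGS, r"C:\TEMP"))
    print(build_xbootmgr_command("boot", SAFE_FLAGS, r"C:\TEMP"))
    print(build_xbootmgr_command("shutdown", FULL_FLAGS, r"C:\TEMP", no_prep_reboot=True))
```

The sketch does not quote the result path, so pick a directory without spaces or add the quotes yourself.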

Analysis of the boot trace:

To start, create a summary XML file by running this command (replace the file name with the name of your etl file):

xperf /tti -i boot_BASE+CSWITCH+DRIVERS+POWER_1.etl -o summary_boot.xml -a boot

Now you see this picture:

Look at the timing node. All time values are in ms.

The value bootDoneViaExplorer shows the time Windows needs to boot to the desktop.

The value bootDoneViaPostBoot is the time (including 10 s of idle detection) that Windows needs to boot completely, after all startup applications have finished.
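If you don't want to read the numbers by hand, here is a minimal sketch that pulls both values out of the summary. It assumes the values are stored as attributes on an element named timing; the SAMPLE document, its wrapper elements and its numbers are made up for illustration and are not real xperf output.

```python
import xml.etree.ElementTree as ET

IDLE_DETECTION_MS = 10_000  # idle window that is included in bootDoneViaPostBoot


def read_boot_timing(xml_text):
    """Return the two boot times in seconds.

    Assumption: a <timing> element carries bootDoneViaExplorer and
    bootDoneViaPostBoot as attributes, both in milliseconds.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter("timing"):
        if "bootDoneViaExplorer" in elem.attrib:
            explorer_ms = int(elem.attrib["bootDoneViaExplorer"])
            post_boot_ms = int(elem.attrib["bootDoneViaPostBoot"])
            return {
                "desktop_s": explorer_ms / 1000,
                "complete_s": (post_boot_ms - IDLE_DETECTION_MS) / 1000,
            }
    raise ValueError("no timing node with bootDoneViaExplorer found")


# Hypothetical, heavily shortened sample document.
SAMPLE = (
    '<results><boot>'
    '<timing bootDoneViaExplorer="32000" bootDoneViaPostBoot="75000"/>'
    '</boot></results>'
)

if __name__ == "__main__":
    print(read_boot_timing(SAMPLE))
```

For a real file, pass the content of summary_boot.xml instead of SAMPLE.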

Quote During the OSLoader phase (shown in the value osLoaderDuration), the Windows loader binary (Winload.exe) loads essential system drivers that are required to read minimal data from the disk and initializes the system to the point where the Windows kernel can begin execution. When the kernel starts to run, the loader loads into memory the system registry hive and additional drivers that are marked as BOOT_START.

Visual Cues

This phase begins approximately when the BIOS splash and diagnostic screens are cleared and ends approximately when the “Loading Windows” splash screen appears.

Those values give you a summary.

The MainPathBoot Phase

Quote What Happens in This Phase

During the MainPathBoot phase, most of the operating system work occurs. This phase involves kernel initialization, Plug and Play activity, service start, logon, and Explorer (desktop) initialization. To simplify analysis, we divide the MainPathBoot phase into four subphases, as shown in the next picture. Each subphase has unique characteristics and performance vulnerabilities.

Visual Cues

Visually, the MainPathBoot phase begins when the “Starting Windows” splash screen appears and lasts until the desktop appears. If auto-logon is not enabled, the time that elapses while the logon screen is displayed affects the measured boot time in a trace.

PreSMSS Subphase

Quote What Happens in This Subphase

The PreSMSS subphase begins when the kernel is invoked. During this subphase, the kernel initializes data structures and components. It also starts the PnP manager, which initializes the BOOT_START drivers that were loaded during the OSLoader phase. When the PnP manager detects a device, it loads and initializes the device’s drivers.

Visual Cues

PreSMSS begins approximately when the “Loading Windows” splash screen appears. There are no explicit visual cues for the end of PreSMSS.

So if this subphase takes too long for you, look inside the corresponding node of the XML to see which driver is loading too slowly.

SMSSInit Subphase

Quote What Happens in This Subphase

The SMSSInit subphase begins when the kernel passes control to the session manager process (Smss.exe). During this subphase, the system initializes the registry, loads and starts the devices and drivers that are not marked BOOT_START, and starts the subsystem processes. SMSSInit ends when control is passed to Winlogon.exe.

Visual Cues

There are no explicit visual cues for the start of SMSSInit, but the blank screen that appears between the splash screen and the logon screen is part of SMSSInit. It ends before the logon screen appears.

SMSSInit Performance Vulnerabilities

Video drivers are a common source of performance problems in the SMSSInit subphase. The video driver must be initialized first in the system session and then in the user session. Reduction of video driver initialization time leads to a direct wall-clock reduction in boot time. Initialization in the user session is typically much faster than in the system session because Windows performs common initialization tasks during the system session.

So if the SMSSInit subphase takes too long, try to get a graphics card driver update.

WinLogonInit Subphase

Quote What Happens in This Subphase

The WinLogonInit subphase begins when SMSSInit completes and starts Winlogon.exe. During WinLogonInit, the user logon screen appears, the service control manager starts services, and Group Policy scripts run. WinLogonInit ends when the Explorer process starts.

Visual Cues

WinLogonInit begins shortly before the logon screen appears. It ends just before the desktop appears for the first time.

WinLogonInit Performance Vulnerabilities

Many operations occur in parallel during WinLogonInit. On many systems, this subphase is CPU bound and has large I/O demands. Good citizenship from the services that start in this phase is critical for optimized boot times. Services can declare dependencies or use load order groups to ensure that they start in a specific order. Windows processes load order groups in serial order. Service initialization delays in an early load order group block subsequent load order groups and can possibly block the boot process.

If WinLogonInit takes too long, open the etl file, scroll to the service graph and look for a long delay.

In this example the service SavService (Sophos Anti-Virus\SavService.exe) is part of the Plug and Play group and causes a delay because the service takes too long to start. Try to get an update for the hanging service or remove the software.

ExplorerInit Subphase

Quote What Happens in This Subphase

The ExplorerInit subphase begins when Explorer.exe starts. During ExplorerInit, the system creates the desktop window manager (DWM) process, which initializes the desktop and displays it for the first time. This phase is CPU intensive. The initialization of DWM and desktop occurs in the foreground, while in the background the service control manager (SCM) starts services and the memory manager prefetches code and data. On most systems ExplorerInit is CPU bound, and timing issues are likely the result of a simple resource bottleneck.

Visual Cues

ExplorerInit begins just before the desktop appears for the first time. There is no clear visual cue to indicate the end of ExplorerInit.

ExplorerInit Performance Analysis

Applications—such as antivirus programs or application servers—that are created during service start in this or previous phases can consume CPU resources during ExplorerInit. Some services might not be started yet when ExplorerInit is complete.

So if the ExplorerInit phase takes too long, minimize the services which use a lot of CPU power and make sure your AV tool doesn't hurt too much. If it does, switch to a different tool.

The PostBoot Phase

Quote What Happens in This Phase

The PostBoot phase includes all background activity that occurs after the desktop is ready. The user can interact with the desktop, but the system might still be starting services, tray icons, and application code in the background. Specifically, Xperf samples the system every 100 ms during the PostBoot phase. If the system is 80-percent or more idle (excluding low-priority CPU and disk activity) at the time of the sample, Xperf considers the system to be “idle” for that 100 ms interval. The phase persists until the system accumulates 10 seconds of idle time.

Note: When you review traces and report timing results, you should subtract the 10 second idle time that accumulated during PostBoot to determine total boot time.

Visual Cues

There are no explicit visual cues for PostBoot. The phase begins after the user’s desktop appears and ends after satisfying the 10-second metric that was explained earlier.

PostBoot Performance Vulnerabilities

During PostBoot, Windows examines the entries in the various Run and RunOnce keys (Run, RunOnce, RunOnceEx, RunServices, and so on) in the registry and the Startup folder in the file system, and then starts the listed applications.

If post boot takes too long, reduce the number of running applications at startup with the help of msconfig.exe or AutoRuns.
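To make the idle rule from the quote easier to follow, here is a small simulation of it as described: one sample per 100 ms, a sample counts as idle at 80 percent or more, and PostBoot ends once 10 seconds of idle samples have accumulated. This is only my illustration of the rule, not Xperf's actual code; the function name and the input list are made up.

```python
SAMPLE_MS = 100            # sampling interval described in the quote
IDLE_THRESHOLD = 0.80      # a sample counts as idle at 80 percent or more
REQUIRED_IDLE_MS = 10_000  # accumulated idle time that ends PostBoot


def post_boot_end_ms(idle_fractions):
    """Return the time (ms after the desktop is ready) at which PostBoot ends.

    idle_fractions holds one idle ratio (0.0 to 1.0) per 100 ms sample.
    Returns None if 10 s of idle time never accumulates.
    """
    idle_ms = 0
    for index, idle in enumerate(idle_fractions):
        if idle >= IDLE_THRESHOLD:
            idle_ms += SAMPLE_MS
        if idle_ms >= REQUIRED_IDLE_MS:
            return (index + 1) * SAMPLE_MS
    return None


if __name__ == "__main__":
    # 5 s of startup applications hammering the system, then the system calms down.
    samples = [0.2] * 50 + [0.95] * 100
    end = post_boot_end_ms(samples)
    print(end)                     # PostBoot duration including the idle window
    print(end - REQUIRED_IDLE_MS)  # what you report after subtracting the 10 s
```

In this made-up run the busy part lasts 5 s, which is exactly what remains after subtracting the 10 s of idle detection.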

If you have an HDD (no SSD!) and you want to speed up the boot, run the optimization from this guide:

http://www.msfn.org/...howtopic=140262

Analysis of the shutdown trace:

The shutdown is divided into these 3 phases (UserSession, SystemSession and KernelShutdown):

To generate an XML summary of shutdown, use the -a shutdown action with Xperf:

xperf /tti -i shutdown_BASE+CSWITCH+DRIVERS+POWER_1.etl -o summary_shutdown.xml -a shutdown

Open the XML and you see this:

It shows you the most relevant data.

In this example, shutdownTime is 23 s. Stopping the services takes 1.5 s, which is fast.

Next you have an entry for each session. Starting with Vista, all services run in Session 0 (Session 0 Isolation) and each user gets their own session (1, 2, ..., n).

<sessionShutdown sessionID="1" duration="3321">

shows the time it takes to stop all applications that the user is running. In this example it takes 3.3 seconds.

UserSession Phase

Quote What Happens in UserSession

During this phase, the Client/Server Runtime Server Subsystem (Csrss.exe) shuts down all applications that are running in the user session—that is, all applications that have session ID 1.

If after 5 seconds any application blocks shut down, Windows displays the dialog box in Figure 24 so that users can choose to force or cancel shutdown.

UserSession Performance Vulnerabilities

Because Windows serially shuts down applications, any delay in a process’s shutdown path contributes to the total shutdown duration. To ensure a speedy shutdown, every application must respond quickly to shutdown notification messages (WM_QUERYENDSESSION and WM_ENDSESSION). Windows uses long time-outs so that applications have sufficient time to shut down and save user data. Therefore, applications can have a significant effect on shutdown performance.

<sessionShutdown sessionID="0" duration="1513">

The duration of sessionShutdown sessionID="0" corresponds to servicesShutdownDuration. Here you can see which service takes too long to stop.

SystemSession Phase

Quote What Happens in SystemSession

This phase includes two subphases:

• Preshutdown notification. Windows serially shuts down all services that registered to receive preshutdown notifications. Ordered services—services that have set up the shutdown order of dependent services—are shut down before non-ordered services.

• Shutdown notification. All services that registered to receive shutdown notifications are shut down in parallel.

If all services have not exited after 20 seconds (in Windows Vista) or 12 seconds (in Windows 7), the system continues the shutdown. Processes and services that do not shut down in a timely manner are left running as the system shuts down.

SystemSession Performance Vulnerabilities

In the preshutdown notification subphase, the SCM serializes the waits. Therefore, these services block system shutdown until they exit or until the wait hint time-out expires. Services are not guaranteed to have enough time to finish all their work in the shutdown notification subphase before the system shuts down.

In both cases expand the node and look at the shutdownDuration value.

It helps you to identify a hanging application or service.
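Instead of expanding every node by hand, you can let a small script list the largest shutdownDuration values. This is only a sketch: I assume shutdownDuration is stored as an attribute, and the element name service and the name attribute in the SAMPLE are made up, so adjust them to what your summary_shutdown.xml really contains.

```python
import xml.etree.ElementTree as ET


def slowest_shutdowns(xml_text, top=5):
    """Return up to `top` (duration_ms, label) pairs, slowest first.

    Assumption: entries carry a shutdownDuration attribute in ms. The label
    is taken from a 'name' attribute if present, otherwise from the tag name.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for elem in root.iter():
        duration = elem.attrib.get("shutdownDuration")
        if duration is not None:
            label = elem.attrib.get("name", elem.tag)
            entries.append((int(duration), label))
    entries.sort(reverse=True)
    return entries[:top]


# Hypothetical sample; element and attribute names other than
# sessionShutdown, sessionID, duration and shutdownDuration are invented.
SAMPLE = """
<sessionShutdown sessionID="0" duration="1513">
  <service name="SlowService" shutdownDuration="1200"/>
  <service name="FastService" shutdownDuration="15"/>
</sessionShutdown>
"""

if __name__ == "__main__":
    for duration_ms, label in slowest_shutdowns(SAMPLE):
        print(f"{label}: {duration_ms} ms")
```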

KernelShutdown Phase

Quote What Happens in KernelShutdown

In the KernelShutdown phase, the rest of the system, including all devices and drivers, is shut down.

To calculate the time spent in KernelShutdown, subtract the time that is required to shut down the system and user sessions from shutdownTime.

In my example:

KernelShutdown = 23184 - 3321 - 1513 = 18350
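The same subtraction as a short sketch. It reads the sessionShutdown durations from the XML and takes shutdownTime as a plain number, because where exactly shutdownTime sits in the file is not shown here. The wrapper element in SAMPLE is made up; the numbers are the ones from my example.

```python
import xml.etree.ElementTree as ET


def session_durations(xml_text):
    """Map sessionID -> duration in ms for every sessionShutdown element."""
    root = ET.fromstring(xml_text)
    return {
        elem.attrib["sessionID"]: int(elem.attrib["duration"])
        for elem in root.iter("sessionShutdown")
    }


def kernel_shutdown_ms(shutdown_time_ms, durations_ms):
    """KernelShutdown = shutdownTime minus the user and system session times."""
    return shutdown_time_ms - sum(durations_ms)


# Hypothetical wrapper element around the two entries from the example.
SAMPLE = (
    '<shutdown>'
    '<sessionShutdown sessionID="1" duration="3321"/>'
    '<sessionShutdown sessionID="0" duration="1513"/>'
    '</shutdown>'
)

if __name__ == "__main__":
    sessions = session_durations(SAMPLE)
    print(kernel_shutdown_ms(23184, sessions.values()))
```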

In this case the 18.35 seconds are very slow. In the XML you see an entry ZeroHiberFile which takes too long. In this example the user set the option ClearPageFileAtShutdown under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management to 1. This overwrites the hibernation file with zeros to delete personal data, which causes the huge slowdown. Setting this option to 0 would save 12.64 seconds of shutdown time.

That is all you need to analyze slow shutdown issues.

Analysis of the hibernation trace:

To generate the XML, run this command:

xperf /tti -i hibernate_BASE+CSWITCH+DRIVERS+POWER_1.etl -o summary_hibernation.xml -a suspend

Analysis of the sleep/resume trace:

xperf /tti -i standby_BASE+CSWITCH+DRIVERS+POWER_1.etl -o summary_sleep.xml -a suspend


Open the XMLs and look for long BIOS init times and for services/applications which take very long to suspend and resume.

For deeper analysis refer to the Sleep and Hibernate Transitions part of the Windows On/Off Transition Performance Analysis Guide from Microsoft.

The pictures Shutdown_cancel.png, Shutdown_picture.png and Boot_MainPathBoot.png were taken from this Windows On/Off Transition Performance Analysis Guide. Read it if you need more information.

// Edit: 2010-11-28

Added the explanation of the boot process

// Edit: 2010-10-11

Added the optimization guide

// Edit: 2010-10-09

If you get a BSOD (Bug Check 0x7E: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED) while making traces, REMOVE ALL USB DEVICES and reboot! When making a new trace, remove the DRIVERS flag from the command line!

// Edit: 2010-02-04

Added the -noPrepReboot option to shutdown tracing to prevent the preparatory reboot during a shutdown/rebootCycle trace. Usually, the reboot is required to ensure a consistent machine state before the first shutdown if multiple traces are being taken.

How to speed up boot process under Windows Vista or Windows 7
ATTENTION: This guide only works if you use an HDD (NOT an SSD!).

To get started you need the Windows Performance Tools Kit. Read here how to install it:

http://www.msfn.org/...howtopic=146919

If you are a Windows 7 User: Make sure that EnablePrefetcher and EnableSuperfetch registry settings are not disabled and that the Superfetch service (sysmain) is running and set to start automatically.

If you are a Windows Vista User: Make sure that EnablePrefetcher registry setting is not disabled and the ReadyBoost service is running and set to start automatically.

Now open a command prompt with admin rights ( http://windows.micro...or-access-token ) and run the following command:

xbootmgr -trace boot -prepSystem -verboseReadyBoot

Your PC will now be restarted 6 times. After the second reboot, the MS defragmentation program runs and places the files into an optimized layout so that Windows boots faster (for a description, read what ReadyBoot is). The last reboots train ReadyBoot. After the training is finished, you'll notice a huge improvement in startup.

Note! DON'T USE OTHER DEFRAGMENTATION PROGRAMS AFTER THE OPTIMIZATION. USE ONLY THE INCLUDED MS TOOL, BECAUSE EVERY TOOL PLACES THE FILES AT A DIFFERENT OFFSET ON YOUR HDD AND EVERY TOOL THINKS IT KNOWS BETTER!

Background:

With Windows XP, MS implemented a prefetcher which loads data into RAM while the CPU is busy starting services and drivers, so that the data is already loaded when it is needed in later stages of the boot process.

With Vista, MS improved this prefetcher and named it ReadyBoot:

Quote Windows Vista uses the same boot-time prefetching as Windows XP did if the system has less than 512MB of memory, but if the system has 700MB or more of RAM, it uses an in-RAM cache to optimize the boot process. The size of the cache depends on the total RAM available, but is large enough to create a reasonable cache and yet allow the system the memory it needs to boot smoothly. After every boot, the ReadyBoost service (the same service that implements the ReadyBoost feature just described) uses idle CPU time to calculate a boot-time caching plan for the next boot. It analyzes file trace information from the five previous boots and identifies which files were accessed and where they are located on disk. It stores the processed traces in %SystemRoot%\Prefetch\Readyboot as .fx files and saves the caching plan under HKLM\System\CurrentControlSet\Services\Ecache\Parameters in REG_BINARY values named for internal disk volumes they refer to. The cache is implemented by the same device driver that implements ReadyBoost caching (Ecache.sys), but the cache's population is guided by the ReadyBoost service as the system boots. While the boot cache is compressed like the ReadyBoost cache, another difference between ReadyBoost and ReadyBoot cache management is that while in ReadyBoot mode, other than the ReadyBoost service's updates, the cache doesn't change to reflect data that's read or written during the boot. The ReadyBoost service deletes the cache 90 seconds after the start of the boot, or if other memory demands warrant it, and records the cache's statistics in HKLM\System\CurrentControlSet\Services\Ecache\Parameters\ReadyBootStats, as shown in Figure 2. Microsoft performance tests show that ReadyBoot provides performance improvements of about 20 percent over the legacy Windows XP prefetcher.

Source: http://technet.microsoft.com/en-us/magazin...el.aspx?pr=blog

If you remember the XP days, there was a tool called BootVis. Its optimization is similar to this one, with the difference that it only starts the integrated MS defragmentation program for a better HDD layout, because XP doesn't have ReadyBoot.

To see the improvement in time, run these 2 commands:

xperf -i bootPrep_BASE+CSWITCH_1.etl -o 01_summary_start.xml -a boot

xperf -i boot_BASE+CSWITCH_1.etl -o 02_summary_end.xml -a boot

To determine the boot time, open the XML files and look at the value bootDoneViaPostBoot. This value minus 10000 (the 10 seconds of idle detection) shows you the time Windows needs to boot completely.

In the file 02_summary_end.xml it should be much lower.
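To put a number on the improvement, take the bootDoneViaPostBoot value from each of the two XML files and compare them after removing the 10 seconds of idle detection. A minimal sketch (the function name and the example values are made up, they are not from a real trace):

```python
IDLE_DETECTION_MS = 10_000  # idle window contained in bootDoneViaPostBoot


def boot_improvement(before_ms, after_ms, idle_ms=IDLE_DETECTION_MS):
    """Return (saved_ms, saved_percent) between two bootDoneViaPostBoot values.

    before_ms: value from 01_summary_start.xml
    after_ms:  value from 02_summary_end.xml
    """
    before = before_ms - idle_ms
    after = after_ms - idle_ms
    saved = before - after
    return saved, 100 * saved / before


if __name__ == "__main__":
    saved_ms, percent = boot_improvement(95000, 61000)
    print(f"saved {saved_ms / 1000:.1f} s ({percent:.0f} %)")
```

Subtracting the idle window on both sides does not change the saved milliseconds, but it does change the percentage, so it matters when you report the result as a ratio.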

How to get the cause of high CPU usage caused by applications

To get started you need the Windows Performance Tools Kit. Read here how to install it:

http://www.msfn.org/...howtopic=146919

Now open a command prompt with admin rights ( http://windows.micro...or-access-token ) and run the following command:

xperf -on latency -stackwalk profile

Now wait a while, until the high CPU usage from the application occurs.

To stop the trace, run the following command:

xperf -d latency.etl

This closes the trace and writes the result to the file latency.etl.

In the next step, double-click the etl file to open it in the viewer.

Now wait until the 2 passes are over.

Go to "Trace"->"Configure Symbol Paths" and type in the following:

srv*C:\symbols*http://msdl.microsoft.com/download/symbols

Click OK to close the dialog.

Now go to the graph "CPU sampling by CPU", select the interval, right-click and select "Load Symbols", then click "Clone Selection".

(Picture: CPUusage_xperfview.png)

Now go to the first graph "Stack Counts by Type", right-click and select "Summary Table".

Now you have to accept the license agreement to download the public debugging symbols.

(NOTE: THE PDBs ARE SOMETIMES VERY HUGE. BE AWARE THAT IT MAY TAKE SOME TIME IF YOU HAVE A SLOW INTERNET CONNECTION.)

Here you'll see a summary of the calls. Look which process has the most "counts". The important thing is the "Stack".

(Picture: xperfview_cpuusage_stack.png)

For me, the high CPU usage of explorer is caused by searching for installed apps to show them inside the Software Add/Remove dialog.

If you can't see thread names in the stack, the PDBs are missing. Look at the filename to find out which program it is, then contact its support and send them the etl file so that they can see what causes the high CPU usage.

// Edit 2010-03-22

To enable stack walking on an x64 Windows, you have to set a registry value. Start Regedit.exe and go to the following key:

HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management

Create a DWORD named DisablePagingExecutive (if it does not already exist), set its value to 1 and reboot to enable the setting.