Article ID: 2288515 - Last Review: July 19, 2010 - Revision: 1.0
Troubleshooting gray agent states in System Center Operations Manager 2007 and System Center Essentials
SUMMARY
This article describes how to troubleshoot an agent, management server, or gateway that is in a gray state in System Center Operations Manager 2007 or in System Center Essentials 2007 and 2010.
MORE INFORMATION
An agent, management server or gateway can have one of the following states:
An agent, management server, or gateway can turn gray for several reasons. Common reasons include:
· Heartbeat failure
· Invalid Configuration
· System workflows failure
· OpsMgr Database/DW Performance
· RMS or Primary MS or Gateway Performance
· Network/Authentication issues
· Health service is not running
Before you start troubleshooting a gray agent, understand the Operations Manager topology and define the scope of the issue. The following questions can help define the scope:
· How many agents are impacted?
· Are the agents having the problem in the same network segment?
· Do they report to the same Management Server?
· How often do the agents turn gray, and how long do they stay in that state?
· How do you recover from this situation? (For example, do you restart the Health Service on the agent or clear the agent cache, or does the agent recover automatically?)
· Are heartbeat failure alerts being generated for these agents?
· Does the issue happen at a certain time of day?
· Does the issue persist if you fail these agents over to another management server or gateway?
· When did this problem start?
· Were any changes made to the agents, management servers, gateways, or management group?
· Are the affected agents Windows clustered systems?
· Is the Health Service State folder excluded from antivirus scanning?
· Is this an OpsMgr 2007 SP1 or R2 environment?
Troubleshooting a gray state depends on which component is gray, where it sits in the topology, and how widespread the problem is. Consider the following scenarios:
· If the agents reporting to a particular gateway or management server are gray, start troubleshooting at that gateway or management server.
· If the gateways reporting to a particular management server are gray, start troubleshooting at that management server.
· For agentless systems, network devices, and UNIX/Linux servers, start troubleshooting at the agent, management server, or gateway that monitors these objects.
· If all systems are gray, start troubleshooting at the Root Management Server (RMS).
In other words, start troubleshooting one level above the component that is gray.
Common scenarios include the following:
Scenario 1:
Only a few agents are affected, and they report to different management servers. The agents stay gray all the time. Clearing the agent cache resolves the problem temporarily, but the problem returns after a few days.
Resolution:
In this case, take the following steps to resolve the issue:
· Apply the hotfix from KB 981263 to the affected systems.
· Exclude the agent cache (the Health Service State folder) from antivirus scanning.
· Stop the Health Service.
· Clear the agent cache.
· Start the Health Service.
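The last three steps can be scripted. The sketch below is a minimal illustration, not a Microsoft-provided tool. It assumes the agent was installed in the default OpsMgr 2007 path (the hypothetical constant `DEFAULT_STATE_DIR`; change it for your environment) and that the agent cache is the Health Service State folder. It only builds the command list, so the commands can be reviewed and then run from an elevated command prompt. It renames the folder instead of deleting it, so the old cache is kept; the Health Service is expected to create a new folder when it starts.

```python
from pathlib import PureWindowsPath

# Assumption: default OpsMgr 2007 agent install path. Change it if the agent
# (or management server) was installed elsewhere.
DEFAULT_STATE_DIR = PureWindowsPath(
    r"C:\Program Files\System Center Operations Manager 2007\Health Service State"
)


def cache_reset_commands(state_dir=DEFAULT_STATE_DIR):
    """Build (but do not run) the commands that clear the agent cache.

    The folder is renamed rather than deleted so the old cache can still be
    inspected or restored after the Health Service restarts.
    """
    state_dir = PureWindowsPath(state_dir)
    backup_name = state_dir.name + ".old"  # ren accepts only a new name, not a path
    return [
        "net stop HealthService",                 # 1. stop the Health Service
        f'ren "{state_dir}" "{backup_name}"',      # 2. move the cache aside
        "net start HealthService",                # 3. start the Health Service
    ]


if __name__ == "__main__":
    for command in cache_reset_commands():
        print(command)
```

`HealthService` is the service name of the OpsMgr Health Service; if a `Health Service State.old` folder is left over from an earlier reset, remove it first or the rename will fail.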
Note: It is best to proactively apply KB 981263 to all monitored systems, including the management servers, and to exclude the agent and management server cache from antivirus scanning, to prevent this problem on other systems.
For more details, refer to the following KB articles:
· Management servers or assigned agents unexpectedly appear as unavailable in the Operations Manager console in Windows Server 2003 or Windows Server 2008: http://support.microsoft.com/kb/981263
· Recommendations for antivirus exclusions that relate to MOM 2005 and to Operations Manager 2007: http://support.microsoft.com/kb/975931
Scenario 2:
Only a few agents are affected, and they report to different management servers. The agents stay gray all the time. Clearing the agent cache doesn't help.
Resolution:
1. Confirm that the Health Service is running on the agent. Then look for any of the following events in the Operations Manager event log on the agent:
Event ID: 1102
Event Source: HealthService
Event Description:
Rule/Monitor "%4" running for instance "%3" with id:"%2" cannot be initialized and will not be loaded. Management group "%1"
Event ID: 1103
Event Source: HealthService
Event Description:
Summary: %2 rule(s)/monitor(s) failed and got unloaded, %3 of them reached the failure limit that prevents automatic reload. Management group "%1". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).
Event ID: 1104
Event Source: HealthService
Event Description:
RunAs profile in workflow "%4", running for instance "%3" with id:"%2" cannot be resolved. Workflow will not be loaded. Management group "%1"
Event ID: 1105
Event Source: HealthService
Event Description:
Type mismatch for RunAs profile in workflow "%4", running for instance "%3" with id:"%2". Workflow will not be loaded. Management group "%1"
Event ID: 1106
Event Source: HealthService
Event Description:
Cannot access plain text RunAs profile in workflow "%4", running for instance "%3" with id:"%2". Workflow will not be loaded. Management group "%1"
Event ID: 1107
Event Source: HealthService
Event Description:
Account for RunAs profile in workflow "%4", running for instance "%3" with id:"%2" is not defined. Workflow will not be loaded. Please associate an account with the profile. Management group "%1"
Event ID: 1108
Event Source: HealthService
Event Description:
An Account specified in the Run As Profile "%7" cannot be resolved. Specifically, the account is used in the Secure Reference Override "%6". %n%n This condition may have occurred because the Account is not configured to be distributed to this computer. To resolve this problem, you need to open the Run As Profile specified below, locate the Account entry as specified by its SSID, and either choose to distribute the Account to this computer if appropriate, or change the setting in the Profile so that the target object does not use the specified Account. %n%nManagement Group: %1 %nRun As Profile: %7 %nSecureReferenceOverride name: %6 %nSecureReferenceOverride ID: %4 %nObject name: %3 %nObject ID: %2 %nAccount SSID: %5
Event ID: 4000
Event Source: HealthService
Event Description:
A monitoring host is unresponsive or has crashed. The status code for the host failure was %1.
Event ID: 21016
Event Source: OpsMgr Connector
Event Description:
OpsMgr was unable to set up a communications channel to %1 and there are no failover hosts. Communication will resume when %1 is available and communication from this computer is allowed.
Event ID: 21006
Event Source: OpsMgr Connector
Event Description:
The OpsMgr Connector could not connect to %1:%2. The error code is %3(%4). Please verify there is network connectivity, the server is running and has registered it's listening port, and there are no firewalls blocking traffic to the destination.
Event ID: 20070
Event Source: OpsMgr Connector
Event Description:
The OpsMgr Connector connected to %1, but the connection was closed immediately after authentication occurred. The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration. Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.
Event ID: 20051
Event Source: OpsMgr Connector
Event Description:
The specified certificate could not be loaded because the certificate is not currently valid. Verify that the system time is correct and re-issue the certificate if necessary%n Certificate Valid Start Time : %1%n Certificate Valid End Time : %2
Event Source: ESE
Event Category: Transaction Manager
Event ID: 623
Description: HealthService (<PID>) The version store for instance <instance> ("<name>") has reached its maximum size of <value>Mb. It is likely that a long-running transaction is preventing cleanup of the version store and causing it to build up in size. Updates will be rejected until the long-running transaction has been completely committed or rolled back. Possible long-running transaction:
SessionId: <value>
Session-context: <value>
Session-context ThreadId: <value>.
Cleanup:<value>
2. Event IDs 1102 and 1103 indicate that some workflows failed to load. If these are core system workflows, the agent can turn gray, so focus on resolving these events. Events 1104, 1105, 1106, 1107, and 1108 can each lead to events 1102 and 1103; they are generally caused by misconfigured Run As accounts. In R2, this commonly happens because the Run As accounts are associated with the wrong class or are not configured to be distributed to the agent.
3. Event ID 4000 indicates that the MonitoringHost.exe process crashed. If the cause is a DLL mismatch or missing registry keys, reinstalling the agent might resolve the problem. If that doesn't help, the following tools can be used:
a. A Process Monitor capture up to the point where the process crashes (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx)
b. An ADPlus dump in crash mode (http://support.microsoft.com/kb/286350)
c. If the agent monitors network devices and runs on Windows Server 2003, you may want to apply the following hotfix:
The monitoring of SNMP devices may stop intermittently in System Center Operations Manager or in System Center Essentials (KB 982501): http://support.microsoft.com/kb/982501
4. Event ID 21006 indicates communication issues between the agent and the management server. If the agent uses certificates for mutual authentication, verify that the certificate has not expired and that the agent is using the correct certificate. If Kerberos is used, verify that the agent can communicate with Active Directory. If authentication is working correctly, the packets from the agent might not be reaching the management server or gateway. A simple telnet test from the agent to port 5723 on the management server is a good start. A simultaneous network trace on the agent and the management server, captured while you reproduce the communication failures, helps determine whether the packets are reaching the management server and whether a device in between is optimizing the traffic or dropping packets. (See http://support.microsoft.com/kb/812953/en-us)
5. Another possibility is that the Health Service is hung. An ADPlus dump in hang mode helps determine the cause of the hang (http://support.microsoft.com/kb/286350).
6. Event ID 623 typically occurs in large Operations Manager environments in which a management server or an agent computer manages many workflows. Refer to the following KB for details and resolution:
One or more management servers and their managed devices are dimmed in the Operations Manager Console of Operations Manager 2007:
http://support.microsoft.com/kb/975057
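The telnet test in step 4 only checks whether a TCP connection to port 5723 can be opened. Where the telnet client is not installed, the same check can be sketched in Python as below. This is an illustration, not an OpsMgr tool, and the function name `can_reach` is hypothetical. A successful connection does not prove that authentication will succeed; it only rules out a closed or blocked port.

```python
import socket

OPSMGR_PORT = 5723  # port the agent uses to reach its management server or gateway


def can_reach(host, port=OPSMGR_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False
```

For example, running `can_reach("ms01.contoso.com")` on the affected agent (the host name is hypothetical) returns False when nothing answers on port 5723 within five seconds, which points toward a firewall, a stopped Health Service, or a name resolution problem rather than authentication.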
Scenario 3:
All the agents reporting to one particular management server/gateway are grayed out.
Resolution:
1. In this scenario, first understand what kinds of workloads the management server or gateway is monitoring, for example, the number of network devices, cross-platform agents, synthetic transactions, Windows agents, and agentless machines.
2. As in Scenario 2, verify that the Health Service is started and running on the management server or gateway.
3. Check whether the management server is in maintenance mode. If it is, remove it from maintenance mode.
4. Check for the same events as in Scenario 2. If you see Event ID 21006, the difference here is that the management server or gateway cannot communicate with its parent server. For a management server, the parent is the RMS; for a gateway, it can be the RMS or a management server. Verify the same things as in step 4 of Scenario 2.
5. In addition, if the Health Service monitors network devices and the management server runs on Windows Server 2003, you may want to apply the following hotfix:
The monitoring of SNMP devices may stop intermittently in System Center Operations Manager or in System Center Essentials (KB 982501): http://support.microsoft.com/kb/982501
6. Look for the following events in the Operations Manager event log. These events generally indicate performance issues on the management server or on the SQL Server that hosts the OperationsManager or OperationsManagerDW database:
Event ID: 2115
Event Source: HealthService
Event Description:
A Bind Data Source in Management Group %1 has posted items to the workflow, but has not received a response in %5 seconds. This indicates a performance or functional problem with the workflow.%n Workflow Id : %2%n Instance : %3%n Instance Id : %4%n
Event ID: 5300
Event Source: HealthService
Event Description:
Local health service is not healthy. Entity state change flow is stalled with pending acknowledgement. %n%nManagement Group: %2 %nManagement Group ID: %1
Event ID: 4506
Event Source: HealthService
Event Description:
Data was dropped due to too much outstanding data in rule "%2" running for instance "%3" with id:"%4" in management group "%1".
Event ID: 31551
Event Source: Health Service Modules
Event Description:
Failed to store data in the Data Warehouse. The operation will be retried.%rException '%5': %6 %n%nOne or more workflows were affected by this. %n%nWorkflow name: %2 %nInstance name: %3 %nInstance ID: %4 %nManagement group: %1
Event ID: 31552
Event Source: Health Service Modules
Event Description:
Failed to store data in the Data Warehouse.%rException '%5': %6 %n%nOne or more workflows were affected by this. %n%nWorkflow name: %2 %nInstance name: %3 %nInstance ID: %4 %nManagement group: %1
Event ID: 31553
Event Source: Health Service Modules
Event Description:
Data was written to the Data Warehouse staging area but processing failed on one of the subsequent operations.%rException '%5': %6 %n%nOne or more workflows were affected by this. %n%nWorkflow name: %2 %nInstance name: %3 %nInstance ID: %4 %nManagement group: %1
Event ID: 31557
Event Source: Health Service Modules
Event Description:
Failed to obtain synchronization process state information from Data Warehouse database. The operation will be retried.%rException '%5': %6 %n%nOne or more workflows were affected by this. %n%nWorkflow name: %2 %nInstance name: %3 %nInstance ID: %4 %nManagement group: %1
7. To troubleshoot management server/gateway performance and SQL performance, see the following sections later in this article:
Troubleshooting Management Server/Gateway Performance
Troubleshooting SQL Performance
8. Event IDs 3155X can also be logged because of an incorrect Run As account configuration or missing permissions for the Run As accounts. Review the following blog post; it includes an Excel spreadsheet that lists the permissions required by the various accounts that OpsMgr uses.
OpsMgr security account rights mapping - what accounts need what privileges?
http://blogs.technet.com/b/kevinholman/archive/2008/04/15/opsmgr-security-account-rights-mapping-what-accounts-need-what-privileges.aspx
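When many agents or servers are involved, it can help to tally which of the events listed in Scenarios 2 and 3 appear most often. The sketch below groups them by event source and event ID, because some of them (such as 623) come from a different source and matching on the ID alone would be ambiguous. The category names are shorthand for this example, and the input format, a list of (source, event ID) pairs already read from the Operations Manager event log, is an assumption.

```python
from collections import Counter

# (event source, event ID) -> what the event usually points to, per Scenarios 2 and 3
EVENT_CATEGORIES = {
    **{("HealthService", i): "workflow load failure" for i in (1102, 1103)},
    **{("HealthService", i): "Run As configuration" for i in (1104, 1105, 1106, 1107, 1108)},
    ("HealthService", 4000): "MonitoringHost crash",
    **{("OpsMgr Connector", i): "communication" for i in (20070, 21006, 21016)},
    ("OpsMgr Connector", 20051): "certificate",
    ("ESE", 623): "version store",
    **{("HealthService", i): "performance" for i in (2115, 4506, 5300)},
    **{("Health Service Modules", i): "data warehouse"
       for i in (31551, 31552, 31553, 31557)},
}


def triage(events):
    """Count (source, event_id) pairs by category; unrelated events are ignored."""
    return Counter(
        EVENT_CATEGORIES[event] for event in events if event in EVENT_CATEGORIES
    )
```

For example, `triage([("HealthService", 1102), ("OpsMgr Connector", 21006), ("OpsMgr Connector", 21006)])` counts two communication events and one workflow load failure, suggesting that step 4 of Scenario 2 is the place to start.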
Scenario 4: All the agents reporting to one particular management server intermittently flip-flop between the gray and healthy states.
Scenario 5: All the agents in the environment intermittently flip-flop between the gray and healthy states.
Resolution: Some common causes of a temporary gray state are:
· The agents' parent server temporarily went offline.
· Agents flooded the management server with operational data (alerts, state changes, discovery data, and so on), which can lead to higher resource usage on the OpsMgr database and OpsMgr servers.
· Network outages caused a temporary communication failure between the parent server and the agents.
· Management pack changes in the OpsMgr console require OpsMgr to regenerate configuration and redistribute management packs to the agents. If a change affects a large number of agents, it can lead to higher resource usage on the OpsMgr database and OpsMgr servers.
For these two scenarios, the key information is how long the gray state lasted and at what time of day it happened.
This helps narrow the scope of the problem quickly and determine which troubleshooting path to take.
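Both facts can be read from the times at which the agents changed state. The sketch below assumes you already have a chronologically sorted list of (timestamp, state) pairs for one agent; how you collect that list is outside this example, and the state names "gray" and "healthy" are placeholders. It reports the length and starting hour of each gray period, which makes patterns such as outages that recur at the same hour easier to spot. The history in the `__main__` block is made up for illustration.

```python
from datetime import datetime


def gray_periods(transitions):
    """Return (start, minutes_gray, start_hour) for each completed gray period.

    `transitions` is a chronologically sorted list of (datetime, state) pairs.
    A gray period that has not ended yet (no later non-gray state) is skipped.
    """
    periods = []
    gray_since = None
    for when, state in transitions:
        if state == "gray" and gray_since is None:
            gray_since = when                          # agent just turned gray
        elif state != "gray" and gray_since is not None:
            minutes = (when - gray_since).total_seconds() / 60
            periods.append((gray_since, minutes, gray_since.hour))
            gray_since = None                          # agent recovered
    return periods


if __name__ == "__main__":
    history = [  # made-up example data
        (datetime(2010, 7, 19, 0, 5), "gray"),
        (datetime(2010, 7, 19, 0, 35), "healthy"),
        (datetime(2010, 7, 19, 13, 0), "gray"),
        (datetime(2010, 7, 19, 13, 10), "healthy"),
    ]
    for start, minutes, hour in gray_periods(history):
        print(f"{start:%Y-%m-%d %H:%M}  gray for {minutes:.0f} min  (hour {hour})")
```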
To troubleshoot management server/gateway performance and SQL performance, see the following sections later in this article:
Troubleshooting Management Server/Gateway Performance
Troubleshooting SQL Performance
Troubleshooting Management Server/Gateway Performance:
Root Management Server (RMS):
Configuration update bursts are caused by MP imports and discovery data. When system performance is slow, the two most likely bottlenecks are the CPU first and the disk (the OpsMgr installation disk) second.
The RMS is responsible for generating and sending configuration files to all affected Health Services.
For workflow reloading (caused by new configuration on the RMS), the likely bottlenecks are again the CPU first and the disk (the OpsMgr installation disk) second. The RMS is responsible for reading the configuration file, loading and initializing all workflows on the RMS, and updating the RMS Health Service store when the configuration file is updated on the RMS.
For local workflow activity bursts (when agents change their availability), the main bottleneck is the CPU. If the CPU is not maxed out, the disk could be the bottleneck. The RMS monitors the availability of all agents by using RMS local workflows. The RMS also hosts distributed dependency monitors, which use the disk.
Management Server (MS):
During configuration update bursts (caused by MP imports and discovery), the typical bottlenecks are the CPU first and the disk (the OpsMgr installation disk) second. The MS is responsible for forwarding configuration files from the RMS to the target agents.
For operational data collection, the bottleneck is normally the CPU. The disk may also be maxed out, but this is less likely. The MS decompresses and decrypts incoming operational data and inserts it into the operational database. It also sends acknowledgements (ACKs) back to the agents and gateways after receiving operational data, and uses a disk queue to temporarily store these outgoing ACKs. Lastly, the MS forwards monitor state changes (again using a disk queue) to the RMS for distributed dependency monitors.
Gateway (GW):
The GW is both CPU-bound and I/O-bound. When it relays a large amount of data, both may show high usage. Most of the CPU cost comes from decompressing, compressing, encrypting, and decrypting the incoming data and transferring it. All data that the GW receives from the agents is stored in a persistent queue on disk, to be read and forwarded to the MS by the GW's Health Service. This can cause heavy disk usage, which is especially significant when the GW is temporarily taken offline and must then handle the data that the agents generated and tried to send while it was offline.
Information to collect for each affected Management server/Gateway:
· Exact Windows version, edition, and build (for example, Windows Server 2003 Enterprise x64 SP2)
· Number of processors
· Amount of RAM
· Drive letter of the drive that contains the Health Service State folder
· Is antivirus configured to exclude the Health Service State folder? For more details, review the following KB:
Recommendations for antivirus exclusions that relate to MOM 2005 and to Operations Manager 2007: http://support.microsoft.com/kb/975931
· RAID level (0, 1, 5, 0+1, or 1+0) of the drive used by the Health Service State folder, and the number of disks in the RAID set
· Is battery-backed write cache enabled on the array controller?
Troubleshooting SQL Performance:
Operational Database (OperationsManager):
For the operational database, the most common bottleneck is the disk array. If the disk is not maxed out, the CPU is the next most likely culprit. The database sometimes slows down during operational data "storms" (very high rates of events, alerts, performance data, and/or state changes over a relatively long period). A short burst usually won't cause a significant, prolonged delay.
During operational data insertion, the database disks are used primarily for writes. CPU usage is usually due to SQL Server churn, which can occur with large, complex queries, heavy data insertion, and grooming of large tables (by default, grooming runs at midnight). Grooming the event and performance data tables is usually not very costly, even when they are very large, but grooming the alert and state change tables can be quite CPU-intensive if they become very large.
The database is also CPU-bound when handling configuration redistribution bursts, which are caused by MP imports or large instance space changes. In these cases, the Config Service queries the database for the agents' new configuration, often causing CPU spikes on the database, before sending the configuration updates to the agents.
Data Warehouse (OperationsManagerDW):
The most likely bottleneck is the disk array, usually because of very large operational data insertions. In these cases, the disks are mostly busy with writes. There are generally few reads, except when handling manually generated Reporting views, which run queries against the DW.
CPU usage is usually due to SQL Server churn. Heavy partitioning activity (when tables become very large and are then partitioned), execution of complex reports, and large numbers of alerts in the operational database (which the DW must constantly synchronize) can all cause CPU spikes.
Information to collect for each affected SQL database server:
· Exact version, edition, and build of SQL Server (for example, SQL Server 2005 Enterprise x64 SP2 Build 3355)
· Exact Windows version, edition, and build (for example, Windows Server 2003 Enterprise x64 SP2)
· Number of processors
· Amount of RAM
· Amount of memory allocated to SQL
· If SQL is 32-bit, is AWE enabled?
You can get most of this information in SQL Server Management Studio or SQL Server Enterprise Manager by opening the server properties and clicking the General and Memory tabs. The General tab shows the SQL Server version, Windows version, platform, amount of RAM, and number of processors. The Memory tab shows the memory allocated to SQL Server and (in SQL Server 2005 and SQL Server 2008) the AWE option. To find out whether AWE is enabled in SQL Server 2000, run the following in SQL Query Analyzer:
sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'awe enabled'
The config_value and run_value will be 1 if AWE is enabled.
· If the OS is 32-bit and has 4 GB of RAM or more, are the /pae and/or /3gb switches in Boot.ini?
These options could be configured incorrectly if the server was originally installed with 4 GB of RAM or less and RAM was later upgraded.
For 32-bit servers with 4 GB of RAM, the /3gb switch in Boot.ini increases the amount of memory that SQL can address (from 2 GB to 3 GB).
For 32-bit servers with more than 4 GB of RAM, the /3gb switch in Boot.ini could actually limit the amount of memory that SQL can address. For these systems, add the /pae switch in Boot.ini and enable AWE in SQL.
· On a multi-processor system, what is Max Degree of Parallelism (MAXDOP) set to?
On SQL 2005 and SQL 2008, this option is on the Advanced tab in the properties of the server. To determine this setting on SQL 2000, run the following in SQL Query Analyzer:
sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'max degree of parallelism'
The default value is 0, which means that all available processors are used. A setting of 0 is fine for servers with eight or fewer processors. For servers with more than eight processors, the time SQL Server spends coordinating all of the processors may be counterproductive, so for these servers you should generally set Max Degree of Parallelism to 8:
sp_configure 'show advanced options', 1
GO
RECONFIGURE WITH OVERRIDE
GO
sp_configure 'max degree of parallelism', 8
GO
RECONFIGURE WITH OVERRIDE
GO
· Drive letters containing DW and/or Ops and Tempdb files
· Is antivirus configured to exclude SQL Server data and log files? Antivirus software cannot scan SQL Server database files effectively, and attempting to do so can degrade performance.
· Free space on drives containing DW and/or Ops and Tempdb files
· SAN vs. local storage
· RAID level (0, 1, 5, 0+1 or 1+0) for drives used by SQL
· If using SAN storage, the number of spindles on each LUN used by SQL
· On OpsMgr 2007 SP1, is the DW event grooming hotfix (969130) or the SP1 hotfix rollup (971541) applied?
· If the converted Exchange 2007 MP is being used or has ever been used, how many rows are in the LocalizedText table in the operational database and in the EventPublisher table in the DW database? To find out, run:
USE OperationsManager;
SELECT COUNT(*) FROM LocalizedText;
USE OperationsManagerDW;
SELECT COUNT(*) FROM EventPublisher;
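The Boot.ini and Max Degree of Parallelism guidance above reduces to two simple rules. The sketch below encodes them as written in this article, as a quick sanity check of the data collected from each SQL server; it is not an official sizing tool, and the function names are hypothetical. The memory rules apply only to 32-bit Windows, and whenever /pae is used, AWE must also be enabled in SQL Server.

```python
def boot_ini_switches(os_is_32bit, ram_gb):
    """Boot.ini memory switches suggested by the guidance above."""
    if not os_is_32bit:
        return []          # /3gb and /pae apply only to 32-bit Windows
    if ram_gb > 4:
        return ["/pae"]    # also enable AWE in SQL Server; /3gb could limit memory here
    if ram_gb == 4:
        return ["/3gb"]    # lets SQL Server address 3 GB instead of 2 GB
    return []              # the guidance above does not cover less than 4 GB


def recommended_maxdop(processors):
    """0 (use all processors) for 8 or fewer processors, otherwise 8."""
    return 0 if processors <= 8 else 8
```

Comparing these results with the values actually found in Boot.ini and in `sp_configure 'max degree of parallelism'` highlights servers whose settings predate a RAM or CPU upgrade.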
Counters to identify memory pressure:
· MSSQL$<instance>: Buffer Manager: Page Life Expectancy - How long pages stay in the buffer pool. A value below 300 seconds may indicate that the server could use more memory; it can also result from index fragmentation.
· MSSQL$<instance>: Buffer Manager: Lazy Writes/sec - The lazy writer frees space in the buffer pool by writing pages to disk. Generally, this value should not consistently exceed 20 writes per second; ideally, it is close to zero.
· Memory: Available MBytes - Values below 100 MB may indicate memory pressure. Memory pressure is clearly present when this value is below 10 MB.
· Process: Private Bytes: _Total - The amount of memory (physical memory and page file) used by all processes combined.
· Process: Working Set: _Total - The amount of physical memory used by all processes combined. If this value is significantly below Process: Private Bytes: _Total, processes are paging too heavily. A difference of more than 10% is probably significant.
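The memory thresholds above can be applied mechanically to values read from Performance Monitor. The sketch below is a minimal illustration with a hypothetical function name. It assumes one averaged value per counter over the capture period, because the guidance is about sustained values rather than single samples, and it returns the findings those rules would flag.

```python
def memory_pressure_findings(page_life_expectancy_s, lazy_writes_per_s,
                             available_mb, private_bytes_total, working_set_total):
    """Apply the memory-pressure thresholds above to averaged counter values."""
    findings = []
    if page_life_expectancy_s < 300:
        findings.append("Page life expectancy < 300 s: more memory may help, "
                        "or check index fragmentation")
    if lazy_writes_per_s > 20:
        findings.append("Lazy writes/sec consistently > 20")
    if available_mb < 10:
        findings.append("Available MBytes < 10 MB: memory pressure")
    elif available_mb < 100:
        findings.append("Available MBytes < 100 MB: possible memory pressure")
    if private_bytes_total > 0:
        # How far Working Set (_Total) falls below Private Bytes (_Total)
        gap = (private_bytes_total - working_set_total) / private_bytes_total
        if gap > 0.10:
            findings.append(f"Working set {gap:.0%} below private bytes: heavy paging")
    return findings
```

An empty list means none of the documented thresholds was crossed; it does not rule out memory pressure that only appears during short peaks.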
Counters to identify disk pressure: Capture these Physical Disk counters for all drives containing SQL data or log files:
· % Idle Time - How much idle time the disk reports. Anything below 50% could indicate a disk bottleneck.
· Avg. Disk Queue Length - This value should not exceed 2 times the number of spindles on a LUN. For example, if a LUN has 25 spindles, a value of 50 is acceptable; if a LUN has 10 spindles, a value of 25 is too high. You can use the following formulas, based on the RAID level and the number of disks in the RAID set:
RAID 0 - All of the disks in a RAID 0 set are doing work:
Average Disk Queue Length <= (number of disks in the array) * 2
RAID 1 - Only half of the disks are "doing work", so only half of them count toward the disk queue:
Average Disk Queue Length <= (number of disks in the array / 2) * 2
RAID 10 - Only half of the disks are "doing work", so only half of them count toward the disk queue:
Average Disk Queue Length <= (number of disks in the array / 2) * 2
RAID 5 - All of the disks in a RAID 5 set are doing work:
Average Disk Queue Length <= (number of disks in the array) * 2
· Avg. Disk sec/Transfer - The number of seconds it takes to complete one disk I/O.
· Avg. Disk sec/Read - The average time, in seconds, of a read of data from the disk.
· Avg. Disk sec/Write - The average time, in seconds, of a write of data to the disk.
The above three counters should consistently be around 0.020 (20 ms) or below and should never exceed 0.050 (50 ms). These are the thresholds documented in the SQL Server performance troubleshooting guide:
Less than 10 ms - very good
Between 10 and 20 ms - okay
Between 20 and 50 ms - slow, needs attention
Greater than 50 ms - serious I/O bottleneck
· Disk Bytes/sec - The number of bytes transferred to or from the disk per second.
· Disk Transfers/sec - The number of input and output operations per second (IOPS).
When % Idle Time is low (10% or less), meaning that the disk is fully utilized, the above two counters give a good indication of the drive's maximum throughput in bytes and in IOPS, respectively. The throughput of a SAN drive is highly variable, depending on the number of spindles, the speed of the drives, and the speed of the channel. The best approach is to ask the SAN vendor how many bytes per second and IOPS the drive should support. If % Idle Time is low and the values of these two counters do not meet the expected throughput of the drive, engage the SAN vendor to troubleshoot.
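The disk thresholds above can be combined into a small checker. The sketch below is illustrative and uses hypothetical function names. It covers only the RAID levels for which formulas are given above (0, 1, 10, and 5), and it takes latency in seconds, the unit Performance Monitor reports for the Avg. Disk sec counters.

```python
# Fraction of disks that count toward the disk queue, per the RAID formulas above
RAID_WORKING_FRACTION = {"0": 1.0, "1": 0.5, "10": 0.5, "5": 1.0}


def max_disk_queue_length(raid_level, disks):
    """Upper bound for Avg. Disk Queue Length: working disks * 2."""
    return disks * RAID_WORKING_FRACTION[raid_level] * 2


def latency_rating(seconds):
    """Rate Avg. Disk sec/Read, sec/Write or sec/Transfer with the thresholds above."""
    ms = seconds * 1000
    if ms < 10:
        return "very good"
    if ms <= 20:
        return "okay"
    if ms <= 50:
        return "slow, needs attention"
    return "serious I/O bottleneck"


def idle_time_note(idle_percent):
    """Interpret % Idle Time for a drive holding SQL data or log files."""
    if idle_percent <= 10:
        return ("fully utilized: compare Disk Bytes/sec and Disk Transfers/sec "
                "with the expected throughput")
    if idle_percent < 50:
        return "possible disk bottleneck"
    return "no bottleneck indicated"
```

For a RAID 1 pair, `max_disk_queue_length("1", 2)` gives 2.0, so a sustained queue length of 5 on that pair would exceed the formula even though it looks small in absolute terms.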
The following links are great resources for deeper insight into troubleshooting SQL Server performance:
Troubleshooting Performance Problems in SQL Server 2005: http://technet.microsoft.com/en-us/library/cc966540.aspx
Troubleshooting Performance Problems in SQL Server 2008:
http://msdn.microsoft.com/en-us/library/dd672789(SQL.100).aspx
OpsMgr 2007 Performance counters
The following sections describe the performance counters that can be used to monitor and troubleshoot OpsMgr performance.
Gateway Server Role:
Overall performance counters: These counters indicate the overall performance of the gateway:
· Processor(_Total)\% Processor Time
· Memory\% Committed Bytes In Use
· Network Interface(*)\Bytes Total/sec
· LogicalDisk(*)\% Idle Time
· LogicalDisk(*)\Avg. Disk Queue Length
OpsMgr process generic performance counters: These counters indicate the overall performance of the OpsMgr processes on the gateway:
· Process(HealthService)\% Processor Time
· Process(HealthService)\Private Bytes (This value depends on how many agents the gateway manages; it can be several hundred megabytes.)
· Process(HealthService)\Thread Count
· Process(HealthService)\Virtual Bytes
· Process(HealthService)\Working Set
· Process(MonitoringHost*)\% Processor Time
· Process(MonitoringHost*)\Private Bytes
· Process(MonitoringHost*)\Thread Count
· Process(MonitoringHost*)\Virtual Bytes
· Process(MonitoringHost*)\Working Set
OpsMgr specific performance counters:
These OpsMgr-specific counters indicate the performance of different aspects of OpsMgr on the Gateway:
· Health Service\Workflow Count
· Health Service Management Groups(*)\Active File Uploads (The number of file transfers the Gateway is handling, for example downloads of MP files to agents. If this value stays high for a long time without dropping while little MP importing is taking place, there may be a file transfer problem.)
· Health Service Management Groups(*)\Send Queue % Used (The size of the persistent queue. If it stays above 10 for a long time and does not recover, the queue is backed up because the OpsMgr system is overloaded, for example because the Management Server or database is too busy or offline.)
· OpsMgr Connector\Bytes Received (The number of network bytes received by the Gateway, that is, the size of the incoming data before decompression.)
· OpsMgr Connector\Bytes Transmitted (The number of network bytes sent by the Gateway, that is, the size of the outgoing data after compression.)
· OpsMgr Connector\Data Bytes Received (The number of data bytes received by the Gateway, that is, the size of the incoming data after decompression.)
· OpsMgr Connector\Data Bytes Transmitted (The number of data bytes sent by the Gateway, that is, the size of the outgoing data before compression.)
· OpsMgr Connector\Open Connections (The number of connections open on the Gateway. It should equal the number of agents and Management Servers directly connected to it.)
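The two rules of thumb in the Gateway counter descriptions (Send Queue % Used staying above 10, and Open Connections matching the number of directly connected computers) can be checked mechanically once samples are collected. The following is a minimal Python sketch, assuming you already have Send Queue % Used samples as a list of numbers (oldest first) and know from your topology how many agents and Management Servers should be connected. The function names and the `min_consecutive` window are hypothetical choices for illustration, not part of OpsMgr.

```python
from typing import Sequence

# From the counter description: Send Queue % Used above 10 that does not
# recover suggests the persistent queue is backing up.
SEND_QUEUE_THRESHOLD = 10.0


def send_queue_backed_up(samples: Sequence[float], min_consecutive: int = 10) -> bool:
    """True if the most recent `min_consecutive` samples all exceed the threshold.

    `min_consecutive` is a hypothetical stand-in for "a long time": pick it
    from your sample interval (for example 10 samples at 60 s = 10 minutes).
    """
    if len(samples) < min_consecutive:
        return False  # not enough history to call the condition sustained
    recent = samples[-min_consecutive:]
    return all(value > SEND_QUEUE_THRESHOLD for value in recent)


def missing_connections(open_connections: int, expected_children: int) -> int:
    """How many directly connected agents/Management Servers are not connected.

    `expected_children` is the count you expect from your own topology.
    """
    return max(expected_children - open_connections, 0)
```

A single sample that drops back below 10 resets the verdict, which reflects the "does not recover" wording rather than reacting to a momentary spike.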
Management Server Role:
Overall performance counters:
These counters indicate the overall performance of the Management Server:
· Processor(_Total)\% Processor Time
· Memory\% Committed Bytes In Use
· Network Interface(*)\Bytes Total/sec
· LogicalDisk(*)\% Idle Time
· LogicalDisk(*)\Avg. Disk Queue Length
OpsMgr process generic performance counters:
These counters indicate the overall performance of the OpsMgr processes on the Management Server:
· Process(HealthService)\% Processor Time
· Process(HealthService)\Private Bytes (This value depends on how many agents the Management Server manages; it can be several hundred megabytes.)
· Process(HealthService)\Thread Count
· Process(HealthService)\Virtual Bytes
· Process(HealthService)\Working Set
· Process(MonitoringHost*)\% Processor Time
· Process(MonitoringHost*)\Private Bytes
· Process(MonitoringHost*)\Thread Count
· Process(MonitoringHost*)\Virtual Bytes
· Process(MonitoringHost*)\Working Set
OpsMgr specific performance counters:
These OpsMgr-specific counters indicate the performance of different aspects of OpsMgr on the Management Server:
· Health Service\Workflow Count (The number of workflows running on the Management Server.)
· Health Service Management Groups(*)\Active File Uploads (The number of file transfers the Management Server is handling, for example downloads of MP files to agents. If this value stays high for a long time without dropping while little MP importing is taking place, there may be a file transfer problem.)
· Health Service Management Groups(*)\Send Queue % Used (The size of the persistent queue. If it stays above 10 for a long time and does not recover, the queue is backed up because the OpsMgr system is overloaded, for example because the Root Management Server is too busy or offline.)
· Health Service Management Groups(*)\Bind Data Source Item Drop Rate (The number of data items dropped by the Management Server for DB/DW data collection write actions. A non-zero value means that the Management Server or database is overloaded and cannot handle incoming data items fast enough, or that a data item burst is occurring. Agents resend the dropped data items, and they are inserted into the DB/DW once the overload or burst is over.)
· Health Service Management Groups(*)\Bind Data Source Item Incoming Rate (The number of data items received by the Management Server for DB/DW data collection write actions.)
· Health Service Management Groups(*)\Bind Data Source Item Post Rate (The number of data items the Management Server wrote to the DB/DW for DB/DW data collection write actions.)
· OpsMgr Connector\Bytes Received (The number of network bytes received by the Management Server, that is, the size of the incoming data before decompression.)
· OpsMgr Connector\Bytes Transmitted (The number of network bytes sent by the Management Server, that is, the size of the outgoing data after compression.)
· OpsMgr Connector\Data Bytes Received (The number of data bytes received by the Management Server, that is, the size of the incoming data after decompression.)
· OpsMgr Connector\Data Bytes Transmitted (The number of data bytes sent by the Management Server, that is, the size of the outgoing data before compression.)
· OpsMgr Connector\Open Connections (The number of connections open on the Management Server. It should equal the number of agents (and the Root Management Server) directly connected to it.)
· OpsMgr DB Write Action Modules(*)\Avg. Batch Size (The average number of data items per batch received by the DB write action modules. A value of 5000 means a data item burst is occurring.)
· OpsMgr DB Write Action Modules(*)\Avg. Processing Time (The number of seconds the DB write action modules take to insert a batch into the DB. If this value is often larger than 60, there is a DB insertion performance issue.)
· OpsMgr DW Writer Module(*)\Avg. Batch Processing Time, ms (The number of milliseconds the DW write action takes to insert a batch of data items into the DW.)
· OpsMgr DW Writer Module(*)\Avg. Batch Size (The average number of data items per batch received by the DW write action modules.)
· OpsMgr DW Writer Module(*)\Batches/sec (The number of batches received by the DW write action modules per second.)
· OpsMgr DW Writer Module(*)\Data Items/sec (The number of data items received by the DW write action modules per second.)
· OpsMgr DW Writer Module(*)\Dropped Data Item Count (The number of data items dropped by the DW write action modules.)
· OpsMgr DW Writer Module(*)\Total Error Count (The number of errors that occurred in the DW write action modules.)
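The thresholds given for the Management Server's DB write action and Bind Data Source counters can be expressed as a small rule check. This is a sketch only: the `DbWriteSample` type and its field names are hypothetical, and because the text says the processing time is "often" above 60 seconds, a real check should evaluate several samples rather than a single one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DbWriteSample:
    avg_batch_size: float          # OpsMgr DB Write Action Modules(*)\Avg. Batch Size
    avg_processing_time_s: float   # OpsMgr DB Write Action Modules(*)\Avg. Processing Time
    bind_drop_rate: float          # Health Service Management Groups(*)\Bind Data Source Item Drop Rate


def db_write_findings(sample: DbWriteSample) -> List[str]:
    """Apply the rules of thumb from the counter descriptions to one sample."""
    findings = []
    if sample.avg_batch_size >= 5000:  # text: a batch size of 5000 means a burst
        findings.append("data item burst")
    if sample.avg_processing_time_s > 60:  # text: often above 60 s means a DB insertion issue
        findings.append("DB insertion performance issue")
    if sample.bind_drop_rate > 0:  # text: non-zero means MS/DB overloaded or a burst
        findings.append("MS/DB overloaded or burst")
    return findings
```

For example, `db_write_findings(DbWriteSample(5000, 10, 0))` reports only the burst, while a sample with a 75-second processing time and no drops points at the database insertion path instead.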
Root Management Server Role:
Overall performance counters:
These counters indicate the overall performance of the Root Management Server:
· Processor(_Total)\% Processor Time
· Memory\% Committed Bytes In Use
· Network Interface(*)\Bytes Total/sec
· LogicalDisk(*)\% Idle Time
· LogicalDisk(*)\Avg. Disk Queue Length
OpsMgr process generic performance counters:
These counters indicate the overall performance of the OpsMgr processes on the Root Management Server:
· Process(HealthService)\% Processor Time
· Process(HealthService)\Private Bytes (This value depends on how many agents the Root Management Server manages; it can be several hundred megabytes.)
· Process(HealthService)\Thread Count
· Process(HealthService)\Virtual Bytes
· Process(HealthService)\Working Set
· Process(MonitoringHost*)\% Processor Time
· Process(MonitoringHost*)\Private Bytes
· Process(MonitoringHost*)\Thread Count
· Process(MonitoringHost*)\Virtual Bytes
· Process(MonitoringHost*)\Working Set
· Process(Microsoft.Mom.ConfigServiceHost)\% Processor Time
· Process(Microsoft.Mom.ConfigServiceHost)\Private Bytes
· Process(Microsoft.Mom.ConfigServiceHost)\Thread Count
· Process(Microsoft.Mom.ConfigServiceHost)\Virtual Bytes
· Process(Microsoft.Mom.ConfigServiceHost)\Working Set
· Process(Microsoft.Mom.Sdk.ServiceHost)\% Processor Time
· Process(Microsoft.Mom.Sdk.ServiceHost)\Private Bytes
· Process(Microsoft.Mom.Sdk.ServiceHost)\Thread Count
· Process(Microsoft.Mom.Sdk.ServiceHost)\Virtual Bytes
· Process(Microsoft.Mom.Sdk.ServiceHost)\Working Set
OpsMgr specific performance counters:
These OpsMgr-specific counters indicate the performance of different aspects of OpsMgr on the Root Management Server:
· Health Service\Workflow Count (The number of workflows running on the Root Management Server.)
· Health Service Management Groups(*)\Active File Uploads (The number of file transfers the Root Management Server is handling, for example configuration downloads and downloads of MP files to agents. If this value stays high for a long time without dropping while little discovery or MP importing is taking place, there may be a file transfer problem.)
· Health Service Management Groups(*)\Send Queue % Used (The size of the persistent queue.)
· Health Service Management Groups(*)\Bind Data Source Item Drop Rate (The number of data items dropped by the Root Management Server for DB/DW data collection write actions. A non-zero value means that the Root Management Server or database is overloaded and cannot handle incoming data items fast enough, or that a data item burst is occurring. Agents resend the dropped data items, and they are inserted into the DB/DW once the overload or burst is over.)
· Health Service Management Groups(*)\Bind Data Source Item Incoming Rate (The number of data items received by the Root Management Server for DB/DW data collection write actions.)
· Health Service Management Groups(*)\Bind Data Source Item Post Rate (The number of data items the Root Management Server wrote to the DB/DW for DB/DW data collection write actions.)
· OpsMgr Connector\Bytes Received (The number of network bytes received by the Root Management Server, that is, the size of the incoming data before decompression.)
· OpsMgr Connector\Bytes Transmitted (The number of network bytes sent by the Root Management Server, that is, the size of the outgoing data after compression.)
· OpsMgr Connector\Data Bytes Received (The number of data bytes received by the Root Management Server, that is, the size of the incoming data after decompression.)
· OpsMgr Connector\Data Bytes Transmitted (The number of data bytes sent by the Root Management Server, that is, the size of the outgoing data before compression.)
· OpsMgr Connector\Open Connections (The number of connections open on the Root Management Server. It should equal the number of agents and Management Servers directly connected to it.)
· OpsMgr Config Service\Number Of Active Requests (The number of configuration/MP requests being processed by the Config Service.)
· OpsMgr Config Service\Number Of Queued Requests (The number of configuration/MP requests queued for the Config Service. If this value stays high for a long time, the instance space or MP space is changing too frequently.)
· OpsMgr SDK Service\Client Connections (The number of SDK connections.)
· OpsMgr DB Write Action Modules(*)\Avg. Batch Size (The average number of data items per batch received by the DB write action modules. A value of 5000 means a data item burst is occurring.)
· OpsMgr DB Write Action Modules(*)\Avg. Processing Time (The number of seconds the DB write action modules take to insert a batch into the DB. If this value is often larger than 60, there is a DB insertion performance issue.)
· OpsMgr DW Writer Module(*)\Avg. Batch Processing Time, ms (The number of milliseconds the DW write action takes to insert a batch of data items into the DW.)
· OpsMgr DW Writer Module(*)\Avg. Batch Size (The average number of data items per batch received by the DW write action modules.)
· OpsMgr DW Writer Module(*)\Batches/sec (The number of batches received by the DW write action modules per second.)
· OpsMgr DW Writer Module(*)\Data Items/sec (The number of data items received by the DW write action modules per second.)
· OpsMgr DW Writer Module(*)\Dropped Data Item Count (The number of data items dropped by the DW write action modules.)
· OpsMgr DW Writer Module(*)\Total Error Count (The number of errors that occurred in the DW write action modules.)
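One way to gather these counters is the built-in Windows `typeperf` tool, for example `typeperf -cf counters.txt -si 15 -o rms01.csv`, where counters.txt lists one counter path from the sections above per line (the file names and the 15-second interval are placeholders). The Python sketch below summarizes the resulting CSV into an average and a maximum per counter. It assumes the usual typeperf layout: a timestamp in the first column, one column per counter, and blank cells for missed samples.

```python
import csv
import io
from statistics import mean
from typing import Dict, Tuple


def summarize_typeperf_csv(text: str) -> Dict[str, Tuple[float, float]]:
    """Return {counter path: (average, maximum)} for typeperf-style CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    summary: Dict[str, Tuple[float, float]] = {}
    for col, path in enumerate(header[1:], start=1):  # column 0 holds timestamps
        values = [
            float(row[col])
            for row in data
            if col < len(row) and row[col].strip()  # skip missed samples
        ]
        if values:
            summary[path] = (mean(values), max(values))
    return summary
```

Reading rms01.csv into a string and sorting the result by maximum is a quick way to spot which counters spiked during the period when agents went gray.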
| State | Description |
|---|---|
| Healthy (green check mark) | The agent/management server is running normally. |
| Critical (red check mark) | There is a problem on the agent/management server. |
| Unknown (gray agent: the check mark and agent name are grayed out) | The Health Service Watcher on the root management server (RMS) that watches the Health Service on the monitored computer is no longer receiving heartbeats from the agent. It was receiving them previously (and the agent was reported as healthy), but now it is not. This also means that the management servers are no longer receiving any information from the agent. The computer running the agent might be down, or there might be connectivity issues. You can find more information in the Health Service Watcher view. |
| Unknown (green circle with no check mark) | The status of the discovered item is unknown. No monitor is available for this specific discovered item. |
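The Unknown (gray) row describes a simple rule: heartbeats used to arrive and have stopped. The sketch below models only that rule. The 60-second heartbeat interval and the allowance of 3 missed heartbeats are assumed defaults, not values stated in this article; check the heartbeat settings of your management group before relying on them.

```python
from typing import Optional

# Assumed defaults (not stated in this article): the agent sends a heartbeat
# every 60 seconds and 3 missed heartbeats are tolerated before failure.
HEARTBEAT_INTERVAL_S = 60
MISSED_HEARTBEATS_ALLOWED = 3


def watcher_view(seconds_since_last_heartbeat: Optional[float]) -> str:
    """Simplified model of the Unknown (gray) row of the table.

    None means no heartbeat was ever received, which is not the gray case:
    gray requires that heartbeats arrived before and then stopped.
    """
    if seconds_since_last_heartbeat is None:
        return "never reported"
    if seconds_since_last_heartbeat > HEARTBEAT_INTERVAL_S * MISSED_HEARTBEATS_ALLOWED:
        return "gray"
    return "reporting"
```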
How to troubleshoot the gray state depends on which component is grayed out, where it falls in the topology and how widespread the problem is. Consider the following scenarios:
· If the agents reporting to a particular gateway/management server are grayed out, start troubleshooting at that gateway/management server.
· If the gateways reporting to a particular management server are grayed out, start troubleshooting at that management server.
· For agentless systems, network devices and Unix/Linux servers, start troubleshooting at the agent/management server/gateway that monitors these objects.
· If all the systems are grayed out, start troubleshooting at the Root Management Server.
In other words, start troubleshooting one level above the component that is grayed out.
Common scenarios include:
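The "one level above" rule can be written down as a small function over the topology. In this sketch `parent_of` is a hypothetical mapping from each agent, gateway or management server to the server it reports to (the RMS itself has no entry); the function returns the servers where troubleshooting should start.

```python
from typing import Dict, Set


def start_point(parent_of: Dict[str, str], gray: Set[str], rms: str) -> Set[str]:
    """Return the servers one level above the grayed-out components."""
    monitored = set(parent_of)
    if monitored and gray >= monitored:
        return {rms}  # everything is gray: start at the Root Management Server
    # Otherwise start at the parent of each gray component.
    return {parent_of[name] for name in gray if name in parent_of}
```

With a topology where agents A1 and A2 report to MS1, and A3 and GW1 report to MS2, graying out A1 and A2 points at MS1, while graying out every component points at the RMS.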
Scenario 1:
Only a few agents are affected, and they report to different management servers. The agents stay in the gray state all the time. Clearing the agent cache resolves the problem temporarily, but it comes back after a few days.
Resolution:
In this case the following steps could be taken to resolve the issue:
· Apply the hotfix in KB 981263 to the affected systems.
· Exclude the agent cache from antivirus scanning.
· Stop the Health Service.
· Clear the agent cache.
· Start the Health Service.
Note: It is best to proactively apply KB 981263 to all monitored systems, including the management servers, and to exclude the agent/management server cache from antivirus scanning to prevent this problem on other systems.
For further details, refer to the following KB articles:
· Management servers or assigned agents unexpectedly appear as unavailable in the Operations Manager console in Windows Server 2003 or Windows Server 2008: http://support.microsoft.com/kb/981263
· Recommendations for antivirus exclusions that relate to MOM 2005 and to Operations Manager 2007: http://support.microsoft.com/kb/975931
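The stop, clear and start sequence from Scenario 1 can be scripted. This is a sketch under assumptions: it relies on the Health Service's short service name `HealthService` and on the `net stop`/`net start` commands, and the cache folder is passed in rather than discovered. On a default OpsMgr 2007 installation it is typically `C:\Program Files\System Center Operations Manager 2007\Health Service State`, but verify the path on your system. The `__main__` block only performs a dry run against a temporary folder.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

SERVICE = "HealthService"  # short service name of the OpsMgr Health Service


def clear_state_folder(state_dir: Path) -> int:
    """Delete everything inside the Health Service State folder.

    Returns the number of top-level items removed. Stop the service first,
    otherwise files such as the ESE store are locked.
    """
    removed = 0
    for child in list(state_dir.iterdir()):
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
        removed += 1
    return removed


def reset_agent_cache(state_dir: Path, run=subprocess.run) -> int:
    """Stop the Health Service, clear its cache, then start it again."""
    run(["net", "stop", SERVICE], check=True)
    removed = clear_state_folder(state_dir)
    run(["net", "start", SERVICE], check=True)
    return removed


def print_only(cmd, check=True):
    """Stand-in for subprocess.run that only prints the command."""
    print("would run:", " ".join(cmd))


if __name__ == "__main__":
    # Dry run against a throwaway folder; no service is touched.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "Health Service Store").mkdir()
        (folder / "Health Service Store" / "HealthServiceStore.edb").write_text("x")
        print("removed", reset_agent_cache(folder, run=print_only), "item(s)")
```

Passing the command runner as a parameter keeps the real `subprocess.run` as the default while letting the sequence be exercised safely.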
Scenario 2:
Only a few agents are affected, and they report to different management servers. The agents stay in the gray state all the time. Clearing the agent cache does not help.
Resolution:
1. After confirming that the Health Service is running on the agent, look for any of the following events in the Operations Manager event log on the agent:
Event ID: 1102
Event Source: HealthService
Event Description:
Rule/Monitor "%4" running for instance "%3" with id:"%2" cannot be initialized and will not be loaded. Management group "%1"
Event ID: 1103
Event Source: HealthService
Event Description:
Summary: %2 rule(s)/monitor(s) failed and got unloaded, %3 of them reached the failure limit that prevents automatic reload. Management group "%1". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).
Event ID: 1104
Event Source: HealthService
Event Description:
RunAs profile in workflow "%4", running for instance "%3" with id:"%2" cannot be resolved. Workflow will not be loaded. Management group "%1"
Event ID: 1105
Event Source: HealthService
Event Description:
Type mismatch for RunAs profile in workflow "%4", running for instance "%3" with id:"%2". Workflow will not be loaded. Management group "%1"
Event ID: 1106
Event Source: HealthService
Event Description:
Cannot access plain text RunAs profile in workflow "%4", running for instance "%3" with id:"%2". Workflow will not be loaded. Management group "%1"
Event ID: 1107
Event Source: HealthService
Event Description:
Account for RunAs profile in workflow "%4", running for instance "%3" with id:"%2" is not defined. Workflow will not be loaded. Please associate an account with the profile. Management group "%1"
Event ID: 1108
Event Source: HealthService
Event Description:
An Account specified in the Run As Profile "%7" cannot be resolved. Specifically, the account is used in the Secure Reference Override "%6". %n%n This condition may have occurred because the Account is not configured to be distributed to this computer. To resolve this problem, you need to open the Run As Profile specified below, locate the Account entry as specified by its SSID, and either choose to distribute the Account to this computer if appropriate, or change the setting in the Profile so that the target object does not use the specified Account. %n%nManagement Group: %1 %nRun As Profile: %7 %nSecureReferenceOverride name: %6 %nSecureReferenceOverride ID: %4 %nObject name: %3 %nObject ID: %2 %nAccount SSID: %5
Event ID: 4000
Event Source: HealthService
Event Description:
A monitoring host is unresponsive or has crashed. The status code for the host failure was %1.
Event ID: 21016
Event Source: OpsMgr Connector
Event Description:
OpsMgr was unable to set up a communications channel to %1 and there are no failover hosts. Communication will resume when %1 is available and communication from this computer is allowed.
Event ID: 21006
Event Source: OpsMgr Connector
Event Description:
The OpsMgr Connector could not connect to %1:%2. The error code is %3(%4). Please verify there is network connectivity, the server is running and has registered it's listening port, and there are no firewalls blocking traffic to the destination.
Event ID: 20070
Event Source: OpsMgr Connector
Event Description:
The OpsMgr Connector connected to %1, but the connection was closed immediately after authentication occurred. The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration. Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.
Event ID: 20051
Event Source: OpsMgr Connector
Event Description:
The specified certificate could not be loaded because the certificate is not currently valid. Verify that the system time is correct and re-issue the certificate if necessary%n Certificate Valid Start Time : %1%n Certificate Valid End Time : %2
Event Source: ESE
Event Category: Transaction Manager
Event ID: 623
Description: HealthService (<PID>) The version store for instance <instance> ("<name>") has reached its maximum size of <value>Mb. It is likely that a long-running transaction is preventing cleanup of the version store and causing it to build up in size. Updates will be rejected until the long-running transaction has been completely committed or rolled back. Possible long-running transaction:
SessionId: <value>
Session-context: <value>
Session-context ThreadId: <value>.
Cleanup:<value>
2. Event IDs 1102 and 1103 indicate that some workflows failed to load. If these are core system workflows, the agent can turn gray, so the focus should be on resolving these events. Events 1104, 1105, 1106, 1107 and 1108 can each lead to events 1102 and 1103; they are generally caused by misconfigured Run As accounts. In R2, the most common reason is that the Run As accounts are either associated with the wrong class or not configured to be distributed to the agent.
3. Event ID 4000 indicates that the MonitoringHost.exe process crashed. If the cause is a DLL mismatch or missing registry keys, a quick reinstall of the agent might resolve the problem. If that does not help, the following tools can be used:
a. A Process Monitor capture up to the point where the process crashes (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx)
b. An ADPlus dump in crash mode (http://support.microsoft.com/kb/286350)
c. If the agent is monitoring network devices and is running on Windows Server 2003, you may want to apply the following hotfix:
The monitoring of SNMP devices may stop intermittently in System Center Operations Manager or in System Center Essentials (KB 982501): http://support.microsoft.com/kb/982501
4. Event ID 21006 indicates communication issues between the agent and the management server. If the agent uses a certificate for mutual authentication, verify that the certificate has not expired and that the agent is using the correct certificate. If Kerberos is used, verify that the agent can communicate with Active Directory. If authentication is working correctly, the packets from the agent may not be reaching the management server/gateway. A simple telnet to port 5723 from the agent to the management server is a good start. A simultaneous network trace on the agent and the management server while reproducing the communication failures helps determine whether the packets are reaching the MS and whether a device in between is trying to optimize the traffic or is dropping packets. (See http://support.microsoft.com/kb/812953/en-us)
5. Another possibility is that the Health Service is hung. An ADPlus dump in hang mode helps determine the cause of the hang. (http://support.microsoft.com/kb/286350)
6. Event ID 623 typically occurs in large Operations Manager environments in which a management server or an agent computer runs many workflows. Refer to the following KB article for details and resolution:
One or more management servers and their managed devices are dimmed in the Operations Manager Console of Operations Manager 2007:
http://support.microsoft.com/kb/975057
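The telnet test from step 4 can also be done with a plain TCP connect, which is handy when the Telnet client is not installed. This is a minimal sketch; the host name in the usage comment is a placeholder, and a successful connection only shows that port 5723 is reachable, not that authentication will succeed.

```python
import socket
import sys


def can_connect(host: str, port: int = 5723, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Similar in spirit to `telnet <server> 5723`: success proves the TCP path
    is open, not that OpsMgr authentication will succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False


if __name__ == "__main__":
    # Usage: python check_5723.py ms01.contoso.com   (placeholder host name)
    server = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print(server, "reachable on 5723:", can_connect(server))
```

Run it from the agent against its management server; if it fails while the Health Service on the server is running, the network trace described in step 4 is the next step.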
Scenario 3:
All the agents reporting to one particular management server/gateway are grayed out.
Resolution:
1. In this scenario, first understand what kind of workloads the management server/gateway is monitoring, for example the number of network devices, cross-platform agents, synthetic transactions, Windows agents and agentless computers.
2. As in Scenario 2, ensure that the Health Service is running on the management server/gateway.
3. Check whether the management server has been put in maintenance mode. If it has, remove it from maintenance mode.
4. Check for the same events as in Scenario 2. If you see Event ID 21006, the difference here is that the management server/gateway is unable to communicate with its parent server: for a management server, the parent is the RMS; for a gateway, it can be the RMS or a management server. Verify the same things as in step 4 of Scenario 2.
5. In addition, if the Health Service is monitoring network devices and the management server is running on Windows Server 2003, you may want to apply the following hotfix:
The monitoring of SNMP devices may stop intermittently in System Center Operations Manager or in System Center Essentials (KB 982501): http://support.microsoft.com/kb/982501
6. Look for the following events in the Operations Manager Event log. The following events generally indicate there are performance issues on the Management Server or SQL Server hosting OperationsManager or OperationsManagerDW:
Event ID: 2115
Event Source: HealthService
Event Description:
A Bind Data Source in Management Group %1 has posted items to the workflow, but has not received a response in %5 seconds. This indicates a performance or functional problem with the workflow.%n Workflow Id : %2%n Instance : %3%n Instance Id : %4%n
Event ID: 5300
Event Source: HealthService
Event Description:
Local health service is not healthy. Entity state change flow is stalled with pending acknowledgement. %n%nManagement Group: %2 %nManagement Group ID: %1
Event ID: 4506
Event Source: HealthService
Event Description: Operations Manager
Data was dropped due to too much outstanding data in rule "%2" running for instance "%3" with id:"%4" in management group "%1".
Event ID: 31551
Event Source: Health Service Modules
Event Description:
Failed to store data in the Data Warehouse. The operation will be retried.%rException '%5': %6 %n%nOne or more workflows were affected by this. %n%nWorkflow name: %2 %nInstance name: %3 %nInstance ID: %4 %nManagement group: %1
Event ID: 31552
Event Source: Health Service Modules
Event Description:
Failed to store data in the Data Warehouse.%rException '%5': %6 %n%nOne or more workflows were affected by this. %n%nWorkflow name: %2 %nInstance name: %3 %nInstance ID: %4 %nManagement group: %1
Event ID: 31553
Event Source: Health Service Modules
Event Description:
Data was written to the Data Warehouse staging area but processing failed on one of the subsequent operations.%rException '%5': %6 %n%nOne or more workflows were affected by this. %n%nWorkflow name: %2 %nInstance name: %3 %nInstance ID: %4 %nManagement group: %1
Event ID: 31557
Event Source: Health Service Modules
Event Description:
Failed to obtain synchronization process state information from Data Warehouse database. The operation will be retried.%rException '%5': %6 %n%nOne or more workflows were affected by this. %n%nWorkflow name: %2 %nInstance name: %3 %nInstance ID: %4 %nManagement group: %1
7. To troubleshoot Management Server/Gateway Performance and SQL Performance look at the following sections later in this document:
Troubleshooting Management Server/Gateway Performance
Troubleshooting SQL Performance
8. Events 3155x can also be logged because of an incorrect Run As account configuration or missing permissions for the Run As accounts. Review the following blog post; it includes an Excel spreadsheet that lists the permissions for the various accounts used by OpsMgr.
OpsMgr security account rights mapping - what accounts need what privileges?
http://blogs.technet.com/b/kevinholman/archive/2008/04/15/opsmgr-security-account-rights-mapping-what-accounts-need-what-privileges.aspx
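When many event log entries are exported from an agent or management server, tallying them by event ID can show which of the steps in Scenarios 2 and 3 applies. The mapping below paraphrases those steps; the area labels are hypothetical names chosen for this sketch, and exporting the event IDs (for example from Event Viewer) is not shown.

```python
from collections import Counter
from typing import Dict, Iterable

# Event ID -> area to investigate, paraphrased from Scenarios 2 and 3.
EVENT_AREAS: Dict[int, str] = {
    1102: "workflow load failure",
    1103: "workflow load failure",
    1104: "Run As configuration",
    1105: "Run As configuration",
    1106: "Run As configuration",
    1107: "Run As configuration",
    1108: "Run As configuration",
    4000: "MonitoringHost crash",
    623: "ESE version store",
    20051: "certificate",
    20070: "authorization",
    21006: "connectivity",
    21016: "connectivity",
    2115: "MS/SQL performance",
    4506: "MS/SQL performance",
    5300: "MS/SQL performance",
    31551: "Data Warehouse write",
    31552: "Data Warehouse write",
    31553: "Data Warehouse write",
    31557: "Data Warehouse write",
}


def triage(event_ids: Iterable[int]) -> Counter:
    """Count exported Operations Manager event IDs by investigation area."""
    return Counter(EVENT_AREAS.get(event_id, "other") for event_id in event_ids)
```

Keep in mind that the 3155x events can come from Run As permissions as well as from performance, as step 8 notes, so the "Data Warehouse write" bucket needs both checks.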
Scenario 4: All the agents reporting to one particular management server flip-flop between the gray and healthy states intermittently.
Scenario 5: All the agents in the environment flip-flop between the gray and healthy states intermittently.
Resolution: Some common causes of a temporary gray state are:
· The parent server of the agents temporarily went offline.
· Agents flooded the MS with operational data (alerts, state, discovery, etc.), which can lead to high system resource usage on the OpsMgr database and OpsMgr servers.
· Network outages led to a temporary communication failure between the parent server and the agents.
· Management pack changes in the OpsMgr console require OpsMgr configuration and MP redistribution to the agents. If a change affects a large number of agents, it can lead to high system resource usage on the OpsMgr database and OpsMgr servers.
The key information for these two scenarios is how long the gray state lasted and at what time of day it happened.
This helps narrow the scope of the problem quickly and determine which troubleshooting path to take.
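If you have the start and end times of each gray period (for example from when Heartbeat failure alerts were raised and resolved; the data source is an assumption), both questions can be answered with a few lines of Python. A cluster of start times around midnight, for instance, would be worth comparing with the default grooming time mentioned in the SQL performance section.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def summarize_gray_periods(
    periods: Iterable[Tuple[datetime, datetime]],
) -> Tuple[float, Counter]:
    """Return (longest gray period in minutes, Counter of start hour of day)."""
    longest = 0.0
    starts_by_hour: Counter = Counter()
    for start, end in periods:
        longest = max(longest, (end - start).total_seconds() / 60)
        starts_by_hour[start.hour] += 1
    return longest, starts_by_hour
```

Calling `most_common()` on the returned counter lists the hours in which gray periods start most often.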
To troubleshoot Management Server/Gateway Performance and SQL Performance look at the following sections later in this document:
Troubleshooting Management Server/Gateway Performance
Troubleshooting SQL Performance
Troubleshooting Management Server/Gateway Performance:
Root Management Server (RMS):
Configuration update bursts are caused by MP imports and discovery data. When system performance is slow during such a burst, the two most likely bottlenecks are, first, the CPU and, second, the disk (the OpsMgr installation disk).
The RMS is responsible for generating and sending configuration files to all affected Health Services.
For workflow reloading (caused by new configuration on the RMS), the likely bottlenecks are, again, the CPU first and the disk (the OpsMgr installation disk) second. The RMS is responsible for reading the configuration file, loading and initializing all workflows on the RMS and updating the RMS HealthService store when the configuration file is updated on the RMS.
For local workflow activity bursts (when agents change their availability), the main bottleneck is the CPU. If the CPU is not maxed out, it could potentially be the disk. The RMS is responsible for monitoring the availability of all agents using RMS local workflows. The RMS also hosts distributed dependency monitors, which use the disk.
Management Server (MS):
During configuration update bursts (caused by MP import and discovery), the typical bottlenecks are, first, the CPU and, second, the disk (the OpsMgr installation disk). The MS is responsible for forwarding configuration files from the RMS to the target agents.
For Operational data collection, bottlenecks are normally caused by the CPU. The disk may also be maxed out, but not as likely. The MS is responsible for decompressing and decrypting incoming operational data and inserting it into the Operational Database. It also sends acknowledgements (ACKs) back to the agents/Gateways after receiving operational data and uses disk queuing to temporarily store these outgoing ACKs. Lastly, the MS will also forward monitor state changes (again using a disk queue) to the RMS for distributed dependency monitors.
Gateway (GW):
The GW is both CPU- and I/O-bound. When it is relaying a large amount of data, both may show high usage. Most of the CPU cost comes from decompressing, compressing, encrypting and decrypting the incoming data and transferring it. All data the GW receives from the agents is stored in a persistent queue on disk, to be read and forwarded to the MS by the GW's HealthService, which can cause heavy disk usage. This is especially significant when the GW is temporarily taken offline and then has to handle the accumulated data that the agents generated and attempted to send while the GW was offline.
Information to collect for each affected Management server/Gateway:
· Exact Windows version, edition and build (e.g. Windows Server 2003 Enterprise x64 SP2)
· Number of processors
· Amount of RAM
· Drive letter containing Health Service State folder
· Is antivirus software configured to exclude the Health Service State folder? For more details, review the following KB article:
Recommendations for antivirus exclusions that relate to MOM 2005 and to Operations Manager 2007: http://support.microsoft.com/kb/975931
· RAID level (0, 1, 5, 0+1 or 1+0) of the drive used by the Health Service State folder, and the number of disks in the RAID array
· Is a battery-backed write cache enabled on the array controller?
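Part of this checklist can be collected with the Python standard library, as in the sketch below. It assumes you pass the path of the Health Service State folder; the amount of RAM, RAID level, write cache settings and antivirus exclusions are not exposed by the standard library and still need to be gathered by hand.

```python
import os
import platform
import shutil


def host_facts(state_dir: str) -> dict:
    """Collect the checklist items the standard library can see."""
    full_path = os.path.abspath(state_dir)
    usage = shutil.disk_usage(full_path)
    drive = os.path.splitdrive(full_path)[0] or "/"  # splitdrive gives "" off Windows
    return {
        "os": platform.platform(),       # Windows version, build and service pack
        "processors": os.cpu_count(),    # logical processors; may be None
        "state_drive": drive,
        "state_drive_free_gb": round(usage.free / 1024 ** 3, 1),
    }


if __name__ == "__main__":
    for key, value in host_facts(".").items():
        print(f"{key}: {value}")
```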
Troubleshooting SQL Performance:
Operational Database (OperationsManager):
For the DB, the most common bottleneck is the disk array. Assuming the disk is not maxed out, the CPU is the next most likely culprit. Slowdowns in the DB sometimes occur during operational data "storms" (very high rates of events, alerts, performance data and/or state changes for a relatively long period of time). A short burst usually does not cause significant delay for an extended period of time.
During operational data insertion, the DB's disks are used primarily for writes. CPU usage is usually due to SQL Server churn, which may happen with large, complex queries, heavy data insertion and the grooming of large tables (by default, grooming occurs at midnight). Grooming the Events and Performance Data tables is usually not very costly, even when they are very large, but grooming the Alert and State Change tables can be quite CPU-intensive if they become very large.
The DB is also CPU-bound when handling configuration redistribution bursts, which are caused by MP imports or huge instance space changes. In these cases, the Config Service queries the DB for the agents' new configuration before sending the configuration updates to the agents, often causing CPU spikes on the DB.
Data Warehouse (OperationsManagerDW):
The most likely bottleneck is the disk array. This is usually caused by very large operational data insertions. In these cases, the disks will be mostly busy doing writes. There generally will not be many reads, except to handle manually-generated Reporting views, which run queries on the DW.
CPU usage is usually due to SQL Server churn. Heavy partitioning activity (when tables become very large and are then partitioned), the execution of complex reports and large numbers of Alerts in the DB (which the DW must constantly synchronize with) may all cause CPU spikes.
Information to collect for each affected SQL database server:
· Exact version, edition and build of SQL (e.g., SQL Server 2005 Enterprise x64 SP2 Build 3355)
· Exact Windows version, edition and build (e.g., Windows Server 2003 Enterprise x64 SP2)
· Number of processors
· Amount of RAM
· Amount of memory allocated to SQL
· If SQL is 32-bit, is AWE enabled?
You can get most of the above information in SQL Management Studio or SQL Enterprise Manager by opening the properties of the server and clicking the General and Memory tabs. The General tab includes the SQL version, Windows version, platform, amount of RAM and number of processors. The Memory tab includes the memory allocated to SQL and (in SQL 2005 and SQL 2008) the AWE option. To find out if AWE is enabled in SQL 2000, run the following in SQL Query Analyzer:
sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'awe enabled'
The config_value and run_value will be 1 if AWE is enabled.
· If the OS is 32-bit and RAM is 4 GB or greater, are the /pae and/or /3gb switches in Boot.ini?
These options could be configured incorrectly if the server was originally installed with 4 GB of RAM or less and the RAM was later upgraded.
For 32-bit servers with 4 GB of RAM, the /3gb switch in Boot.ini increases the amount of memory that SQL can address (from 2 GB to 3 GB).
For 32-bit servers with more than 4 GB of RAM, the /3gb switch in Boot.ini could actually limit the amount of memory that SQL can address. For these systems, add the /pae switch in Boot.ini and enable AWE in SQL.
· On a multi-processor system, what is Max Degree of Parallelism (MAXDOP) set to?
On SQL 2005 and SQL 2008, this option is on the Advanced tab in the properties of the server. To determine this setting on SQL 2000, run the following in SQL Query Analyzer:
sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'max degree of parallelism'
The default value is 0, which means all available processors will be used. A setting of 0 is fine for servers with eight or fewer processors. For servers with more than eight processors, the time it takes SQL to coordinate the use of all processors may be counterproductive. Therefore, for servers with more than eight processors, you generally should set Max Degree of Parallelism to a value of 8:
sp_configure 'show advanced options', 1
GO
RECONFIGURE WITH OVERRIDE
GO
sp_configure 'max degree of parallelism', 8
GO
RECONFIGURE WITH OVERRIDE
GO
· Drive letters containing DW and/or Ops and Tempdb files
· Is anti-virus configured to exclude SQL data and log files? AV software should not scan SQL database files, because attempting to do so can degrade performance.
· Free space on drives containing DW and/or Ops and Tempdb files
· SAN vs. local storage
· RAID level (0, 1, 5, 0+1 or 1+0) for drives used by SQL
· If using SAN storage, the number of spindles on each LUN used by SQL
· On OpsMgr 2007 SP1, is the DW event grooming hotfix (969130) or the SP1 hotfix rollup (971541) applied?
· If the converted Exchange 2007 MP is being used or has ever been used, how many rows are in the LocalizedText table in the Ops DB and the EventPublisher table in the DW DB? To find out, run:
USE OperationsManager
SELECT COUNT(*) FROM LocalizedText
GO
USE OperationsManagerDW
SELECT COUNT(*) FROM EventPublisher
GO
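The 32-bit memory guidance (/3gb, /pae, AWE) and the MAXDOP guidance in the checklist above can be summarized as small decision helpers. This is a hypothetical Python sketch: the function names are illustrative, the logic encodes only the rules of thumb stated in this article (it is not official sizing guidance), and 64-bit systems are assumed to need none of the Boot.ini switches.

```python
def boot_ini_recommendation(os_is_64bit: bool, ram_gb: float) -> dict:
    """Suggest Boot.ini switches and the SQL AWE setting per this article's rules."""
    if os_is_64bit or ram_gb < 4:
        # 64-bit Windows needs no switches; the question only applies to
        # 32-bit servers with 4 GB of RAM or more.
        return {"switches": [], "enable_awe": False}
    if ram_gb == 4:
        # /3gb raises the memory SQL can address from 2 GB to 3 GB.
        return {"switches": ["/3gb"], "enable_awe": False}
    # More than 4 GB: /3gb could limit SQL, so use /pae and enable AWE instead.
    return {"switches": ["/pae"], "enable_awe": True}


def recommended_maxdop(processor_count: int) -> int:
    """Suggest 'max degree of parallelism' per this article's rule of thumb."""
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    # 0 (use all processors) is fine up to eight processors; cap at 8 above that.
    return 0 if processor_count <= 8 else 8


def maxdop_script(processor_count: int) -> str:
    """Build the sp_configure batch shown above for the suggested MAXDOP value."""
    value = recommended_maxdop(processor_count)
    return "\n".join([
        "sp_configure 'show advanced options', 1",
        "GO",
        "RECONFIGURE WITH OVERRIDE",
        "GO",
        f"sp_configure 'max degree of parallelism', {value}",
        "GO",
        "RECONFIGURE WITH OVERRIDE",
        "GO",
    ])


if __name__ == "__main__":
    print(boot_ini_recommendation(os_is_64bit=False, ram_gb=8))
    print(maxdop_script(16))
```

For example, a 32-bit server with 8 GB of RAM yields /pae with AWE enabled, and a 16-processor server yields a MAXDOP value of 8.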
Counters to identify memory pressure:
· MSSQL$<instance>: Buffer Manager: Page Life Expectancy - How long, in seconds, pages persist in the buffer pool. A value below 300 seconds may indicate that the server needs more memory; it can also result from index fragmentation.
· MSSQL$<instance>: Buffer Manager: Lazy Writes/sec - The lazy writer frees space in the buffer pool by moving pages to disk. Generally, this value should not consistently exceed 20 writes per second; ideally, it is close to zero.
· Memory: Available MBytes - Values below 100 MB may indicate memory pressure. Memory pressure is clearly present when this value is below 10 MB.
· Process: Private Bytes: _Total - The amount of memory (physical memory and page file) being used by all processes combined.
· Process: Working Set: _Total - The amount of physical memory being used by all processes combined. If this value is significantly below the value for Process: Private Bytes: _Total, processes are paging too heavily. A difference of more than 10% is probably significant.
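As a sketch, the memory thresholds above can be checked against values you have already captured (for example, averages taken from a Performance Monitor log). The function name and its inputs are hypothetical; the thresholds (300 seconds, 20 writes per second, 100 MB and 10 MB, 10%) are the ones listed above.

```python
def memory_pressure_findings(page_life_expectancy_s: float,
                             lazy_writes_per_s: float,
                             available_mb: float,
                             private_bytes_total: float,
                             working_set_total: float) -> list:
    """Return findings; an empty list means no threshold above was crossed."""
    findings = []
    if page_life_expectancy_s < 300:
        findings.append("Page Life Expectancy < 300 s: add memory or check index fragmentation")
    # Pass an average over the capture period: the guidance concerns values
    # that consistently exceed 20, not single spikes.
    if lazy_writes_per_s > 20:
        findings.append("Lazy Writes/sec consistently > 20")
    if available_mb < 10:
        findings.append("Available MBytes < 10 MB: memory pressure is present")
    elif available_mb < 100:
        findings.append("Available MBytes < 100 MB: possible memory pressure")
    if private_bytes_total > 0:
        # Fraction by which Working Set falls short of Private Bytes.
        gap = (private_bytes_total - working_set_total) / private_bytes_total
        if gap > 0.10:
            findings.append("Working Set > 10% below Private Bytes: heavy paging")
    return findings


if __name__ == "__main__":
    for finding in memory_pressure_findings(120, 35, 50, 10e9, 8e9):
        print(finding)
```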
Counters to identify disk pressure: Capture these Physical Disk counters for all drives containing SQL data or log files:
· % Idle Time - How much idle time the disk reports. Anything below 50% could indicate a disk bottleneck.
· Avg. Disk Queue Length - This value should not exceed 2 times the number of spindles on a LUN. For example, if a LUN has 25 spindles, a value of 50 is acceptable; however, if a LUN has 10 spindles, a value of 25 is too high. You can use the following formulas, based on the RAID level and the number of disks in the RAID configuration:
RAID 0 - All of the disks are doing work in a RAID 0 set
Avg. Disk Queue Length <= (number of disks in the array) x 2
RAID 1 - Half the disks are "doing work", so only half of them count toward the disk queue
Avg. Disk Queue Length <= (number of disks in the array / 2) x 2
RAID 10 - Half the disks are "doing work", so only half of them count toward the disk queue
Avg. Disk Queue Length <= (number of disks in the array / 2) x 2
RAID 5 - All of the disks are doing work in a RAID 5 set
Avg. Disk Queue Length <= (number of disks in the array) x 2
· Avg. Disk sec/Transfer - The number of seconds it takes to complete one disk I/O.
· Avg. Disk sec/Read - The average time, in seconds, of a read of data from the disk.
· Avg. Disk sec/Write - The average time, in seconds, of a write of data to the disk.
These three counters should consistently be around .020 (20 ms) or below and should never exceed .050 (50 ms). These are the thresholds documented in the SQL performance troubleshooting guide:
Less than 10 ms - very good
Between 10 and 20 ms - okay
Between 20 and 50 ms - slow, needs attention
Greater than 50 ms - serious I/O bottleneck
· Disk Bytes/sec ? The number of bytes being transferred to or from the disk per second.
· Disk Transfers/sec ? The number of input and output operations per second (IOPS).
When % Idle Time is low (10% or less), meaning the disk is fully utilized, these two counters give a good indication of the drive's maximum throughput in bytes and in IOPS, respectively. The throughput of a SAN drive varies widely depending on the number of spindles, the speed of the drives and the speed of the channel, so the best approach is to ask the SAN vendor how many bytes and IOPS the drive should support. If % Idle Time is low and the values for these two counters do not meet the expected throughput of the drive, engage the SAN vendor to troubleshoot.
The following links are great resources for getting deeper insight into troubleshooting SQL performance:
Troubleshooting Performance Problems in SQL Server 2005: http://technet.microsoft.com/en-us/library/cc966540.aspx
Troubleshooting Performance Problems in SQL Server 2008: http://msdn.microsoft.com/en-us/library/dd672789(SQL.100).aspx
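The disk thresholds above (queue length by RAID level, latency bands, and the saturated-throughput check) can be expressed as three small helpers. This is a hypothetical Python sketch with the following assumptions: the function names are illustrative, the boundary values 10, 20 and 50 ms are placed in the better band because the thresholds do not say which side they fall on, RAID 0+1 is rejected because no formula is given for it, and the expected bytes/sec and IOPS figures must come from your SAN vendor.

```python
def max_avg_disk_queue_length(raid_level: str, disks: int) -> int:
    """Upper bound for Avg. Disk Queue Length using the RAID formulas above."""
    if disks < 1:
        raise ValueError("disks must be at least 1")
    level = raid_level.replace("+", "").strip()  # accept "1+0" as "10"
    if level in ("0", "5"):
        # All disks in the set are doing work.
        return disks * 2
    if level in ("1", "10"):
        # Only half the disks count toward the queue.
        if disks % 2:
            raise ValueError("RAID 1/10 arrays use an even number of disks")
        return (disks // 2) * 2
    raise ValueError(f"no queue-length formula given for RAID {raid_level}")


def classify_disk_latency(seconds: float) -> str:
    """Map Avg. Disk sec/Read, sec/Write or sec/Transfer to the bands above."""
    ms = seconds * 1000
    if ms < 10:
        return "very good"
    if ms <= 20:
        return "okay"
    if ms <= 50:
        return "slow, needs attention"
    return "serious I/O bottleneck"


def should_engage_san_vendor(idle_time_pct: float,
                             disk_bytes_per_s: float,
                             disk_transfers_per_s: float,
                             expected_bytes_per_s: float,
                             expected_iops: float) -> bool:
    """True when the disk is saturated yet below the vendor's expected throughput."""
    if idle_time_pct > 10:
        # Not fully utilized, so these counters do not show the drive's maximum.
        return False
    return (disk_bytes_per_s < expected_bytes_per_s
            or disk_transfers_per_s < expected_iops)


if __name__ == "__main__":
    print(max_avg_disk_queue_length("5", 4))
    print(classify_disk_latency(0.030))
    print(should_engage_san_vendor(5, 100e6, 2000, 200e6, 5000))
```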
OpsMgr 2007 Performance counters
The following sections describe the performance counters that can be used to monitor and troubleshoot OpsMgr performance.
Gateway Server Role:
Overall performance counters: These counters indicate the overall performance of the Gateway:
· Processor(_Total)\% Processor Time
· Memory\% Committed Bytes In Use
· Network Interface(*)\Bytes Total/sec
· LogicalDisk(*)\% Idle Time
· LogicalDisk(*)\Avg. Disk Queue Length
OpsMgr process generic performance counters: These counters indicate the overall performance of OpsMgr processes on the Gateway:
· Process(HealthService)\% Processor Time
· Process(HealthService)\Private Bytes (This value depends on how many agents the Gateway manages; it can be several hundred megabytes.)
· Process(HealthService)\Thread Count
· Process(HealthService)\Virtual Bytes
· Process(HealthService)\Working Set
· Process(MonitoringHost*)\% Processor Time
· Process(MonitoringHost*)\Private Bytes
· Process(MonitoringHost*)\Thread Count
· Process(MonitoringHost*)\Virtual Bytes
· Process(MonitoringHost*)\Working Set
OpsMgr specific performance counters: These counters are specific to OpsMgr; they indicate the performance of different aspects of OpsMgr on the Gateway:
· Health Service\Workflow Count
· Health Service Management Groups(*)\Active File Uploads (The number of file transfers this Gateway is handling, e.g. downloading MP files to agents. If this value stays high for a long time and does not drop while little MP importing is happening, there could be a problem with file transfer.)
· Health Service Management Groups(*)\Send Queue % Used (How full the persistent queue is. If this value stays above 10 for a long time and does not recover, the queue is backed up because the OpsMgr system is overloaded, e.g. the Management Server/DB is too busy or offline.)
· OpsMgr Connector\Bytes Received (The number of network bytes received by the Gateway, i.e. the size of incoming data before decompression)
· OpsMgr Connector\Bytes Transmitted (The number of network bytes sent by the Gateway, i.e. the size of outgoing data after compression)
· OpsMgr Connector\Data Bytes Received (The number of data bytes received by the Gateway, i.e. the size of incoming data after decompression)
· OpsMgr Connector\Data Bytes Transmitted (The number of data bytes sent by the Gateway, i.e. the size of outgoing data before compression)
· OpsMgr Connector\Open Connections (The number of connections open on the Gateway. It should match the number of agents/MSs directly connected to it.)
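Because the Send Queue % Used guidance depends on the value staying above 10 "for a long time", a single reading is not enough. The following hypothetical Python sketch checks a series of samples (for example, exported from a Performance Monitor log at a fixed interval) for a sustained run above the threshold. The function name is illustrative, and the 30-minute default duration is an assumption, not a documented value.

```python
from typing import Sequence


def send_queue_backed_up(samples: Sequence[float], interval_s: int,
                         threshold: float = 10.0,
                         min_duration_s: int = 1800) -> bool:
    """True if Send Queue % Used stayed above `threshold` for about min_duration_s.

    min_duration_s (30 minutes by default) is an assumed stand-in for
    "a long time"; adjust it to your environment.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Ceiling division: number of consecutive samples that covers the duration.
    needed = -(-min_duration_s // interval_s)
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= needed:
            return True
    return False


if __name__ == "__main__":
    one_per_minute = [45.0] * 40  # 40 minutes of samples at 60-second intervals
    print(send_queue_backed_up(one_per_minute, interval_s=60))
```

The same check applies to the Send Queue % Used counter on Management Servers.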
Management Server Role:
Overall performance counters: These counters indicate the overall performance of the Management Server:
· Processor(_Total)\% Processor Time
· Memory\% Committed Bytes In Use
· Network Interface(*)\Bytes Total/sec
· LogicalDisk(*)\% Idle Time
· LogicalDisk(*)\Avg. Disk Queue Length
OpsMgr process generic performance counters: These counters indicate the overall performance of OpsMgr processes on the Management Server:
· Process(HealthService)\% Processor Time
· Process(HealthService)\Private Bytes (This value depends on how many agents this Management Server manages; it can be several hundred megabytes.)
· Process(HealthService)\Thread Count
· Process(HealthService)\Virtual Bytes
· Process(HealthService)\Working Set
· Process(MonitoringHost*)\% Processor Time
· Process(MonitoringHost*)\Private Bytes
· Process(MonitoringHost*)\Thread Count
· Process(MonitoringHost*)\Virtual Bytes
· Process(MonitoringHost*)\Working Set
OpsMgr specific performance counters: These counters are specific to OpsMgr; they indicate the performance of different aspects of OpsMgr on the Management Server:
· Health Service\Workflow Count (The number of workflows running on this Management Server)
· Health Service Management Groups(*)\Active File Uploads (The number of file transfers this Management Server is handling, e.g. downloading MP files to agents. If this value stays high for a long time and does not drop while little MP importing is happening, there could be a problem with file transfer.)
· Health Service Management Groups(*)\Send Queue % Used (How full the persistent queue is. If this value stays above 10 for a long time and does not recover, the queue is backed up because the OpsMgr system is overloaded, e.g. the Root Management Server is too busy or offline.)
· Health Service Management Groups(*)\Bind Data Source Item Drop Rate (The number of data items dropped by the Management Server for DB/DW data collection write actions. A non-zero value means the Management Server/DB is overloaded and cannot handle incoming data items fast enough, or a data item burst is happening. Agents resend the dropped data items, and they are inserted into the DB/DW once the overload or burst is over.)
· Health Service Management Groups(*)\Bind Data Source Item Incoming Rate (The number of data items received by the Management Server for DB/DW data collection write actions)
· Health Service Management Groups(*)\Bind Data Source Item Post Rate (The number of data items the Management Server wrote to the DB/DW for DB/DW data collection write actions)
· OpsMgr Connector\Bytes Received (The number of network bytes received by the Management Server, i.e. the size of incoming data before decompression)
· OpsMgr Connector\Bytes Transmitted (The number of network bytes sent by the Management Server, i.e. the size of outgoing data after compression)
· OpsMgr Connector\Data Bytes Received (The number of data bytes received by the Management Server, i.e. the size of incoming data after decompression)
· OpsMgr Connector\Data Bytes Transmitted (The number of data bytes sent by the Management Server, i.e. the size of outgoing data before compression)
· OpsMgr Connector\Open Connections (The number of connections open on the Management Server. It should match the number of agents/Root Management Server directly connected to it.)
· OpsMgr DB Write Action Modules(*)\Avg. Batch Size (The average number of data items per batch received by DB write action modules. A value of 5000 means a data item burst is happening.)
· OpsMgr DB Write Action Modules(*)\Avg. Processing Time (The number of seconds a DB write action module takes to insert a batch into the DB. If this value is often larger than 60, there is a DB insertion performance issue.)
· OpsMgr DW Writer Module(*)\Avg. Batch Processing Time, ms (The number of milliseconds the DW write action takes to insert a batch of data items into the DW)
· OpsMgr DW Writer Module(*)\Avg. Batch Size (The average number of data items per batch received by DW write action modules)
· OpsMgr DW Writer Module(*)\Batches/sec (The number of batches received by DW write action modules per second)
· OpsMgr DW Writer Module(*)\Data Items/sec (The number of data items received by DW write action modules per second)
· OpsMgr DW Writer Module(*)\Dropped Data Item Count (The number of data items dropped by DW write action modules)
· OpsMgr DW Writer Module(*)\Total Error Count (The number of errors that occurred in DW write action modules)
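Several of the Management Server counters above come with explicit thresholds: a non-zero Bind Data Source Item Drop Rate, an Avg. Batch Size of 5000, and an Avg. Processing Time that is often above 60 seconds. A hypothetical Python sketch that applies them to sampled counter values might look like this; the function name is illustrative, and treating "often" as at least half of the samples is an assumption.

```python
from typing import Sequence


def db_write_findings(drop_rate: Sequence[float],
                      avg_batch_size: Sequence[float],
                      avg_processing_time_s: Sequence[float],
                      often_fraction: float = 0.5) -> list:
    """Apply the DB write thresholds above to sampled counter values.

    often_fraction is an assumed reading of "often": the share of samples
    that must exceed 60 seconds before it is reported.
    """
    findings = []
    # Bind Data Source Item Drop Rate: any non-zero value means the MS/DB is
    # overloaded or a data item burst is under way (agents resend dropped items).
    if any(rate > 0 for rate in drop_rate):
        findings.append("items dropped: Management Server/DB overloaded or data burst")
    # Avg. Batch Size of 5000 signals a data item burst.
    if any(size >= 5000 for size in avg_batch_size):
        findings.append("batch size reached 5000: data item burst")
    # Avg. Processing Time often above 60 seconds signals a DB insertion problem.
    if avg_processing_time_s:
        slow = sum(1 for t in avg_processing_time_s if t > 60)
        if slow / len(avg_processing_time_s) >= often_fraction:
            findings.append("processing time often > 60 s: DB insertion performance issue")
    return findings


if __name__ == "__main__":
    print(db_write_findings([0, 3], [5000], [70, 80, 10]))
```

The same counters and thresholds are listed for the Root Management Server.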
Root Management Server Role:
Overall performance counters: These counters indicate the overall performance of the Root Management Server:
· Processor(_Total)\% Processor Time
· Memory\% Committed Bytes In Use
· Network Interface(*)\Bytes Total/sec
· LogicalDisk(*)\% Idle Time
· LogicalDisk(*)\Avg. Disk Queue Length
OpsMgr process generic performance counters: These counters indicate the overall performance of OpsMgr processes on the Root Management Server:
· Process(HealthService)\% Processor Time
· Process(HealthService)\Private Bytes (This value depends on how many agents this Root Management Server manages; it can be several hundred megabytes.)
· Process(HealthService)\Thread Count
· Process(HealthService)\Virtual Bytes
· Process(HealthService)\Working Set
· Process(MonitoringHost*)\% Processor Time
· Process(MonitoringHost*)\Private Bytes
· Process(MonitoringHost*)\Thread Count
· Process(MonitoringHost*)\Virtual Bytes
· Process(MonitoringHost*)\Working Set
· Process(Microsoft.Mom.ConfigServiceHost)\% Processor Time
· Process(Microsoft.Mom.ConfigServiceHost)\Private Bytes
· Process(Microsoft.Mom.ConfigServiceHost)\Thread Count
· Process(Microsoft.Mom.ConfigServiceHost)\Virtual Bytes
· Process(Microsoft.Mom.ConfigServiceHost)\Working Set
· Process(Microsoft.Mom.Sdk.ServiceHost)\% Processor Time
· Process(Microsoft.Mom.Sdk.ServiceHost)\Private Bytes
· Process(Microsoft.Mom.Sdk.ServiceHost)\Thread Count
· Process(Microsoft.Mom.Sdk.ServiceHost)\Virtual Bytes
· Process(Microsoft.Mom.Sdk.ServiceHost)\Working Set
OpsMgr specific performance counters: These counters are specific to OpsMgr; they indicate the performance of different aspects of OpsMgr on the Root Management Server:
· Health Service\Workflow Count (The number of workflows running on this Root Management Server)
· Health Service Management Groups(*)\Active File Uploads (The number of file transfers this Root Management Server is handling, e.g. configuration downloads and MP file downloads to agents. If this value stays high for a long time and does not drop while little discovery or MP importing is happening, there could be a problem with file transfer.)
· Health Service Management Groups(*)\Send Queue % Used (How full the persistent queue is)
· Health Service Management Groups(*)\Bind Data Source Item Drop Rate (The number of data items dropped by the Root Management Server for DB/DW data collection write actions. A non-zero value means the Root Management Server/DB is overloaded and cannot handle incoming data items fast enough, or a data item burst is happening. Agents resend the dropped data items, and they are inserted into the DB/DW once the overload or burst is over.)
· Health Service Management Groups(*)\Bind Data Source Item Incoming Rate (The number of data items received by the Root Management Server for DB/DW data collection write actions)
· Health Service Management Groups(*)\Bind Data Source Item Post Rate (The number of data items the Root Management Server wrote to the DB/DW for DB/DW data collection write actions)
· OpsMgr Connector\Bytes Received (The number of network bytes received by the Root Management Server, i.e. the size of incoming data before decompression)
· OpsMgr Connector\Bytes Transmitted (The number of network bytes sent by the Root Management Server, i.e. the size of outgoing data after compression)
· OpsMgr Connector\Data Bytes Received (The number of data bytes received by the Root Management Server, i.e. the size of incoming data after decompression)
· OpsMgr Connector\Data Bytes Transmitted (The number of data bytes sent by the Root Management Server, i.e. the size of outgoing data before compression)
· OpsMgr Connector\Open Connections (The number of connections open on the Root Management Server. It should match the number of agents/Management Servers directly connected to it.)
· OpsMgr Config Service\Number Of Active Requests (The number of configuration/MP requests being processed by the Config Service)
· OpsMgr Config Service\Number Of Queued Requests (The number of configuration/MP requests queued for the Config Service. If this value stays high for a long time, the instance space or MP space is changing too frequently.)
· OpsMgr SDK Service\Client Connections (The number of SDK connections)
· OpsMgr DB Write Action Modules(*)\Avg. Batch Size (The average number of data items per batch received by DB write action modules. A value of 5000 means a data item burst is happening.)
· OpsMgr DB Write Action Modules(*)\Avg. Processing Time (The number of seconds a DB write action module takes to insert a batch into the DB. If this value is often larger than 60, there is a DB insertion performance issue.)
· OpsMgr DW Writer Module(*)\Avg. Batch Processing Time, ms (The number of milliseconds the DW write action takes to insert a batch of data items into the DW)
· OpsMgr DW Writer Module(*)\Avg. Batch Size (The average number of data items per batch received by DW write action modules)
· OpsMgr DW Writer Module(*)\Batches/sec (The number of batches received by DW write action modules per second)
· OpsMgr DW Writer Module(*)\Data Items/sec (The number of data items received by DW write action modules per second)
· OpsMgr DW Writer Module(*)\Dropped Data Item Count (The number of data items dropped by DW write action modules)
· OpsMgr DW Writer Module(*)\Total Error Count (The number of errors that occurred in DW write action modules)
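To capture these counters from the command line, the built-in Windows typeperf tool can read counter paths from a file (-cf) and log them at a fixed sample interval (-si) to an output file (-o). The Python sketch below builds such a counter file for a representative subset of the Root Management Server counters listed above and prints a matching typeperf command. The file names rms_counters.txt and rms_perf.csv and the 15-second interval are assumptions; extend the list with any other counter from this section.

```python
# Representative subset of the RMS counters above, in typeperf path syntax.
RMS_COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\% Committed Bytes In Use",
    r"\LogicalDisk(*)\% Idle Time",
    r"\LogicalDisk(*)\Avg. Disk Queue Length",
    r"\Process(HealthService)\Private Bytes",
    r"\Process(Microsoft.Mom.ConfigServiceHost)\Private Bytes",
    r"\Health Service\Workflow Count",
    r"\Health Service Management Groups(*)\Send Queue % Used",
    r"\OpsMgr Config Service\Number Of Queued Requests",
    r"\OpsMgr SDK Service\Client Connections",
]


def counter_file_text(counters) -> str:
    """One counter path per line, the format typeperf -cf reads."""
    return "\n".join(counters) + "\n"


def typeperf_command(counter_file: str, output_csv: str, interval_s: int = 15) -> list:
    """Argument list for typeperf; -y overwrites an existing output file without prompting."""
    return ["typeperf", "-cf", counter_file, "-si", str(interval_s),
            "-o", output_csv, "-f", "CSV", "-y"]


if __name__ == "__main__":
    print(counter_file_text(RMS_COUNTERS), end="")
    print(" ".join(typeperf_command("rms_counters.txt", "rms_perf.csv")))
```

Save the printed counter list as rms_counters.txt on the RMS and run the printed command; press Ctrl+C to stop logging, or add -sc to limit the number of samples collected.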
Note This is a "FAST PUBLISH" article created directly from within the Microsoft support organization. The information contained herein is provided as-is in response to emerging issues. As a result of the speed in making it available, the materials may include typographical errors and may be revised at any time without notice. See Terms of Use (http://go.microsoft.com/fwlink/?LinkId=151500) for other considerations.
APPLIES TO
- Microsoft System Center Essentials 2007
- Microsoft System Center Essentials 2007 Service Pack 1
- Microsoft System Center Essentials 2010
- Microsoft System Center Operations Manager 2007
- Microsoft System Center Operations Manager 2007 Service Pack 1
- Microsoft System Center Operations Manager 2007 R2
Keywords: | KB2288515 |
