Communication Monitoring

The A&D Solution in IFS Cloud is distributed. It requires multiple services to coordinate through a series of complex message interactions across various frameworks and infrastructure, each with its own failure points and error-surfacing mechanisms. When an error occurs in a communication request or event, it may be surfaced or logged in any of several ways, which makes it difficult to handle errors consistently across the solution.

The monitoring service, which is centered on the Communication Log page (ADMON_COMMUNICATION_LOG), monitors communication between IFS Cloud and A&D Services and surfaces errors through a uniform interface.

Communication Pattern – Request

Using the communication between Mobile Maintenance (MM) for Aviation and Maintenix services as an example, the diagram below highlights the key communication stages in a request pattern. Each stage appears as a log record on the Communication Log page.

The following table describes the communication log type codes used to identify each stage in the Request communication.

Number | Communication Type | Notes
1 | Request Initiated | A request processing flow has been started in a service.
2 | Request Sent | The request has been sent by the client service.
3 | Request Received | The request has been received by the target service.
4 | Response Sent | The target service has sent a response.
5 | Response Received | The client service has received the response.
6 | Request Completed | The client service has completed processing the response.
7 | Error | A catch-all state that results when an unexpected (that is, non-business) error prevents the message from completing successfully. An IFS Application Event is emitted to alert administrators of these failures.
NA | Exception | An exception is a specific event that disrupts the normal flow of communication. When an exception occurs, six retry attempts are made to process the message.
NA | Ad hoc | An ad hoc entry indicates either a duplicate message that was skipped to prevent reprocessing, or an entry added to support troubleshooting or development.
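Because the numbered request types form an ordered sequence, the log records for a single request can be read as a trace through that sequence. The Python sketch below is a hypothetical illustration, not part of IFS Cloud: the `RequestLogType` enum, the `HAPPY_PATH` list and the `classify_request` helper are invented names, and it assumes the numbered log records for one request have already been collected oldest first. Exception and Ad hoc entries (number NA) are left out because they do not mark a stage in the flow.

```python
from enum import Enum


class RequestLogType(Enum):
    """Hypothetical model of the numbered Request-pattern log types.

    Exception and Ad hoc entries (number NA) are deliberately omitted,
    because they do not mark a stage in the request flow.
    """
    REQUEST_INITIATED = 1
    REQUEST_SENT = 2
    REQUEST_RECEIVED = 3
    RESPONSE_SENT = 4
    RESPONSE_RECEIVED = 5
    REQUEST_COMPLETED = 6
    ERROR = 7


# Stages 1-6 in the order a successful request passes through them
# (Enum iteration follows definition order).
HAPPY_PATH = [t for t in RequestLogType if t is not RequestLogType.ERROR]


def classify_request(trace: list[RequestLogType]) -> str:
    """Classify the numbered log records of one request, oldest first."""
    if RequestLogType.ERROR in trace:
        return "error"  # the catch-all Error state was reached
    if trace == HAPPY_PATH:
        return "completed"
    if trace == HAPPY_PATH[: len(trace)]:
        return "in_progress"  # a valid prefix: later stages not yet logged
    return "out_of_order"  # stages missing or logged in an unexpected order
```

For example, a trace that stops after Request Received is reported as `in_progress`; a trace like that on a real system might point at a target service that never responded, which is the kind of gap the Communication Log is meant to make visible.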

Communication Pattern – Event

Using the communication between Mobile Maintenance (MM) for Aviation and Maintenix services as an example, the following diagram highlights the key communication stages in an event pattern. Each stage appears as a log record on the Communication Log page.

The following table describes the communication log type codes used to identify each step in the Event communication.

Number | Communication Type | Notes
1 | Event Sent | The event has been produced by a source service.
2 | Event Received | The event has been received by a consumer service.
3 | Event Completed | The consumer service has completed processing the event.
4 | Error | A catch-all state that results when any unexpected error prevents the data synchronization communication from completing. An IFS Application Event is emitted to alert administrators of these failures.
NA | Exception | An exception is a specific event that disrupts the normal flow of communication. When an exception occurs, six retry attempts are made to process the message.
NA | Ad hoc | An ad hoc entry indicates either a duplicate message that was skipped to prevent reprocessing, or an entry added to support troubleshooting or development.
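The Exception and Ad hoc rows describe retry and duplicate handling on the consumer side. The following Python sketch shows one way a consumer could record these log types; `consume_event`, its parameters, and the in-memory `seen_ids` set and `log` list are hypothetical stand-ins for whatever the A&D services actually use. It also assumes that the six retry attempts follow the first failed attempt (seven attempts in total) and that the Error state is logged once the retries are exhausted; the table above does not spell out either detail.

```python
from typing import Callable

# Assumption: "six retry attempts" are made after the first failed attempt.
MAX_RETRIES = 6


def consume_event(
    event_id: str,
    payload: dict,
    handler: Callable[[dict], None],
    seen_ids: set[str],
    log: list[tuple[str, str]],
    max_retries: int = MAX_RETRIES,
) -> str:
    """Hypothetical consumer that appends Event-pattern log types to `log`."""
    log.append((event_id, "Event Received"))
    if event_id in seen_ids:
        # Duplicate message: skip it and record an Ad hoc entry instead of reprocessing.
        log.append((event_id, "Ad hoc"))
        return "skipped"
    for _ in range(1 + max_retries):  # first attempt plus the retries
        try:
            handler(payload)
        except Exception:
            # Each failed attempt is recorded as an Exception entry.
            log.append((event_id, "Exception"))
            continue
        seen_ids.add(event_id)
        log.append((event_id, "Event Completed"))
        return "completed"
    # Assumption: once retries are exhausted, the catch-all Error state is logged.
    log.append((event_id, "Error"))
    return "error"
```

Keeping `seen_ids` separate from the log makes the duplicate check cheap, but in a real deployment that state would have to be shared and durable across consumer instances for the skip to be reliable.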

Communication Errors

The ADMON_ERROR_ALERT event is triggered when an error occurs in the communication process, that is, when a log record with communication type Error is written. The event contains information about the error. An event action can be created to respond to this event, for example, to send an email to administrators.
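To illustrate what an email event action on ADMON_ERROR_ALERT might produce, the sketch below builds a notification message for a log record whose communication type is Error, using Python's standard `email.message.EmailMessage`. In IFS Cloud the event action itself is typically configured in the application rather than written as code; the `CommunicationLogRecord` fields and the `build_error_alert` function are hypothetical, and the actual event parameters may differ.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Optional


@dataclass
class CommunicationLogRecord:
    # Hypothetical record shape; the real ADMON_ERROR_ALERT event fields may differ.
    message_id: str
    communication_type: str
    error_text: str = ""


def build_error_alert(record: CommunicationLogRecord, admins: list[str]) -> Optional[EmailMessage]:
    """Return an alert email for Error records, or None for any other type."""
    if record.communication_type != "Error":
        return None  # only Error records trigger the alert
    msg = EmailMessage()
    msg["Subject"] = f"ADMON_ERROR_ALERT: message {record.message_id}"
    msg["To"] = ", ".join(admins)
    msg.set_content(f"Communication error for message {record.message_id}:\n{record.error_text}")
    return msg
```

Returning `None` for non-Error records mirrors the rule that only Error entries raise the alert, so the same function can be applied to every new log record without a separate filter.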