Recording Human-Machine-Interaction: API-Recording vs ATG-Recording

The purpose of recording human-machine-interaction is to identify and fix the causes of past problems so that they cannot happen again. These causes may lie on the human side, on the machine side, or in the combination of both.

In mission-critical systems, you cannot afford to repeat (near-)accidents and (near-)disasters. Therefore you need to identify the causes of such problems reliably the first time they happen. This requires recording and logging that is sufficient to correctly reproduce all the information needed for such an analysis. Because so much depends on it, this recording and logging must be done correctly.

Recording human-machine-interaction is an essential component of a complete logging solution, so it must be done correctly, too.

There are several methods for HMI-recording. (HMI is short for human-machine interaction.)

In this article I’ll pick two of them, API-recording and ATG-recording, both of which are used in practice for recording human-computer-interaction. I’ll highlight the advantages and disadvantages of each method and draw a conclusion that may be surprising.

I’ll start with ATG-recording because that’s easier to explain and therefore a good introduction.

ATG-recording

ATG means “At the Glass”. (“Glass” is a metaphor for the screen surface; the term dates back to CRT monitors, whose screen was the glass front of the picture tube.)

ATG describes a recording method that records the contents of computer screens by capturing the monitor signal (DVI, DisplayPort, …) and compressing it with a lossless compression method. Often, mouse and keyboard activity is also recorded by tapping into the USB connections between these input devices and the computer. We assume this to be available, too.

The aim of ATG is to record the contents of a monitor as exactly as possible.

However, due to technical limitations, the temporal resolution of such methods is rather low, typically in the range of 4 to 12 frames per second. These limitations are unlikely to go away in the foreseeable future, because technical advances are also used to increase screen resolution, which makes recording with this method even more challenging. So ATG may never catch up. In fact, ATG is lossless only for each individual still image. With respect to the monitor signal as a whole, it is typically not lossless, because it drops most frames.
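To get a feel for the numbers behind this, the following Python sketch computes the raw, uncompressed data rate of a monitor signal and the share of frames a recorder discards when it captures only a few frames per second. It assumes 3 bytes per pixel and a 60 Hz refresh rate; these are illustrative values, not properties of any particular monitor or recording product.

```python
# Back-of-the-envelope figures for ATG recording.
# Assumptions (illustrative only): 3 bytes per pixel (24-bit colour), 60 Hz refresh.


def raw_rate_bytes_per_s(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Uncompressed data rate of a video signal in bytes per second."""
    return width * height * bytes_per_pixel * fps


def dropped_fraction(signal_hz: float, capture_fps: float) -> float:
    """Fraction of the monitor's frames that a recorder capturing at capture_fps never sees."""
    return 1.0 - capture_fps / signal_hz


if __name__ == "__main__":
    for width, height in [(1920, 1080), (3840, 2160)]:
        rate = raw_rate_bytes_per_s(width, height, 3, 60)
        print(f"{width}x{height} @ 60 Hz: {rate / 1e6:.0f} MB/s uncompressed")
    for fps in (4, 12):
        print(f"capturing {fps} fps of a 60 Hz signal drops {dropped_fraction(60, fps):.0%} of frames")
```

Under these assumptions, a 1920x1080 signal carries about 373 MB/s, and a 3840x2160 signal four times as much, while capturing 4 to 12 frames per second of a 60 Hz signal discards roughly 80% to 93% of all frames.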

Due to the low temporal resolution of ATG, you may not be able to reconstruct everything reliably. In particular, it can be difficult to match mouse clicks correctly to GUI elements in fast workflows. (Ever worked over a bad remote-desktop connection? The effects are probably similar.)
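The matching problem can be illustrated with a short Python sketch: given the capture timestamps of the recorded frames, it finds the last frame captured before a click and reports how old that frame was at the time of the click. It assumes evenly spaced captures and click timestamps taken on the same clock as the frames; both are simplifications for illustration.

```python
import bisect
from typing import List, Optional, Tuple


def frame_for_click(frame_times: List[float],
                    click_time: float) -> Optional[Tuple[int, float]]:
    """Return (index, age) of the last captured frame at or before click_time.

    frame_times must be sorted. age is how long before the click that frame
    was captured, i.e. how long the screen may have changed unobserved.
    Returns None if the click precedes the first captured frame.
    """
    i = bisect.bisect_right(frame_times, click_time) - 1
    if i < 0:
        return None
    return i, click_time - frame_times[i]


if __name__ == "__main__":
    # Assumption: evenly spaced captures at 4 fps, and click timestamps taken
    # on the same clock as the frame timestamps.
    fps = 4.0
    frames = [k / fps for k in range(20)]
    index, age = frame_for_click(frames, 1.24)
    print(f"click matched to frame {index}, captured {age * 1000:.0f} ms earlier")
    print(f"worst case at {fps:g} fps: almost {1000 / fps:.0f} ms")
```

Here the click at 1.24 s is matched to the frame captured at 1.0 s, 240 ms earlier. Anything that changed on screen in that window, such as a dialog appearing under the pointer, is invisible in the recording.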

Here’s a list of advantages of ATG-recording:

Reproduces the screen contents of each captured frame exactly.
Independent of the operating system and applications, i.e., you can keep your ATG-recording solution when the operating system or applications change.
No changes in the software configuration of your system are needed because it is a hardware solution.

And a list of disadvantages:

Low temporal resolution.
Depends on screen resolution, i.e., you may need to replace your ATG-recording system if you switch to a higher screen resolution.
Depends on the type of monitor connector, i.e., you may need to replace your ATG-recording system if you use different monitor connectors.
Hardware installation required.
Keyboard input recorded from USB is difficult to interpret, because it contains no information about how the operating system interprets it under its current configuration, e.g. keyboard layout or auto-repeat settings. If you try to analyze raw keyboard input by making assumptions about the configuration and behavior of the operating system, you are doing guesswork instead of facts-based research.
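A small Python sketch shows why the layout matters. USB keyboards report HID usage IDs, which identify physical key positions rather than characters. In the HID Usage Tables, 0x1C is the key labelled “Y” on a US layout and 0x1D the key labelled “Z”; a German QWERTZ layout swaps these two characters. The two layout tables below are hypothetical, deliberately tiny excerpts.

```python
# USB HID keyboard reports contain usage IDs (physical key positions), not characters.
# 0x1C is the "Y" key and 0x1D the "Z" key on a US layout (HID Keyboard/Keypad page).
# The layout tables are hypothetical, minimal excerpts for illustration.

US_LAYOUT = {0x1C: "y", 0x1D: "z"}
GERMAN_LAYOUT = {0x1C: "z", 0x1D: "y"}  # QWERTZ swaps Y and Z


def decode(usage_ids, layout):
    """Translate HID usage IDs into characters under an assumed layout ('?' if unknown)."""
    return "".join(layout.get(u, "?") for u in usage_ids)


if __name__ == "__main__":
    recorded = [0x1C, 0x1D]  # the same raw USB recording...
    print(decode(recorded, US_LAYOUT))      # ...reads "yz" under a US layout
    print(decode(recorded, GERMAN_LAYOUT))  # ...and "zy" under a German layout
```

The same raw recording yields two different texts, and nothing in the USB data says which one the application actually received.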

API-recording

API-recording means recording the calls to the “application programming interface” (API) that the operating system provides to applications. Here, we care about the APIs that provide human-computer-interface capabilities: APIs for displaying content on the screen, and APIs for receiving input from the mouse, the keyboard, and other input devices.

On Linux and similar systems, all of these APIs are typically provided by the X Window System, also called X11. Other operating systems provide similar APIs.

API-recording has the following advantages over ATG-recording:

High temporal resolution.
No hardware installation required because it is typically a software-only solution.
You get a better interpretation of keyboard input because
- the recorder can also record the current keyboard layout as configured in the operating system.
- the recorder can also record key-events generated by auto-repeat
Therefore it can show the actual logical keyboard input as seen by your application.
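Auto-repeat illustrates why recording these key events matters. A raw USB recording contains only key-down and key-up, so the number of characters an application received from a held key has to be inferred from the repeat settings. The Python sketch below assumes a simple repeat model (one character on key-down, the first repeat after a delay, then repeats at a fixed rate) and two made-up settings; it is not the exact algorithm of any particular operating system.

```python
import math


def characters_from_hold(hold_s: float, delay_s: float, rate_hz: float) -> int:
    """Characters produced by holding one key for hold_s seconds.

    Assumed model: one character on key-down, the first repeat at delay_s,
    then one repeat every 1 / rate_hz seconds while the key is still held.
    """
    if hold_s < delay_s:
        return 1
    repeats = 1 + math.floor((hold_s - delay_s) * rate_hz)
    return 1 + repeats


if __name__ == "__main__":
    # Two hypothetical repeat settings, applied to the same 1-second key press.
    for delay_s, rate_hz in [(0.66, 25), (0.25, 30)]:
        n = characters_from_hold(1.0, delay_s, rate_hz)
        print(f"delay {delay_s * 1000:.0f} ms, rate {rate_hz} Hz: {n} characters")
```

Under this model, the same one-second press yields 10 characters with one setting and 24 with the other. An API-recorder does not need to guess, because it records the repeated key events that were actually delivered.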

But it also has disadvantages:

Typically, you have to install the API-recorder on the computers that run your mission-critical system. In many cases this is easy and therefore not a real problem, but there may be cases where it is difficult or perceived as a risk.
If there are bugs in a component of the operating system (such as X11), the recorded screen content may differ from the actual screen content. Such bugs are rare, but if they occur, you’ll have no way to recognize them with API-recording.
If you switch to another operating system, you may need to purchase API-recording from another vendor. If you upgrade the operating system, you may need to upgrade the API-recorder if the new version provides API features that your applications actually use.
Often, API-recorders support only a subset of the available API functionality. Therefore you may need to upgrade the API-recorder if new versions of your applications use API features that it does not yet support.

Hardware-based variant of API-recording

In operating systems where the relevant APIs are available as a network protocol, as is the case with X11, API-recording can also be provided by a hardware device connected to the network.

This may be useful in cases where software installation is difficult or not desirable.
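In X11, the API is a protocol between applications (X clients) and the X server, and it can be carried over TCP. The core of a network-based recorder is then a relay that forwards the traffic unchanged in both directions while time-stamping every chunk. The Python sketch below illustrates only that idea: `relay` and `record_session` are hypothetical names, and the sketch logs raw bytes without decoding the X11 protocol, handling authentication, or reflecting how any particular product works.

```python
import socket
import threading
import time
from typing import List, Tuple

# Hypothetical sketch of a recording relay; it does not decode X11 messages.
LogEntry = Tuple[float, str, bytes]  # (monotonic timestamp, direction, raw bytes)


def relay(src: socket.socket, dst: socket.socket, direction: str,
          log: List[LogEntry], lock: threading.Lock) -> None:
    """Forward bytes from src to dst unchanged, logging every chunk, until src closes."""
    while True:
        chunk = src.recv(4096)
        if not chunk:
            # Propagate end-of-stream so the other side sees the close as well.
            try:
                dst.shutdown(socket.SHUT_WR)
            except OSError:
                pass
            return
        with lock:
            log.append((time.monotonic(), direction, chunk))
        dst.sendall(chunk)


def record_session(app_side: socket.socket, server_side: socket.socket) -> List[LogEntry]:
    """Relay both directions of one connection and return the combined, time-stamped log."""
    log: List[LogEntry] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=relay, args=(app_side, server_side, "app->server", log, lock)),
        threading.Thread(target=relay, args=(server_side, app_side, "server->app", log, lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because the relay passes every byte through unmodified, the application and the X server behave as if they were connected directly, while the log captures exactly what crossed the interface and when.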

Comparison

Both methods are useful for HMI-recording, though each method has its strengths and weaknesses.

Both methods are capable of reproducing screen contents and keyboard/mouse input in a useful way, with high reliability and quality. In probably more than 95% of all use cases, it does not matter which method is used.

However, in mission-critical systems we cannot afford to ignore the remaining few percent.
So we focus on these remaining cases, where the choice of method does matter.

First, let’s look at some of the conditions that cause differences between ATG and API recording:

When the operating system (including components such as X11) has a bug that causes the screen to show content other than what its specification requires. In this case, API-recording will fail to reproduce the screen content correctly.
When the hardware, e.g. the graphics hardware, fails. In this case, API-recording will fail to reproduce the actual screen content correctly.
When the temporal resolution of ATG-recording is too low, and therefore misses important information, such as:
- position of the mouse pointer during a click,
- the GUI-element that is at the position of the mouse pointer during a click.
When the actual operating system configuration or functionality differs from the assumptions used when analyzing keyboard data from ATG-recording.

Apart from the temporal-resolution issue, all these conditions share a common theme: they describe problems in the realm of the basic computing infrastructure, such as:

operating system (bugs or configuration issues)
hardware malfunctions

Of course, we cannot ignore such problems when designing a mission-critical system.

So let’s take a closer look at what is needed to correctly identify causes in the realm of the basic computing infrastructure:

Which method is better at identifying malfunctions in the basic computing infrastructure?

Below, we analyze this for two cases that reflect different directions of data flow.

Case 1: Malfunctions in Graphics output

API-recording does not capture malfunctions in the graphics-output components of the operating system and hardware, while ATG-recording shows the output produced by such malfunctions.

What does this mean?

It means that ATG shows that something is wrong somewhere, but not whether this “something” lies in the realm of the basic computing infrastructure. It might also be a problem in an application.

How can you reliably recognize whether the problem is in your application or in the computing infrastructure for graphics output?

For a facts-based analysis, you need a log of both the input and the output of the graphics component of the computing infrastructure. Then you can distinguish the following two cases:

The infrastructure gets bad input. Then the output will also be bad (garbage in, garbage out). In this case, the infrastructure is OK, and the problem is in the application or in some component that feeds data to the application.
The infrastructure gets good input, but the output is bad. Then the problem is in the infrastructure.

ATG-recording can record the output of the graphics-infrastructure.
API-recording can record the input of the graphics-infrastructure.

Only if you apply both at the same time can you reliably make this facts-based distinction.
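The two-case distinction for graphics output can be written down as a small decision function. In this Python sketch, which uses hypothetical names, the booleans stand for an analyst’s verdicts on each recording, and `None` means that recording is not available. Deciding whether a recording “looks correct” is the hard part and is left to the analyst; the sketch only shows which combinations of evidence allow a conclusion.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    UNDECIDABLE = "the available recordings cannot locate the fault"
    UPSTREAM = "bad input: look at the application or whatever feeds it data"
    INFRASTRUCTURE = "good input, bad output: look at the graphics infrastructure"
    NO_FAULT_SEEN = "input and output both look correct"


def locate_graphics_fault(api_input_ok: Optional[bool],
                          atg_output_ok: Optional[bool]) -> Verdict:
    """Garbage-in/garbage-out triage for the graphics-output infrastructure.

    api_input_ok:  analyst's verdict on the drawing requests in the API recording.
    atg_output_ok: analyst's verdict on whether the ATG recording shows what
                   those requests should produce.
    None means that recording is not available.
    """
    if api_input_ok is None:
        # Without the input side, bad output cannot be attributed to anyone.
        return Verdict.UNDECIDABLE
    if not api_input_ok:
        # Garbage in: the infrastructure received bad requests.
        return Verdict.UPSTREAM
    if atg_output_ok is None:
        # Good input, but nothing recorded what actually reached the screen.
        return Verdict.UNDECIDABLE
    if not atg_output_ok:
        return Verdict.INFRASTRUCTURE
    return Verdict.NO_FAULT_SEEN
```

With only an ATG recording showing a wrong screen, `locate_graphics_fault(None, False)` is `UNDECIDABLE`; with only an API recording of correct requests, `locate_graphics_fault(True, None)` is `UNDECIDABLE` as well.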

Therefore, the value of using both recording methods is:

No guesswork is needed, so there is a higher chance that you identify the cause correctly, and therefore a lower risk of repeating a potentially catastrophic event.
You can quickly decide which vendor to call for further in-depth analysis and fix.

Case 2: Malfunctions in processing Input

ATG-recording does not capture malfunctions in the input-processing components of the operating system and hardware, while API-recording shows the input produced by such malfunctions.

What does this mean?

It means that API-recording shows that something is wrong somewhere, but not whether this “something” lies in the realm of the basic computing infrastructure. It might also be wrong input by the human or a failure of an input device.

For a facts-based analysis, you need a log of both the input and the output of the human-input component of the computing infrastructure. Then you can distinguish the following two cases:

The infrastructure gets bad input from the human or from the input device. Then the output will also be bad (garbage in, garbage out). In this case, the infrastructure is OK, and the problem lies with the human or the input device. It may also be an ergonomics issue.
The infrastructure gets good input, but the output is bad. Then the problem is in the infrastructure.

ATG-recording can record the input of the human-input-infrastructure.
API-recording can record the output of the human-input-infrastructure.

Only if you apply both at the same time can you reliably make this facts-based distinction.
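For input, comparing the two recordings means checking that every key press seen on the USB connection arrives at the API, and that the API delivers nothing the human did not type. The Python sketch below, with the hypothetical function `match_key_events`, greedily pairs events within a time tolerance. It assumes both recordings share a clock and a common key-code numbering; in practice, USB usage IDs and X11 keycodes differ and would need a mapping. Auto-repeat events legitimately appear only on the API side and would have to be accounted for before treating leftovers as suspicious.

```python
from typing import List, Tuple

KeyEvent = Tuple[float, int]  # (timestamp in seconds, key code)


def match_key_events(usb_presses: List[KeyEvent],
                     api_events: List[KeyEvent],
                     tolerance_s: float = 0.05) -> Tuple[List[KeyEvent], List[KeyEvent]]:
    """Pair raw USB key presses with key events seen at the API.

    Assumptions: both lists are sorted by time, share one clock and use the
    same key-code numbering. Pairing is greedy: each USB press takes the
    first unmatched API event with the same key within tolerance_s.
    Returns (presses never delivered to the API, API events with no USB press).
    """
    unmatched_api = list(api_events)
    lost: List[KeyEvent] = []
    for t_usb, key in usb_presses:
        match = next((ev for ev in unmatched_api
                      if ev[1] == key and abs(ev[0] - t_usb) <= tolerance_s), None)
        if match is None:
            lost.append((t_usb, key))
        else:
            unmatched_api.remove(match)
    return lost, unmatched_api
```

A press that appears on the USB side but never at the API, or an API event with no corresponding press, is a candidate for a closer look at the input-processing infrastructure.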

Therefore, the value of using both is:

No guesswork is needed, so there is a higher chance that you identify the cause correctly, and therefore a lower risk of repeating a potentially catastrophic event.
You can quickly decide which vendor to call for further in-depth analysis and fix.

Conclusion

The question is not “either API or ATG recording”.

We actually need both.

The reason is that only a combination of API and ATG recording can reliably identify malfunctions in the basic computing infrastructure:
Operating-system failures, operating-system misconfigurations, and hardware issues can only be identified reliably if you apply both recording methods simultaneously and compare their recordings.

So the value of using both API and ATG recording is:

No guesswork is needed. This increases your chance of identifying the cause correctly and therefore reduces the danger of repeating a potentially catastrophic event.
You can quickly decide which vendor to call for further in-depth analysis and fix.
(Ever had to coordinate multiple vendors who just blame each other? Or a situation where one vendor cooperates while the other refuses because it is “not their problem”?)
If an incident is so severe that the legal system gets involved, you’ll have more facts, which will quickly clear most people and vendors of suspicion.
Each method can compensate for weaknesses of the other (such as the low temporal resolution of ATG).
Additional redundancy: if one recording method fails, you still have the recording from the other. (As mentioned above, in probably more than 95% of cases it does not matter which recording method is used. In those cases, each method can serve as a redundant replacement for the other.)

What do you think?

Please post your comments below or send me an email at christian.linhart@clinhart.com.

What we do

Christian Linhart Software offers a human-computer-interaction recording solution that is essentially based on API-recording. This solution is currently available for operating systems that use X11, such as most Linux-based operating systems. We can also create recording solutions for any API, including APIs between applications and third-party libraries.

If you are interested in an (informal) audit of your recording infrastructure for incident analysis, please send an email to christian.linhart@clinhart.com.

If you are an application vendor and want to be protected from being blamed for bugs that are actually in the operating system or in libraries, please send an email to christian.linhart@clinhart.com to discuss how to achieve that.

DI Christian Linhart GmbH

Screen-Recording, Mission Critical Software, Software Project Consulting