<p>The child process connects to the process that started it - i.e. agent connects to controller, runner connects to agent.</p>
+<p>The parent process must have the listening socket ready before it starts the child process.</p>
+
<!-- TODO: authentication, TLS -->
<h2>Instance Identification</h2>
<p>Each instance in a controller-runner connection network is identified by a UUID that is dynamically generated when that instance starts.</p>
+<h2>Timeouts and Limits</h2>
+
+<p>Each instance can configure timeouts and other limits individually. Currently there is no mechanism to inform other instances about these settings, so it is recommended to set them to the default values listed here.</p>
+
+<h3>Connection Timeout</h3>
+
+<p>When a connection is made between two instances, the connection might fail. There are two sides to this:</p>
+
+<p>The parent process starting the child process has to wait for the child process to make a connection. The child process might fail to start properly or might fail to make the connection (e.g. due to network problems). The parent should time out on waiting for the connection eventually.</p>
+
+<p>The child process might have problems connecting to the parent.</p>
+
+<p>The child process should stop trying to connect after a default timeout of 2 minutes. During this time the child process may make several connection attempts. With local sockets or when connecting to localhost, only one connection attempt should be made.</p>
+
+<p>The parent process should wait a bit longer; a default timeout of 3 minutes is recommended. An unsuccessful handshake (e.g. wrong authentication) does not abort the timeout. If the parent detects that the child process has exited, it should stop waiting for a connection.</p>
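+<p>The two sides of the connection timeout could be sketched roughly as follows. This is only an illustration under stated assumptions: the callables try_connect, poll_connection and child_exited, the retry delay and the poll interval are hypothetical and not defined by this specification. Only the 2 minute and 3 minute defaults come from the text above.</p>

```python
import time

CHILD_CONNECT_TIMEOUT = 120.0  # seconds, default recommended above
PARENT_WAIT_TIMEOUT = 180.0    # seconds, default recommended above


def connect_with_deadline(try_connect, timeout=CHILD_CONNECT_TIMEOUT,
                          retry_delay=5.0, single_attempt=False,
                          clock=time.monotonic, sleep=time.sleep):
    """Child side: retry try_connect() until it succeeds or time runs out.

    try_connect is a hypothetical callable that returns a connection
    object or raises OSError. Set single_attempt=True for local sockets
    or localhost, where only one attempt should be made.
    """
    deadline = clock() + timeout
    while True:
        try:
            return try_connect()
        except OSError:
            if single_attempt:
                raise
            # Give up if another retry would start at or after the deadline.
            if clock() + retry_delay >= deadline:
                raise TimeoutError("gave up connecting to parent")
            sleep(retry_delay)


def wait_for_child(poll_connection, child_exited,
                   timeout=PARENT_WAIT_TIMEOUT, poll_interval=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Parent side: wait for a connection with a successful handshake.

    poll_connection (hypothetical) returns a connection, or None if no
    connection has completed its handshake yet. A failed handshake also
    yields None, so it does not abort the timeout.
    child_exited (hypothetical) returns True once the child has exited.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        conn = poll_connection()
        if conn is not None:
            return conn
        if child_exited():
            raise RuntimeError("child exited before connecting")
        sleep(poll_interval)
    raise TimeoutError("child did not connect in time")
```

+<p>Passing the clock and sleep functions as parameters is a design choice of this sketch; it keeps the timeout logic independent of real time so that it can be exercised without waiting.</p>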
+
+<h3>Response Timeout</h3>
+
+<p>When sending a request, the sender has to wait for a response. A response indicates that the primary action of the request was successful, i.e. the message was understood and has been confirmed to be valid. If the request can be executed instantaneously, the response should contain the result. If not, the response indicates that the real action has been scheduled and/or started, and contains information that allows the receiver of the response to track execution of the action.</p>
+
+<p>The response timeout starts when the full message has been sent/received and ends with the first byte of the response sent/received.</p>
+
+<p>The response timeout should be the same on both sides of the connection. A default timeout of 2 minutes is recommended.</p>
+
+<h3>Intra-Message Timeout</h3>
+
+<p>Parts of one message may arrive in different packets with some delay between message parts. A timeout of at least 1 minute is recommended for receivers. Implementing this timeout is optional if the receiver does not block on receiving messages.</p>
+
+<p>Senders must not delay between parts of a message - but may block on operating system calls while sending. I.e. a message should only be sent once it has been completely assembled.</p>
+
+<h3>Job Timeout</h3>
+
+<p>Jobs should not time out. If a runner can determine that a job will not finish, it may abort the job and signal the abort to its parent process. If such detection is used at all, it should be configured by the author of the tests to be executed.</p>
+
+<h3>Message Size</h3>
+
+<p>Receivers may implement an appropriate maximum message size for receiving messages. Optionally a maximum for outgoing messages may be implemented. Both are ultimately limited by the available memory and the message format - which allows a maximum of 4GB to be transmitted in one message.</p>
+
+<p>Any limit should not be below 5MB.</p>
+
+<p>The following size limits are recommended:</p>
+
+<ul>
+<li>Message sent downstream (GUI-&gt;Controller-&gt;Agent-&gt;Runner): 20MB</li>
+<li>Message sent upstream (Runner-&gt;Agent-&gt;Controller-&gt;GUI): 100MB</li>
+</ul>
+
+<p>Implementation of this limit is optional.</p>
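+<p>A receiver-side size check along these lines might look like the sketch below. The function name make_size_check and the direction labels are hypothetical. The text does not say whether MB means 10^6 or 2^20 bytes; this sketch assumes 10^6.</p>

```python
MB = 1000 * 1000  # assumption: decimal megabytes; the spec does not say

MIN_LIMIT = 5 * MB  # no limit should be set below this
RECOMMENDED_LIMITS = {
    "downstream": 20 * MB,   # GUI -> Controller -> Agent -> Runner
    "upstream": 100 * MB,    # Runner -> Agent -> Controller -> GUI
}


def make_size_check(direction, limit=None):
    """Return a function that tells whether a message size is acceptable.

    direction is "downstream" or "upstream" (hypothetical labels).
    If limit is None, the recommended default for that direction is used.
    """
    if limit is None:
        limit = RECOMMENDED_LIMITS[direction]
    if limit < MIN_LIMIT:
        raise ValueError("limits below 5MB are not recommended")

    def accepts(size_in_bytes):
        return size_in_bytes <= limit

    return accepts
```

+<p>For example, a runner could create one check with make_size_check("downstream") for the messages it receives from its agent.</p>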
+
+<h3>Open Transactions</h3>
+
+<p>An instance may limit the number of simultaneously open transactions. It is recommended that this limit is not set below 5 open transactions.</p>
+
+<p>If implemented, this limit should be implemented separately for each connection and each direction of the connection.</p>
+
+<p>Implementation of this limit is optional.</p>
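+<p>One possible way to keep such a limit per connection and per direction is sketched below. The class name TransactionLimiter and the direction labels "incoming" and "outgoing" are hypothetical; one limiter object would be created for each connection.</p>

```python
class TransactionLimiter:
    """Tracks open transactions for ONE connection, separately per direction."""

    def __init__(self, limit=5):
        # 5 is the lowest limit recommended by the text above.
        self.limit = limit
        self._open = {"incoming": set(), "outgoing": set()}

    def try_open(self, direction, transaction_id):
        """Register a transaction; return False if the limit is reached."""
        active = self._open[direction]
        if transaction_id in active:
            return True  # already counted
        if len(active) >= self.limit:
            return False
        active.add(transaction_id)
        return True

    def close(self, direction, transaction_id):
        """Mark a transaction as finished; unknown IDs are ignored."""
        self._open[direction].discard(transaction_id)

    def open_count(self, direction):
        return len(self._open[direction])
```

+<p>Because the two directions are counted in separate sets, a burst of incoming requests does not prevent the instance from opening its own outgoing transactions on the same connection.</p>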
+
<!-- =========================================================== -->
<h1>Message Encoding</h1>
<h2>Generic Functions</h2>
-<p>Whenever a Runner receives a message that it does not understand, it can reject it with the following error response:</p>
+<p>Whenever a Runner receives a message that it does not understand, it can reject it with a response with Function set to "RejectMsg", with the following keys in its body:</p>
-<pre>
-<RejectMsg reason="unknownmessage" message="ProvokeError" reqid="1234z" originalsize="456"/>
-</pre>
+<ul>
+<li>reason - (string) the reason why the message was rejected</li>
+<li>originalsize - (integer) the full message size of the request (optional)</li>
+<li>transaction - (UUID) transaction ID of the offending message (optional)</li>
+<li>missing - (string) if data of the message was incomplete: the missing keys</li>
+<li>wrongtype - (string) if data of the message was of the wrong type: keys of the malformed data</li>
+<li>range - (string) if data contained invalid or out-of-range values: keys of the offending data items</li>
+</ul>
+
+<p>If the offending message was a notification, the RejectMsg error may be sent as a new notification with the transaction key set to the original transaction ID. If a response was faulty, no message is sent.</p>
<p>The "reason" attribute contains an explanation for the rejection. The following reasons exist:</p>
<ul>
-<li>"unknownmessage" - the message was valid XML, but the document element of the XML was not known</li>
+<li>"unknownmessage" - the message was valid, but the function was not known</li>
<li>"permission" - the message was valid, but the sender is not allowed to send it</li>
-<li>"notxml" - the message was not valid XML</li>
+<li>"data" - data was missing or the wrong type</li>
+<li>"parser" - the message could not be properly decoded</li>
<li>"toobig" - the message was too big and has been discarded, it is recommended that runners limit messages to no less than 5MB</li>
<li>"overflow" - there are too many open requests, it is recommended that runners can buffer at least 5 messages before they start rejecting them, some messages should not be buffered but instead be worked on immediately</li>
</ul>
-<p>If the offending message was valid XML the "message" attribute should contain the name of the document element. If this element contained an attribute "reqid" than this attribute should be replicated in the "reqid" attribute of the RejectMsg tag.</p>
-
<p>The "originalsize" attribute should contain the size of the offending original message.</p>
<p>This notification may be extended later on to include more information. Implementations conforming to this version of the spec must not interpret any attributes or content not specified here, but must accept that those may be present.</p>
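+<p>As an illustration, the body of a RejectMsg could be assembled as in the following sketch. The helper name make_reject_body and the use of a plain dictionary are assumptions made for this example; the wire encoding of the message is not shown here.</p>

```python
import uuid

# Reasons listed in this section.
REASONS = {"unknownmessage", "permission", "data", "parser",
           "toobig", "overflow"}


def make_reject_body(reason, originalsize=None, transaction=None,
                     missing=None, wrongtype=None, range_=None):
    """Build the body keys of a RejectMsg as a dictionary (hypothetical helper).

    Only keys that were actually supplied are included.
    """
    if reason not in REASONS:
        raise ValueError("unknown reject reason: %r" % (reason,))
    body = {"reason": reason}
    optional = {
        "originalsize": originalsize,
        "transaction": str(transaction) if transaction is not None else None,
        "missing": missing,
        "wrongtype": wrongtype,
        "range": range_,
    }
    for key, value in optional.items():
        if value is not None:
            body[key] = value
    return body


# Example: reject a request that was too big.
example = make_reject_body("toobig", originalsize=30000000,
                           transaction=uuid.uuid4())
```

+<p>A receiver of such a body should ignore any keys it does not know, in line with the extension rule above.</p>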