First draft.
Linked Process is a protocol for Internet-scale, general-purpose distributed computing. With an implementation of this protocol, any computing device with an Internet connection can contribute computing resources to a user-generated compute cloud.
Within the category of computing devices, Linked Process distinguishes between resource consumers (devices making use of non-local computing resources) and resource providers (devices offering computing resources).
Linked Process differs from Remote Procedure Call (RPC) and Web Service models, in which the ways that resources can be consumed are defined a priori by the deployer of the service (as functions or methods that can be remotely invoked). The benefit of Linked Process is that computing resources located elsewhere on the Internet can be leveraged by code that the owner (i.e. provider) of those resources neither developed nor anticipated.
Linked Process was developed to address two distributed computing requirements: Internet-scale and general-purpose. Each of these implies further requirements, which are described along with it below.
The introduction discussed the physical devices and the roles that they serve as the hardware infrastructure of a Linked Process cloud. This glossary summarizes the entities that make up the software infrastructure of a Linked Process cloud. The following entities form the core of the Linked Process protocol:
In short, a resource consumer maintains a villein that communicates with a resource provider's farm in order to spawn and compute with a virtual machine that leverages the computing resources of the provider.
Given that Linked Process provides an Internet-scale, general-purpose compute cloud, there are many use cases. Before listing a collection of general use case scenarios, the following subsections will present the possible interactions that can occur in a Linked Process cloud. These interactions are specified with protocol examples. Once the specifics of the protocol are understood, the end of this section will discuss more generalized scenarios in which Linked Process can be useful.
This is the list of the XMPP stanzas (i.e. packets) that MUST be supported by a farm implementation.
The farm specification for <presence/> builds on the <presence/> semantics defined in the XMPP Instant Messaging specification.
There are three types of non-subscription-based <presence/> stanzas that a farm produces.
A <spawn_vm/> element is wrapped by an <iq/> element. The purpose of <spawn_vm/> is to have a farm create a new virtual machine. It is through a virtual machine that a villein is able to access the computing resources of the physical device that hosts the farm (i.e. the resource provider). A virtual machine maintains its state throughout a villein's "session" with it. The only way to alter the state of a virtual machine is by submitting jobs and updating its variable bindings.
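As a minimal sketch, the villein side of a <spawn_vm/> request can be built with Python's xml.etree.ElementTree. Only the Farm# namespace comes from this document; the vm_species and farm_password attribute names (borrowed from the data form fields), the type="set" on the <iq/>, and the JIDs are assumptions for illustration, not normative syntax.

```python
import xml.etree.ElementTree as ET

# Farm namespace, as given by the farm <feature/> in this document.
FARM_NS = "http://linkedprocess.org/2009/06/Farm#"


def build_spawn_vm(villein_jid, farm_jid, stanza_id, vm_species, farm_password=None):
    """Wrap a <spawn_vm/> request in an <iq/> addressed to a farm.

    Assumptions (hypothetical, not normative): type="set", and the
    vm_species / farm_password attribute names on <spawn_vm/>.
    """
    iq = ET.Element("iq", {"from": villein_jid, "to": farm_jid,
                           "type": "set", "id": stanza_id})
    spawn = ET.SubElement(iq, f"{{{FARM_NS}}}spawn_vm",
                          {"vm_species": vm_species})
    if farm_password is not None:
        # Only needed when the farm advertises farm_password = true.
        spawn.set("farm_password", farm_password)
    return iq


request = build_spawn_vm("villein@example.org/lop",
                         "countryside@example.org/farm1",
                         "spawn-1", "javascript")
print(ET.tostring(request, encoding="unicode"))
```

ElementTree serializes the namespaced child with a generated prefix; a real XMPP library would normally emit a default xmlns on <spawn_vm/> instead.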
A <submit_job/> element is wrapped by an <iq/> element. The purpose of <submit_job/> is to send code (i.e. expressions, statements, instructions) to a virtual machine for execution (i.e. evaluation, interpretation). The expression SHOULD be written in the virtual machine's language (i.e. the virtual machine's species). If it is not, then evaluation errors SHOULD occur. The expression submitted through a <submit_job/> stanza can be short (e.g. set a variable value, get a variable value) or long (e.g. define a class/method, execute a long-running body of statements). The submitted expression is called a job in Linked Process and is assigned a job_id equal to the <iq/> id attribute value. That is, the stanza id of the <submit_job/> is the job's id.
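Because the <iq/> id doubles as the job_id, a villein can correlate results with jobs without any extra identifier. The sketch below assumes a result <iq/> that echoes the <submit_job/> element with the result as its text, and a hypothetical vm_id attribute naming the target virtual machine; JobTracker is an illustrative name, and uuid4 is simply one way to make id collisions (cf. job_already_exists) unlikely.

```python
import uuid
import xml.etree.ElementTree as ET

FARM_NS = "http://linkedprocess.org/2009/06/Farm#"


class JobTracker:
    """Villein-side bookkeeping: the <iq/> id of a <submit_job/> is its job_id."""

    def __init__(self, villein_jid):
        self.villein_jid = villein_jid
        self.pending = {}  # job_id -> expression awaiting a result

    def submit(self, farm_jid, vm_id, expression):
        # A random id makes a job_already_exists clash very unlikely.
        job_id = uuid.uuid4().hex
        iq = ET.Element("iq", {"from": self.villein_jid, "to": farm_jid,
                               "type": "set", "id": job_id})
        # vm_id is a hypothetical attribute naming the target virtual machine.
        job = ET.SubElement(iq, f"{{{FARM_NS}}}submit_job", {"vm_id": vm_id})
        job.text = expression  # serialization escapes <, > and &
        self.pending[job_id] = expression
        return job_id, iq

    def on_result(self, iq):
        """Match a result <iq/> to its job through the shared id.

        Assumes the result echoes <submit_job/> with the value as its text.
        """
        job_id = iq.get("id")
        self.pending.pop(job_id, None)
        job = iq.find(f"{{{FARM_NS}}}submit_job")
        return job_id, (job.text if job is not None else None)
```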
The virtual machine's state persists throughout the villein's session with the virtual machine. Thus, note the result of the following <submit_job/>.
A <ping_job/> element is wrapped by an <iq/> element. The purpose of <ping_job/> is to determine the status (i.e. progress, state) of a previously submitted <submit_job/> stanza (i.e. job) that has yet to complete.
An <abort_job/> element is wrapped by an <iq/> element. The purpose of <abort_job/> is to cancel (i.e. quit, stop, halt) a previously submitted, yet not completed <submit_job/> stanza (i.e. job).
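Both <ping_job/> and <abort_job/> refer back to a job by the id of its original <submit_job/> stanza. The sketch below assumes hypothetical vm_id and job_id attributes for that reference, and picks type="get" for <ping_job/> (a query) and type="set" for <abort_job/> (a state change); neither choice is stated in this document, and build_job_control is an illustrative helper.

```python
import xml.etree.ElementTree as ET

FARM_NS = "http://linkedprocess.org/2009/06/Farm#"


def build_job_control(kind, farm_jid, stanza_id, vm_id, job_id):
    """Build a <ping_job/> or <abort_job/> request for an earlier job.

    job_id is the <iq/> id of the original <submit_job/>; the attribute
    names and the iq types are assumptions for illustration.
    """
    if kind not in ("ping_job", "abort_job"):
        raise ValueError(f"unsupported element: {kind}")
    iq_type = "get" if kind == "ping_job" else "set"  # ping reads, abort mutates
    iq = ET.Element("iq", {"to": farm_jid, "type": iq_type, "id": stanza_id})
    ET.SubElement(iq, f"{{{FARM_NS}}}{kind}", {"vm_id": vm_id, "job_id": job_id})
    return iq


ping = build_job_control("ping_job", "countryside@example.org/farm1",
                         "ping-1", "vm-1", "job-7")
abort = build_job_control("abort_job", "countryside@example.org/farm1",
                          "abort-1", "vm-1", "job-7")
```

Note that the control stanza gets its own fresh <iq/> id; only the job_id attribute points at the job.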
A <manage_bindings/> element is wrapped by an <iq/> element. The purpose of <manage_bindings/> is to allow a villein to get and set variables in the variable space of a virtual machine. The definition of the "variable space" is up to the implementation of the virtual machine. In general, this is the set of all global variables for the virtual machine.
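A sketch of get- and set-style <manage_bindings/> requests. The <binding/> child element, its name/value/datatype attributes, the XML Schema string datatype URI, and the use of the <iq/> type to distinguish get from set are all assumptions for illustration; the datatype attribute reflects only the fact that the error table mentions unknown_datatype and invalid_value.

```python
import xml.etree.ElementTree as ET

FARM_NS = "http://linkedprocess.org/2009/06/Farm#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"  # assumed datatype URI


def _manage_bindings_iq(farm_jid, stanza_id, vm_id, iq_type):
    iq = ET.Element("iq", {"to": farm_jid, "type": iq_type, "id": stanza_id})
    # vm_id is a hypothetical attribute naming the target virtual machine.
    mb = ET.SubElement(iq, f"{{{FARM_NS}}}manage_bindings", {"vm_id": vm_id})
    return iq, mb


def set_bindings(farm_jid, stanza_id, vm_id, bindings):
    """bindings maps a variable name to a (value, datatype URI) pair."""
    iq, mb = _manage_bindings_iq(farm_jid, stanza_id, vm_id, "set")
    for name, (value, datatype) in bindings.items():
        ET.SubElement(mb, f"{{{FARM_NS}}}binding",
                      {"name": name, "value": value, "datatype": datatype})
    return iq


def get_bindings(farm_jid, stanza_id, vm_id, names):
    """Ask for the current values of the named variables."""
    iq, mb = _manage_bindings_iq(farm_jid, stanza_id, vm_id, "get")
    for name in names:
        ET.SubElement(mb, f"{{{FARM_NS}}}binding", {"name": name})
    return iq


set_iq = set_bindings("countryside@example.org/farm1", "bind-1", "vm-1",
                      {"name": ("marko", XSD_STRING)})
get_iq = get_bindings("countryside@example.org/farm1", "bind-2", "vm-1",
                      ["name", "x"])
```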
After the previous <manage_bindings/> stanza has been processed by the virtual machine, the bindings can be used in a statement. For example, in JavaScript:
```javascript
var fact = name + " knows josh and peter";
```
This statement sets fact to the value "marko knows josh and peter" and also makes fact available as a binding.
A useful aspect of <manage_bindings/> is that it can be used to track the state of a variable during the execution of a job. For example, suppose the following job is submitted to a JavaScript virtual machine.
```javascript
var x = 1.0;
while (true) {
  x = x + 0.0001;
}
```
This job will continue indefinitely (or until it is timed out by the virtual machine). However, during its execution, it is possible to determine the current state of x using <manage_bindings/>. Each get-based <manage_bindings/> call should return a larger x value.
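The following local simulation (plain Python threads, no XMPP) illustrates why this polling works: the running job and a get-based binding read share one variable space. ToyVirtualMachine and its methods are hypothetical; a real virtual machine would end the job through <abort_job/> or its job_timeout rather than a stop flag.

```python
import threading
import time


class ToyVirtualMachine:
    """Stand-in for a VM: a job thread mutates a global variable space
    that a get-style binding read can inspect while the job runs."""

    def __init__(self):
        self.bindings = {"x": 1.0}
        self._lock = threading.Lock()
        self.started = threading.Event()
        self.stop = threading.Event()

    def run_job(self):
        # Mirrors the JavaScript job above: x = x + 0.0001 until stopped.
        while not self.stop.is_set():
            with self._lock:
                self.bindings["x"] += 0.0001
            if not self.started.is_set():
                self.started.set()

    def get_binding(self, name):
        # What a get-based <manage_bindings/> handler would read.
        with self._lock:
            return self.bindings[name]


vm = ToyVirtualMachine()
worker = threading.Thread(target=vm.run_job, daemon=True)
worker.start()
vm.started.wait(timeout=5)
samples = []
for _ in range(3):
    samples.append(vm.get_binding("x"))
    time.sleep(0.01)
vm.stop.set()  # stands in for abort_job / job_timeout
worker.join()
print(samples)
```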
A <terminate_vm/> element is wrapped by an <iq/> element. The purpose of <terminate_vm/> is to shut down (i.e. quit, exit, halt) the virtual machine. Upon termination, the virtual machine loses its state and can no longer be communicated with.
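A toy, in-memory farm (all names hypothetical) makes the lifecycle concrete: state lives in a per-VM variable space, termination discards it, and any later request for that virtual machine fails with the vm_not_found condition from the error table. This models the semantics only, not the wire protocol.

```python
import itertools


class VmNotFound(Exception):
    """Stands in for the vm_not_found condition (404 / item-not-found)."""


class ToyFarm:
    """Hypothetical in-memory farm: one variable space per virtual machine."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._vms = {}  # vm_id -> {binding name: value}

    def spawn_vm(self):
        vm_id = f"vm-{next(self._ids)}"  # ids are never reused
        self._vms[vm_id] = {}
        return vm_id

    def _space(self, vm_id):
        if vm_id not in self._vms:
            raise VmNotFound(vm_id)
        return self._vms[vm_id]

    def set_binding(self, vm_id, name, value):
        self._space(vm_id)[name] = value

    def get_binding(self, vm_id, name):
        return self._space(vm_id).get(name)

    def terminate_vm(self, vm_id):
        self._space(vm_id)  # unknown ids also yield vm_not_found
        del self._vms[vm_id]  # the VM's state is discarded with it


farm = ToyFarm()
vm = farm.spawn_vm()
farm.set_binding(vm, "name", "marko")
farm.terminate_vm(vm)
try:
    farm.get_binding(vm, "name")
except VmNotFound as err:
    print("vm_not_found:", err)
```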
A farm uses the XEP-0030 XMPP extension for allowing villeins to discover what features and permissions a farm and its spawned virtual machines support.
The <identity/> of a farm MUST be of category="client" and type="bot". The name attribute is up to the implementation.
The following <feature/>s MUST be supported by a farm:
The http://linkedprocess.org/2009/06/Farm# <feature/> denotes that the XMPP client is in fact a farm.
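A sketch of a farm's disco#info reply and the matching villein-side check, using the client/bot identity and Farm# feature required above. Only the disco#info and Farm# features are included; any other required features, the name value, and the JIDs are placeholders, and build_farm_disco_info and is_farm are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
FARM_NS = "http://linkedprocess.org/2009/06/Farm#"


def build_farm_disco_info(farm_jid, requester_jid, stanza_id, name="farm"):
    """Farm side: disco#info result with the required identity and Farm# feature.

    Other required <feature/>s are omitted from this sketch.
    """
    iq = ET.Element("iq", {"from": farm_jid, "to": requester_jid,
                           "type": "result", "id": stanza_id})
    query = ET.SubElement(iq, f"{{{DISCO_INFO_NS}}}query")
    ET.SubElement(query, f"{{{DISCO_INFO_NS}}}identity",
                  {"category": "client", "type": "bot", "name": name})
    for var in (DISCO_INFO_NS, FARM_NS):
        ET.SubElement(query, f"{{{DISCO_INFO_NS}}}feature", {"var": var})
    return iq


def is_farm(disco_info_iq):
    """Villein side: a farm is a client/bot that advertises the Farm# feature."""
    query = disco_info_iq.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    is_bot = any(i.get("category") == "client" and i.get("type") == "bot"
                 for i in query.findall(f"{{{DISCO_INFO_NS}}}identity"))
    features = {f.get("var") for f in query.findall(f"{{{DISCO_INFO_NS}}}feature")}
    return is_bot and FARM_NS in features
```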
For presenting permissions, configurations, and statistics, a farm uses the XEP-0004 (Data Forms) XMPP extension in its disco#info response. The <field/> variables (var) are listed below with their requirement levels. What is published by the farm's data form MUST be what is implemented by the farm and its spawned virtual machines. In other words, the data form MUST be consistent with the behavior of the farm and its virtual machines.
Field | Type | Option | Status | Label |
---|---|---|---|---|
farm_password | boolean | false | REQUIRED | Denotes whether a password is required to spawn virtual machines. |
ip_address | list-single | false | RECOMMENDED | The IP address of the device hosting the farm. |
vm_species | list-single | true | REQUIRED | The virtual machine species supported by the farm. |
vm_time_to_live | list-single | false | REQUIRED | The number of milliseconds for which a virtual machine may exist before it is terminated by the farm. A value of -1 means infinite. |
job_timeout | text-single | false | REQUIRED | The number of milliseconds for which a job may execute before it is timed out by the virtual machine. A value of -1 means infinite. |
job_queue_capacity | text-single | false | RECOMMENDED | The number of jobs which a virtual machine can hold in its queue before it rejects requests to submit additional jobs. A value of -1 means infinite. |
max_concurrent_vms | text-single | false | RECOMMENDED | The number of concurrent virtual machines which the farm can support before it rejects a request to spawn a new virtual machine. A value of -1 means infinite. |
farm_start_time | text-single | false | RECOMMENDED | The xs:dateTime at which this farm was started. |
read_file | list-multi | false | RECOMMENDED | The directories/files that virtual machines have read access to. |
write_file | list-multi | false | RECOMMENDED | The directories/files that virtual machines have write access to. |
delete_file | list-multi | false | RECOMMENDED | The directories/files that virtual machines have delete access to. |
open_connection | boolean | false | RECOMMENDED | Whether socket connections can be opened by the virtual machines. |
listen_for_connection | boolean | false | RECOMMENDED | Whether a socket connection can be listened for by the virtual machines. |
accept_connection | boolean | false | RECOMMENDED | Whether a socket connection can be accepted by the virtual machines. |
perform_multicast | boolean | false | RECOMMENDED | Whether an IP multicast can be initiated by the virtual machines. |
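A villein can read these fields from a farm's disco#info result before deciding to spawn a virtual machine. The sketch assumes the XEP-0004 form sits inside the disco#info <query/> as described above; the sample reply and its values are invented, and mapping -1 to None is one hypothetical way to represent "infinite".

```python
import xml.etree.ElementTree as ET

DATA_NS = "jabber:x:data"


def parse_farm_form(disco_info_xml):
    """Return {var: value} from the XEP-0004 form in a farm's disco#info reply.

    boolean -> bool, list-multi -> list of str, everything else -> str or None.
    """
    root = ET.fromstring(disco_info_xml)
    form = root.find(f".//{{{DATA_NS}}}x")
    fields = {}
    if form is None:
        return fields
    for field in form.findall(f"{{{DATA_NS}}}field"):
        values = [v.text or "" for v in field.findall(f"{{{DATA_NS}}}value")]
        kind = field.get("type")
        if kind == "list-multi":
            fields[field.get("var")] = values
        elif kind == "boolean":
            # XEP-0004 booleans are "0"/"false" or "1"/"true".
            fields[field.get("var")] = bool(values) and values[0] in ("1", "true")
        else:
            fields[field.get("var")] = values[0] if values else None
    return fields


def as_limit(text):
    """Numeric fields use -1 for infinite; this sketch maps that to None."""
    number = int(text)
    return None if number == -1 else number


# Invented sample reply, for illustration only.
SAMPLE = """<iq type="result" id="disco-1">
  <query xmlns="http://jabber.org/protocol/disco#info">
    <x xmlns="jabber:x:data" type="result">
      <field var="farm_password" type="boolean"><value>0</value></field>
      <field var="vm_species" type="list-single"><value>javascript</value></field>
      <field var="job_timeout" type="text-single"><value>-1</value></field>
      <field var="read_file" type="list-multi"><value>/srv/lop/data</value></field>
    </x>
  </query>
</iq>"""

fields = parse_farm_form(SAMPLE)
print(fields["vm_species"], as_limit(fields["job_timeout"]))
```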
The previously defined interactions can be used in concert to perform distributed computing tasks. A list of general use cases is presented below.
The general use cases above give rise to more specific application scenarios, which are itemized below. This list is not intended to be exhaustive; it is meant to provide inspiration for developing further Linked Process use cases.
This is the list of the stanzas that MUST be supported by a registry.
A registry uses <presence/> stanzas to determine the availability of farms at a countryside. For a registry to monitor the <presence/> stanzas emanating from a countryside, the countryside MUST subscribe to the registry and be subscribed to by the registry (i.e. the subscription MUST be mutual). In Instant Messaging, this is handled using the following sequence of <presence/> stanzas.
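In general XMPP terms (RFC 3921), a mutual subscription consists of two subscribe/subscribed exchanges between bare JIDs. The sketch below lists them as <presence/> stanzas; the order in which the two sides initiate is an assumption, the registry JID is a placeholder, and mutual_subscription is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def mutual_subscription(countryside_jid, registry_jid):
    """The four <presence/> stanzas of a two-way subscription between bare JIDs.

    The order (countryside first) is an assumption for illustration.
    """
    steps = [
        (countryside_jid, registry_jid, "subscribe"),   # countryside asks
        (registry_jid, countryside_jid, "subscribed"),  # registry approves
        (registry_jid, countryside_jid, "subscribe"),   # registry asks back
        (countryside_jid, registry_jid, "subscribed"),  # countryside approves
    ]
    return [ET.Element("presence", {"from": sender, "to": recipient, "type": kind})
            for sender, recipient, kind in steps]


for stanza in mutual_subscription("lanl_countryside@lanl.linkedprocess.org",
                                  "registry@example.org"):
    print(ET.tostring(stanza, encoding="unicode"))
```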
After a subscription pairing has been established, the registry monitors all <presence/> stanzas emanating from the countryside of the subscribing XMPP client. When a registry receives a <presence type="available"/> stanza from a farm (as determined through disco#info), the registry adds the countryside (the bare JID of the farm) to its index. When the registry receives a <presence type="unavailable"/> stanza from a farm, it removes that farm's countryside from its index only if no other active farms remain at that countryside. In short, a registry only publishes countrysides (i.e. bare JIDs), not farms (i.e. fully-qualified JIDs); however, whether a countryside is published is determined by whether any active farms exist at that bare JID.
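The index rule above amounts to tracking the set of active farms per countryside. A minimal sketch, assuming the caller has already classified the sender as a farm (e.g. via disco#info) and passes the presence type as a string; RegistryIndex and bare_jid are hypothetical names.

```python
from collections import defaultdict


def bare_jid(full_jid):
    """user@domain/resource -> user@domain, i.e. the farm's countryside."""
    return full_jid.split("/", 1)[0]


class RegistryIndex:
    """Hypothetical registry index: countrysides backed by active farms."""

    def __init__(self):
        self._farms = defaultdict(set)  # countryside -> active farm full JIDs

    def on_presence(self, full_jid, presence_type, is_farm):
        if not is_farm:  # e.g. decided beforehand via disco#info
            return
        countryside = bare_jid(full_jid)
        if presence_type == "available":
            self._farms[countryside].add(full_jid)
        elif presence_type == "unavailable":
            farms = self._farms.get(countryside)
            if farms is not None:
                farms.discard(full_jid)
                if not farms:  # last active farm gone: unpublish
                    del self._farms[countryside]

    def countrysides(self):
        """What the registry publishes: bare JIDs only, never farms."""
        return sorted(self._farms)
```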
A registry uses the XEP-0030 XMPP extension to publish farm-active countrysides. The registry's index is provided to any XMPP client performing a disco#items query; each <item/> element denotes a countryside. For example, <item jid="lanl_countryside@lanl.linkedprocess.org"/> denotes that there is at least one active farm at lanl_countryside@lanl.linkedprocess.org.
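Under XEP-0030, <item/> elements are returned in a disco#items result (namespace http://jabber.org/protocol/disco#items). Assuming that is where a registry's index appears, a registry could publish countrysides and a villein could extract them as sketched below; the function names and the registry JID are illustrative only.

```python
import xml.etree.ElementTree as ET

DISCO_ITEMS_NS = "http://jabber.org/protocol/disco#items"


def build_registry_items(registry_jid, requester_jid, stanza_id, countrysides):
    """Registry side: one <item/> per countryside with at least one active farm."""
    iq = ET.Element("iq", {"from": registry_jid, "to": requester_jid,
                           "type": "result", "id": stanza_id})
    query = ET.SubElement(iq, f"{{{DISCO_ITEMS_NS}}}query")
    for jid in countrysides:
        ET.SubElement(query, f"{{{DISCO_ITEMS_NS}}}item", {"jid": jid})
    return iq


def countrysides_from_items(iq):
    """Villein side: collect the countryside bare JIDs from the reply."""
    path = f"{{{DISCO_ITEMS_NS}}}query/{{{DISCO_ITEMS_NS}}}item"
    return [item.get("jid") for item in iq.iterfind(path)]


reply = build_registry_items("registry@example.org", "villein@example.org/lop",
                             "items-1", ["lanl_countryside@lanl.linkedprocess.org"])
print(countrysides_from_items(reply))
```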
A registry uses the XEP-0030 XMPP extension for allowing villeins to discover what features a registry supports.
The <identity/> of a registry MUST be of category="client" and type="bot". The name attribute is up to the implementation.
The following <feature/>s MUST be supported by a registry:
The http://linkedprocess.org/2009/06/Registry# <feature/> denotes that the XMPP client is in fact a registry.
The error conditions associated with the http://linkedprocess.org/2009/06/Farm# namespace are numerous because a farm and its virtual machines can be in various states. They are summarized in the table below. For detailed information about mapping legacy error codes to XMPP-style error types and conditions, refer to Error Condition Mappings (XEP-0086). Implementations SHOULD support both legacy and XMPP error handling.
Code | Type | Condition | Specific | Element | Purpose |
---|---|---|---|---|---|
400 | Modify | bad-request | malformed_packet | all | The provided stanza was not properly constructed. |
401 | Auth | not-authorized | wrong_farm_password | spawn_vm | The supplied farm password was incorrect. |
500 | Wait | internal-server-error | internal_error | all | An error internal to the farm has occurred. |
503 | Cancel | service-unavailable | farm_is_busy | spawn_vm | The farm is out of resources and cannot spawn a new virtual machine. |
400 | Modify | bad-request | species_not_supported | spawn_vm | The provided virtual machine species is not supported by the farm. |
404 | Cancel | item-not-found | vm_not_found | all except spawn_vm | The virtual machine id does not point to an existing virtual machine. |
503 | Cancel | service-unavailable | vm_is_busy | submit_job | The virtual machine has too many jobs in its queue and will not accept any more. |
400 | Modify | bad-request | evaluation_error | submit_job | The supplied job expression threw an error in the virtual machine. |
403 | Auth | forbidden | permission_denied | submit_job | The supplied job expression violated a security permission in the virtual machine. |
409 | Cancel | conflict | job_already_exists | submit_job | The supplied job identifier already exists in the virtual machine. |
408 | Cancel | request-timed-out | job_timed_out | submit_job | The submitted job timed out and is no longer executing. |
404 | Cancel | item-not-found | job_not_found | ping_job and abort_job | The queried job identifier does not point to an existing job. |
405 | Cancel | not-allowed | job_aborted | submit_job | The submitted job was canceled. |
400 | Modify | bad-request | unknown_datatype | manage_bindings | The provided datatype is an unsupported datatype. |
400 | Modify | bad-request | invalid_value | manage_bindings | The provided value cannot be converted according to the provided datatype. |
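For illustration, a farm could turn a specific condition into an XMPP error stanza carrying both the legacy code and the standard condition element (namespace urn:ietf:params:xml:ns:xmpp-stanzas). The mapping below copies a subset of rows from the table, lowercasing the Type column as XMPP error types are written on the wire; emitting the specific condition as an empty child element in the Farm# namespace is an assumption, and build_error is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"
FARM_NS = "http://linkedprocess.org/2009/06/Farm#"

# specific condition -> (legacy code, type, XMPP condition); subset of the table.
FARM_ERRORS = {
    "malformed_packet": ("400", "modify", "bad-request"),
    "wrong_farm_password": ("401", "auth", "not-authorized"),
    "farm_is_busy": ("503", "cancel", "service-unavailable"),
    "vm_not_found": ("404", "cancel", "item-not-found"),
    "vm_is_busy": ("503", "cancel", "service-unavailable"),
    "evaluation_error": ("400", "modify", "bad-request"),
    "permission_denied": ("403", "auth", "forbidden"),
    "job_not_found": ("404", "cancel", "item-not-found"),
}


def build_error(request, specific):
    """Answer a request <iq/> with an error carrying legacy and XMPP forms."""
    code, error_type, condition = FARM_ERRORS[specific]
    reply = ET.Element("iq", {"type": "error", "id": request.get("id", "")})
    # Swap addressing so the error goes back to the requester.
    if request.get("from"):
        reply.set("to", request.get("from"))
    if request.get("to"):
        reply.set("from", request.get("to"))
    error = ET.SubElement(reply, "error", {"code": code, "type": error_type})
    ET.SubElement(error, f"{{{STANZAS_NS}}}{condition}")
    # Hypothetical: the specific condition as an element in the Farm# namespace.
    ET.SubElement(error, f"{{{FARM_NS}}}{specific}")
    return reply
```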
This specification does not stipulate values of the XMPP <text/> element associated with the foregoing error conditions.
Linked Process can be a very dangerous protocol if implemented incorrectly. The reason for this is that foreign code is executed on devices that are unaware of the intention or purpose of the code. Thus, if farm and virtual machine implementations do not correctly respect the permissions stated by the farm (as specified in the disco#info of the farm), then it is possible for malicious or poorly written code to destroy the integrity of the executing device (i.e. the resource provider). Moreover, Linked Process devices can be utilized for nefarious purposes.
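One concrete way an implementation can fail to respect its published permissions is by checking file access with a plain string prefix. A minimal sketch of a read_file/write_file/delete_file check, assuming the allowed entries are the directories/files listed in the farm's data form: paths are resolved before comparison so that ".." segments and symlinks cannot escape an allowed entry. is_permitted is a hypothetical helper, not part of the protocol.

```python
from pathlib import Path


def is_permitted(requested, allowed_entries):
    """Return True if requested resolves to, or inside, an allowed entry.

    Resolving both sides first defeats '..' segments and symlinks that a
    naive string-prefix comparison would let through.
    """
    target = Path(requested).resolve()
    for entry in allowed_entries:
        root = Path(entry).resolve()
        if target == root or root in target.parents:
            return True
    return False


allowed = ["/srv/lop/data"]
print(is_permitted("/srv/lop/data/input.txt", allowed))
print(is_permitted("/srv/lop/data/../secret.txt", allowed))
```

A prefix test such as str(target).startswith("/srv/lop/data") would wrongly accept /srv/lop/data-evil/x; comparing against Path.parents avoids that.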
The following is a list of potentially dangerous behaviors that can be executed if a Linked Process farm and virtual machine are not implemented correctly and/or expose permissions that are too "loose."
It is strongly RECOMMENDED that the implementors of a Linked Process farm and virtual machine have considerable knowledge in the area of software security and operating system engineering.
This document requires no interaction with the Internet Assigned Numbers Authority (IANA).
The following namespaces are defined by Linked Process:
The protocol documented by this schema is defined in
Linked Process XEP-XXX: http://www.xmpp.org/extensions/xep-xxx.html
Acknowledgement is due to Peter Neubauer (Neo Technology), who contributed to the testing of an implementation of the protocol that was developed in concert with this specification. Through the testing process, many issues were identified, rectified, and incorporated into the protocol specification.