JSR-282: SI-200: "Soft Cost Enforcement"

Summary
---------------------
Provide an optional, alternative "soft" cost enforcement mechanism to
complement the existing optional "hard" cost enforcement mechanism.

Specification References
---------------------
Ch6: Scheduling semantics: Cost Monitoring
Ch6: Schedulable interface

Problem being Addressed
---------------------
Cost monitoring is an optional facility in the RTSJ that releases
cost-overrun handlers when a cost overrun is detected, and that performs
a "hard" asynchronous suspend of the errant SO until its cost is
replenished, either through a change to the cost budget or through the
occurrence of another release event (for periodics).

While the above model is logically sound in terms of preventing a cost
overrun from affecting the schedulability of other tasks, most people are
surprised that a hard suspend is used, because common experience tells us
that asynchronous suspension in a multi-threaded environment is
inherently dangerous: the suspended thread may hold resources that are
needed by other parts of the application. This also makes things
difficult for the VM implementation, as it must ensure that no key VM
resources are held when a thread is cost-suspended. Further, it makes it
difficult to manage other parts of the system effectively. For example,
if the VM uses a server thread for asynchronous event handlers, then a
hard suspend of a handler will suspend the server thread, so a
replacement server thread must be enabled, and the CPU cycles needed to
do that must somehow be accounted for in feasibility analysis.

To avoid the problems with hard enforcement, the application must either
not use it, or ensure that tasks to which cost enforcement is applied are
totally independent of other tasks in the application.
This restricts the applicability of hard cost enforcement to very limited
situations, but arguably those situations are the ones where immediate
suspension of an overrunning thread is critical to the application's
correct behaviour. (Arguably most applications don't want cost
enforcement as currently defined: the overrun is not critical.)

Proposed Solution Summary
-------------------------
People have suggested an alternative "soft" cost monitoring and
enforcement model, which allows for notification of overrun without
suspension, and which allows for safe-suspension (or safer, anyway).

The proposed solution is broken up into three parts. The parts can be
adopted mostly independently, individually or combined. There seems to be
no reason not to adopt Part 1 regardless of the rest.

Part 1: Cost-overrun notification
---------------------------------
This simple proposal allows a VM to implement only the cost-monitoring
part of the existing cost-enforcement model. When a cost overrun is
detected, the cost-overrun handler is released, but nothing happens to
the errant thread. This is still optional. It is useful as a monitoring
tool during both development and deployment.

Part 2: Safe-suspend
--------------------
This part of the proposal adds a new API to allow the safe-suspend/resume
of a schedulable object. It is intended to be used by cost-overrun
handlers but is provided as a generally available API. The logical place
for the new methods is in the Schedulable interface (see the
compatibility discussion below), but we could also add static utility
methods, say in the Scheduler class.

Safe-suspension borrows from facilities in other languages (e.g. Ada)
where suspension is facilitated by using the priority mechanism. To
generalize for the RTSJ we will say it is based on execution eligibility.
What a safe-suspend does is lower the base execution eligibility of the
"suspended" task so that it makes no progress, or only limited progress.
At the same time, if the suspended task holds a lock, then the normal
priority inversion avoidance mechanisms will ensure progress of the task
until the lock is released. The alternatives for how to lower execution
eligibility are discussed below.

Part 3: Soft Cost-enforcement Policy
------------------------------------
This part of the proposal suggests that, rather than adding the Part 2
API above, we instead allow an implementation to provide a
cost-enforcement mechanism that uses the safe-suspend mechanism from
Part 2. In other words, we take the existing cost enforcement semantics,
and the only change is the nature of the "suspend" when a cost overrun is
enforced.

Semantics
---------------------
Part 1: The semantics are as per the existing cost-monitoring semantics,
but do not affect the errant SO.

Part 3: The semantics are based on the current semantics, but instead of
moving the errant SO to the state block_by_cost_overrun, its base
execution eligibility is lowered.

Part 2: There are two proposals for lowering the base execution
eligibility:

(1) The execution eligibility is lower than that of a conceptual idle
task that is always eligible for execution. With this proposal a
safe-suspended task gets no CPU cycles unless priority-inversion
avoidance is involved.

(2) The execution eligibility is set to the lowest value supported by the
scheduler of the SO. With this proposal a safe-suspended task can consume
CPU cycles if the system would otherwise be idle.

In each case there is a bounded, finite time that it may take from when
the suspension of the SO is requested until that request is honoured.

Discussion Points
---------------------
Part 1/2:

1) AsyncEventHandlers do not know for which "event" they are being
released. If a cost-overrun handler is to safe-suspend an SO then it
needs a reference to that SO. At present this must be established by the
application code.
One suggestion is that if a RichAsyncEventHandler is used as a
cost-overrun (or deadline-miss) handler, then the SO that triggered the
release of the handler is passed as the argument to
handleAsyncEvent(Object o).

Part 2:

1) There are pros and cons for both of the proposed models for lowering
the base execution eligibility:

- Below the idle task

This is the simplest and cleanest model. From an analysis perspective it
limits the CPU consumption of a suspended thread to the length of the
critical sections it was in when suspended. This limits the impact of a
suspended thread on other threads in the system.

From an implementation perspective this model is harder to implement
because (in general) there is no real idle task and no priority below
that to be used. Hence it must be implemented by having the target
self-suspend using conventional mechanisms. That is not difficult, but it
also requires changes in the monitor code so that priority inversion
avoidance wakes the thread when needed, and blocks it again when a
monitor is released. Note that if an SO is already executing at an
inherited or ceiling priority when suspended, then it must keep
executing and not self-suspend until it would return to its base
priority.

- Lowest available "priority"

This is potentially trivial to implement because it just changes the base
priority to an existing value. In terms of analysis, however, this model
is weak because a suspended thread could execute when the system is
otherwise idle and continually acquire monitor locks, resulting in
interference with higher-priority SOs.

The main issue of contention is deciding which priority value to use:
- the lowest real-time priority
- a specific non-real-time priority
- a new priority level below that of non-real-time priority

Using the lowest real-time priority is quite simple and is just a matter
of understanding that even a cost-suspended SO has higher execution
eligibility than a plain Java Thread.
Using a specific non-real-time priority is also quite simple, though at
present SOs are not allowed to take on non-real-time priorities. There is
also contention over which value to use: norm or min.

Adding a new low-priority value is problematic for two reasons:
a) it extends the number of distinct priority values that the RTSJ
requires (28 RT, at least 1 for non-RT, plus 1 below-non-RT)
b) typically the non-RT priority is actually a time-sharing scheduling
class where priorities are just hints, so there is no absolute priority
that is lower than non-RT unless all priorities are taken from a
real-time scheduling class.

2) Safe-suspension is only safer in one regard: use of monitor locks. If
any other synchronization protocols are used, or any other shared
resources acquired, then deadlocks/lockouts are still possible: if thread
A is waiting for thread B to do "something" and thread B is
safe-suspended, then A will also wait.

It has been pointed out that a similar risk already occurs with the
ability to deschedule a periodic thread. However, in that case the risk
is much smaller: a periodic thread is descheduled at the end of its task
when waiting for the next periodic release, and you would not expect a
thread to hold a shared resource when invoking waitForNextPeriod.

So while the safe-suspension API has some potential use with cost-overrun
enforcement, it is still a potentially dangerous enforcement mechanism.
Further, its utility as a general purpose API, not tied to the
cost-enforcement mechanism, is questionable.

Part 3:

Adding a new, optional policy for cost enforcement introduces additional
complexity and raises a number of issues to address:
- there must be a way to identify what policy, if any, is supported (a
system property seems best)
- we have to decide whether cost enforcement is a global policy (all SOs
are handled by the same policy) or whether you can choose hard or soft
enforcement on a per-SO basis.
Global is by far the simpler (and as we expect few implementations to
actually support more than one cost enforcement policy, it will be the
default in most implementations). If the policy is per-SO then we need a
way to specify which policy to use, so that will require an additional
property in ReleaseParameters.

Compatibility Issues
---------------------
Adding methods to an existing interface is usually prohibited as it
breaks all existing classes that implement that interface. However, in
the RTSJ we expect that most, if not all, implementations of Schedulable
actually subclass one of the concrete classes: RealtimeThread or
AsyncEventHandler. The possibility of a name clash with the new methods
seems remote, and it is an incompatibility that we are prepared to risk.
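The compatibility argument can be illustrated with a minimal sketch. It uses
plain Java stand-ins rather than the javax.realtime types, and every name in
it is an assumption made for illustration: SchedulableSketch and
ConcreteSoSketch stand in for Schedulable and a concrete class such as
RealtimeThread, safeSuspend()/safeResume() are hypothetical names for the
proposed methods (the proposal does not name them), and LOWEST_PRIORITY is an
assumed value. The method bodies loosely model option (2) of Part 2, where
only the base eligibility is lowered and an inherited priority still applies.

```java
// Illustrative stand-ins only: none of these types are javax.realtime classes,
// and safeSuspend()/safeResume() are hypothetical names for the proposed methods.
interface SchedulableSketch {
    void safeSuspend();
    void safeResume();
    int activePriority();
}

// Plays the role of a concrete class such as RealtimeThread.
class ConcreteSoSketch implements SchedulableSketch {
    // Assumed lowest value supported by the scheduler (option (2) of Part 2).
    static final int LOWEST_PRIORITY = 1;

    private final int basePriority;
    private boolean safeSuspended;
    private int inheritedPriority; // 0 means no priority is currently inherited

    ConcreteSoSketch(int basePriority) {
        this.basePriority = basePriority;
    }

    @Override
    public void safeSuspend() { safeSuspended = true; }

    @Override
    public void safeResume() { safeSuspended = false; }

    // Models priority inversion avoidance while a contended lock is held.
    void inheritPriority(int priority) { inheritedPriority = priority; }

    void releaseLock() { inheritedPriority = 0; }

    // Safe-suspend lowers only the base eligibility; an inherited priority still applies.
    @Override
    public int activePriority() {
        int base = safeSuspended ? LOWEST_PRIORITY : basePriority;
        return Math.max(base, inheritedPriority);
    }
}

// Application class written before the new methods existed: it needs no changes
// because it inherits them from the concrete class.
class LegacyTask extends ConcreteSoSketch {
    LegacyTask(int basePriority) { super(basePriority); }
}

class CompatibilitySketch {
    public static void main(String[] args) {
        LegacyTask task = new LegacyTask(20);
        task.safeSuspend();
        System.out.println(task.activePriority()); // 1
        task.inheritPriority(25);
        System.out.println(task.activePriority()); // 25
        task.releaseLock();
        System.out.println(task.activePriority()); // 1
        task.safeResume();
        System.out.println(task.activePriority()); // 20
    }
}
```

In this sketch a class that implemented SchedulableSketch directly would
have to supply the new methods itself, which is the breakage the paragraph
above describes. A subclass such as LegacyTask is affected only if it already
declares a method whose name clashes with one of the new methods.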