JSR-282 SI 22: 6.0 Allow the pinning of a schedulable object to a processor.
--------------------------------------------

Last Updated: 19 December 2006

Summary
-------------

Provide simple support to allow the pinning of schedulable objects to
processors in multiprocessor (SMP) systems. A schedulable object pinned
to one or more processors means that only those processors can execute
that schedulable object.

Specification References
-----------------------

5.2, 5.1, 11.3, 12.3

Problem being addressed
------------------------------

Currently, the RTSJ is silent on multiprocessor issues. It attempts not
to preclude multiprocessor implementations but provides no direct
support. The java.lang.Runtime class allows the number of processors
available to the JVM to be determined by the "int availableProcessors()"
method, but does not allow Java threads to be pinned to processors.
Furthermore, on some SMPs the number of processors may vary during the
execution of the program.

JSR 302 anticipates that future safety-critical systems (SCS) will
contain SMPs. (Currently, there does not seem to be hard evidence that
SMPs have been used in SCS. However, there is evidence that future
systems will use SMPs. For example, the LynxSecure Separation Kernel has
recently been announced:
http://www.lynuxworks.com/rtos/secure-rtos-kernel.php.)

Further Motivation
-------------------

Whilst many applications do not need more control over the mapping of
threads to processors in an SMP environment, there are occasions when
such control is important. They include:

1. To allow more flexible approaches to scheduling. -- Although the
state of the art in schedulability analysis for multiprocessor systems
continues to advance, the current state is such that partitioned systems
offer more guaranteed schedulability than global systems.
Quoting from Ted Baker (private communication), a known expert on fixed
priority scheduling on multiprocessor systems:

"The choice between global and partitioned approaches to multiprocessor
scheduling is a conundrum. Setting aside pragmatic questions about queue
contention overhead and differences in cache behavior, the theoretical
results are equivocal. In favor of global scheduling, it has long been
known from queueing theory that single-queue (global) FIFO
multiprocessor scheduling is superior to queue-per-processor
(partitioned) FIFO scheduling, with respect to average response time.
Apparently in favor of partitioned scheduling, the application of well
known single processor scheduling algorithms appears to be superior to
the global application of those same algorithms for some task sets with
hard deadlines. For example, it is known that all periodic
implicit-deadline task sets with utilization below $m(2^{1/2} - 1)$ can
be scheduled on m processors using a first-fit-decreasing-utilization
(FFDU) partitioning algorithm and local rate monotonic scheduling, but
Dhall's example shows that there are hard-deadline periodic task sets
with total utilization arbitrarily close to 1.0 that cannot meet all
deadlines if scheduled on m processors using global rate monotonic
scheduling. Dhall's example also applies to global EDF scheduling, yet
FFDU partitioned EDF scheduling is guaranteed up to utilization
$(m+1)/2$. However, the supposed advantage of partitioned scheduling
above disappears if one considers hybrid global priority schemes. The
Dhall example can easily be handled by the $EDF-US(1/2)$ or
$EDF(k_{min})$ schemes, in which top priority is given to a few "heavy"
tasks, as can any implicit-deadline sporadic task system with
utilization up to $(m+1)/2$. This is exactly the same bound as for FFDU
partitioned scheduling!
The experiments we performed on large numbers of pseudo-randomly
generated task sets were intended to provide some additional evidence on
which to base a choice between these two approaches. From those
experiments, statistically, the chance of being able to satisfy all the
deadlines of a randomly chosen periodic or sporadic task set appears to
be highest with partitioned scheduling. In particular, partitioned EDF
scheduling appeared to be the overall best performer in this statistical
sense. At the same time, there are certainly specific task sets where
global scheduling is more effective. While the schedulability tests used
in the experiments probably could be improved, it remains unclear
whether they can be improved enough to erase the statistical margin of
partitioned scheduling with the available schedulability tests."

2. To support temporal isolation. -- Where an application consists of
tasks of mixed criticality level, some form of protection between the
different levels is required. The strict typing model of Ada provides a
strong degree of protection in the spatial domain. The CPU budgeting
facility provides a limited form of temporal protection but at the
expense of flexibility. More flexible temporal protection is obtainable
by allowing tasks in each criticality level to be executed on partitions
of the processor set.

3. To obtain performance benefits. -- For example, dedicating one CPU to
a particular process will ensure maximum execution speed for that
process. Restricting a process to run on a single CPU also prevents the
performance cost caused by the cache invalidation that occurs when a
process ceases to execute on one CPU and then recommences execution on a
different CPU.

4. To be able to respond to dynamic changes to the processor set. -- In
a parallel computing environment the set of processors allocated to an
application may vary depending on the global state of the system.
An application may be able to optimize its algorithms if it is informed
when these changes in the processor set occur.

Proposed Solution Summary
--------------------------------------

There is no POSIX standard in this area, although an initial proposal
was developed (see email exchange at the end of this SI). Consequently,
it is difficult to ensure that the API proposed here is implementable on
a POSIX-compliant RTOS. However, the POSIX proposal and the work done on
SMP Linux (http://www.die.net/doc/linux/man/man2/sched_setaffinity.2.html)
suggest the following.

Add to the RealtimeSystem class:

  public static java.util.BitSet availableProcessors();

  public static boolean setAffinitySupported();
  public static boolean affinityChangeNotificationSupported();

  public static AsyncEvent ProcessorRemoved, ProcessorAdded;

  public final static java.util.BitSet setDefaultAffinity(
      java.util.BitSet Processors) throws ProcessorAffinityException;
  public final static java.util.BitSet setDefaultNoHeapAffinity(
      java.util.BitSet Processors) throws ProcessorAffinityException;
  public final static java.util.BitSet getDefaultAffinity()
      throws ProcessorAffinityException;
  public final static java.util.BitSet getDefaultNoHeapAffinity()
      throws ProcessorAffinityException;

Add a new exception:

  public class ProcessorAffinityException extends Exception;

In the RealtimeThread class:

  public java.util.BitSet setAffinity(java.util.BitSet Processors)
      throws ProcessorAffinityException;
  public java.util.BitSet getAffinity();

In the BoundAsyncEventHandler class:

  public java.util.BitSet setAffinity(java.util.BitSet Processors)
      throws ProcessorAffinityException;
  public java.util.BitSet getAffinity();

Semantics of Proposed Solution
----------------------------------------

The proposal is for a minimum interface that allows the pinning of a
schedulable object to one or more processors. The challenge is to define
the API so that it allows a range of OS facilities to be supported.
The minimum functionality is for the OS to allow the VM to determine how
many processors are available for the execution of the Java application.
The set of processors that the RT JVM is aware of is represented by a
BitSet that is returned by availableProcessors() in the RealtimeSystem
class:

  public static java.util.BitSet availableProcessors();
  // returns a bit set where
  //   .length()      = the number of processors the VM can determine
  //                    will be available to it
  //   .cardinality() = the number of processors allocated to the JVM

For example, in a 64-processor system, the VM may be aware of all 64 or
only a subset of those. This is the length of the bit set. Of these
processors, the VM will know which processors have been allocated to it
(either logical processors or physical processors, depending on the OS).
Each of the available processors is set to one in the bit set. Hence,
the cardinality of the bit set represents the number of processors that
the VM thinks are currently available to it. The returned bit set is a
new object that is allocated in the current memory area.

The API allows for systems that support the dynamic addition and removal
of processors from the set allocated to the VM. If an OS does not
support this facility then the set will not dynamically change. An OS is
also allowed to maintain a set of logical processors allocated to the VM
and to transparently change its logical-to-physical mapping. Again, from
the VM's perspective the set has not changed. However, it should be
noted that this may have an impact on the application if a) it is
handling interrupts directly on the processor or b) the change
undermines any feasibility analysis assumptions. For many RTSJ
applications this may not be a problem. In all of the above
circumstances affinityChangeNotificationSupported() returns false.

If the OS does support dynamic changes to the processor set, the
assumption is that it will inform the VM of the changes (e.g. via a
signal).
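The intended bit-set encoding can be illustrated with plain
java.util.BitSet (the proposed availableProcessors() method does not yet
exist, so the sketch below builds the set by hand; the machine sizes and
processor numbers are invented for illustration):

```java
import java.util.BitSet;

// Illustration of the proposed encoding: a system where the VM is
// aware of 8 processors, of which processors 0, 1 and 5 have been
// allocated to the JVM (those bits are set to one).
public class AffinityBitSetDemo {
    public static void main(String[] args) {
        BitSet available = new BitSet(8);
        available.set(0);
        available.set(1);
        available.set(5);

        // Note: java.util.BitSet.length() is the index of the highest
        // set bit plus one, not a fixed capacity, so it equals the
        // processor count only when the highest-numbered processor the
        // VM can see is allocated.
        System.out.println("length      = " + available.length());      // 6
        System.out.println("cardinality = " + available.cardinality()); // 3
    }
}
```

The comment on length() highlights a detail the specification text may
need to pin down: BitSet has no fixed size, so "the length of the bit
set" is well defined only under an agreed convention.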
The VM will pass this information to the application via the firing of
the appropriate asynchronous event (ProcessorRemoved or ProcessorAdded)
declared in the RealtimeSystem class. In these circumstances
affinityChangeNotificationSupported() returns true. An application can
specify an ASEH to run in response to the firing of the above events.
The assumption is that the application will maintain its own list of
which SOs are mapped to which processors (logical or physical). It will
then undertake whatever reconfiguration it deems appropriate.

Failure Model: If a processor fails and the platform cannot
transparently recover, the VM abnormally ends (with assumed fail-stop
semantics). Any recovery must be performed outside of the VM. This is
because a processor failure can leave the application and VM in an
inconsistent state (e.g. with a corrupt heap) from which it is unlikely
to be able to recover.

The API supports the setting of the affinity of real-time threads and
bound ASEHs by the programmer. If the OS does not support this facility
then all of the associated operations, given below, throw
UnsupportedOperationException, and any call to setAffinitySupported()
returns false.

The default affinity can be set at run-time. Two defaults are provided:
one for heap-using SOs and one for no-heap SOs.

  public final static java.util.BitSet setDefaultAffinity(
      java.util.BitSet Processors) throws ProcessorAffinityException;
  public final static java.util.BitSet setDefaultNoHeapAffinity(
      java.util.BitSet Processors) throws ProcessorAffinityException;

There is no association maintained between the parameter passed and the
default, i.e. copy semantics: changing the parameter object at a later
stage will NOT result in a change of the default.

  public final static java.util.BitSet getDefaultAffinity()
      throws ProcessorAffinityException;
  public final static java.util.BitSet getDefaultNoHeapAffinity()
      throws ProcessorAffinityException;

The returned object is allocated in the current memory area.
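The copy semantics described above can be sketched in plain Java. The
class below is a hypothetical stand-in, not the proposed RealtimeSystem
implementation; it only shows that cloning the argument on the way in
(and the stored value on the way out) gives the required behaviour:

```java
import java.util.BitSet;

// Sketch (hypothetical, assuming only java.util.BitSet) of the copy
// semantics required of setDefaultAffinity/getDefaultAffinity.
public class DefaultAffinity {
    private static BitSet defaultAffinity = new BitSet();

    // Stores a clone of the argument and returns the previous default,
    // mirroring the shape of the proposed signature.
    public static synchronized BitSet setDefaultAffinity(BitSet processors) {
        BitSet previous = defaultAffinity;
        defaultAffinity = (BitSet) processors.clone(); // copy semantics
        return previous;
    }

    // The getter also returns a copy, so callers cannot mutate the
    // stored default through the returned reference.
    public static synchronized BitSet getDefaultAffinity() {
        return (BitSet) defaultAffinity.clone();
    }

    public static void main(String[] args) {
        BitSet procs = new BitSet();
        procs.set(0);
        procs.set(1);
        setDefaultAffinity(procs);

        procs.set(7); // mutate the caller's object afterwards...
        // ...the stored default is unaffected by the change:
        System.out.println(getDefaultAffinity()); // prints {0, 1}
    }
}
```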
The default "default affinity" is scheduler dependent and must be
documented.

The affinity of a specific SO can be set via the following.

In the RealtimeThread class:

  public java.util.BitSet setAffinity(java.util.BitSet Processors)
      throws ProcessorAffinityException;

Changes to the parameter object do not change the affinity; changes only
occur when the setAffinity method is called. The actual affinity will be
changed between the time the thread finishes its current release and the
time it starts its next release. It must be complete by the time the
next release starts.

Throws IllegalArgumentException if the size of the given bit set does
not match the current size of the bit set returned from
availableProcessors(), or if the given bit set is null.

Throws ProcessorAffinityException if a processor is unavailable.

  public java.util.BitSet getAffinity();

This returns the last bit set that was set by a call to setAffinity (or
the default if there was no call).

In the BoundAsyncEventHandler class:

  public java.util.BitSet setAffinity(java.util.BitSet Processors)
      throws ProcessorAffinityException;

Throws IllegalArgumentException - as above.
Throws ProcessorAffinityException - as above.

  public java.util.BitSet getAffinity();

Discussion Points
---------

1. With the current proposal, only bound asynchronous event handlers can
be pinned, not unbound ones.

2. With the current proposal, Java threads cannot be pinned.

3. I have kept it as simple as possible. One could imagine other methods
that, for example, return an array of the SOs that are currently pinned
to a processor. At the moment, the assumption is that the programmer
maintains any information they need.

4. The current assumption is that there are no API changes needed to
support the two priority inheritance protocols on an SMP (the priority
assignment algorithm will change) and that wait-free queues will also
work in an SMP environment.
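The argument checks that setAffinity must perform can be sketched with
plain java.util.BitSet. Everything here is hypothetical (the helper
name, the 4-processor system, and the stand-in for
availableProcessors() are invented for illustration):

```java
import java.util.BitSet;

// Sketch (hypothetical helper, not part of the proposal) of the
// validation described for setAffinity.
public class AffinityValidation {

    // Stand-in for RealtimeSystem.availableProcessors(): here we
    // pretend the VM sees 4 processors, all allocated to it.
    static BitSet availableProcessors() {
        BitSet b = new BitSet(4);
        b.set(0, 4); // set bits 0..3
        return b;
    }

    // Rejects a null bit set or one naming processors beyond those
    // returned from availableProcessors(), as the text requires.
    static void validateAffinity(BitSet requested) {
        BitSet available = availableProcessors();
        if (requested == null
                || requested.length() > available.length()) {
            throw new IllegalArgumentException("bad affinity bit set");
        }
        // A full implementation would additionally throw
        // ProcessorAffinityException if a bit set in `requested` is
        // clear in `available` (processor currently unavailable).
    }

    public static void main(String[] args) {
        BitSet ok = new BitSet();
        ok.set(1);
        validateAffinity(ok); // accepted

        BitSet tooBig = new BitSet();
        tooBig.set(9); // processor 9 does not exist on this system
        try {
            validateAffinity(tooBig);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```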