
A Hybrid Approach to Software Interworking Problems:

Managing Interactions between Legacy and Evolving Telecommunications Software

Muffy Calder

Department of Computing Science

University of Glasgow

muffy@dcs.gla.ac.uk

Evan Magill

Communications Division

Department of Electronic and Electrical Engineering

University of Strathclyde

e.magill@comms.eee.strath.ac.uk

Dave Marples

CITEL Technologies Limited

Wheatcroft Business Park, Landmere Lane

Edwalton, Nottingham

dmarples@iee.org

Abstract

Interworking problems between software services arise for a number of reasons; they may occur because the services, or their component parts, have evolved to fulfill different roles from the originally intended ones, resulting in conflicting requirements. Alternatively, the services themselves may be undocumented, poorly understood, or required to interwork with services from third party legacy systems. Interworking problems are difficult to predict and detect, as well as to resolve in an acceptable manner. The problems are particularly acute in the telecommunications domain with its supplementary concerns of real-time, distributed control and data, high reliability, rapid evolution, and a deregulated market that is encouraging multiple service providers.

Approaches to interworking problems may be characterised as being either off-line or on-line, and as either formally based or pragmatically/experimentally based. While numerous approaches have been developed, there have been very few attempts to combine both formally based and on-line techniques. Experience with the other combinations has led us to believe that they are not sufficient to deal with the interworking problems of complex, evolving software systems, as are common in telecommunications. This is particularly the case for systems which also have to interwork with third party and legacy code: a hybrid approach which combines both on-line and formally based approaches promises to address problems which have proven very difficult to resolve with other techniques.

In this paper we outline a hybrid approach based on a transactional technique with rollback capability. Our long-term research programme is to integrate this with formal approaches; this is still at a preliminary stage. While the approach we describe is applied specifically to telecommunications services software, many aspects of our approach are applicable in other software domains.

1. Introduction

Any system based on the concept of providing multiple, concurrent, software applications or services has the potential to suffer a range of interworking problems between those services. In particular, aspects of the requirements of various services may conflict with each other, e.g. service goals, user views and expectations, and non-functional aspects such as billing or quality of service may all interact. Technical aspects of the implementations and the peripheral devices may be another source of interworking problems. Managing these conflicts, detecting them and resolving them in a coherent way, is a major challenge when designing a service-based system. Developers aim to handle, or at least anticipate, most interworking problems at the design stages but this is not always possible for several reasons. The problems are further exacerbated when the software systems involved are evolving and/or have to interface with third party or legacy systems. As the system evolves, assumptions about goals, views and semantics may become invalid, thus leading to inconsistencies and conflicts which were not originally present. Further, if third party or legacy software is involved, then assumptions are often made. In some cases these may even be naïve and incorrect.

When conflicts between services are identified it is usually extremely difficult, for logistical or political reasons, to resolve them by modifying any third party or legacy component. For this reason techniques are required which deal with interworking problems both at design time and as systems evolve. These techniques should not require modifications to the existing services.

All of these issues are particularly acute in the telecommunications domain, where the provision of services is a major growth industry and the players compete largely on the software functionality of their systems. As more and more services are developed in software, and a deregulated market encourages multiple service providers, the potential for undesirable interactions between two or more services is growing exponentially. While such interactions are notoriously difficult to manage, the associated problems are further exacerbated by the additional constraints imposed by legacy. Moreover, while telecommunications software does not exhibit unique characteristics in comparison with other domains, it does exhibit many problematic characteristics: large size, real-time operation, distributed control, distributed data, high reliability, and rapid change.

In the next section we give a brief overview of the background: telecommunications services and features, legacy systems, feature interactions, and off-line and on-line techniques for detecting and resolving interactions. Section 3 contains an overview of a transactional, on-line approach, and section 4 describes a hybrid approach based on this transactional approach. The subsequent section outlines some of the challenges and open problems associated with the proposed hybrid approach. The final section contains our conclusions.

2. Background

2.1. Telecommunications Services and Features

The telecommunications services we are concerned with are the (software) programmes that control the progress of calls and connections. The software responds to telephony events, such as off-hook and digit entry, with signals to control telephones, such as the delivery of ringing current or dial tone. In telephony terms, the basic service is the simple telephone call and this service is often referred to as POTS (Plain Old Telephone Service). Features and Services are increments to this service giving the user extra functionality. Common examples of features are Call Waiting, Outgoing Call Screening, Call Forwarding, and Ringback When Free. While a service is self-sustaining, a feature is not and must be built upon a service. Of course the service may not be simple telephony. In a broadband network, a service may be a videoconference, with a feature adding extra capability such as the power of veto for the conference chair. Even in a simple telephony environment Automatic Call Distribution (ACD) and Hotline would both be good examples of alternative base functionality to POTS.

In Public Switched Telephone Networks, or PSTNs, POTS and its associated features have been a large software development exercise running into millions of lines of code, which is now largely stable but to which new functionality is continually being added. The result is a demanding environment where not only are the programs very large, but they are also real-time, subject to rapid change, and where an extremely high degree of reliability is required. Traditionally, this call processing code runs on each telephone exchange, where a large network consists of hundreds of exchanges. The exchanges are referred to as Stored Program Control (SPC) exchanges. In a large network, introducing a new feature may require changing the software on every SPC exchange. This is a fraught process for an individual exchange, and a slow and expensive process for a network.

The Intelligent Network, or IN, has been designed to overcome these limitations. Features are implemented as discrete components called Service Logic Programs, or SLPs. Although the term service is used, SLPs alter the behaviour of a basic call and so we view them as features. Any updates to the overall system functionality are achieved by loading the increment, or SLP, not by modifying the (usually legacy) software implementing the basic service.

In private/enterprise systems, switching systems are increasingly sold on the basis of the features and services that they provide, the hardware providing largely the same functionality between manufacturers. Customers require specific telecommunications functionality which is tailored to the way their business operates, and suppliers increasingly meet these requirements by connecting the switch to some external processing capability which then influences the control of calls. Mechanisms for doing this are now offered by all major switching manufacturers in some form or another and these are used by software houses to offer products to support feature creation. Microsoft's TAPI, Novell's TSAPI and IBM's CallPath are probably the best known. The move to bespoke functionality can reasonably be expected to continue and even accelerate over the next few years, and indeed an entire industry has sprung up to support it.

In the PSTNs, vendors often employ thousands of software designers over numerous sites and indeed numerous countries. The features and SLPs are often developed separately, with limited co-operation between the groups. In practice, these features are often incompatible: one feature interferes with the operation of another. Historically, these difficulties, often found by system testers, were christened Feature Interactions and they have been observed in many environments, from private systems (Private Automatic Branch Exchanges or PABX) through the SPC POTS environment to the latest IN [CL97]. It is predicted that future broadband services will also experience interaction difficulties [TMK97] and there are very few reasons to believe that they will not. While the latter is strictly service interaction, feature interaction and service interaction are often used synonymously, and we shall do so here.

Although the difficulties caused by Feature Interaction have been known for decades, in a closed, homogeneous environment with limited features, particularly in the public domain, the problem has not been critical. This has been helped by the tight approval regimes in place around the world. Changes in the commercial, technological and regulatory structure of telecommunications are creating situations where this tight control is no longer possible. Multiple service providers are provisioning individual networks; the number of public service providers is increasing and their networks are being connected together to form heterogeneous inter-networks which contain features from multiple suppliers that have never been tested together but which need to work in concert. Private systems also form part of this "super-network" as the connection between public and private systems becomes capable of carrying more information and the demarcation between them starts to blur.

2.2. The problems of Legacy Systems

The genesis of the Stored Program Control (SPC) telecommunications switch was a haphazard affair, starting as an attempt to address the spiraling costs of automation that manufacturers were suffering in producing increasingly complex mechanical exchanges. The telecommunications industry was one of the first adopters of embedded software systems in an attempt to control costs and offer benefits to users. Products were developed while design techniques to help were developed in parallel.

Object orientation, modularisation and any number of other tools were simply not available at the time that many of the switches now in the field were conceived. The software in these systems has been changed and developed on a continuous basis over the last fifteen to twenty five years as new requirements have been introduced in a highly competitive market. Consequently, current switching systems contain a great deal of closely interwoven, complex code with a high degree of inter-module coupling. The functionality of these systems is orders of magnitude greater than was originally intended and they are increasingly difficult to maintain and enhance.

Although manufacturers have proactive programmes to review their existing software, some of these systems are extremely fragile and it is frequently difficult to improve software without completely rewriting it. It is virtually impossible in many switching systems to remove a feature from the software due to its close interaction with other features -- it is much easier to just disable it and leave it dormant. Such switching systems, which for reasons of economic necessity need to be integrated with new product developments, have been termed Legacy Systems. They present a very particular set of constraints and changes to the functionality of these existing products must be seen as developments in the marketplace - there is no point in rewriting switch software unless it brings some definite benefit to the user. This situation should not be considered a criticism of the industry; it is simply an inevitable consequence of the parallel development of Software Engineering techniques and the software itself. Indeed, it is fair to say that the telecommunications industry has been one of the main proponents of the development of advanced software engineering techniques and many of the tools which we now take for granted owe their existence to the funding and impetus provided by these organisations.

2.3. Feature Interactions

Feature interactions may involve any number of features, users and/or network components. We define them to be the modification of the operation of any feature that can be attributed to the presence of another feature within the operational environment. They impact on all phases of the software lifecycle and can be categorised in a number of ways: logical, network or implementation [CGLNSV94, AG95]. Another dimension to consider is the number of users and network components involved; for example, it is possible to identify Single User Single Component (SUSC) or Multiple User Multiple Component (MUMC) interactions. Common sources of interaction are conflicting requirements, timing problems, limited signal capabilities, and features residing in different parts of the network. At some level of abstraction, an interaction may be regarded as the consequence of mistaken assumptions: either functional behaviour, environment, or implementation choices.

We do not attempt to enumerate causes or characterisations of interactions exhaustively here, but give two typical examples.

An example involving just one user is provided by the interaction between Call Waiting and Call Forwarding When Busy. Suppose a customer has invoked both features and is engaged in a call. Another call arrives; should that second call be forwarded or will the customer hear the call waiting alert tone? Note that the question only arises because of an emergent behaviour, that is, a behaviour which has emerged because of the combination of both features; there is no problem when each feature is present on its own. Because of this emergent behaviour, the answer cannot be found by examining the intent behind each of the features in isolation.

Another example concerns users A and B who have invoked Call Waiting and Ringback When Free, respectively. Suppose that B calls A, when A is engaged in a call to C. Because of Call Waiting, A appears to be idle to B; so, will B's Ringback When Free work as expected? Many more confusing scenarios are possible from just one feature: for example, what happens in a chain of users invoking a feature like Call Waiting when some users go on-hook while others are on hold? We also note that interactions may provide useful new functionality (e.g. a 999/112 (122 for mobile networks) call cannot be terminated by the caller).

Of course not every interaction is undesirable. For example, the interactions between most features and POTS are desirable; we want the system to behave in a different way when new features are appended. While this is a rather simplistic example, it is important to note that interactions are not necessarily bad and need not have a negative or restrictive resolution.

It is instructive to examine how interactions have been managed in these current software switching systems as they were developed. Until the publication of the landmark feature interaction benchmark paper by Bellcore, as an internal memorandum in [BCDGHL88], feature interaction was never really identified as a separate subject; interactions were considered as just one more type of bug in a system. Due to this perception of the problem as being a class of bugs, coupled with the fact that documentation procedures were not exactly airtight, most interactions were resolved in an ad-hoc fashion, with very little in the way of formal documentation or traceable procedures to support the process. The designer, implementer and tester would all uncover interactions between features and these would be resolved during the early life cycle of new code. Manufacturers tacitly understood that this process occurred and they scheduled long ‘integration’ phases prior to formal product testing, during which the system testers would uncover the interactions between features created by different engineers. Effectively, interaction resolutions were embedded in the code and would be undocumented. Even today, ask a manufacturer what happens when Feature A and Feature B are present and active on a call and there is a very good chance that the only way they will be able to tell you is by creating the test case on a model switch!

The regulatory and commercial changes described earlier mean that the old system development techniques with their long testing cycles and proving phases can no longer be applied; new means of arbitrating between features need to be found. State of the art techniques fall into two camps: off-line and on-line, as described in the following two sections.

2.4. Off-Line Feature Interaction Detection and Resolution

Typically, but not exclusively, off-line techniques are primarily concerned with interactions which may be regarded as logical interactions, i.e. those with conflicting requirements or goals. Formal, abstract models, with automated tool support, are particularly helpful here. They are used for interaction detection through simulation, property analysis, and test scenario generation. Implementation and modelling are often contemporaneous, and given the current commercial climate, the former may even precede the latter.

Numerous approaches have been developed (e.g. see [FIW97, FIW98]); the majority are based around the key concept of a model of features/services and properties which those features/services should and should not fulfill. A variety of formalisms have been used for specifying both models and abstract properties, including finite state machines and their extensions (e.g. SDL), labelled transition systems, Petri nets, process algebras, state based notations (e.g. Z, Object-Z), message sequence charts, and classical, temporal and non-monotonic logics [AA97, AC97, CP95, LL95, Vel94, CM98, Tho97]. Approaches differ in the degrees of rigour, mathematical framework, and automated reasoning assistance, but most can be summarised by Figure 1.

[Figure 1 (diagram): Features/Services (requirements, intentions and assumptions) motivate Properties (in formal logic or a high-level language). Features/Services (in a high level modelling language) denote an Underlying model (e.g. communicating finite state automata). The model and the properties are linked by a "satisfy" relation.]

Figure 1: Feature Interaction Detection Framework

While some approaches to (off-line) feature interaction detection depend on generic definitions of interaction, for example, as logical inconsistency, or non-determinism arising from overlapping pre-conditions for processing an event, the "classical" approach to detecting feature interactions can be expressed as follows. Let P₁ be a property, F₁ and F₂ be features, and S₁ be a service. If P₁ is satisfied by S₁ with F₁, then P₁ should also be satisfied by S₁ with F₁ and F₂; i.e. the properties of F₁ should be preserved in the presence of other features. If it is not, then there is an interaction. The property may be either domain specific, e.g. if a user attempts to connect to another user who is already engaged in a call, then the former will hear an engaged tone; or it may be generic, e.g. the system will not deadlock or livelock, or all states are reachable.
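The classical check can be illustrated with a small Python sketch. This is a toy, not a formal model: the function names, the feature abbreviations (RBWF for Ringback When Free, CW for Call Waiting, CFNR for Call Forward No Reply) and the one-line "model" of what a caller hears are all assumptions made for illustration. The property used is the engaged-tone example given above.

```python
from typing import Callable, FrozenSet

Features = FrozenSet[str]


def tone_heard_when_callee_busy(features: Features) -> str:
    """Toy model of POTS plus optional features: the tone a caller hears
    when the callee is already engaged in a call (illustrative only)."""
    if "CW" in features:
        # Call Waiting alerts the callee, so the caller hears ringing.
        return "ringing"
    return "engaged"


def engaged_tone_property(features: Features) -> bool:
    """P1: a caller who reaches a busy user hears the engaged tone."""
    return tone_heard_when_callee_busy(features) == "engaged"


def interacts(prop: Callable[[Features], bool], f1: str, f2: str) -> bool:
    """Classical check: P holds for S with F1 but fails for S with F1 and F2."""
    return prop(frozenset({f1})) and not prop(frozenset({f1, f2}))


print(interacts(engaged_tone_property, "RBWF", "CW"))    # True
print(interacts(engaged_tone_property, "RBWF", "CFNR"))  # False
```

In a real setting the model would be a formal specification and the property a logical formula checked by a tool; the sketch only shows the shape of the comparison between "S₁ with F₁" and "S₁ with F₁ and F₂".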

This rather elegant formulation of interaction detection hides a multitude of difficulties.

First, the level of abstraction, both for the model and the properties, is absolutely crucial. An inappropriate level, particularly for the model, can render the results of the analysis unusable. Most approaches are based upon some concept of a network subscriber's view, or perception, of the service, but this necessitates agreement about service intentions and, to some extent, user intentions when employing a feature/service. An IN approach, with the notion of ‘planes’ of functionality (e.g. global functional plane/distributed functional plane), at least provides some guidance about overall architectural concerns. Abstraction and/or completeness is also crucial at the property level. For example, complete descriptions of some properties give rise to the frame problem (e.g. when describing the effects of a feature, do we also need to describe the components and attributes which are not affected?).

Second, there is no single model to consider; we need to be concerned with particular scenarios, or configurations of network subscribers. For example, one scenario consists of one subscriber with Call Waiting, and another with Ringback When Free. But with just n features, and two network subscribers, there are O(n²) configurations. Of course this is further exacerbated by the distinction of originating and terminating call behaviour. Moreover, two subscribers are clearly not sufficient to reveal all interactions; for example, the full impact of Call Waiting is only realised when there are two call legs, or sessions, between at least three subscribers. Meta level reasoning about configurations can help to overcome the combinatorial explosion, if one can identify and filter out configurations which are certain not to interact. For example, the EURESCOM project P509 [Kim97] incorporates a "filtering" stage, where pairs of features are analysed for the potential to interact (e.g. they have similar pre-conditions and event triggers); those pairs without such a potential are discarded from further consideration.

Third, how does one combine a service with one or more features? Many features require a change to the underlying service, in order to be implemented, or even modelled. How are these changes incorporated into the model and the properties? Is the combining operation compositional, i.e. does it matter if F₁ is added before F₂, or vice-versa?
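The growth in configurations, and the effect of a filtering stage, can be sketched as follows. The trigger sets below are hypothetical and the shared-trigger test is only a crude stand-in for the kind of pre-condition and trigger analysis mentioned above; it is not the P509 filter itself.

```python
from itertools import product

# Hypothetical trigger events per feature (illustrative only).
TRIGGERS = {
    "CW": {"incoming_call_when_busy"},
    "CFWB": {"incoming_call_when_busy"},
    "CFNR": {"no_reply_timeout"},
    "OCS": {"digits_dialled"},
}


def configurations(features):
    """All assignments of one feature to each of two subscribers."""
    return list(product(features, repeat=2))


def may_interact(f1, f2):
    """Crude filter: keep a pair only if the features share a trigger."""
    return bool(TRIGGERS[f1] & TRIGGERS[f2])


all_pairs = configurations(sorted(TRIGGERS))
candidates = [(a, b) for a, b in all_pairs if may_interact(a, b)]

print(len(all_pairs))   # n * n configurations for n = 4 features
print(len(candidates))  # configurations left after filtering
```

With four features there are 16 two-subscriber configurations; the filter keeps only the pairs that share a trigger (including a feature paired with itself), so the later, more expensive analysis runs on a smaller set.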

Fourth, when the models are based on event driven state transitions, as is quite natural for this domain, the underlying state spaces quite quickly explode. One reason for this is that it is the product spaces which must be explored, depending on the number of subscribers involved. While recent advances in automated reasoning tools, particularly with respect to model checking, are promising, nearly all such approaches are compromised, in one way or another.

At Glasgow we have been developing formal models of network subscribers, as processes which are modelled by behaviour trees, which communicate and synchronise (see [Tho97], [CM98]). We have employed LOTOS and Promela [LOTOS, Hol93, Hol95] for defining the high-level processes, LOTOS, the μ-calculus and linear temporal logic [Sti91] for expressing domain specific properties, and CADP, LOLA and Spin [Gar96] for automated reasoning (simulation, theorem proving and model checking). Possibly unique to our approach is an explicit theory of features, in the high level presentation, which guides the exploration of the trees. The theory of features includes precedences – ordering relations between features which depend on call states. There are two types of ordering: intra-user orderings and inter-user orderings, both of which depend on the state(s) of the user processes involved. The former define priorities between features for a given user whereas the latter define priorities between features involving different users. For example, if a user is behaving in both a Call Forward mode and a Call Forward When Busy mode, then the intra-user ordering, for the idle state, will determine which feature has precedence when the user is called. Figure 2 below shows a portion of an example intra-user precedence. Call Forward has precedence over the other diversions, e.g. Call Forward When Busy (CFWB) and Call Forward No Reply (CFNR). Call Waiting (CW) has precedence over divert busy (CFWB) for the idle state, but the precedence is reversed for the speech state. This means that during an on-going two-party call, the first new incoming call will activate the call-waiting alert signal, whereas the second incoming call will be diverted to another user. With these feature precedences, it is possible to offer both Call Waiting and a Forwarding feature within the same service.

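[Figure 2 (diagram): for all states, CF has precedence over CFNR and CFWB; for the idle state, CW has precedence over CFWB; for the speech state, CFWB has precedence over CW.]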

Figure 2: Example Feature Precedences

Feature precedences are crucial, as they indirectly control behaviour of the call processes. The relations need not be total, or have maximal elements, so the model may not be deterministic (and the result may be an interaction).
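A state-dependent precedence relation of this kind can be sketched in Python. The encoding as sets of (higher, lower) pairs and the function names are assumptions made for illustration; the pairs themselves follow the example precedences of Figure 2. When no single active feature dominates, the sketch returns None, which corresponds to the non-deterministic case described above.

```python
# Precedence pairs (higher, lower) per call state, following Figure 2.
ALL_STATES = {("CF", "CFNR"), ("CF", "CFWB")}
PRECEDENCE = {
    "idle": ALL_STATES | {("CW", "CFWB")},
    "speech": ALL_STATES | {("CFWB", "CW")},
}


def maximal(active, state):
    """Active features that no other active feature has precedence over."""
    order = PRECEDENCE[state]
    return {
        f for f in active
        if not any((g, f) in order for g in active if g != f)
    }


def select(active, state):
    """Return the chosen feature, or None when the choice is ambiguous."""
    top = maximal(active, state)
    return next(iter(top)) if len(top) == 1 else None


print(select({"CW", "CFWB"}, "idle"))    # CW
print(select({"CW", "CFWB"}, "speech"))  # CFWB
print(select({"CW", "CFNR"}, "idle"))    # None: the relation is not total
```

The last call shows why a partial relation matters: CW and CFNR are unordered in this example, so neither takes precedence and the model is not deterministic for that pair.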

This approach permits a broad range of analysis including verifying properties, validating tests, simulating or animating particular scenarios, and static analysis techniques. The generic properties are those such as non-determinism, deadlock, and unreachable states, at the model level, and satisfiability, completeness and consistency at the property level. Unexpected satisfaction (or otherwise) of these properties usually indicates an interaction. The specific properties refer to the characteristics of the application domain and expected feature behaviour; failure to satisfy such a property, in the presence of another feature, as described above, indicates an interaction. The static analysis includes analysis of the high level presentation of the model. Here, we inspect choices between guarded expressions where the guards refer to the current modes. Because of the precedence relations, these guards may overlap (i.e. there may be more than one solution). Overlapping guards imply nondeterminism which in turn may indicate an undesirable interaction.

Systematic feature interaction resolution techniques (off-line) have not, in general, received as much attention as detection techniques. To some extent this is understandable because when an interaction is detected at design time, the design can be modified (in an ad hoc way, if necessary) so that the interaction is "designed away". Many of the approaches which do aim to resolve interactions are based on the concept of precedences or priorities between features. Indeed, in our (off-line) approach, when an interaction is detected, it is resolved by modifying the features theory; i.e. the precedences. So, in effect, the result of our analysis is a new feature theory. Other approaches involve additional "supervisory" processes, such as [CLL97] which employs supervisory control theory, or meta-level processes which remove conflicting states [Kho97]. While these are technically off-line approaches, they might also be seen as providing the formal under-pinning of an on-line approach.
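The static check for overlapping guards can be sketched by brute force over a small set of boolean modes. The guard names, mode names and predicates below are hypothetical; a real analysis would work on the guarded expressions of the high level presentation, not on Python lambdas.

```python
from itertools import combinations, product

# Hypothetical guarded choices: each guard is a predicate over the
# user's current modes (the names are illustrative only).
GUARDS = {
    "forward_call": lambda m: m["busy"] and m["cfwb"],
    "alert_call_waiting": lambda m: m["busy"] and m["cw"],
    "ring_phone": lambda m: not m["busy"],
}
MODES = ("busy", "cfwb", "cw")


def overlapping_guards():
    """Return (branch_a, branch_b, modes) wherever two guards both hold."""
    overlaps = []
    for values in product([False, True], repeat=len(MODES)):
        modes = dict(zip(MODES, values))
        enabled = [name for name, guard in GUARDS.items() if guard(modes)]
        for a, b in combinations(enabled, 2):
            overlaps.append((a, b, modes))
    return overlaps


for a, b, modes in overlapping_guards():
    print(a, b, modes)
```

Here the only overlap arises when the user is busy with both CFWB and CW active, which is the single-user interaction example from Section 2.3.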

2.5. On-Line Detection and Resolution

It is clear that off-line techniques on their own are insufficient. Firstly, they cannot handle the number of potential cases. Secondly, subject to deregulation, feature descriptions will not always be available for inspection.

So, a number of non-adaptive on-line approaches have been proposed. By non-adaptive, we mean that a priori knowledge of the features, interactions and their resolution is required. This knowledge takes the form of pre-defined tables [Cai92], state transition rules [KO95], Abstract Data Types [Mak95], defined roles located between features and terminals [Fri95], and rules used in user agents within a specialised architecture [ZWOSW95]. Most of these approaches require knowledge on an ‘individual basis’ as distinct from combinations, the latter suffering from combinatorial complexity (e.g. as in [Cai92]).

In the shorter term, while the number of features is relatively small, non-adaptive approaches offer a pragmatic solution. However, for the longer term they are no longer tenable; techniques that adapt to unknown new features are required.

One approach advocated by Bellcore [GV94] is an architecture with software user agents representing users. The agents apply AI negotiation techniques to resolve perceived conflicts between features; the architecture also incorporates an arbitrator agent to settle issues that cannot be agreed by the user agents. A major characteristic of this work is the adaptive nature of their architecture; it aims to handle new and previously unknown interactions.

Two areas of work at Strathclyde have also advocated adaptive solutions. Firstly, Tsang et al. [TM94, TM98] have developed a technique which can both automatically detect and resolve feature interactions between any number of services with any number of subscribers. Uniquely, the project has taken into consideration associated issues such as service provider privacy, distributed processing, service delay, and the explosion of services to provide an approach which is both general and scalable. The ITU-T CS-1 distributed functional plane standard (Q.1214) was the IN model adopted for the project and this provided the location of the Feature Interaction Manager and allowed information flows. The new approach was based around the use of behaviour modelling in which the Feature Interaction Managers first obtained the behaviour signatures of services through a ‘learning’ phase. Once obtained, these signatures were used to detect or predict and then resolve feature interactions; concepts borrowed from distributed operating systems were used in the resolution strategies. A testbed system was constructed to test the new approach, with encouraging results [MTMS95].

Secondly, Marples et al. [MMS95, MTMS95, MM98] have developed an environment in which features can be independently written and tested, the intent being to develop a technique that is suitable for features where knowledge about the internal operation of the feature is not available. We believe this to be the first approach equally applicable to legacy systems and to new feature sets. The idea is that when multiple features are brought together, interactions between them can be observed, and techniques have been developed for the automatic detection of these interactions. An operational system is then implemented using tables containing suitable resolutions for any interaction detected. In the absence of a suitable resolution in the table, a debug output is generated which allows a suitable resolution to be calculated and added (by a human operator).
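The table-driven resolution step, with a debug record when no resolution is known, might look like the following sketch. The table contents, the logger name and the function name are assumptions for illustration and do not describe the actual implementation.

```python
import logging

logger = logging.getLogger("interaction_manager")

# Hypothetical resolution table: detected interaction -> winning feature.
RESOLUTIONS = {
    frozenset({"CW", "CFWB"}): "CW",
}


def resolve(feature_a, feature_b):
    """Look up a resolution; emit a debug record when none is known."""
    key = frozenset({feature_a, feature_b})
    winner = RESOLUTIONS.get(key)
    if winner is None:
        # A human operator can use this record to add a table entry later.
        logger.debug("no resolution for interaction %s", sorted(key))
    return winner


print(resolve("CFWB", "CW"))   # CW
print(resolve("CW", "RBWF"))   # None: logged for the operator
```

Once the operator has decided on a resolution, adding it is a single table update, e.g. `RESOLUTIONS[frozenset({"CW", "RBWF"})] = "RBWF"`, after which the same lookup succeeds.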

The key concept of this on-line approach is that it encapsulates each feature in a ‘cocoon’, which gives it transactional semantics; this transactional approach is considered in more detail in the next section.

3. Dealing with Legacy: A Transactional Approach

A pre-existing Legacy System is considered as a Black Box, with inputs and outputs as shown in Figure 3.

Figure 3: Considering the Legacy Software System as a Black Box

When a stimulus is provided to the system there will be a set of responses to it, with the legacy system returning to a quiescent state when these responses have been generated. The ability to treat the legacy system as a black box is, of itself, of limited utility. So, a manager entity is introduced, responsible for sending the stimuli to the legacy system and receiving the responses back. The manager makes it possible to modify the externally perceived actions of the system. In the degenerate case the manager provides a null function and simply passes the stimuli and responses through transparently, so the operational semantics of the system are maintained. This arrangement is shown in Figure 4.

Figure 4 : Insertion of a manager entity into the system

The manager is free to perform whatever manipulations it requires in relation to the stimuli and responses; it can block or modify stimuli before they are forwarded and it can block responses before they are returned to the hardware. In this sense, the manager can modify the operation of the legacy system without actually having to modify any of the code within it.

There are two drawbacks to this simple approach. First, it is quite possible that state models representing the hardware are being maintained by the Legacy System and are updated via the stimuli. If these stimuli are blocked or manipulated, the state as represented by the model may become out of step with the state of the real physical system and problems will develop. A simple example is the case where the off-hook signal from a terminal is blocked by the manager, and so the legacy system never updates its internal model to indicate that the terminal is now receiving dialtone. As a result, the legacy system would still attempt to present an incoming call to a terminal that was off-hook. Second, it is very difficult to create the appropriate algorithms in the manager to manipulate the stimuli and responses to achieve new functionality. It is at best a little dangerous due to the limited knowledge about what is really happening in the legacy system, and it is doubtful that such an approach would yield a system of sufficient quality for production use.
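The manager arrangement, and the first drawback, can be sketched as follows. This is a toy illustration only: the LegacySystem class, its stimulus format and the filter hooks are assumptions, since the real legacy system is by definition a black box.

```python
class LegacySystem:
    """Hypothetical stand-in for the black box, with an internal state model."""

    def __init__(self):
        self.off_hook = set()  # terminals the legacy system believes are off-hook

    def process(self, stimulus):
        kind, terminal = stimulus
        if kind == "off_hook":
            self.off_hook.add(terminal)
            return [("dialtone", terminal)]
        if kind == "on_hook":
            self.off_hook.discard(terminal)
        return []


class Manager:
    """Sits between the hardware and the legacy system."""

    def __init__(self, legacy, stimulus_filter=None, response_filter=None):
        self.legacy = legacy
        # With no filters supplied this is the degenerate (null) manager.
        self.stimulus_filter = stimulus_filter or (lambda s: s)
        self.response_filter = response_filter or (lambda r: r)

    def handle(self, stimulus):
        stimulus = self.stimulus_filter(stimulus)
        if stimulus is None:  # the stimulus was blocked
            return []
        responses = (self.response_filter(r) for r in self.legacy.process(stimulus))
        return [r for r in responses if r is not None]


# Null manager: stimuli and responses pass through transparently.
transparent = Manager(LegacySystem())
passed = transparent.handle(("off_hook", "A"))

# Blocking manager: the off-hook signal never reaches the legacy system,
# so its state model drifts from the physical terminal, which is off-hook.
blocking = Manager(LegacySystem(), stimulus_filter=lambda s: None if s[0] == "off_hook" else s)
blocked = blocking.handle(("off_hook", "A"))
```

After the blocked stimulus, the legacy system's model still records terminal A as on-hook, which is exactly the inconsistency described above.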

It would be much better if the manager could see what responses the legacy system would generate if it were to receive the stimulus. If the forecast response was not suitable then the state of the legacy system could be updated, perhaps by deprovisioning certain features or changing the state of a terminal prior to the stimulus being applied. For example, if the manager wanted to prevent a call from terminal A to terminal B and a stimulus arrived that would cause this to occur, the manager could set B off-hook, then apply the stimulus (resulting in A receiving busy-tone), then place B on-hook again. Thus, the integrity of the legacy system state is maintained while new functionality is appended.

It is no simple matter to forecast what the response of the legacy system would be to an event, since this would require the system itself, or an exact model of it. But, by using the idea of rollbacks from transaction processing, it is possible to create a system to do exactly this. Conceptually, when a stimulus arrives at the manager, it creates an exact copy of the legacy system, to which it passes the stimulus. The copy processes the stimulus and passes its response set back to the manager. If this response set is acceptable the manager deletes the original of the system and allows the copy to proceed as a replacement. This is the commit case. If the manager does not like the response set then it can simply delete the copy and start again, perhaps this time changing the state of the copy before passing it the stimulus. This is the abort case. Figure 5 shows the actions carried out for the abort and commit cases described above.
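The commit and abort cases can be sketched as below, using the example of preventing a call from terminal A to terminal B. The sketch assumes the legacy system can be copied in memory (here with copy.deepcopy); the LegacySwitch class, the stimulus tuples and the prepare/cleanup hooks are all hypothetical and stand in for whatever roll-back mechanism a real environment provides.

```python
import copy


class LegacySwitch:
    """Hypothetical toy black box; stimuli are (kind, source, destination)."""

    def __init__(self):
        self.off_hook = set()

    def process(self, stimulus):
        kind, src, dst = stimulus
        if kind == "call":
            if dst in self.off_hook:
                return [("busy_tone", src)]
            return [("ring", dst), ("ringback", src)]
        if kind == "off_hook":
            self.off_hook.add(src)
            return [("dialtone", src)]
        if kind == "on_hook":
            self.off_hook.discard(src)
        return []


class TransactionalManager:
    def __init__(self, legacy, acceptable):
        self.legacy = legacy
        self.acceptable = acceptable  # predicate over a response set

    def handle(self, stimulus, fallbacks=()):
        # First attempt: the unmodified copy. Later attempts: the copy's
        # state is changed before (and restored after) the stimulus.
        for prepare, cleanup in [(None, None), *fallbacks]:
            trial = copy.deepcopy(self.legacy)
            if prepare:
                prepare(trial)
            responses = trial.process(stimulus)
            if cleanup:
                cleanup(trial)
            if self.acceptable(responses):
                self.legacy = trial  # commit: the copy replaces the original
                return responses
            # abort: the copy is discarded and the original is untouched
        return []


def no_ring_to_b(responses):
    return ("ring", "B") not in responses


manager = TransactionalManager(LegacySwitch(), no_ring_to_b)
make_b_busy = (
    lambda s: s.process(("off_hook", "B", None)),  # set B off-hook
    lambda s: s.process(("on_hook", "B", None)),   # place B on-hook again
)
result = manager.handle(("call", "A", "B"), fallbacks=[make_b_busy])
```

The first trial is aborted because it would ring B; the second trial, run on a fresh copy with B temporarily off-hook, yields busy-tone for A and is committed with B back on-hook.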

Figure 5 : Commit and Abort cases

3.1 Exploiting the Rollback Capability

When a rollback capability is provided, it can be exploited to ‘explore’ potential states of the overall system without any requirement for these states to be committed to the hardware. By using the responses of features to generate new trigger events, it is possible to generate a graph of all possible posterior system states based on an initial event, i.e. to generate a behaviour tree. There will be a number of interaction resolution decisions to be taken during the construction of this tree, each represented by a branch of the tree. For example, if features F₁ and F₂ both offer a response to an event, we may consider at least 4 possible resolutions: offer both responses, in either order, or offer only one response. So, we generate a decision tree (which of course may contain further branching). The terminal nodes, or leaves of the branches, should be stable system states. Ideally, errant paths (for example, those through illegal states, those generated by trigger states, or infinite loops) can be pruned from the tree and do not require further consideration. Once this tree has been created, the best, or at least a best, route through it, i.e. the resolution of the interaction between each of the feature modules, needs to be determined. In the current system this is done by means of operator intervention, with the resolution selected for this particular case being cached into a store for future re-use.
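The enumeration of candidate resolutions and the construction of a small behaviour tree can be sketched as follows. The representation is an assumption made for illustration: a feature is modelled as a function from an event to an optional response, the last response of a resolution is taken as the next trigger event, and a depth bound stands in for proper loop detection.

```python
from itertools import permutations


def candidate_resolutions(responses):
    """All non-empty ordered selections of the offered responses."""
    found = []
    for size in range(1, len(responses) + 1):
        found.extend(permutations(responses, size))
    return found


def build_tree(event, features, is_legal, depth=3):
    """Return a nested dict mapping each resolution to its subtree."""
    offered = [r for r in (f(event) for f in features) if r is not None]
    if not offered or depth == 0:
        return {}  # leaf: no feature responds, or the depth bound is reached
    tree = {}
    for resolution in candidate_resolutions(offered):
        if not all(is_legal(r) for r in resolution):
            continue  # prune branches that pass through an illegal response
        # Simplification: the last response acts as the next trigger event.
        tree[resolution] = build_tree(resolution[-1], features, is_legal, depth - 1)
    return tree


# Two hypothetical features that both respond to the same event.
f1 = lambda event: "forward" if event == "incoming_call" else None
f2 = lambda event: "voicemail" if event == "incoming_call" else None

tree = build_tree("incoming_call", [f1, f2], is_legal=lambda r: True)
pruned = build_tree("incoming_call", [f1, f2], is_legal=lambda r: r != "voicemail")
```

With two responding features the root has four branches, matching the four resolutions listed above; marking one response as illegal leaves a single branch.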

There are several drawbacks with this approach; in particular, we are lacking techniques for pruning a behaviour tree, selecting a best path through a tree, and resolving interactions via tables. To overcome these limitations, we have proposed a hybrid approach.

4. Formally Based and On-Line: A Hybrid Approach

In essence, the drawbacks mentioned above are due to the fact that there isn't enough feature "knowledge" to make informed choices when an interaction is detected: that is, we do not "know" how to make choices about which paths to prune and which paths are best, in some well-defined sense. We do, however, have extensive experience of analysing behaviour trees within the off-line context. So, our proposed solution is to develop algorithms to prune and select branches which incorporate knowledge gained from an off-line analysis; that is, we propose a hybrid approach.

As an example, the knowledge from the off-line analysis will take the form of feature precedence relations, derived generic laws such as "a terminal device must never receive message x followed by message y, as that will lead to deadlock", and theories of maximal satisfaction of a set of features (for selecting a path in a tree). This knowledge will be incorporated into the pruning/selection algorithms and, vice versa, run-time experience with the hybrid system will drive further analysis and hence more knowledge. We foresee an iterative process: early implementation and evaluation will inform and validate (or invalidate) the off-line, theoretical aspects.
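One way the off-line knowledge might feed a pruning and selection algorithm is sketched below. Everything here is an assumption for illustration: a law is encoded as a forbidden pair of consecutive messages, a feature's satisfaction is a predicate over a path, and "maximal satisfaction" is read as satisfying the most features, with ties broken by precedence. The actual theories are a subject of the research programme.

```python
def violates_law(path, forbidden_pairs):
    """True if any forbidden message pair occurs consecutively in the path."""
    return any((a, b) in forbidden_pairs for a, b in zip(path, path[1:]))


def select_path(paths, forbidden_pairs, satisfies, precedence):
    """Prune law-violating paths, then pick a best remaining one.

    satisfies:  feature name -> predicate over a path
    precedence: feature names, highest precedence first
    """
    legal = [p for p in paths if not violates_law(p, forbidden_pairs)]
    if not legal:
        return None  # nothing survives pruning; fall back to the operator

    def score(path):
        flags = tuple(1 if satisfies[f](path) else 0 for f in precedence)
        # Most features satisfied first; ties go to higher precedence.
        return (sum(flags), flags)

    return max(legal, key=score)


laws = {("x", "y")}  # "never x followed by y"
satisfies = {
    "F1": lambda path: "ring" in path,
    "F2": lambda path: "forward" in path,
}
paths = [("forward", "x", "y", "ring"), ("forward",), ("ring",)]
best = select_path(paths, laws, satisfies, precedence=["F1", "F2"])
```

The first path would satisfy both features but is pruned by the law; of the two survivors, each satisfying one feature, the path favouring the higher-precedence feature F1 is chosen.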

Additionally, we will expect the system to handle additional functionality in the way of extra features. The architecture of the hybrid system is given below in Figure 6. Under the auspices of the SEBPC research programme, we will be developing and implementing this architecture with a live, legacy PABX switching system.

Figure 6 : Multiple Feature Modules together with Legacy software (figure labels: off-line analysis; feature precedences, laws, satisfaction relations)

5. Challenges and Open Questions

There are a number of important challenges and open questions which the research programme will address.

First, the level of abstraction of the call primitives and event triggers in the formal models is crucial. We believe that there is no one answer here; rather, a hierarchy of models is needed. Moreover, we will need to demonstrate meaningful relationships between these models, and between them and the implemented system. We have identified at least three quite distinct modelling levels: at the level of a call process, a network of user views, and the system with feature manager. Figure 7 gives examples of these, in a mixture of notations including finite state automata, message sequence charts, and (concurrent) process diagrams. At the far right of the figure, in the system with feature manager, the additional rectangles within the boxes denote the interfaces provided by the ‘cocoons’.

Figure 7: three modelling levels (left to right: call process, network of user views, system with feature manager)

These different levels of abstraction may also lead to a corresponding number of ‘cocoons’, each one encompassing the ones inside. The result then would be a layered model where the functionality of a layer is defined by each of the layers inside it. This has strong parallels with the legacy system encapsulation presented earlier, and may form the bond between the practical and theoretical aspects of our work. Moreover, we believe that there may be parallels with the idea of levels of competence and subsumption architectures, as applied in the field of robotic control by Brooks [Bro85]. (In the robotics context, a high level requirement may be obliged to override a lower level requirement; for example, "dispose of bomb" may override "do not get damaged".)

Second, we will develop a number of competing pruning and path selection algorithms, which take as inputs the off-line analysis results, as well as theoretical and empirical justification for them. AI techniques, or off-line meta-reasoning, may be employed here. One example of the latter is the analysis of the results of running the system over a period of time, particularly when resolution has been "hand-crafted". For example, one might conclude that since, in the last 20 similar situations, event x/feature y was given precedence over event w/feature z, this precedence should now be adopted globally. This also means that we would be introducing a feedback loop into the system.
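The feedback loop in the example above might be realised along the following lines. This is only a sketch: the class, the meaning of "similar situation" (reduced here to the same pair of contenders) and the rule of requiring an unbroken run of identical decisions are assumptions; the threshold of 20 is taken from the example.

```python
class PrecedenceLearner:
    """Hypothetical recorder of hand-crafted resolutions."""

    def __init__(self, threshold=20):
        self.threshold = threshold
        self.streak = {}   # contender pair -> (last winner, consecutive count)
        self.adopted = {}  # contender pair -> globally adopted winner

    def record(self, winner, loser):
        """Log one operator decision between two contenders."""
        pair = frozenset((winner, loser))
        last, count = self.streak.get(pair, (None, 0))
        count = count + 1 if last == winner else 1
        self.streak[pair] = (winner, count)
        if count >= self.threshold:
            self.adopted[pair] = winner  # feed the precedence back

    def resolve(self, a, b):
        """Return the adopted winner, or None to defer to the operator."""
        return self.adopted.get(frozenset((a, b)))


learner = PrecedenceLearner()
for _ in range(19):
    learner.record("x/y", "w/z")
before = learner.resolve("x/y", "w/z")  # still deferred to the operator
learner.record("x/y", "w/z")            # the 20th identical decision
after = learner.resolve("w/z", "x/y")   # now resolved automatically
```

A single contrary decision resets the run, so a precedence is only adopted when the operator has been consistent.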

Third, keeping a resolution in a table for re-use is imperfect since it is very difficult to know that the whole of the rest of the system is in the same state as it was when the resolution was originally generated. Without this information it is quite possible that a resolution will be incorrectly applied. At the present time it appears that information about the internal structure of the legacy system may be required to make this possible. While a hybrid approach does not necessarily help solve this problem, we may be able to employ off-line analysis in future.

Fourth, the transactional approach has so far been developed as a prototype simulation model. Migrating this to a live switch is a major undertaking and will require careful specification of the interfaces, enhancements to the environment to enable roll-back and commit behaviour, as well as the generation of the behaviour trees.

6. Conclusions

Any system based on the concept of providing multiple, concurrent, software applications, or services, has the potential to suffer a range of interworking problems. It is clear that in the telecommunications domain, neither a purely off-line nor a purely on-line approach is sufficient for resolving interactions between new services and legacy services, particularly in a multi-vendor market. The general, on-line approaches have proved ineffective because there isn't enough feature "knowledge" to make informed resolution choices when an interaction is detected; some knowledge about the interacting features is required in order to make appropriate choices. On the other hand, purely off-line approaches based on an understanding (whether formal or informal) of feature specifications simply aren't possible when legacy software or third party vendors are involved. As a solution, we have proposed a hybrid architecture which combines an on-line, transactional approach with an off-line formal analysis. The approach is perhaps unique in the field in that it addresses both legacy systems and a more creative use of formal analysis techniques, as suggested in [Cal98].

Much of the basic architecture and formal analysis techniques have been developed, and we have defined the requirements for the relationships between the off-line and on-line aspects. Our research programme involves developing effective theories and implementations of those relationships, within the context of the transactional architecture and live switching system, and addressing the challenges outlined above. We believe that this research is not just applicable to the telecommunications domain, but that the software engineering community, on the whole, will benefit from this experience.

References

[AA97] P. Au, J. Atlee. Evaluation of a State-Based Model of Feature Interaction. In [FIW97].

[AC97] I. Aggoun, P. Combes. Observers in the SCE and the SEE to Detect and Resolve Service Interactions. In [FIW97].

[AG95] A. Aho, N. Griffeth. Feature Interactions in the Global Information Infrastructure. In Proceedings of 3rd ACM Sigsoft Symp. on Foundations of Software Engineering. Software Engineering Notes, Vol. 20, No. 4, Oct. 1995.

[BCDGHL] T.F. Bowen, C. Chow, F.S. Dworak, N. Griffeth, G.E. Herman, Y. Lin. The Feature Interaction Problem in Telecommunications Systems. Bellcore Internal Memorandum, December 1988.

[Bro85] R. Brooks. A Robust Layered Control System for a Mobile Robot. MIT A.I. Memo 864, September 1985, under ARPA contracts N00014-80-C-0505 and N00014-82-K-0334.

[Cal98] M. Calder. What use are Formal Analysis and Design Methods to Telecommunications Services? Feature Interaction in Telecommunications and Software Systems, pp.10-31, IOS Press, Amsterdam, 1998.

[Cai92] Cain, M. Managing Run-Time Interactions Between Call-Processing Features. IEEE Communications Magazine, February 1992, pp. 44-50.

[CGLNSV94] E.J. Cameron, N.D. Griffeth, Y.J. Lin, M.E. Nilson, W.K. Shnure, and H. Velthuijsen. A feature interaction benchmark in IN and beyond. In [FIW94].

[CL98] E.J. Cameron, J. Lin. Feature Interactions in the New World. In [FIW98].

[CLL97] Y-L Chen, S. Lafortune, F. Lin. Resolving Feature Interactions Using Modular Supervisory Control with Priorities. In [FIW97].

[CM98] M. Calder, A. Miller. Analysing a Basic Call Protocol using PROMELA/XSPIN. To appear in DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1998.

[CP95] P. Combes, S. Pickin. Formalisation of a User View of Network and Services for Feature Interaction Detection. In [FIW94].

[FIW92] Proceedings of International Workshop on Feature Interactions in Telecommunications Systems II, St. Petersburg, U.S.A., IEEE Communications Society, 1992.

[FIW94] W. Bouma and H. Velthuijsen (eds.), Feature Interactions in Telecommunications Systems II, Proceedings of International Workshop, Amsterdam, IOS Press, 1994.

[FIW95] K.E. Cheng and T. Ohta (eds.), Feature Interactions in Telecommunications Systems III, Tokyo, IOS Press, 1995.

[FIW97] P. Dini, R. Boutaba, and L. Logrippo (eds.), Feature Interactions in Telecommunications Systems IV, Montreal, IOS Press, 1997.

[FIW98] K. Kimbler, L.G. Bouma (eds.), Feature Interactions in Telecommunications and Software Systems V, IOS Press, 1998.

[Fri95] N. Fritsche. Runtime Resolution of Feature Interactions in Architectures with Separated Call and Feature Control. pp. 43-64 in [FIW95].

[Gar96] H. Garavel. CAESAR Toolkit. Available from Hubert.Garavel@imag.fr.

[GV94] N.D. Griffeth and H. Velthuijsen. The Negotiating Agents Approach to Runtime Feature Interaction Resolution. In [FIW94].

[Hol93] G. Holzmann. Tutorial: Design and Validation of Protocols. Computer Networks and ISDN Systems, Volume 25, Number 9, pp. 981-1017, 1993.

[Hol95] G. Holzmann. Using SPIN. User manual available at http://www. Lucent Technologies 1995.

[KO95] Y. Kawarasaki, T. Ohta. A New Proposal for Feature Interaction Detection and Elimination. In [FIW95].

[Kou97] A. Khoumsi. Detection and Resolution of Interactions between Services of Telephone Networks. In [FIW97].

[LL95] F.J. Lin and Y-J. Lin. A Building Block Approach to Detecting and Resolving Feature Interactions. In [FIW94].

[LOTOS] Information Processing Systems -- Open Systems Interconnection -- LOTOS -- A Formal Description Technique Based on the Temporal Ordering of Observational Behaviour. International Organisation for Standardisation. 1988.

[Kim97] Addressing the Interaction Problem at the Enterprise Level. In [FIW97].

[Mak95] B. Makarevitch. Resolving Service Interactions by Service Components. In [FIW95].

[MMS95] D.J. Marples, E.H. Magill, and D.G. Smith. An infrastructure for Feature Interaction resolution in a multiple service environment - The application of Transaction Processing techniques to the Feature Interaction Problem. TINA'95 Telecommunications Information Network Architecture conference, Melbourne, Australia, February 1995.

[MM98] D.J. Marples, E.H. Magill. The Use of Rollback to Prevent Incorrect Operation of Features in Intelligent Network Based Systems. In [FIW98].

[MTMS95] Marples D.J., Tsang, S., Magill E.H., and Smith D.G. DESK: A flexible testbed for simulating telecommunications network services. In [FIW95].

[Sti91] C. Stirling. Modal and Temporal Logics. In Handbook of Logic in Computer Science, Oxford University Press, pp. 477-563. 1991.

[Tho97] M. Thomas. Modelling User Views of Telecommunications Services for Feature Interaction Detection and Resolution. In [FIW97].

[TM94] Tsang S, and Magill E.H. Detecting feature interactions in the intelligent network. In [FIW94].

[TM97] Tsang S, and Magill E.H. Behaviour Based Run-Time Feature Interaction Detection and Resolution Approaches for Intelligent Networks. In [FIW97].

[TM98] Tsang S, and Magill E.H. Learning to Detect and Avoid run-Time Feature Interaction Detection and Resolution Approaches in Intelligent Networks. In IEEE Transactions on Software Engineering. Volume 24, Number 10, pp. 818-830, October 1998.

[TMK97] S. Tsang, E.H. Magill, and B. Kelly. The Feature Interaction Problem in Networked Multimedia Services - present and future. In BT Technology Journal Vol. 15, No. 1, 1997.

[Vel95] H. Velthuijsen. Issues of Non-monotonicity in Feature Interaction Detection. In [FIW95].

[ZWOSW95] I. Zibman, C. Woolf, P. O’Reilly, L. Strickland, D. Willis and J. Visser. Minimizing Feature Interactions: An Architecture and Processing Model Approach. In [FIW95].