1. Data mining
NETMAP is one of a new series of tools for what is increasingly called "data mining" or "knowledge discovery". This is the extraction of implicit, previously unknown and potentially useful information from very large quantities of available information. Conventional methods of interrogating databases such as this Encyclopedia will discover only what the user knows is there through indexes and browsing. Increasingly, especially at the policy level, there is a need to respond to patterns of information which are as yet unsuspected and have not previously been detected.
In order to be able to handle very large amounts of information, NETMAP uses an apparently simple, but remarkably powerful method of analysis. It was originally developed by an Australian, John Galloway, as an alternative method of rapidly analyzing and representing complex corporate structures, both in terms of formal and informal communication pathways -- and presenting the results in an intuitively meaningful holistic manner. For this reason its facilities were tested using a complete database of 15,000 international organizations and their 80,000 relationships derived from the Yearbook of International Organizations, as well as a second database covering the 10,000 problems in this Encyclopedia and the 100,000 relations between them.
2. Circular displays
The program takes specified items of information from the databases and places them at positions around the circumference of a circle generated on the computer screen. The scale used depends on the number of nodes. These may be coded to distinguish different types of node. In one test the full set of problem relationship data was used. One option was to colour code according to problem section. The database is explored by using the program to draw in the relationships between the problems (or organizations) under different conditions. The relationships could also be colour coded according to type.
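The geometry of such a display can be illustrated with a minimal sketch. It is not NETMAP's own code: it assumes nodes are spaced at equal angles around the circle and that each relationship is drawn as a straight chord. The names `circle_positions` and `chords` are hypothetical.

```python
import math

def circle_positions(node_ids, radius=1.0):
    """Place nodes at equal angular intervals around a circle (assumed layout)."""
    n = len(node_ids)
    positions = {}
    for i, node in enumerate(node_ids):
        angle = 2 * math.pi * i / n
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

def chords(relations, positions):
    """Turn (source, target, type) relations into segments between node positions.

    Relations that mention a node absent from the circle are skipped.
    """
    return [
        (positions[a], positions[b], rel_type)
        for a, b, rel_type in relations
        if a in positions and b in positions
    ]

positions = circle_positions(["A", "B", "C", "D"])
segments = chords([("A", "C", "aggravates"), ("A", "Z", "related")], positions)
```

The relationship type carried in each segment could then be mapped to a colour by whatever drawing library is used; that step is left out here.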
The resulting displays are of very high quality, despite the extraordinary amount of detail. The use of colour coding can result in displays which are aesthetically both unusual and elegant -- a change from the sterility of conventional organization charts, pie charts and system diagrams. The aesthetic quality of the maps enables more information to be carried without overloading the viewer as is normally the case with technical diagrams.
3. Interactive exploration
The merit of NETMAP is that although any given display is relatively simple, and therefore inherently comprehensible, displays can be redrawn in rapid succession by modifying a variety of parameters in conventional pull-down menus. The range of problems included in the circle could be increased or decreased. All types of relationships could be drawn in or only those of a single specific type. If large quantities of information were packed into a given display, the details could be explored by zooming in on any particular part of it. Particular problems could be highlighted and the display could, if required, be redrawn with only those items included.
The ability to analyze the patterns embedded in the data is considerably enhanced by applying a clustering algorithm to the data. In this way clusters of highly interconnected problems could be extracted and displayed as satellites of the central circular display.
A display could also be drawn excluding all problems (or organizations) which did not have more than 5 (or 10) relationships to some others.
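A filter of this kind might be sketched as follows. The sketch assumes that a "relationship" is counted once per distinct partner, regardless of direction or type, and that the threshold is applied in a single pass to the unfiltered data. The function name `filter_by_degree` is hypothetical.

```python
from collections import defaultdict

def filter_by_degree(relations, min_links):
    """Keep only nodes linked to more than `min_links` distinct other nodes.

    relations: iterable of (node_a, node_b) pairs, treated as undirected.
    Returns the set of kept nodes and the relations that run between them.
    """
    relations = list(relations)
    neighbours = defaultdict(set)
    for a, b in relations:
        neighbours[a].add(b)
        neighbours[b].add(a)
    keep = {node for node, linked in neighbours.items() if len(linked) > min_links}
    kept_relations = [(a, b) for a, b in relations if a in keep and b in keep]
    return keep, kept_relations

sample = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
kept_nodes, kept_links = filter_by_degree(sample, min_links=1)
```

A node that passes the threshold may lose some of its partners once the others are removed. Repeating the filter until nothing changes would give a stricter result.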
Although any given display can be printed, with or without colour coding, the merit of the tool lies in its interactive possibilities. It combines the image recognition skills of the person familiar with the data and the ability of the computer to reconfigure large quantities of data to highlight and explore such patterns. In this way NETMAP enhances the ability of the person familiar with a network of problems or organizations to clarify understanding about patterns in the information. The technique is extremely powerful when seeking proof for a hypothesis or in confirming a suspicion. Major corporations have used it to discover information critical to their operations and previously unknown to them even though present in their data. Police forces are now using it to locate "godfathers" controlling networks of fraudulent transactions involving banks, real estate agents and accounting firms, many of which were unaware of what their facilities were being used for.
4. Ease of implementation
The suppliers of NETMAP stress the ease with which it can take raw data from other databases in order to make analyses possible by users without any high-level computer training. This is confirmed by the ability to take a version of the data from this Encyclopedia and from the Yearbook and the speed with which it was adapted to the NETMAP package in order to produce significant results.
The main constraints on its more widespread use are the cost of the more sophisticated computer systems required to run NETMAP on any of the major UNIX platforms and the cost of the software itself. Minimum hardware required: 24 MB of RAM (up to 128 MB if rapid response on large databases is required), a 4-plane colour monitor, and 50 MB of disk plus whatever is required for the database (up to 1 gigabyte for the problems data from this Encyclopedia). Software required: X Window System (X11 Release 4), OSF/Motif 1.1, and NETMAP itself. A stripped-down version of NETMAP, costing about £1000, is however available for PC operation (16 MB RAM). The analysis is also available as a bureau service. The product is marketed worldwide by Software AG (Contact: Mungo Park, Software AG UK Ltd, Charter Court, St Albans, AL1 3XH, UK).
5. From organizations to problems
The following reflections can perhaps be best understood by reframing the data on "world problems" collected from a wide variety of international organizational sources as an approximation to the challenge of "world governance". Governance at the world level is currently a fashionable notion. NETMAP is most readily applied to a single complex corporation. Reframed in this way the data can be seen as a reflection of the condition of the "world as a corporation", or more conventionally as an "international community".
From this perspective "problems" as data nodes may then be seen as units within this corporation. These units "communicate" with each other in various ways, as suggested by the types of relationship. Thus:
- the "broader/narrower" relationships correspond to conventional hierarchical relations in institutions.
- the "related" form corresponds to horizontal "consulting" relations.
- the "aggravating/aggravated by" are systemic relations in which units create strife for each other by transferring challenges (buck-passing) and workloads, displacing problems, and the like through the institutional system.
- the "reduces/reduced by" are systemic relations which alleviate the above forms of institutional pressure.
The whole purpose of using NETMAP is to find ways of presenting this information comprehensibly to bring out new levels of meaning on which higher order decisions can be taken. The challenge is to find ways of clustering and filtering the data to take advantage of NETMAP's features.
6. Hierarchical approach
This approach would take advantage of the explicit and implicit hierarchies in the data:
- Letter coding sequence B,C,D,E,G (not F) as implying a descending hierarchical structure (G most detailed) through the problem sections (PA to PG). Here all Bs are roughly at the tops of hierarchies.
- Relationship coding "broader/narrower", specifying actual hierarchical relationships.
In contrast with a complex corporation, there are probably too many hierarchies and too many Bs to bear useful comparison with corporate divisions or the like. This is not to say that NETMAP would be unable to represent the Bs or hierarchy tops around a circle. The question is rather in what order and to what end, and whether or not some less important (i.e. less connected) hierarchies are omitted.
The obvious approach to ordering is to take advantage of the systematic subject coding using the first cell digit (the matrix column) employed in the companion Yearbook of International Organizations (Vol. 3) to position Bs around the circle. The challenge is of course what to do with the fact that each B may have a multiple subject coding.
Relationships across the circle between Bs could be based on the number of relationships crossing between hierarchies (and/or subject categories). Such relationships could be accumulated up from counts of the detailed relationships within each hierarchy that cross to other subjects and/or hierarchies.
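The accumulation described here could be sketched as below. This is an illustration under stated assumptions, not a description of how NETMAP does it: each problem is assumed to belong to exactly one hierarchy top, supplied in a hypothetical lookup table `top_of`, and links are counted without regard to direction.

```python
from collections import Counter

def cross_hierarchy_counts(relations, top_of):
    """Count detailed relations that cross between hierarchies.

    relations: iterable of (problem_a, problem_b) pairs.
    top_of: dict mapping each problem to its hierarchy top (hypothetical input).
    Returns a Counter keyed by an ordered pair of hierarchy tops.
    """
    counts = Counter()
    for a, b in relations:
        top_a, top_b = top_of.get(a), top_of.get(b)
        if top_a is None or top_b is None or top_a == top_b:
            continue  # skip unknown problems and links internal to one hierarchy
        counts[tuple(sorted((top_a, top_b)))] += 1
    return counts

top_of = {"p1": "B1", "p2": "B1", "p3": "B2", "p4": "B3"}
detail = [("p1", "p3"), ("p2", "p3"), ("p1", "p2"), ("p3", "p4"), ("p4", "p9")]
weights = cross_hierarchy_counts(detail, top_of)
```

The resulting counts could set the thickness or colour of the chords drawn between Bs. Problems with multiple subject codes would need a rule for splitting or duplicating their links, which this sketch does not attempt.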
There is indeed a possibility of treating "problems" as dysfunctional links between "subjects", rather than as nodes in their own right.
7. Questions to be asked
These might include:
- which hierarchies/subjects are a focus for "communications" within the international community.
- which hierarchies/subjects are isolated within this pattern.
- which hierarchies/subjects function as intermediaries in preference to direct communication between others.
Such a display could be enhanced with the external circles to reflect detail within particular subjects or hierarchies.
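The three questions above can be given a rough operational form. The sketch below makes several assumptions: "focus" is read as the number of distinct partners, "isolated" as having no partners at all, and "intermediary" as the number of pairs of partners that are not themselves directly linked. These readings and the name `communication_roles` are illustrative only.

```python
from itertools import combinations

def communication_roles(nodes, links):
    """Return (degree, isolated, brokerage) for an undirected network.

    degree: number of distinct partners per node (a proxy for "focus").
    isolated: sorted list of nodes with no partners.
    brokerage: per node, the number of partner pairs with no direct link.
    """
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        if a == b:
            continue  # ignore self-links
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    degree = {n: len(linked) for n, linked in neighbours.items()}
    isolated = sorted(n for n, d in degree.items() if d == 0)
    brokerage = {
        n: sum(1 for a, b in combinations(sorted(linked), 2) if b not in neighbours[a])
        for n, linked in neighbours.items()
    }
    return degree, isolated, brokerage

subjects = ["health", "trade", "energy", "culture"]
links = [("health", "trade"), ("trade", "energy")]
degree, isolated, brokerage = communication_roles(subjects, links)
```

In this small example "trade" is the only intermediary, since "health" and "energy" reach each other only through it.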
8. Subject approach
It is possible that the hierarchical approach above could be distinguished from a purely subject approach. In the former the effort is made to represent as many major hierarchy tops as possible around the circle. In the latter attention is focused on major subjects, whether matrix columns (1st digit), or columns minus certain rows (2nd digit), with hierarchies clustered within subjects.
One possible display approach in both cases would have the main ring display one form (tops or subjects) with external rings displaying the other (subjects or tops respectively). Thus for an external subject ring the display would be indicating for a given hierarchy (in the main ring) how much it communicated with other subject areas. Such external rings would then tend to be predominantly of one colour range or another if subjects without any communication from that hierarchy were excluded from the external ring display.
9. Circles and loops
Whilst direct connections between problems are reasonably obvious and intuitively reasonable, indirect links are quite another matter. They are both difficult to recognize and difficult to grasp in their integrity. Of special interest are vicious cycles and loops whereby one problem is aggravated through a circle of aggravation of which it is a part.
Presumably NETMAP could be used to ask questions such as which problem hierarchies are part of a circle of 5, or of 9, components.
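Whether or not NETMAP offers such a query, the question itself is easy to state in code. The following sketch enumerates simple directed cycles of exactly `k` nodes by brute-force depth-first search. It assumes node labels are comparable (for example strings) and is only practical for small `k` or sparse data. The name `cycles_of_length` is hypothetical.

```python
def cycles_of_length(edges, k):
    """Return simple directed cycles with exactly k nodes, each listed once.

    edges: iterable of (source, target) pairs, e.g. "A aggravates B".
    Each cycle is reported as a tuple starting at its smallest label.
    """
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    found = []

    def extend(path):
        last = path[-1]
        for nxt in sorted(successors.get(last, ())):
            if len(path) == k:
                if nxt == path[0]:
                    found.append(tuple(path))  # the path closes back on its start
            elif nxt > path[0] and nxt not in path:
                extend(path + [nxt])  # only grow through labels above the start

    for start in sorted(successors):
        extend([start])
    return found

aggravates = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "A"), ("B", "A")]
loops_of_3 = cycles_of_length(aggravates, 3)
```

Requiring every later node to have a larger label than the starting node ensures that each cycle is found from one starting point only, so rotations of the same loop are not reported twice.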
10. Interlocking vicious cycles
It is useful to hypothesize that the most intractable problems are a part of one or more such vicious circles. In this sense the challenge of world governance calls for a means of identifying such interlocking circles. A powerful display would show, and distinguish, different (colour coded) circles.
Clearly the power of NETMAP in a dynamic mode (in a decision-environment) would be to shift between displays involving circles of 3, 4, or N components.
Clearly there is the possibility of overlaying different kinds of relationship. Of interest in the problems data are the overlays of the "aggravating/aggravated by" with the "reduces/reduced by" patterns. Metaphorically these can be seen as anabolic and catabolic cycles. This would open the way to questions such as:
- which problems are absolute "sources" or "originators" of such functional relationships (being unaggravated by others) and which are absolute "sinks" (in that they do not aggravate others).
- which problems appear not to be part of any remedial cycle and which are a vital part of a remedial cycle (to exaggerate, "death" is an important component in the reduction of the "overpopulation problem")
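The first of these questions reduces to simple set arithmetic on the relationship pairs. The sketch below assumes the aggravation data is available as (cause, effect) pairs and only considers problems that appear in at least one such pair. The name `sources_and_sinks` is hypothetical.

```python
def sources_and_sinks(aggravates):
    """Split problems into absolute sources and absolute sinks.

    aggravates: iterable of (cause, effect) pairs.
    A source aggravates others but is aggravated by none.
    A sink is aggravated but aggravates none.
    """
    pairs = list(aggravates)
    causes = {cause for cause, _ in pairs}
    effects = {effect for _, effect in pairs}
    return sorted(causes - effects), sorted(effects - causes)

pairs = [("drought", "famine"), ("famine", "migration"), ("war", "famine")]
sources, sinks = sources_and_sinks(pairs)
```

The sample problem names are invented for illustration and are not taken from the Encyclopedia data.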
The UIA also has data on some 30,000 international organizations and their relationships to each other and to their membership countries. It is from these organizations that the problem data was largely obtained. Clearly there is merit in working towards overlaying problem networks with organization networks. This would focus attention on:
- problems on which no organizations appear to be focusing directly
- organizations which do not appear to be focusing on any particular problems
- properties of the problem network in relationship to the corresponding organization network
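The first two points could be checked with a sketch such as the following. It assumes a table of (organization, problem) pairs recording which organization focuses on which problem. Such a table is hypothetical here, as are the names `overlay_gaps` and `focuses_on`.

```python
def overlay_gaps(problems, organizations, focuses_on):
    """Find problems with no organization and organizations with no problem.

    focuses_on: iterable of (organization, problem) pairs (assumed input).
    Returns (unaddressed_problems, unfocused_organizations), both sorted.
    """
    pairs = list(focuses_on)
    covered = {problem for _, problem in pairs}
    active = {org for org, _ in pairs}
    unaddressed = sorted(set(problems) - covered)
    unfocused = sorted(set(organizations) - active)
    return unaddressed, unfocused

unaddressed, unfocused = overlay_gaps(
    problems=["P1", "P2", "P3"],
    organizations=["OrgA", "OrgB"],
    focuses_on=[("OrgA", "P1"), ("OrgA", "P3")],
)
```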
12. Interlocking highways
It could be hypothesized that the way that multi-component circles interlock may provide clues to the challenges of governance. NETMAP could provide an environment in which such higher order structures could be reviewed and discussed. In particular, it could provide clues to the kind of institutional communication patterns needed to constrain problem networks that were getting out of hand.
From this perspective there are interesting possibilities for managing data in electronic mail environments for an extended network of peers. Faced with overload and junk mail problems, the challenge is to ensure that the optimal number of channels remain open, and that maximum use is made of intermediaries to filter new topics. Ideally intelligent software would monitor interest in topics and recommend new channels and conferences in the light of the global pattern of interlocking communication pathways. This network management problem is dealt with from a telecommunications perspective but not from a content perspective. It is for this reason that institutional systems develop dysfunctional communication patterns to protect themselves from overload and transformative pressures.
The data is imperfect in that there are redundancies in the links due to confusion about how problems are clustered (just as a division of the United Nations may be confused with "United Nations" or with some other UN agency depending on the informant). These redundancies are notably present when Problem A is indicated as aggravating B and C, but B is indicated as a part of C. Where these are spotted, the data is corrected by removing the A to C link, which is already implied by the hierarchical relationship. However, as in corporations, it is questionable how far this rationalization can usefully go. It may indeed be that Unit A of a corporation communicates with Unit D of another, whatever the formal relationships up and down channels. Eliminating formal redundancies may therefore involve loss of functional information.
Ideally such redundancies need to be flaggable for review rather than automatically edited out.
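Flagging rather than deleting might look like the sketch below. It assumes aggravation links are held as (A, target) pairs and that each problem's direct broader problems are available in a lookup table. Only one hierarchical step is checked. The names `flag_redundant_links` and `broader_of` are hypothetical.

```python
def flag_redundant_links(aggravates, broader_of):
    """List links A -> C that are implied by A -> B where B is a part of C.

    aggravates: iterable of (a, target) pairs.
    broader_of: dict mapping a problem to the set of its direct broader problems.
    Nothing is removed; each candidate is returned for human review.
    """
    links = set(aggravates)
    flagged = []
    for a, b in sorted(links):
        for c in sorted(broader_of.get(b, ())):
            if (a, c) in links:
                flagged.append(
                    {"redundant": (a, c), "implied_by": (a, b), "via": (b, c)}
                )
    return flagged

review_list = flag_redundant_links(
    aggravates=[("A", "B"), ("A", "C"), ("D", "C")],
    broader_of={"B": {"C"}},
)
```

Keeping the implying link and the hierarchical step alongside each flagged link lets a reviewer judge whether the A to C link carries functional information of its own.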
14. Initial experiments
In order to explore the implications of NETMAP for the problem and organization data, much of the relationship information was loaded for demonstration purposes. In the brief time available a series of diagrams was produced (see Figures 1-4 and the inside covers of this Encyclopedia).
These diagrams serve mainly to illustrate a range of possibilities for extracting significant patterns from large quantities of information. In their present form, as the product of a demonstration, they do not reflect the results of careful selection. The demonstration served to show that the tool could well be used to think about and explore such patterns over a period of days before selecting particular patterns as meriting production in hardcopy.
Figures 1 and 2: Relationships amongst selected major world problems
Figure 3: Relationships amongst international NGOs in the UNESCO system
Figure 4: Relationships between bodies of the United Nations system