next up previous contents
Next: Solution Synthesis for Up: Using Hunter-Gatherer in Previous: Using Hunter-Gatherer in

Identifying Subgraphs of Dependency

Section 3.2 presented an efficient algorithm for partitioning a graph into subgraphs for use by HG. This partitioning algorithm tries to create subgraphs that minimize the complexity of the main HG algorithms. The algorithm began by creating ``seed'' subgraphs in various regions of the graph. A potential seed is created for each vertex, or variable, in the input simply by listing all of the vertices that are adjacent to it. For instance, for Grupo-Roche (G), adquirir (A) and Dr-Andreu (D) are adjacent. This gives one potential seed (A,D,G). Potential seeds are generated for each variable, with the results as follows:

G (Grupo-Roche):   (A,D,G)
D (Dr-Andreu):     (A,D,G)
A (adquirir):      (A,D,G,ATD,C)
ATD (a-traves-de): (A,ATD,C)
C (compania):      (A,ATD,C,E,ESP,S)
S (su):            (C,S)
E (en):            (C,E,ESP)
ESP (Espana):      (C,E,ESP)

These seeds are then ordered with the smallest first (and duplicates removed):

(C,S)
(A,D,G)
(C,E,ESP)
(A,ATD,C)
(A,D,G,ATD,C)
(A,ATD,C,E,ESP,S)

A certain amount of randomness might be introduced in this ordering, since seeds with the same length are arranged in order of processing, which is determined by their ordering in a hashtable. This has not caused any problems that we have noticed.

Our goal now is to keep only one seed for a given region of the graph. So we simply go through this list in order, and delete any that are in a region included by a previously accepted seed. We define the region of a seed to be the variables in the seed, plus any variables adjacent to the seed. The regions for each seed are given below:

(C,S):             (C,S,A,ATD,E,ESP)
(A,D,G):           (A,D,G,ATD.C)
(C,E,ESP):         (C,E,ESP,S,ATD,A)
(A,ATD,C):         (A,ATD,C,G,D,S,E,ESP)
(A,D,G,ATD,C):     (A,ATD,C,G,D,S,E,ESP)
(A,ATD,C,E,ESP,S): (A,ATD,C,G,D,S,E,ESP)

We always accept the first seed on the list. Having done this, for this example, all of the other seeds are rejected because they all contain ATD in their region, which is included in the (C,S) seed. This makes sense because this is a simplified example with a small input graph. Larger inputs result in more than one seed, but the process is identical. This concludes step 1 of the algorithm in section 3.2.

HG requires subgraphs to identify the following information: its input variables (that are not involved in any input subgraphs), its input subgraphs, and all of its variables that are constrained outside the subgraph. For our seed, (C,S), this information can be represented as ((C,S), nil, (C)).

Now, for each seed (in this case only one), we determine the best possible action to take to expand the seed. Possible actions to extend a subgraph are either:

  1. The addition of variables onto the subgraph so that at least one additional variable in the seed is now covered (has no constraints outside the new subgraph).
  2. The combination of two subgraphs so that at least one new variable that wasn't covered in either of the two subgraphs alone is now covered.

The best possible action is the action that requires the lowest input-complexity. As defined in section 3.1, the input complexity is the sum of all the output complexities of input subgraphs, plus the number of individual variables added.

Since, in this case, there are no other subgraphs, only option 1 is possible. There is only 1 variable in the (C,S) seed that is not already covered, namely C. Therefore, the only action possible is to add on all the variables necessary to cover C. This results in the subgraph described by: ((A,ATD,E,ESP), (C,S), (A)). In other words, by adding on the variables A, ATD, E and ESP to the input subgraph (C,S), we create a new subgraph whose only variable constrained outside the subgraph is A. To keep track of this subgraph, we name it (A,ATD,E,ESP,(C,S)). Since this is the only possible action, we take it, ``officially'' producing the subgraph (A,ATD,E,ESP,(C,S)).

Next we try to expand this subgraph. Similar to our first expansion, there is only one variable constrained outside, namely A. Again, there are no other subgraphs available to combine with, so we simply need to add on variables to cover A. This results in the subgraph ((G,D), (A,ATD,E,ESP,(C,S)), nil). That is, by adding on the variables G and D to the subgraph (A,ATD,E,ESP,(C,S)), we produce a subgraph that has no variables affected outside the subgraph. Since this is the only action possible, we take it, producing a subgraph that contains the entire problem. Thus, we are done.

  
Figure 30: Subgraph Construction - Method 1

Figure 30 graphically displays the subgraphs created by this process. The input complexity of subgraph 1 is 2, since it contains two input variables. Its output complexity is 1 because only one of its variables are constrained outside the subgraph. The input complexity of subgraph 2 is the output complexity of subgraph 1 (1), plus the number of variables added (4), for a total of 5. Its output complexity is again 1. The input complexity of subgraph 3 is the output complexity of subgraph 2 (1), plus the number of variables added (2), for a total of 3. The maximum input complexity, which will dominate the complexity of HG for this problem, is therefore 5. Assuming each variable had two values in its domain for this simplified problem, we can therefore expect to examine on the order of = 32 combinations it.

It is easy to see that this is not the best way to partition this particular graph. This emphasizes the point made earlier that the algorithm in section 3.2 is a heuristic algorithm that is designed to quickly partition a graph in a reasonable manner with respect to the needs of the main HG algorithm. Even though the optimal partitioning was not achieved, HG still guarantees the optimal answer, even if it does take a little longer.

We have developed additional algorithms to partition specific types of problems. In particular, we can take advantage of the tree-shaped bias in computational semantics (see section 7) to produce, even more easily, the partition in Figure 31. The maximum input complexity for this partitioning is 4. Actually, partitions with maximum input complexity of 3 can be found for this input; however, since partitioning is not the focus of our research, we will not present these additional algorithms. The point is that the graph partitioning methodology, although certainly general enough to apply to many problems using the algorithm in section 3.2, can be further optimized for specific problem types. See section 9.4 for further discussion on graph partitioning, its relation to the input graph topology, and how it can be used to more simply describe how and why HG works.

  
Figure 31: Subgraph Construction - Method 2 For Tree-like Inputs



next up previous contents
Next: Solution Synthesis for Up: Using Hunter-Gatherer in Previous: Using Hunter-Gatherer in



Steve Beale
Wed Mar 26 09:27:50 MST 1997