[Debian-bootstrap] calculating feedback arc set and total order: solved!

Johannes Schauer j.schauer at email.de
Mon Nov 19 21:29:48 MSK 2012


Hi,

long story short
----------------

1. I managed to convert the entire dependency graph for native
   compilation into a DAG using manually entered reduced dependencies as
   well as forcefully breaking build dependencies by calculating a
   feedback arc set

2. The vertices of the resulting acyclic graph were assigned to a total
   order to create a final build order for the native case

some performance details
------------------------

I created an algorithm that calculates an approximation to the feedback
arc set problem using a greedy algorithm over the amount of cycles per
edge. Here some numbers on how this algorithm performs on our specific
problem domain:

159 vertices (out of which 69 are source packages), 444 edges
 - calculated a feedback arc set of cardinality 4 in 0.4 seconds
 - http://mister-muffin.de/p/NRTs.png

41 vertices (out of which 18 are source packages), 113 edges
 - calculated a feedback arc set of cardinality 2 in 0.1 seconds
 - http://mister-muffin.de/p/myFd.png

Those two examples above are subgraphs of the main SCC that were left
over after removing build dependencies that were found by Thorsten
Glaser, Patrick McDermott, Daniel Schepler and Gentoo. Also removed were
installation set of binary packages classified as "weak build
dependencies". The list was recently much extended by Daniel Schepler.

By using a bruteforce method, those two were found to even be minimal
feedback arc sets for these graphs.

One can also try to run the algorithm on the full SCC without prior
removal of known droppable build dependencies. In this case the
following are the results:

926 vertices (out of which 376 are source packages), 7354 edges
 - calculated a feedback arc set of cardinality 90 in 7 minutes

It is important to note, that out of the 90 edges that belong to the
feedback arc set chosen above, 32 edges are not optional as they break
cycles of length 2 and are therefor the only possible choice to break
those cycles. One can therefor never go below a feedback arc set of 32
edges for above graph.

The algorithm has the additional advantage, that it can be adjusted to
either give a good result fast or a better result after a longer time.
So above result can also read:

926 vertices (out of which 376 are source packages), 7354 edges
 - calculated a feedback arc set of cardinality 97 in 3 minutes

or

926 vertices (out of which 376 are source packages), 7354 edges
 - calculated a feedback arc set of cardinality 87 in 8 hours

implementation details (feedback arc set)
-----------------------------------------

First of all, all cycles with a length of two are found. Since there is
only one way to break those cycles, the respective build dependencies
are immediately added to the feedback arc set.

If the graph should not be cycle free by then, cycles up to length K are
calculated. This parameter K influences how long the algorithm takes and
how good the result is. Those cycles are given to a function "partial
feedback arc set" which, based on the cycles it is given, returns a list
of edges to remove. Those edges are added to the feedback arc set and
are removed from the input graph. K is then increased by 2. This part of
the algorithm is repeated until the graph is cycle free.

The function "partial feedback arc set" decides which edges to remove
based on its input cycles. It creates a hashtable mapping edges to a set
of cycle IDs that this edge is part of.

In a greedy manner, the edge with the most cycles going through it is
found, removed from the graph and then added to the result. Since all
cycles going through this edge are now broken, they are removed from all
other edges.  This step is repeated until all found cycles are broken.

implementation details (total order)
------------------------------------

A function for calculating the topological sorting is part of ocamlgraph
but it is of not much use here because compilation can be done in
parallel and a topological sort does not allow elements to be equal to
each other.

First the graph is reduced to only having vertices representing source
packages.  Vertices of installation sets are removed and edges to and
from them replaced by edges connecting source package vertices only,
retaining the respective dependencies. This step is not required but
reduces computation time later as the graph is considerably reduced in
size.

Then, all vertices without successors are found. Those are the source
packages that do not have any build dependency missing.

Then, iteratively the graph is traversed, starting from those initial
vertices. In each round all vertices are added for which their
successors are already part of the result. This is, all source packages
whose build dependencies are now available.

The result is a list of lists where the inner lists are of those
packages that have no interdependencies and can therefor be compiled at
the same time.

trivia
------

I got the idea for the feedback arc set approximation after trying
without avail to understand existing algorithms to understand
computation of a feedback arc set.

I learned the following:

 - too many publications are limited to undirected graphs and in fact
   solutions to the problem is much different from solutions to the
   problem on directed graphs
 - the focus is on the feedback vertex set problem but this is not too
   much of a problem as graphs can be converted in linear time from one
   representation into the other
 - algorithms often require a weight to work well which we dont have
 - I am not (yet) smart enough to understand 90% of those papers :)

There is a implementation to find a feedback arc set by Pietro in the
very early bootstrap code. This implementation assumes a weight for each
edge and then finds the cycle with the least weight. Unfortunately,
without information of which dependencies can be dropped, there are no
weights.

It also becomes obvious that a minimal feedback arc set is not desirable
because not only does the computation time quickly become completely
unfeasible but it is also very likely that some edges that are part of a
minimal feedback arc set can not be broken in reality. In this case,
other edges have to be chosen instead and the time spent before is
wasted.

future
------

I guess a milestone is reached with this :)

I will next look more into the cross case as wookey continues to change
packages so that they have correct multiarch cross build dependencies
for arm64.

I can then build a dependency graph for the cross case, detect cycles
and analyze how to break them.

When this is done, both parts (native and cross) can be combined and one
huge build order can be devised that creates all of Debian from nothing.

cheers, josch


More information about the Debian-bootstrap mailing list