On the cost of change and gcd grinding (final)
I find myself stuck with an algoritm that should be easy to implement, but that has denied me the opportunity to write it in beautiful terms.
Given a collection of integers k1...kn greater than 1 and pair-perpendicular to each other, I want to break them into smaller pair-perpendicular factors if a little bird tells me that there is a certain integer m such that for at least one i, gcd(ki, m) > 1.
Well sure enough one could do the easy thing saying g = gcd(ki, m) and then obviously ki = g * ki', thus you'd replace ki with g and ki'. But, observation:
Given a collection of integers k1...kn greater than 1 and pair-perpendicular to each other, I want to break them into smaller pair-perpendicular factors if a little bird tells me that there is a certain integer m such that for at least one i, gcd(ki, m) > 1.
Well sure enough one could do the easy thing saying g = gcd(ki, m) and then obviously ki = g * ki', thus you'd replace ki with g and ki'. But, observation:
What if g and ki' are not perpendicular?
This happens in the presence of squareful integers (as opposed to square-free integers). For example, it occurs when trying to grind 14175 down using 105 as a factor. First, gcd(14175, 105) = 105, thus you find that 14175 = 105 * 135. You could stop there, but obviously 105 is not perpendicular to 135.
If you want to take things further, you could say well gcd(135, 105) = 15. Now you need to refine both 135 and 105, obtaining 135 = 15 * 9 and 105 = 15 * 7. So now you have obtained new grinders, namely 7 and 9. In short, 14175 = 135 * 105 = 15 * 15 * 9 * 7.
Of course you are still not done. Because if you consider gcd(15, 9) = 3, you find another grinder and after some work you obtain 14175 = 3 * 3 * 3 * 3 * 5 * 5 * 7. Finally, you are done because those factors as a set are pair-perpendicular.
Implementing this is irritating because of the type of recursion necessary. The hackish approach would indicate heavy use of indices, temporary collections and other stuff. But going down that road thinking that refactoring will be able to clean it up later has failed for me. I can't even write a functioning algorithm in hackish terms I can stand.
Most of the problem for me revolves around the scheduling of grinders. In short, I need a queue that also acts like a set and that is thread-safe. I have implemented this kind of thing before in my reference finder, but since the reference finder had a more clear separation between the visitor and traverse processes, each execution context did not have a large number of names being referenced, so modifying the queue of objects scheduled to be visited while the queue was being depleted was not confusing neither it was too problematic from a thread-safe point of view. In short, proper use of an ordered collection was good enough (see below for links) .
However, this gcd grinding process does have a large number of names in each execution context already. Adding the following complexities on top of it makes it unmanageable:
But, observation: developers' abilities are usually measured by how much they can brute-force progress in spite of a) without resorting to b). While this rewards efficient thrashing, it also leads to the size of egos being directly proportional to those of scratch rams, both of these being directly proportional to the apparent short-term progress, to the degree to which produced code is an ugly mass of unmaintainable moldy rotten spaghetti, to the increasing disparity of perceived abilities between developers in the same team, to the resulting frictions between team members, to the increasing proportion of brute-forcers in the team as chunkers tend to leave, and to the overall cost of change measured over time.
In particular, the more this happens, the more the code reads like commandments given by a god-like figure. Evidently, the stress caused by the excess of referenced names in the execution context is an invitation to constraining and locking everything down --- as if all objects had to stand perfectly still unless ordered to do something, because otherwise chaos would take over and make brute-forcing impossible.
In my experience, the amount of stress is directly proportional to how much the code gives the impression of somebody shouting orders on top of increasingly louder white noise, while getting closer and closer to a major emotional breakdown.
Since we are not god-like figures with infinitely big scratch rams, chaos eventually takes over and after that any change made to the code leads to the introduction of more bugs than the ones being removed.
Hence, as a humble mere mortal I decided that I had to avoid all that by using classes to compartmentalize the names being referenced at each execution context --- chunking, in other words. The first thing I'm going to chunk out of the mess right now is that scheduling queue. Then things should be easier, and I should be able to write a working gcd grinder within the limits of everybody's scratch ram, namely: 7 ± 2 named chunks.
It is really unfortunate that chunking is usually seen as a sign of weakness (because of the assumed inability to make progress like others by e.g. brute-forcing the situation), and as a sign of time-wasting. Because in general terms, brute-forcing problems past the size of our scratch rams will get us nowhere.
Think of an ant colony: does the queen have to give out precise orders to all ants? In fact, does it have to micromanage by providing control-freak-style orders to every ant while at the same time being aware of what every ant is doing?
Note that the job of the colony is achieved without needing a context in which all the aggregate knowledge of every ant can be referenced using some labeling convention. The colony's self-awareness does not have to be omnipresent. Same thing with your own body which is supporting your act of reading this.
There is not even one example of a complex system needing an omnipresent self-awareness context to evolve. Hence, there is no need neither excuse for brute-forcing code into existence.
* Examine the reference finder's breadth first traverse implementation by downloading the source code from here (and download RAR if you don't have it already), or by going to the Cincom public repository and loading packages Distinctions and ReferenceFinder.
This happens in the presence of squareful integers (as opposed to square-free integers). For example, it occurs when trying to grind 14175 down using 105 as a factor. First, gcd(14175, 105) = 105, thus you find that 14175 = 105 * 135. You could stop there, but obviously 105 is not perpendicular to 135.
If you want to take things further, you could say well gcd(135, 105) = 15. Now you need to refine both 135 and 105, obtaining 135 = 15 * 9 and 105 = 15 * 7. So now you have obtained new grinders, namely 7 and 9. In short, 14175 = 135 * 105 = 15 * 15 * 9 * 7.
Of course you are still not done. Because if you consider gcd(15, 9) = 3, you find another grinder and after some work you obtain 14175 = 3 * 3 * 3 * 3 * 5 * 5 * 7. Finally, you are done because those factors as a set are pair-perpendicular.
Implementing this is irritating because of the type of recursion necessary. The hackish approach would indicate heavy use of indices, temporary collections and other stuff. But going down that road thinking that refactoring will be able to clean it up later has failed for me. I can't even write a functioning algorithm in hackish terms I can stand.
Most of the problem for me revolves around the scheduling of grinders. In short, I need a queue that also acts like a set and that is thread-safe. I have implemented this kind of thing before in my reference finder, but since the reference finder had a more clear separation between the visitor and traverse processes, each execution context did not have a large number of names being referenced, so modifying the queue of objects scheduled to be visited while the queue was being depleted was not confusing neither it was too problematic from a thread-safe point of view. In short, proper use of an ordered collection was good enough (see below for links) .
However, this gcd grinding process does have a large number of names in each execution context already. Adding the following complexities on top of it makes it unmanageable:
- Thread-safe access to the grinder queue
- Indexed access to the grinder queue
- Avoiding adding repeated grinders to the queue
- Avoiding adding a repeated grinder that has already been taken from the queue
- Ability to skip a grinder being processed and jumping to the next grinder from any step in the grinding process
But, observation: developers' abilities are usually measured by how much they can brute-force progress in spite of a) without resorting to b). While this rewards efficient thrashing, it also leads to the size of egos being directly proportional to those of scratch rams, both of these being directly proportional to the apparent short-term progress, to the degree to which produced code is an ugly mass of unmaintainable moldy rotten spaghetti, to the increasing disparity of perceived abilities between developers in the same team, to the resulting frictions between team members, to the increasing proportion of brute-forcers in the team as chunkers tend to leave, and to the overall cost of change measured over time.
In particular, the more this happens, the more the code reads like commandments given by a god-like figure. Evidently, the stress caused by the excess of referenced names in the execution context is an invitation to constraining and locking everything down --- as if all objects had to stand perfectly still unless ordered to do something, because otherwise chaos would take over and make brute-forcing impossible.
In my experience, the amount of stress is directly proportional to how much the code gives the impression of somebody shouting orders on top of increasingly louder white noise, while getting closer and closer to a major emotional breakdown.
Since we are not god-like figures with infinitely big scratch rams, chaos eventually takes over and after that any change made to the code leads to the introduction of more bugs than the ones being removed.
Hence, as a humble mere mortal I decided that I had to avoid all that by using classes to compartmentalize the names being referenced at each execution context --- chunking, in other words. The first thing I'm going to chunk out of the mess right now is that scheduling queue. Then things should be easier, and I should be able to write a working gcd grinder within the limits of everybody's scratch ram, namely: 7 ± 2 named chunks.
It is really unfortunate that chunking is usually seen as a sign of weakness (because of the assumed inability to make progress like others by e.g. brute-forcing the situation), and as a sign of time-wasting. Because in general terms, brute-forcing problems past the size of our scratch rams will get us nowhere.
Think of an ant colony: does the queen have to give out precise orders to all ants? In fact, does it have to micromanage by providing control-freak-style orders to every ant while at the same time being aware of what every ant is doing?
Note that the job of the colony is achieved without needing a context in which all the aggregate knowledge of every ant can be referenced using some labeling convention. The colony's self-awareness does not have to be omnipresent. Same thing with your own body which is supporting your act of reading this.
There is not even one example of a complex system needing an omnipresent self-awareness context to evolve. Hence, there is no need neither excuse for brute-forcing code into existence.
* Examine the reference finder's breadth first traverse implementation by downloading the source code from here (and download RAR if you don't have it already), or by going to the Cincom public repository and loading packages Distinctions and ReferenceFinder.

No comments:
Post a Comment