The travelling salesman problem asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?"
If someone hands us a solution, we can easily check that it is a valid ordering of the cities and that it achieves the length it claims. This property makes the TSP a member of the class known as NP, consisting of all problems for which we can check the correctness of a solution in polynomial time. The pair of letters stands for non-deterministic polynomial. The unusual name aside, this is a natural class of problems: when we make a computational request, we ought to be able to check that the result meets our specifications.
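Such a check can be sketched as follows, a hypothetical verifier for Euclidean instances (the names `Pt`, `dist` and `verify_tour` are illustrative, not from the original code): it confirms the route is a permutation of the cities and that the closed-tour length does not exceed the claimed value, all in polynomial time.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Euclidean distance between two cities.
double dist(const Pt& a, const Pt& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Polynomial-time check of a claimed TSP solution: `order` must visit every
// city exactly once, and the closed tour must not be longer than `claimed`.
bool verify_tour(const std::vector<Pt>& cities,
                 const std::vector<int>& order, double claimed) {
    int n = cities.size();
    if ((int)order.size() != n) return false;
    std::vector<bool> seen(n, false);
    double len = 0.0;
    for (int i = 0; i < n; i++) {
        if (order[i] < 0 || order[i] >= n || seen[order[i]]) return false;
        seen[order[i]] = true;                    // each city exactly once
        len += dist(cities[order[i]],
                    cities[(i + 1 == n) ? order[0] : order[i + 1]]);
    }
    return len <= claimed + 1e-9;                 // tolerate rounding error
}
```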
Nearest Neighbours (NN) is one of the simplest search heuristics out there. It belongs to the family of constructive search heuristics, meaning that it builds a route gradually, starting with one city and stopping only when all cities have been visited. It is greedy in nature: at each step it chooses the location that is closest to the current location.
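A minimal sketch of this greedy construction (names such as `nearest_neighbour` are illustrative, not from the original code):

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y; };
double dist(const Pt& a, const Pt& b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Nearest Neighbour construction: start at `start`, then repeatedly move to
// the closest city that has not been visited yet.
std::vector<int> nearest_neighbour(const std::vector<Pt>& c, int start) {
    int n = c.size();
    std::vector<bool> visited(n, false);
    std::vector<int> route = {start};
    visited[start] = true;
    for (int step = 1; step < n; step++) {
        int cur = route.back(), best = -1;
        for (int j = 0; j < n; j++)              // greedy choice over all unvisited
            if (!visited[j] &&
                (best == -1 || dist(c[cur], c[j]) < dist(c[cur], c[best])))
                best = j;
        visited[best] = true;
        route.push_back(best);
    }
    return route;
}
```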
Some graph-related terms for understanding:
A sequence of vertices in which consecutive vertices are joined by an edge is called a walk.
A walk with no repeated edges is called a tour.
A walk with no repeated vertices is called a path.
A walk or a tour where the first and last vertices coincide is called closed. A circuit is a path that begins and ends at the same vertex.
An Euler tour is defined as a way of traversing a tree such that each vertex is added to the tour when we visit it (either moving down from a parent vertex or returning from a child vertex). We start from the root and return to the root after visiting all vertices.
Hamiltonian circuit: a circuit that must pass through each vertex of a graph once and only once (except the first vertex, which is obviously visited twice). In other words, a Hamiltonian circuit is a circuit that visits every vertex once with no repeats. Being a circuit, it must start and end at the same vertex.
Now that the terminology is clear, we can discuss the actual approximation.
Input: A complete graph
Output desired: A Hamiltonian cycle of minimum cost.
Our graph is $G = (V, E)$, with a cost $c(u, v)$ associated with each edge $(u, v)$.
Since G is complete and hence connected, a minimum spanning tree of the input graph always exists. I used Prim's algorithm to find the minimum spanning tree, as other standard MST algorithms like Kruskal's would have required me to store the weights of all the edges.
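The point about not storing all weights can be sketched with the classic $O(|V|^2)$ formulation of Prim's algorithm, where edge weights of the complete Euclidean graph are computed on demand from coordinates (a sketch under that assumption; names like `prim_mst` are illustrative):

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y; };
double dist(const Pt& a, const Pt& b) { return std::hypot(a.x - b.x, a.y - b.y); }

// O(V^2) Prim's algorithm on a complete graph given by coordinates.
// Returns parent[]: parent[i] is i's neighbour in the MST (root has -1).
std::vector<int> prim_mst(const std::vector<Pt>& c) {
    int n = c.size();
    std::vector<int> parent(n, -1);
    std::vector<bool> inTree(n, false);
    std::vector<double> key(n, 1e18);        // cheapest known edge into the tree
    key[0] = 0;
    for (int it = 0; it < n; it++) {
        int u = -1;
        for (int v = 0; v < n; v++)          // pick the cheapest fringe vertex
            if (!inTree[v] && (u == -1 || key[v] < key[u])) u = v;
        inTree[u] = true;
        for (int v = 0; v < n; v++) {        // relax edges out of u on the fly
            double w = dist(c[u], c[v]);
            if (!inTree[v] && w < key[v]) { key[v] = w; parent[v] = u; }
        }
    }
    return parent;
}
```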
The objective is to find the minimum cost Hamiltonian cycle.
The key observation is that the edge costs satisfy the triangle inequality. This version of the TSP is known as METRIC TSP.
So, the following property is satisfied for all vertices $u, v, w$:
$$c(u, w) \le c(u, v) + c(v, w)$$
The METRIC TSP problem is also NP-HARD.
Let the MST obtained be $T$.
Let the minimum cost Hamiltonian cycle be $H^*$, and let $e_h$ be an edge in $H^*$.
Let $c(X)$ denote the total cost of the edges in a set $X$.
Now, we find an Euler tour of $T$, in which every edge of $T$ is traversed exactly twice; call the resulting list of vertices $L$. Hence,
$$c(L) = 2c(T) \tag{1}$$
Now, we traverse through the list L and retain only the first occurrence of each vertex, and also retain the last vertex in the list, which is the vertex we started from. The resulting cycle is C.
Now, if L contains a sub-sequence $(u, v, w)$ in which $v$ has already occurred earlier, we shortcut it to $(u, w)$. By the triangle inequality, $c(u, w) \le c(u, v) + c(v, w)$, and so we can conclude that, with all such sub-sequences shortcut,
$$c(C) \le c(L) \tag{2}$$
Also, from the minimum cost Hamiltonian cycle $H^*$, if we remove an edge $e_h$, then $H^* - \{e_h\}$ is also a spanning tree of graph G, and so,
$$c(T) \le c(H^* - \{e_h\}) \le c(H^*) \tag{3}$$
Using (1),(2) and (3), we get the following:
$$c(C) \le 2c(H^*)$$
Hence, cycle C is a factor 2 approximation of the optimal tour $H^*$.
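The construction above can be sketched in code. Keeping only the first occurrence of each vertex in the Euler tour of the doubled MST is exactly a depth-first preorder of the tree, so the sketch below (with the illustrative name `shortcut_tour`) builds C directly by DFS from the `parent[]` representation of the MST:

```cpp
#include <vector>

// Build the shortcut cycle C from an MST given as parent[] (root has -1):
// DFS preorder of the tree = Euler tour with repeated vertices removed.
std::vector<int> shortcut_tour(const std::vector<int>& parent) {
    int n = parent.size();
    std::vector<std::vector<int>> adj(n);          // MST as adjacency lists
    for (int v = 1; v < n; v++) {
        adj[v].push_back(parent[v]);
        adj[parent[v]].push_back(v);
    }
    std::vector<int> cycle;
    std::vector<bool> seen(n, false);
    std::vector<int> stack = {0};                  // iterative DFS from the root
    while (!stack.empty()) {
        int u = stack.back(); stack.pop_back();
        if (seen[u]) continue;                     // retain first occurrence only
        seen[u] = true;
        cycle.push_back(u);
        for (int w : adj[u]) stack.push_back(w);
    }
    return cycle;                                  // implicitly closed back to 0
}
```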
For a visual understanding, the following example can be used:
MST
Euler Tour L of the graph
Constructed graph C of Euler Tour L
In the above strategies, one can clearly see in the visualization of the solutions that there are a lot of edges crossing each other. Since we are tackling metric TSP, it is quite intuitive that if two crossing edges could be removed and two new edges introduced instead, then, as Professor Kanan would say, “we can do better than the current solution”.
So, 2-opt becomes a great candidate for a local move.
The main idea behind it is to take a route that crosses over itself and reorder it so that it does not. This move deletes two edges, thus breaking the tour into two paths, and then reconnects those paths in the other possible way.
In the figure below, we see edges
Alternative 1: Introduce edges
Clearly, such a move is illegal as the resulting graph is no longer connected.
Alternative 2: Introduce edges
This is redundant, as the original configuration is obtained and hence we essentially made no progress.
Alternative 3: Introduce edges (b,c) and (a,d)
This move removes the two crossing edges and introduces two new valid edges.
Thus, among the three possible moves, only the third alternative is both progressive and legal.
Implementation-wise, I iterate over the initial solutions generated by the MST factor-2 approximation. For each initial solution, I keep applying 2-opt as a local move as long as there is at least one 2-opt move that improves the current configuration. Otherwise, the search is stuck in a local minimum and the program starts the next iteration with a new initial solution. If there are several possible 2-opt moves, one can either choose the one that gives the best improvement or the one that is discovered first.
On closer observation, a 2-opt move is equivalent to reversing the order of the cities between the two deleted edges; hence, in order to maintain the cyclicity of the tour of directed edges, a segment reversal needs to be performed.
For a general cost function, in order to check whether a tour is 2-opt optimal, we check all ${|V| \choose 2}$, i.e. $O(|V|^2)$, candidate pairs of edges.
For implementation-specific details, let us assume a fixed orientation of the tour, with each tour edge having a unique representation as a pair of consecutive cities.
So, if the ordering is $(\dots, t_1, t_2, \dots, t_4, t_3, \dots)$, then the new ordering involves reversing the sequence starting from $t_2$ up to $t_4$, giving $(\dots, t_1, t_4, \dots, t_2, t_3, \dots)$.
Also, the move can be evaluated as follows:
Original cost of joining the tuple $(t_1, t_2, t_3, t_4)$: $d(t_1, t_2) + d(t_3, t_4)$.
Prospective cost after the 2-opt move: $d(t_1, t_4) + d(t_2, t_3)$.
So, a 2-opt move will lead to an improvement only if:
$$d(t_1, t_4) + d(t_2, t_3) < d(t_1, t_2) + d(t_3, t_4)$$
```cpp
// Finds the best improving 2-opt move over all pairs of tour edges; applies
// it and returns true, or returns false if the tour is already 2-opt optimal.
// Uses globals: n (number of cities), v (current tour), tot (current cost).
bool try_2_opt()
{
    int t1, t2, t3, t4;
    int pos1 = -1, pos2 = -1;
    double now, poss, diff;
    double max_diff = -1;
    for (t2 = 0; t2 < n; t2++)
    {
        // when t2 == 0, skip t4 == n - 1 so the two removed edges are distinct
        for (t4 = t2 + 1; t4 < n - (t2 == 0); t4++)
        {
            t1 = ((t2 - 1) + n) % n;                         // predecessor of t2
            t3 = (t4 + 1) % n;                               // successor of t4
            now = dist(v[t1], v[t2]) + dist(v[t3], v[t4]);   // current cost
            poss = dist(v[t1], v[t4]) + dist(v[t2], v[t3]);  // cost after move
            diff = now - poss;
            if (diff > 0 && diff > max_diff)                 // best improvement
            {
                max_diff = diff;
                pos1 = t2;
                pos2 = t4;
            }
        }
    }
    if (pos1 == -1 || pos2 == -1)
        return false;                  // no improving move: local optimum
    tot -= max_diff;                   // update the tour cost
    modify_vector(pos1, pos2);         // reverse the segment [pos1, pos2]
    return true;
}
```
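The body of `modify_vector` is not shown above; a minimal sketch of the segment reversal it performs, written here as a pure function (illustrative name `reverse_segment`, assuming `pos1 <= pos2`):

```cpp
#include <algorithm>
#include <vector>

// A 2-opt move that deletes edges (t1, t2) and (t3, t4) and keeps the tour
// cyclic is realised by reversing the tour segment between positions pos1
// and pos2, inclusive.
std::vector<int> reverse_segment(std::vector<int> tour, int pos1, int pos2) {
    std::reverse(tour.begin() + pos1, tour.begin() + pos2 + 1);
    return tour;
}
```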
Without the 2-opt heuristic, observe the large number of edges crossing each other
2-opt succeeds in removing all edges crossing each other
Dataset of 200 vertices without 2-opt
Dataset of 200 vertices with 2-opt
The key feature of simulated annealing is that it provides a means to escape local optima by allowing hill-climbing moves (i.e., moves which worsen the objective function value) in hopes of finding a global optimum.
It imitates the annealing process used in metallurgy. Generally, when a substance goes through the process of annealing, it is first heated until it reaches its fusion point to liquefy it, and then slowly cooled down in a controlled manner until it solidifies again. The final properties of the substance depend strongly on the cooling schedule applied: if it cools down quickly, the resulting substance will be easily broken due to an imperfect structure; if it cools down slowly, the resulting structure will be well organized and strong.
When solving an optimization problem using simulated annealing, the structure of the substance represents a codified solution of the problem, and the temperature is used to determine how and when new solutions are perturbed and accepted. The algorithm is basically a three-step process: perturb the solution, evaluate its quality, and accept the new solution if it is better than the current one.
Example taken from the paper 'Practical Considerations of Simulated Annealing' by Ledesma and Sanchez.
The method of simulated annealing can be easily understood by observing the above figure which shows a hermetic box with an internal uneven surface (with peaks and valleys), and a ball resting on this surface. The objective is to move the ball to a position as close as possible to the bottom of the box. At the beginning of the process the temperature is high and strong perturbations are applied to the box allowing the ball to easily jump over high peaks in search of the bottom of the box. Because of the high energy applied, it is possible for the ball to go down or up easily. As time goes by, the temperature decreases and the ball has less energy to jump high peaks. When the temperature has decreased to the point when the ball is able to jump only very small peaks, the ball should be hopefully very close to the bottom of the box and the process is completed.
Annealing schedule.
Determining the initial value of T requires experimentation. I first tried to determine the range of values of ∆E that would be encountered from move to move. I observed the values of
There are two typical cooling schedules in the literature: exponential and linear.
In a typical exponential cooling schedule, the process spends little time at high temperatures, and as the temperature decreases, more and more time is spent at each temperature, allowing the algorithm to refine the solution found at high temperatures.
With linear cooling, the temperature decreases linearly as time increases; thus, the algorithm spends the same amount of time at each temperature.
I have used an exponential decrease in temperature, with `new_temp = TFACTR * current_temp`.
After much experimentation, I set TFACTR as 0.9.
How exactly does T govern the acceptance of a particular move?
Let S be the original candidate solution and R its newly tweaked child.
If R is better than S, we'll always replace S with R as usual.
But if R is worse than S, we may still replace S with R with a certain probability:
$$P = e^{\frac{Quality(R) - Quality(S)}{T}}$$
This equation is interesting in the following ways.
- Note that the fraction is negative because R is worse than S. If R is much worse than S, the fraction is large in magnitude, and so the probability is close to 0. If R is very close to S in quality, the probability is close to 1. Thus if R isn't much worse than S, we'll still select R with a reasonable probability.
- If the current prospective neighbour is worse and T is close to 0, the fraction is again large in magnitude, and so the probability of accepting a degrading move is close to 0. So, the probability of making a degrading move towards the end of the run is practically null.
- If the current prospective neighbour is worse and T is high, the probability of making a degrading move is close to 1.
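The bullets above can be sketched as a small acceptance test. For a cost-minimization problem such as the TSP, `Quality(R) - Quality(S)` is simply `-(delta)` where `delta` is the cost increase of the move; the uniform draw `u` is passed in explicitly here so the rule is deterministic (names are illustrative):

```cpp
#include <cmath>

// Metropolis-style acceptance rule. `delta` = new cost - old cost;
// `u` is a uniform random number in [0, 1).
// Improving moves are always taken; degrading moves are taken with
// probability exp(-delta / T), which vanishes as T -> 0 and approaches
// 1 when T is large, matching the bullets above.
bool accept_move(double delta, double T, double u) {
    if (delta <= 0) return true;        // R better than S: always accept
    return u < std::exp(-delta / T);    // hill-climbing move: Boltzmann test
}
```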
In cases specific to the TSP assignment on Coursera, I found Δ(cost) to be quite closely related to the order of magnitude of the coordinates, e.g., T will be higher if the coordinates are ~
A thorough mention of this can already be found in the LS notes.