You are going on a one-way indirect flight trip that includes an unknown, very large number of transfers.
Note that if the task were only to determine the source and destination airports (instead of reconstructing the whole trip), the puzzle would probably become more interesting.
Namely, assuming that airport codes are given as integers, the source and destination airports can be determined using O(1) passes of the data and O(1) additional memory (i.e. without resorting to hashtables, sorting, binary search, and the like).
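One way this could be done (a minimal sketch, not necessarily the method meant above) is with XOR over the integer codes. It assumes a single loop-free chain, so every intermediate airport appears exactly once as src and once as dst. The names `Ticket` and `findEndpoints` are made up for illustration. The first pass XORs all codes, which leaves source ^ destination. The second pass looks only at codes that have one chosen differing bit set, which isolates one endpoint and tells whether it was a src or a dst.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Ticket = std::pair<std::uint32_t, std::uint32_t>; // (src, dst), hypothetical layout

// Returns {source, destination}. Assumes a single loop-free chain of tickets.
std::pair<std::uint32_t, std::uint32_t> findEndpoints(const std::vector<Ticket>& tickets) {
    // Pass 1: every intermediate airport cancels out, leaving source ^ destination.
    std::uint32_t all = 0;
    for (const auto& t : tickets) all ^= t.first ^ t.second;

    // Lowest bit in which source and destination differ.
    std::uint32_t bit = all & (~all + 1);

    // Pass 2: restrict to codes that have this bit set. Intermediates still
    // cancel, so x ends up being the one endpoint that has the bit set.
    // balance is +1 if that endpoint occurred as a src, -1 if as a dst.
    std::uint32_t x = 0;
    long long balance = 0;
    for (const auto& t : tickets) {
        if (t.first & bit)  { x ^= t.first;  ++balance; }
        if (t.second & bit) { x ^= t.second; --balance; }
    }
    std::uint32_t y = all ^ x; // the other endpoint
    return balance > 0 ? std::make_pair(x, y) : std::make_pair(y, x);
}
```

This uses two passes and a handful of integer variables, regardless of the number of tickets.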
Of course, once you find the source, it also becomes a trivial matter to index and traverse the full route, but from that point on the whole thing will require at least O(n) additional memory anyway (unless you can sort the data in place, which, by the way, allows solving the original task in O(n log n) time with O(1) additional memory).
Summary: a single-pass algorithm is given below (i.e., not just linear, but one that looks at each ticket exactly once, which of course is the optimal number of visits per ticket). I put the summary first because there are many seemingly equivalent solutions and it would be hard to spot why I added another one. :)
I was actually asked this question in an interview. The concept is extremely simple: each ticket is a singleton list, with conceptually two elements, src and dst.
We index each such list in a hashtable using its first and last elements as keys, so we can find in O(1) whether a list starts or ends at a particular element (airport). For each ticket, when we see that it starts where another list ends, we just link the lists (O(1)). Similarly, if it ends where another list starts, that is another list join. Of course, when we link two lists, we basically destroy the two and obtain one. (The chain of N tickets will be constructed after N-1 such links.)
Care is needed to maintain the invariant that the hashtable keys are exactly the first and last elements of the remaining lists.
All in all, O(N).
And yes, I answered that on the spot :)
Edit: Forgot to add an important point. Everyone mentions two hashtables, but one does the trick as well, because the algorithm's invariant includes that at most one ticket list starts or ends in any single city (if there are two, we immediately join the lists at that city, and remove that city from the hashtable). Asymptotically there is no difference, it's just simpler this way.
Edit 2: Also of interest is that, compared to solutions using 2 hashtables with N entries each, this solution uses one hashtable with at most N/2 entries (which happens if we see the tickets in an order of, say, 1st, 3rd, 5th, and so on). So this uses about half the memory as well, apart from being faster.
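To make the list-joining idea concrete, here is a rough C++ sketch under my own assumptions: the route is a single loop-free chain, airports are strings, and each partial chain is a `std::list` shared by the (at most two) hashtable keys that point at it. The names `Chain`, `ends` and `reconstruct` are invented for this illustration, and the bookkeeping may differ from what was described above.

```cpp
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Chain = std::list<std::string>;

// Single pass over the tickets; one hashtable keyed by the endpoints of the
// partial chains built so far. Assumes a single loop-free route.
std::vector<std::string> reconstruct(
        const std::vector<std::pair<std::string, std::string>>& tickets) {
    std::unordered_map<std::string, std::shared_ptr<Chain>> ends;
    for (const auto& t : tickets) {
        auto chain = std::make_shared<Chain>(Chain{t.first, t.second});

        // A chain already touching t.first must end there: glue the ticket onto it.
        auto it = ends.find(t.first);
        if (it != ends.end()) {
            auto left = it->second;
            ends.erase(it);                      // t.first is now interior
            chain->pop_front();                  // drop the duplicated airport
            left->splice(left->end(), *chain);   // O(1) list join
            chain = left;
        }

        // A chain already touching t.second must start there: glue it on the right.
        it = ends.find(t.second);
        if (it != ends.end()) {
            auto right = it->second;
            ends.erase(it);                      // t.second is now interior
            right->pop_front();
            chain->splice(chain->end(), *right); // O(1) list join
        }

        // Restore the invariant: keys are exactly the endpoints of live chains.
        ends[chain->front()] = chain;
        ends[chain->back()] = chain;
    }
    if (ends.empty()) return {};
    const Chain& c = *ends.begin()->second;
    return std::vector<std::string>(c.begin(), c.end());
}
```

For example, feeding it the tickets C->D, A->B, B->C in that order yields A, B, C, D. Malformed input (loops, or two tickets leaving the same airport) is not detected by this sketch.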
Let's forget the data structures and graphs for a moment.
First I need to point out that everybody made an assumption that there are no loops. If the route goes through one airport twice then it's a much larger problem.
But let's keep the assumption for now.
The input data is in fact an ordered set already. Every ticket is an element of the relation that introduces an order on the set of airports. (English is not my mother tongue, so these might not be the correct math terms.)
Every ticket holds information like this: airportX < airportY, so while doing one pass through the tickets an algorithm can recreate an ordered list starting from just any airport.
Now let's drop the "linear assumption". No order relation can be defined out of that kind of stuff. The input data has to be treated as production rules for a formal grammar, where the grammar's vocabulary is the set of airport names. A ticket like this:
src: A
dst: B
is in fact a pair of productions:
A->AB
B->AB
of which you can keep only one.
Now you have to generate every possible sentence, but you can use every production rule only once. The longest sentence that uses each of its productions only once is a correct solution.
I provide here a more general solution to the problem:
You can stop several times in the same airport, but you have to use every ticket exactly 1 time
You can have more than 1 ticket for each part of your trip.
Each ticket contains src and dst airport.
All the tickets you have are randomly sorted.
You forgot the original departure airport (very first src) and your destination (last dst).
My method returns a list of cities (vector) that contains all the specified cities, if such a chain exists, and an empty list otherwise. When there are several ways to travel the cities, the method returns the lexicographically smallest list.
#include <vector>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <set>
#include <map>
using namespace std;

struct StringPairHash
{
    size_t operator()(const pair<string, string> &p) const {
        return hash<string>()(p.first) ^ hash<string>()(p.second);
    }
};

void calcItineraryRec(const multimap<string, string> &cities, string start,
                      vector<string> &itinerary, vector<string> &res,
                      unordered_set<pair<string, string>, StringPairHash> &visited, bool &found)
{
    if (visited.size() == cities.size()) {
        found = true;
        res = itinerary;
        return;
    }
    if (!found) {
        auto pos = cities.equal_range(start);
        for (auto p = pos.first; p != pos.second; ++p) {
            if (visited.find({ *p }) == visited.end()) {
                visited.insert({ *p });
                itinerary.push_back(p->second);
                calcItineraryRec(cities, p->second, itinerary, res, visited, found);
                itinerary.pop_back();
                visited.erase({ *p });
            }
        }
    }
}

vector<string> calcItinerary(vector<pair<string, string>> &citiesPairs)
{
    if (citiesPairs.size() < 1)
        return {};
    multimap<string, string> cities;
    set<string> uniqueCities;
    for (auto entry : citiesPairs) {
        cities.insert({ entry });
        uniqueCities.insert(entry.first);
        uniqueCities.insert(entry.second);
    }
    for (const auto &startCity : uniqueCities) {
        vector<string> itinerary;
        itinerary.push_back(startCity);
        unordered_set<pair<string, string>, StringPairHash> visited;
        bool found = false;
        vector<string> res;
        calcItineraryRec(cities, startCity, itinerary, res, visited, found);
        if (res.size() - 1 == cities.size())
            return res;
    }
    return {};
}
Here is an example of usage:
int main()
{
    vector<pair<string, string>> cities = { {"Y", "Z"}, {"W", "X"}, {"X", "Y"}, {"Y", "W"}, {"W", "Y"} };
    vector<string> itinerary = calcItinerary(cities); // { "W", "X", "Y", "W", "Y", "Z" }
    // another route is possible {W Y W X Y Z}, but the route above is lexicographically smaller.
    cities = { {"Y", "Z"}, {"W", "X"}, {"X", "Y"}, {"W", "Y"} };
    itinerary = calcItinerary(cities); // empty, no way to travel all cities using each ticket exactly one time
}
It seems to me like a graph-based approach is best here.
Each airport is a node, each ticket is an edge. Let's make every edge undirected for now.
In the first stage you are building the graph: for each ticket, you look up the source and destination and build an edge between them.
Now that the graph is constructed, we know that it is acyclic and that there is a single path through it. After all, you only have tickets for trips you took, and you never visited the same airport twice.
In the second stage, you are searching the graph: pick any node, and initiate a search in both directions until you find you cannot continue. These are your source and destination.
If you need to specifically say which was the source and which was the destination, add a direction property to each edge (but keep it an undirected graph). Once you have the candidate source and destination, you can tell which is which based on the edge connected to them.
The complexity of this algorithm would depend on the time it takes to look up a particular node. If you could achieve O(1), then the time should be linear. You have N tickets, so it takes you O(N) steps to build the graph, and then O(N) to search and O(N) to reconstruct the path. Still O(N). An adjacency matrix will give you that.
If you can't spare the space, you could do a hash for the nodes, which would give you O(1) under optimal hashing and all that crap.
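A rough sketch of the hashed variant, with some liberties taken: instead of undirected edges carrying a direction property, it keeps two hash maps, `next` (src to dst) and `prev` (dst to src), which hold the same information. The function name `walkRoute` and the string airport codes are assumptions for the example, and the route is assumed to be a single loop-free path.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

std::vector<std::string> walkRoute(
        const std::vector<std::pair<std::string, std::string>>& tickets) {
    if (tickets.empty()) return {};

    // Stage 1: build the graph. Each airport has at most one outgoing
    // and one incoming edge, so a plain map per direction is enough.
    std::unordered_map<std::string, std::string> next, prev;
    for (const auto& t : tickets) {
        next[t.first] = t.second;
        prev[t.second] = t.first;
    }

    // Stage 2a: pick any node and walk backwards until nothing leads into it.
    std::string cur = tickets.front().first;
    for (auto it = prev.find(cur); it != prev.end(); it = prev.find(cur))
        cur = it->second;

    // Stage 2b: cur is the source; walk forwards to rebuild the whole path.
    std::vector<std::string> route{cur};
    for (auto it = next.find(cur); it != next.end(); it = next.find(cur)) {
        cur = it->second;
        route.push_back(cur);
    }
    return route;
}
```

With average O(1) hash lookups, both stages are O(N). If the input contained a loop, the backward walk would never terminate, so this really does rely on the no-loops assumption.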
This is the simple case of a single path state machine matrix. Sorry for the pseudo-code being in C# style, but it was easier to express the idea with objects.
First, construct a turnpike matrix. Read my description of what a turnpike matrix is (don't bother with the FSM answer, just the explanation of a turnpike matrix) at "What are some strategies for testing large state machines?".
However, the restrictions you describe make the case a simple single path state machine. It is the simplest state machine possible with complete coverage.
For a simple case of 5 airports,
vert nodes=src/entry points,
horiz nodes=dst/exit points.
     A1  A2  A3  A4  A5
A1           x
A2   x
A3               x
A4                   x
A5       x
Notice that for each row, as well as for each column, there should be no more than one transition.
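That row/column condition can be checked without building the matrix at all. A small sketch, assuming string airport codes (the function name `isSinglePathCandidate` is made up here): each airport may appear at most once as a src (row) and at most once as a dst (column).

```cpp
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// True if no airport is used twice as src and none is used twice as dst,
// i.e. every row and every column of the matrix holds at most one transition.
bool isSinglePathCandidate(
        const std::vector<std::pair<std::string, std::string>>& tickets) {
    std::unordered_set<std::string> srcSeen, dstSeen;
    for (const auto& t : tickets) {
        if (!srcSeen.insert(t.first).second) return false;  // row used twice
        if (!dstSeen.insert(t.second).second) return false; // column used twice
    }
    return true;
}
```

Note that this only checks the "one transition per row and column" property. It does not verify that the tickets form one connected path.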
To get the path of the machine, you would sort the matrix into
     A1  A2  A3  A4  A5
A2   x
A1           x
A3               x
A4                   x
A5       x
Or sort into a diagonal square matrix - an eigen vector of ordered pairs.
     A1  A2  A3  A4  A5
A2   x
A5       x
A1           x
A3               x
A4                   x
where the ordered pairs are the list of tickets:
a2:a1, a5:a2, a1:a3, a3:a4, a4:a5.
or in more formal notation,
<a2,a1>, <a5,a2>, <a1,a3>, <a3,a4>, <a4,a5>.
Hmmm .. ordered pairs huh? Smelling a hint of recursion in Lisp?
<a2,<a1,<a3,<a4,a5>>>>
There are two modes of the machine,
I am presuming your question is about trip reconstruction. So, you pick one ticket after another randomly from that pile of tickets.
We presume the ticket pile is of indefinite size.
      tak  mnx  cda
bom    0
daj         0
phi              0
where a 0 value denotes an unordered ticket. Let us define an unordered ticket as a ticket whose dst is not matched with the src of another ticket.
The next ticket picked, <mnx,kul>, finds a match: its src (mnx) equals the dst of an existing ticket.
      tak  mnx  cda  kul
bom    0
daj         1
phi              0
mnx                   0
At any moment you pick the next ticket, there is a possibility that it connects two sequential airports. If that happens, you create a cluster node out of those two nodes:
<bom,tak>, <daj,<mnx,kul>>
and the matrix is reduced,
      tak  cda  kul
bom    0
daj             L1
phi         0
where
L1 = <daj,<mnx,kul>>
which is a sublist of the main list.
Keep on picking the next random tickets.
      tak  cda  kul  svn  xml  phi
bom    0
daj             L1
phi         0
olm                   0
jdk                        0
klm                             0
Match either existent.dst to new.src
or existent.src to new.dst:
      tak  cda  kul  svn  xml
bom    0
daj             L1
olm                   0
jdk                        0
klm        L2
<bom,tak>, <daj,<mnx,kul>>, <<klm,phi>, cda>
The above topological exercise is for visual comprehension only. The following is the algorithmic solution.
The concept is to cluster ordered pairs into sublists to reduce the burden on the hash structures we will use to house the tickets. Gradually, there will be more and more pseudo-tickets (formed from merged matched tickets), each containing a growing sublist of ordered destinations. Finally, there will remain one single pseudo-ticket containing the complete itinerary vector in its sublist.
As you see, perhaps, this is best done with Lisp.
However, as an exercise of linked lists and maps ...
Create the following structures:
class Ticket:MapEntry<src, Vector<dst> >{
    src, dst;
    Vector<dst> dstVec; // sublist of mergers

    //constructor
    Ticket(src,dst){
        this.src=src;
        this.dst=dst;
        this.dstVec.append(dst);
    }
}

class TicketHash<x>{
    x -> TicketMapEntry;

    void add(Ticket t){
        super.put(t.x, t);
    }
}
So that effectively,
TicketHash<src>{
    src -> TicketMapEntry;

    void add(Ticket t){
        super.put(t.src, t);
    }
}

TicketHash<dst>{
    dst -> TicketMapEntry;

    void add(Ticket t){
        super.put(t.dst, t);
    }
}
TicketHash<dst> mapbyDst = hash of map entries(dst->Ticket), key=dst
TicketHash<src> mapbySrc = hash of map entries(src->Ticket), key=src
When a ticket is randomly picked from the pile,
void pickTicket(Ticket t){
    // does t.src exist in mapbyDst?
    // i.e. attempt to match src of next ticket to dst of an existent ticket.
    Ticket zt = dstExists(t);

    // if so, continue with the merged ticket and check
    // if it also matches at the other end.
    if (zt != null)
        t = zt;

    // attempt to match dst of next ticket to src of an existent ticket.
    if (srcExists(t) != null) return;

    // otherwise, if unmatched at that end, add (or re-add) the ticket
    else {
        // register t under its dst and under its src
        mapbyDst.add(t);
        mapbySrc.add(t);
    }
}
Check for existent dst:
Ticket dstExists(Ticket t){
    // find existing ticket whose dst matches t.src
    Ticket zt = mapbyDst.getEntry(t.src);
    if (zt == null) return null; // no match

    // an ordered pair is matched...
    // remove the matched ticket from mapbyDst while it is still keyed by its old dst
    mapbyDst.remove(zt);

    // Merge new ticket into existent ticket:
    // retain existent ticket and discard new ticket.
    // append sublist of new ticket to sublist of existent ticket
    zt.dstVec.join(t.dstVec); // join the two linked lists.
    zt.dst = t.dst;

    // re-add the merged ticket to mapbyDst under its new dst
    mapbyDst.add(zt);
    return zt;
}
Check for existent src:
Ticket srcExists(Ticket t){
    // find existing ticket whose src matches t.dst
    Ticket zt = mapbySrc.getEntry(t.dst);
    if (zt == null) return null; // no match

    // an ordered pair is matched...
    // remove the matched ticket from mapbySrc while it is still keyed by its old src
    mapbySrc.remove(zt);
    // if t was itself an existent (merged) ticket, drop its stale entry in mapbyDst
    mapbyDst.remove(t);

    // Merge new ticket into existent ticket:
    // retain existent ticket and discard new ticket.
    // prepend sublist of new ticket to sublist of existent ticket
    zt.dstVec.prepend(t.dstVec); // join the two linked lists.
    zt.src = t.src;

    // re-add the merged ticket to mapbySrc under its new src
    mapbySrc.add(zt);
    return zt;
}
I have a feeling the above has quite a few typos, but the concept should be right. If you find any typo, please help correct it.