Creating sets of similar elements in a 2D array

渐次进展 2020-12-28 23:51

I am trying to solve a problem based on a 2D array. The array contains elements of different kinds (3 possible kinds in total). Let's assume the kinds are X, Y,

4 Answers
  • 2020-12-29 00:27

    In your situation, I would rely on at least two different arrays:

    Array1 (sets) -> all the sets and the associated list of points. Main indices: set names.
    Array2 (setsDef) -> type of each set ("X", "Y" or "Z"). Main indices: type names.
    

    You could also create further supporting arrays, for example one holding the minimum/maximum X/Y values of each set, to speed up the analysis (although it would be pretty quick anyway, as shown below).
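
    The min/max idea above can be sketched as a bounding-box pre-check. This is only an illustration in Python (the question names no language); the names `bounding_boxes` and `may_touch` are hypothetical, and the one-cell margin assumes that diagonal neighbours count as adjacent.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (min_x, max_x, min_y, max_y)


def bounding_boxes(sets: Dict[str, List[Point]]) -> Dict[str, Box]:
    """Map each set name to the bounding box of its points."""
    boxes = {}
    for name, points in sets.items():
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        boxes[name] = (min(xs), max(xs), min(ys), max(ys))
    return boxes


def may_touch(box: Box, point: Point) -> bool:
    """True if `point` lies within one cell of the box.

    Only sets that pass this cheap test need a full point-by-point scan.
    """
    min_x, max_x, min_y, max_y = box
    x, y = point
    return min_x - 1 <= x <= max_x + 1 and min_y - 1 <= y <= max_y + 1


# Same two sets as in the sample further down.
sets = {"Set1": [(0, 0), (0, 1)], "Set2": [(0, 2), (1, 2)]}
boxes = bounding_boxes(sets)
```

    The boxes have to be updated whenever a point is added to a set, which is cheap (four comparisons per insertion).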

    You don't mention any programming language, but I include sample code (C#) because it is the best way to explain the point. Please don't take it as a suggestion of the best way to proceed (personally, I don't like Dictionaries/Lists too much, although I think they provide a good, visual way to show an algorithm, even for inexperienced C# users). This code only intends to show a data storage/retrieval approach; how to achieve optimal performance depends on the target language and on further issues (e.g., dataset size), and is something you have to take care of.

    //Requires: using System.Collections.Generic; using System.Drawing;
    Dictionary<string, List<Point>> sets = new Dictionary<string, List<Point>>(); //All sets and the associated list of points
    Dictionary<string, List<string>> setsDef = new Dictionary<string, List<string>>(); //For each type ("X", "Y" or "Z"), the names of the sets of that type
    
    List<Point> temp0 = new List<Point>();
    temp0.Add(new Point(0, 0));
    temp0.Add(new Point(0, 1));
    sets.Add("Set1", temp0);
    List<String> tempX = new List<string>();
    tempX.Add("Set1");
    
    temp0 = new List<Point>();
    temp0.Add(new Point(0, 2));
    temp0.Add(new Point(1, 2));
    sets.Add("Set2", temp0);
    List<String> tempY = new List<string>();
    tempY.Add("Set2");
    
    setsDef.Add("X", tempX);
    setsDef.Add("Y", tempY);
    
    
    //-------- TEST
    //I have a new Y value which is 2,2
    Point targetPoint = new Point(2, 2);
    string targetSet = "Y";
    
    //I go through all the Y sets
    List<string> targetSets = setsDef[targetSet];
    
    bool alreadyThere = false;
    Point candidatePoint;
    string foundSet = "";
    foreach (string set in targetSets) //Going through all the set names stored in setsDef for targetSet
    {
        List<Point> curPoints = sets[set];
        foreach (Point point in curPoints) //Going through all the points in the given set
        {
            if (point == targetPoint)
            {
                //Already-stored point and thus the analysis will be stopped
                alreadyThere = true;
                break;
            }
            else if (isSurroundingPoint(point, targetPoint))
            {
                //A close point was found and thus the set where the targetPoint has to be stored
                candidatePoint = point;
                foundSet = set;
                break;
            }
        }
        if (alreadyThere || foundSet != "")
        {
            break;
        }
    }
    
    if (!alreadyThere)
    {
        if (foundSet != "")
        {
            //Point added to an existing set
            List<Point> curPoints = sets[foundSet];
            curPoints.Add(targetPoint);
            sets[foundSet] = curPoints;
        }
        else
        {
            //A new set has to be created
            string newName = "Set" + (sets.Count + 1); //Unique name; a fixed name would make sets.Add throw on the second new set
            temp0 = new List<Point>();
            temp0.Add(targetPoint);
            sets.Add(newName, temp0);
    
            targetSets.Add(newName);
            setsDef[targetSet] = targetSets;
        }
    }
    

    Here, isSurroundingPoint is a function that checks whether two points are adjacent to each other (horizontally, vertically or diagonally):

    private bool isSurroundingPoint(Point point1, Point point2)
    {
        bool isSurrounding = false;
        if (point1.X == point2.X || point1.X == point2.X + 1 || point1.X == point2.X - 1)
        {
            if (point1.Y == point2.Y || point1.Y == point2.Y + 1 || point1.Y == point2.Y - 1)
            {
                isSurrounding = true;
            }
        }
        return isSurrounding;
    }
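
    For readers who do not use C#, the same storage/retrieval idea can be condensed into a short Python sketch. It is not a translation of the code above: the names `is_surrounding` and `add_point` are hypothetical, and it checks every set of the given type for an existing copy of the point before looking for a neighbouring set.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def is_surrounding(p: Point, q: Point) -> bool:
    """True if q is p itself or one of its 8 neighbours."""
    return abs(p[0] - q[0]) <= 1 and abs(p[1] - q[1]) <= 1


def add_point(sets: Dict[str, List[Point]],
              sets_def: Dict[str, List[str]],
              kind: str,
              target: Point) -> str:
    """Store `target` in a set of type `kind` and return that set's name."""
    names = sets_def.setdefault(kind, [])
    # 1. The point may already be stored in one of the sets of this type.
    for name in names:
        if target in sets[name]:
            return name
    # 2. Otherwise join the first set that contains an adjacent point.
    for name in names:
        if any(is_surrounding(p, target) for p in sets[name]):
            sets[name].append(target)
            return name
    # 3. No neighbour found: start a new set under a fresh name.
    new_name = f"Set{len(sets) + 1}"
    sets[new_name] = [target]
    names.append(new_name)
    return new_name


sets = {"Set1": [(0, 0), (0, 1)], "Set2": [(0, 2), (1, 2)]}
sets_def = {"X": ["Set1"], "Y": ["Set2"]}
joined = add_point(sets, sets_def, "Y", (2, 2))   # (1, 2) is adjacent
created = add_point(sets, sets_def, "X", (5, 5))  # no X point nearby
```

    Like the C# sample, this sketch does not merge two existing sets when the new point happens to connect them.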
    
  • 2020-12-29 00:39

    [EDIT 5/8/2013: Fixed time complexity. (O(a(n)) is essentially constant time!)]

    In the following, by "connected component" I mean the set of all positions that are reachable from each other by a path that allows only horizontal, vertical or diagonal moves between neighbouring positions having the same kind of element. E.g. your example {(0,1), (1,1), (2,2), (2,3), (1,4)} is a connected component in your example input. Each position belongs to exactly one connected component.

    We will build a union/find data structure that gives every position (x, y) a numeric "label", such that any two positions (x, y) and (x', y') have the same label if and only if they belong to the same component. In particular, this data structure supports three operations:

    • set(x, y, i) will set the label for position (x, y) to i.
    • find(x, y) will return the label assigned to the position (x, y).
    • union(Z), for some set of labels Z, will combine all labels in Z into a single label k, in the sense that future calls to find(x, y) on any position (x, y) that previously had a label in Z will now return k. (In general k will be one of the labels already in Z, though this is not actually important.) union(Z) also returns the new "master" label, k.

    If there are n = width * height positions in total, this can be done in O(n*a(n)) time, where a() is the extremely slow-growing inverse Ackermann function. For all practical input sizes, this is the same as O(n).
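
    A minimal sketch of such a structure in Python is given below, with path compression and union by size, which is the combination that yields the inverse-Ackermann bound. It is an assumption-laden illustration rather than the exact interface described above: labels are plain integers, `make_label` is a hypothetical helper, and the mapping from a position (x, y) to its label (the set(x, y, i) operation) is assumed to be kept in a separate dictionary.

```python
class UnionFind:
    """Disjoint sets over integer labels 0, 1, 2, ..."""

    def __init__(self):
        self.parent = []  # parent[i] == i means i is a "master" label
        self.size = []    # number of labels merged under each master

    def make_label(self) -> int:
        """Create a fresh label that is its own master."""
        self.parent.append(len(self.parent))
        self.size.append(1)
        return len(self.parent) - 1

    def find(self, i: int) -> int:
        """Return the master label of i, compressing the path on the way."""
        root = i
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[i] != root:
            self.parent[i], i = root, self.parent[i]
        return root

    def union(self, labels) -> int:
        """Merge all given labels and return the new master label k."""
        roots = {self.find(label) for label in labels}
        k = max(roots, key=lambda r: self.size[r])  # union by size
        for r in roots:
            if r != k:
                self.parent[r] = k
                self.size[k] += self.size[r]
        return k
```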

    Notice that whenever two positions are adjacent to each other, exactly one of four cases applies:

    1. One is above the other (connected by a vertical edge)
    2. One is to the left of the other (connected by a horizontal edge)
    3. One is above and to the left of the other (connected by a \ diagonal edge)
    4. One is above and to the right of the other (connected by a / diagonal edge)

    We can use the following pass to determine labels for each position (x, y):

    • Set nextLabel to 0.
    • For each row y in increasing order:
      • For each column x in increasing order:
        • Examine the W, NW, N and NE neighbours of (x, y). Let Z be the subset of these 4 neighbours that are of the same kind as (x, y).
        • If Z is the empty set, then we tentatively suppose that (x, y) starts a brand new component, so call set(x, y, nextLabel) and increment nextLabel.
        • Otherwise, call find(Z[i]) on each element of Z to find their labels, and call union() on this set of labels to combine them together. Assign the new label (the result of this union() call) to k, and then also call set(x, y, k) to add (x, y) to this component.
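
    The pass above can be sketched in Python as follows. The grid is assumed to be a list of equal-length strings, `label_components` is a hypothetical name, and the union/find part is reduced to a bare parent list with path halving to keep the example short (so it does not include union by size).

```python
def label_components(grid):
    """Return {(x, y): master label} for a grid given as a list of strings."""
    parent = []  # union/find forest over integer labels

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    label = {}
    for y, row in enumerate(grid):
        for x, kind in enumerate(row):
            # W, NW, N and NE neighbours were all visited earlier in the scan.
            roots = set()
            for nx, ny in ((x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)):
                if ny >= 0 and 0 <= nx < len(row) and grid[ny][nx] == kind:
                    roots.add(find(label[(nx, ny)]))
            if not roots:
                # Z is empty: tentatively start a brand new component.
                parent.append(len(parent))
                label[(x, y)] = len(parent) - 1
            else:
                # union(Z): merge all neighbouring labels into one master k.
                k = min(roots)
                for r in roots:
                    parent[r] = k
                label[(x, y)] = k
    # Resolve every position to its final master label.
    return {pos: find(lab) for pos, lab in label.items()}


grid = ["xxy",
        "yxy",
        "xyy"]
labels = label_components(grid)
```

    In this small grid the y at (0, 1) first receives its own label and is only merged with the other y positions when (1, 2) is reached, which is why the final `find` pass is needed.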

    After this, calling find(x, y) on any position (x, y) effectively tells you which component it belongs to. If you want to be able to quickly answer queries of the form "Which positions belong to the connected component containing position (x, y)?" then create a hashtable of lists posInComp and make a second pass over the input array, appending each (x, y) to the list posInComp[find(x, y)]. This can all be done in linear time and space. Now to answer a query for some given position (x, y), simply call lab = find(x, y) to find that position's label, and then list the positions in posInComp[lab].

    To deal with "too-small" components, just look at the size of posInComp[lab]. If it's 1 or 2, then (x, y) does not belong to any "large-enough" component.
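
    The second pass and the size check might look like this in Python. The `labels` dictionary stands in for the result of calling find(x, y) on every position and is hand-written here; `positions_by_component`, `component_of` and the `min_size` parameter are hypothetical names.

```python
from collections import defaultdict


def positions_by_component(labels):
    """Build posInComp: master label -> list of positions carrying it."""
    pos_in_comp = defaultdict(list)
    for pos, lab in labels.items():
        pos_in_comp[lab].append(pos)
    return pos_in_comp


def component_of(pos, labels, pos_in_comp, min_size=3):
    """Positions in the component containing `pos`, or [] if it is too small."""
    members = pos_in_comp[labels[pos]]
    return members if len(members) >= min_size else []


# Hand-written example labels: three positions in component 0, one in component 1.
labels = {(0, 0): 0, (1, 0): 0, (1, 1): 0, (2, 0): 1}
pos_in_comp = positions_by_component(labels)
big = component_of((1, 1), labels, pos_in_comp)
small = component_of((2, 0), labels, pos_in_comp)
```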

    Finally, all this work effectively takes linear time, so it will be lightning fast unless your input array is huge. So it's perfectly reasonable to recompute it from scratch after modifying the input array.

  • 2020-12-29 00:40

    I wrote something to find objects of just one type for another SO question. The example below adds two more types. Any re-iteration would examine the whole list again. The idea is to process the list of points for each type separately. The function solve groups any connected points and removes them from the list before enumerating the next group. areConnected checks the relationship between the points' coordinates since we are only testing points of one type. In this generalized version, the types (a b c) could be anything (strings, numbers, tuples, etc.), as long as they match.

    btw - here's a link to a JavaScript example of j_random_hacker's terrific algorithm: http://jsfiddle.net/groovy/fP5kP/

    Haskell code:

    import Data.List (elemIndices, delete) 
    
    example = ["xxyyyz"
              ,"xyyzzz"
              ,"yxxzzy"
              ,"yyxzxy"
              ,"xyzxyy"
              ,"xzxxzz"
              ,"xyzyyz"
              ,"xyzxyy"]
    
    objects a b c ws = [("X",solve xs []),("Y",solve ys []),("Z",solve zs [])] where
      mapIndexes s = 
        concatMap (\(y,xs)-> map (\x->(y,x)) xs) $ zip [0..] (map (elemIndices s) ws)
      [xs,ys,zs] = map mapIndexes [a,b,c]
      areConnected (y,x) (y',x') = abs (x-x') < 2 && abs (y-y') < 2
      solve []     r = r
      solve (x:xs) r =
        let r' = solve' xs [x]
        in solve (foldr delete xs r') (if null (drop 2 r') then r else r':r)
      solve' vs r =
        let ys = filter (\y -> any (areConnected y) r) vs
        in if null ys then r else solve' (foldr delete vs ys) (ys ++ r)
    

    Sample output:

    *Main> objects 'x' 'y' 'z' example
    [("X",[[(7,0),(6,0),(5,0),(4,0)]
          ,[(3,4),(5,2),(5,3),(4,3),(2,2),(3,2),(2,1),(0,1),(1,0),(0,0)]])
    ,("Y",[[(7,5),(6,4),(7,4),(6,3)],[(4,4),(4,5),(3,5),(2,5)]
          ,[(4,1),(3,0),(3,1),(0,4),(2,0),(0,3),(1,1),(1,2),(0,2)]])
    ,("Z",[[(5,5),(6,5),(5,4)]
          ,[(7,2),(6,2),(5,1),(4,2),(3,3),(1,3),(2,3),(2,4),(1,4),(1,5),(0,5)]])]
    (0.02 secs, 1560072 bytes)
    
  • 2020-12-29 00:42

    You may want to check out region growing algorithms, which are used for image segmentation. These algorithms start from a seed pixel and grow a contiguous region where all the pixels in the region have some property.

    In your case, adjacent 'pixels' are in the same image segment if they have the same label (i.e., the same kind of element: X, Y or Z).
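
    A minimal region-growing sketch in Python, assuming the grid is a list of strings and that diagonal neighbours count as adjacent (`grow_region` is a hypothetical name). It grows a single region from one seed with a breadth-first search; segmenting the whole array would mean repeating this from every position that has not yet been assigned to a region.

```python
from collections import deque


def grow_region(grid, seed):
    """Return the set of (x, y) cells connected to `seed` that share its kind."""
    sx, sy = seed
    kind = grid[sy][sx]
    region = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (nx, ny) in region:
                    continue  # already part of the region (includes the cell itself)
                inside = 0 <= ny < len(grid) and 0 <= nx < len(grid[ny])
                if inside and grid[ny][nx] == kind:
                    region.add((nx, ny))
                    queue.append((nx, ny))
    return region


grid = ["xxy",
        "yxy",
        "xyy"]
x_region = grow_region(grid, (0, 0))
```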
