
Bug in expanding cluster? #52

Open
carlos-verdes opened this issue Sep 1, 2016 · 6 comments

@carlos-verdes

Hi, in GDBScan the comment on the last line of the expand method says "if the neighbour point is a cluster"... but in the if clause the point is evaluated against the current neighbourhood, not against its newly fetched neighbours.

    neighbours.foldLeft(neighbours) {
      case (neighbourhood, neighbour @ Point(row)) =>
        // if not visited yet, create a new neighbourhood
        val newNeighbours = if (!(visited contains neighbour)) {
          visited add neighbour
          getNeighbours(neighbour, points.filterNot(_.row == neighbour.row))
        } else {
          Seq.empty
        }
        // Add to cluster if neighbour point isn't assigned to a cluster yet
        if (!(clustered contains neighbour)) {
          cluster add neighbour
          clustered add neighbour
        }
        // if the neighbour point is a cluster, join the neighbourhood
        if (isCorePoint(neighbour, neighbourhood)) neighbourhood ++ newNeighbours else neighbourhood
    }

I think the last line should be:

        // if the neighbour point is a cluster, join the neighbourhood
        if (isCorePoint(neighbour, newNeighbours)) neighbourhood ++ newNeighbours else neighbourhood
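
To make the difference concrete, here is a rough, self-contained sketch (not the real GDBSCAN types: Point, getNeighbours, isCorePoint, epsilon and minPoints below are simplified one-dimensional stand-ins) that replays the fold above with the reference set of the core-point check as a parameter, so both variants can be compared on a tiny dataset:

    import scala.collection.mutable

    object ExpandCheckSketch extends App {

      final case class Point(row: Long, value: Double)

      val epsilon   = 0.5
      val minPoints = 4

      def getNeighbours(p: Point, points: Seq[Point]): Seq[Point] =
        points.filter(q => math.abs(q.value - p.value) <= epsilon)

      def isCorePoint(p: Point, neighbours: Seq[Point]): Boolean =
        neighbours.size >= minPoints

      // The fold from the snippet above, with the reference set of the
      // core-point check made a parameter so both variants can be compared.
      def expand(seed: Point, points: Seq[Point], useNewNeighbours: Boolean): Seq[Point] = {
        val visited    = mutable.Set(seed)
        val neighbours = getNeighbours(seed, points.filterNot(_.row == seed.row))
        neighbours.foldLeft(neighbours) {
          case (neighbourhood, neighbour) =>
            // if not visited yet, fetch the neighbour's own neighbours
            val newNeighbours =
              if (!(visited contains neighbour)) {
                visited add neighbour
                getNeighbours(neighbour, points.filterNot(_.row == neighbour.row))
              } else Seq.empty
            val reference = if (useNewNeighbours) newNeighbours else neighbourhood
            if (isCorePoint(neighbour, reference)) neighbourhood ++ newNeighbours
            else neighbourhood
        }
      }

      // Dense group around 0.0, a border point at 0.5 (only 3 neighbours, so
      // not core), and a far point at 1.0 reachable only via the border point.
      val points = Seq(0.0, -0.2, -0.1, 0.1, 0.5, 1.0)
        .zipWithIndex.map { case (v, i) => Point(i.toLong, v) }

      val current  = expand(points.head, points, useNewNeighbours = false)
      val proposed = expand(points.head, points, useNewNeighbours = true)

      println(current.exists(_.value == 1.0))  // true  -> far point leaks in
      println(proposed.exists(_.value == 1.0)) // false -> expansion stops at the border point
    }

With the current check, the far point at 1.0 leaks into the returned neighbourhood through the non-core border point at 0.5; with the proposed check it does not. Something along these lines could probably be turned into a failing unit test.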

If you agree, I can open a PR.

@carlos-verdes
Author

@jasonbaldridge @gabeos could you please take a look at this?

If you have any doubts, please don't hesitate to ask.

@muuki88
Contributor

muuki88 commented Sep 25, 2016

It's been a very long time since I wrote this. If you could provide a failing test that verifies your assumption, accepting a PR would be a lot easier.

@btafel

btafel commented May 13, 2020

Hey @carlos-verdes! I know this is quite old, but... did you manage to verify that issue?

I'm running into another issue now and it might be related.

At some point, the append neighbourhood ++ newNeighbours throws

java.lang.IllegalArgumentException
            at scala.collection.immutable.VectorPointer$class.gotoFreshPosWritable0(Vector.scala:1195)
            at scala.collection.immutable.Vector.gotoFreshPosWritable0(Vector.scala:62)
            at scala.collection.immutable.VectorPointer$class.gotoFreshPosWritable1(Vector.scala:1204)
......

Scala: 2.11

From what I can tell from looking at the Vector code, it is running out of 'levels'.

Any clues?? @muuki88

Thanks!

@muuki88
Contributor

muuki88 commented May 13, 2020

Thanks for reporting this @btafel 😄. Unfortunately I can't help in any regard, as I haven't used this code for 7 years now. Also, I mostly work on Scala 2.13 projects.

@btafel

btafel commented May 13, 2020

Lol, that makes sense! I know it is old (very old), but I have to say that it works :) at least in most cases. I was inspired by this article https://www.oreilly.com/content/clustering-geolocated-data-using-spark-and-dbscan/ and got it running smoothly.

I'm also trying another approach using UserDefinedAggregateFunction and Smile.
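
In case it helps, the Smile side of that approach could look roughly like this (a sketch only: it assumes Smile 2.x's Java API and made-up parameter values, and leaves out the Spark UserDefinedAggregateFunction that would collect the points for each group):

    import smile.clustering.DBSCAN

    object SmileDbscanSketch extends App {
      // Toy 2D points; with the UDAF approach these would be the rows
      // aggregated for one group.
      val data: Array[Array[Double]] = Array(
        Array(0.0, 0.0), Array(0.1, 0.1), Array(0.2, 0.0),
        Array(5.0, 5.0), Array(5.1, 5.1), Array(5.2, 5.0)
      )

      // minPts and radius are arbitrary values for this toy data.
      val model = DBSCAN.fit(data, 2, 0.5)

      println(s"clusters found: ${model.k}")
      println(s"labels: ${model.y.mkString(", ")}")
    }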

@muuki88
Contributor

muuki88 commented May 13, 2020

Wow! I'm really happy that this is still in use and useful to others 😍
Thanks for the pointers to other libraries 👍
