I am seeing some very strange behaviour from VectorAssembler
and I was wondering if anyone else has seen this.
My scenario is pretty straightforward.
There is nothing strange about the output. Your vector seems to have lots of zero elements, so Spark
used its sparse representation.
To explain further:
It seems your vector has 18 elements (its size, or dimension).
The entries at indices [0, 1, 6, 9, 14, 17]
are non-zero, and their values are, in order, [17.0, 15.0, 3.0, 1.0, 4.0, 2.0]. Every other entry is 0.0.
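To make the reading above concrete, here is a minimal plain-Python sketch that expands the (size, indices, values) triple back into the full vector. The helper name `sparse_to_dense` is hypothetical and only mimics what the sparse encoding means. In PySpark itself, `pyspark.ml.linalg.SparseVector` has a `toArray()` method that returns the dense values as a NumPy array.

```python
def sparse_to_dense(size, indices, values):
    """Expand a (size, indices, values) sparse triple into a dense list.

    Hypothetical helper for illustration; not part of the Spark API.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    dense = [0.0] * size  # every position starts out as zero
    for i, v in zip(indices, values):
        dense[i] = v  # place each stored non-zero value at its index
    return dense


dense = sparse_to_dense(18, [0, 1, 6, 9, 14, 17], [17.0, 15.0, 3.0, 1.0, 4.0, 2.0])
print(dense)
# [17.0, 15.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0]
```

Only the six listed positions hold non-zero values; the other twelve are implied zeros.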
A sparse vector representation stores only the non-zero entries (their indices and values), which saves memory and can make computations faster. More on sparse representation here.
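As a rough illustration of why skipping zeros helps computation, this plain-Python sketch computes a dot product that visits only the stored entries. The function `sparse_dot` and the all-ones `weights` vector are hypothetical, chosen only for the example. This is not Spark's internal implementation.

```python
def sparse_dot(indices, values, dense_weights):
    """Dot product that multiplies only the stored (non-zero) entries.

    Hypothetical helper for illustration; not part of the Spark API.
    """
    return sum(v * dense_weights[i] for i, v in zip(indices, values))


indices = [0, 1, 6, 9, 14, 17]
values = [17.0, 15.0, 3.0, 1.0, 4.0, 2.0]
weights = [1.0] * 18  # hypothetical weights, all ones for illustration

result = sparse_dot(indices, values, weights)  # 6 multiplications instead of 18
print(result)  # 42.0
```

The savings grow with the fraction of zeros: a vector with millions of dimensions but only a handful of non-zero entries costs only a handful of multiplications.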
You can, of course, convert that sparse representation to a dense one, but it comes at a cost: every zero is then stored and processed explicitly.
If you are interested in getting feature importance, I advise you to take a look at this.