The docs for an Embedding Layer in Keras say:
Turns positive integers (indexes) into dense vectors of fixed size. eg.
[[4], [20]]->
Mathematically, the difference is this:
An embedding layer performs select operation. In keras, this layer is equivalent to:
K.gather(self.embeddings, inputs) # just one matrix
A dense layer performs dot-product operation, plus an optional activation:
outputs = matmul(inputs, self.kernel) # a kernel matrix
outputs = bias_add(outputs, self.bias) # a bias vector
return self.activation(outputs) # an activation function
You can emulate an embedding layer with fully-connected layer via one-hot encoding, but the whole point of dense embedding is to avoid one-hot representation. In NLP, the word vocabulary size can be of the order 100k (sometimes even a million). On top of that, it's often needed to process the sequences of words in a batch. Processing the batch of sequences of word indices would be much more efficient than the batch of sequences of one-hot vectors. In addition, gather operation itself is faster than matrix dot-product, both in forward and backward pass.