I am developing a conference-style (many-to-many) application for video calls. The code is available on GitHub, but I do not have much node.js experience, hence I
The problem is that you're trying to use a single Peer Connection, but that will only work for a single connected party. You'll have to have an additional peer connection for each other party, and be able to associate websocket messages with users and a particular peer connection. You could do this yourself, or use a library like SimpleWebRTC that manages multiple user sessions for you.
Edit:
Here is a very simplified explanation of how SimpleWebRTC works. It is one option for creating a mesh network of connected clients (every client is connected to every other client):
The critical difference between this architecture and yours is that you create a single peer connection, whereas you need to create, store, and track a collection of peer connections, and you have to map your websocket messages to particular peers.
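To make the "store and track" part concrete, here is a minimal sketch of one way to keep one connection per remote user and route incoming signaling messages to it. This is not SimpleWebRTC's actual API: `PeerRegistry`, `PeerLike`, `SignalMessage`, and the `from` field are hypothetical names, and the sketch assumes your websocket server tags every relayed message with the sender's ID. In a browser, the factory would wrap a real `RTCPeerConnection`.

```typescript
// Hypothetical message shape: the server is assumed to add `from`
// so the receiver knows which remote user a message belongs to.
interface SignalMessage {
  from: string;
  type: "offer" | "answer" | "candidate";
  payload: unknown;
}

// Minimal surface needed from a connection. In a browser this would be
// a small adapter around an RTCPeerConnection (assumption, not shown).
interface PeerLike {
  handleSignal(msg: SignalMessage): void;
  close(): void;
}

class PeerRegistry {
  private peers = new Map<string, PeerLike>();
  private createPeer: (remoteId: string) => PeerLike;

  constructor(createPeer: (remoteId: string) => PeerLike) {
    this.createPeer = createPeer;
  }

  // Return the connection for this remote user, creating it on first use.
  getOrCreate(remoteId: string): PeerLike {
    let peer = this.peers.get(remoteId);
    if (!peer) {
      peer = this.createPeer(remoteId);
      this.peers.set(remoteId, peer);
    }
    return peer;
  }

  // Dispatch an incoming websocket message to its sender's connection.
  route(msg: SignalMessage): void {
    this.getOrCreate(msg.from).handleSignal(msg);
  }

  // Close and forget a connection when a user leaves.
  remove(remoteId: string): void {
    const peer = this.peers.get(remoteId);
    if (peer) {
      peer.close();
      this.peers.delete(remoteId);
    }
  }

  get size(): number {
    return this.peers.size;
  }
}
```

The websocket `onmessage` handler would then parse each message and call `registry.route(msg)`, so offers, answers, and ICE candidates from different users never land on the same connection.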
RTCPeerConnection is inherently a one-to-one connection between two clients (peers), so if you want to go beyond that you have to get clever.
The simplest step up is creating a mesh, essentially setting up one PeerConnection to each of the other participants, with all of the participants doing the same thing. You're going to hit a wall in client upload speed pretty fast this way though, usually topping out at 3-4 participants, typically based on the participant with the lowest upload speed.
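The upload wall follows from simple arithmetic: in a mesh of n participants, every client sends its stream n - 1 times. The sketch below works that out. The function names are hypothetical and the bitrates in the usage note are illustrative assumptions, not measurements.

```typescript
// Load on a single client in a full mesh of `participants` clients.
function meshLoad(participants: number, perStreamKbps: number) {
  const peersPerClient = Math.max(participants - 1, 0);
  return {
    connectionsPerClient: peersPerClient,
    uploadKbpsPerClient: peersPerClient * perStreamKbps,
    // Each connection is shared by two clients, hence the division by 2.
    totalConnections: (participants * peersPerClient) / 2,
  };
}

// Largest mesh that the slowest uploader can sustain: it can afford
// floor(upload / perStream) outgoing streams, plus itself.
function maxMeshParticipants(slowestUploadKbps: number, perStreamKbps: number): number {
  return Math.floor(slowestUploadKbps / perStreamKbps) + 1;
}
```

For example, assuming roughly 1000 kbps per video stream and a slowest uploader with 3000 kbps available, `maxMeshParticipants(3000, 1000)` gives 4, and `meshLoad(4, 1000)` shows each client holding 3 connections and uploading 3000 kbps.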
For groups bigger than that you'd probably want some special setup like an MCU or Router solution, where a special server acts as a super-participant that everyone connects to. That server then either mixes everyone's video into a single stream (usually with whoever is speaking shown larger), or forwards everyone's video to everyone else. Either way each client uploads only once, which matters because upload speed is usually the bottleneck.
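As a rough comparison of the three setups, the sketch below counts how many video streams each client sends and receives. The names are hypothetical, and the model is deliberately simple: one stream per participant, with no simulcast or audio-only optimizations.

```typescript
type Topology = "mesh" | "router" | "mcu";

interface StreamCount {
  upload: number;
  download: number;
}

// Streams per client for a call with `participants` clients.
function streamsPerClient(topology: Topology, participants: number): StreamCount {
  const others = Math.max(participants - 1, 0);
  switch (topology) {
    case "mesh":
      // One direct connection to every other participant.
      return { upload: others, download: others };
    case "router":
      // Upload once to the server, which forwards everyone else's stream.
      return { upload: 1, download: others };
    case "mcu":
      // Upload once, receive a single mixed stream back.
      return { upload: 1, download: 1 };
  }
}
```

With 6 participants, a mesh client uploads 5 streams, while a client behind a router or MCU uploads only 1. The MCU also cuts downloads to 1 stream, at the cost of the server doing the mixing.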