So now I am trying to understand, from an architecture point of view: what does the number of threads mean here?
Each thread has its own stack, program counter (a pointer to the instruction that executes next), and other local state such as register contents. Switching a processor from one thread to another means saving and restoring that state, which hurts latency for any single task. The benefit is that while one thread is idle (usually waiting for I/O), another thread can get work done. Also, if multiple processors are available, threads can run in parallel, provided the tasks do not contend for shared resources or locks.
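To make the idle-time point concrete, here is a minimal sketch in Python. It assumes `time.sleep` is a fair stand-in for a blocking I/O call, and the helper names (`fake_io`, `run_sequential`, `run_threaded`) are made up for illustration.

```python
import threading
import time


def fake_io(seconds):
    # time.sleep stands in for a blocking I/O call (assumption):
    # the calling thread is idle while it waits.
    time.sleep(seconds)


def run_sequential(n, seconds):
    # One thread performs all n waits back to back.
    start = time.perf_counter()
    for _ in range(n):
        fake_io(seconds)
    return time.perf_counter() - start


def run_threaded(n, seconds):
    # n threads each perform one wait; the waits overlap.
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(4, 0.05):.3f} s")
    print(f"threaded:   {run_threaded(4, 0.05):.3f} s")
```

On most machines the threaded run should finish in roughly the time of a single wait, while the sequential run takes about four times as long. Real I/O and real contention will behave less cleanly than `sleep`.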
And how do I decide what the optimal number of threads is?
The trade-off between the cost of context switching and the opportunity to avoid idle time depends on the details of your task: how much I/O it does and when, how much work happens between I/O operations, and how much memory it needs to complete. Experimentation is always the key.
And if I use a larger number of threads, what will happen?
Throughput will usually grow roughly linearly at first, then flatten out, then drop (possibly quite steeply) once switching overhead and contention outweigh the idle time saved. Each system is different.
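One way to run that experiment is to sweep the thread count and record throughput for each setting. The sketch below is only an illustration under stated assumptions: `simulated_task` is a hypothetical workload (a short sleep standing in for I/O plus a small loop standing in for CPU work), and the function names and default numbers are invented for this example. Substitute your real task to get meaningful results. Note also that in standard CPython the global interpreter lock keeps the CPU portion from running in parallel, so this mainly shows the I/O-overlap effect.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def simulated_task(io_seconds=0.01, cpu_iterations=2000):
    # Hypothetical workload: a blocking wait followed by a little CPU work.
    time.sleep(io_seconds)
    total = 0
    for i in range(cpu_iterations):
        total += i * i
    return total


def measure_throughput(num_threads, num_tasks=64, task=simulated_task):
    # Run num_tasks tasks on a pool of num_threads workers
    # and return completed tasks per second.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(task) for _ in range(num_tasks)]
        for future in futures:
            future.result()  # wait for completion and surface any exception
    elapsed = time.perf_counter() - start
    return num_tasks / elapsed


def sweep(thread_counts, num_tasks=64, task=simulated_task):
    # Map each candidate thread count to its measured throughput.
    return {n: measure_throughput(n, num_tasks, task) for n in thread_counts}


if __name__ == "__main__":
    for n, throughput in sweep([1, 2, 4, 8, 16, 32, 64]).items():
        print(f"{n:>3} threads: {throughput:8.1f} tasks/s")
```

Plotting or simply reading the printed numbers should show where growth stops for that workload on that machine. Repeat each measurement a few times, since a single run can be noisy.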