I have an external .DLL file with fast assembler code inside. What is the best way to call functions in this .DLL file to get the best performance?
The only way to answer this question is to time both options, a task which is trivially easy. Making performance predictions without timing is pointless.
Since we don't have your code, only you can answer your question.
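To show how little work the timing takes, here is a minimal sketch in Python. Python is only my choice for illustration, since the question doesn't say what language you call the DLL from. `call_option_a` and `call_option_b` are hypothetical stand-ins, not your real code: replace their bodies with the two ways of calling your DLL function. `best_time_per_call` and `compare` are names I made up for this example.

```python
import timeit


def best_time_per_call(fn, number=100_000, repeat=5):
    """Return the best observed time per call of fn, in seconds."""
    # timeit.repeat runs `number` calls per round, for `repeat` rounds.
    # Taking the minimum filters out rounds disturbed by other processes.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number


# Hypothetical stand-ins (assumptions, not the real DLL calls):
# swap each body for one of the calling methods you want to compare.
def call_option_a():
    return sum((1, 2, 3))


def call_option_b():
    return 1 + 2 + 3


def compare(options, number=100_000, repeat=5):
    """Time every callable in `options` and return {name: seconds per call}."""
    return {
        name: best_time_per_call(fn, number, repeat)
        for name, fn in options.items()
    }


if __name__ == "__main__":
    results = compare({"option A": call_option_a, "option B": call_option_b})
    # Print the fastest option first.
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds * 1e9:.1f} ns per call")
```

Run it several times on the machine you actually care about. If the difference between the two options is within the run-to-run noise, it probably doesn't matter which one you pick.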