First decide what your optimisation goal is: set a target time for particular operations on a given hardware platform. Measure the performance accurately (make sure your results are repeatable) and in a production-like environment (no VMs, etc., unless that's what you use in production!).
Then, if it already meets your target, you can stop there.
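As an illustration of the measure-then-decide step, here is a minimal Python sketch (the choice of Python is only for illustration). It times a stand-in operation several times with `timeit.repeat`, reports the median and spread of the per-call time, and compares the median against a target. `operation_under_test`, `TARGET_SECONDS` and the 2 ms figure are hypothetical placeholders for your own operation and goal.

```python
import statistics
import timeit

# Hypothetical target: at most 2 ms per call on the production hardware.
# Replace with your own goal.
TARGET_SECONDS = 0.002


def operation_under_test() -> int:
    """Stand-in for the real operation you care about (hypothetical)."""
    return sum(i * i for i in range(10_000))


def measure(func, repeat: int = 7, number: int = 100) -> list[float]:
    """Return per-call timings (seconds) from `repeat` independent runs.

    Each run calls `func` `number` times; dividing each run's total by
    `number` gives the average time per call for that run.
    """
    func()  # warm-up call so one-off start-up costs do not skew the first run
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return [t / number for t in totals]


def meets_target(timings: list[float], target: float) -> bool:
    """Compare the median per-call time against the target."""
    return statistics.median(timings) <= target


if __name__ == "__main__":
    timings = measure(operation_under_test)
    median = statistics.median(timings)
    spread = max(timings) - min(timings)
    print(f"median {median * 1e3:.3f} ms, spread {spread * 1e3:.3f} ms")
    if meets_target(timings, TARGET_SECONDS):
        print("Fast enough: stop here.")
    else:
        print("Too slow: time to profile.")
```

A large spread relative to the median suggests the measurements are not yet repeatable (background load, CPU frequency scaling and so on), so it is worth fixing that before trusting the comparison. The median is used rather than the mean because it is less sensitive to occasional outliers.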
If it still isn't fast enough, more work is needed, and that is where profiling comes in. A profiler is not always practical (for example, if it changes the behaviour of the program too much), in which case you should use instrumentation instead.
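Instrumentation here just means adding your own timing code around the sections you suspect, rather than running a profiler over everything. A minimal Python sketch, assuming that approach: the `timed` context manager records wall-clock durations per named section with `time.perf_counter`, so the overhead is roughly two clock reads per wrapped section. `timed`, `TIMINGS`, `handle_request`, `report` and the section names are all hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated durations (seconds) for each named section.
TIMINGS: dict[str, list[float]] = defaultdict(list)


@contextmanager
def timed(section: str):
    """Record how long the enclosed block takes under `section`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures are timed too.
        TIMINGS[section].append(time.perf_counter() - start)


def handle_request(n: int) -> int:
    """Hypothetical operation, instrumented at the points we suspect matter."""
    with timed("parse"):
        values = list(range(n))
    with timed("compute"):
        total = sum(v * v for v in values)
    return total


def report() -> None:
    """Print call count, total and mean time for each section."""
    for section, durations in TIMINGS.items():
        total = sum(durations)
        print(f"{section}: {len(durations)} calls, total {total * 1e3:.3f} ms, "
              f"mean {total / len(durations) * 1e3:.3f} ms")


if __name__ == "__main__":
    for _ in range(100):
        handle_request(1_000)
    report()
```

Because it only measures what you wrap, this kind of instrumentation can stay enabled in a production-like build, and you can narrow down where the time goes by adding finer-grained sections inside the slowest one.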