Cycle counter on ARM Cortex M4 (or M3)?

陌清茗 2020-12-03 00:13

I'm trying to profile a C function (which is called from an interrupt, but I can extract it and profile it elsewhere) on a Cortex M4.

What are the possibilities to

5 answers
  • 2020-12-03 00:26

    Take a look at the DWT_CYCCNT register defined here. Note that this register is implementation-dependent. Who is the chip vendor? I know the STM32 implementation offers this set of registers.

    This post provides instructions for using the DWT Cycle Counter Register for timing. (See the post from 11 December 2009 - 06:29 PM)

    This Stack Overflow post also gives an example of how to use DWT_CYCCNT.
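
    Because the register is implementation-dependent, it can be worth checking that the cycle counter exists before relying on it. A minimal sketch, assuming the ARMv7-M layout in which bit 25 of DWT_CTRL (NOCYCCNT) reads 1 when no cycle counter is implemented. The helper name `dwt_has_cyccnt` is hypothetical, and it takes the register value as an argument so the logic can also be tried off-target:

```c
#include <stdint.h>

/* Assumed ARMv7-M bit position: DWT_CTRL.NOCYCCNT is bit 25. */
#define DWT_CTRL_NOCYCCNT (1u << 25)

/* Hypothetical helper: returns 1 if the given DWT_CTRL value
 * indicates that a cycle counter is implemented, 0 otherwise. */
static inline int dwt_has_cyccnt(uint32_t dwt_ctrl)
{
    return (dwt_ctrl & DWT_CTRL_NOCYCCNT) == 0u;
}
```

    On the target, the argument would be the value read from DWT_CTRL at 0xE0001000, typically after the trace enable bit (TRCENA) in DEMCR has been set.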

  • 2020-12-03 00:36

    If your part incorporates the CoreSight Embedded Trace Macrocell and you have appropriate trace capable debugger hardware and software then you can profile the code directly. Trace capable debug hardware is of course more expensive, and your board needs to be designed to make the trace port pins available on the debug header. Since these pins are often multiplexed to other functions, that may not always be possible or practical.

    Otherwise, if your tool-chain includes a cycle-accurate simulator (such as the one available in Keil uVision), you can use that to analyse the code timing. The simulator provides debug, trace and profiling features that are generally more powerful and flexible than those available on chip, so even if you do have trace hardware, the simulator may still be the easier solution.

  • 2020-12-03 00:37

    This depends on your ARM implementation.

    I used the SysTick->VAL register on an STM32F4 core. This is cycle accurate, provided SysTick is clocked from the core clock rather than a divided clock.

    When interpreting the results, keep in mind:

    • The counter wraps (it reloads when it reaches zero), so take that into account.
    • It counts down, not up.

    Limitation: This only works on intervals smaller than a single systick.
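
    The two points above can be folded into one small helper. This is only a sketch, assuming the counter reloads from the SysTick reload value when it reaches zero (so one period is `reload + 1` counts) and that at most one wrap happened between the two readings. The function name `systick_elapsed` and its parameters are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: cycles elapsed between two readings of a
 * down-counting timer such as SysTick->VAL.
 * start  = value read before the measured code
 * end    = value read after the measured code
 * reload = the reload value (SysTick->LOAD), at most 0x00FFFFFF
 * Assumes the counter wrapped at most once between the readings. */
static inline uint32_t systick_elapsed(uint32_t start, uint32_t end, uint32_t reload)
{
    if (start >= end) {
        /* No wrap: the counter simply counted down from start to end. */
        return start - end;
    }
    /* One wrap: counted from start down to 0, reloaded, then down to end. */
    return start + (reload + 1u) - end;
}
```

    For example, with a reload value of 0xFFFFFF, readings of 1000 and then 400 give 600 cycles, while readings of 100 and then 0xFFFF00 give 356 cycles because the counter wrapped once.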

  • 2020-12-03 00:39

    This is just easier:

    [code]

    #include <stdint.h>

    // DWT_CTRL is at 0xE0001000 and DWT_CYCCNT at 0xE0001004.
    // On some parts the TRCENA bit in DEMCR (0xE000EDFC, bit 24) must be set before the DWT responds.
    #define start_timer()   (*((volatile uint32_t*)0xE0001000) = 0x40000001)  // Enable CYCCNT register
    #define stop_timer()    (*((volatile uint32_t*)0xE0001000) = 0x40000000)  // Disable CYCCNT register
    #define get_timer()     (*((volatile uint32_t*)0xE0001004))               // Get value from CYCCNT register
    
    /***********
    * How to use:
    *       uint32_t it1, it2;      // start count and elapsed count
    
            start_timer();          // start the timer.
            it1 = get_timer();      // store current cycle-count in a local
    
            // do something
    
            it2 = get_timer() - it1;    // Derive the cycle-count difference
            stop_timer();               // If timer is not needed any more, stop
    
            print_int(it2);             // Display the difference
    ****/
    

    [/code]

    Tested on a Cortex M4 (STM32F407VGT on a CJMCU board); it simply counts the elapsed cycles.
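
    One detail worth spelling out: CYCCNT is a free-running 32-bit up-counter, so the difference of two readings stays correct across a single overflow as long as it is computed in unsigned 32-bit arithmetic. A small sketch of that, plus an optional correction for the cost of the readings themselves. The names `cycles_between` and `cycles_net` are hypothetical, and the overhead value is something you would measure on your own target by timing an empty region:

```c
#include <stdint.h>

/* Hypothetical helper: elapsed cycles between two readings of a 32-bit
 * up-counter. Unsigned subtraction wraps modulo 2^32, so the result is
 * correct as long as fewer than 2^32 cycles passed between the readings. */
static inline uint32_t cycles_between(uint32_t start, uint32_t stop)
{
    return stop - start;
}

/* Hypothetical helper: same as above, minus a measured overhead
 * (the cycles taken by an empty measurement), clamped at zero. */
static inline uint32_t cycles_net(uint32_t start, uint32_t stop, uint32_t overhead)
{
    uint32_t raw = cycles_between(start, stop);
    return (raw > overhead) ? (raw - overhead) : 0u;
}
```

    At 168 MHz the counter overflows roughly every 25.6 seconds, so this covers most function-level measurements.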

  • 2020-12-03 00:41

    Expanding previous answers with a DWT_CYCCNT example (STM32) in main (similar to my other post).

    Note: I added a delay method as well. You can verify stopwatch_delay by calling STOPWATCH_START, running stopwatch_delay(ticks), then calling STOPWATCH_STOP and checking the result with CalcNanosecondsFromStopwatch(m_nStart, m_nStop). Adjust ticks as needed.

    #include <stdint.h>
    #include <stdio.h>
    
    uint32_t m_nStart;               // DEBUG stopwatch start cycle counter value
    uint32_t m_nStop;                // DEBUG stopwatch stop cycle counter value
    
    #define DEMCR_TRCENA    0x01000000
    
    /* Core Debug registers */
    #define DEMCR           (*((volatile uint32_t *)0xE000EDFC))
    #define DWT_CTRL        (*(volatile uint32_t *)0xE0001000)
    #define CYCCNTENA       (1u << 0)
    #define DWT_CYCCNT      ((volatile uint32_t *)0xE0001004)
    #define CPU_CYCLES      (*DWT_CYCCNT)
    #define CLK_SPEED       168000000u // EXAMPLE for Cortex M4, EDIT as needed
    
    #define STOPWATCH_START do { m_nStart = CPU_CYCLES; } while (0)
    #define STOPWATCH_STOP  do { m_nStop = CPU_CYCLES; } while (0)
    
    
    static inline void stopwatch_reset(void)
    {
        /* Enable DWT */
        DEMCR |= DEMCR_TRCENA; 
        *DWT_CYCCNT = 0;             
        /* Enable CPU cycle counter */
        DWT_CTRL |= CYCCNTENA;
    }
    
    static inline uint32_t stopwatch_getticks(void)
    {
        return CPU_CYCLES;
    }
    
    static inline void stopwatch_delay(uint32_t ticks)
    {
        uint32_t start_ticks = stopwatch_getticks();
        /* Unsigned subtraction stays correct if the counter wraps during the delay */
        while ((uint32_t)(stopwatch_getticks() - start_ticks) < ticks)
        {
        }
    }
    
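    uint32_t CalcNanosecondsFromStopwatch(uint32_t nStart, uint32_t nStop)
    {
        uint32_t nDiffTicks;
        uint32_t nSystemCoreTicksPerMicrosec;
    
        // Convert (clk speed per sec) to (clk speed per microsec)
        nSystemCoreTicksPerMicrosec = CLK_SPEED / 1000000;
    
        // Elapsed ticks (unsigned subtraction handles a single counter wrap)
        nDiffTicks = nStop - nStart;
    
        // Elapsed nanosec = 1000 * ticks-elapsed / clock-ticks in a microsec.
        // Multiply in 64 bits so 1000 * nDiffTicks cannot overflow; the result
        // still only fits in 32 bits for intervals up to about 4.29 seconds.
        return (uint32_t)(((uint64_t)nDiffTicks * 1000u) / nSystemCoreTicksPerMicrosec);
    }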
    
    int main(void)
    {
        uint32_t timeDiff = 0;
        stopwatch_reset();
    
        // =============================================
        // Example: use a delay, and measure how long it took
        STOPWATCH_START;
        stopwatch_delay(168000); // 168k ticks is 1ms for 168MHz core
        STOPWATCH_STOP;
    
        timeDiff = CalcNanosecondsFromStopwatch(m_nStart, m_nStop);
        printf("My delay measured to be %lu nanoseconds\n", (unsigned long)timeDiff);
    
        // =============================================
        // Example: measure function duration in nanosec
        STOPWATCH_START;
        // run_my_function() => do something here
        STOPWATCH_STOP;
    
        timeDiff = CalcNanosecondsFromStopwatch(m_nStart, m_nStop);
        printf("My function took %lu nanoseconds\n", (unsigned long)timeDiff);
    
        return 0;
    }
    