What is caching?

鱼传尺愫 2020-11-28 17:49

I'm constantly hearing about how person y had performance issue x, which they solved through caching.

Or, how doing x, y, z in your program's code can hurt your caching ability.

9 Answers
  •  再見小時候
    2020-11-28 18:23

    There are a couple of issues here.

    One, is granularity. Your application can have very fine levels of caching over and above what the database does. For example, the database is likely to simply cache pages of data, not necessarily specific rows.

    Another thing is that the application can store data in its "native" format, whereas the DB obviously only caches in its internal format.

    Simple example.

    Say you have a User in the database, which is made of the columns: USERID, FIRSTNAME, LASTNAME. Very simple.

    You wish to load a User, USERID=123, into your application. What are the steps involved?

    1. Issuing the database call
    2. Parsing the request (SELECT * FROM USER WHERE USERID = ?)
    3. Planning the request (i.e. how is the system going to fetch the data)
    4. Fetching the data from the disk
    5. Streaming the data from the database to the application
    6. Converting the database data to application data (e.g. USERID to an integer, the names to Strings) — see the sketch after this list
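
    A minimal JDBC sketch of those steps (the USER table and its columns come from the example above; the class, method, and variable names are illustrative assumptions):

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;
        import java.sql.SQLException;

        public class UserDao {
            // Assumed value object matching the columns in the example.
            public record User(int userId, String firstName, String lastName) {}

            public User loadUser(Connection conn, int userId) throws SQLException {
                // Step 1: issue the database call. Steps 2-4 (parse, plan,
                // fetch from disk) happen inside the database server.
                String sql = "SELECT USERID, FIRSTNAME, LASTNAME FROM USER WHERE USERID = ?";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setInt(1, userId);
                    // Step 5: stream the result row back to the application.
                    try (ResultSet rs = ps.executeQuery()) {
                        if (!rs.next()) {
                            return null; // no such user
                        }
                        // Step 6: convert database types to application types
                        // (USERID to an int, the names to Strings).
                        return new User(rs.getInt("USERID"),
                                        rs.getString("FIRSTNAME"),
                                        rs.getString("LASTNAME"));
                    }
                }
            }
        }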

    The database cache will likely cover steps 2 and 3 (that's a statement cache, so it won't re-parse or re-plan the query), and it caches the actual disk blocks.

    So, here's the key. Your user, USER ID 123, name JESSE JAMES, isn't a lot of data. But the database is caching disk blocks: you have the index block (with the 123 on it), then the data block (with the actual row, plus all of the other rows that fit on that block). So what is nominally, say, 60-70 bytes of data actually has a caching footprint on the DB of probably 4K-16K (depending on block size).

    The bright side? If you need another row that's nearby (say USER ID = 124), odds are high the index and data are already cached.

    But even with that caching, you still have to pay the cost to move the data over the wire (and it's always over the wire, unless you're using a local DB, in which case it's loopback), and you're "unmarshalling" the data. That is, converting it from database bits to language bits, to application bits.

    Now, once the application gets its USER ID 123, it can stuff the value into a long-lived hash map.

    If the application ever wants it again, it looks in the local map (the application cache) and saves the lookup, wire transport, and unmarshalling costs.
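
    A minimal sketch of that long-lived map, with hypothetical names (the loader stands in for the database call above); this is just the read side:

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;
        import java.util.function.IntFunction;

        public class UserCache {
            // Hypothetical value object, same shape as in the earlier sketch.
            public record User(int userId, String firstName, String lastName) {}

            // Long-lived, in-process map keyed by user id.
            private final Map<Integer, User> cache = new ConcurrentHashMap<>();
            private final IntFunction<User> loader; // e.g. the JDBC loadUser above

            public UserCache(IntFunction<User> loader) {
                this.loader = loader;
            }

            public User getUser(int userId) {
                // Hit: skip the DB call, wire transport, and unmarshalling.
                // Miss: load from the database and remember the result.
                return cache.computeIfAbsent(userId, loader::apply);
            }
        }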

    The dark side of application caching is synchronization. If someone comes in and does an UPDATE USER SET LASTNAME='SMITH' WHERE USERID=123, your application doesn't "know" that, and thus its cache is dirty.

    So, then there's a bunch of details in handling that relationship to keep the application in sync with the DB.
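
    One common way to handle the simple cases is to force every write through the same layer and drop the cached entry; a hedged sketch (the class and the update method are assumptions, not from the original):

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.SQLException;
        import java.util.Map;

        public class UserWriter {
            private final Map<Integer, ?> cache; // the same map the reads go through
            private final Connection conn;

            public UserWriter(Map<Integer, ?> cache, Connection conn) {
                this.cache = cache;
                this.conn = conn;
            }

            public void updateLastName(int userId, String newLastName) throws SQLException {
                // Write to the system of record first...
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPDATE USER SET LASTNAME = ? WHERE USERID = ?")) {
                    ps.setString(1, newLastName);
                    ps.setInt(2, userId);
                    ps.executeUpdate();
                }
                // ...then drop the stale entry so the next read reloads it.
                // This only covers writes that go through this application; an
                // UPDATE run directly against the DB still leaves the cache dirty.
                cache.remove(userId);
            }
        }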

    Having a LOT of database cache is very nice for large queries over a "hot" set of data. The more memory you have, the more "hot" data you can keep. At the extreme, if you can cache the entire DB in RAM, you eliminate the I/O delay (at least for reads) of moving data from disk to a RAM buffer. But you still have the transport and unmarshalling costs.

    The application can be much more selective, caching more limited subsets of data (DBs just cache blocks), and having the data "closer" to the application ekes out that much more performance.

    The downside is that not everything is cached in the application. The database tends to store data more efficiently, overall, than the application does. You also lack a "query" language against your app-cached data: most folks simply cache via a simple key and go from there. It's easy to find USER ID 123, harder to find "ALL USERS NAMED JESSE".

    Database caching tends to be "free": you set a buffer size and the DBMS handles the rest. Low impact, and it reduces overall I/O and disk delays.

    Application caching is, well, application specific.

    It works very well for isolated "static" data: load a bunch of stuff into lookup tables at startup and restart the app if it changes. That's easy to do.
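
    A sketch of that pattern, assuming a small COUNTRY code table that is not part of the original example: load it once at startup into an immutable map, serve all lookups from memory, and restart the app when the table changes.

        import java.sql.Connection;
        import java.sql.ResultSet;
        import java.sql.SQLException;
        import java.sql.Statement;
        import java.util.HashMap;
        import java.util.Map;

        public final class CountryLookup {
            private static Map<String, String> CODE_TO_NAME; // loaded once, then read-only

            // Called once at application startup.
            public static void init(Connection conn) throws SQLException {
                Map<String, String> m = new HashMap<>();
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery("SELECT CODE, NAME FROM COUNTRY")) {
                    while (rs.next()) {
                        m.put(rs.getString("CODE"), rs.getString("NAME"));
                    }
                }
                CODE_TO_NAME = Map.copyOf(m); // immutable snapshot; restart to refresh
            }

            public static String nameFor(String code) {
                return CODE_TO_NAME.get(code); // pure in-memory lookup, no DB round trip
            }
        }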

    After that, the complexity starts to increase as you add in the "dirty" tracking logic, etc.

    What it all comes down to, though, is that as long as you have a Data API, you can cache incrementally.

    So, as long as you call getUser(123) everywhere rather than hitting the DB directly, you can later come back and add caching to getUser without impacting the rest of your code.

    So, I always suggest some kind of Data Access Layer in everyone's code, to provide that bit of abstraction and interception layer.
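
    A minimal sketch of that kind of interception layer (all names are hypothetical): callers only ever see UserRepository.getUser(id), so a caching implementation can be slid in later without touching them.

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        public class DataAccess {
            // Hypothetical value object, same shape as in the earlier sketches.
            public record User(int userId, String firstName, String lastName) {}

            // The Data API every caller goes through instead of hitting the DB directly.
            public interface UserRepository {
                User getUser(int userId);
            }

            // Added later, without touching the callers: a caching wrapper around
            // whatever DB-backed UserRepository already exists (e.g. the JDBC
            // code sketched earlier).
            public static class CachingUserRepository implements UserRepository {
                private final UserRepository delegate;
                private final Map<Integer, User> cache = new ConcurrentHashMap<>();

                public CachingUserRepository(UserRepository delegate) {
                    this.delegate = delegate;
                }

                @Override
                public User getUser(int userId) {
                    return cache.computeIfAbsent(userId, delegate::getUser);
                }
            }
        }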
