Bulk Load / Loader that will use loadAll() #116
Thanks for the request! We had this feature in previous versions. We stripped it away for the 1.0 version to get rid of all non-essential things, since we wanted a concise and well-tested cache implementation.
@cruftex this feature would be very useful to me. To make the internal cache2k implementation simpler, it could be used whenever the […]
@cruftex we really need this feature too. Btw, I have noticed the new […]. I can only think of something like using a […]. Your opinion would be greatly appreciated :)
It may be naive, but can't you just iterate over the list and add all entries?
@globalworming well, yes of course. But that would be much slower than a real bulk load. For example, consider a […]
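To make the cost difference concrete, here is a back-of-the-envelope sketch using the 100k-objects / 100-per-batch numbers from the original request; the 5 ms round-trip latency is an assumed figure, not a measurement:

```java
// Latency comparison: loading every key individually vs. in batches.
// LATENCY_MS is an assumption for illustration, not a measured value.
public class BulkVsSingle {

    static final double LATENCY_MS = 5.0; // assumed round-trip latency per request

    public static void main(String[] args) {
        int entries = 100_000;
        int batchSize = 100;
        double single = entries * LATENCY_MS;                               // one request per key
        double bulk = Math.ceil(entries / (double) batchSize) * LATENCY_MS; // one request per batch
        System.out.printf("per-key: %.0f ms, bulk: %.0f ms%n", single, bulk);
        // prints: per-key: 500000 ms, bulk: 5000 ms
    }
}
```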
Thanks for reaching out. It is perfect timing. Version 2.0 is about to be finished, so I will prioritise bulk support. @MatthiasBechtold:
Yes, because I tried my best to clean up the interfaces. There will be an extra […].
I don't know your usage scenario. If the list values are always requested together, then it actually makes sense. Did you have a chance to look at the async version of the loader?
New interfaces are in this commit: ebc6a95
Just released a preview with bulk cache loader support of […]. There will be more tests with concurrency and further optimizations.
@cruftex thanks a lot for the update! (Sorry for my delay, festive season took its toll.) In the meantime I reviewed our use case again. Until we are able to plan the upgrade to 2.1.1 (once it's released, of course), the current workaround should be fine. We eagerly load the whole cache content via […]
@MatthiasBechtold 2.1.1 has some critical flaws in the bulk loader support. I'll release 2.1.2 soon. It also has a slightly changed/improved bulk interface. After 2.1.2 I plan to do some stress and concurrency tests, and then it should be ready for production. 2.1.1: […]
Thanks for taking care of bulk loading!
@dstango thanks for testing. This version is actually pretty flawed. It would be great if you could take a look at the next version. I am just debugging a nasty concurrency issue.
Is that happening at the initial load, or after that, during the refresh? With which methods do you access the cache?

Some explanation of the mode of operation of the core cache with bulk support: only a bulk access to the cache will cause a bulk access to the loader. The loader will be called with the keys requested or a subset thereof. Example: a […]

Each entry expires individually. I don't plan something like a bulk expire at the moment. However, it's clear that for background refreshing it is more efficient to load in bulk as well. I'd like to solve that problem too, but not in the core cache functionality, since that is already very complex. The idea would be to have a loader which adapts the real loader and coalesces several requests into a bigger one by introducing some delay. This way it's modular and can be tested separately.
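A minimal sketch of that coalescing idea, independent of cache2k internals. All names here are made up for illustration; a real implementation would also cancel the delay timer when a batch is flushed early:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Buffers single-key load requests and forwards them to the real bulk
// loader either when the batch is full or after a small delay.
public class CoalescingLoader<K, V> {

    private final Function<Set<K>, Map<K, V>> bulkLoader; // the "real" bulk loader
    private final int maxBatchSize;
    private final long maxDelayMillis;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();

    public CoalescingLoader(Function<Set<K>, Map<K, V>> bulkLoader,
                            int maxBatchSize, long maxDelayMillis) {
        this.bulkLoader = bulkLoader;
        this.maxBatchSize = maxBatchSize;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Queue a single-key request; it is sent with the next batch. */
    public synchronized CompletableFuture<V> load(K key) {
        CompletableFuture<V> f = pending.computeIfAbsent(key, k -> new CompletableFuture<>());
        if (pending.size() == 1) { // first key of a new batch: arm the delay timer
            scheduler.schedule(this::flush, maxDelayMillis, TimeUnit.MILLISECONDS);
        }
        if (pending.size() >= maxBatchSize) { // batch full: flush right away
            flush();
        }
        return f;
    }

    private void flush() {
        Map<K, CompletableFuture<V>> batch;
        synchronized (this) {
            if (pending.isEmpty()) {
                return; // timer fired after a size-triggered flush, nothing to do
            }
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        try {
            Map<K, V> result = bulkLoader.apply(batch.keySet()); // one call for the whole batch
            batch.forEach((k, f) -> f.complete(result.get(k)));  // missing keys complete with null
        } catch (Exception e) {
            batch.forEach((k, f) -> f.completeExceptionally(e));
        }
    }
}
```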
@cruftex: it doesn't happen at the initial load, but during the refresh. I access the cache in a maybe unintended way by calling entries() (not getAll()). I don't use getAll(), as I'm actually not interested in the keys, but the values(). I could probably use cache.getAll(cache.keys()) as a workaround to trigger loadAll(). I'm currently just implementing BulkCacheLoader. I'm happy to try out a new version when it's available in Maven. Thanks for your efforts! :-)
@dstango: That's expected behavior then, because expiry and refresh work on individual entries only. That's something the additional coalescing loader needs to fix. Good that you ask for it, so I know there is demand.
A […]. I am a bit concerned because of your usage pattern. It seems that by using […]
@cruftex thanks for thinking it through :-) - that gives me more clarity about what to expect. Actually I might try to "abuse" the cache in my situation. What I'm basically trying to do is to mirror all data of a certain type from a slow external system (customer addresses from some SAP system).
@MatthiasBechtold @dstango The next alpha update with major changes to the bulk loader is out.
@cruftex Thanks for putting effort into implementing the bulk merge. Will have to check it out.
Hey @cruftex, thanks for your effort in implementing this feature. I've tested your solution and it works well. I had my own solution built on top of Caffeine, and I've replaced it with cache2k and it passed all the tests. I've also played with cache2k a bit and it seems that everything is OK. I haven't tested this solution in terms of performance, but if you're going to release it, then I will.
@cybuch, that's great news! I was kept busy with other stuff the last months and will get back to this now. If you have a bit of time, it would be nice to get feedback on what elements you are using in your scenario. Sync or async bulk loader?
@cruftex So in my scenario I've tested […]
@cybuch […]
Maybe you want to test it?
@cruftex […]
@cruftex […]
Hi @cybuch, thanks for trying. From what you explain, it could be related to the async processing scheme. The […]. Please double check. If there is still a problem, please provide me with an example of how I can reproduce it, and I will look into it instantly.
Hey @cruftex, […]
Oh, yes, it works as designed.
The bulk loader may return partial results. If you don't call the callback for a key whose load was requested, it's simply still loading. Either provide data for the key, or call the failure callback.
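For illustration, a sketch of a loader that reports partial results. It assumes the async bulk loader interface from the 2.x previews (org.cache2k.io.AsyncBulkCacheLoader) with per-key onLoadSuccess/onLoadFailure callbacks; check the Javadoc of the release you use, and note the Backend interface is a made-up stand-in for the real data source:

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import org.cache2k.io.AsyncBulkCacheLoader;

public class PartialResultLoader {

    // Hypothetical backend: returns values only for the keys it knows about.
    interface Backend {
        Map<Long, String> fetch(Set<Long> keys);
    }

    static AsyncBulkCacheLoader<Long, String> loaderFor(Backend backend) {
        return (keys, context, callback) -> {
            Map<Long, String> found = backend.fetch(keys);
            for (Long key : keys) {
                String value = found.get(key);
                if (value != null) {
                    callback.onLoadSuccess(key, value); // complete this key with data
                } else {
                    // never leave a requested key pending: complete it with a failure
                    callback.onLoadFailure(key,
                        new NoSuchElementException("no data for key " + key));
                }
            }
        };
    }
}
```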
This is intentional as well. With read-through operation you would typically use […]. The behavior is counter-intuitive: you do load a value into the cache via refresh, but it's not appearing. However, the semantics of […]. The practical reason is: I want to expire entries that are not accessed any more, so I need to know whether an entry is still accessed. OTOH, I don't want to add more bookkeeping code into the most critical path of a cache, which is the cache hit. This would make every program using the cache slower. That's why a refreshed entry isn't the same as a normal entry before its value is actually requested once via […]
Hey @cruftex, thanks for your answers.
That's exactly my case. The cache loader may return partial results, or no results at all, for a given set of keys, and then I'd simply like to remove those keys from the cache. Right now, if I'd call […]
I get your point of view. The question is: could there be another cache implementation that wraps the existing one to provide such a feature? Then it wouldn't bog down cache performance for regular users (but to be honest, if this behaviour were hidden behind some flag, it would probably be JIT-compiled away, so regular users wouldn't notice any difference). So my usage is: I've got a cache that keeps and refreshes entries forever. Actually it's not forever, because, as stated in the first paragraph, the data may not be available anymore. The cache size is limited, and if it's about to exceed its size, some entries are removed using an LRU policy. For me that's a huge performance gain, when all the data the user needs is in the cache, even if it's not accessed that often.
Hi @cybuch. Although we are now a bit off topic w.r.t. the issue, I am happy to help.
You have two options here:

Negative caching: In case keys are requested that have no data associated, there is a problem. If you don't have a mapping for them in the cache, you will repeatedly invoke the loader when those keys get requested again. The fact that the cache remembers that there is no data is called negative caching. cache2k supports this by allowing you to store a […]

Remove entries via the loader: You can do that as well, with the problem that repeated requests will invoke the loader again. If this only happens rarely, it might be okay. You can store nothing, or remove a mapping, if a key yields no data. To do so, set […]

Side note: If your keys come from user input, you always open the door for DoS attacks here. With negative caching, a user has the opportunity to sweep the cache and exhaust memory resources. Without negative caching, the loader can be invoked continuously by the user and exhaust computing resources. Countermeasures we used are rate limiters that detect a high miss rate from a single IP. Another option is Bloom filters, see: https://en.wikipedia.org/wiki/Bloom_filter#Examples
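A sketch of the second option (remove entries via the loader), using the expiry policy to expire a missing value immediately. The builder calls follow the cache2k 2.x API as far as known; fetchOrNull() is a placeholder for the real data source, and whether permitNullValues() is required should be checked against the null-values chapter of the user guide:

```java
import java.util.concurrent.TimeUnit;
import org.cache2k.Cache;
import org.cache2k.Cache2kBuilder;
import org.cache2k.expiry.ExpiryTimeValues;

public class RemoveViaLoader {

    // Placeholder for the real data source; returns null if there is no data.
    static String fetchOrNull(Long key) {
        return null;
    }

    static Cache<Long, String> build() {
        return new Cache2kBuilder<Long, String>() {}
            .permitNullValues(true)               // may be needed so the loader can return null
            .loader(RemoveViaLoader::fetchOrNull)
            .expiryPolicy((key, value, startTime, currentEntry) ->
                value == null
                    ? ExpiryTimeValues.NOW                      // no data: drop the mapping now
                    : startTime + TimeUnit.MINUTES.toMillis(5)) // data: keep for 5 minutes
            .build();
    }
}
```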
You would need it always when refresh ahead is enabled. Refresh ahead makes sense pretty much always when there is expiry and a loader, so every "regular user" would potentially use it. The semantic mismatch would still be there. It's counter-intuitive as well if […]. The current solution adds the functionality elegantly, without the need for additional data structures or extending the critical code path. Let's open another issue for this discussion: #172
Got it. That would be a use case for […]. We have a similar problem in our applications. Ironically, the response time at night is higher than at daytime; the reason is that the few requests at night don't keep the caches warm. Can you add a comment to #34 and roughly describe your use case? Another thing: you said you were using Caffeine before. Is there any particular feature that made you choose cache2k and not Caffeine?
@cybuch I thought about your scenario with "endless refresh". You can do that quite elegantly as well. Here is an example of how I would do it:
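The original snippet did not survive extraction; a minimal sketch of what such an "endless refresh" configuration might look like, assuming the standard builder options (loader, expireAfterWrite, refreshAhead) and with fetchFromBackend() as a placeholder:

```java
import java.util.concurrent.TimeUnit;
import org.cache2k.Cache;
import org.cache2k.Cache2kBuilder;

public class EndlessRefresh {

    // Placeholder for the real data source.
    static String fetchFromBackend(Long key) {
        return "value-" + key;
    }

    static Cache<Long, String> build() {
        return new Cache2kBuilder<Long, String>() {}
            .loader(EndlessRefresh::fetchFromBackend)
            .expireAfterWrite(5, TimeUnit.MINUTES) // triggers a refresh every 5 minutes
            .refreshAhead(true)                    // reload in background instead of dropping
            .build();
    }
}
```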
This is similar to your approach with the expiry listener. Using the loader and refresh has the advantage that the data keeps being available. Since the refresh is doing a […]
Hi @cruftex. Removing entries from the loader by setting the expiry policy works like a charm for me. Thanks for sharing the info about DoS attacks, but the sources of the data in my case are in-house clients, so that's not a concern for me.
Sure, I will.
We have faced the same problem, but using caches that reload the values forever solved this case.
The problem is that Caffeine didn't have a solution for my needs, so I built a custom solution around it. However, it's not easy to maintain, and the code is hard to understand. I'm about to hand the project over to another team, and it would be nice to use a solution provided by the library (whether it's Caffeine or cache2k), so the other team wouldn't have to worry about cache mechanisms that I've implemented. BTW, do you know when you'll be releasing this feature as a regular version?
Just released another update mainly for the bulk support. Some concurrency issues fixed for corner cases.
Link to the release: https://github.com/cache2k/cache2k/releases/tag/v2.1.3.Alpha
I plan to remove the old loader classes in […]
Just released 2.1.4.Beta as a final step to clean up the interface. Also, the old […]. See: https://github.com/cache2k/cache2k/releases/tag/v2.1.4.Beta
Just released 2.1.5.Beta with concurrency issues and exception handling fixed in the […]. See: https://github.com/cache2k/cache2k/releases/tag/v2.1.5.Beta
Possibly this is the last change for version 2.2.
Did some more testing with the latest beta against some of our application code. Everything works correctly. For the […]. The only thing left is a look over the documentation, which should mention the new bulk interfaces.
Decided to do #173 right away. It's now the default that the […]
Hey,
is there any plan for releasing a Cache implementation that would use the Loader's loadAll()? It'd help me a lot with my use case, where I fetch data from an HTTP endpoint which supports requests for many IDs at once. So for 100k objects in the cache I could do 1k HTTP requests, each fetching 100 IDs, instead of 100k requests one by one.
Expected behaviour: the cache batches loads through the loadAll method.
A nice-to-have feature would be an option for the batch size OR a time limit, e.g.: the batch size is 1,000, but after 2 minutes there are only 512 keys eligible to load, so the cache loads all 512 keys and resets the timer.