[SPIGOT-1097] Unsafe Async implementation of ConcurrentMap in ChunkRegionLoader Created: 31/Jul/15  Updated: 11/Dec/17  Resolved: 11/Jul/16

Status: Resolved
Project: Spigot
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Kademlia Assignee: Unassigned
Resolution: Invalid Votes: 0
Labels: None


 Description   

(This problem occures very rarely or maybe never if no plugin calls sync saveChunk() via the scheduler. The Problem in itself is 'Critical'. The rarity might make it 'Minor')

Some Versions ago you changed the Code from the ChunkRegionLoader from:

    private java.util.LinkedHashMap<ChunkCoordIntPair, PendingChunkToSave> pendingSaves = new java.util.LinkedHashMap<ChunkCoordIntPair, PendingChunkToSave>();

to the more default-Implementation of Minecraft:

    private Map<ChunkCoordIntPair, NBTTagCompound> b = new ConcurrentHashMap();

While this is generally a good choice I want to point out a design-Flaw in the Class.

Method c() uses a Iterator to walk over the Entries to be saved. Iterators are not Thread-Safe even if using a ConcurrentHashMap. The code has to be syncronized or written differently. An argument could be "we dont call this method async anyway" but then again the use of ConcurrentHashMap in the first place would be questionable.

    public boolean c() {
[...]
ChunkCoordIntPair chunkcoordintpair = (ChunkCoordIntPair) this.b.keySet().iterator().next();
[...]

When can this problem occur?
It sould be possible to Crash the Server by using 100% valid sync-only code by calling .saveChunk()
This can happen because Minecraft itself calls this Methods from an async thread via
net.minecraft.server.FileIOThread.c(SourceFile:37).

Reproduction
This is of course hard to test as it is not easy to force a CME/NPE of an infrequently called Method. But using an Iterator over a ConcurrentHashMap in an not syncronized Method should be enough to know this will result in problems at some point. At least from a programmers perspective this should be sufficient.

I personally ran into this problem about 3-4 Times on a very populated Server over 3-4 Month - but i am using a heavily modified CB/Spigot Build with various optimizations. Specific to this problem i am using a chunkSavingQueue that distributes the regular worldsaving over a longer period of time to prevent lags and save all worldchunks more frequently. This 100% sync code change was able to result in the problem as it heavily increased the chance to bump into the async net.minecraft.server.FileIOThread.c() code.



 Comments   
Comment by md_5 [ 11/Jul/16 ]

CraftBukkit executes commands on the server thread, unlike Vanilla, so that example is wrong.

Additionally the semantics of how this data structure is used in Vanilla/CraftBukkit prevents it from specifically being an issue as removal is only done by a single thread, which combined with the isEmpty check means that behaviour is safe within the general contract of CHM.

You are welcome to provide a counter example actually demonstrating the issue occuring and posing an issue on CraftBukkit if you wish.

Comment by Kademlia [ 11/Jul/16 ]

(an please at least read the report)

Quote:
Conversely, executing commands via Rcon that iterate over internal data structures (such as /save-all) may cause a java.util.ConcurrentModificationException in the Rcon thread if the main thread modifies those structures (see MC-68374).

Comment by Kademlia [ 11/Jul/16 ]

See https://hub.spigotmc.org/jira/browse/SPIGOT-1097?focusedCommentId=14366&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14366

I don't think just because a command was/(is no longer afaik) broken qualifies to disregard this.

Comment by md_5 [ 11/Jul/16 ]

That stack trace is completely different. Not even in CHM.

Comment by Kademlia [ 11/Jul/16 ]

Just found the code is actually called and results in exactly what was described here. See https://bugs.mojang.com/browse/MC-72390
Using vanilla save-all command will cause this problem.

Comment by md_5 [ 11/Jul/16 ]

Impossible to cause in stock server.

Comment by Kademlia [ 02/Aug/15 ]

That Crash is unrelated to this ticked: Its in another place and uses a different technique. Looks like something on your server is asyncronously manipulating the TickList (Redstone, TNT etc) resulting in that Exception. But the problem is in a theoretical view very similar.

Comment by Super Monis [ 01/Aug/15 ]

I've experienced a crash related to NoSuchElementException.

Crash report: http://pastebin.com/c4gPnLVy
Log: http://pastebin.com/aNtEVXrL

If it helps, my server does /save-all every hour.

Comment by Kademlia [ 01/Aug/15 ]

Right. The reason I mentioned the old code is because simply reverting the changes to the previous version works without these problems. That would be the fastest "fix". The better one would of course be to analyze the problem in depth (as we kinda did now) and fix the actual problem in the current code.

Regarding the ConcurrentMap Iterator and thread-safety: I think were hairsplitting on words now but the difference we are talking about is:
Multiple Iterators over the same Map but still only being used by one Thread each (aka single-threaded usage as you stated above) is allowed.
"Iterators are not Thread-Safe even if using a ConcurrentHashMap" would still be valid as the Iterator in it self is not thread-safe. That was my intention to say (but of course in this specific case the iterator is single-thread only).

Comment by Daniel Ennis [ 01/Aug/15 ]

Ok I can see the potential for NoSuchElementException now, but only in cases of custom code or the broken and not currently usable /save-all flush.

So, I doubt anyone is going to care to fix it / add diff for something that can't happen under current situations.

If you want to PR, it simply should get an instance of the iterator then check hasNext() before calling next, instead of your proposal that the entire design of the code needs to be changed.

Your ticket accused ConcurrentMap Iterators of not being thread safe, which is untrue.

The only problem here is Mojangs use of iterator().next() w/o calling hasNext(), which just an overall no no for Iterator usage. The Thread Safety elements of ConcurrentHashMap are perfectly fine here.

Comment by Kademlia [ 01/Aug/15 ]

1.) Yes i already stated its a design flaw in the initial post. As this jira only allows 'Bug' or 'New Feature' i chose Bug and stated additional Information in the Topic. I dont see why report on a bad code-design that would break as soon as you use it as intended would be a bad thing?

2.) If the Code was in fact written properly your argument would be correct but there is no .hasNext() call - thats why i called out the design-flaw and this topic exists? The code directly calls .next() on the Iterator.

Comment by Daniel Ennis [ 01/Aug/15 ]

1) So you've confirmed this is not a bug in Spigot/Minecraft, as it is not possible to replicate this under supported circumstances (see #2 to confirm there still is no problem in this code). That means there is no reason for Spigot to add patches to a non broken system to support unsupported operations that are only occurring on a single server due to their modifications.

If you are witnessing any form of problem with this code due to your own changes, you're going to have to privately fix it in your own fork.

2) The code and docs disagree with you. The access to the collection is thread safe. The Iterator object itself is not thread safe. The iterator object is not passed around. I just personally inspected the JVM's code for ConcurrentHashMap and confirmed .next() CAN NOT NPE if hasNext() has returned true.

The iterator has its own LOCAL reference to the next node. There is no possibility that this 'next' reference can change when a single thread is the only thing operating on that instance of the Iterator, which is covered in the documentation.

The only questionable behavior that can occur due to this is the 2nd call to c() could potentially still return a ChunkCoordIntPair that was removed by the first one, however even this is still safe, as the remove call is written as:

NBTTagCompound nbttagcompound = (NBTTagCompound) this.b.remove(chunkcoordintpair);

remove is thread safe, and the 2nd call is guaranteed to return null, meaning the chunk will not double save.

if you're encountering thread safety issues, it is due to your modifications.

Comment by Kademlia [ 01/Aug/15 ]

You are correct that currently c() is only called in a way that cannot produce the problem. And the problematic method is .c() and not .saveChunk() as stated in the report - i will correct that. Edit: Or not as im not allowed to edit the report

However the whole point of this report is that it will break as soon as c() is called from anywhere in the code. As you said yourself flushSave() already provides that possibility. The intention to use it is already in the code.

What are you going to get an npe on? The map will always be set. It doesn't matter if you call the code from two different threads at same time, they each will get a unique iterator that is safe to use on the thread it was created on.

A nullpointer can occure because .next() results in a null-value as the Map has already been emptied by another worker. There is no null-check in c().

Comment by Daniel Ennis [ 01/Aug/15 ]

Studying this more now I'm home I can see the code called in exactly 2 places:
1 - The Async File Save thread
2 - /save-all flush

.... which #2 is a broken command that will crash your server with an infinite loop.

So no there's no issue at all here. c() is only ever called from the FileIOThread.

Comment by Daniel Ennis [ 31/Jul/15 ]

What are you going to get an npe on? The map will always be set. It doesn't matter if you call the code from two different threads at same time, they each will get a unique iterator that is safe to use on the thread it was created on.

This is what the benefit of a concurrent hash map is, to handle all sync for you and not require you to do it yourself

Comment by Kademlia [ 31/Jul/15 ]
Actually you're wrong. Iterators ARE thread safe AS LONG AS they are only used by a single thread and not SHARED.

Your special case contradicts the very definition of thread-safety. See https://en.wikipedia.org/wiki/Thread_safety
The cited source only refers to CME. You can still get NPE. Please review the report. I specifically state the code gets called by another thread.

(OT ChunkSaving: Thats not what im referring to. Spreading out processChunkGC() and forcing actual File-Updates is closer to what im talking about.)

Comment by Daniel Ennis [ 31/Jul/15 ]

Actually you're wrong. Iterators ARE thread safe AS LONG AS they are only used by a single thread and not SHARED.

Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. 
They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. 

So it is not an issue. Something else must be triggering your issue.

Also

 Specific to this problem i am using a chunkSavingQueue that distributes the regular worldsaving over a longer period of time to prevent lags and save all worldchunks more frequently. 

I've already got a change into Spigot that does this sort of.

https://hub.spigotmc.org/stash/projects/SPIGOT/repos/spigot/browse/CraftBukkit-Patches/0070-Improve-AutoSave-Mechanism.patch#22

Though limiting # of chunks saved per auto save isn't a bad idea. can just add a && savedThisTick < maxPerTick check

Comment by SpigotMC [ 31/Jul/15 ]

Your build is not the latest and therefore may be the reason you are having this issue. Spigot is 2 version(s) behind. CraftBukkit is 2 version(s) behind. This message was automatically generated and is not guaranteed to be a solution to your issue.

Generated at Tue Apr 15 11:10:05 UTC 2025 using Jira 10.3.3#10030003-sha1:d220e3fefc8dfc6d47f522d3b9a20c1455e12b7b.