[SPIGOT-3014] Low-Performance Nether Portals Created: 16/Jan/17  Updated: 20/Jan/17  Resolved: 16/Jan/17

Status: Resolved
Project: Spigot
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Matt Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None

Attachments: File snapshot-1484850439741_PORTAL-TRAVEL.nps     File snapshot-1484850928983_WARP-TRAVEL.nps    

 Description   

Issue: traveling through nether portals causes more lag on a server than simply warping between worlds

The suspicion: Nether portal use ignores cached chunks, and forces some sort of refresh on the cache. Waste of a cache, and CPU cycles. Alternatively, Nether-portal animations are being rendered server-side, causing resource spikes on the server.

 

Testing: For each test, nether portal had already been opened and used, to pre-generate the chunks. Essentials warps were created at each end of the portal, then the server was restarted.

 

1) Fresh restart, wait 2 minutes for everything to fully settle, spawn in overworld, wait 1 more minute (I'm being patient, so as not to screw up the timings), start timings, travel through nether portal, wait for cpu to drop below 50%, paste and stop timings. TPS: 14.1

https://timings.spigotmc.org/?url=qoworuneva

 

2) Same as (1) (restart again), only this time /warp nether was used to travel to the nether world. TPS: 19.9

https://timings.spigotmc.org/?url=wokujutehe

 

3) After restart, portal was used to access nether and return to overworld. After 1 minute, timings was started and portal was used again. TPS: 11.2

https://timings.spigotmc.org/?url=simivigihu

 

4) Same as (3), except /warp nether was used for the second travel. TPS: 19.8

https://timings.spigotmc.org/?url=wicafahida

 

5) Same as (4), except /warp was used for both trips to the nether and back. TPS: 19.8

https://timings.spigotmc.org/?url=cadafijeji

 

6) Same as (4), except warp was used for the initial trips, portal was used for second (timing) trip (to rule out using a portal as clearing the cache). TPS: 10.8

https://timings.spigotmc.org/?url=qosurenawi

 

Not 100% sure what is causing the massive TPS lag, but it is definitely there. The real way to fix this issue would be to track down the cause, but a quick fix could be made by simply calling the portal animation, then warping players between portals after a second or so. Unless, of course, the TPS lag is the portal animation... In which case I'll just disable portals on my server and make people use warp signs.

 

Spigot version 1.11.2

Plugins: Essentials, LANBroadcaster

 



 Comments   
Comment by pokechu22 [ 16/Jan/17 ]

The portal animation is entirely client side; it's just an animated texture.

On the other hand, when you go through a portal, it does try to find a portal to link with (going through a portal always places you at another portal, even if the other portal's ~100 blocks away). Using the dimension warping commands doesn't do that IIRC, instead keeping you in more or less the exact same position. The search for a portal block may create lag.

That search is supposed to be cached, though. It might make a lag spike the first time while it tries to find the portal, but it shouldn't do it each time (and it shouldn't be a significant hit even then).

Vanilla's debug profiler (which isn't available on spigot for reasons) would indicate what specific section this is happening at - finding the portal, placing the entity, etc.

Comment by Matt [ 16/Jan/17 ]

Interesting point, I had not considered portal search as a potential problem as well... I mistakenly assumed that was cached already.

I figured the first time a portal was used would be problematic, which is why I didn't even post the timings on that, since the portal search combined with world generation should be significant.

I also suspected that the portal animation was client-side, I just didn't want to look like an idiot for ruling it out if I was wrong, as it is should be the only real difference between properly cached portals and a warp

Comment by Matt [ 16/Jan/17 ]

https://hub.spigotmc.org/stash/projects/SPIGOT/repos/craftbukkit/browse/src/main/java/org/bukkit/craftbukkit/CraftTravelAgent.java

 

https://hub.spigotmc.org/stash/projects/SPIGOT/repos/craftbukkit/browse/nms-patches/PortalTravelAgent.patch

 

I'm not seeing any caching at first glance... But it does look like support was built in to allow for a dynamic search radius, but then it was hard coded to be 128 blocks... Would be nice to be able to configure this in one of the server .yml files, setting it to something smaller like 64 or 32 could dramatically reduce the search loads on the server, at minimal cost to vanilla functionality.

 

When I have the time, I'm gonna actually run through the search algorithm, I suspect it just iterates through every block possibility in the search radius. If that is the case, then knowing that portals have a minimum width of 4 at their widest, 2 at the base, one could theoretically reduce the number of calls by a factor of 2-4 depending on how the search is performed.

Comment by md_5 [ 16/Jan/17 ]

The API already allows setting a custom search radius, we do not generally add features which can be trivially done with a plugin.

Additionally I did some profiling whilst using the portals with a Java profiler (YourKit) and on my server I was unable to see anything unexpected.

Portal creation is again one of those things which needs to mirror Vanilla and in light of 1) The optimisation you requested already being implementable by a plugin and 2) Me being unable to reproduce, I am closing this ticket.

Comment by pokechu22 [ 17/Jan/17 ]

I'm not seeing any caching at first glance

To be clearer: the caching happens in PortalTravelAgent, in vanilla (it's not added via a patch). On a casual inspection, it looks like that caching should still work within CraftTravelAgent.

There are some limitations to the cache: It expires after 300 ticks (and is polled for expiration every 100 ticks, so it can potentially last up to 399 ticks). In this context, I'm pretty sure ticks are world ticks, so that means... 15 to 26.6 seconds. Additionally, the caching seems to happen on a per-chunk level, not a per-portal level, which may lead to weird behavior (but I'm not entirely sure what it might cause). Maybe the cache could be improved slightly, but I don't think that it'd be too bad (since it tries to find the closest portal first, which shouldn't take too long...)

Comment by Matt [ 18/Jan/17 ]

15-26 seconds doesn't seem like a very long time to remember where a portal <should> go to (how often do they change? Can the destinations not be saved until a change is made to one of the portals? how often do people go through the same portal in 15 seconds?). In addition, on an admittedly less robust server than I know some people can afford, performance is hindered server-wide for 5ish seconds every time someone decides to take a trip to or from the nether. I realize the options of replace nether portals with a plugin, or buy a more powerful server exist, but that doesn't really feel like "high performance Minecraft" to me, more like "just Minecraft with some plugins". If the performance was also limited when warping, I would accept my fate, but it is clear to me that something about nether portals is being handled in a highly inefficient manner.

Here's an example of the slow loading of nether portals, this would be combined with lag for everyone else

https://www.spigotmc.org/threads/nether-loading-very-slow.60853/

Looking at my timings reports, it is clear that Connection Handler and livingEntityBaseTick both spike during portal events, though the cause of this may simply be the lag introduced by the portal (rather than them being the cause).

Given that warping has very little effect on TPS, I almost wonder if some of that chunk handling is actually done on a separate thread, or at least asynchronously, unlike the nether portal... I realize I speculate a lot, but Java is not my forte, so it takes me quite a while to navigate around the source.

The reason I labeled this a minor bug is because it's not game breaking, and to some extent, it can be bypassed with plugins, but it is a performance bug which I thought would be of interest to the Spigot community.

 

Comment by md_5 [ 18/Jan/17 ]

>performance is hindered server-wide for 5ish seconds every time someone decides to take a trip to or from the nether
Except that I can't reproduce this so can't do anything about the issue so we can't validate any of your assumptions.

>something about nether portals is being handled in a highly inefficient manner
There are tens of thousands of lines of code that are executed when a player uses a nether portal, there is a lot of potential "something"

>I realize I speculate a lot
Instead of speculating, it would be more helpful if you could provide an overview of the issue occuring in a Java profiler snapshot (YourKit, VisualVM), timings really do no good here.

Comment by Matt [ 19/Jan/17 ]

Ask and you shall receive

Added images in case they are of any use:

 

Portal:

snapshot-1484850439741_PORTAL-TRAVEL.nps

http://i.imgur.com/M6XzGVg.png

 

Warp:

snapshot-1484850928983_WARP-TRAVEL.nps

http://i.imgur.com/TyhqMdk.png

 

Thanks,

-Matt

Comment by md_5 [ 20/Jan/17 ]

The code there is pretty standard.
Maybe your world is an edge case like massively mismatched heights between worlds, or you have other server issues.
I see no avenue for optimisation aside from reducing search radius which is possible via a plugin.
I cannot reproduce such slow portalling on my server.

Comment by Matt [ 20/Jan/17 ]

Well, I suppose that is unfortunate that you cannot reproduce the issue... I find it hard to believe that I am alone in this situation, and I have tested on multiple randomly generated worlds with 100% success rate, but not surprised to find that it's not easily reproduced without ultra budget hardware, similar to a Raspberry Pi 3.

 

For now I'm willing to let this topic rest and block nether portals on my server in favor of warps, but perhaps one day someone with a budget server will come along and add some extra useful data that can help diagnose the source of the lag (or at least confirm that I'm not alone...). I have certainly wasted more man-hours trying to diagnose the problem than portal lag will cost me annually at 5-10 extra seconds a trip. But hell, that's what programming is all about!

 

If this is truly an isolated incident then there is no need to proceed further, maybe down the road it can be reopened if anyone is able to reproduce what I'm experiencing.

 

Thanks for your time.

Comment by md_5 [ 20/Jan/17 ]

The raspberry pi is terrible for Minecraft.
It has about the same performance as a 10 year old pentium, we essentially don't support it and a $5/month shared host would be better.

Comment by Matt [ 20/Jan/17 ]

There's the response I was expecting! This doesn't seem like the right place to debate the merits of affordable computing. Not everyone can afford $5/month, simply be glad you live in a situation where that's a negligible expense.

All I will say is that vanilla Minecraft/Spigot run fine for 5 or fewer players, with the primary (nay, only) significant exception I've uncovered being this nether portal issue. I keep seeing people say the Pi is bad, but it seems, never those who have tested/worked to optimize it... Obviously it won't be handling six separate bungee cord worlds with 150 players and the latest XL FTB modpack, but it holds its own. Sometimes the best way to find inefficiencies in one's code is to make sure it runs well on the low end of the spectrum, rather than just throwing money at the problem and raising the performance barrier.

 

-Matt

Comment by md_5 [ 20/Jan/17 ]

It's as optimized as it's gonna get, this is just a case of an unavoidable operation which the pi sucks at - 128*128*256 is a lot of blocks.

Given the pi costs about the same as 7 months worth of hosting (excluding internet, electricity etc), I don't think I'm making an outlandish suggestion in any situation.

Comment by Matt [ 20/Jan/17 ]

I will defer to your judgement on it being at peak optimization, and/or not worth investigating but I will point out the following:

A straight search of a 257*257*256 (128 in each direction...) grid is the easiest way to handle the process, but not the most optimized. Simple knowledge of the minimum portal dimensions (4*5) means that you can actually search 20x fewer points. Distribute the work across N cores, since it is just a simple search iterator (I know threading in Minecraft is a pipedream...), and you now have a 20N performance increase. Ensuring there is a system in place for not wasting time locating portals farther away from the center than the closest found portal can drop the number of searched points for standard portals by another 64x. If one were to implement some decent caching, then for times where the destination portal isn't destroyed there could be a (roughly) 16908544x decrease in search time, >64x (again up to 16908544x if your portal is in the center) depending on center distance if you want to preserve vanilla behavior and just double-check there are no portals closer to the center than the one you have already confirmed exists...

 

Having a system that works doesn't mean we have the best system. I realize this isn't a high priority (again, points to the "Priority: Low" flag), but it is a space for optimization if anyone is ever interested in doing some fine-tuning. The system as-is is "good enough" in that it performs the designed function, but it is not very optimized. Who knows, perhaps this is an issue best brought up with Minecraft devs, since it is admittedly their inefficient code, but I figured I would check here first since the Spigot slogan is high performance.

 

I realize I'm likely beating a dead horse at this point as your mind appears made up, but I figure I should at least get my views/ideas out there in case anyone feels inspired to go with it.

 

Best,

-Matt

Comment by md_5 [ 20/Jan/17 ]

Except that the Vanilla code allows for a 1x1 portal.
This is not something that we will optimize because:

1) It is very easy to completely obliterate Vanilla behaviour (adventure maps)
2) A rewrite of an entire class a large undertaking which must be redone every update
3) You are running your server on hardware which is _below_ the minimum requirements
4) You're kidding yourself if you think this is something that is able to be parallelised
5) Did I mention that we have to maintain this code and its bugs?
6) Oh yeah I also have to maintain an entire API, implementation, bug fixes and performance changes

When we add an optimization to Spigot we have to evaluate the cost benefit for the _entire userbase_ and in this case the cost benefit is massively skewed away from optimizing.

Generated at Thu Aug 11 15:24:15 UTC 2022 using Jira 9.0.0#900003-sha1:3a66c72723fa5e73f2191cdfc1f1b28a5950171c.