I made this based on the gripe about some of the silent failures with federation. Might help users choose other servers. Might help admins troubleshoot. Open to comments and criticisms!
Oooohhh … Nice!! I’m repeatedly impressed at how many hackers are going ahead and just getting some stuff done here!!
Questions/thoughts:
- What instance is used as a reference for the delay? One you self-host (lemmy.management)?
- Sooo … what’s the deal with lemmy.ml … that seems to have gone beyond lag and is basically falling over … seems like the devs have neglected their own instance’s health?
- What’s that Redash? Is it a plotly thing or some other product that just uses their graphing library? How have you found it?
What instance is used as a reference for the delay? One you self-host (lemmy.management)?
Yes, lemmy.management. It purposefully subscribes to as many communities as possible (via automation). This doesn’t correct for network lag, but the idea was to capture the “federation” lag. There’s no code I’m aware of that allows admins to prioritize outbound federation traffic. I could be wrong though.
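A minimal sketch of how a lag figure like this could be computed. It assumes each federated activity carries a published timestamp and that the receiving instance records when the activity arrived. The function name and the sample timestamps are hypothetical, and this is not the actual lemmy.management code.

```python
from datetime import datetime, timezone

def federation_lag_seconds(published_iso: str, received_at: datetime) -> float:
    """Seconds between an activity's published time and its arrival here."""
    published = datetime.fromisoformat(published_iso)
    # Assumption: timestamps without an offset are treated as UTC.
    if published.tzinfo is None:
        published = published.replace(tzinfo=timezone.utc)
    return (received_at - published).total_seconds()

# Hypothetical example: an activity published at 12:00:00 UTC arrives 30 s later.
received = datetime(2023, 6, 20, 12, 0, 30, tzinfo=timezone.utc)
lag = federation_lag_seconds("2023-06-20T12:00:00+00:00", received)
print(lag)
```

As noted above, a number like this still includes ordinary network latency, so it is an upper bound on the delay caused by federation itself.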
Sooo … what’s the deal with lemmy.ml … that seems to have gone beyond lag and is basically falling over … seems like the devs have neglected their own instance’s health?
I just collect the data.
What’s that Redash? Is it a plotly thing or some other product that just uses their graphing library? How have you found it?
https://redash.io I don’t remember how I found it. Probably an “awesome” list on github.
On mobile, when touching the “Federation Lag-o-meter (now - 1h)” statistics, the page is hard to scroll. Other than this the page is gold
When I saw the bar looking like the Burj Khalifa, I assumed it was .world instead of .ml. Interesting.
Props to Ruud@lemmy.world for dealing admirably with the Rexxit hug of death.
I expect that JSON parsing is a huge overhead in the fediverse. I work on a SaaS that needs to do all its internal processing in under 10 ms, and serializing/deserializing ends up being a sizable chunk of server time. I saw a 40% reduction in runtime using simdjson for deserializing, and there is a Rust crate for it, but I haven’t had time to look the Lemmy code over.
Can anyone with an overloaded instance get on their command line and gather a decent flamegraph so the performance folks can aim optimizations in the right direction?
Beehaw is currently doing the Burj
Yep, it seems completely different to when I last looked.
It seems everyone gets a turn at the top.
The graph should drop the outlier, as it skews the scale for every other instance and keeps the smaller numbers from showing up.
Or we should move to a log scale so that it can be displayed correctly.
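A rough sketch of what a log scale would do to the bars. The instance names and lag values here are made up for illustration and are not taken from the dashboard.

```python
import math

# Hypothetical lag values in seconds; one instance is a huge outlier.
lags = {
    "instance-a": 2.0,
    "instance-b": 15.0,
    "instance-c": 40.0,
    "instance-d": 90000.0,
}

def log_scaled(values):
    """Map each value to log10(1 + v) so zero stays at zero and order is kept."""
    return {name: math.log10(1.0 + v) for name, v in values.items()}

scaled = log_scaled(lags)
for name, value in scaled.items():
    print(f"{name}: {value:.2f}")
```

On the raw scale the largest bar is 45,000 times the smallest. After the transform it is roughly ten times as tall, so the small instances stay visible and the ranking is unchanged.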
Great idea. I was trying to figure out if it was lemmy.world trying to deal with new users or a bug in the Memmy app that caused random errors.
Is it possible to have the lag metrics by instance in a table format? It’s so hard to view your site on mobile.
I didn’t even load it on mobile. I will check it out tonight and maybe just create a separate “mobile friendly” dashboard.
Not the person you’re replying to, but I didn’t find it awful on mobile. The zoom by dragging worked well, as did the double tap to view the whole dataset.
For a quick browse I wasn’t frustrated at all and found the information I wanted in a short amount of time!
Nice work! Maybe add feddit.de?
Fixed! The regex was not getting content from < 0.18.0 instances. Thanks!
EDIT: I was wrong. Something in feddit.de’s messages that I THOUGHT was a version thing must be a localization thing: a string in the JSON was breaking some regex. Regardless… fixed.
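For anyone curious how a string inside JSON can trip up a regex, here is a small illustration. The payload and the pattern are invented for the example. The actual regex and the feddit.de message are not shown in this thread.

```python
import json
import re

# Hypothetical payload: the name contains a non-ASCII letter and escaped quotes.
raw = '{"name": "Café \\"Zentrale\\"", "version": "0.17.4"}'

# A naive pattern stops at the first quote character, even an escaped one.
naive = re.search(r'"name": "([^"]*)"', raw).group(1)

# A real JSON parser handles the escaping correctly.
parsed = json.loads(raw)["name"]

print(repr(naive))
print(repr(parsed))
```

The regex result is cut off at the first escaped quote, while `json.loads` returns the full name. Parsing the message first and then reading the field avoids this whole class of problem.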
Awesome, thank you :-)
It’ll be interesting to see how this changes through the day! I know .world tends to slow down later in the day when the US contingent is getting going.
(also, yay lemm.ee)
This is awesome! Hopefully it’ll help spread the load among instances. Definitely going to use this to see which instance to move to (and which to avoid)
Keep in mind this is a one hour snapshot. I am working on a historical rating as well to give a better indication of overall long term stability.
This looks great. Is there any chance that this could be extended to include Kbin as well, since those instances federate with Lemmy, too?
I am actually working on that! Stay tuned. Like days though, don’t get too excited. :)
Aye aye. I’m mildly excited.
kbin posts DO show up in the details table, but you would need to know the IP they are coming from. They don’t include their instance host name in the header, which is why they’re not in the table and the instance is null for some IPs. Also, I don’t scrape and subscribe to kbin magazines like I do for Lemmy ATM, so the traffic will be low. Probably just a few from kbin.social.
This looks really good.
As an admin of a small kbin instance, I’ll be keeping an eye on updates from you as this will be very handy!
This is really cool! Would it be possible to grab this data as JSON, CSV or some other equivalent format? I’m working on making my own Lemmy client, and I think this would be very helpful to display.
Should already be able to:
https://redash.io/help/user-guide/integrations-and-api/api
For example: https://aftershock.lemmy.management/api/queries/4/results
The API key for public users is the same as the dashboard slug: oT7pdcoeHWccpvZCNmTpJKoGZND8ZdRO3wDWpMug
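A minimal sketch of pulling those results from a client, using only the Python standard library. It assumes the instance accepts the `results.json` form of the URL with the key passed as an `api_key` query parameter, and that the response has the usual Redash `query_result` → `data` → `rows` layout. The helper names are hypothetical, and the column names in any real response are whatever query 4 returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://aftershock.lemmy.management/api/queries/4/results"
API_KEY = "oT7pdcoeHWccpvZCNmTpJKoGZND8ZdRO3wDWpMug"  # public key quoted above

def results_url(fmt: str = "json") -> str:
    """Build the results URL; fmt is assumed to be 'json' or 'csv'."""
    return f"{BASE}.{fmt}?{urlencode({'api_key': API_KEY})}"

def rows_from_payload(payload: dict) -> list:
    """Extract the row list from a Redash-style JSON response."""
    return payload["query_result"]["data"]["rows"]

def fetch_rows() -> list:
    """Download and parse the cached query results (makes a network call)."""
    with urlopen(results_url(), timeout=10) as resp:
        return rows_from_payload(json.load(resp))

print(results_url())
```

Calling `fetch_rows()` would return a list of dicts, one per row, which a client could render as a table.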
Awesome work! Added my instance!