Three-day OSgrid outage caused by cluster issues
OSgrid was down on Wednesday, Thursday and Friday of this week after the grid lost access to its asset services, but other grids probably don’t need to worry about the same thing happening to them.
“It was an issue with the cluster and was not related to OpenSim,” grid president Dan Banner told Hypergrid Business. “It has since been corrected and steps taken to avoid future downtime.”
Avination grid president Melanie Thielker fixed the problem. She was the one who originally built the cluster infrastructure for OSgrid after its six-month outage in 2014.
Thielker was moving house when the outage occurred, she told Hypergrid Business, which delayed the repair process until she could get back online.
“The asset service cluster has some issue that causes a deadlock of sorts, preventing ROBUST (the asset service software) from starting,” said OSgrid board member James Stallings, also known as Hiro Protagonist in-world, who was president of the grid until Banner took over in April. “This is not unusual, though the nature of the issue is. Typically, OSgrid staff are in a position to deal with asset service cluster issues. In this case, there was no prior experience with the issue so staff does not have a ready recipe for resolution.”
Thielker had seen the issue before, Stallings said in a forum post on Thursday.
The grid’s assets were never in danger, he added. A grid’s asset database is the collection of all the content that is located on the grid or held in residents’ inventories, including textures, scripts and objects.
Some residents were upset that the grid did not keep them posted about the outage as soon as it occurred.
“I have spent two days scouring the website, checking on and asking questions on Twitter,” said OSgrid resident “Frankie Rockett” in a forum post. “I finally found this forum and your post — the fruit of persistence and serendipity.”
“It would be nice if there was an established and known point of contact for checking such things,” he added.
Although better communication was one of the priorities for the new board, residents didn’t learn about what was going on until Thursday, when another former president, Michael Emory Cerquoni — also known as Nebadon Izumi in-world — posted a note on Google Plus and an announcement went out over Twitter.
“Twitter did not even get an info update til four hours ago,” said Darkfyre Algoma in a Google Plus post. “A full 24-plus hours since the issue began.”
“A Tweet, that would also show up on the homepage, would probably be super-helpful for all who wonder what is going on,” said Xmir grid founder Gavin Hird in a Google Plus post. “Even no ETA is better than nothing.”
OSgrid holds weekly resident meetings at Wright Plaza on Saturdays at 11 a.m. Pacific time. The hypergrid address is hg.osgrid.org:80:wright plaza.