Three-day OSgrid outage caused by cluster issues

OSgrid’s Wright Plaza at Plaza.

OSgrid was down on Wednesday, Thursday and Friday of this week after the grid lost access to its asset services. Other grids, however, probably don’t need to worry about the same thing happening to them.

“It was an issue with the cluster and was not related to OpenSim,” grid president Dan Banner told Hypergrid Business. “It has since been corrected and steps taken to avoid future downtime.”

Avination grid president Melanie Thielker fixed the problem. She was the one who originally built the cluster infrastructure for OSgrid after its six-month outage in 2014.

Thielker told Hypergrid Business that she was moving house when the outage occurred, which delayed the repair until she could get back online.

“The asset service cluster has some issue that causes a deadlock of sorts, preventing ROBUST (the asset service software) from starting,” said OSgrid board member James Stallings, also known as Hiro Protagonist in-world, who was president of the grid until Banner took over in April. “This is not unusual, though the nature of the issue is. Typically, OSgrid staff are in a position to deal with asset service cluster issues. In this case, there was no prior experience with the issue so staff does not have a ready recipe for resolution.”

Thielker had seen the issue before, he said in a forum post on Thursday.

The grid’s assets were never in danger, he added. A grid’s assets database is the collection of all the stuff that’s located on the grid or that residents have in their inventories, including textures, scripts and objects.

Some residents were upset that the grid did not start keeping them posted as soon as the outage occurred.

“I have spent two days scouring the website, checking on and asking questions on Twitter,” said OSgrid resident “Frankie Rockett” in a forum post. “I finally found this forum and your post — the fruit of persistence and serendipity.”

“It would be nice if there was an established and known point of contact for checking such things,” he added.

Although better communication was one of the priorities for the new board, residents didn’t learn about what was going on until Thursday, when another former president, Michael Emory Cerquoni — also known as Nebadon Izumi in-world — posted a note on Google Plus and an announcement went out over Twitter.

“Twitter did not even get an info update til four hours ago,” said Darkfyre Algoma in a Google Plus post. “A full 24-plus hours since the issue began.”

“A Tweet, that would also show up on the homepage, would probably be super-helpful for all who wonder what is going on,” said Xmir grid founder Gavin Hird in a Google Plus post. “Even no ETA is better than nothing.”

OSgrid holds weekly resident meetings at Wright Plaza on Saturdays at 11 a.m. Pacific time. The hypergrid address is plaza.

Maria Korolov

Maria Korolov is editor and publisher of Hypergrid Business. She has been a journalist for more than twenty years and has worked for the Chicago Tribune, Reuters, and Computerworld and has reported from over a dozen countries, including Russia and China. Follow me on Twitter @MariaKorolov.

  1. Avia Bonne says:

    Communication about the outage started on Thursday, July 7th, on G+ in the OSG group there.
    So it is not true that we didn’t communicate at all.
    For those who only look at Twitter and the forum, I would highly recommend joining the OSG group on FB or OSG on Google Plus. They are widely read by members, and that is where the latest news is posted, edited by Foxx Bode and myself.
    We have noticed that those are the places that work very well when we post notes.

  2. XMIR Grid says:

    UPDATE: Sorry folks, the grid is offline until the issue with the assets is resolved permanently.
    – Which is a real shame, OSG9B and all 🙁