OSgrid begins rebuilding

OSgrid has begun the rebuilding process, grid administrators announced today, even as the latest data recovery attempt nears the finish line.

Around 3 terabytes of data had been recovered as of this morning, out of a total of 3.5 terabytes, or about 21 million files. On November 10, the recovery effort stood at 1.9 terabytes. That’s a rate of 0.12 terabytes a day, so recovering the remaining half terabyte will probably take at least four more days.
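A quick check of that estimate, sketched in Python. It assumes the recovery keeps going at a constant 0.12 terabytes a day, which nobody has promised, and the helper names are hypothetical. At that rate, the remaining 0.5 terabytes works out to a little over four days, and the 1.1 terabytes recovered since November 10 implies roughly nine days between the two progress reports.

```python
# Hypothetical helpers for illustration. Both assume the recovery
# proceeds at a constant rate, which the article does not guarantee.

def days_remaining(total_tb: float, recovered_tb: float, rate_tb_per_day: float) -> float:
    """Days still needed to recover the rest of the data at a constant rate."""
    return (total_tb - recovered_tb) / rate_tb_per_day


def days_elapsed(earlier_tb: float, later_tb: float, rate_tb_per_day: float) -> float:
    """Days implied between two progress reports at a constant rate."""
    return (later_tb - earlier_tb) / rate_tb_per_day


if __name__ == "__main__":
    # Figures from the article, in terabytes.
    total, recovered, earlier, rate = 3.5, 3.0, 1.9, 0.12
    print(f"Remaining: {days_remaining(total, recovered, rate):.1f} days")
    print(f"Implied time since November 10: {days_elapsed(earlier, recovered, rate):.1f} days")
```

If the rate slows as the recovery reaches harder-to-read parts of the disks, both figures would grow, which is why "at least four days" is a floor rather than a forecast.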

Entrance to Wright Plaza on OSgrid.

But rebuilding has begun anyway, so that the grid can come back up soon, with or without its assets.

“Today we’ll be reprovisioning the old server with new disks and getting a second, identical server, to prepare as a ‘hot and ready standby’,” said James Stallings in a Facebook post yesterday. “This will leave us with two servers always up to date and ready to handle service requests.”

On Monday, the team contacted the data recovery company to ask about the status of the third, and final, recovery effort. There was no response.

“Not precisely encouraging,” said Stallings, who is also known as Hiro Protagonist in-world.

But they aren’t writing off the recovery efforts just yet, he said. It will take a little while to set up the servers and get the grid back up.

“We’ll give those guys a little more time while we get ready,” he said.

Then, this morning, he reported via OSgrid’s Internet Relay Chat channel that the recovery team was approaching the 3-terabyte mark.

“I got new disks in the RAID array yesterday and set up the server operating system,” said Stallings, also known as LeTwitch on IRC, according to a repost of the chat on the OSgrid forums. “Doing some configuration today.”

In an OSgrid forum post yesterday, OSgrid secretary Lawrence Roberts, who is also known as Albertlr Landar in-world, confirmed that the recovery efforts will continue until it’s clear that they will not succeed.

“But in the meantime, instead of just waiting for the results, it was decided to proceed with getting replacement servers and set up a new better backup system for either case, so that when a final determination is made there will be no further delay,” he said. “Regardless of which way things go the new equipment will be needed for us to go forward.”

Grid won’t be up for at least another week

Stallings did not specify how long it would take for OSgrid to be back up, except to say, “it won’t be tomorrow or even next week.”

“But we are no longer dead in the water,” he added. “Once we have this server live and workable via remote access, we’ll order the hot and ready standby server.”

According to Stallings, the new equipment will improve the grid’s technology infrastructure significantly — he called it “OSgrid 2.0.”

“We are going to get a little progressive, and try to prevent this sort of thing from happening again in 18 months,” he said.

Maria Korolov

Maria Korolov is editor and publisher of Hypergrid Business. She has been a journalist for more than twenty years and has worked for the Chicago Tribune, Reuters, and Computerworld and has reported from over a dozen countries, including Russia and China. Follow me on Twitter @MariaKorolov.

9 Responses

  1. Geir Nøklebye says:

    It is good they are going to go ahead whether or not the data can be recovered. So that is good! 🙂

    The hot standby approach is not a backup solution though. It is a high availability solution.

    The problem with writing data to a mirror server, as they do, is that the data on both servers can become corrupt at the same time, and then they are both hosed.

    An example: a few years ago, 81% of all mini banks (ATMs) in the country where I live went offline for over 4 days. An operator made an error at the console, and the production system started to write corrupt data records. At the same time, the corruption was written to the mirror location, so after 3 1/2 hours both the production and mirror systems were so corrupt that they both crashed. It took them 4 days just to get the mini banks back in operation, but over a year to sort out all the transactions that had happened in the 3 1/2 hours it took to corrupt the system.

    For a test installation or grid, the risk of database corruption is pretty high, so you need a snapshot system combined with (tape) backup that can restore to a point multiple days or even weeks back in time. That way you are guaranteed not to be working on a corrupt system. Ideally, you should take a snapshot before every new software deployment.

    An installation that is used for testing should be controlled in both size and users, so that it stays manageable and doesn’t introduce all kinds of factors that can influence the testing. So, in my opinion, the developers need such a grid, and what we call OSGrid, the free-for-all grid it was, should exist as a separate entity.

    It is also important that, whatever they implement, they use and test the asset server software in the standard OpenSim distribution rather than a third-party module that others don’t use, which is what the old OSGrid did.

  2. Samantha Atkins says:

    What an unmitigated disaster and black eye for the largest open OpenSim grid. I don’t see anything here that tells me the most important thing: what is being done to ensure this sort of thing can never, ever happen again. It isn’t rocket science. Anyone who has run a large database or a multiple-database system knows how to protect the data reasonably well. We still haven’t heard why such measures were not in place, or that they will be going forward. Without that, I see no reason anyone would ever trust OSGrid with anything important.

    It need not be hard. You keep a hot slave, take periodic database dumps from it, and store them on S3. It isn’t even terribly expensive. There are even inexpensive services that will do the heavy lifting for you.

  3. Lara Nguya says:

    Lara knows that OSGrid has now lost most of its followers. Over the years there have been several separate moves/attempts/ploys by various factions in ‘OSGrid authorities’ to get people away from OSGrid and onto other grids. There have been several reasons, often obscure. Some of those have been to do with people wanting the grid to be exclusive (to them and their like) and they created an atmosphere uncomfortable to others that made them move away from the grid – in their droves. Some of it was stupidity. Some of it was selfishness.

    Lara is now suspicious of the current disaster given the spectral nature of past history where strong driving forces tried to change the culture/property/character of OSGrid, despite O = open. Lara resolves all of this in her own mind that it has been brought about through the capriciousness of people. In short, ‘it takes all sorts to make a world’.

    Lara sincerely hopes that in this instance she is wrong in her assumptions.

    • Han Held says:

      >Some of those have been to do with people wanting the grid to be exclusive (to them and their like) and they created an atmosphere uncomfortable to others that made them move away from the grid – in their droves.
      Yep. You nailed it. Wow, did you ever nail it.

      For some people, it’s not about winning… it’s about making the other players LOSE.
      And the biggest loser in all of this is the opensim community at large.

      I’d encourage you to check out Metropolis grid if you haven’t already, I think you’ll find a more welcoming and more truly democratic community there.

    • Frank Corsi says:

      I’ve never heard of a recovery issue taking this long. And to all who keep donating to OSGrid… well, money may not get this grid back. The newly created ATEKGrid.com was born to offer the community a free, open-to-connect grid. Great customer support; check it out if you care to.

  4. Cathartes Aura says:

    I think most folks are, and have been, pretty patient with and supportive of the OSGrid admins during this very long outage. A few folks have posted comments questioning why this outage needed to occur in the first place and wanting to know what is being done to prevent it from happening again. A few have also questioned the lack of news about where the recovery process stands, what is currently being done, and a very rough estimate of when the grid will be back.

    The silence from OSGrid admins is deafening… Hey we support you, we want to see OSGrid back up, and most of us are very patient people. But even supporters of OSGrid become disillusioned over time if we are being treated with contemptuous disrespect.

    Here’s a little common-sense advice from this old carrion eater… Put out information on a regular basis – updating us less than once a month is not regular… lmao!! And be truthful about everything that is going on. Put it all out there so people can decide for themselves whether to keep waiting for OSGrid to come back up or, if it’s going to be another few months, move on to greener pastures. Basically, show some respect to all the folks who created OSGrid and made it the wonderful place that it once was.

    • Lara Nguya says:

      Lara couldn’t agree more with you, Cathartes. She supports OSGrid and has done for years. Like many of her supportive friends, she is a builder – one of those creatures who create the fabric that becomes metaverse. But the communities that she used to know in OSGrid are what make metaverse come to life. Lara believes that those who make up these communities are to be shown the respect you mention here.

  5. AviWorlds says:

    I’m willing to invest in servers for OSGRID, but I say this publicly: OSGRID has to charge a fee to all hosting companies that are making money providing paid hosting services for their customers!
    And the name OSGRID would need to be changed to AVW-OSGRID, but that is negotiable.
    I’m willing to save OSGRID and help make it self-sustaining. Donations alone are not a safe business model.
    Here it goes:
    1-10 regions: free, for the same person from the same IP.
    1-100 regions: 150.00 per month. This would apply to OpenSim hosting companies that make money hosting regions in OSGRID.
    I would want recognition, of course: some kind of rebranding, or me being one of the owners or directors.
    I can offer servers that will be paid for by me. I could open another grid like OSGRID, BUT… OSGRID is already established, so why would I waste time? A STRONG OSGRID WOULD KICK BUTT!!
    Waiting to hear from OSGRID

  6. Dave Gumm says:

    I think OSGrid users should hang in there just a while longer for the OSGrid reopening. We all miss it as home. I, for one, look forward to the great return and have no interest in these small, unheard-of new grids trying to take OSGrid’s place.