Server crash, backup failure destroys Aesthetica build

The owner of the Aesthetica sim — a rich and detailed artistic build created over the course of several months — learned on Monday that his region was gone for good. And five months of scheduled nightly backups? Never happened.

What happened to his region could happen to other region owners or hosting companies, if they don’t take some simple precautions.

The US-based business consultant, who goes by the avatar name Lee Oldrich on OSGrid, requested that he not be quoted under his real name to preserve his employer’s confidentiality.

The Aesthetica region was hosted by SimHost.

“It’s inexcusable,” said Oldrich. “Their backup system hadn’t just failed — they hadn’t checked them for months.”

An art gallery on the now-defunct Aesthetica region on OSGrid.

One other customer also lost a region, but that region was empty and no content was lost, the hosting company tells us.

Two other regions that Oldrich was running for his employer’s corporate clients were hosted with another company, Dreamland Metaverse, and were unaffected.

OpenSim is still “alpha” software, and many consider it to be less reliable and stable than commercial enterprise-class virtual world server software like Second Life Enterprise, ProtoSphere, and Teleplace.

However, last weekend’s loss of Aesthetica was not related to OpenSim technology problems, but to other issues. In fact, OpenSim’s built-in region backup feature could have been used to protect against the loss of the build.

There are three ways to back up content in OpenSim.

Builders can back up individual objects by saving them into their inventories or exporting them to their desktops via the export functionality built into the Hippo, Meerkat and Imprudence browsers.

Server managers can back up entire sims by saving copies of the whole OpenSim database. The database backup saves a snapshot of the entire simulator — including terrains, all objects in the simulator database, region settings, and local users and inventories. Database backups are a great short-term way to protect a sim against hardware or software failures since they allow for a restore to the exact state the sim was in at the time of the backup. However, as OpenSim software evolves, database structures may change and the backups lose compatibility.

Finally, region owners can download region archive files, also known as OAR files. These files include the terrains and the objects physically located on the regions. OAR files are compressed archives of standards-based XML files and the assets they reference, and are a convenient way for region owners to keep their own copies of regions, and for builders to share their builds with others.
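
An OAR file that cannot be opened is no better than no backup at all. The following is a minimal sketch of a sanity check, assuming the OAR is packaged as a gzipped tar archive bundling the region's XML files; the function name `oar_is_readable` is hypothetical and not part of OpenSim. It only confirms that the container can be read, not that the region inside is complete.

```python
import tarfile
import zlib


def oar_is_readable(path):
    """Return True if `path` opens as a gzipped tar archive with at least one member."""
    try:
        with tarfile.open(path, mode="r:gz") as archive:
            # Listing the members forces the whole archive to be scanned,
            # so a truncated download is caught here rather than at restore time.
            return len(archive.getmembers()) > 0
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Missing file, not gzip data, or a corrupt/truncated archive.
        return False
```

A region owner could run a check like this on every OAR file received from a hosting company before filing it away.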

Oldrich’s hosting provider, SimHost, offers both database and OAR backups.

Adam Frisby

“We do have a nightly backup procedure,” said Adam Frisby, co-director and head of research and technology at SimHost’s parent company, DeepThink Pty Ltd.

An automated script makes a backup of each simulator’s database and saves it to a separate location, he said. If a script fails, it generates an alert.

However, the script for the Aesthetica region didn’t fail; it simply froze in the middle of execution. As a result, no alert was generated, and SimHost didn’t notice that no backup was actually created.
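
A script that freezes never reaches the code that reports a failure, so the check has to live outside the script. The sketch below is one hedged way to do that in Python, not a description of SimHost's system: the backup command runs under a time limit, and running out of time is treated the same as failing. The names `run_backup` and `send_alert` are hypothetical, and the "backup" here is a stand-in that simply sleeps.

```python
import subprocess
import sys


def run_backup(command, timeout_seconds):
    """Run a backup command and return "ok", "failed" or "timed_out"."""
    try:
        result = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this,
        # so a frozen job does not linger in the background.
        return "timed_out"
    return "ok" if result.returncode == 0 else "failed"


def send_alert(message):
    # Placeholder: a real setup would e-mail or page the support team.
    print("ALERT:", message, file=sys.stderr)


if __name__ == "__main__":
    # Hypothetical stand-in for a backup script that freezes mid-run.
    frozen_job = [sys.executable, "-c", "import time; time.sleep(60)"]
    status = run_backup(frozen_job, timeout_seconds=2)
    if status != "ok":
        send_alert(f"nightly backup {status}")
```

The time limit should be set comfortably above the longest normal backup run, so that only a genuinely stuck job trips it.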

“That part is definitely our fault, and we’ve already done an audit of all our other customers’ regions to ensure that similar problems are not occurring there,” Frisby said.

Now that the company is aware of the problem, inspection procedures will be adjusted to check for it, he added.

Because there was no backup, when the drive housing Oldrich’s Aesthetica region was damaged during a move at a colocation facility in Dallas, there was nothing to restore. If there had been a working backup, Oldrich would have lost no more than a day of work.

Frisby said that SimHost attempted, without success, to recover data from the damaged drive. A specialized data recovery service would cost around $3,000, he said.

“We do think that they [the colocation facility] should pay for it, but there’s nothing we can do if they’re not going to,” said Frisby. He declined to provide that company’s name. “As much as I would like to publicly complain about it, I do still want to have a good relationship with them.”

Frisby said that SimHost won’t be paying for the recovery itself.

“This isn’t that we are being callous or heartless here — we have already spent money trying to recover this, and we’d be happy to explore any affordable options,” he said. “It’s just that it is simply too expensive for us as a small business to afford to do for a single customer’s region.”

Meanwhile, Frisby said that SimHost is offering to give credit for free hosting for the months that there was no functioning backup for Oldrich’s region.

In addition, SimHost offers OAR files, but customers have to request each one from SimHost’s James Stallings (“Hiro Protagonist” on OSGrid). Frisby said that it takes about an hour to generate an OAR file and send it to the customer.

However, Oldrich said that he requested OAR files several times.

“I had repeatedly asked them for OARs but they never came through,” he said. “The last OAR they took was November 9, last year.”

Back then, the region was brand-new and almost completely empty. Since then, it had been built up to around 17,000 objects — including painstakingly detailed buildings, a cafe, and an art exhibit, set in a lush natural landscape.

The lesson here for other OpenSim hosting companies is that it’s not enough to rely on automated scripts for backup. There needs to be additional inspection on a regular basis to ensure that the backup scripts are working. In addition, OpenSim hosting companies need to make self-serve OAR backups available to their customers or generate OAR files automatically for their customers on a regular basis.
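
One way to make that inspection routine is to check the backup files themselves rather than the scripts that are supposed to write them. The following Python sketch is an illustration under stated assumptions, not any hosting company's actual procedure: it assumes each region's nightly backup lands at a known path, and the function name `check_backup` and the 26-hour threshold are hypothetical choices.

```python
import os
import time


def check_backup(path, max_age_hours=26, min_bytes=1, now=None):
    """Return a list of problems with the backup file at `path` (empty if none)."""
    now = time.time() if now is None else now
    if not os.path.isfile(path):
        return [f"{path}: missing"]
    problems = []
    info = os.stat(path)
    if info.st_size < min_bytes:
        problems.append(f"{path}: only {info.st_size} bytes")
    # A nightly job should have rewritten the file within roughly a day.
    age_hours = (now - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{path}: last written {age_hours:.0f} hours ago")
    return problems
```

Run daily over every region's expected backup path, a check like this flags a backup that silently stopped being written, whatever the reason the script did not finish.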

There’s also a lesson for OpenSim users — to request OAR files frequently, and especially when major changes have taken place on a region.

“This is a case where we had a freak accident, this is not a normal business operation case,” Frisby said. “We’re both sorry and disappointed that it happened and we’re going to make sure it doesn’t happen to any other customers.”

That might be too late for Oldrich.

“Needless to say, I dropped them,” he said. “Will never recommend them to anyone again.”

For his corporate clients, Oldrich said he opted for Dreamland Metaverse, operated by avatar Snoopy Pfeffer (who also prefers not to use a real name).

Frisby’s SimHost, one of the largest OpenSim hosting providers on OSGrid, currently hosts 80 regions, about a quarter of them on OSGrid and the rest running as private grids. About 40 percent of SimHost clients are corporate clients, mostly small businesses but also “one or two larger ones,” he said. The rest are individuals, like Oldrich.

Frisby said that his company has a self-serve management panel under development which would allow customers to download their own OAR backup files. Currently undergoing testing, the system should be live in about six weeks, he said.

“It’s not something we send automatically to our users, since some of them are very big files,” he said. “Some of them are 400 to 500 megabytes.”

Snoopy Pfeffer (Photo by Hypergrid Business.)

Like SimHost, Dreamland Metaverse currently requires customers to request OAR files, but is also working on setting up a self-serve interface that would allow OAR files to be generated and downloaded easily.

Dreamland Metaverse does daily database backups, and a second set of backups once a week to a separate location.

“So usually my customers would only lose one day of work maximum, unless the data center burns down, in which case they could lose up to one week of work,” Pfeffer said.

Dreamland Metaverse also uses RAID 1 disk mirroring, Pfeffer added, which allows for better resilience in case of disaster. “Usually no hard drive recovery is necessary, because the damaged hard drive can be exchanged easily, without any service interruption or data loss. After that, the new hard drive resyncs with the hard drive that was still OK. This means that the probability of data losses for my customers is really low.”

Oldrich said that he’s tested the Dreamland backups and they worked, and he’s happy with their service.

Maria Korolov

Maria Korolov is editor and publisher of Hypergrid Business. She has been a journalist for more than twenty years and has worked for the Chicago Tribune, Reuters, and Computerworld and has reported from over a dozen countries, including Russia and China.

  • I am sorry to see that in this high-profile case that the data will not be recovered by the host. High-profile in that it appears in the blogosphere. Upon reading through the host’s website, it seems fairly clear that a customer should feel safe: “We keep monitors on your region day and night. Any failure triggers a notice to our entire support department to get it fixed.”

    Legally, anything goes – ethically though is what matters to customers.

    Is it worth $3000 to attempt to recover the data? That’s not my call and not my place to fight. But it does have ramifications.

    SimHost offers a similar package to what we currently have but with more resources for the same money. While we entertained the idea of using them to be out in OSGrid, this news has changed that thinking.

    What is the value of that? Like many people, we think we have a certain value and importance, but that is highly biased.

    Our current hosting is $175 a month vs. $179.55 (their package minus a 5% discount). The loss of one potential customer is fairly low, but what about what that customer brings?

    What’s the value of a happy customer? After all, our Ener left Second Life and a monthly subscription of $2,920 due to policy changes and lack of acknowledgement. Part of that was in my own efforts of speaking at conferences on the use of Second Life. Add to that the daily blogging Ener does, the 8,000 tagged Flickr pictures, and 20,000 twitter followers.

    But there is no guarantee that if the recovery was attempted that any data would be salvaged. But certainly no data will be recovered as it stands now. It’s an unfortunate case for both the customer and provider.

  • Mike

    I guess the old saying "you get what you pay for" comes up in this case. You are using alpha software and having a simulation hosted by a small company running things at aggressive rates. The customer should have known this going into the arrangement.

  • ReactionGrid CTO Chris Hart has just posted a note about how ReactionGrid handles backups for both its on-grid regions, and private grids:

    http://www.reactiongrid.com/About/ReactionGridBlo

    It's a nice overview of how to do this — and what users should expect from all their hosting companies.

    It includes daily automated backups, off-site copies, plus automatic OAR files generated three times a week and instant, self-serve OAR backups at any time.

    There's a reason why ReactionGrid is a top destination for enterprises and schools looking for OpenSim hosting.

    — Maria Korolov

    Editor, Hypergrid Business

  • Walter Balazic

    Same thing just happened to me.

  • Walter —

    Sorry to hear that! What OpenSim hosting provider were you with?

    — Maria

  • Ener Hax

    caveat emptor – it was true whenever that saying was born and true now. if your work is important to you, it is up to you to safeguard it. trusting anyone else is a luxury and means placing your work in their hands. should they be trusted, i would like to think so

    but then. if everything ran as it should, we would have no car accidents, no wars, and i would like that very much