Server crash, backup failure destroys Aesthetica build

The owner of the Aesthetica sim — a rich and detailed artistic build created over the course of several months — learned on Monday that his region was gone for good. And five months of scheduled nightly backups? Never happened.

What happened to his region could happen to other region owners or hosting companies, if they don’t take some simple precautions.

The US-based business consultant, who goes by the avatar name Lee Oldrich on OSGrid, requested that he not be quoted under his real name to preserve his employer’s confidentiality.

The Aesthetica region was hosted by SimHost.

“It’s inexcusable,” said Oldrich. “Their backup system hadn’t just failed — they hadn’t checked them for months.”

An art gallery on the now-defunct Aesthetica region on OSGrid.

One other customer also lost a region, but that region was empty and no content was lost, the hosting company tells us.

Two other regions that Oldrich was running for his employer’s corporate clients were hosted with another company, Dreamland Metaverse, and were unaffected.

OpenSim is still “alpha” software, and many consider it to be less reliable and stable than commercial enterprise-class virtual world server software like Second Life Enterprise, ProtoSphere, and Teleplace.

However, last weekend’s loss of Aesthetica was caused not by problems with OpenSim technology but by other issues. In fact, OpenSim’s built-in region backup feature could have been used to protect against the loss of the build.

There are three ways to back up content in OpenSim.

Builders can back up individual objects by saving them into their inventories or exporting them to their desktops via the export functionality built into the Hippo, Meerkat and Imprudence browsers.

Server managers can back up entire sims by saving copies of the whole OpenSim database. The database backup saves a snapshot of the entire simulator — including terrains, all objects in the simulator database, region settings, and local users and inventories. Database backups are a great short-term way to protect a sim against hardware or software failures since they allow for a restore to the exact state the sim was in at the time of the backup. However, as OpenSim software evolves, database structures may change and the backups lose compatibility.

Finally, region owners can download region archive files, also known as OAR files. These files include the terrains and the objects physically located on the regions. OAR files are standards-based XML files, and are a convenient way for region owners to keep their own copies of regions, and for builders to share their builds with others.

Oldrich’s hosting provider, SimHost, offers both database and OAR backups.

Adam Frisby

“We do have a nightly backup procedure,” said Adam Frisby, co-director and head of research and technology at SimHost’s parent company, DeepThink Pty Ltd.

An automated script makes a backup of each simulator’s database and saves it to a separate location, he said. If a script fails, it generates an alert.

However, the script for the Aesthetica region didn’t fail. It simply froze partway through execution. As a result, no alert was generated, and SimHost didn’t notice that no backup was actually created.
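
A frozen script of the kind described here can be turned into an ordinary, alertable failure by running the backup under a time limit. The following is a minimal Python sketch of that idea, not SimHost's actual tooling: the function name `run_backup`, the command passed to it, and the timeout value are all hypothetical.

```python
import subprocess
import sys


def run_backup(command, timeout_seconds):
    """Run a backup command and report (ok, reason).

    A hang is treated as a failure: if the command does not finish
    within timeout_seconds, subprocess.run kills it and raises
    TimeoutExpired, which is reported here instead of being ignored.
    """
    try:
        result = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if result.returncode != 0:
        return False, f"exit code {result.returncode}"
    return True, "completed"


if __name__ == "__main__":
    # Hypothetical stand-in for a real backup command: a child process
    # that hangs for 5 seconds, run with a 1-second limit.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_backup(hung, timeout_seconds=1))
```

Any result other than `(True, "completed")` would then feed whatever alerting the host already uses for failed scripts.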

“That part is definitely our fault, and we’ve already done an audit of all our other customers’ regions to ensure that similar problems are not occurring there,” Frisby said.

Now that the company is aware of the problem, inspection procedures will be adjusted to check for it, he added.

Because there was no backup, when the drive housing Oldrich’s Aesthetica region was damaged during a move at a colocation facility in Dallas, there was nothing to restore. If there had been a working backup, Oldrich would have lost no more than a day of work.

Frisby said that SimHost attempted, without success, to recover data from the damaged drive. A specialized data recovery service would cost around $3,000, he said.

“We do think that they [the colocation facility] should pay for it, but there’s nothing we can do if they’re not going to,” said Frisby. He declined to provide that company’s name. “As much as I would like to publicly complain about it, I do still want to have a good relationship with them.”

Frisby said that SimHost won’t be paying for the recovery itself.

“This isn’t that we are being callous or heartless here — we have already spent money trying to recover this, and we’d be happy to explore any affordable options,” he said. “It’s just that it is simply too expensive for us as a small business to afford to do for a single customer’s region.”

Meanwhile, Frisby said that SimHost is offering to give credit for free hosting for the months that there was no functioning backup for Oldrich’s region.

In addition, SimHost offers OAR files, but customers have to request each one from SimHost’s James Stallings (“Hiro Protagonist” on OSGrid). Frisby said that it takes about an hour to generate an OAR file and send it to the customer.

However, Oldrich said that he requested OAR files several times.

“I had repeatedly asked them for OARs but they never came through,” he said. “The last OAR they took was November 9, last year.”

Back then, the region was brand-new and almost completely empty. Since then, it had been built up to around 17,000 objects — including painstakingly detailed buildings, a cafe, and an art exhibit, set in a lush natural landscape.

The lesson here for other OpenSim hosting companies is that it’s not enough to rely on automated backup scripts. Someone also needs to inspect the results on a regular basis to confirm that the scripts are working. In addition, OpenSim hosting companies need to make self-serve OAR backups available to their customers or generate OAR files automatically for them on a regular basis.
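
One way to inspect the results rather than the script is to look at the backup files themselves: is there one, is it recent, and is it non-empty? The sketch below illustrates that check in Python under stated assumptions. The function name `check_latest_backup`, the directory layout (one folder of backup files per region), and the 26-hour and 1-byte thresholds are hypothetical and not taken from any hosting company's setup.

```python
import os
import time


def check_latest_backup(backup_dir, max_age_hours=26, min_bytes=1, now=None):
    """Return a list of problems with the newest file in backup_dir.

    An empty list means the newest backup looks healthy. The check is
    independent of whether the backup script reported success.
    """
    now = time.time() if now is None else now
    files = [entry for entry in os.scandir(backup_dir) if entry.is_file()]
    if not files:
        return ["no backup files found"]

    # Pick the most recently modified file as the latest backup.
    newest = max(files, key=lambda entry: entry.stat().st_mtime)
    info = newest.stat()
    problems = []

    age_hours = (now - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{newest.name} is {age_hours:.1f} hours old")
    if info.st_size < min_bytes:
        problems.append(f"{newest.name} is only {info.st_size} bytes")
    return problems
```

Run on a schedule separate from the backup job itself, a check like this would flag a nightly backup that silently stopped being written, whatever the reason. It does not prove that a backup can be restored; only a test restore shows that.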

There’s also a lesson for OpenSim users — to request OAR files frequently, and especially when major changes have taken place on a region.

“This is a case where we had a freak accident, this is not a normal business operation case,” Frisby said. “We’re both sorry and disappointed that it happened and we’re going to make sure it doesn’t happen to any other customers.”

That might be too late for Oldrich.

“Needless to say, I dropped them,” he said. “Will never recommend them to anyone again.”

For his corporate clients, Oldrich said he opted for Dreamland Metaverse, operated by avatar Snoopy Pfeffer (who also prefers not to use a real name).

Frisby’s SimHost, one of the largest OpenSim hosting providers on OSGrid, currently hosts 80 regions, about a quarter of them on OSGrid and the rest running as private grids. About 40 percent of SimHost clients are corporate clients, mostly small businesses but also “one or two larger ones,” he said. The rest are individuals, like Oldrich.

Frisby said that his company has a self-serve management panel under development that would allow customers to download their own OAR backup files. Currently undergoing testing, the system should be live in about six weeks, he said.

“It’s not something we send automatically to our users, since some of them are very big files,” he said. “Some of them are 400 to 500 megabytes.”

Snoopy Pfeffer (Photo by Hypergrid Business.)

Like SimHost, Dreamland Metaverse currently requires customers to request OAR files, but is also working on setting up a self-serve interface that would allow OAR files to be generated and downloaded easily.

Dreamland Metaverse does daily database backups, and a second set of backups once a week to a separate location.

“So usually my customers would only lose one day of work maximum, unless the data center burns down, in which case they could lose up to one week of work,” Pfeffer said.

Dreamland Metaverse also uses RAID1 technology, Pfeffer added, which allows for better resilience in case of disaster. “Usually no hard drive recovery is necessary, because the damaged hard drive can be exchanged easily, without any service interruption or data loss. After that, the new hard drive resyncs with the hard drive that was still OK. This means that the probability of data losses for my customers is really low.”

Oldrich said that he’s tested the Dreamland backups and they worked, and he’s happy with their service.

Maria Korolov