OSgrid lost a chunk of new inventory items over the course of this past month, but the problem doesn’t affect other grids.
“It appears that for the last month or so, the asset server was unable to write assets to the database,” OSgrid administrator Allen Kerensky said in yesterday’s announcement.
The grid has not yet released figures on how much of the new content was lost, but some residents report that up to 75 percent of their content was intact.
Because viewers and individual regions were caching inventory items, the problem went unnoticed, he explained.
It only became apparent when the asset server was restarted for emergency patching on Thursday, April 16.
The grid has already taken steps to correct the problem, he added.
“Several changes have been made to the asset cluster to attempt to prevent a repeat of this type of event, or detect and report it if it starts to happen again,” he said. “OSgrid has changed the database indexing system, added new logging and exception reporting, and additional database write reporting. Additional asset write-and-verify tests are being implemented going forward.”
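OSgrid has not published how its write-and-verify tests work. As a rough illustration of the general idea only, the minimal Python sketch below writes an asset, immediately reads it back, and compares SHA-256 digests, logging and raising an error on a mismatch. `InMemoryAssetStore`, `write_and_verify` and `WriteVerificationError` are hypothetical names invented for this example; they are not part of OpenSim or FSAssets. The `fail_writes` flag simulates a store that silently drops writes, which is the kind of failure that caching masked on OSgrid.

```python
import hashlib
import logging

logger = logging.getLogger("asset_verify")


class WriteVerificationError(Exception):
    """Raised when an asset read back after a write does not match what was sent."""


class InMemoryAssetStore:
    """Hypothetical stand-in for an asset database (not OSgrid's FSAssets)."""

    def __init__(self, fail_writes=False):
        self._rows = {}
        # When True, writes are silently discarded, simulating a broken database.
        self.fail_writes = fail_writes

    def write(self, asset_id, data):
        if not self.fail_writes:
            self._rows[asset_id] = bytes(data)

    def read(self, asset_id):
        return self._rows.get(asset_id)


def write_and_verify(store, asset_id, data):
    """Write an asset, read it straight back, and compare SHA-256 digests."""
    expected = hashlib.sha256(data).hexdigest()
    store.write(asset_id, data)
    stored = store.read(asset_id)
    if stored is None or hashlib.sha256(stored).hexdigest() != expected:
        # Report the failure instead of letting it go unnoticed.
        logger.error("asset %s failed write verification", asset_id)
        raise WriteVerificationError(asset_id)
    return expected


if __name__ == "__main__":
    write_and_verify(InMemoryAssetStore(), "a1", b"texture bytes")
    try:
        write_and_verify(InMemoryAssetStore(fail_writes=True), "a2", b"mesh bytes")
    except WriteVerificationError as err:
        print("write verification failed for", err)
```

The point of checking the database directly, rather than trusting the write call, is that a cache sitting in front of the store can keep serving an asset even though it was never actually saved.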
As a result of the fixes, the database is now working correctly, he said.
Residents should now restart their regions and, where possible, reload the lost assets from their original sources or recover them from IAR or OAR backups, he said.
“We feel your pain with these ‘missing asset’ messages, as the OSgrid admin inventories and assets were not immune and are experiencing the same issues you are,” Kerensky concluded. “We deeply apologize for any inventory or asset issues that you have encountered.”
OSgrid residents were mostly philosophical about the inventory losses. They’ve recently returned to the grid after a six-month-long outage, so they are prepared for difficulties.
However, some were unhappy about the way the grid communicated about this issue.
“I do have a real complaint about grid management not doing a good job about getting the word out on something like this,” said Danko Whitfield in a comment on Google Plus. “That is a much bigger problem. That is an area that other grids do much better at than OSgrid.”
Residents pointed, for example, to the title of yesterday's announcement, "Asset cluster maintenance complete," which seemed worded so as not to catch anyone's attention.
Other grids not affected
OSgrid is running a unique asset management system, called FSAssets, which has not yet been released to the public, said Zetamex CEO Timothy Rogers.
“So this is an issue specifically for OSgrid,” he told Hypergrid Business.
He added that Zetamex uses a different system for the grids it hosts.
“Our version is much like our Zetamex Inventory,” he said. “It does daily self-checks and is also backed up every 15 minutes and replicated six times.”
The Zetamex system is not the same as the default asset system that comes with OpenSim, but he said he hasn’t seen any asset losses with the default system, either.
“The stock assets server is very strong,” he said. “It just gets slower as it gets bigger, which is why we use a custom one. Now I have seen some loss with the XAsset service which is experimental in OpenSim anyways. But stock is fine.”
Sudden shutdowns can also affect databases
Even if a grid’s database is running smoothly, it can still be adversely affected by things like sudden power outages and computer crashes.
These problems can affect even grids without a complex, large-scale infrastructure, Avination grid founder and OpenSim core developer Melanie Thielker told Hypergrid Business.
“MySQL and MariaDB have both got weaknesses when the machine they are running on is shut down unexpectedly by a power cut,” she said.
Thielker has been working with OSgrid on their new asset management infrastructure.
“For OSgrid, we added some options that make this less likely,” she said. “But it can still happen that table corruption occurs and a table is marked as crashed, thereby not allowing any access until it is repaired. Likewise, filesystems can get corrupted by power cuts as well.”
Overloaded systems can also damage data, she added, which can sometimes happen when servers are hit by denial-of-service attacks.
She recommends that grid owners use uninterruptible power supplies in areas where power outages are likely, and avoid “hard restarts” of their servers as much as possible.
“Finally, one should install monitoring and alerting tools so issues are noticed right away,” she said.