BIG BUG OF G3 / I lost 10.000 photos

luc7v

Joined: 2010-11-05
Posts: 61
Posted: Sat, 2011-08-13 11:39

Hi guys,

This is an important issue, and I hope you'll work with me to find the cause. IT'S VERY SERIOUS AND IT CAN AFFECT OTHERS.

I have a pretty big G3 gallery, about 100.000 photos. I have had the following problems since the gallery was small, but they now seem to happen more often.

From what I can observe, when two users (or one user working from two browsers because he is in a hurry) upload photos to G3 albums at the same time, the paths stored in relative_path_cache and relative_url_cache of the items table become wrong for some photos and albums, seemingly at random (maybe the ones being accessed at that time). I think left_ptr/right_ptr are wrong, so relative_path_cache and relative_url_cache come out wrong as well.

When I see problems, I run "Fix your Gallery". Sometimes I must do this twice, because not everything is fixed after the first run.

And now the big problem: if photos or albums are deleted while things are messed up as I described, it's a lottery! If left_ptr/right_ptr are wrong, anything could be deleted unintentionally! I wanted to delete an album on the 3rd level (with about 20 photos) and G3 deleted everything down to the 2nd level (about 1500 folders with 10.000 photos). They were deleted from disk only, not from the database (probably relative_path_cache was wrong).

I'm a programmer, just let me know what info you need.

If you can't find the bug quickly, I think more checks are needed before deleting something. You can check whether relative_path_cache makes sense (if it is wrong, sometimes it doesn't point to a folder on disk). You can show the path that will be deleted and the number of photos that will be deleted from the database and from disk (if they are not the same, something is wrong!).
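A minimal sketch of such a pre-delete check, assuming the path to be deleted and the item count from the database are already known. The function names count_files_on_disk and delete_looks_safe are hypothetical and not part of G3; the sketch only shows the comparison between what is on disk and what the database expects.

```php
<?php
// Hypothetical helper, not part of Gallery 3: counts regular files below $dir.
// Returns -1 when $dir is not a directory (e.g. a wrong cached path).
function count_files_on_disk($dir) {
  if (!is_dir($dir)) {
    return -1;
  }
  $count = 0;
  $iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS));
  foreach ($iterator as $file) {
    if ($file->isFile()) {
      $count++;
    }
  }
  return $count;
}

// Hypothetical check: the delete only looks safe when the directory exists
// and holds exactly as many files as the database says it should.
function delete_looks_safe($dir, $expected_from_db) {
  return count_files_on_disk($dir) === $expected_from_db;
}
```

If delete_looks_safe() returns false, the delete could be refused and the two numbers shown to the user instead.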

I hope you can help before another disaster happens!

 
luc7v

Joined: 2010-11-05
Posts: 61
Posted: Mon, 2011-08-15 18:43

I can't believe that no one is interested in such a bad bug in G3! :(

 
xahmol

Joined: 2011-03-22
Posts: 12
Posted: Tue, 2011-08-16 11:01

Certainly interested, but personally I have never encountered the bug, nor can I help to solve it. I guess this is true for many visitors here.

 
luc7v

Joined: 2010-11-05
Posts: 61
Posted: Tue, 2011-08-16 11:53

How many pictures do you have?

How many levels of albums do you have? (I have up to 5 levels: category/name/year/month/day)

Do you have many uploads at the same time? (In my case, this messes up relative_path_cache and relative_url_cache pretty quickly)

 
xahmol

Joined: 2011-03-22
Posts: 12
Posted: Tue, 2011-08-16 12:08

Have approx 3,000 pictures in up to 3 levels of albums.

But I never have more than one upload at the same time and also I myself as administrator am the only concurrent user with upload rights.

So in that case it is also quite logical that I do not encounter the error you describe if the trigger is indeed concurrent uploads at the same time.

 
luc7v

Joined: 2010-11-05
Posts: 61
Posted: Tue, 2011-08-16 12:55

I think most owners do the uploading themselves. But there are more open galleries too.

Searching the forum, I see others have had similar problems when deleting albums and such. As I said, more checks are needed to prevent disasters.

BTW, I use G3.02 on Apache version 2.2.3 with MySQL version 5.1.57.

 
chris quinn

Joined: 2011-08-08
Posts: 11
Posted: Tue, 2011-08-16 13:20

I had similar problems recently (my albums go to about 10 levels) and ended up having to nuke the whole system and reload everything from scratch. Fix your Gallery would not run properly: it got to about 75%, then restarted, until I cancelled it.

I have about 9GB of photos, with lots of empty albums, which were created using Server Load.

Deleting galleries seemed to possibly cause the problems.

 
luc7v

Joined: 2010-11-05
Posts: 61
Posted: Tue, 2011-08-16 14:06

The big problem is on delete, when things are messed up and you don't know it. It's not acceptable, for any reason, for a program to delete anything other than what was expected. In the worst case, the program should refuse to delete until things are right again.

As I said, if someone from the G3 team wants to find the bug, I will provide all my help and info. As a programmer, I know G3 pretty well already, but mostly on the theme and module level; the core is not so easy to understand without help.

 
floridave

Joined: 2003-12-22
Posts: 27300
Posted: Wed, 2011-08-17 00:48

I have tried to reproduce this issue over and over. I'm unable to reproduce.
That does not mean there is not a bug. I will see if bharat can chime in.
Are you able to reproduce this consistently? If so, it might be easier to get this solved, as bharat can get login info from you and troubleshoot as required.

Dave
_____________________________________________
Blog & G2 || floridave - Gallery Team

 
luc7v

Joined: 2010-11-05
Posts: 61
Posted: Wed, 2011-08-17 13:35

I often find wrong relative_path_cache and relative_url_cache values (sometimes several times in the same day). I can save parts of the items table if you want to see exactly how it looks (or any other info).

I think the problem doesn't show up so often on a small database because the operation is much quicker. In my case, adding or deleting a photo takes about 5-15 seconds, so there is enough time for a conflict to happen.

Maybe someone can explain what happens, step by step, when G3 adds or deletes a photo. Or where I can find the code (which file).

 
bharat

Joined: 2002-05-21
Posts: 7994
Posted: Sun, 2011-08-21 03:40

Thanks luc7v .. I agree that this is a problem, but thankfully it only affects a very small percentage of our population. I'd like to track it down, but I haven't been able to reproduce it yet. I agree that it's most likely a function of delete. We verify on the database side that we only remove records that have the right match in the adjacency table (ie: we verify that the parent_id matches an item that we want to delete, we don't rely on the MPTT pointers or the path/url caches). But inside the delete code when it comes to deleting the directories we do pull the thumb and resize paths and rely on those.

The problem here is one of locking. If we do pessimistic locking then all operations are slow, so we're minimizing that as much as possible. On small Gallery installs, things stay fast. Without pessimistic locking, I think we'll always have this problem so we have a couple of choices.
1) Introduce pessimistic locking and slow everything down
2) Try to minimize the window where problems can occur

If we do #2 (which we could do by, for example recalculating the various file paths in Item_Model::delete) we'll probably still run into the problem just very rarely.

I suspect that we can do #1 by introducing locking around Item_Model::save and Item_Model::delete so that they can't happen concurrently, and then inside those locked blocks we recalculate the cached paths before using them. If you want to see how we're doing locking now, take a look at modules/gallery/libraries/ORM_MPTT.php at the lock and unlock functions. You could try this in your install by adding $this->lock() and $this->unlock() calls around everything in the save/delete functions in modules/gallery/models/item.php -- just don't forget that you need to catch any thrown exceptions and still release the lock before leaving the function. If that fixes it for you, it *should* be relatively cheap to add for all code.
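The "catch any thrown exceptions and still release the lock" part can be sketched roughly as below. This is only an illustration under assumptions: Demo_Item and run_locked are hypothetical stand-ins, and the lock()/unlock() bodies here just flip a flag rather than doing what ORM_MPTT really does. The point is the shape: lock, try the work, unlock on both the failure path and the success path, and rethrow so the caller still sees the error.

```php
<?php
// Hypothetical stand-in for a model that has lock()/unlock() methods.
// It is NOT Gallery 3's ORM_MPTT; the lock here is only a flag.
class Demo_Item {
  public $locked = false;

  function lock() { $this->locked = true; }
  function unlock() { $this->locked = false; }

  // $work is any callable that does the real save or delete.
  function run_locked($work) {
    $this->lock();
    try {
      $result = call_user_func($work);
    } catch (Exception $e) {
      // Release the lock on failure, then pass the error on to the caller.
      $this->unlock();
      throw $e;
    }
    // Normal path: release the lock after the work succeeded.
    $this->unlock();
    return $result;
  }
}
```

On PHP 5.5 or later, a finally block would express the same thing with a single unlock() call; the catch-and-rethrow form above also works on older PHP versions.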
---
Problems? Check gallery3/var/logs
file a bug/feature ticket | upgrade to the latest code! | hacking G3? join us on IRC!

 
luc7v

Joined: 2010-11-05
Posts: 61
Posted: Sun, 2011-08-21 09:08
bharat wrote:
I agree that it's most likely a function of delete. We verify on the database side that we only remove records that have the right match in the adjacency table (ie: we verify that the parent_id matches an item that we want to delete, we don't rely on the MPTT pointers or the path/url caches). But inside the delete code when it comes to deleting the directories we do pull the thumb and resize paths and rely on those.

I confirm that I never lost something from database, only photo files.

bharat wrote:
2) Try to minimize the window where problems can occur

If we do #2 (which we could do by, for example recalculating the various file paths in Item_Model::delete) we'll probably still run into the problem just very rarely.

This sounds interesting, but how can you be sure that the file paths will be recalculated without mistakes?

The big problem is on delete, because of the risk of losing data, that is clear. But it's also a problem when accessing photos with a wrong relative_path_cache and relative_url_cache (and search engines really don't like this). Is pessimistic locking the only way to prevent this?

I have an idea: maybe it's better to calculate relative_path_cache and relative_url_cache for all items at once (in maintenance mode, if needed) than at upload.

 
bharat

Joined: 2002-05-21
Posts: 7994
Posted: Sun, 2011-08-21 17:37

Pessimistic locking is the only way to be sure because the various operations that affect the caches are not atomic in the database. For example, if you rename a top level album it then has to adjust the cached path for every item under it (in your case, 10k+ items). We do this with a series of database operations, but if you access the data during that window you'll see something slightly wrong. If we really want to do it right, we need to lock the database while this happens or use some heavy transactions. Transactions are not guaranteed to be available in all cases (transaction support is not available in MyISAM which is what most folks use) so we're using something lighter that just acquires a lock.

Remember also that the problem doesn't just happen at upload. If you try my suggestion above it won't damage anything but you'll get an idea of whether or not it's preserving your cache state properly.

 
luc7v

Joined: 2010-11-05
Posts: 61
Posted: Sun, 2011-08-21 18:58
bharat wrote:
You could try this in your install by adding $this->lock() and $this->unlock() calls around everything in the save/delete functions in modules/gallery/models/item.php -- just don't forget that you need to catch any thrown exceptions and still release the lock before leaving the function.

I understand how to lock/unlock, but I don't know what to do about thrown exceptions. Can you explain, please?

I think pessimistic locking should be an option in G3.

bharat wrote:
For example, if you rename a top level album it then has to adjust the cached path for every item under it (in your case, 10k+ items). We do this with a series of database operations, but if you access the data during that window you'll see something slightly wrong.

In my case, it's extremely rare to modify album paths. So if I fill all relative_path_cache and relative_url_cache values while no other operation is running, I'll reduce the risk of conflict. If at some point an operation resets them, it's no problem.

It should be pretty easy to fill relative_path_cache and relative_url_cache for all items: an iteration and a call to a function. Please tell me how to add this at the end of Fix your Gallery (or as a new command).

 
bharat

Joined: 2002-05-21
Posts: 7994
Posted: Mon, 2011-08-22 00:59

Sure, you can do it like this:

<?
foreach (ORM::factory("item")->find_all() as $item) {
  $item->relative_path();
}
?>

That will probably time out for 100k items and it'll be very slow. I can look into productionizing it, but it won't be trivial.

 
luc7v

Joined: 2010-11-05
Posts: 61
Posted: Mon, 2011-08-22 10:24

Thank you.

I'm thinking of adding a new task:

$tasks[] = Task_Definition::factory()
  ->callback("gallery_task::fill_cache")
  ->name(t("Fill cache paths and urls"))
  ->description(t("It fills relative_path_cache and relative_url_cache, up to 10.000 at a time"))
  ->severity(log::SUCCESS);

And the function:

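static function fill_cache($task) {
  // Only pick items whose path cache is still NULL, at most 10.000 per run.
  foreach (ORM::factory("item")->where('relative_path_cache', 'IS', NULL)->limit(10000)->find_all() as $item) {
    // Per bharat's snippet above, calling relative_path() is expected to rebuild the missing cache.
    $item->relative_path();
  }
}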

The code is not tested yet.

How can I add a progress status? I need to see if it takes 10 minutes or 10 days, and a way to stop it :D
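One generic way to get both progress and a stop point is to process the items in small batches and record how far the run has got after each batch. The sketch below is only an illustration under assumptions: fill_cache_step, the $state array and its keys are hypothetical and are not G3's task API; $fill_one stands in for whatever fills the cache of one item.

```php
<?php
// Hypothetical resumable batch step. $state stands in for whatever the task
// framework would persist between calls; it is not Gallery 3's task object.
// Returns true when every id in $pending_ids has been processed.
function fill_cache_step(array $pending_ids, array &$state, $batch_size, $fill_one) {
  $total = count($pending_ids);
  $offset = isset($state["offset"]) ? $state["offset"] : 0;

  // Process only the next slice, so each call stays short.
  $batch = array_slice($pending_ids, $offset, $batch_size);
  foreach ($batch as $id) {
    call_user_func($fill_one, $id);
  }

  // Record progress so the caller can display it, resume, or stop here.
  $state["offset"] = $offset + count($batch);
  $state["percent"] = $total > 0 ? (int) floor(100 * $state["offset"] / $total) : 100;
  $state["done"] = $state["offset"] >= $total;
  return $state["done"];
}
```

Stopping is then just a matter of not calling the step again; the saved offset says where to continue later.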

 
mr.xy

Joined: 2011-09-20
Posts: 67
Posted: Sun, 2012-03-18 13:22

@luc7v:
I lost a lot of photos too. What precautions did you take to prevent further loss?

 
luc7v

Joined: 2010-11-05
Posts: 61
Posted: Sun, 2012-03-18 17:58

The problem is that "Fix your Gallery" actually puts your gallery into big trouble. It clears all relative_path_cache and relative_url_cache fields, and until they are filled in again, the gallery is vulnerable whenever new pictures are added while pictures without this cache are being viewed. For a small gallery the risk is low, but for a big gallery it is big trouble.

You can put your gallery in read-only mode and wait until relative_path_cache and relative_url_cache are filled in (a few weeks, probably).

Or you can hack G3 as described above to speed up the process (less than a day for 100.000 pictures). THIS IS SO IMPORTANT, IT SHOULD BE IN THE CORE!

 
mr.xy

Joined: 2011-09-20
Posts: 67
Posted: Tue, 2012-03-20 09:30

@luc7v:
Hmm - it would be easy to prevent "Fix your Gallery" from deleting the path caches. It happens in /gallery/helpers/gallery_task.php L431.
But since we don't know why they null out the paths, it is better not to comment that out.

Can you please test the module I've written? It signals within G3 when the path caches need to be refilled, adds a fillpathcaches task, and avoids the need to hack G3 core files.

 
mr.xy

Joined: 2011-09-20
Posts: 67
Posted: Tue, 2012-03-20 21:58

Update !!

Use this, because the previous version had a bug in the install-file.

 
jnash

Joined: 2004-08-02
Posts: 814
Posted: Wed, 2012-03-21 01:34

Seems to work like a charm for me.

mr.xy wrote:
Update !!

Use this, because the previous version had a bug in the install-file.

 
micks80

Joined: 2012-04-22
Posts: 71
Posted: Sat, 2012-06-30 23:44

It looks like we got hit by this issue as well, probably because we upload a lot of photos (60 every day) and have about 6000 photos. Some thumbnails are disappearing randomly inside the albums. We clicked 'fix gallery' and everything else under admin->maintenance, but nothing worked. When we checked the backend, all these thumbnails had 'relative_path_cache' and 'relative_url_cache' set to empty. We manually re-created the URLs from the name and slug, and the thumbnails started showing up again. We thought it would be really painful to do this manually for all the pics, and then we found the module in this thread. Applying it did the magic and everything got fixed.

It would be nice to have the root cause fixed or at least to include this mod in the core.

Thanks
Mick

 
mr.xy

Joined: 2011-09-20
Posts: 67
Posted: Sun, 2012-07-01 13:13

Glad to hear my module is doing a good job for you.

For everyone who is working on this problem: I noticed that there is a big difference between a relative_path_cache that is 'empty' and one that is 'NULL'. I have the feeling that the code checks for 'NULL' but gets confused by a completely empty database field. In my testing environment I had some lost thumbnails too. A maintenance run did not fix it. So I deleted the wrong relative_path_cache values and afterwards lost an entire album.

Maybe some safeguards could be added so that, when the normal functions don't do the job, reasonable manual intervention doesn't damage the gallery.
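To illustrate the NULL versus empty distinction: a check that treats both as "cache missing" could look like the sketch below. The name path_cache_is_missing is hypothetical and not part of G3; whether the core should treat an empty string this way is an assumption, not something confirmed in this thread.

```php
<?php
// Hypothetical helper: treats both NULL and an empty string as "cache missing".
// Strict comparisons are used on purpose, so a folder literally named "0"
// still counts as a filled-in value (empty() would wrongly reject it).
function path_cache_is_missing($value) {
  return $value === null || $value === "";
}
```

With a check like this, rows with an empty field would be picked up for a rebuild in the same way as rows with NULL.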

 
bharat

Joined: 2002-05-21
Posts: 7994
Posted: Sat, 2012-07-21 18:13

FYI - I've rolled this into the Fix Gallery task. It slows the task down considerably (which is why I didn't want to put it in there originally) but you guys made a good case for it. Thanks for the module, mr.xy!

 
mr.xy

Joined: 2011-09-20
Posts: 67
Posted: Mon, 2012-07-30 21:44

Glad to be at least a small part of a great thing :-)

Could you spend a minute on this thread about German umlauts? They are troublesome in folder and file names:
http://gallery.menalto.com/node/107113

thxxl