Gallery 3 and Azure Storage

Serge D

Joined: 2009-06-11
Posts: 2466
Posted: Wed, 2010-11-10 18:48

MS gives away 25 GB of free storage.
Would it be possible to create something similar to the WordPress Windows Azure Storage media plugin?

http://wordpress.org/extend/plugins/windows-azure-storage/

Windows Azure SDK for PHP Developers - http://phpazure.codeplex.com/
and all of the rest here - http://www.microsoft.com/web/platform/phponwindows.aspx

 
bharat

Joined: 2002-05-21
Posts: 7993
Posted: Fri, 2010-11-19 17:48

It's definitely possible. The trick is that we'd probably have to create an abstraction layer above the filesystem, which makes everything a little bit heavier. It seems like this type of problem would be better handled as an Apache2 or PHP extension; i.e., what if gallery3/var/albums was backed by Azure? Gallery 3 wouldn't even have to know about it.
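
For illustration, one way to build that kind of abstraction layer in userland PHP is a stream wrapper. Everything below is hypothetical - the azure:// scheme, the storage account name, and the container layout are made up, and real Shared Key authentication is omitted, so this sketch would only read publicly accessible blobs:

<?php
// register an azure:// scheme so ordinary filesystem calls can read blobs
class Azure_Blob_Stream {
  private $data;
  private $pos = 0;

  public function stream_open($path, $mode, $options, &$opened_path) {
    // map azure://albums/photo.jpg onto the public blob endpoint;
    // "myaccount" is a placeholder storage account name
    $url = "http://myaccount.blob.core.windows.net/" . substr($path, 8);
    $this->data = @file_get_contents($url);
    return $this->data !== false;
  }

  public function stream_read($count) {
    $chunk = substr($this->data, $this->pos, $count);
    $this->pos += strlen($chunk);
    return $chunk;
  }

  public function stream_eof() {
    return $this->pos >= strlen($this->data);
  }

  public function stream_stat() {
    return array("size" => strlen($this->data));
  }
}
stream_wrapper_register("azure", "Azure_Blob_Stream");

// usage: $bytes = file_get_contents("azure://albums/photo.jpg");
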
---
Problems? Check gallery3/var/logs
bugs/feature req's | upgrade to the latest code | use git

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Fri, 2010-11-19 18:34

i think if you tried hard enough this could be done. there's a similar plugin for wordpress's media upload feature that uploads to amazon s3 instead of local storage (while giving the appearance of uploading to local storage). as far as wordpress is concerned, the media is local - the url is just rewritten when it's vended on the page.

25gb free azure storage is tempting enough for me to try it...

 
bharat

Joined: 2002-05-21
Posts: 7993
Posted: Fri, 2010-11-19 19:10

If we try hard enough, anything can be done. :-) But I'm not motivated to try very hard on this. Perhaps somebody else is.
---
Problems? Check gallery3/var/logs
bugs/feature req's | upgrade to the latest code | use git

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Fri, 2010-11-19 19:14

i might try this with amazon s3. i could code in the azure api after the fact, if it works well...

 
bharat

Joined: 2002-05-21
Posts: 7993
Posted: Fri, 2010-11-19 20:52

Awesome. Ask questions and I'll give guidance.
---
Problems? Check gallery3/var/logs
bugs/feature req's | upgrade to the latest code | use git

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Mon, 2010-11-22 21:03

Hi bharat.
I've got the module interacting with S3 well now, and events are sorted. What I can't work out is how to extend the Item_Model class so I can manipulate the URLs that are output into the templates/HTML.
Help appreciated.
Dan

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Tue, 2010-11-23 21:33

never mind; i worked it out. took a bit of digging but i got there eventually.

for future reference (and correct-me-if-i'm-wrong's):

in the individual module folder, create a "models" directory. in there, create a file for the model you want to overload/extend - in this case Item_Model, so kohana expects the file to be named item.php. then start the php file like so:

class Item_Model extends Item_Model_Core { }

anything i add in this class/file gets added to, or overloads, the functions that already exist, which lets me manipulate the URLs provided by this model and redirect them to Amazon S3 instead of local storage.
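
for example, here's a minimal sketch of the kind of override i mean. the method name and signature are what i can see in gallery 3's Item_Model_Core; the bucket url is made up:

<?php defined("SYSPATH") or die("No direct script access.");
// modules/aws_s3/models/item.php (kohana wants the file named item.php)
class Item_Model extends Item_Model_Core {
  // overload the thumbnail url so it points at an s3 bucket instead of
  // var/thumbs on the local server ("mybucket" is illustrative)
  public function thumb_url($full_uri=false) {
    return "http://mybucket.s3.amazonaws.com/thumbs/" . $this->relative_path();
  }
}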

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Wed, 2010-11-24 12:46

ok, i'm nearly done with the S3 module, which i should be able to post online later tonight.

as far as azure goes, i've just created a storage account there, and i'll copy the s3 module and create an azure version. maybe later i'll combine the two into a 'cdn' module where you can select which cdn you want to use.

Dan

 
Serge D

Joined: 2009-06-11
Posts: 2466
Posted: Wed, 2010-11-24 17:48

Great! Since you have such great experience with the storage API, maybe you could tackle this editor integration?

 
bromide

Joined: 2010-08-20
Posts: 28
Posted: Wed, 2010-11-24 18:33

Oh, you've got to be kidding! I was literally just about to start working on the exact same thing as danneh3826! So this is awesome. Are you using s3fs to mount the s3 bucket into the OS's file system?

Here's an old thread on trying to do the same thing with Gallery 2:

http://gallery.menalto.com/node/77816

A post from a few days ago shows someone working on the same type of thing for Gallery 3:

http://write2it.net/2010/09/an-early-attempt-at-integrating-gallery-3-with-amazons-cloudfront/

So, definitely post your stuff, danneh3826!


 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Wed, 2010-11-24 18:44

nah, i'm using basic PUT and DELETE HTTP requests to move stuff around the S3 bucket. i don't know how well it'll work with cloudfront - i don't actually know the differences between the two. this module is strictly for S3; the redirected urls are hard-coded as http://<bucket>.s3.amazonaws.com/ etc.
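
for the curious, here's a rough sketch of what one of those raw PUT requests involves. this is the generic s3 signature-v2 header auth scheme, not the module's exact code; the keys and content type are placeholders:

<?php
// upload a local file to s3 with a raw signed PUT (signature version 2)
function s3_put($bucket, $key, $file, $access_key, $secret_key, $acl = "public-read") {
  $date = gmdate("D, d M Y H:i:s T");
  $type = "image/jpeg";  // assume jpeg for illustration
  // string-to-sign: verb, (empty) content-md5, content-type, date,
  // canonicalized x-amz- headers, canonicalized resource
  $string_to_sign = "PUT\n\n{$type}\n{$date}\nx-amz-acl:{$acl}\n/{$bucket}/{$key}";
  $signature = base64_encode(hash_hmac("sha1", $string_to_sign, $secret_key, true));

  $ch = curl_init("http://{$bucket}.s3.amazonaws.com/{$key}");
  curl_setopt_array($ch, array(
    CURLOPT_PUT => true,
    CURLOPT_INFILE => fopen($file, "rb"),
    CURLOPT_INFILESIZE => filesize($file),
    CURLOPT_HTTPHEADER => array(
      "Date: {$date}",
      "Content-Type: {$type}",
      "x-amz-acl: {$acl}",
      "Authorization: AWS {$access_key}:{$signature}",
    ),
    CURLOPT_RETURNTRANSFER => true,
  ));
  curl_exec($ch);
  $ok = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;
  curl_close($ch);
  return $ok;
}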

i guess as the module evolves we can get more options in there. i think i might turn this into a generic "CDN" module with wrappers plugged into it for S3, Azure, and Cloudfront (if it's different from S3), and open the doors for others (or myself) to write in compatibility for more/future CDN systems - or your own system, if you're that way inclined! but for now, it's S3 only.

i can only write support for services i have accounts with, though. that write2it post has a little of what i've written, except i've not modified any of the g3 core code - this is completely self-contained in the module, completely removable if required, and returns g3 to its original state. it also handles new uploads/deletes/moves, and hooks into album thumb creation to upload the album thumbnail once it's created.

i'll post it later, it's nearly complete. just gotta write in the task to upload the gallery's existing contents, and test it on my public server.

 
bromide

Joined: 2010-08-20
Posts: 28
Posted: Wed, 2010-11-24 19:23

S3 isn't a content delivery network, it's just for storage - so the use case for S3 is that you don't have enough space in your regular web hosting account and you want to offload all of the Gallery image files to S3. Or possibly, you want super-high-reliability backup and redundancy for your image files. It's for the same purpose you might put the files on a SAN (storage area network) or NAS (network attached storage) solution.

CloudFront is the content delivery network - it's basically a massive worldwide distributed cache of your site. It speeds up delivery of your content (html and images) to people browsing from all over the world and if its use is designed properly it can make your site able to withstand a massive spike in traffic that would bring it down if you were only relying on your single web server. But CloudFront doesn't require S3, it can simply cache everything from your web server.

You really ought to make the URL modifiable, maybe as a module setting - I use a virtual bucket with S3 so my URLs go through my own domain. (Also - maybe an option to turn off trying to manage the file transfer yourself for people who are using something like s3fs. Or maybe a separate version of your module, which I could make, I dunno.)


 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Wed, 2010-11-24 19:16

thanks for clearing that up for me. the only time i've used s3 before is on the espn website, which uses s3 as its cdn. i didn't set that up, so i could only assume that's what it was for and how it worked. (still getting into the whole cloud thing, me! lol)

i'll switch cloudfront on in my aws account later and get that working as well. and good point on the virtual buckets - i'll put in an option to specify a custom url for the content (be it s3, cloudfront, etc).

 
bromide

Joined: 2010-08-20
Posts: 28
Posted: Wed, 2010-11-24 19:39

Also, for anyone else who is doing this sort of thing: for security in S3 you can set a bucket policy that requires images to be retrieved with the HTTP Referer of your site. It's far from ideal for security, of course, since the HTTP Referer can easily be spoofed, but I haven't figured out any way to integrate with Gallery3's login and permissions system.

You have to write the bucket policy in a JSON-based language, but once you've done that you can set it via the AWS Management Console (under "Permissions"). My policy looks like this:

{
	"Version": "2008-10-17",
	"Id": "mybucket policy",
	"Statement": [
		{
			"Sid": "http referer policy statement number 1",
			"Effect": "Allow",
			"Principal": {
				"AWS": "*"
			},
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::mybucket/*",
			"Condition": {
				"StringLike": {
					"aws:Referer": "http://mysite.com/*"
				}
			}
		}
	]
}


 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Wed, 2010-11-24 19:48

yes, the whole permissions thing got me thinking a bit too. however, g3 won't vend the html for the image if you don't have access to the image/album. to start with, i'm just copying the directory hierarchy into the bucket under a unique id, but i suppose it could be made to use an un-guessable image path within the bucket, so you can only get at an image if you know its url (and you'll only get that if g3 deems you have permission to the item).

it's far from perfect security, but i guess as far as using a cdn is concerned, you're more interested in getting your content out than keeping it away from people. maybe i can also implement an option to not synchronise permission-restricted content to the cdn (i.e. only sync content that everyone has access to). i suppose that's the only way you're going to keep your content 100% away from prying eyes, if you're that way inclined...

Dan

 
bromide

Joined: 2010-08-20
Posts: 28
Posted: Wed, 2010-11-24 20:20

In my case, it's a private site and ALL of the content is permission-restricted, so not putting it on S3 isn't an option. ;^)

Another security possibility with S3 is the query string authentication, which is what the AWS Management Console uses, but that's more complicated (and more processor-intensive) since you have to generate the signatures.

You might take a look at s3fs and other filesystem-to-S3 solutions like S3QL (there's a great comparison between most of them by S3QL's nikratio here) to see what kind of options they've implemented and what kind of problems they've had to solve.

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Wed, 2010-11-24 20:25

cool. thanks for the links and stuff. i'll certainly be reading up on all these problems and attempting to implement solutions in this module.

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Wed, 2010-11-24 22:45

ok, here's how i've done it thus far.

when an item is created, it's uploaded to s3 with a permission of "public-read", unless the "everyone" group in g3 doesn't have access to the item, in which case it's given a permission of "authenticated-read". the s3 library i'm using will kick out an authenticated url to the object on s3 with a lifetime of 60 seconds (probably customisable in the finished product). since gallery spits out the html with the image url in it, even if the url gets out, the signature will expire and access will no longer be granted.
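
for reference, here's roughly how that kind of expiring url gets built. this is the generic s3 query-string-auth scheme (signature version 2), not the exact code from the library i'm using; the keys are placeholders:

<?php
// build a time-limited url for a private s3 object
function s3_signed_url($bucket, $key, $access_key, $secret_key, $lifetime = 60) {
  $expires = time() + $lifetime;
  // string-to-sign for a GET: verb, blank md5/content-type, expiry, resource
  $string_to_sign = "GET\n\n\n{$expires}\n/{$bucket}/{$key}";
  $signature = rawurlencode(base64_encode(
      hash_hmac("sha1", $string_to_sign, $secret_key, true)));
  return "http://{$bucket}.s3.amazonaws.com/{$key}"
      . "?AWSAccessKeyId={$access_key}&Expires={$expires}&Signature={$signature}";
}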

i'm going to go with this method for now and see how it pans out, both practically and performance wise. i can make adjustments later on down the line. i won't be posting it tonight, but i'm going to polish it up and put some more options in the admin screen to help it suit most s3 users, and probably post it tomorrow.

@bromide: you say you use a virtual bucket. can you provide me with a sample url (or pattern), just so i can make sure i provide adequate customisation for the returned url, irrespective of whether you're using plain s3, cloudfront, or a virtual bucket on your own domain?

Dan

 
bromide

Joined: 2010-08-20
Posts: 28
Posted: Wed, 2010-11-24 22:57

Virtual buckets are pretty straightforward - you just name the bucket as your subdomain and then in your DNS you set that subdomain to forward to Amazon. So the URL

http://static.mydomain.com.s3.amazonaws.com/favicon.png

is

http://static.mydomain.com/favicon.png
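
The DNS side is just a CNAME record pointing the subdomain at S3 - something like (illustrative):

static.mydomain.com.  IN  CNAME  static.mydomain.com.s3.amazonaws.com.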


 
bromide

Joined: 2010-08-20
Posts: 28
Posted: Wed, 2010-11-24 23:13

One other thing I noticed in your last post there - when an uploaded image isn't accessible by the "Everyone" group in G3, don't you want to make it "private" on Amazon? If you make it "authenticated-read" then anyone with an Amazon AWS account can get access to it.

And one last thing - something that's a parallel issue in s3fs is that by default it goes through http://s3.amazonaws.com for the REST services, which means your requests - including your Access Key ID and all of your data - are sent in plain text over the network. So he added an option to authenticate through https://s3.amazonaws.com (which IMO ought to be the default, as it's the more secure option and using HTTPS doesn't cause any problems to my knowledge).

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Wed, 2010-11-24 23:14

ahh, gotcha. that's straightforward enough then. is cloudfront the same - same paths, just a different (sub)domain to point at?

 
bromide

Joined: 2010-08-20
Posts: 28
Posted: Wed, 2010-11-24 23:18

I haven't used CloudFront, just read the FAQs on it, but that's my impression - you just have your own cloudfront domain and all of your stuff appears under it at the same paths.

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Thu, 2010-11-25 18:39

yup, i got that. i've changed authenticated-read to private - i wasn't aware that the authenticated query string would allow access to private objects.

gonna polish up the admin interface and i'll have something later this evening :)

 
bharat

Joined: 2002-05-21
Posts: 7993
Posted: Thu, 2010-11-25 22:09

This is awesome. Great work!
---
Problems? Check gallery3/var/logs
bugs/feature req's | upgrade to the latest code | use git

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Fri, 2010-11-26 21:52

hihi.

thanks, Bharat :)

as promised, here's a roughly tested version 1 of said module: http://www.danneh.org/2010/11/amazon-s3-for-gallery-3/

i've got my gallery synchronising to s3 as i write this. i don't have many items in my gallery installation, but you can check it out working here: http://www.danneh.org/gallery3/

so far, so good. a couple of caveats though, which i'm working on:
1. does not work with cloudfront - kind of. if your g3 items are all public, you won't have any issues with cloudfront; it'll serve up the url as expected. private objects are trickier though, as the QSA (Query String Authentication) method is slightly different for cloudfront (see the sketch after this list), and i've not yet found a light enough pre-written library that can calculate the signature for me. until i do, or write my own, this won't be supported.
2. ssl transfer also works - kind of. i've had issues getting curl to trust the ssl certificate when communicating with s3, so i'd recommend leaving ssl turned off until i can resolve them.
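
for reference, cloudfront signs a json policy with an rsa key pair instead of hmac-ing the path. here's a rough sketch of the "canned policy" variant - the key-pair id and private key are placeholders, and this isn't the module's code:

<?php
// build a cloudfront signed url from a canned policy (rsa-sha1)
function cloudfront_signed_url($url, $key_pair_id, $private_key_pem, $lifetime = 60) {
  $expires = time() + $lifetime;
  $policy = '{"Statement":[{"Resource":"' . $url . '",'
          . '"Condition":{"DateLessThan":{"AWS:EpochTime":' . $expires . '}}}]}';
  openssl_sign($policy, $signature, $private_key_pem, OPENSSL_ALGO_SHA1);
  // cloudfront wants url-safe base64: + = / become - _ ~
  $signature = strtr(base64_encode($signature), "+=/", "-_~");
  return $url . "?Expires={$expires}&Signature={$signature}&Key-Pair-Id={$key_pair_id}";
}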

plus sides:
- multiple g3 instances are able to use the same bucket. upon installation, it creates a unique id (ok, it's an md5() of time(), but it's good enough for the purpose) that represents that g3 installation, and it uses this uid in all the urls it spits out. as a result, i've got both my local test g3 and my danneh.org g3 running in the same danneh-org bucket. i know buckets can be created as and when, i just figured this was simpler. you don't have to use it - just remove the {guid} in the url template in the admin if desired.
- if you use the embedlinks module, this module will override some of its features to prevent the display of links to private items (since the QSA urls expire, they wouldn't be good for long, so it's pointless showing them). this at least lets you use embedlinks for your public content, if you have any.
- there's a task which can be run to upload the contents of your gallery to s3 if you're installing on a g3 installation that's already populated.

anyway, the url's above. i'll make a codex page for it tomorrow. let me know how things work out (success or failure).

Dan

 
bharat

Joined: 2002-05-21
Posts: 7993
Posted: Sat, 2010-11-27 22:57

Totally awesome, Dan. Can you get this into a github fork so that we can pull it into the contrib modules? This is going to make a lot of people happy!
---
Problems? Check gallery3/var/logs
bugs/feature req's | upgrade to the latest code | use git

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Sat, 2010-11-27 23:50

i certainly will. just going to tidy the code up a bit, re-upload it to my website and i'll push it into my git fork later on tomorrow. i've got a transcode fix that needs pulling into master also, so will send a pull request for both at the same time :)

Dan

 
Shai

Joined: 2003-12-10
Posts: 61
Posted: Wed, 2010-12-01 20:28

Dan, you are my god this week :) Where can we post bugs/issues/questions? Is there documentation for your module?

Thanks!!!! This is a fantastic addition to my Gallery :)

Q1: What files are actually uploaded when this module is initially installed and begins to sync? Are the originals saved on S3, or just thumbnails of them? I noticed the files being uploaded to S3 are only about 10k, not the 3-4MB that most of the originals are.

Q2: How would this module interact with, or maybe interfere with, the Keep Original module?

Q3: After sync is complete, does it remove the images from the storage on the current webhost?

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Wed, 2010-12-01 21:20

hi shai;
thanks! :D
to answer your questions: the codex page is here: http://codex.gallery2.org/Gallery3:Modules:aws_s3, and the thread for bug reports is here: http://gallery.menalto.com/node/99424

1. when you initially sync (and upload), it uploads the thumbnail (var/thumbs), the resize (var/resizes) and the full-sized image (var/albums). currently, the module simply redirects requests from the g3 server over to s3, and the file paths are kept intact save a couple of changes (see the example after this list). since the idea of s3/cdn is load handling, it only makes sense to have all the images (generated and originals) up on s3, to reduce the g3 server's bandwidth to just html and the g3 skin.
2. i've not tested it with the keep original module, only the embedlinks module. i'll run some tests and see whether it works properly or not. there are a few enhancements and fixes to go into v2, so if it doesn't work correctly, i can make it so.
3. no, this module only ever modifies the s3 storage bucket it's assigned to (and only files prefixed with the g3id folder path). the idea is that you can simply disable the s3 module and all requests end up back on the g3 server. if the s3 module is disabled using its admin page (rather than via the modules page), syncing will continue to happen when items are added/moved/deleted, but url redirection will not - useful for "i'm using s3 for backups only" setups. if the module is disabled via the modules page, none of this happens, and the file systems will fall out of sync when items are manipulated.
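
as a rough illustration of the path mapping (indicative only - "mybucket" is a placeholder, and the exact prefix layout may differ):

var/thumbs/album1/photo.jpg  ->  http://mybucket.s3.amazonaws.com/<g3id>/thumbs/album1/photo.jpg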

Dan

 
Shai

Joined: 2003-12-10
Posts: 61
Posted: Wed, 2010-12-01 22:03

Well... that answered all my current questions :) Thanks man! This is a great module! Later on I'll try to figure out what a CDN is(?) ...

 
danneh3826

Joined: 2007-08-18
Posts: 290
Posted: Wed, 2010-12-01 22:10

:)

CDN = Content Delivery Network http://en.wikipedia.org/wiki/Content_delivery_network - amazon's version is called CloudFront.

in a nutshell, it's a very useful tool to help high-traffic sites overcome bandwidth problems by serving static content from servers physically closer to the end user than the hosting server might be. it also reduces load on the origin server, so it can concentrate on running php and vending html rather than serving thousands of static images/js/css files.

 
Shai

Joined: 2003-12-10
Posts: 61
Posted: Wed, 2010-12-01 22:11

Actually, I just noticed that when enabled, it won't show any images (not in the Admin Dashboard, nor in the main gallery). No thumbnails, no resized images and no full-sized images. Nothing shows up when this module is enabled. (SSL is currently disabled, and the sync went fine and finished successfully.)

 
Shai

Joined: 2003-12-10
Posts: 61
Posted: Wed, 2010-12-01 22:23
Shai wrote:
Actually, I just noticed that when enabled, it won't show any images (not in the Admin Dashboard, nor in the main gallery). No thumbnails, no resized images and no full-sized images. Nothing shows up when this module is enabled. (SSL is currently disabled, and the sync went fine and finished successfully.)

I posted this question where it belongs.