"Special characters" not permitted in Internet address???!!!

OC2PS
Joined: 2010-09-08
Posts: 428
Posted: Sun, 2011-03-13 22:02

I'd like to use some letters that have diacritics. Since most browsers are able to handle "special characters", I'm wondering if there is a specific reason why these aren't permitted in the internet address field in G3.

OC2PS
Csillamvilag.com

Bodypainting, Facepainting, Glitter, Henna
Materials, Courses, Resources

 
floridave
Joined: 2003-12-22
Posts: 27300
Posted: Mon, 2011-03-14 04:04

I would suspect that some browsers at the time of development did not have that support.

Dave

____________________________________________
Blog & G2 || floridave - Gallery Team

 
OC2PS
Posted: Thu, 2012-01-26 19:24

For 3.0.0 perhaps, but the same can't be said for 3.0.1 ... so for the current version I'd go with oversight, rather than an actual conscious decision. That's good news for me then, as it means it may be fixed in a future version.

In the meantime, any suggestions as to how I can hack a fix?


 
OC2PS
Posted: Thu, 2012-01-26 19:24

Anybody?


 
OC2PS
Posted: Thu, 2012-01-26 19:24

I am looking to edit these in the database. I think the relevant table is g3_items

What's the difference between:
name
relative_path_cache
relative_url_cache
slug


 
OC2PS
Posted: Thu, 2012-01-26 19:24

OK, it seems name and slug are simple enough.
Name is the album name and plays no role in generating the URL.

Slug is essentially the last part of the URL, the identifier for the album. That is how "slug" is generally used with URLs, so it's pretty standard.

I still struggle with
relative_path_cache
relative_url_cache

Can someone please explain their role?

Also, when I changed the internet address for an album, relative_url_cache changed but relative_path_cache did not. How come?


 
nivekiam
Joined: 2002-12-10
Posts: 16504
Posted: Thu, 2011-04-21 14:16

I suspect that if you ever run the Fix Gallery maintenance task, your changes will get messed up...

Would just waiting and seeing what we come up with work for you? This is a problem we are trying to solve. The big problem is: which Unicode characters are safe and which are not? Here's the discussion that's happened recently on the -devel mailing list:
http://old.nabble.com/URL-Validation-to31309979.html
____________________________________________
Like Gallery? Like the support? Donate now!

 
OC2PS
Posted: Thu, 2012-01-26 19:25

I hope you are able to solve it soon.

I didn't quite understand the issue around security though. What are the concerns? Could you explain with examples?


 
nivekiam
Posted: Thu, 2011-04-21 17:33

Ever hear of cross-site scripting or SQL injection hacks?

If you only allow a small set of characters (0-9, a-z, -, _ for example), it's very easy to filter and contain those. If you just blindly accept everything, some characters can trick the server into doing things it was never intended to do, like serving up files it's not supposed to, or creating files or modifying information in the DB that it's not supposed to. Allowing Unicode characters is HUGE: http://www.i18nguy.com/unicode/char-count.html
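To make the allowlist idea concrete, here is a minimal Python sketch. It is not Gallery's actual validation code (Gallery 3 is written in PHP); the function name is_safe_slug and the exact pattern are assumptions chosen to mirror the example character set above.

```python
import re

# Hypothetical allowlist mirroring the example set above:
# digits, lowercase ASCII letters, hyphen and underscore.
SAFE_SLUG = re.compile(r"[0-9a-z_-]+")


def is_safe_slug(slug: str) -> bool:
    """Return True only if every character of the slug is in the allowlist."""
    return SAFE_SLUG.fullmatch(slug) is not None


# A few illustrative inputs: one ordinary slug, one with a diacritic,
# and three that try to smuggle in path, SQL or HTML syntax.
candidates = [
    "summer-2011",
    "zoë",
    "../../etc/passwd",
    "x'; DROP TABLE items; --",
    "<script>",
]
for candidate in candidates:
    print(repr(candidate), is_safe_slug(candidate))
```

With a set this small, everything outside it is rejected by default, which is why the harmless "zoë" is refused along with the hostile inputs. Widening the set to Unicode means deciding, character by character, what else may pass.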

What's legit, what's in common use, what can cause problems? We also don't have language experts for every language available to us, maybe we don't or won't need them, but how are we going to be able to tell if a certain character with a certain language install on the server isn't going to cause problems? For example there are over 47,000 Chinese characters... http://en.wikipedia.org/wiki/Chinese_character lots of Kanji characters as well... http://en.wikipedia.org/wiki/Kanji#Total_number_of_kanji

The problem is much larger than just an umlaut here or there :) Read the 2 posts in the -devel thread I posted a link to above.

 
OC2PS
Posted: Thu, 2012-01-26 19:25

I read the 2 messages at the link you posted but am a bit thick technologically and didn't quite understand what it was all about.

I ran a quick search on the net and so far only found this http://www.schneier.com/blog/archives/2005/02/unicode_url_hac_1.html
I believe the issue being discussed on this link is not quite relevant - G3 users provide slugs of URLs not domain names within the program...the domain is picked up directly from the server (if I understand correctly). So I don't quite see how the phishing issue is still alive.

Regarding XSS - first of all, is this a major issue at all? I know some popular open-source ecommerce scripts that have this vulnerability, but believe the risk is minimal. G3 handles much less sensitive information, so I would imagine it's nothing more than a nuisance. But more to the point, I am not quite sure how UTF-8 encoding could make G3 more vulnerable.

I think while XSS, URL injection and others are real threats/risks, G3 should evaluate:
1) the probability of exploitation of the vulnerability, and
2) more importantly, the potential impact if a vulnerability were to be exploited

Based on this is there a serious security threat caused by utf8?


 
floridave
Posted: Thu, 2011-04-21 20:44
Quote:
Regarding XSS - first of all, is this a major issue at all?

YES!!
I don't want some user that I allow to upload items to my gallery to add a script that will delete all my content or cause some other catastrophic issue.
All security issues have to be taken seriously.
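As a rough illustration of the scenario described here, the Python sketch below shows user-supplied text being escaped before it is placed in a page. It is not how Gallery itself renders titles; the function render_album_title and the sample input are hypothetical.

```python
import html


def render_album_title(title: str) -> str:
    """Escape user-supplied text before embedding it in HTML markup."""
    return "<h1>" + html.escape(title) + "</h1>"


# A title that an untrusted uploader might submit.
malicious = '<script>alert("gone")</script>'

# Unescaped, the browser would run the script; escaped, it is shown as text.
print(render_album_title(malicious))
```

The escaped result contains no live script tag, so the browser displays the text instead of executing it.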

Our language/security expert is sick at this time; hopefully he can address some of these issues.

Dave

 
floridave
Posted: Thu, 2011-04-21 20:58

Does this help?
https://github.com/gallery/gallery3/commit/7da771acc75058726a2815ec49f449ff6bd51646
You will have to update to an experimental version.

Dave


 
nivekiam
Posted: Thu, 2011-04-21 21:23
Quote:
I know some popular opensource ecommerce scripts that have this vulnerability, but believe the risk is minimal.

They are dealing with people's frickin MONEY on both sides of the transaction: the people using their software and their customers. They are completely irresponsible and should be avoided at all costs. If the business's site were to get hacked, then they look bad and lose revenue. If the customer's data were to be leaked or money stolen from them, they'll be pissed and never shop at that site again, again hurting the business.

It doesn't matter what the software does, if they don't take security seriously, don't use it. It's amazing what "minor" software causes big problems.

Regarding XSS exploits, read this:
http://en.wikipedia.org/wiki/Cross-site_scripting

Did you follow that Bruce Schneier article to this page?
http://www.schneier.com/crypto-gram-0007.html#9

The last statement in that article:

Quote:
Unicode is just too complex to ever be secure.

My personal opinion: I don't think Unicode characters should ever have been allowed in URLs, and it's a huge mistake that cannot be undone. Yes, that would mean countries whose character sets are complex could not have native characters in the URL, but well, too bad. I can't say for sure, but I'm pretty sure that even if I grew up in China, Japan, Russia or any other place that uses such characters, my belief would be the same. For example, if your daughter's name is Zoë, her URL would be Zoe.
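The Zoë to Zoe mapping can be sketched in a few lines of Python. This is only an illustration, not Gallery's code: the helper name strip_diacritics is hypothetical, and the approach (decompose with NFKD, then drop combining marks) is one common technique among several.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose each character (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Zoë"))   # Zoe
print(strip_diacritics("óőöí"))  # oooi
# Letters with no decomposition, such as "ø", pass through unchanged,
# so this alone does not guarantee ASCII output.
print(strip_diacritics("ø"))
```

This works well for accented Latin letters but does nothing for scripts such as Chinese or Cyrillic, which need a real transliteration table.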

More reading: http://unicode.org/reports/tr36/

 
OC2PS
Posted: Thu, 2012-01-26 19:25

Thanks, I am grateful to Tim for the rapid response (and, of course, thanks Dave for pointing it out to me). Not sure I understand what this does. Does this convert ó, ő, ö into o, í into i, etc? If so, it already solves a large part of the problem, esp. for languages written in Latin script.

floridave wrote:
you will have to update to an experimental version.

Can I simply copy this code into item.php? Or download experimental version and upload only item.php and item_helper_test.php to replace old ones? Or does this change use other modifications applied in the experimental code?


 
OC2PS
Posted: Thu, 2012-01-26 19:24

Thanks for the links, Kevin, I feel more enlightened now.

nivekiam wrote:
Did you follow that Bruce Schneier article to this page?
http://www.schneier.com/crypto-gram-0007.html#9

The last statement in that article:

Quote:
Unicode is just too complex to ever be secure.

Considering that the article was written in 2000, one would have thought that in 12 years a solution would have been found.

Very interesting. What I get out of this is:

1. Visual spoofing (citibank, paypal) in the case of UTF-8 is a matter of scale, not concept. Either way, this is not relevant for G3 (as far as I can see).

2. ISAD - All conformant users of [IDNA2003] are required to process domain names to convert what are called compatibility-equivalent characters into a unique form using a process called compatibility normalization (NFKC) and casefolding. Essentially, everything that looks alike is considered to be the same, and so can't be registered as separate domain names. But since visually confusable characters are not usually unified across scripts, it is possible to some extent to use mixed-script spoofing, which can be addressed with mixed-script detection. Glyphs in complex scripts like Devanagari present some risk in the relevant languages, as the encodings are not identified and the visual interpretation is completely disconnected from the underlying characters. Luckily, such font-encodings are seldom used, and their use is decreasing rapidly with the growth of Unicode. BUT ULTIMATELY, ISAD SPOOFING DOESN'T SEEM TO BE A RELEVANT SECURITY ISSUE FOR G3.

3. The risks regarding ISAD (while irrelevant anyway for G3) are particularly low as modern user agents IE9, Firefox 3 and Google Chrome all actually do follow the recommendations presented in the paper (remember, user agents are supposed to be the most vulnerable aspect).

4. While there are some potential issues with post-gatekeeper character encoding conversion, the simple solution is to encode everything in UTF-8:
http://unicode.org/reports/tr36/#Canonical_Represenation

Unicode Technical Report #36 wrote:
Where possible, using Unicode instead of other charsets avoids many of these kinds of problems.

5. Generation of "non-shortest form" UTF-8 was addressed and fixed in Unicode 3.1.

6. Buffer overflows - Some programmers may rely on limitations that are true of ASCII or Latin-1, but fail with general Unicode text. These can cause failures such as buffer overruns if the length of text grows.
When performing character conversion, text may grow or shrink, sometimes substantially. Always account for that possibility in processing. OR simply encode everything in utf8 and forget about conversion issues.
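Points 2, 5 and 6 can be poked at with Python's standard library. The sketch below is only an illustration under assumptions: the helper names fold and scripts are hypothetical, the script guess from Unicode character names is a crude heuristic rather than the mixed-script detection that TR36 describes, and Python's built-in UTF-8 decoder stands in for whatever decoder a given server uses.

```python
import unicodedata


def fold(label: str) -> str:
    """Compatibility normalization (NFKC) followed by case folding."""
    return unicodedata.normalize("NFKC", label).casefold()


def scripts(label: str) -> set:
    """Crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


# Point 2: compatibility-equivalent forms collapse to a single spelling.
fullwidth = "\uff30\uff41\uff59\uff30\uff41\uff4c"  # "PayPal" in fullwidth letters
print(fold(fullwidth) == fold("paypal"))  # True

# A Cyrillic "a" (U+0430) belongs to another script and is NOT unified,
# which is the gap that mixed-script detection is meant to close.
mixed = "p\u0430ypal"
print(fold(mixed) == fold("paypal"))  # False
print(sorted(scripts(mixed)))         # ['CYRILLIC', 'LATIN']

# Point 5: a "non-shortest form" (overlong) encoding of "/" is rejected
# by a strict UTF-8 decoder instead of being decoded to a slash.
overlong_slash = b"\xc0\xaf"
try:
    overlong_slash.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)  # True

# Point 6: text can grow during conversion, so fixed-size assumptions break.
print(len("ß"), len("ß".upper()))                                    # 1 2
print(len("\ufb03"), len(unicodedata.normalize("NFKC", "\ufb03")))  # 1 3
print(len("ő"), len("ő".encode("utf-8")))                            # 1 2
```

The last three lines show growth in three different ways: case conversion, normalization, and the byte length of the encoded form.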

Overall, I didn't see any specific issues on this page that affect G3. But as I said earlier, I am thick about technology. Could you please use an example to illustrate how a UTF-8 vulnerability (one that's not present with ASCII) can affect me as a site owner with a G3 installation? Thanks for your patience.


 
floridave
Posted: Fri, 2011-04-22 14:59
sooskriszta wrote:
Can I simply copy this code into item.php? Or download experimental version and upload only item.php and item_helper_test.php to replace old ones? Or does this change use other modifications applied in the experimental code?

You can try, but we always suggest a full upgrade because there might be other related changes that are required as well.

Dave
