General Web Programming: This is for more general discussion about web programming or other web programming languages such as XML, Mod_Rewrite, ColdFusion, CGI, VBScript, and Ruby on Rails.

#1 | 11-28-2005
Mitch (Emo Pose >.>)
Join Date: Dec 2004
Location: United Kingdom
Age: 21
Posts: 2,757
URLs

I'm working on a PHP tutorial website where users can submit links to tutorials, and I was wondering which URLs I should allow them to submit.

For instance:
www.something.com: should I allow this or not? It's a whole website rather than a specific tutorial page.

www.something.com/this.html (or .htm)
www.something.com/this.php (or .asp, .cfm)
www.something.com/this.php?id=5 (or .asp, .cfm)
As well as folders.

Are there any other file types I should allow? I want to be quite strict, as I don't want users linking to anything other than tutorials.
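Before deciding which of these to allow, it can help to sort each submission by shape. Here is a minimal Python sketch (the function name `classify_url` and its category labels are hypothetical) that uses `urllib.parse.urlsplit` to tell a bare site root from a folder, a plain page and a page with a query string. It assumes submissions may omit the `http://` prefix, as in the examples above. As Edd points out in the first reply, the shape of a URL says nothing about what the page actually contains, so this can only route submissions, not approve them.

```python
from posixpath import splitext
from urllib.parse import urlsplit


def classify_url(url: str) -> tuple[str, str]:
    """Sort a submitted URL into one of the shapes listed in the post.

    Returns (kind, extension). The kind labels are hypothetical, chosen for
    this sketch; the URL shape says nothing about what the page contains.
    """
    # The examples omit the scheme; without one, urlsplit would treat the
    # host name as part of the path, so assume plain http.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    path = parts.path

    if path in ("", "/"):
        return ("site root", "")                # www.something.com
    extension = splitext(path)[1].lower()
    if path.endswith("/") or not extension:
        return ("folder", "")                   # www.something.com/tutorials/
    if parts.query:
        return ("page with query", extension)   # this.php?id=5
    return ("page", extension)                  # this.html, this.php, ...
```

For example, `classify_url("www.something.com/this.php?id=5")` gives `("page with query", ".php")`, so a site could, say, send everything labelled "site root" to closer manual review.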

PS: I put this in General Web Programming because I'm not asking for actual coding help, and this section needs filling up.
__________________
MitchStanley.co.uk - Coming Soon
#2 | 11-28-2005
hot_cakes (Moderat0r!!1)
Join Date: Aug 2005
Location: Bristol, UK
Age: 28
Posts: 2,939
Re: URLs

Quote:
sketchie originally posted:
www.something.com/this.html (or .htm)
www.something.com/this.php (or .asp, .cfm)
www.something.com/this.php?id=5 (or .asp, .cfm)
As well as folders.

Are there any other file types I should allow? I want to be quite strict, as I don't want users linking to anything other than tutorials.
You simply can't guess the contents of a page from its URL(s). The only way to be strict is to actually look at each URL that gets submitted.

You might also consider it necessary to check those websites periodically for changes. Someone could post a URL that is a tutorial one day and something completely different the next. For this, you could keep a cache of each page's source and flag any page that has changed by more than 10% (by whatever metric you choose) as "possibly changed, worth checking out". You could automate this if you're allowed to run cron jobs on your server.
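One way to put a number on "changed by more than 10%" is Python's `difflib.SequenceMatcher`, comparing the cached and current source line by line. This is a sketch under that assumption: the function names are hypothetical, the post deliberately leaves the metric open, and the 10% threshold is just the example figure used above.

```python
import difflib


def change_ratio(old_source: str, new_source: str) -> float:
    """Fraction of the page that differs: 0.0 = identical, 1.0 = nothing in common.

    Compares line by line using difflib's similarity ratio, which is one
    possible metric. autojunk=False stops difflib from ignoring frequently
    repeated lines such as "</div>" on long pages.
    """
    matcher = difflib.SequenceMatcher(
        None, old_source.splitlines(), new_source.splitlines(), autojunk=False
    )
    return 1.0 - matcher.ratio()


def needs_review(old_source: str, new_source: str, threshold: float = 0.10) -> bool:
    """Flag a page as 'possibly changed' when more than `threshold` of it differs."""
    return change_ratio(old_source, new_source) > threshold
```

A cron job could fetch each page, call `needs_review(cached, fresh)`, and add flagged URLs to an admin queue; the cached copy would only be refreshed once someone has looked at the change.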

I imagine there are still ways around this, though, e.g. a page of 'junk' with a fast-acting redirect: you'd only have to change the redirect URL, which could be a change of less than 10% (for example) if the page is padded out with fluff.
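The fast-redirect trick often relies on a `<meta http-equiv="refresh">` tag, which can be spotted directly instead of hoping it pushes the diff over the threshold. A minimal sketch using Python's standard `html.parser`, assuming (hypothetically) that any meta refresh firing within two seconds deserves a manual look. It does not catch JavaScript redirects, and HTTP 3xx redirects have to be handled at the request level instead.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Collect <meta http-equiv="refresh"> redirects found in a page."""

    def __init__(self):
        super().__init__()
        self.redirects = []  # (delay in seconds, target URL) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").strip().lower() != "refresh":
            return
        # content looks like "0; url=http://example.com/"
        delay_text, _, target = values.get("content", "").partition(";")
        target = target.strip()
        if target.lower().startswith("url="):
            target = target[4:].strip()
        try:
            delay = float(delay_text.strip())
        except ValueError:
            return  # unparseable delay: this sketch simply ignores the tag
        self.redirects.append((delay, target))


def fast_redirects(html: str, max_delay: float = 2.0) -> list[str]:
    """Return targets of meta-refresh redirects that fire within `max_delay` seconds."""
    finder = RefreshFinder()
    finder.feed(html)
    finder.close()
    return [target for delay, target in finder.redirects if delay <= max_delay]
```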

Any system is going to require manual checking to begin with, and regularly thereafter if you want to be strict.

Edd
__________________
Visit me at: mr-edd.co.uk
Languages: Python | Lua
Compilers: MinGW | MSVC++9
Libraries: Boost | gtkmm
Reference: Dinkumware | a.c.l.l.c-c++ FAQ
#3 | 11-28-2005
Mitch (Emo Pose >.>)
Join Date: Dec 2004
Location: United Kingdom
Age: 21
Posts: 2,757
Re: URLs

I plan on adding a "Report Bad Link" option for users, which will help bring attention to changed websites, and a rating system for unhelpful or incorrect tutorials, which should help weed out some bad submissions. I suppose that's all I'll do for now.

Except that I'll also block links to ZIP/RAR archives, executables and media files.
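Blocking archives, executables and media files by extension could look like the Python sketch below. The extension list is an assumption to be extended, and the check only looks at the URL path, so a link such as `download.php?file=setup.exe` passes. That is exactly the kind of bypass raised later in the thread, which is why checking the response's Content-Type header is a useful second line of defence.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Hypothetical blocklist covering the file types mentioned above; extend as needed.
BLOCKED_EXTENSIONS = {
    ".zip", ".rar", ".7z", ".gz", ".tar",      # archives
    ".exe", ".msi", ".bat", ".scr",            # executables
    ".mp3", ".wav", ".wma", ".avi", ".mpg",    # media
    ".mpeg", ".mov", ".wmv", ".swf",
}


def is_blocked(url: str) -> bool:
    """True if the URL's path ends in a blocked extension (the query string is ignored)."""
    if "://" not in url:
        url = "http://" + url  # submissions may omit the scheme
    extension = splitext(urlsplit(url).path)[1].lower()
    return extension in BLOCKED_EXTENSIONS
```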
#4 | 11-28-2005
4fingers (Doctorate Student)
Join Date: Aug 2005
Location: UK, Scotland
Age: 23
Posts: 919
Re: URLs

One thing worth noting is that if you use PHP's fopen() to open an HTTP stream, and the URL you're trying to access is invalid or generates an error response (e.g. 404 Not Found), the fopen() call will return false. That could come in quite handy for double-checking that links aren't broken.
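For comparison, here is a rough Python analogue of that fopen() check, a sketch rather than the PHP behaviour itself: `urllib.request.urlopen` raises `HTTPError` for error statuses such as 404, `URLError` for DNS or connection failures, and `ValueError` for malformed URLs, so any of those means the link should be looked at. The function name and the User-Agent string are assumptions.

```python
from urllib.request import Request, urlopen


def link_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Rough Python analogue of checking whether PHP's fopen() returns false."""
    try:
        # Browser-like User-Agent (an assumption: some servers refuse bare clients).
        request = Request(url, headers={"User-Agent": "Mozilla/5.0 (tutorial link checker)"})
        with urlopen(request, timeout=timeout):
            # Error statuses raise HTTPError instead of returning normally.
            return True
    except (ValueError, OSError):
        # ValueError: malformed URL. HTTPError, URLError and socket timeouts
        # are all subclasses of OSError.
        return False
```

In PHP itself, opening URLs with fopen() also requires the `allow_url_fopen` setting to be enabled on the server.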
#5 | 11-28-2005
IntellEJent (Active Supporter)
Join Date: Jan 2005
Location: The restaurant at the end of the universe
Age: 15
Posts: 1,546
Re: URLs

I don't really agree with a URL checker, because people can use folder names or a rewrite engine (e.g. mod_rewrite, or URL shorteners) to hide what the URL actually points to. Also, pages can be written in many other languages, so a list of allowed extensions will never be complete. What you should do instead is block certain extensions that could carry malicious or otherwise unsavoury content. If you want something more interesting, you could have a user-based acceptance system, where new tutorials sit in a holding section until a couple of users have checked them as legitimate, or something along those lines. Just my two cents: there are too many legitimate extensions on the web to count, and too many other ways to get around extension checks.
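The user-based acceptance idea is essentially a queue with a vote threshold. A minimal Python sketch, assuming (hypothetically) that two approvals from users other than the submitter are enough to publish a tutorial; the class and method names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    url: str
    submitter: str
    approvals: set[str] = field(default_factory=set)  # users who vouched for it


class ReviewQueue:
    """Hold new tutorials in a pending section until enough users approve them."""

    def __init__(self, required_approvals: int = 2):
        self.required_approvals = required_approvals
        self.pending = {}    # url -> Submission
        self.published = []  # approved urls, in the order they were accepted

    def submit(self, url: str, submitter: str) -> None:
        if url not in self.pending and url not in self.published:
            self.pending[url] = Submission(url, submitter)

    def approve(self, url: str, reviewer: str) -> bool:
        """Record an approval; return True once the tutorial is published."""
        submission = self.pending.get(url)
        if submission is None or reviewer == submission.submitter:
            return False  # unknown or already published, or a self-approval
        submission.approvals.add(reviewer)  # a set, so repeat votes count once
        if len(submission.approvals) >= self.required_approvals:
            del self.pending[url]
            self.published.append(url)
            return True
        return False
```

A real site would keep this state in its database rather than in memory, but the rules (no self-approval, one vote per user, fixed threshold) carry over directly.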
#6 | 11-28-2005
Mau (A friend)
Join Date: Jun 2005
Location: California, USA
Age: 20
Posts: 2,956
Re: URLs

I would personally just check the headers returned by the page: emulate a browser, but send a HEAD request so that only the headers come back. (I can look into that if you want me to.)
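Asking for the headers only is what an HTTP HEAD request does: the server replies with the headers a GET would produce, without the body. A Python sketch using `urllib.request.Request(..., method="HEAD")`; the helper names, the User-Agent string, and the rule that only HTML media types count as tutorial pages are assumptions. Checking the `Content-Type` header also catches ZIPs, executables and media files even when their URLs hide the file extension.

```python
from urllib.request import Request, urlopen


def fetch_headers(url: str, timeout: float = 10.0):
    """Send a HEAD request; return (status, Content-Type) or None on failure."""
    try:
        request = Request(
            url,
            method="HEAD",  # ask for the headers only, no body
            headers={"User-Agent": "Mozilla/5.0 (tutorial link checker)"},  # assumption
        )
        with urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type", "")
    except (ValueError, OSError):
        return None  # malformed URL, error status (HTTPError) or network failure


def looks_like_tutorial_page(status: int, content_type: str) -> bool:
    """Accept successful responses whose media type is an HTML document."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return 200 <= status < 300 and media_type in ("text/html", "application/xhtml+xml")
```

Some servers answer HEAD requests with 405 Method Not Allowed, so falling back to a normal GET for those is a reasonable safeguard.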
#7 | 11-29-2005
_jameshales (Graduate Student)
Join Date: Sep 2005
Location: Perth, Western Australia
Posts: 446
Re: URLs

I think the headers option is a good one. I also don't think fopen() would run into problems with URL rewriters or folder URLs, because the request is sent to the server hosting the page, which handles it and returns the correct page.

I think one URL pattern that should definitely be disabled is links to people's local hard drives (e.g. file:/// URLs) :P
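Local-drive links (and links to the submitter's own machine or home network) can be rejected by scheme and host. A Python sketch, assuming only `http` and `https` are acceptable and that URLs have already been given a scheme; the function name is hypothetical. Any `file:` URL or bare `C:\...` path fails the scheme check, and the standard `ipaddress` module spots loopback and private addresses.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_public_web_url(url: str) -> bool:
    """Reject links to local files and to the submitter's own machine or LAN."""
    parts = urlsplit(url.strip())
    # Rejects file:///C:/..., bare C:\... paths, javascript:, ftp:, and so on.
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname  # already lower-cased by urllib; None if missing
    if not host or host == "localhost":
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # an ordinary host name such as www.something.com
    return not (address.is_private or address.is_loopback or address.is_link_local)
```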

The "report broken link" option could be extended to other problems too, since there is more than just dead links to be wary of on the Internet these days: for instance, excessive images/ads that make the page load really slowly, offensive content surrounding the tutorial, excessive popups, or malicious or dangerously broken scripts. The person who submits the link may not think those things are much of a problem, or they may appear over time. There could be other reasons too.
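Extending the report button beyond dead links mostly means recording a reason with each report. A minimal Python sketch, where the reason categories come from the list above and the flag threshold of three reports is a hypothetical choice.

```python
from collections import Counter
from enum import Enum


class ReportReason(Enum):
    # Hypothetical categories, taken from the problems listed in the post.
    BROKEN_LINK = "broken link"
    TUTORIAL_GONE = "tutorial no longer there"
    SLOW_PAGE = "excessive images/ads"
    OFFENSIVE = "offensive content"
    BAD_SCRIPTS = "popups or malicious/broken scripts"


class ReportLog:
    """Count user reports per link and flag the link once enough pile up."""

    def __init__(self, flag_threshold: int = 3):
        self.flag_threshold = flag_threshold
        self.counts = {}  # url -> Counter of ReportReason

    def report(self, url: str, reason: ReportReason) -> bool:
        """Record one report; return True if the link now needs an admin's attention."""
        counter = self.counts.setdefault(url, Counter())
        counter[reason] += 1
        return sum(counter.values()) >= self.flag_threshold

    def top_reason(self, url: str):
        """Most frequently reported reason for a link, or None if it has no reports."""
        counter = self.counts.get(url)
        if not counter:
            return None
        return counter.most_common(1)[0][0]
```

Keeping the reasons separate lets an admin see at a glance whether a flagged link is dead, slow, or actively harmful.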

Maybe, to check whether a link has changed, you could cache a few paragraphs of the tutorial instead of its source code, so that you can tell whether the tutorial is still there. For example, you could store three random paragraphs, then look for them in the page's source code; if none of them are there, the link should be checked.
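The random-paragraph idea could be sketched in Python as below. It assumes the stored paragraphs are copied from the page source, as the post suggests, and that differences in whitespace and letter case should be ignored; the eight-word minimum and the function names are hypothetical.

```python
import random
import re


def pick_fingerprints(paragraphs, count=3, rng=None):
    """Choose up to `count` random paragraphs (copied from the page source) to remember."""
    rng = rng or random.Random()
    # Skip very short paragraphs, which could match unrelated pages (arbitrary cutoff).
    candidates = [p for p in paragraphs if len(p.split()) >= 8]
    return rng.sample(candidates, min(count, len(candidates)))


def _normalise(text):
    """Collapse whitespace and case so that reformatting alone doesn't cause a miss."""
    return re.sub(r"\s+", " ", text).strip().lower()


def tutorial_still_present(page_source, fingerprints):
    """True if at least one remembered paragraph still appears in the page source."""
    page = _normalise(page_source)
    return any(_normalise(p) in page for p in fingerprints)
```

If `tutorial_still_present` returns False, the link goes to the manual check; since it only needs a few short strings per tutorial, it is much lighter than caching whole pages.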

_jameshales
__________________
Death to the non-believers!