
Regular expression to match a URL string that does not contain a protocol name or www

I want to write a JavaScript regular expression that removes all URLs from a given text. I use this expression: (?:(?:https?|ftp|file)://|www\.|ftp\.)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)? It works fine if the URL starts with one of the given protocols (e.g. http) or with www, but it fails in some cases, for example just.edu.jo (which does not start with www). How can I rewrite the expression to work in such cases?

Question added by Khadijah Shtayat , Technical Lead , Opensooq
Date Posted: 2012/11/19
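
To make the reported behavior concrete, here is a minimal sketch that applies the expression from the question to a few inputs. The helper name stripUrls and the sample sentences are assumptions added for illustration; only the pattern itself comes from the question.

```javascript
// The pattern from the question, written as a regex literal with the global flag.
const questionPattern =
  /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/g;

// Hypothetical helper: removes every match of the pattern from the text.
function stripUrls(text) {
  return text.replace(questionPattern, "");
}

console.log(stripUrls("see http://example.com/page now")); // "see  now"
console.log(stripUrls("see www.example.com now"));         // "see  now"
console.log(stripUrls("see just.edu.jo now"));             // unchanged: no protocol or www prefix
```

The third call shows the problem: the pattern requires a protocol, www. or ftp. prefix, so a bare host name is never matched.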
by Hazem Qannash , Technical Team Leader , Bayt.com

In most cases we use the protocol name (http, ftp, https, ...) or www to identify a URL in a string. In this case, though, the only way to tell whether a string is a valid URL is to check its top-level domain: com, net, org, ...

Here is the current list of valid top-level domains: http://data.iana.org/TLD/tlds-alpha-by-domain.txt

You can take the important domains (or all of them) and add them to your regular expression. Example:

var text = "not valid url: google.java. " +
    "valid url: https://www.google.com. " +
    "valid url: google.me. " +
    "valid url: just.edu.jo";
// Optional protocol, then a host name ending in one of the listed TLDs.
// Group 3 captures the character after the TLD so it can be put back.
var re = /(https?:\/\/|ftp:\/\/|file:\/\/)?[a-z\.0-9]+\.(com|net|org|me|jo)([^a-zA-Z]|$)/gi;
var cleanText = text.replace(re, "$3");
console.log(cleanText);
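
If the list of top-level domains is long, the alternation could be built from an array instead of typed by hand. The following is only a sketch of that idea: buildUrlPattern and removeUrls are hypothetical names, the array holds the same five sample TLDs as above, and the pattern differs slightly from the one in the answer (it uses a lookahead instead of a third capture group and also allows hyphens in host names).

```javascript
// Hypothetical helper: builds a URL-matching regex from a list of TLDs.
function buildUrlPattern(tlds) {
  const alternation = tlds.join("|");
  return new RegExp(
    // Optional protocol, host name, one of the TLDs, and no letter directly after it.
    "(?:(?:https?|ftp|file)://)?[a-z0-9.-]+\\.(?:" + alternation + ")(?![a-z])",
    "gi"
  );
}

// Sample TLDs only; the full list would come from the IANA file.
const sampleTlds = ["com", "net", "org", "me", "jo"];
const tldPattern = buildUrlPattern(sampleTlds);

// Hypothetical helper: removes every match of the TLD-based pattern.
function removeUrls(text) {
  return text.replace(tldPattern, "");
}

console.log(removeUrls("valid url: just.edu.jo"));      // "valid url: "
console.log(removeUrls("not valid url: google.java.")); // unchanged: "java" is not in the list
```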

by Khadijah Shtayat , Technical Lead , Opensooq

Thank you, Hazem, but this solution is limited by the list of generic top-level domains (gTLDs) and country code top-level domains (ccTLDs). The gTLD and ccTLD lists may be updated, and they are too long. For example, just.edu.jo and feed://example.com/rss.xml: both examples will not be caught as URLs. I want a more generic expression.
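
One possible direction for a more generic expression, offered as a sketch rather than a definitive answer: accept any scheme followed by ://, make it optional, and treat any final label of two or more letters as a top-level domain. The names genericUrlPattern and stripUrlsGeneric are assumptions for illustration. The trade-off is that without a TLD list the pattern cannot tell real domains from look-alikes, so google.java or a sentence with a missing space after a period ("end.Next") would be removed as well.

```javascript
// Optional scheme (any name followed by ://), one or more dot-separated labels,
// a final label of at least two letters, and an optional path.
const genericUrlPattern =
  /(?:[a-z][a-z0-9+.-]*:\/\/)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?![a-z0-9])(?:\/\S*)?/gi;

// Hypothetical helper: removes every match of the generic pattern.
function stripUrlsGeneric(text) {
  return text.replace(genericUrlPattern, "");
}

console.log(stripUrlsGeneric("a just.edu.jo b"));                // "a  b"
console.log(stripUrlsGeneric("a feed://example.com/rss.xml b")); // "a  b"
console.log(stripUrlsGeneric("no urls here"));                   // unchanged
```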
