Geek Culture / regex problem

Phaelax
DBPro Master
20
Years of Service
User Offline
Joined: 16th Apr 2003
Location: Metropia
Posted: 11th Apr 2017 13:38
https://www.forum.thegamecreators.com/board/junk.html

I'm trying to split a URL up (I'm using Java). For the most part it's simple enough to split the string with a forward slash delimiter. However, sometimes the address will have http or www and sometimes it won't. If it doesn't, then all is well and I can assume the first token is the domain. But if it does, the regex matches the slashes after http, leaving me with the first two tokens as "https:" and "" (empty string). This is no good, as I then cannot assume which token is the domain.
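
To make the problem concrete, here is a minimal sketch of the naive split described above. The class and method names are made up for illustration; only `String.split` is standard Java.

```java
import java.util.Arrays;

public class SplitDemo {
    // Naive approach: split on every forward slash.
    static String[] naiveSplit(String url) {
        return url.split("/");
    }

    public static void main(String[] args) {
        // With a scheme, the domain ends up as the third token:
        // [https:, , www.forum.thegamecreators.com, board, junk.html]
        System.out.println(Arrays.toString(
                naiveSplit("https://www.forum.thegamecreators.com/board/junk.html")));

        // Without a scheme, the domain is the first token:
        // [www.forum.thegamecreators.com, board, junk.html]
        System.out.println(Arrays.toString(
                naiveSplit("www.forum.thegamecreators.com/board/junk.html")));
    }
}
```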

I was trying to match only slashes which have been preceded by a period anywhere in the string, but I'm not sure how to do that. I tried using lookbehind, but I'm not familiar enough with it to make it work. I'm not even sure if it can look behind by a variable length like this. If I can do that, then the first slash that matches would be the one after the .com

"I like offending people, because I think people who get offended should be offended." - Linus Torvalds
BatVink
Moderator
20
Years of Service
User Offline
Joined: 4th Apr 2003
Location: Gods own County, UK
Posted: 11th Apr 2017 16:01
You could use "not", so:

[^/]/

would match a forward slash preceded by any character other than another forward slash.
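
As a rough sketch of how this pattern behaves when used as a split delimiter: because `[^/]` is part of the match, the character before each slash is consumed along with it. The second method is a variation that is not from the thread: it uses a fixed-length negative lookbehind so that nothing but the slash is consumed. All names here are hypothetical.

```java
import java.util.Arrays;

public class NotSlashDemo {
    // The suggested pattern: one non-slash character followed by a slash.
    // The non-slash character is part of the match, so split() removes it too.
    static String[] splitOnNotSlashThenSlash(String url) {
        return url.split("[^/]/");
    }

    // Variation (an assumption, not from the thread): match only a slash that is
    // not preceded by another slash or a colon, without consuming that character.
    static String[] splitWithLookbehind(String url) {
        return url.split("(?<![/:])/");
    }

    public static void main(String[] args) {
        String url = "https://www.forum.thegamecreators.com/board/junk.html";

        // [https, /www.forum.thegamecreators.co, boar, junk.html]
        System.out.println(Arrays.toString(splitOnNotSlashThenSlash(url)));

        // [https://www.forum.thegamecreators.com, board, junk.html]
        System.out.println(Arrays.toString(splitWithLookbehind(url)));
    }
}
```

In both cases the second slash of `//` is never matched, because it is preceded by a slash, so it stays inside a token. Neither variant yields the bare domain as the first token when a scheme is present.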
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Quidquid latine dictum sit, altum sonatur
TutCity is being rebuilt
Phaelax
Posted: 12th Apr 2017 01:57
What would happen when it reaches the 2nd slash?

budokaiman
FPSC Tool Maker
14
Years of Service
User Offline
Joined: 24th Jun 2009
Playing: Hard to get
Posted: 12th Apr 2017 21:28 Edited at: 12th Apr 2017 21:55
If you're using groups you could do something like:


This way you can call match, then get group(1) on the matcher, and you should get "www.forum.thegamecreators.com" out of the string you provided. All it does is get the valid domain characters (letters, numbers and dots) after the first set of '/' characters, if there are any, and before the next /. Just note that this means a URL should have a / after the .com (or .org or whatever), which URLs should. Hoping that's what you were looking for. (EDIT: also, make sure everything's escaped properly. I only used one backslash for escaping, but if you put that in a string you'll need to use double.)
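
The pattern from this post did not survive in the page text. The following is only a sketch reconstructed from the description above (optional leading `//`, then letters, digits and dots, then a slash), so the exact regex, the class name, and `domainOf` are assumptions rather than the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DomainGroupDemo {
    // Hypothetical pattern, pieced together from the description:
    //   (?:[^/]*//)?     optionally skip everything up to and including a leading "//"
    //   ([A-Za-z0-9.]+)  group 1: letters, digits and dots (the domain)
    //   /                the slash that must follow the domain
    // Note: hyphens are valid in host names but are not covered by this character class.
    private static final Pattern DOMAIN =
            Pattern.compile("^(?:[^/]*//)?([A-Za-z0-9.]+)/");

    // Returns the captured domain, or null when the input does not match
    // (for example when there is no slash after the domain).
    static String domainOf(String url) {
        Matcher m = DOMAIN.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Both lines print www.forum.thegamecreators.com
        System.out.println(domainOf("https://www.forum.thegamecreators.com/board/junk.html"));
        System.out.println(domainOf("www.forum.thegamecreators.com/board/junk.html"));
    }
}
```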
"Giraffe is soft, Gorilla is hard." - Phaelax
TheComet
16
Years of Service
User Offline
Joined: 18th Oct 2007
Location: I`m under ur bridge eating ur goatz.
Posted: 13th Apr 2017 12:11 Edited at: 13th Apr 2017 12:11
Expanding on what budo provided, with regex it's better to try and match as little as possible.


Or even (if you know that it can only be http or https):
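
The patterns from this post were lost from the page text, so what follows is a guess at the idea rather than the original code: capture "everything that is not a slash" instead of listing the valid domain characters. Both regexes and all names are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MinimalMatchDemo {
    // Any scheme: optional "scheme://", then capture up to the next slash.
    private static final Pattern ANY_SCHEME =
            Pattern.compile("^(?:[A-Za-z][A-Za-z0-9+.-]*://)?([^/]+)");

    // Only http or https: optional "http://" or "https://", then capture up to the next slash.
    private static final Pattern HTTP_ONLY =
            Pattern.compile("^(?:https?://)?([^/]+)");

    // Returns group 1 of the given pattern, or null if there is no match.
    static String firstGroup(Pattern pattern, String url) {
        Matcher m = pattern.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    static String domainAnyScheme(String url) {
        return firstGroup(ANY_SCHEME, url);
    }

    static String domainHttpOnly(String url) {
        return firstGroup(HTTP_ONLY, url);
    }

    public static void main(String[] args) {
        // Both lines print www.forum.thegamecreators.com
        System.out.println(domainAnyScheme("https://www.forum.thegamecreators.com/board/junk.html"));
        System.out.println(domainHttpOnly("www.forum.thegamecreators.com/board/junk.html"));
    }
}
```

Unlike the group-based version, these do not require a slash after the domain. The trade-off is that the http-only pattern returns "ftp:" for an input such as "ftp://host/file", since it does not recognize that scheme.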


You can test regex online if you didn't know: https://regex101.com/

With all of this said, regex is the WRONG approach to this problem! Java has facilities to parse URLs...
https://docs.oracle.com/javase/tutorial/networking/urls/urlInfo.html
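
A minimal sketch of that approach, using `java.net.URI` from the standard library (the linked tutorial uses `java.net.URL`, which offers similar getters). The `parse` helper and the choice to prepend "http://" when no scheme is present are assumptions for illustration, since a scheme-less string would otherwise be parsed as a relative path with no host.

```java
import java.net.URI;

public class UrlPartsDemo {
    // Assumption: input without a scheme is treated as http so that the
    // parser sees an authority (host) component.
    static URI parse(String address) {
        String withScheme = address.contains("://") ? address : "http://" + address;
        return URI.create(withScheme); // throws IllegalArgumentException on malformed input
    }

    public static void main(String[] args) {
        URI uri = parse("https://www.forum.thegamecreators.com/board/junk.html");
        System.out.println(uri.getHost()); // www.forum.thegamecreators.com
        System.out.println(uri.getPath()); // /board/junk.html

        // Same host when the scheme is missing from the input.
        System.out.println(parse("www.forum.thegamecreators.com/board/junk.html").getHost());
    }
}
```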

"Jeb Bush is a big fat mistake" -- Donald Trump
https://vt.tumblr.com/tumblr_o2rvwdLLSF1rmjly4.mp4
Phaelax
Posted: 13th Apr 2017 15:43
Thanks, Comet! It's been years since I've done any real Java programming, and I've forgotten all the good stuff. They had just added support for generics the last time I used Java, so it's been a while.

budokaiman
Posted: 13th Apr 2017 16:27
Quote: "regex is the WRONG approach to this problem"

It may be the wrong approach, but it's definitely the more fun approach.
