regex - gsub error when extracting a URL with R: what did I miss? -


I am trying to extract a URL, but every time I run the code it doesn't work. What did I miss? Any help would be great.

    x$url <- gsub("(.*)(http://www.bloomin.com)(.jpg)(.)",
                  "//2//3", x$product.description.)

    [1] //2//3

That is what it returns. I want it to return http://www.blooming.com/image/xxxxxxxx.jpg from the vector below.

<div>colorful floor chair series</div><div><br /></div><div>soft suede</div><div><br /></div><div>cute bubble design</div><div><br /></div><div><p align="center"><p align="center"><img src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.gif" /></div><div><p align="center"><p align="center"><img src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div> 

  1. Backreferences must be written with a backslash, not a forward slash.

  2. Use .*? (non-greedy) to match the characters between .com and the file extension .jpg.

    # perl = TRUE selects the PCRE engine, which supports the (?s) inline flag
    x$url <- gsub("(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*",
                  "\\1", x$product.description., perl = TRUE)
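
The two points above can be tried on a small example. This is only a sketch: the data frame x below is a hypothetical one-row stand-in built from a shortened copy of the sample HTML, since the real x is not shown in the question, and the names literal and backref are made up for illustration.

```r
# Hypothetical one-row data frame standing in for the asker's x
x <- data.frame(
  product.description. = paste0(
    '<div>cute bubble design</div>',
    '<div><img src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.gif" /></div>',
    '<div><img src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div>'
  ),
  stringsAsFactors = FALSE
)

# Point 1: forward slashes in the replacement are plain text,
# while "\\2" refers to the second capture group.
literal <- gsub("(a)(b)", "//2", "ab")
backref <- gsub("(a)(b)", "\\2", "ab")

# Point 2: the non-greedy .*? stops at the first ".jpg" after ".com".
# perl = TRUE is set so that the (?s) inline flag is handled by PCRE.
x$url <- gsub("(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*",
              "\\1", x$product.description., perl = TRUE)

print(literal)
print(backref)
print(x$url)
```

Note that gsub returns its input unchanged when the pattern does not match, so rows without a matching URL would keep the full description in x$url.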


