regex - gsub error when extracting a URL with R: what did I miss?
I tried to extract a URL, but every time I run the code it doesn't work. What did I miss? Any help would be great.
x$url <- gsub("(.*)(http://www.bloomin.com)(.jpg)(.)",
"//2//3", x$product.description.)
[1] //2//3
That is what it returns. I want http://www.blooming.com/image/xxxxxxxx.jpg returned from the vector below.
<div>colorful floor chair series</div><div><br /></div><div>soft suede</div><div><br /></div><div>cute bubble design</div><div><br /></div><div><p align="center"><p align="center"><img src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.gif" /></div><div><p align="center"><p align="center"><img src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div>
Backreferences must be referred to with a backslash, not a forward slash.
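A minimal sketch of the difference, using a made-up string (the input `s` and the pattern are assumptions chosen for the demo, not taken from the question's data):

```r
# Hypothetical input, used only to show the replacement syntax.
s <- "key=value"

# "//2" is plain text, so the whole match is replaced by the literal "//2".
wrong <- gsub("(.*)=(.*)", "//2", s)

# "\\2" (an escaped backslash inside an R string) refers to capture group 2.
right <- gsub("(.*)=(.*)", "\\2", s)

print(wrong)  # [1] "//2"
print(right)  # [1] "value"
```

This is why the original call returned the literal text //2//3 instead of the captured groups.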
Use .*? (non-greedy) to match any characters that exist between .com and the file extension .jpg.
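To see what the non-greedy quantifier changes, here is a small sketch on an invented string containing two image URLs (the string `s` is an assumption for illustration only):

```r
# Hypothetical input with two .jpg URLs.
s <- "a.com/one.jpg and b.com/two.jpg"

# Greedy: .* runs as far as possible, so the match ends at the LAST ".jpg".
greedy <- regmatches(s, regexpr("\\.com.*\\.jpg", s, perl = TRUE))

# Non-greedy: .*? stops as soon as possible, at the FIRST ".jpg".
lazy <- regmatches(s, regexpr("\\.com.*?\\.jpg", s, perl = TRUE))

print(greedy)  # [1] ".com/one.jpg and b.com/two.jpg"
print(lazy)    # [1] ".com/one.jpg"
```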
x$url <- gsub("(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*", "\\1", x$product.description., perl = TRUE)
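As a rough check, the call can be tried on a one-row data frame. The data frame below is an assumption: it holds an abbreviated copy of the description from the question, and `perl = TRUE` is passed so that the inline `(?s)` flag is handled by the PCRE engine.

```r
# Abbreviated copy of the product description from the question
# (the one-row data frame itself is made up for this demo).
x <- data.frame(
  product.description. = paste0(
    '<div>soft suede</div>',
    '<div><img src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.gif" /></div>',
    '<div><img src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div>'
  ),
  stringsAsFactors = FALSE
)

# Match the whole string, capture only the blooming.com .jpg URL,
# and replace everything with that captured group.
pattern <- "(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*"
x$url <- gsub(pattern, "\\1", x$product.description., perl = TRUE)

print(x$url)  # [1] "http://www.blooming.com/image/xxxxxxxx.jpg"
```

Note that gsub returns the input unchanged when the pattern does not match, so rows without a matching URL would keep their full description in x$url.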