regex - gsub error extracting a URL with R: what did I miss?


I tried to extract a URL, but every time I run the code it doesn't work. What did I miss? Any help would be great.

x$url <- gsub("(.*)(http://www.bloomin.com)(.jpg)(.)",
"//2//3", x$product.description.)

[1] //2//3

That is all it returns. I want it to return http://www.blooming.com/image/xxxxxxxx.jpg from the vector below.

<div>colorful floor chair series</div><div><br /></div><div>soft suede</div><div><br /></div><div>cute bubble design</div><div><br /></div><div><p align="center"><p align="center"><img src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.gif" /></div><div><p align="center"><p align="center"><img src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div> 

  1. Backreferences must be written with a backslash, not a forward slash. Inside an R string that is "\\1", because the backslash itself has to be escaped.

  2. Use .*? (non-greedy) to match the characters between .com and the file extension .jpg.

    # perl = TRUE is needed for the inline (?s) flag
    x$url <- gsub("(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*", "\\1", x$product.description., perl = TRUE)
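A minimal, self-contained sketch of the fix, run on a shortened copy of the HTML above. The data frame x is built by hand here for illustration (an assumption: the real column holds one HTML string per row). The second row and the NA step are additions, not part of the original answer: gsub returns a string unchanged when the pattern does not match, so rows without a blooming.com .jpg would otherwise keep their full HTML.

```r
# Shortened copy of the sample HTML from the question
html <- paste0(
  '<div>colorful floor chair series</div>',
  '<div><p align="center"><img src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.gif" /></div>',
  '<div><p align="center"><img src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div>'
)

# Hypothetical data frame: one row with the URL, one row without it
x <- data.frame(product.description. = c(html, "<div>no image here</div>"),
                stringsAsFactors = FALSE)

# Group 1 captures the URL; "\\1" refers back to it in the replacement.
pattern <- "(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*"
x$url <- gsub(pattern, "\\1", x$product.description., perl = TRUE)

# Rows where the pattern did not match still hold the original text; mark them NA.
x$url[!grepl(pattern, x$product.description., perl = TRUE)] <- NA

print(x$url)
```

The leading greedy .* makes the whole string part of the match, so replacing it with "\\1" leaves only the captured URL.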
