The preg_match_all PHP function: how to filter out div, a (link), img or other elements

Filter out specific elements using the preg_match_all PHP function and RegEx

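To filter out (that is, extract) specific elements from HTML content, such as all links, images or divs, use the code examples below. Put the HTML code to be filtered in the variable $incoming_data before starting. The matches are collected in the $result variable as a two-dimensional array: $result[0] holds the full matches and $result[1] holds the contents of the first capture group, if the pattern has one.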
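As a rough illustration of how the $result array is laid out, here is a minimal sketch. The sample HTML and the <p> pattern are assumptions made up for this example only.

```php
<?php
// Hypothetical sample input; in practice this would be your own HTML.
$incoming_data = '<p>One</p><p>Two</p>';

// Group 1 (the parentheses) captures the text between the tags.
$count = preg_match_all('/<p>(.*?)<\/p>/s', $incoming_data, $result);

// $result[0] lists every full match, $result[1] lists capture group 1.
print_r($result);
```

The return value ($count here) is the number of full matches, which is handy for looping over the results later.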

Filter out all h1 headings

// i = case-insensitive, s = dot also matches newlines, U = quantifiers are lazy
$headings = '/<h1\b[^>]*>(.*)<\/h1>/isU';

preg_match_all($headings, $incoming_data, $result);

Filter out all a links

// \b stops the pattern from also matching tags such as <abbr> or <article>
$links = '/<a\b[^>]*>(.*)<\/a>/isU';

preg_match_all($links, $incoming_data, $result);
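Often you want the link target and the link text rather than the whole tag. The sketch below is one possible way to do that with two capture groups and the PREG_SET_ORDER flag. It assumes every href is wrapped in double quotes. The sample HTML and the $pairs variable are made up for the example.

```php
<?php
// Hypothetical sample input.
$incoming_data = '<a href="/home">Home</a> <a class="x" href="https://example.com">Example</a>';

// Group 1 = href value (double-quoted only), group 2 = link text.
$links = '/<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)<\/a>/is';

// PREG_SET_ORDER groups the result per match instead of per capture group.
preg_match_all($links, $incoming_data, $result, PREG_SET_ORDER);

$pairs = [];
foreach ($result as $match) {
    $pairs[] = ['href' => $match[1], 'text' => $match[2]];
}

print_r($pairs);
```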

Filter out all images

$images = '/<img[^>]+>/i';

preg_match_all($images, $incoming_data, $result);
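If you only need the image addresses, a capture group around the src value can help. This is a minimal sketch that assumes the src attribute uses double quotes. The sample HTML is invented for the example.

```php
<?php
// Hypothetical sample input.
$incoming_data = '<img src="a.png" alt="A"><p>text</p><img class="wide" src="b.jpg" />';

// Group 1 captures the value of a double-quoted src attribute.
$images = '/<img\b[^>]*\bsrc="([^"]*)"[^>]*>/i';

preg_match_all($images, $incoming_data, $result);

// $result[1] now holds only the src values.
print_r($result[1]);
```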

Filter out divs

Note that this only works if you don't have divs nested inside each other.

// The match ends at the first closing </div>, so nested divs get cut short
$divs = '/<div\b[^>]*>(.*)<\/div>/isU';

preg_match_all($divs, $incoming_data, $result);
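To see why nesting is a problem, here is a small sketch with one div inside another. The sample HTML is an assumption for the example. The pattern is a lazy div pattern of the kind used in this section.

```php
<?php
// Hypothetical sample input with one div nested inside another.
$incoming_data = '<div>outer <div>inner</div> tail</div>';

$divs = '/<div\b[^>]*>(.*)<\/div>/isU';
$count = preg_match_all($divs, $incoming_data, $result);

// The match stops at the FIRST closing </div>, so " tail" is lost.
echo $result[0][0], "\n"; // <div>outer <div>inner</div>
echo $result[1][0], "\n"; // outer <div>inner
```

Only one match is found, and it ends in the middle of the outer div, which is why a simple pattern like this is only safe on divs without other divs inside.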

Filter out divs by class, for example “myclass”

Just change the word "myclass" to whatever class you are filtering out. This will not work on nested divs: if the div with class "myclass" has more divs nested inside it, the match stops early, because the pattern only matches up to the first closing </div> it finds. The opening tag must also be written exactly as <div class="myclass">, with no other classes or attributes.

$divs = '/<div class="myclass">(.*?)<\/div>/s';

preg_match_all($divs, $incoming_data, $result);

Filter out divs by class containing a hyphen (-), for example "my-class"

Just change "my-class" to whatever class you are filtering out. The hyphen does not need to be escaped here, because it is only special inside a character class. This will not work properly on nested divs: if the div with class "my-class" has more divs nested inside it, the match stops early, because the pattern only matches up to the first closing </div> it finds.

$divs = '/<div class="my-class">(.*?)<\/div>/s';

preg_match_all($divs, $incoming_data, $result);
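If the class name comes from a variable, it may be safer to let PHP escape it for you instead of editing the pattern by hand. The sketch below uses preg_quote() for that. The helper function div_pattern_for_class() is hypothetical and not part of PHP, and the sample HTML is invented.

```php
<?php
// Hypothetical helper (not part of PHP): builds the pattern for any class name.
function div_pattern_for_class(string $class): string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    return '/<div class="' . preg_quote($class, '/') . '">(.*?)<\/div>/s';
}

// Hypothetical sample input.
$incoming_data = '<div class="my-class">Hello</div><div class="other">Skip</div>';

preg_match_all(div_pattern_for_class('my-class'), $incoming_data, $result);

print_r($result[1]);
```

The same limitation applies as before: the match still ends at the first closing </div>.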

Filter out divs by class when the div has multiple classes

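Just change the word "myclass" to the div's first class. This will not work properly on nested divs: if the div with class "myclass" has more divs nested inside it, the match stops early, because the pattern only matches up to the first closing </div> it finds. With this code we can filter out all divs whose tag starts with <div class="myclass, no matter how many classes follow after that. Note that capture group 1 then also contains the rest of the opening tag (the remaining classes and the closing ">), and that a class such as "myclass2" matches too, since it starts with the same text.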

$divs = '/<div class="myclass(.*?)<\/div>/s';

preg_match_all($divs, $incoming_data, $result);
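The pattern above requires the class to come first in the list. As an alternative sketch (a different technique from the one above), you could capture the whole class attribute and then check the class list in PHP. This assumes double-quoted class attributes and divs that are not nested. The sample HTML and the $target and $found variables are made up for the example.

```php
<?php
// Hypothetical sample input: the class appears in different positions.
$incoming_data = '<div class="box myclass">A</div>'
    . '<div class="myclass-wide">B</div>'
    . '<div class="myclass big">C</div>';
$target = 'myclass';

// Group 1 = full class attribute value, group 2 = div content.
$divs = '/<div\b[^>]*\bclass="([^"]*)"[^>]*>(.*?)<\/div>/is';
preg_match_all($divs, $incoming_data, $result, PREG_SET_ORDER);

$found = [];
foreach ($result as $match) {
    // Split the attribute on whitespace to get the individual classes.
    $classes = preg_split('/\s+/', trim($match[1]), -1, PREG_SPLIT_NO_EMPTY);
    if (in_array($target, $classes, true)) {
        $found[] = $match[2];
    }
}

print_r($found);
```

Here "myclass-wide" is skipped because the comparison is done on whole class names, not on the start of the attribute.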

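Filter out a div with multiple divs nested inside it

Filtering out a div by class when it holds nested divs, while still getting all of its content, can be tricky. The pattern below uses PCRE recursion: (?1) re-runs capture group 1 for each inner div, so opening and closing tags stay balanced. Try this code, but change "my-class" to whatever class your div has: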

$divs = '{<div\s+class="my-class"\s*>((?:(?:(?!<div[^>]*>|</div>).)++|<div[^>]*>(?1)</div>)*)</div>}si';

preg_match_all($divs, $incoming_data, $result);
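To check that the recursive pattern really keeps the inner divs, here is a small sketch that runs it on a hand-made snippet. The sample HTML is an assumption for the example.

```php
<?php
// Hypothetical sample input: two levels of divs inside "my-class".
$incoming_data = '<div class="my-class">a<div>b<div>c</div></div>d</div><div>other</div>';

// Same recursive pattern as above; { and } act as the regex delimiters.
$divs = '{<div\s+class="my-class"\s*>((?:(?:(?!<div[^>]*>|</div>).)++|<div[^>]*>(?1)</div>)*)</div>}si';

$count = preg_match_all($divs, $incoming_data, $result);

// Group 1 holds everything inside the outer div, nested divs included.
echo $result[1][0], "\n"; // a<div>b<div>c</div></div>d
```

The second div ("other") is not matched, because it does not carry the class.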

How to echo out / print your preg_match_all() content

To print (echo) your filtered content, follow this complete code example. Change the RegEx to match whatever pattern you want to filter out from the content. If the content does not print properly, wrap everything inside a <pre> </pre> tag.

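// Example HTML; replace it with the HTML code you want to filter.
$incoming_data = '<h1>First heading</h1><p>Some text</p><h1>Second heading</h1>';

$h1 = '/<h1\b[^>]*>(.*)<\/h1>/isU';

// preg_match_all() returns the number of full matches (or false on error).
$count = preg_match_all($h1, $incoming_data, $result);

if ($count > 0) {
    for ($i = 0; $i < $count; $i++) {
        // $result[1] holds the text captured between the h1 tags.
        echo $result[1][$i];
    }
} else {
    echo 'There was a problem loading this information';
}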
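When you print full matches (tags included) in a browser, the tags are rendered instead of shown. One possible approach, sketched below, is to escape the matches with htmlspecialchars() before wrapping them in <pre>. The sample HTML and the $output variable are assumptions for the example.

```php
<?php
// Hypothetical sample input.
$incoming_data = '<h1>First</h1><h1>Second</h1>';

preg_match_all('/<h1[^>]*>.*<\/h1>/isU', $incoming_data, $result);

// Escape < and > so the browser shows the tags as text, one match per line.
$output = '<pre>' . htmlspecialchars(implode("\n", $result[0])) . '</pre>';

echo $output;
```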
