http://codenamu.org/2014/11/13/scrape-webpage-for-10-minutes
○ 10 useful ways to use simplehtmldom…

http://onesunny.cafe24.com/test2

http://codenamu.org/2014/11/13/scrape-webpage-for-10-minutes
▲ A web scraper built with JavaScript

https://sourceforge.net/projects/simplehtmldom/
▲ Download the latest PHP crawler API

PHP connection code I tried for cases where the page cannot be reached even with the latest simplehtmldom (confirmed working)

http://simplehtmldom.sourceforge.net/manual.htm
▲ Link to the simplehtmldom manual

<?php

include('simple_html_dom.php');

   $ch = curl_init();

   $agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/5';
   curl_setopt($ch, CURLOPT_URL, 'https://en.wikipedia.org/wiki/Diabetes_mellitus');
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
   curl_setopt($ch, CURLOPT_TIMEOUT, 10);
   curl_setopt($ch, CURLOPT_HEADER, false);
   curl_setopt($ch, CURLOPT_REFERER, 'https://en.wikipedia.org/wiki/Diabetes_mellitus');
   curl_setopt($ch, CURLOPT_USERAGENT, $agent);
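   $content = curl_exec($ch);
   if ($content === false) {
      // Stop early if the request failed; curl_error() explains why
      die('cURL error: ' . curl_error($ch));
   }
   curl_close($ch);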



   // If $content prints, the fetch succeeded (uncomment to check)

   //echo $content;



   // Parse the fetched HTML with simple_html_dom

   $dom = new simple_html_dom();

   $dom->load($content);

   // Store the plain text of the page heading (the h1 element with id "firstHeading")

   $A_sitebody = $dom->find('h1#firstHeading',0)->plaintext;

   echo $A_sitebody;

?>
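
If simple_html_dom.php cannot be loaded at all, the same heading lookup can be roughly sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath). This is a different technique from the snippet above, not the simplehtmldom API. The helper name extractFirstHeading() and the inline sample markup are assumptions made for illustration; in practice the $content string returned by cURL would be passed in instead.

```php
<?php
// Minimal sketch using PHP's built-in DOM extension instead of simple_html_dom.
// extractFirstHeading() is a hypothetical helper name, not part of any library.

function extractFirstHeading(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid markup; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Same target as the selector h1#firstHeading, written as XPath.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//h1[@id="firstHeading"]');

    if ($nodes === false || $nodes->length === 0) {
        return null; // heading not found
    }

    return trim($nodes->item(0)->textContent);
}

// Inline sample markup standing in for the $content string fetched by cURL.
$sample = '<html><body><h1 id="firstHeading">Diabetes mellitus</h1><p>Text</p></body></html>';
echo extractFirstHeading($sample), PHP_EOL;
```

Returning null when the heading is missing lets the caller handle a changed page layout explicitly instead of reading a property from a missing element.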
