PHP: Bot, Web Crawler, Spider?
I am looking for how to make one of these, but my searches on Google and other search engines have turned up nothing useful. I'm trying to make a bot that can log into a site, store cookies, and move between two pages to keep me logged in while I'm not actually on the page. This is not going to be used for 'cheating' purposes of any kind. I merely want to fake my logged-in time on a site, which is actually my school page.
View 1 Replies (Posted: September 05, 2007, 02:50:28 PM)
Related Forum Messages:
PHP Based Web Crawler Or Java Based Web Crawler?
I have some doubts about PHP-based web crawlers: can one run like a Java thread-based one? I am asking because in Java a thread can be executed again and again, and I don't think PHP has anything like a thread function. Can you guys please say which web crawler would be more useful, a PHP-based or a Java-based one?
Posted: Jul 27 10 at 7:52
View 2 Replies!
Spider?
I programmed a little link database. Now I want to build a script which takes each URL out of the DB and tests whether the site is still available. Any ideas how I could do that?
Posted: September 25th, 1999, 06:49 AM
View 1 Replies!
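For the link checker asked about above, one possible approach is to request only the response headers and look at the status code. This is a rough sketch, not a tested tool: the function names are made up for illustration, the database loop is left out, and it relies on PHP's built-in get_headers(), which needs allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper: pull the numeric code out of a status line such as
// "HTTP/1.1 404 Not Found". Returns null if the line does not parse.
function status_code_from_line(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: true when the server answers with a 2xx or 3xx status.
// Only the first status line is inspected, so a redirect counts as "available".
function url_is_available(string $url): bool
{
    $headers = @get_headers($url);      // false on DNS or connection failure
    if ($headers === false) {
        return false;
    }
    $code = status_code_from_line($headers[0] ?? '');
    return $code !== null && $code >= 200 && $code < 400;
}
```

Each URL read from the database would be passed to url_is_available(), and rows that return false could be flagged for review rather than deleted straight away, since a site can be down temporarily.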
Spider A Url
I'm looking for a simple php script to spider a url and get information on links from that page. Does anyone have any ideas of where to look for such a script?
Posted: September 01, 2007, 05:33:26 AM
View 3 Replies!
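As a starting point for getting link information from a single page, PHP's bundled DOM extension can parse the HTML and list every anchor. A minimal sketch, assuming the page has already been downloaded into a string; extract_links() is a made-up name, not an existing script.

```php
<?php
// Hypothetical helper: return href and anchor text for every <a href> in $html.
function extract_links(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = [
                'href' => $a->getAttribute('href'),
                'text' => trim($a->textContent),
            ];
        }
    }
    return $links;
}

// Usage, assuming the page is reachable and allow_url_fopen is enabled:
// $links = extract_links(file_get_contents('http://example.com/'));
```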
How Do You Know That Something Is NOT A Spider?
I would like spiders to find a particular PHP page but... within the page an email is sent to advise me it's been accessed. I'd like to NOT get the email every time the page is crawled, so it was suggested I list all the bots I know. I thought it would be simpler to send the email when the UA starts with "Mozilla". Is it that simple, or are there other starts to the UA string for browsers?
Posted: 1:51 pm on July 21, 2005
View 1 Replies!
Where To Get 'website Spider' Bot For PHP?
Hi, I have a somewhat weird request, but I have a legit reason. My new CMS is having performance problems, and we are trying to resolve the issue by implementing a bot or spider that grabs the HTML of the site every 30s. I was wondering if you guys could kindly tell me where to get such bots. The program will be installed on another server with cronjob to run every 15s. The 'website performance' check services out there are limited to 2 minute interval. I need something that hits the target site in question more frequently.
Posted: 6:05 pm on Feb. 5, 2005
View 1 Replies!
URL Contains Sessionids, Spider In SE?
I have an article directory, and each submitted article has session IDs in its URL, for example http://www.listbuildersuccess.com/i...m?cat_id=4&id=1 http://www.listbuildersuccess.com/i...m?cat_id=4&id=2 and so on. What can I do so that every article submitted to our directory can be spidered and indexed by search engines?
Posted: April 11th, 2003, 09:15 AM
View 14 Replies!
PHP Spider Trap
I recently set up my own spider trap after reading about it here. I finally got sick of site-suckers driving up my bandwidth to the point I had to upgrade my hosting package twice. So anyway, I don't use Perl much and decided to make a PHP trap. It's working nicely and just wanted to post it up here in case anyone wants to use it. *Notes:
Posted: 1:48 pm on Mar. 7, 2004
View 1 Replies!
Php To Spider A Website
I am looking for a script that I can use to spider a website, and then pull the images... I know how to do it for a single page, but, I would like to be able to do this for the entire site.
Posted: July 17th, 2005 01:25 AM
View 4 Replies!
Simple Spider
I reposted this from "Regex within PHP" because I feel this is a PHP problem, not a regex one. What I am trying to do is start at a pre-defined page, find all the links on that page, run the spider on all of the pages that were found, and return the results from those pages that were spidered. Code:
Posted: October 04, 2007, 01:22:09 PM
View 4 Replies!
Quick Spider
I'd like to be able to quickly spider the referring page from which a specific page on my site is accessed, using PHP. I'd then like to be able to extract the title and description from the referring page, and display that. Is there a script available that would do that, or most of it?
Posted: 8:11 pm on Sep. 23, 2004
View 1 Replies!
PHP Spider Script
I'm just wondering, out of curiosity, how are spider/crawler scripts made? Just the basic setup is all I'm wondering about (and also, how do you get it to "follow" links?). I already searched a couple of times on Google and found some stuff, but some of it just didn't have the info I was looking for.
Posted: 09-07-2006, 02:07 PM
View 2 Replies!
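The basic setup asked about above usually comes down to a queue of URLs still to fetch and a set of URLs already seen. "Following" a link just means putting its href on the queue. Here is a minimal sketch under some loud assumptions: crawl() and its $fetch callback are invented for illustration, links are pulled with a simple regular expression, and hrefs are assumed to be absolute URLs already.

```php
<?php
// Hypothetical crawl loop. $fetch is any callable that takes a URL and
// returns the page's HTML (or null on failure). It is injected so the loop
// can be tried without touching the network.
function crawl(string $start, callable $fetch, int $maxPages = 50): array
{
    $queue   = [$start];          // URLs waiting to be fetched
    $visited = [];                // URL => true once fetched

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;             // already handled; avoids endless loops
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if (!is_string($html) || $html === '') {
            continue;
        }

        // Queue every href found on this page for a later fetch.
        preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
        foreach ($m[1] as $href) {
            if (!isset($visited[$href])) {
                $queue[] = $href;
            }
        }
    }
    return array_keys($visited);
}

// Usage against a real site, assuming allow_url_fopen is enabled:
// $seen = crawl('http://example.com/', function ($u) { return @file_get_contents($u) ?: null; });
```

The $maxPages cap and the visited set are what stop the crawl from running forever; a real spider would also want a delay between requests and a check of robots.txt.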
Detecting A Spider
I have made a visitor counter for a site that adds a record to the database each time a page is viewed. To make it a little more accurate, I was wondering if there is a way to detect if the page is being viewed by a search engine spider instead of a human? That way I could use a condition to not execute the database update if the visitor is a spider... or mark the record "Spider".
Posted: 07-24-2005, 02:42 AM
View 2 Replies!
Server Software Spider?
What I want to be able to do is spider lots of web sites and return the type of server software they use (like Netcraft). I want it to be automated - so it might have to follow links on pages in order to get to other servers..
Posted: September 21st, 2001, 07:34 AM
View 1 Replies!
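Many servers announce their software in the Server response header, so a sketch of the lookup can be quite short. The helper name below is invented; it expects the array that PHP's get_headers() returns when asked for an associative result, and some servers hide or falsify this header.

```php
<?php
// Hypothetical helper: pick the Server header out of the array returned by
// get_headers($url, true). Header names are matched case-insensitively.
function server_software(array $headers): ?string
{
    foreach ($headers as $name => $value) {
        if (is_string($name) && strcasecmp($name, 'Server') === 0) {
            // After redirects the same header can appear several times.
            return is_array($value) ? (string) end($value) : (string) $value;
        }
    }
    return null;
}

// Usage, assuming the host is reachable:
// $headers = get_headers('http://example.com/', true);
// echo $headers === false ? 'unreachable' : (server_software($headers) ?? 'not disclosed');
```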
Spider Friendly URL's
I know I've seen tutorials on this kind of thing before, but now that I'm ready to do it I can't find one. I know it has something to do with editing the conf file....can anyone point me in the right direction? I figured out how to change the title tag dynamically so now search engines would see different tags if they could just spider my pages such as http://321webmaster.com/index.php?cat_id=3&subcat_id=79
Posted: January 2nd, 2002, 04:27 AM
View 14 Replies!
Spider Trap Clarification
In the PHP spider trap solution the following code is added to the .htaccess file: SetEnvIf Request_URI "^(/403.*.htm¦/robots.txt)$" allowsome <Files *> order deny,allow deny from env=getout allow from env=allowsome </Files> What exactly is the string inside the SetEnvIf meant to be doing? It looks to me like "If the user is requesting a file called "403<#*$!>.htm" or "robots.txt" set the env to allowsome. I'm kind of confused because it doesn't look like regular RegEx to me (the grouping and the forward slashes look odd to me). Before I ask a load (more?) of silly questions, am I reading this correctly?
Posted: 4:52 am on July 28, 2005
View 1 Replies!
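The pattern in that SetEnvIf line can be tried outside Apache. The sketch below makes two assumptions: that the "¦" shown in the post stands in for the alternation bar "|" (some forum software displays it that way), and that PHP's preg_match() behaves closely enough to Apache's regex engine for this simple pattern. The helper name is made up.

```php
<?php
// The pattern from the post, with "¦" read as "|" (an assumption) and
// "#" used as the delimiter so the slashes need no escaping.
$pattern = '#^(/403.*.htm|/robots.txt)$#';

// Hypothetical helper mirroring what SetEnvIf tests: does the URI match?
function sets_allowsome(string $uri, string $pattern): bool
{
    return preg_match($pattern, $uri) === 1;
}

foreach (['/403.htm', '/403-banned.htm', '/robots.txt', '/index.php'] as $uri) {
    echo $uri, ' => ', sets_allowsome($uri, $pattern) ? 'allowsome' : 'no match', "\n";
}
```

Read this way, the slashes are ordinary path characters and the parentheses group the two alternatives, so the first three URIs match and /index.php does not. The dots are unescaped, so each one matches any character, not only a literal dot.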
Web Spider/domain Copy
I work for a company which has just switched over to a new web system. The old system is VERY unstable and the database is completely unreadable, yet they want a backup of the old system before they take it offline. I figure that the easiest thing to do would be to launch a spider (or crawler, some people are picky about terminology) that will go through the entire domain and copy the content to flat files. I just need a snapshot of the domain, but I can't find any software to do it in Linux, so I have turned to PHP.
Posted: 09-07-2006, 01:51 PM
View 1 Replies!
I Need A Spider Simulator For My Site
Does anyone know if I can get a spider simulator like the Sim Spider on WebmasterWorld? I would like to offer my visitors something like that, so they could put their URL in the http field and have it output a list of pages found on their site or the site they want to spider.
Posted: 4:37 pm on Mar. 25, 2005
View 1 Replies!
Trying To Make A FTP Spider... With Loop
I've been trying to code an FTP spider to function as a search engine for the FTP servers on our network. The code writes all of the file names, with the FTP server name in front, to a text file. It nearly works, with just one little problem: when I spider a directory that has directories inside it, the spider only writes down the first dir in the list. However, when there are only files inside the directory I want to spider, the script writes them down perfectly. A week or two ago I had this almost working, but because of the end of school and such I had to postpone the coding, and now I'm kinda lost in my own code...

<?php
$ftp = $_GET["ftp"];
$dir = $_GET["dir"];
$ftp_host = $ftp;
$ftp_user = "anonymous";
$ftp_password = "anonymous";
echo "Connecting to $ftp_host via FTP...<BR>";
flush();
ob_flush();
$conn = ftp_connect($ftp_host);
$login = ftp_login($conn, $ftp_user, $ftp_password);
$mode = ftp_pasv($conn, TRUE);
if ((!$conn) || (!$login) || (!$mode)) {
    die("FTP connection has failed !");
}
echo "Login Ok.<BR>";
flush();
ob_flush();
$mode = ftp_pasv($conn, TRUE);

function itemize_dir($contents) {
    foreach ($contents as $file) {
        if(ereg("([-dl][rwxstST-]+).* ([0-9]*) ([a-zA-Z0-9]+).* ([a-zA-Z0-9]+).* ([0-9]*) ([a-zA-Z]+[0-9: ]*[0-9])[ ]+(([0-9]{2}:[0-9]{2})|[0-9]{4}) (.+)", $file, $regs)) {
            $type = (int) strpos("-dl", $regs[1]{0});
            $tmp_array['line'] = $regs[0];
            $tmp_array['type'] = $type;
            $tmp_array['rights'] = $regs[1];
            $tmp_array['number'] = $regs[2];
            $tmp_array['user'] = $regs[3];
            $tmp_array['group'] = $regs[4];
            $tmp_array['size'] = $regs[5];
            $tmp_array['date'] = date("m-d",strtotime($regs[6]));
            $tmp_array['time'] = $regs[7];
            $tmp_array['name'] = $regs[9];
        }
        $dir_list[] = $tmp_array;
    }
    //return $tmp_array;
    return $dir_list;
}

function spider($ftp, $dir, $conn) {
    //$conn = $ftp_host;
    $buff = ftp_rawlist($conn, $dir);
    $items = itemize_dir($buff);
    foreach($items as $line=>$item) {
        $file = $ftp . $dir . $item['name'];
        $fh = fopen('ftp.txt', 'a') or die("can't open file");
        fwrite($fh, $file . "");
        fclose($fh);
        echo $file;
        echo "<BR>";
        //echo $item['name'];
        if(strpos("/", $ftp, $dir . $item['name']) == TRUE) {
            $ydir = "/";
        }
        if($item['type'] == 1) {
            return $dir . $item['name'] . "/";
        }
    }
}

$work = spider($ftp, '/', $conn );
while($var = 0) {
    if( $work != 0 ){
        $works = spider($ftp, $work, $conn );
    } else {
        $var = 1;
    }
}
?>

And I do specify the FTP server in the variable: ?ftp=
Posted: 07-02-2006, 09:24 AM
View 3 Replies!
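Two things in the posted code would explain the symptom: spider() returns as soon as it meets the first directory, and while($var = 0) assigns 0 instead of comparing, so the loop body never runs. One way around both is to recurse into each directory. The sketch below is an illustration, not the poster's code: walk_ftp() and its $list callback are invented names, and it assumes a Unix-style listing where the first character marks a directory with "d" and the name is the ninth field.

```php
<?php
// Hypothetical recursive walk. $list takes a directory path and returns raw
// listing lines, for example: function ($dir) use ($conn) { return ftp_rawlist($conn, $dir); }
function walk_ftp(callable $list, string $dir = '/'): array
{
    $files = [];
    $lines = $list($dir);
    if (!is_array($lines)) {
        return $files;                       // listing failed; skip this directory
    }
    foreach ($lines as $line) {
        $line = trim($line);
        // Split into at most 9 fields so a name containing spaces stays whole.
        $parts = preg_split('/\s+/', $line, 9);
        if (count($parts) < 9) {
            continue;                        // e.g. a "total 12" line
        }
        $name = $parts[8];
        if ($name === '.' || $name === '..') {
            continue;
        }
        if ($line[0] === 'd') {
            // Recurse instead of returning, so every subdirectory is visited.
            $files = array_merge($files, walk_ftp($list, $dir . $name . '/'));
        } else {
            $files[] = $dir . $name;
        }
    }
    return $files;
}
```

The returned paths could then be written to the text file in one pass, with the server name in front. Symbolic links and non-Unix listing formats are not handled here.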
.htaccess Https And A Spider
I need to write a program that runs from a web server, logs into a site that uses .htaccess on a secure server, pass a string to the script on the secure server, spider all the results of the string then input all the results into a MySQL database. Is there a way for php to log into a web site that uses .htaccess? Is there a way to write a spider that will spider the results and populate the results into a MySQL database?
Posted: 11-17-2005, 09:39 PM
View 1 Replies!
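If the ".htaccess login" here is HTTP Basic authentication (an assumption; Digest authentication or a form login would need a different approach), PHP can send the credentials itself in an Authorization header. A minimal sketch with invented helper names; the https wrapper additionally needs the OpenSSL extension.

```php
<?php
// Hypothetical helper: build the header a browser would send after the
// password prompt (HTTP Basic authentication).
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Hypothetical helper: fetch a protected page over HTTP or HTTPS.
// Returns the body as a string, or false on failure.
function fetch_protected(string $url, string $user, string $password)
{
    $context = stream_context_create([
        'http' => ['header' => basic_auth_header($user, $password)],
    ]);
    return file_get_contents($url, false, $context);
}
```

The returned HTML can then be parsed for links and the extracted values inserted into MySQL with whatever database layer the rest of the script already uses.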
Web Spider / Bot - Automatically Scroll
I need it so that you submit your URL, then it automatically scrolls it for the description and keywords and puts them in the database. My HTML code: <form action="submit_url.php" method="get"> <input type="text" name="url" value="url" /> <input type="submit" name="submit" value="submit" /> </form> PHP code so far: <html> <head> [Code].... All it does is read the website; if you put 'echo $url' at the bottom, it just reads and prints the web page.
Posted: November 23, 2010, 03:30:06 PM
View 6 Replies!
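Once the submitted page has been downloaded into a string, the description and keywords normally sit in meta tags in the head. The following is a sketch only, with an invented function name, using PHP's bundled DOM extension; the database insert is left out.

```php
<?php
// Hypothetical helper: pull <title>, meta description and meta keywords
// out of an HTML string. Missing items come back as null.
function extract_meta(string $html): array
{
    $meta = ['title' => null, 'description' => null, 'keywords' => null];
    if ($html === '') {
        return $meta;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $meta['title'] = trim($titles->item(0)->textContent);
    }
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $meta[$name] = trim($tag->getAttribute('content'));
        }
    }
    return $meta;
}
```

PHP also ships get_meta_tags(), which reads the meta tags of a file or URL directly and may be enough when the title is not needed.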
Web Crawler PHP
I am supposed to construct a page that searches in specific websites to extract information, like those sites from where you can rent a car for example. There is a form in the site where the user selects some fields (for instance departure and drop-off date), then the data are submitted to the other page that searches 2-3 sites and finds which cars are available on those dates. I wanted to ask if there are ready scripts to do that, if not, some hints on how to start. I am familiar with PHP forms and data extraction from mysql databases, but when you extract data from other sites, I have no clue how I can begin and deal with it...
Posted: 11-02-2006, 08:05 PM
View 2 Replies!
PHP Web Crawler
I am working on a PHP web crawler and am having trouble parsing links out of a page. All that happens is that array is printed out. Here is the script: <? $f = fopen("http://www.google.com","r"); $inputStream = fread($f,65535); fclose($f); if (preg_match_all("/<a.*? href=\"(.*?)\".*?>(.*?)<\/a>/i",$inputStream,$matches)) { $matches= strip_tags($matches); print_r($matches); } ?> Can someone please help me?
Posted: 09-12-2006, 08:59 PM
View 1 Replies!
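A probable cause of the unexpected output is that strip_tags() expects a string but is handed the whole $matches array. With preg_match_all() in its default mode, $matches[0] holds the full tags, $matches[1] the href values and $matches[2] the anchor contents, so the results can be walked pair by pair. A sketch built around the pattern from the post; the function name is invented.

```php
<?php
// Hypothetical helper using the pattern from the post, with the quotes and
// the closing slash escaped.
function links_from_html(string $html): array
{
    $links = [];
    if (preg_match_all('/<a.*? href="(.*?)".*?>(.*?)<\/a>/i', $html, $matches)) {
        foreach ($matches[1] as $i => $href) {
            // strip_tags() works on strings, so apply it per anchor text.
            $links[] = ['href' => $href, 'text' => strip_tags($matches[2][$i])];
        }
    }
    return $links;
}

// Usage: print_r(links_from_html($inputStream));
```

The pattern only catches double-quoted href attributes, so a DOM parser is the sturdier choice for pages with less regular markup.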
Web Crawler
I have a script that parses out links in a page; now I want to figure out how to follow those links. Here is the script: <? $f = fopen("http://www.theotaku.com","r"); while( $buf = fgets($f,1024) ) { preg_match_all("/<a.*? href=\"(.*?)\".*?>(.*?)<\/a>/i",$buf,$words); for( $i = 0; $words[$i]; $i++ ) { for( $j = 0; $words[$i][$j]; $j++ ) { $cur_word = strtolower($words[$i][$j]); print "Indexing: $cur_word<br>"; } }
Posted: 09-24-2006, 08:55 AM
View 1 Replies!
Spider Site To Check For Updates
I am looking to create a spider, preferably in PHP (if this cannot be done in PHP then in any other language), to check an entire website for updates. I want to have something set up that checks the site and tells me when an update is made. That is mostly what I need, just so I know when and which files are updated. I would also like to be able to scrape the site and compare a page that has changed to one that I define. I have a site that I monitor and a global site that is very slightly different. I can match the pages up and compare them to one another, so that I become aware of a change on the global site and can make the same change on my site.
Posted: 2:03 pm on Aug. 31, 2007
View 1 Replies!
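One simple way to notice updates is to store a hash of each page body and compare it on the next run. The sketch below assumes the pages have already been fetched into an array; detect_changes() and the array shapes are made up for illustration.

```php
<?php
// Hypothetical helper: compare fresh page bodies against stored hashes.
// $known maps URL => hash from the previous run; $pages maps URL => body.
// Returns [list of changed or new URLs, updated hash map].
function detect_changes(array $known, array $pages): array
{
    $changed = [];
    foreach ($pages as $url => $body) {
        $hash = sha1($body);
        if (!isset($known[$url]) || $known[$url] !== $hash) {
            $changed[] = $url;    // new page or different content
        }
        $known[$url] = $hash;
    }
    return [$changed, $known];
}
```

The updated map would be saved between runs, for example in a database table or a serialized file. Pages that embed timestamps, counters or rotating ads will hash differently on every run, so those parts may need to be stripped before hashing.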
Spider That Will Grab Links Off Of A Page
I am writing a simple spider that will grab links off of a page. So far I have no problem grabbing all the links off the first seed page. But I am stuck on how I can get it to follow those links to the next page and grab additional links perpetually. Code:
Posted: March 25, 2007, 03:57:59 AM
View 5 Replies!
Local Spider & Search Recommendations?
I'm looking to have my own (prebuilt and free) spider and search index on my site... I'm long overdue and I'm looking for suggestions!
- PHP / MySQL
- Admin Control Panel
- Ability to configure the spider to crawl at automated times or by manual scan.
- Ability to have results on my own pages instead of some styled-up page that's not mine and doesn't look like my site.
The first thing I came across is PHP Dig, but I don't want to miss any gems if they make this one look like a rock.
Posted: 4:01 am on Sep. 3, 2005
View 1 Replies!
Execute A Spider / Scraper But Without It Timing Out
I need to scrape pages for info at varying intervals, which means calling the bot at those intervals to load a link from the database and scrape the page the link points to. The problem is loading the bot. If I load it with JavaScript (like an Ajax call), the browser will throw up an error saying that the page is taking too long to respond, yadda yadda yadda, plus I will have to keep the page open. If I do it from within PHP, I could probably extend the execution time to however long is needed, but then if it does throw an error I don't have the access to kill the process, and nothing is displayed in the browser until the PHP execution is completed, right? I was wondering if anyone had any tricks to get around this: the scraper executing by itself at various intervals without me needing to watch it the whole time.
Posted: Feb 25 09 at 12:52
View 4 Replies!
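A common way around browser and request time limits is to run the scraper from the command line on a schedule (cron, for instance) instead of through a web page; the PHP command-line binary has no execution time limit by default. Each run then only has to work out which links are due. A sketch under assumptions: the row shape, due_links(), load_rows_from_db() and scrape() are all hypothetical names.

```php
<?php
// Hypothetical row shape: ['url' => ..., 'interval' => seconds, 'last_run' => unix time].
// Returns the rows whose interval has elapsed at time $now.
function due_links(array $rows, int $now): array
{
    return array_values(array_filter($rows, function (array $row) use ($now) {
        return $now - $row['last_run'] >= $row['interval'];
    }));
}

// Sketch of the command-line entry point, e.g. started once a minute by cron:
//   * * * * * php /path/to/scraper.php
// if (PHP_SAPI === 'cli') {
//     foreach (due_links(load_rows_from_db(), time()) as $row) {  // load_rows_from_db() is hypothetical
//         scrape($row['url']);                                    // so is scrape()
//     }
// }
```

Because each run is a separate process, a hung run can be killed from the shell, and errors can be written to a log file for later review.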
Detect A Search Engine Spider?
I am making a site where certain content will be limited to "members only", where membership is free. Now I want the Google bot or whatever bot to be able to see and index this content, but when a user visits it, I want to hide it from them unless they are a member (I already do that). So basically I want to have a function that I can call that will return true or false depending on whether the current page is being requested by a search engine spider. I know it's possible because I regularly see forums doing that; posts are hidden unless you register, but if you look through the Google cached version, the posts are visible. How can I do that? So far all I have is the following, so that the rest of my code works. function is_spider(){ return true; } [code]......
Posted: August 11, 2010, 07:36:21 PM
View 1 Replies!
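One common approach is to look for known crawler names in the User-Agent request header. The sketch below fills in the is_spider() stub that way; the token list is illustrative and far from complete, and the optional parameter is an addition so the function can be tried outside a web request.

```php
<?php
// Hypothetical replacement for the stub: look for known crawler tokens in
// the User-Agent. The token list is an example, not an authoritative one.
function is_spider(?string $userAgent = null): bool
{
    $userAgent = $userAgent ?? ($_SERVER['HTTP_USER_AGENT'] ?? '');
    foreach (['googlebot', 'bingbot', 'slurp', 'baiduspider'] as $token) {
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}
```

The header is supplied by the client and is trivial to forge, so this check cannot protect members-only content from a visitor who pretends to be a crawler. Note also that many crawler User-Agent strings begin with "Mozilla/5.0", just like browsers, which is why the check searches for a token instead of testing the prefix.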
Php Based Crawler
The problem is, I'm trying to make a central portal so that the recent posts from all of my friends' blogs can be seen on it, so that it's easy to see who posted what. The process needs to be that when I add a URL, the crawler then keeps checking that URL. If a new post is made, it has to appear on my central portal with the title and description. So is there a way to do this, or any script out there that is currently doing this?
Posted: 8:34 am on Oct. 11, 2006
View 1 Replies!
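If the blogs publish RSS feeds (an assumption; most blog software does, but the post does not say), the portal can read the feeds instead of crawling the HTML. A minimal sketch for RSS 2.0 using PHP's SimpleXML extension; rss_items() is an invented name and Atom feeds would need different element names.

```php
<?php
// Hypothetical helper: list title, link and description of each <item>
// in an RSS 2.0 document. Returns an empty array if the XML does not parse.
function rss_items(string $xml): array
{
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($feed === false || !isset($feed->channel)) {
        return [];
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title'       => (string) $item->title,
            'link'        => (string) $item->link,
            'description' => (string) $item->description,
        ];
    }
    return $items;
}

// Usage, assuming the feed URL is known and allow_url_fopen is enabled:
// $posts = rss_items(file_get_contents('http://example.com/feed.xml'));
```

Storing the links already shown lets a scheduled run display only the posts that are new since the last check.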
Image Crawler
How do I script an image crawler? I'm developing on Windows with PHP 4. Is it true that we can manipulate images easily using PHP 5 only?
Posted: 03-03-2006, 07:48 AM
View 1 Replies!
Crawler Identifier
I am running a website with specific functions that collect information about users' preferences on that website. But crawlers often come to my site, and my scripts insert records about their visits. Is there a quick and easy way to identify crawlers so I can ignore crawler information?
Posted: 06-26-2006, 02:59 PM
View 3 Replies!
Keep Crawler In One Domain?
I'm writing a simple php crawler, essentially a class which recursively crawls the website by detecting link tags and going deeper. The problem is that I would like to contain it within the domain it's crawling, otherwise it will start following links to other domains and start a never ending chain reaction. My idea was to scrabble a regex that would dissect the bare domain name of the website (eg. domain.com) and check every link against it. The regex itself will have to be quite long, since I will have to include all TLDs in it. UPDATE: parse_url is not a solution - only can give HOST name not DOMAIN name.
Posted: May 21 at 9:02
View 1 Replies!
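Since the crawl starts from a known site, one way to sidestep the TLD list is to pass the bare domain in by hand and accept any host that equals it or ends with a dot plus that domain. A sketch with an invented function name; deriving the registrable domain automatically from an arbitrary host would need something like the Public Suffix List, which this does not attempt.

```php
<?php
// Hypothetical helper: true when $url's host is $baseDomain itself or one
// of its subdomains. $baseDomain (e.g. "domain.com") is supplied by the
// caller, so no TLD list is needed.
function in_domain(string $url, string $baseDomain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;                        // relative or malformed URL
    }
    $host   = strtolower($host);
    $base   = strtolower($baseDomain);
    $suffix = '.' . $base;
    return $host === $base
        || substr($host, -strlen($suffix)) === $suffix;
}
```

Relative links return false here, so they would need to be resolved against the current page's URL before the check.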
Parse A Url For Crawler?
I am writing a small crawler that extracts links from some 5 to 10 sites. While getting the links, I am getting some URLs like this: ../test/index.html. If it is /test/index.html, we can join it with the base URL: http://www.example.com/test/index.html
Posted: Sep 6 10 at 15:18
View 3 Replies!
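Links such as ../test/index.html have to be resolved against the URL of the page they were found on. The sketch below covers the common cases only; resolve_url() is an invented name, and query strings on the base URL, fragments and other edge cases from the URL specification are ignored.

```php
<?php
// Hypothetical helper: turn a link found on the page at $base into an
// absolute URL. Handles absolute URLs, "//host/...", "/root/relative"
// and "../" style paths.
function resolve_url(string $base, string $rel): string
{
    if (parse_url($rel, PHP_URL_SCHEME) !== null) {
        return $rel;                                  // already absolute
    }
    $b = parse_url($base);
    if (substr($rel, 0, 2) === '//') {
        return $b['scheme'] . ':' . $rel;             // protocol-relative
    }
    $root = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if (substr($rel, 0, 1) === '/') {
        $path = $rel;                                 // root-relative
    } else {
        // Relative to the directory part of the base path.
        $dir  = preg_replace('#/[^/]*$#', '/', $b['path'] ?? '/');
        $path = $dir . $rel;
    }
    // Collapse "." and ".." segments, never climbing above the root.
    $out = [];
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    return $root . implode('/', $out);
}
```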