PHP: Bot, Web Crawler, Spider?

I am looking for how to make one of these, but my searches on Google and other search engines have turned up nothing useful; I have yet to produce the results I need. I'm trying to make a bot that is capable of logging into a site, storing cookies, and moving between two pages to keep me logged in while I'm not actually on the page.

This is not going to be used for 'cheating' purposes of any kind. I merely want to fake my logged-in time on a site, which is actually my school page.
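For what it's worth, this kind of bot is usually built with the cURL extension and a cookie jar; the URLs, form field names, and credentials below are placeholders for whatever the real login form uses:

<?php
// Hypothetical URLs and form field names -- adjust to the real site.
$cookieJar = '/tmp/bot_cookies.txt';

function fetchPage($url, $cookieJar, $postFields = null) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);   // write cookies here
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);  // send them back on later requests
    if ($postFields !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// 1. Log in once; the session cookie lands in the jar.
fetchPage('https://school.example.com/login.php', $cookieJar, array(
    'username' => 'me',
    'password' => 'secret',
));

// 2. Alternate between two pages so the session never idles out.
while (true) {
    fetchPage('https://school.example.com/page1.php', $cookieJar);
    sleep(60);
    fetchPage('https://school.example.com/page2.php', $cookieJar);
    sleep(60);
}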


PHP Based Web Crawler Or Java Based Web Crawler?

I have some doubts about PHP-based web crawlers: can they run like the Java thread-based ones? I am asking because in Java a thread can be executed again and again, and I don't think PHP has anything like a thread function. Can you guys please say which web crawler will be more useful, a PHP-based one or a Java-based one?
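PHP has no built-in threads, but for crawling the usual substitute is fetching many URLs at once with curl_multi, which covers most of what crawler threads are used for. A rough sketch, assuming a plain list of URLs:

<?php
// Fetch several pages in parallel -- PHP's stand-in for a thread pool.
$urls = array('http://example.com/a', 'http://example.com/b', 'http://example.com/c');

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Run all transfers until they finish.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

$pages = array();
foreach ($handles as $url => $ch) {
    $pages[$url] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);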

View 2 Replies View Related

Like A Spider Or Bot

I want to get all the email addresses from a site. Is it possible to do that in PHP 3 with MySQL?

View 2 Replies View Related


Test If A Site Is Still Available

I programmed a little link database. Now I want to build a script that takes each URL out of the DB and tests whether the site is still available. Any ideas how I could do that?
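One simple approach is to request each stored URL and look at the HTTP status code; a sketch using cURL, where the URLs in the loop stand in for rows from the database:

<?php
// Returns true if the URL answers with a 2xx or 3xx status code.
function siteIsAlive($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);           // a HEAD request is enough
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code >= 200 && $code < 400;
}

// Example: loop over URLs pulled from the database.
foreach (array('http://example.com', 'http://gone.example.org') as $url) {
    echo $url . ': ' . (siteIsAlive($url) ? 'alive' : 'dead or unreachable') . "\n";
}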

View 1 Replies View Related

How Do You Know That Something Is NOT A Spider?

I would like spiders to find a particular php page but... within the page an email is sent to advise me it's been accessed.

I'd like to NOT get the email every time the page is crawled, so it was suggested I list all the bots I know.

I thought it would be simpler to send the email when the UA starts with "Mozilla". Is it that simple or are there other starts to the UA string for browsers?
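It is not quite that simple: most mainstream browsers do send a user-agent beginning with "Mozilla" (older Opera versions are a notable exception), but some crawlers send "Mozilla"-prefixed strings too, so a belt-and-braces check also looks for obvious bot tokens. A sketch, with a placeholder address:

<?php
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Tokens that commonly appear in crawler user-agent strings (not exhaustive).
$botTokens = array('bot', 'spider', 'crawl', 'slurp');

$looksLikeBrowser = (strpos($ua, 'Mozilla') === 0);
foreach ($botTokens as $token) {
    if (stripos($ua, $token) !== false) {
        $looksLikeBrowser = false;
        break;
    }
}

if ($looksLikeBrowser) {
    mail('me@example.com', 'Page accessed', 'The page was viewed by a probable human.');
}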

View 1 Replies View Related

Spider A Url

I'm looking for a simple php script to spider a url and get information on links from that page. Does anyone have any ideas of where to look for such a script?
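For what it's worth, a few lines with DOMDocument will list the links on a page; a minimal sketch, assuming the target URL is known:

<?php
// List the links found on one page.
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('http://example.com/'));   // @ hides warnings from messy HTML

foreach ($doc->getElementsByTagName('a') as $a) {
    echo $a->getAttribute('href') . ' - ' . trim($a->textContent) . "\n";
}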

View 3 Replies View Related

PHP Spider/Bot Detection

I was just wondering if anyone had any PHP code that could detect a bot/spider crawling your site? Similar to browser detection.
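A common approach is to match the user-agent string against a list of known crawler signatures, much like browser detection; a rough sketch (the list is illustrative, not complete):

<?php
// Rough user-agent based check; lists like this are never complete,
// so treat the result as a hint rather than a guarantee.
function isSpider($userAgent) {
    $signatures = array(
        'googlebot', 'msnbot', 'slurp',       // major search engines of the day
        'bot', 'crawler', 'spider',           // generic catch-alls
    );
    $ua = strtolower($userAgent);
    foreach ($signatures as $sig) {
        if (strpos($ua, $sig) !== false) {
            return true;
        }
    }
    return false;
}

if (isSpider($_SERVER['HTTP_USER_AGENT'])) {
    // e.g. skip the hit counter, or tag the log entry as "Spider"
}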

View 1 Replies View Related

Where To Get A 'Website Spider' Bot For PHP?

Hi, I have a somewhat weird request, but I have a legit reason. My new CMS is having performance problems, and we are trying to resolve the issue by implementing a bot or spider that grabs the HTML of the site every 30s.

I was wondering if you guys could kindly tell me where to get such bots. The program will be installed on another server, with a cron job to run every 15s.

The 'website performance' check services out there are limited to a 2-minute interval. I need something that hits the target site in question more frequently.
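A small PHP script run from cron can do this; since cron's finest granularity is one minute, the sketch below fetches the target several times per run with a sleep in between (the URL and log file are placeholders):

<?php
// warm_cache.php -- run from cron once per minute, e.g.:
// * * * * * /usr/bin/php /path/to/warm_cache.php
$target = 'http://www.example.com/';   // page to keep warm

// Hit the page every 15 seconds within this one-minute cron slot.
for ($i = 0; $i < 4; $i++) {
    $start = microtime(true);
    $html  = @file_get_contents($target);
    $ms    = round((microtime(true) - $start) * 1000);
    file_put_contents('fetch.log',
        date('c') . " {$ms}ms " . strlen((string) $html) . " bytes\n",
        FILE_APPEND);
    sleep(15);
}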

View 1 Replies View Related

URL Contains Session IDs, Spider In SE?

I have an article directory, and each of the articles submitted has session IDs in its URL, for example,

and so on. What can I do so that every article submitted to our article directory can be indexed and spidered by search engines?
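One common fix is to keep PHP from putting the session ID into URLs at all, so the links spiders see are clean; a sketch of the relevant settings (they can also go in php.ini or .htaccess):

<?php
// Keep PHPSESSID out of URLs so spiders index clean links.
ini_set('session.use_trans_sid', '0');     // don't rewrite links with the session ID
ini_set('session.use_only_cookies', '1');  // accept the ID from cookies only

// Optionally skip sessions entirely for obvious crawlers.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
if (strpos($ua, 'bot') === false && strpos($ua, 'spider') === false) {
    session_start();
}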

View 14 Replies View Related

Spider-Friendly URLs

I know I've seen tutorials on this kind of thing before, but now that I'm ready to do it I can't find one. I know it has something to do with editing the conf file... can anyone point me in the right direction?

I figured out how to change the title tag dynamically, so now search engines would see different tags if they could just spider my pages, such as
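If you'd rather not edit the server conf at all, one PHP-only alternative from that era is to carry the parameters in the path after the script name (e.g. /article.php/123/my-title) and read them from PATH_INFO, which spiders treat like a static path. A sketch with made-up parameter meanings:

<?php
// URL looks like /article.php/123/my-article-title
// PATH_INFO holds everything after the script name: "/123/my-article-title"
$pathInfo = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
$parts    = array_values(array_filter(explode('/', $pathInfo), 'strlen'));

$articleId = isset($parts[0]) ? (int) $parts[0] : 0;    // 123
$slug      = isset($parts[1]) ? $parts[1] : '';         // "my-article-title"

// ...look up $articleId and set the <title> dynamically, as you already do...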

View 14 Replies View Related

PHP Spider Trap

I recently set up my own spider trap after reading about it here. I finally got sick of site-suckers driving up my bandwidth to the point I had to upgrade my hosting package twice.

So anyway, I don't use Perl much and decided to make a PHP trap. It's working nicely, and I just wanted to post it up here in case anyone wants to use it.
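The original code isn't reproduced here, but a typical PHP trap along those lines looks roughly like this: a page that robots.txt disallows, so anything requesting it is ignoring the rules, gets logged and blocked. File names are placeholders, and the 'getout' variable follows the .htaccess convention discussed later in this list:

<?php
// trap.php -- linked from a page that robots.txt disallows and hidden from
// humans; anything that requests it has ignored robots.txt.
$ip = $_SERVER['REMOTE_ADDR'];
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Record the offender...
file_put_contents('bad_bots.log', date('c') . " $ip $ua\n", FILE_APPEND);

// ...and append a deny rule to .htaccess (assumed writable) so later
// requests from this address hit the "deny from env=getout" block.
file_put_contents('.htaccess',
    "SetEnvIf Remote_Addr ^" . preg_quote($ip) . "$ getout\n", FILE_APPEND);

header('HTTP/1.0 403 Forbidden');
echo 'Access denied.';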


View 1 Replies View Related

Php To Spider A Website

I am looking for a script that I can use to spider a website and then pull the images... I know how to do it for a single page, but I would like to be able to do this for the entire site.
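For the single-page part, DOMDocument can pull the image URLs; a site-wide version would feed each page's internal links into a queue and repeat (see the crawler sketches further down this list). A rough sketch with deliberately naive relative-URL handling:

<?php
// Collect image URLs from one page.
function imagesOnPage($url) {
    $html = @file_get_contents($url);
    if ($html === false) return array();

    $doc = new DOMDocument();
    @$doc->loadHTML($html);                 // suppress warnings from sloppy markup

    $images = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            // Resolve relative paths naively against the page URL.
            $images[] = parse_url($src, PHP_URL_SCHEME)
                ? $src
                : rtrim(dirname($url), '/') . '/' . ltrim($src, '/');
        }
    }
    return array_unique($images);
}

print_r(imagesOnPage('http://example.com/gallery.html'));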

View 4 Replies View Related

Spider Problem

I have built a site with PHP-Nuke and I have a problem with spiders indexing my site. Google indexes only my first page. Code:

View 1 Replies View Related

Quick Spider

I'd like to be able to quickly spider the referring page from which a specific page on my site is accessed, using PHP. I'd then like to be able to extract the title and description from the referring page and display them. Is there a script available that would do that, or most of it?
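A sketch of one way to do it, using the Referer header the browser sends; note the referrer can be missing or spoofed, and this fetches the referring page on every hit:

<?php
// Fetch the referring page and pull out its <title> and meta description.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if ($referer !== '' && ($html = @file_get_contents($referer)) !== false) {
    $title = '';
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $title = trim($m[1]);
    }
    // get_meta_tags() fetches the page again and parses its <meta name="..."> tags.
    $meta        = @get_meta_tags($referer);
    $description = isset($meta['description']) ? $meta['description'] : '';

    echo '<p>You came from: ' . htmlspecialchars($title) . '</p>';
    echo '<p>' . htmlspecialchars($description) . '</p>';
}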

View 1 Replies View Related

What Are The Limitations Of PHP As A Web Spider?

What are the limitations of PHP as a web spider? Most PHP spiders I have seen can usually index around 100,000 pages. Can a PHP spider be designed to index millions or even billions of pages? And what are the limitations of MySQL for storing that information?

View 2 Replies View Related

PHP Spider Script

I'm just wondering, out of curiosity, how are spider/crawler scripts made? Just the basic setup is all I'm wondering about (and also, how do you get one to "follow" links?). I already searched a couple of times on Google and found some stuff, but some of it just didn't have the info I was looking for.

View 2 Replies View Related

Detecting A Spider

I have made a visitor counter for a site that adds a record to the database each time a page is viewed. To make it a little more accurate, I was wondering if there is a way to detect if the page is being viewed by a search engine spider instead of a human? That way I could use a condition to not execute the database update if the visitor is a spider... or mark the record "Spider".

View 2 Replies View Related

Simple Spider

I reposted this from "Regex within PHP" because I feel this is a PHP problem, not a regex one. What I am trying to do is start at a pre-defined page, find all the links on that page, run the spider on all of the pages that were found, and return the results from those pages that were spidered. Code:

View 4 Replies View Related

Server Software Spider?

What I want to be able to do is spider lots of web sites and return the type of server software they use (like Netcraft). I want it to be automated - so it might have to follow links on pages in order to get to other servers.
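The server software is reported in the Server response header, so the spider only needs the headers, not the page body; a sketch using get_headers() with placeholder sites:

<?php
// Report the web server software for a list of sites, Netcraft-style.
function serverSoftware($url) {
    $headers = @get_headers($url, 1);          // 1 = return as an associative array
    if ($headers === false) return 'unreachable';
    if (isset($headers['Server'])) {
        // Redirects can produce several Server headers; take the last one.
        return is_array($headers['Server']) ? end($headers['Server']) : $headers['Server'];
    }
    return 'not reported';
}

foreach (array('http://www.example.com/', 'http://www.example.org/') as $site) {
    echo $site . ' => ' . serverSoftware($site) . "\n";
}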

View 1 Replies View Related

Spider Trap Clarification

In the PHP spider trap solution the following code is added to the .htaccess file:

SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>

What exactly is the string inside the SetEnvIf meant to be doing?

It looks to me like "If the user is requesting a file called "403<#*$!>.htm" or "robots.txt" set the env to allowsome. I'm kind of confused because it doesn't look like regular RegEx to me (the grouping and the forward slashes look odd to me).

Before I ask a load (more?) of silly questions, am I reading this correctly?

View 1 Replies View Related

Web Spider/domain Copy

I work for a company which has just switched over to a new web system. The old system is VERY unstable and the database is completely unreadable, yet they want a backup of the old system before they take it offline. I figure that the easiest thing to do would be to launch a spider (or crawler, some people are picky about terminology) that will go through the entire domain and copy the content to flat files. I just need a snapshot of the domain, but I can't find any software to do it in Linux, so I have turned to PHP.

View 1 Replies View Related

I Need A Spider Simulator For My Site

Does anyone know if I can get a spider simulator like the Sim Spider on WebmasterWorld. I would like to offer my visitors something like that. So they could put their URL in the http field and have it output a list of pages found on their site or the site they want to spider.

View 1 Replies View Related

Trying To Make A FTP Spider... With Loop

I've been trying to code an FTP spider, to function as a search engine for the FTP servers on our network.

The code writes all of the files, with the FTP server name in front, into text files.
It almost works; just one little problem...

When I want to spider a directory with directories inside of it I have a small problem. The spider only writes down the first dir in the list. However, when there are only files inside of a directory that I want to spider the script writes them down perfectly...

A week or two ago I had gotten this almost working, however because of the end of school and such I had to postpone the coding, now I'm kinda lost in my own code...

$ftp          = $_GET["ftp"];
$dir          = isset($_GET["dir"]) ? $_GET["dir"] : "/";
$ftp_host     = $ftp;
$ftp_user     = "anonymous";
$ftp_password = "anonymous";

echo "Connecting to $ftp_host via FTP...<BR>";
$conn  = ftp_connect($ftp_host);
$login = ftp_login($conn, $ftp_user, $ftp_password);
$mode  = ftp_pasv($conn, TRUE);

if ((!$conn) || (!$login) || (!$mode)) {
    die("FTP connection has failed!");
}
echo "Login Ok.<BR>";

// Turn the ftp_rawlist() output into an array of associative arrays.
function itemize_dir($contents) {
    $dir_list = array();
    foreach ($contents as $file) {
        // preg_match() replaces the deprecated ereg() call used originally.
        if (preg_match('/([-dl][rwxstST-]+).* ([0-9]*) ([a-zA-Z0-9]+).* ([a-zA-Z0-9]+).* ([0-9]*) ([a-zA-Z]+[0-9: ]*[0-9])[ ]+(([0-9]{2}:[0-9]{2})|[0-9]{4}) (.+)/', $file, $regs)) {
            $dir_list[] = array(
                'line'   => $regs[0],
                'type'   => (int) strpos("-dl", $regs[1][0]), // 0 = file, 1 = dir, 2 = link
                'rights' => $regs[1],
                'number' => $regs[2],
                'user'   => $regs[3],
                'group'  => $regs[4],
                'size'   => $regs[5],
                'date'   => date("m-d", strtotime($regs[6])),
                'time'   => $regs[7],
                'name'   => $regs[9],
            );
        }
    }
    // Return the whole listing, not just the first entry.
    return $dir_list;
}

// Walk the FTP tree recursively and append every file path to ftp.txt.
function spider($ftp, $dir, $conn) {
    $buff = ftp_rawlist($conn, $dir);
    if (!is_array($buff)) {
        return;                                // unreadable directory
    }
    $items = itemize_dir($buff);
    $fh    = fopen('ftp.txt', 'a') or die("can't open file");

    foreach ($items as $item) {
        $path = $dir . $item['name'];
        if ($item['type'] == 1) {
            // Directory: recurse into it instead of returning after the first one.
            spider($ftp, $path . "/", $conn);
        } else {
            // Plain file: log it with the server name in front.
            fwrite($fh, $ftp . $path . "\n");
            echo $ftp . $path . "<BR>";
        }
    }
    fclose($fh);
}

spider($ftp, $dir, $conn);
ftp_close($conn);

And I do specify the ftp server in the variable: ?ftp=

View 3 Replies View Related

.htaccess, HTTPS And A Spider

I need to write a program that runs from a web server, logs into a site that uses .htaccess on a secure server, passes a string to the script on the secure server, spiders all the results of the string, then inputs all the results into a MySQL database.

Is there a way for php to log into a web site that uses .htaccess?

Is there a way to write a spider that will spider the results and populate the results into a MySQL database?
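On the first question: .htaccess protection is ordinary HTTP authentication, and cURL can supply the credentials with the request. A sketch, with placeholder URL, query string, and credentials:

<?php
// Fetch a page behind HTTP (.htaccess) authentication over HTTPS.
$ch = curl_init('https://secure.example.com/search.php?q=' . urlencode('my string'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');
$html = curl_exec($ch);
curl_close($ch);

// $html can now be parsed for result links, and the rows inserted into MySQL,
// using the link-extraction and crawling sketches elsewhere in this list.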

View 1 Replies View Related

Web Spider / Bot - Automatically Scan

I need it so you submit your URL, then it automatically scans it for the description and keywords and puts them in the database. My HTML code:

Code: <form action="submit_url.php" method="get">
<input type="text" name="url" value="url" />
<input type="submit" name="submit" value="submit" />
</form>
My PHP code so far:

All it does is read the website; if you put 'echo $url' at the bottom, it just reads and prints the web page.
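A sketch of the missing step: get_meta_tags() will pull the description and keywords, which can then be inserted; the table and column names below are invented for the example:

<?php
$url = isset($_GET['url']) ? $_GET['url'] : '';

// get_meta_tags() fetches the page and returns its <meta name="..."> values.
$meta        = @get_meta_tags($url);
$description = isset($meta['description']) ? $meta['description'] : '';
$keywords    = isset($meta['keywords'])    ? $meta['keywords']    : '';

// Hypothetical table: sites(url, description, keywords).
$db   = new mysqli('localhost', 'user', 'pass', 'directory');
$stmt = $db->prepare('INSERT INTO sites (url, description, keywords) VALUES (?, ?, ?)');
$stmt->bind_param('sss', $url, $description, $keywords);
$stmt->execute();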

View 6 Replies View Related

Spider Site To Check For Updates

I am looking to create a spider, preferably in PHP (if this cannot be done in PHP, then any other language), to check an entire website for updates. I want to be able to have something set up to check a site when an update is made. That is mostly what I need: just to know when and which files are updated.

I would also like to be able to then scrape the site and compare the page that has changed to one that I define. I have a site that I monitor and a global site that is very slightly different. I can match the pages up and compare them to one another, so that when something changes on the global site I can make the change on my site.
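A lightweight way to notice updates is to hash each page and compare against the previous run; a sketch, assuming you already have the list of URLs to watch:

<?php
// Detect changes by hashing each page and comparing against the last run.
// $urls would come from a sitemap or a prior crawl of the site.
$urls  = array('http://example.com/', 'http://example.com/about.html');
$state = file_exists('hashes.json') ? json_decode(file_get_contents('hashes.json'), true) : array();

foreach ($urls as $url) {
    $html = @file_get_contents($url);
    if ($html === false) continue;
    $hash = md5($html);
    if (!isset($state[$url]) || $state[$url] !== $hash) {
        echo "$url has changed\n";   // here you could also diff it against your own page
    }
    $state[$url] = $hash;
}
file_put_contents('hashes.json', json_encode($state));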

View 1 Replies View Related

Local Spider & Search Recommendations?

I'm looking to have my own (prebuilt and free) spider and search index on my site... I'm long overdue and I'm looking for suggestions!

- Admin Control Panel
- Ability to configure the spider to crawl at automated times or to run a manual scan.
- Ability to have results on my own pages instead of some styled-up page that's not mine and doesn't look like my site.

The first thing I came across is PHP Dig but I don't want to miss any gems if they make this one look like a rock.

View 1 Replies View Related

Can Search Engines Spider PHP Pages?

Does anyone know if the search engines can spider PHP pages?

View 6 Replies View Related

Spider To Respect The Robots.txt Protocol?

When you use fopen() to read a web page, how do you parse the text of the page and insert it into a database? How do you follow links? And how do you set up your PHP spider to respect the robots.txt protocol?
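For the robots.txt part, the spider fetches /robots.txt first and skips any path a matching Disallow rule covers. A deliberately small sketch that only understands User-agent and Disallow lines, which is enough to be polite, not a full parser:

<?php
// Is $path allowed for our user-agent according to the site's robots.txt?
function isAllowed($host, $path, $agent = 'mybot') {
    $robots = @file_get_contents("http://$host/robots.txt");
    if ($robots === false) return true;            // no robots.txt: allowed

    $applies = false;
    foreach (preg_split('/\r\n|\r|\n/', $robots) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line));   // strip comments
        if (stripos($line, 'User-agent:') === 0) {
            $ua = trim(substr($line, 11));
            $applies = ($ua === '*' || stripos($agent, $ua) !== false);
        } elseif ($applies && stripos($line, 'Disallow:') === 0) {
            $rule = trim(substr($line, 9));
            if ($rule !== '' && strpos($path, $rule) === 0) return false;
        }
    }
    return true;
}

if (isAllowed('example.com', '/some/page.html')) {
    $html = file_get_contents('http://example.com/some/page.html');
    // ...parse the text and insert it into the database...
}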

View 1 Replies View Related

Execute A Spider / Scraper But Without It Timing Out

I need to scrape pages for info at varying intervals, which means calling the bot at those intervals to load a link from the database and scrape the page the link points to. The problem is loading the bot. If I load it with JavaScript (like an Ajax call), the browser will throw up an error saying that the page is taking too long to respond, yadda yadda yadda, plus I will have to keep the page open.

If I do it from within PHP I could probably extend the execution time to however long is needed, but then if it does throw an error I don't have access to kill the process, and nothing is displayed in the browser until the PHP execution is completed, right? I was wondering if anyone had any tricks to get around this? The scraper executing by itself at various intervals without me needing to watch it the whole time.
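One common trick is to take the browser out of the loop entirely: run the scraper from the command line via cron, where there is no browser timeout, and lift PHP's own limits. A rough sketch (the URL and output path are placeholders, and the output directory is assumed to exist):

<?php
// scrape.php -- run as "php scrape.php" from cron at whatever interval you need.
set_time_limit(0);          // no script time limit (the CLI usually has none anyway)
ignore_user_abort(true);    // keep going even if whatever launched us disconnects

// Load the next due link from the database (placeholder value here),
// scrape it, store the result, and exit; cron handles the scheduling.
$url  = 'http://example.com/page-to-scrape';   // pretend this came from the DB
$html = @file_get_contents($url);
if ($html !== false) {
    file_put_contents('scraped/' . md5($url) . '.html', $html);
}
// A runaway job can still be stopped with normal process control (kill the PID).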

View 4 Replies View Related

Spider That Will Grab Links Off Of A Page

I am writing a simple spider that will grab links off of a page. So far I have no problem grabbing all the links off the first seed page. But I am stuck on how I can get it to follow those links to the next page and grab additional links perpetually. Code:
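A common pattern for the "follow links perpetually" part is a queue plus a visited list: every page fetched contributes new URLs to the queue. A sketch along those lines (the seed URL and the 50-page cap are just for illustration):

<?php
// Breadth-first crawl: keep a queue of pages to visit and a set of pages
// already seen, so links found on each page feed the next round.
$queue   = array('http://example.com/');   // the seed page
$visited = array();
$limit   = 50;                             // safety cap for the example

while (!empty($queue) && count($visited) < $limit) {
    $url = array_shift($queue);
    if (isset($visited[$url])) continue;
    $visited[$url] = true;

    $html = @file_get_contents($url);
    if ($html === false) continue;

    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Keep it simple: follow only absolute http(s) links we haven't seen.
        if (preg_match('#^https?://#i', $href) && !isset($visited[$href])) {
            $queue[] = $href;
        }
    }
    echo "Crawled: $url (queue: " . count($queue) . ")\n";
}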

View 5 Replies View Related
