








Web Crawler PHP


I am supposed to build a page that searches specific websites to extract information, for example sites where you can rent a car. The page has a form where the user fills in some fields (for instance departure and drop-off dates); the data is then submitted to another page that searches 2-3 sites and finds which cars are available on those dates.

I wanted to ask whether there are ready-made scripts for this or, if not, for some hints on how to start. I am familiar with PHP forms and extracting data from MySQL databases, but when it comes to extracting data from other sites, I have no clue how to begin.
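
One possible starting point, sketched under assumptions: submit the same fields the rental site's own form would send, then parse the HTML that comes back. The field names `pickup` and `dropoff` and both helper functions are hypothetical; the real names have to be copied from the target form's markup.

```php
<?php
// Hypothetical helper: build the stream-context options for a POST search.
// The field names are assumptions; copy the real ones from the target form.
function buildSearchContext(string $pickupDate, string $dropoffDate): array
{
    $body = http_build_query([
        'pickup'  => $pickupDate,
        'dropoff' => $dropoffDate,
    ]);

    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $body,
            'timeout' => 10,
        ],
    ];
}

// Hypothetical helper: send the search and return the raw HTML, or false on failure.
function fetchSearchResults(string $url, array $options)
{
    return @file_get_contents($url, false, stream_context_create($options));
}

$options = buildSearchContext('2006-11-10', '2006-11-12');
echo $options['http']['content'], "\n";
```

Calling `fetchSearchResults()` once per rental site and merging the parsed results would give the combined availability list. It relies on `allow_url_fopen` being enabled; cURL is the usual alternative when it is not.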


View 2 Replies (Posted: 11-02-2006, 08:05 PM)


Related Forum Messages:
PHP Based Web Crawler Or Java Based Web Crawler?
I have a doubt about PHP-based web crawlers: can they run like the Java thread-based ones? I am asking because in Java a thread can be executed again and again, and I don't think PHP has anything like a thread function. Can you please say which web crawler would be more useful, a PHP-based one or a Java-based one?

Posted: Jul 27 10 at 7:52

View 2 Replies!   View Related
PHP Web Crawler
I am working on a PHP web crawler and am having trouble parsing links out of a page. All that happens is that "Array" is printed out. Here is the script:

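<?php
$f = fopen("http://www.google.com", "r");
// stream_get_contents() reads the whole response; a single fread() may return only part of it.
$inputStream = stream_get_contents($f);
fclose($f);
// Single quotes and # delimiters avoid escaping the " and / inside the pattern.
if (preg_match_all('#<a.*? href="(.*?)".*?>(.*?)</a>#i', $inputStream, $matches)) {
    // strip_tags() expects a string, so apply it to each captured link text.
    $matches[2] = array_map('strip_tags', $matches[2]);
    print_r($matches);
}
?>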

Can someone please help me?

Posted: 09-12-2006, 08:59 PM

View 1 Replies!   View Related
Web Crawler
I have a script that parses out links in a page, now I want to figure out how to follow those links. Here is the script:

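<?php
$f = fopen("http://www.theotaku.com", "r");
while (($buf = fgets($f, 1024)) !== false) {
    // Single quotes and # delimiters avoid escaping the " and / inside the pattern.
    preg_match_all('#<a.*? href="(.*?)".*?>(.*?)</a>#i', $buf, $words);

    // $words[0] holds full matches, $words[1] the URLs, $words[2] the link text.
    foreach ($words as $group) {
        foreach ($group as $word) {
            $cur_word = strtolower($word);
            print "Indexing: $cur_word<br>";
        }
    }
}
fclose($f);
?>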

Posted: 09-24-2006, 08:55 AM

View 1 Replies!   View Related
Php Based Crawler
The problem is, I'm trying to make a central portal so that all of my friends' recent blog posts can be seen on it, so that it's easy to see who posted what.

The process needs to be that when I add a URL, the crawler keeps checking it. If a new post is made, it has to appear on my central portal with the title and description.

So is there a way to do this, or any script out there that is currently doing this?
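
A minimal sketch of one way to do this, assuming each blog publishes an RSS 2.0 feed (most blog platforms do, but that is an assumption about the friends' blogs). The function names and the sample feed are made up for illustration; in practice the XML would be fetched from each feed URL on a schedule.

```php
<?php
// Return the items of an RSS 2.0 feed as title/link/description arrays.
function parseFeedItems(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel->item)) {
        return [];
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title'       => (string) $item->title,
            'link'        => (string) $item->link,
            'description' => (string) $item->description,
        ];
    }
    return $items;
}

// Keep only the items whose link has not been stored yet.
function newItems(array $items, array $seenLinks): array
{
    return array_values(array_filter($items, function ($item) use ($seenLinks) {
        return !in_array($item['link'], $seenLinks, true);
    }));
}

// Sample feed standing in for a downloaded one.
$sample = <<<XML
<rss version="2.0"><channel><title>Demo blog</title>
<item><title>First</title><link>http://example.com/1</link><description>Old post</description></item>
<item><title>Second</title><link>http://example.com/2</link><description>New post</description></item>
</channel></rss>
XML;

$fresh = newItems(parseFeedItems($sample), ['http://example.com/1']);
print_r($fresh);
```

The list of already seen links would normally come from a database table, and a cron job could run the check every few minutes.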

Posted: 8:34 am on Oct. 11, 2006

View 1 Replies!   View Related
PHP: BOT, Web, Crawler, Spider ?
I am looking for how to make one of these, but my searches on Google and other search engines have failed to produce the results I need. I'm trying to make a bot that is capable of logging into a site, storing cookies and going between two pages, to keep me logged in while I'm not actually on the page.

This is not going to be used for "cheating" purposes of any kind. I merely want to fake my logged-in time on a site which is actually my school page.

Posted: September 05, 2007, 02:50:28 PM

View 1 Replies!   View Related
Image Crawler
How do I script an image crawler? I'm developing on Windows with PHP 4. Is it true that we can manipulate images easily only with PHP 5?

Posted: 03-03-2006, 07:48 AM

View 1 Replies!   View Related
Crawler Identifier
I am running a website with specific functions that collect information about users' preferences on that website. But crawlers often come to my site, and my scripts insert records about their visits. Is there a quick and easy way to identify crawlers so that I can ignore their visits?

Posted: 06-26-2006, 02:59 PM

View 3 Replies!   View Related
Keep Crawler In One Domain?
I'm writing a simple PHP crawler, essentially a class which recursively crawls the website by detecting link tags and going deeper. The problem is that I would like to contain it within the domain it's crawling, otherwise it will start following links to other domains and start a never-ending chain reaction. My idea was to cobble together a regex that would extract the bare domain name of the website (e.g. domain.com) and check every link against it. The regex itself will have to be quite long, since I will have to include all TLDs in it.

UPDATE: parse_url is not a solution - only can give HOST name not DOMAIN name.
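
A sketch of an alternative that avoids the TLD regex, assuming the crawler already knows the bare domain it started from: take the host from `parse_url()` and accept it when it equals that domain or ends in a dot followed by that domain. `isSameSite` is a hypothetical name. Deriving the bare domain from an arbitrary host is a different problem, usually handled with the Public Suffix List rather than a hand-written regex.

```php
<?php
// True when $url's host is $baseDomain itself or one of its subdomains.
function isSameSite(string $url, string $baseDomain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // relative or malformed URL: resolve it first
    }
    $host = strtolower($host);
    $baseDomain = strtolower($baseDomain);

    if ($host === $baseDomain) {
        return true;
    }
    // Suffix match on ".domain.com" so that "notdomain.com" is rejected.
    $suffix = '.' . $baseDomain;
    return substr($host, -strlen($suffix)) === $suffix;
}

var_dump(isSameSite('http://www.domain.com/page.html', 'domain.com'));
var_dump(isSameSite('http://notdomain.com/', 'domain.com'));
```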

Posted: May 21 at 9:02

View 1 Replies!   View Related
Parse A Url For Crawler?
I am writing a small crawler that extracts data from some 5 to 10 sites. While getting the links, I get some URLs like this: ../tets/index.html. If it is /test/index.html, we can join it with the base URL to get http://www.example.com/test/index.html
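
A simplified sketch of resolving such links against the URL of the page they were found on. `resolveUrl` is a hypothetical helper; it assumes an absolute http/https base and ignores ports, user info, protocol-relative links (`//host/path`) and query-only links, so it is not a full RFC 3986 implementation.

```php
<?php
// Resolve $relative against $base (an absolute http/https URL).
function resolveUrl(string $base, string $relative): string
{
    // Already absolute: nothing to do.
    if (parse_url($relative, PHP_URL_SCHEME) !== null) {
        return $relative;
    }
    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];

    if (strpos($relative, '/') === 0) {
        $path = $relative;                       // root-relative link
    } else {
        $basePath = $parts['path'] ?? '/';
        // Drop the file name of the base page, keep its directory.
        $dir  = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path = $dir . $relative;
    }

    // Collapse "." and ".." segments.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }
    $trailing = ($out && substr($path, -1) === '/') ? '/' : '';
    return $root . '/' . implode('/', $out) . $trailing;
}

echo resolveUrl('http://www.example.com/a/b/page.html', '../test/index.html'), "\n";
echo resolveUrl('http://www.example.com/', '/test/index.html'), "\n";
```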

Posted: Sep 6 10 at 15:18

View 3 Replies!   View Related
What Is The Working Of Web Crawler?
Does a web crawler crawl the web and create a database of it, or does it just create a searchable index of the web? If it only creates an index, what exactly gathers the data from web pages and stores it in a database?

Posted: Aug 17 10 at 2:45

View 1 Replies!   View Related
Checking If Referrer Is Web Crawler
I have a book affiliate website. Whenever a visitor clicks on one
of the books, a script adds one to a field in a mysql database and then
takes the visitor to the shopping basket on the book website.

I have noticed that the book links are getting lots of hits. At first, I
was pleased about the potential income this might mean - but then it
occurred to me that many of these hits are web crawlers (this was
confirmed by Webalizer).

Any suggestions of ways of checking if the link is being "clicked" by a
webcrawler so that I can not increment the field in the sql database?

I've checked HTTP_REFERER but it seems to be empty for what I assume
are crawled clicks.
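
One common heuristic, sketched here as an assumption rather than a complete answer, is to look at the User-Agent header instead of the referrer: well-behaved crawlers usually name themselves there. The keyword list and the function name are illustrative only, and a crawler that fakes a browser User-Agent will still get through.

```php
<?php
// Heuristic: treat a request as a crawler when its User-Agent is empty
// or contains a common bot keyword. The keyword list is an assumption.
function looksLikeCrawler(?string $userAgent): bool
{
    if ($userAgent === null || trim($userAgent) === '') {
        return true;
    }
    return preg_match('/bot|crawl|spider|slurp/i', $userAgent) === 1;
}

$ua = $_SERVER['HTTP_USER_AGENT'] ?? null;
if (!looksLikeCrawler($ua)) {
    // Increment the click counter here.
}
```

Sending the click through a form POST instead of a plain link is another option, since most crawlers only follow GET links.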

Posted: June 28th, 2006 11:45 AM

View 4 Replies!   View Related
Visiting Other Sites (crawler)
Is it possible to visit other sites from PHP code? I need this for a crawler for my search engine.

Posted: December 16th, 2001, 07:53 AM

View 2 Replies!   View Related
Website Crawler And Indexer
I am trying to create a crawler and indexer for my site and its search page. What I want to know is: is there an easy way to extract each link from a page, or is it possible to do this with a PHP function? I am doing it this way because I am going to have a crawler that logs all the links within my site, and then an indexer will go along and index each page and its contents.
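
A short sketch of link extraction with PHP's bundled DOM extension, which tends to cope with messy markup better than a regex. `extractLinks` is a hypothetical helper name; the HTML string stands in for a downloaded page.

```php
<?php
// Return every non-empty href found in the <a> tags of an HTML string.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

print_r(extractLinks('<p><a href="/about.html">About</a> <a href="http://example.com/">Ex</a></p>'));
```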

Posted: 05-11-2007, 07:01 PM

View 3 Replies!   View Related
Make A Simple Crawler?
I have a web page with a bunch of links. I want to write a script which would dump all the data contained in those links in a local file.

Has anybody done that with PHP? General guidelines and gotchas would suffice as an answer.

Posted: Feb 22 10 at 18:23

View 6 Replies!   View Related
Web Crawler For Competitive Pricing
I am thinking of writing an application that will pseudo-track competing websites to ensure that our prices stay competitive, etc. I looked at possibly using the Google Shopping Search API, but I felt that it could be lacking in flexibility, and not all of our competitors are fully listed or updated regularly. My question is: where is a good place to start with a PHP-based web crawler? I obviously want a crawler that is respectful (even to our competitors), so it should obey robots.txt and throttle itself. (To be fair, I think I am even going to host this on a third-party server and have it crawl our own websites too, to show no bias.) I looked around via Google and couldn't find any mature packages, only some poorly written SourceForge scripts that haven't been maintained in over a year, despite being labeled as beta or alpha.

Posted: Jan 18 at 19:06

View 1 Replies!   View Related
Get Data From Crawler To Site
What is the best way to get data from an external crawler into the database of my site? I work in a LAMP environment. Are web services a good idea? The crawler runs every 15 minutes.

Posted: Jun 15 09 at 8:22

View 1 Replies!   View Related
Build A Web Crawler Such As MLBot?
I am looking to build a web crawler such as MLBot. It must recognise robots.txt and the ROBOTS meta tag. That said, when a site such as WordPress shows visitor stats, it lists the crawler by name (e.g. MLBot [URL].... So how can I build a crawler that will be listed as HackAliveBot or HackAliveCrawler and will recognise robots.txt and the ROBOTS meta tag?
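
A simplified sketch of the robots.txt half, under stated assumptions: it only reads `User-agent` and `Disallow` lines, merges the `*` group with any group naming the bot, and ignores `Allow`, wildcards and `Crawl-delay`. Both function names are hypothetical. The name a site's stats show is whatever the crawler sends in its User-Agent header.

```php
<?php
// Collect the Disallow prefixes that apply to $botName (or to "*").
function disallowedPaths(string $robotsTxt, string $botName): array
{
    $rules = [];
    $applies = false;
    $lastWasAgent = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if (strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            $matches = $value === '*'
                || ($value !== '' && stripos($botName, $value) !== false);
            // Consecutive User-agent lines belong to the same group.
            $applies = $lastWasAgent ? ($applies || $matches) : $matches;
            $lastWasAgent = true;
        } else {
            if ($field === 'disallow' && $applies && $value !== '') {
                $rules[] = $value;
            }
            $lastWasAgent = false;
        }
    }
    return $rules;
}

// A path is allowed when no Disallow prefix matches its start.
function isPathAllowed(string $path, array $rules): bool
{
    foreach ($rules as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return false;
        }
    }
    return true;
}

$robots = "User-agent: *\nDisallow: /private/\n\nUser-agent: OtherBot\nDisallow: /\n";
$rules  = disallowedPaths($robots, 'HackAliveBot');
var_dump(isPathAllowed('/private/a.html', $rules));
var_dump(isPathAllowed('/public/index.html', $rules));
```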

Posted: June 08, 2010, 03:32:39 AM

View 1 Replies!   View Related
Create A Web Crawler In Python?
I created a web crawler in Python that gets data from specific parts of websites and stores it in a MySQL database, which is later displayed on my website. However, when I display the data on my website, it appears with weird characters like this: "After many years of theft, there�s still more to steal" and "Here�s how to reclaim forests" (notice the question mark in the triangle). When I use the function mb_detect_encoding, it tells me the data is ASCII, yet the default collation is latin1_swedish_ci; but when I save the data in the database I override the default and use UTF-8 instead. Please tell me what could be wrong.

Posted: Jul 29th, 2009

View 1 Replies!   View Related
Teaching A Crawler To Identify A Blog
I am currently trying to teach a web crawler how to identify blogs,
that is I am trying to determine a fairly inclusive set of criteria
that will help my crawler to identify them.

I have noticed that many Blogs include

div class=blogsomething (A format class conveniently named blog)

xml tags

and/or php code.

I do know that cms(content management system) is used for several
blogs, does anyone else have any suggestions to help me determine
criteria.

I am aware that any criteria is subjective, especially when
considering sites such as slashdot which has been around longer than
Blogs...

Posted: July 17th, 2005 09:42 AM

View 2 Replies!   View Related
Crawler For Ajax Based Websites?
Maybe this is gonna sound naive and all, but is there something even remotely close to a php crawler for ajax based websites?

Posted: May 20 at 10:50

View 2 Replies!   View Related
Creating A Simple Site Crawler
Basically, for my final year project I am making a webcomic site, and I want a feature that tells you when comics hosted on external servers (such as explosm, penny arcade etc) are updated.

I know this can be done, it's done on [URL]. I've tried looking into the technology needed, and as far as I can see, I need CURL.

I was just wondering if anyone could maybe point me in the right direction to a tutorial, pseudo code, or if this is possible to implement (at least for a few major comics) by a week on wednesday.

and if any one is interested, my site is [URL], still heavily under construction

Posted: Apr 21, 2009, 02:44

View 3 Replies!   View Related
Simple Crawler To Echo Links?
I wanted to make a simple crawler in PHP that would let me get the links in a web page, echo their URLs, and crawl to other pages to do the same under a certain domain. Would using cURL be necessary here? Also, how would one specify the depth of the crawler? I have this so far:

$dom = new DOMDocument;
$dom->loadHTML($html);
foreach( $dom->getElementsByTagName('a') as $node ) {
[code]....
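
On the depth question, one possible approach is a breadth-first queue that stores each URL together with the depth at which it was found. In this sketch the page fetching is passed in as a callable (a made-up parameter name, `$fetchLinks`), so the traversal can be shown without network access. cURL is not strictly required: `file_get_contents()` works for simple GET requests when `allow_url_fopen` is enabled.

```php
<?php
// Breadth-first crawl up to $maxDepth links away from $startUrl.
// $fetchLinks is any callable that returns the absolute links of a page.
function crawl(string $startUrl, int $maxDepth, callable $fetchLinks): array
{
    $visited = [$startUrl => 0];       // url => depth at which it was found
    $queue   = [[$startUrl, 0]];

    while ($queue) {
        [$url, $depth] = array_shift($queue);
        if ($depth >= $maxDepth) {
            continue;                  // do not expand pages at the limit
        }
        foreach ($fetchLinks($url) as $link) {
            if (!isset($visited[$link])) {
                $visited[$link] = $depth + 1;
                $queue[] = [$link, $depth + 1];
            }
        }
    }
    return $visited;
}

// Stand-in for real fetching: a tiny hard-coded link graph.
$site = [
    'http://example.com/'  => ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a' => ['http://example.com/c'],
    'http://example.com/c' => ['http://example.com/d'],
];
$result = crawl('http://example.com/', 2, function ($url) use ($site) {
    return $site[$url] ?? [];
});
print_r($result);
```

The `$visited` map also prevents loops when pages link back to each other.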

Posted: Jul 6 at 1:21

View 1 Replies!   View Related
Mp3 Link Crawler For Dynamic Links?
I am writing a crawler that will go around a specific set of websites and crawl all the mp3 links into the database. I don't want to download the files, just crawl the links, index them and be able to search them, using PHP, like some sites do [URL]....

Posted: Mar 18 10 at 6:54

View 1 Replies!   View Related
Crawler Script Suddenly End With No Error?
I have written a web crawler script. It will visit a large number of URL's with cURL.

After around 2-3 minutes of running, it will just stop, with no error output or notices.

I have these settings:
set_time_limit(0);
ini_set('display_errors',1);
error_reporting(E_ALL|E_STRICT);

Posted: October 24, 2009, 06:39:49 PM

View 6 Replies!   View Related
Crawler Index/crawl Session?
I am new to PHP and want to know: if I store data in a PHP session on a page, will crawlers crawl the session data? Will the crawler still crawl the rest of the page?

Posted: 03-11-2011, 01:51 AM

View 1 Replies!   View Related
Site Crawler Died While It's Running?
I wrote a site crawler that collects links and images to create a sitemap, but it gets killed while running! This is not my whole class:

class pageCrawler {
.......
private $links = array();
public function __construct ( $url ) {
ignore_user_abort ( true );
set_time_limit ( 0 );
register_shutdown_function ( array ( $this, 'callRegisteredShutdown' ) );
$this->host = $urlParts [ 'host' ];

[Code]...

That is the general shape of my class. It works, but then it suddenly crashes. I set set_time_limit(0) so it can run forever, but my process doesn't finish, and my shutdown function doesn't execute!

Posted: Jun 27 at 12:53

View 1 Replies!   View Related
Searching Flash Crawler Script?
I am searching for a Flash crawler script in PHP, but I have not found one in the last two days.

Posted: July 16th, 2009, 06:51 AM

View 1 Replies!   View Related
Showing Crawler Agent Display Name In Forums End. Etc. ?
My crawler works well, but its agent display name does not show up in forums etc. It is only listed as "Unnamed Spider", "Unknown Spider", "Unknown Crawler" and so on. Code:


Posted: 10-06-2006, 10:35 AM

View 2 Replies!   View Related
Write A Web Crawler For Specific User Agent?
I need to write a web crawler, and want to be able to crawl using a known user agent. For example, I want my crawler to act as an iphone to crawl the mobile site of a website, then crawl again using Mozilla PC agent, etc.

That way, I'll be able to crawl every "type" of site (mobile & PC). However, I also want to be able to set my crawler's user agent, so webmasters also see in their stats that it's a crawler that visited their whole website, not real users.

So my question is, do you guys know how to set a mobile agent + a crawler agent at the same time, in PHP?
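
One convention that may fit, sketched as an assumption: keep the device's User-Agent string and append a `(compatible; BotName; +URL)` token, the way several search engine crawlers identify themselves. The bot name, info URL and shortened iPhone string below are illustrative, and whether a given site still serves its mobile version depends on how it sniffs the header.

```php
<?php
// Append a crawler token to a device User-Agent so sites can serve the
// device-specific page while logs still show an identifiable bot.
function crawlerUserAgent(string $deviceAgent, string $botName, string $infoUrl): string
{
    return sprintf('%s (compatible; %s; +%s)', $deviceAgent, $botName, $infoUrl);
}

// Shortened, illustrative device string; use a full real one in practice.
$iphone = 'Mozilla/5.0 (iPhone; CPU iPhone OS 4_0 like Mac OS X)';
$agent  = crawlerUserAgent($iphone, 'ExampleCrawler/1.0', 'http://example.com/bot.html');

// Pass the header through a stream context; the fetch itself is left commented out.
$context = stream_context_create(['http' => ['user_agent' => $agent]]);
// $html = file_get_contents('http://example.com/', false, $context);

echo $agent, "\n";
```

With cURL the same string would go into the `CURLOPT_USERAGENT` option.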

Posted: May 14 at 14:39

View 3 Replies!   View Related
Crawler Coding: Determine If Pages Have Been Crawled?
I am working on a crawler in PHP that expects m URLs, at each of which it finds a set of n links to n pages (internal pages) which are crawled for data. Links may be added to or removed from the set of n links. I need to keep track of the links/pages so that I know which have been crawled, which ones have been removed and which ones are new. How should I go about keeping track of which m and n pages are crawled, so that the next crawl fetches new URLs, re-checks still existing URLs and ignores obsolete URLs?
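
A minimal sketch of the bookkeeping, assuming the links stored from the previous crawl and the links found in the current one are both available as plain arrays (in practice the first would come from a database table). `diffLinks` is a hypothetical name.

```php
<?php
// Compare the stored links with the links found in this crawl.
function diffLinks(array $known, array $found): array
{
    return [
        'new'      => array_values(array_diff($found, $known)),      // fetch for the first time
        'existing' => array_values(array_intersect($found, $known)), // re-check
        'obsolete' => array_values(array_diff($known, $found)),      // ignore or flag as removed
    ];
}

$known = ['/a', '/b', '/c'];
$found = ['/b', '/c', '/d'];
print_r(diffLinks($known, $found));
```

Storing a "last seen" timestamp per URL would give the same information while keeping history.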

Posted: Aug 27 10 at 23:46

View 1 Replies!   View Related
Crawler - Rebuild Safari Web Clip Functionality?
Is there a way to rebuild Mac OS X Snow Leopard's Dashboard widget 'Web Clip' on a PHP website? Something like a crawler or scraper. I thought about using file_get_contents to get the page content into the page, but how do I select a section of the external page? And does this work with session/login content as well?
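
Selecting a section could be sketched with DOMXPath once the page has been downloaded. `clipSection` is a hypothetical helper, and the XPath expression depends entirely on the markup of the external page. Content behind a login would only be present if the request itself carried the right session cookie.

```php
<?php
// Return the HTML of the first node matching $xpath, or null if none.
function clipSection(string $html, string $xpath): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // saveHTML() with a node argument serialises just that subtree.
    return $dom->saveHTML($nodes->item(0));
}

// Inline sample standing in for the fetched page.
$page = '<html><body><div id="news"><p>Headline</p></div><div id="ads">x</div></body></html>';
echo clipSection($page, '//div[@id="news"]'), "\n";
```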

Posted: Apr 6 10 at 23:47

View 1 Replies!   View Related
Hyperlink - Web Crawler Links/page Logic?
I'm writing a basic crawler that simply caches pages with PHP. All it does is use file_get_contents to get the contents of a webpage and a regex to get all the links out of <a href="URL">DESCRIPTION</a>. At the moment it returns:

Array {
[url] => URL
[desc] => DESCRIPTION
}

The problem I'm having is figuring out the logic for determining whether a page link is local, or working out whether it points to a completely different local directory. It could be any number of combinations, e.g. href="../folder/folder2/blah/page.html" or href="google.com" or href="page.html"; the possibilities are endless.

Posted: Dec 11 08 at 22:45

View 3 Replies!   View Related
Web Crawler - Find External Links And Get Data?
Possible Duplicate: Finding and Printing all Links within a DIV. I'm trying to make a mini crawler. When I specify a site, it calls file_get_contents() and then gets the data I want, which I've already done. Now I want to add code that enables it to find any external links on the site it is on and get the data from them. Basically, instead of me specifying a site, it just follows external links and gets the data if available. Here is what I have:

<?php
$link = strip_tags($_GET['s']);
[code]....

Posted: Aug 15 10 at 6:36

View 2 Replies!   View Related
Make A Crawler To Fetch Particular Web Page's Content?
I am trying to make a crawler that crawls a web page and retrieves stock information from Google, but I can't get it to work.

Posted: Jan 3rd, 2008

View 5 Replies!   View Related
Crawler Mandatory Agecheck Page In Drupal?
We have a big community website built in Drupal. The site has a mandatory agecheck before you can access its content: it checks for a cookie, and if the cookie is not present you get redirected to the agecheck page. We believe crawlers get stuck on this part: they get redirected to the agecheck and never get to crawl the full website. What would be the best way to deal with something like this?

Another issue with crawlers is that when someone in the community posts something to their Facebook wall, Facebook crawls the page to fetch images and the description (which are specified in meta tags), but Facebook also gets redirected to the agecheck page. Would a user-agent check work if I add the Facebook crawler? If so, would anyone know the Facebook crawler's exact name? The solution below is one that we also came across on the net. If adding the Facebook crawler to that list works, it would solve all the problems we are having with this agecheck page.

Posted: Aug 19 09 at 10:18

View 2 Replies!   View Related
Web Crawler - Search Certain Sites (remote) For Certain Type Of Files
I want to search certain (remote) sites for certain types of files. I don't know where to start.

Posted: July 11, 2007, 03:54:16 PM

View 4 Replies!   View Related
Web Crawler - Doesn't Update Page Until Finishing Loading
I want to write a crawler script in PHP, and it needs to show live which pages it is indexing. However, PHP doesn't update the page in real time: sometimes it writes a few echos together and waits until loading finishes, and sometimes nothing appears on the page until loading finishes. Here is an example of what I'm talking about:

<?php
echo '1<br>';
sleep(2);
echo '2<br>';
sleep(2);
echo '3<br>';
sleep(2);
echo '4<br>';
?>

I tried on wamp and lamp and results were same. is there any way to show echos real time?

note: I found an online crawler which has this feature: [URL]
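
A sketch of the usual approach: flush PHP's output buffer and then the server buffer after every echo. `streamLines` is a made-up helper. Even with this, compression or proxy buffering in the web server, and some browsers, may hold output back until a certain amount of data has arrived, so the result depends on the setup.

```php
<?php
// Echo each line and push it towards the browser straight away.
// Returns the number of lines sent.
function streamLines(array $lines, int $pauseSeconds = 2): int
{
    $sent = 0;
    foreach ($lines as $line) {
        echo $line, "\n";
        if (ob_get_level() > 0) {
            ob_flush();        // empty PHP's own output buffer first
        }
        flush();               // then ask the SAPI / web server to send it
        $sent++;
        sleep($pauseSeconds);  // stands in for slow crawling work
    }
    return $sent;
}

$count = streamLines(['1<br>', '2<br>'], 1);
```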

Posted: May 7 at 13:56

View 2 Replies!   View Related
Redirection Affects Way Crawler Or Robot Views Website?
for example if in my index.php i have something like:

<?php
header('Location: /mypublicsite/index.php');
?>

what do the crawlers and/or robots get? just a blank page? or they actually arrive to /mypublicsite/index.php?

Posted: Aug 25 10 at 9:47

View 4 Replies!   View Related
Crawler - Generate A List Of All The Pages Contained In A Website Programmatically?
How is it possible to generate a list of all the pages of a given website programmatically using PHP?

What I'm basically trying to achieve is to generate something like a sitemap: a nested unordered list with links to all the pages contained in a website.
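
Assuming the crawl has already produced a flat list of URL paths, the nested list could be built in two steps, as in this sketch. The function names are hypothetical, and directory entries such as `/blog` are linked even though they may not exist as pages.

```php
<?php
// Turn a flat list of URL paths into a nested array keyed by path segment.
function buildTree(array $paths): array
{
    $tree = [];
    foreach ($paths as $path) {
        $node = &$tree;
        foreach (array_filter(explode('/', $path), 'strlen') as $segment) {
            if (!isset($node[$segment])) {
                $node[$segment] = [];
            }
            $node = &$node[$segment];
        }
        unset($node); // break the reference before the next path
    }
    return $tree;
}

// Render the tree as nested <ul> lists with one link per entry.
function renderTree(array $tree, string $prefix = ''): string
{
    if (!$tree) {
        return '';
    }
    $html = '<ul>';
    foreach ($tree as $segment => $children) {
        $url = $prefix . '/' . $segment;
        $html .= '<li><a href="' . htmlspecialchars($url) . '">'
               . htmlspecialchars((string) $segment) . '</a>'
               . renderTree($children, $url) . '</li>';
    }
    return $html . '</ul>';
}

echo renderTree(buildTree(['/about.html', '/blog/post-1.html', '/blog/post-2.html'])), "\n";
```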

Posted: Jan 28 10 at 1:50

View 2 Replies!   View Related
Copyright © 2005-08 www.BigResource.com, All rights reserved