Old 03-10-2010, 08:31 PM   #1
Baseband Member
Join Date: Feb 2010
Posts: 22
Default compiling data from databases...

Aloha, I am trying to make a small web page where I can enter a list of URLs and have it pull certain data from those pages into a single list.

In more detail...

I am an avid reader of fanfiction and have a list of about 500 books that I have read or need to read. I would like to take that list of URLs, have something grab the summary information for each of those stories, and compile a list for me.

I am not asking anyone to do this for me but, since I only know HTML, it would be great if someone were willing to point me in the right direction and give me some helpful information. It would help if I knew which "language" would be the easiest to do this in. It would be really great if someone could give me an idea of the process the code would need to follow, and it would be nice to have a few key terms to research so I can get this done without having to learn ALL the functions of said "language".

I want to take links like this...
And have it print out a list with things like story name, author, word count, rating, etc. Here is an example of what it would look like...
Any help and ideas would be great...

Old 03-15-2010, 12:34 PM   #2
Baseband Member
Join Date: Feb 2010
Posts: 22
Default Re: compiling data from databases...

Thanks for the tips on things to research, people. As it turns out, iMacros seems like it might work best.

Old 04-21-2010, 09:11 PM   #3
In Runtime
Join Date: Dec 2005
Posts: 407
Default Re: compiling data from databases...

You have awoken me from my posting slumber, so thanks.

What you are asking about could be extremely complicated depending on how robust you want the solution to be.

The core functionality you are looking for is referred to as "Screen-scraping". There is a right way and a wrong way to do this (the wrong way will get the IP address you run it from blocked in some cases).
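To make the "right way" a bit more concrete, here is a minimal Python sketch of two common courtesy measures: checking the site's robots.txt rules and pausing between requests. The agent name `reading-list-bot`, the two-second default delay, and the helper names are assumptions made up for this illustration, not anything a particular site requires.

```python
import time
import urllib.robotparser

USER_AGENT = "reading-list-bot"  # hypothetical agent name for this sketch


def allowed_by_robots(robots_txt, url, agent=USER_AGENT):
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def polite_schedule(urls, delay_seconds=2.0, sleep=time.sleep):
    """Yield URLs one at a time, pausing before every URL after the first."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)
        yield url
```

The `sleep` parameter only exists so the pause can be swapped out when testing; in normal use you would loop over `polite_schedule(urls)` and fetch each URL that `allowed_by_robots` accepts.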

Essentially what you are asking about is a rudimentary web crawler.

Here is my advice to you:

1.) Don't make the script that searches the web-pages a web-page (if that makes sense).
2.) Instead, make this part of your "project" a program (executable) that runs on a computer, goes out to those websites, grabs the info and updates a database (Java is free and there are many powerful IDEs that will help you get started quickly).
Other possible alternatives include: C++, .NET (IDE is not free), PHP and Ruby. There are others, but these are the languages I would recommend, not necessarily in any order.
3.) Once you have the database populated you SHOULD use a web-interface to display the findings that have been placed in the database.
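
The steps above can be sketched roughly as follows. This is only an illustration in Python (using nothing beyond its standard library), not a finished tool. The class names `title`, `author` and `words` are hypothetical; a real site will use its own markup, so the parser would need to be adapted to it.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser

FIELDS = ("title", "author", "words")  # hypothetical class names on the page


class StoryParser(HTMLParser):
    """Collect the text of elements whose class attribute names a wanted field."""

    def __init__(self):
        super().__init__()
        self.story = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in FIELDS:
            self._current = css_class

    def handle_data(self, data):
        if self._current:
            self.story[self._current] = self.story.get(self._current, "") + data.strip()

    def handle_endtag(self, tag):
        self._current = None


def fetch(url):
    """Download one page as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_story(html):
    """Extract the wanted fields from one page of HTML."""
    parser = StoryParser()
    parser.feed(html)
    parser.close()
    return parser.story


def save_story(conn, url, story):
    """Insert or update one story row, keyed by its URL."""
    words = story.get("words")
    word_count = int(words.replace(",", "")) if words else None
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stories "
        "(url TEXT PRIMARY KEY, title TEXT, author TEXT, words INTEGER)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO stories VALUES (?, ?, ?, ?)",
        (url, story.get("title"), story.get("author"), word_count),
    )
    conn.commit()
```

A run would look like `save_story(conn, url, parse_story(fetch(url)))` for each URL in the list, with `conn = sqlite3.connect("stories.db")`. The web interface from step 3 then only has to read the `stories` table.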

As for HOW to do it, that is far beyond the scope of my post, but here are some links to get you pointed in the right direction.

http://www.4guysfromrolla.com/webtech/070601-1.shtml <-- Love this one.



Keep in mind Screen Scraping is NOT a basic task to perform.

Here is a list of already written screen-scrapers that you could potentially use to populate your database.


Hope this helps.

Please keep in mind that in some cases, this can be considered an "unfavorable" activity at best when it is done incorrectly or for malicious reasons.
Old 04-21-2010, 09:48 PM   #4
Baseband Member
Join Date: Feb 2010
Posts: 22
Default Re: compiling data from databases...

I have been using iRobot for my screen scraping, exporting that to an XML file, then using XSL to convert all that information into a readable format. Once that is done I copy the result into MS Word and print it to a PDF file.

It may not be the fastest way of doing this, but it really is not that bad: iRobot was a free and easy-to-learn program, and XSL took me about 10 minutes to learn using the tutorials at W3Schools.

This has worked so well for me that my list has gone from 500+ books to 50,000+ books, though I am looking for a nice way to save them all as PDFs right now...lol. There is a program called Ficfiction downloader that works for that, but you have to do them one at a time, and I cannot find a non-buggy macro program to automate it; plus, you cannot use your computer while a macro is running...lol. All in all, this is a lot of work just so that I can read them on my mp3 player.

This is all I had to code...
<xsl:for-each select="books/Book">
    <!-- Title in blue, author in red -->
    <font color="#0000FF"><xsl:value-of select="Story"/></font> - <font color="#FF0000"><xsl:value-of select="Author"/></font><br />
    <xsl:value-of select="Summary"/><br />
    <!-- Remaining details in grey -->
    <font color="#CCCCCC"><xsl:value-of select="Crossover"/> - <xsl:value-of select="Genre"/> - <xsl:value-of select="Ships"/><br />
    <!-- "Reviews" is an assumed element name; the original line repeated select="Rated" here -->
    Chapters: <xsl:value-of select="Chapters"/> Word Count: <xsl:value-of select="Words"/> Rating: <xsl:value-of select="Rated"/> Reviews: <xsl:value-of select="Reviews"/><br />
    Updated: <xsl:value-of select="Updated"/> Published: <xsl:value-of select="Published"/></font><br />
    <br />
</xsl:for-each>
Though I will admit that I really still do not understand what XSL-FO is for...lol.
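
For comparison, the same per-book formatting could be sketched in Python instead of XSL. This assumes the exported XML has a `books` root element containing `Book` elements, which is what the `select="books/Book"` path suggests; the function name `format_books` and the plain-text output layout are made up for this example.

```python
import xml.etree.ElementTree as ET


def field(book, name):
    """Return the text of a child element, or an empty string if it is missing."""
    return (book.findtext(name) or "").strip()


def format_books(xml_text):
    """Turn the exported XML into a plain-text listing, one block per book."""
    root = ET.fromstring(xml_text)  # assumed to be the <books> element
    entries = []
    for book in root.findall("Book"):
        entries.append(
            f"{field(book, 'Story')} - {field(book, 'Author')}\n"
            f"{field(book, 'Summary')}\n"
            f"Chapters: {field(book, 'Chapters')} "
            f"Word Count: {field(book, 'Words')} "
            f"Rating: {field(book, 'Rated')}"
        )
    return "\n\n".join(entries)
```

Missing elements simply come out as empty strings, so one incomplete record does not stop the whole listing.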
Old 04-22-2010, 06:04 PM   #5
Site Team
Join Date: Jul 2009
Location: England, UK
Posts: 3,422
Default Re: compiling data from databases...

Perhaps jumping in a bit late, but:

.NET(IDE is not free)
There are free IDEs out there: the Express edition of Visual Studio is free, and there are third-party ones like SharpDevelop which are also free.

Another thing to note is that screen scraping is notoriously unreliable. It is fine if you just want to grab things once, but don't expect to run the program again next year, say, and have the same books turn up. If the HTML changes even slightly, depending on how you've coded the thing, it could throw everything off enough to make the information unusable...
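
One way to soften this is to have the scraper check its own output, so a layout change shows up as a loud error instead of a database full of blanks. A minimal Python sketch follows; the field names and the 20% failure threshold are arbitrary assumptions chosen for illustration.

```python
REQUIRED_FIELDS = ("title", "author", "words")  # hypothetical field names


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in one scraped record."""
    return [name for name in required if not str(record.get(name) or "").strip()]


def check_batch(records, max_failure_rate=0.2):
    """Return the incomplete records, or raise if too many of them are incomplete.

    A high failure rate usually means the page layout changed, not that the
    stories themselves are missing data.
    """
    records = list(records)
    failures = [record for record in records if missing_fields(record)]
    if records and len(failures) / len(records) > max_failure_rate:
        raise RuntimeError(
            f"{len(failures)} of {len(records)} records are incomplete; "
            "the site's HTML may have changed"
        )
    return failures
```

Running `check_batch` on each scraped batch before saving it means a changed page is noticed on the first run after the change.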
Old 04-22-2010, 06:08 PM   #6
Baseband Member
Join Date: Feb 2010
Posts: 22
Default Re: compiling data from databases...

Yeah, that is very true. I found that out the hard way already when the site I was using changed the HTML for its search results. Though with iRobot it only took me about 10 minutes to find the change and fix it.
