Children Charitable Foundation
PO Box 4481, Kumasi, Ashanti Region, Ghana tel: 00233268330902 email:
The Wikipedia for Schools download (schools-wiki) is a great source of information for children, and is particularly appropriate for developing countries. Whereas children in Europe may take the Internet for granted, Internet access in developing countries is often absent, unreliable or expensive. For instance, even in 2012 the average monthly wage is still around the 100 cedis mark, yet a month's worth of USB dongle Internet credit costs around 30 cedis.
The Wikipedia for Schools download is basically a collection of htm pages and image files. No software comes with it, although it does have a rudimentary contents system of picture icons and text links. However, since there is a mass of information (20 million words), a search system would be more than useful.
Basically four elements are needed for a search facility of the school-wiki download:
a database server
a software script that searches the data
a web server (web servers are the software installed on Internet servers, and also on "intranet" (offline) networks)
The simple way to get a MySQL database, phpMyAdmin and a PHP-enabled web server all in one is to use XAMPP from apache-friends. For Linux, all you have to do is download the tarball and unpack it in the /opt directory. See the documentation at apache-friends:
So far, so good, but how are we going to get the data? If you look at the anchor link below, which is taken from the schools wiki, you will see that the words "Albert Einstein" between "> and </a> are very relevant to a link that goes to a page about Einstein.
<a href="../../wp/a/Albert_Einstein.htm" >Albert Einstein</a>
Basically, then, all that is needed for the database is to collect the anchor links from all the htm pages in the schools wiki, take out the text between "> and </a>, and use it as searchable "key words". There are different ways of doing it, but the grep usage below is a start:
grep -rohP '<a href.*?</a>' wp/a > output.txt
This writes the anchor links found under wp/a out to a file called output.txt. After that we need to strip out the text between "> and </a>, arrange it next to the full anchor link, and give each line a key value.
You can use PHP to do this. The basic method is to take each full anchor link, find the position in the string of "> and then the position of </a>; the bit we want lies between the two.
In other words, counting from zero, if the > that closes the opening tag is at position 41 (call this position x) and the </a> characters start at position 57 (call this y) in the text string above, then the characters that make up "Albert Einstein" start at position x+1 and run for y-x-1 positions (15 characters here).
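The position arithmetic above can be sketched in a few lines. The original work was done in PHP; this is only an illustrative Python version, and the function names extract_keyword and build_rows are hypothetical, not taken from the author's scripts. It assumes each input string is a single anchor link, as produced by the grep step.

```python
def extract_keyword(anchor: str) -> str:
    """Return the text between the '>' ending the opening tag and '</a>'."""
    x = anchor.find(">")          # position of the '>' that closes <a ...>
    y = anchor.find("</a>", x)    # position where the closing tag starts
    if x == -1 or y == -1:
        return ""                 # not a complete anchor link
    return anchor[x + 1:y]        # y - x - 1 characters, starting at x + 1


def build_rows(anchors):
    """Pair each anchor with its keyword and a numeric key (assumed layout)."""
    rows = []
    for anchor in anchors:
        keyword = extract_keyword(anchor).strip()
        if keyword:               # skip links with no visible text
            rows.append((len(rows) + 1, keyword, anchor))
    return rows


anchor = '<a href="../../wp/a/Albert_Einstein.htm" >Albert Einstein</a>'
print(anchor.find(">"), anchor.find("</a>"))  # prints 41 57
print(build_rows([anchor]))
```

Links that wrap an image rather than text would yield the image tag as their "keyword", so a real run would probably want to filter those out as well.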
This has been done using the steps outlined above, and here is a beta MySQL dump:
MySQL dump: see sql dump
Download the sql dump: get sql dump download
After that, all we need is a simple search page with a text box for users to submit search terms, and code to query our database. That's been done and you can get it here:
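To give an idea of what the query side involves, here is a minimal sketch. It is not the author's wiki-index.php: it is written in Python and uses the built-in sqlite3 module as a stand-in for MySQL, and the table name links, its columns, and the second sample row are assumptions made up for illustration (the real layout is whatever the SQL dump defines).

```python
import sqlite3


def search(conn, term):
    """Return (keyword, anchor) rows whose keyword contains the search term."""
    pattern = f"%{term.strip()}%"
    cur = conn.execute(
        # Parameterised query: the user's text is never pasted into the SQL.
        "SELECT keyword, anchor FROM links WHERE keyword LIKE ? ORDER BY keyword",
        (pattern,),
    )
    return cur.fetchall()


# In-memory database with a hypothetical table layout and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, keyword TEXT, anchor TEXT)")
conn.executemany(
    "INSERT INTO links (keyword, anchor) VALUES (?, ?)",
    [
        ("Albert Einstein",
         '<a href="../../wp/a/Albert_Einstein.htm" >Albert Einstein</a>'),
        ("Hypothetical Page",
         '<a href="../../wp/h/Hypothetical_Page.htm" >Hypothetical Page</a>'),
    ],
)

# A search page would print each matching anchor as a clickable link.
for keyword, anchor in search(conn, "einstein"):
    print(anchor)
```

The LIKE match here is case-insensitive for plain ASCII text, so "einstein" finds "Albert Einstein". Any % or _ typed by the user is treated as a wildcard, which is usually harmless for a simple search box.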
All files in the school-wiki download point to an index.htm file, which doesn't incorporate a search script. To get around this I have amended index.htm with a small piece of code which redirects to wiki-index.php (which does have a search script incorporated into it). You therefore need to replace the index.htm from the unpacked school-wiki download with my amended file. Get the amended file here:
Putting it all together (instructions for Linux)
Get the XAMPP tarball and download it to the /opt directory. Unpack the tarball; once it is unpacked you will see a lampp directory inside /opt. Inside the lampp directory there are several directories, including htdocs. htdocs is where your web files, and the school-wiki, go. So download the school-wiki tarball to the htdocs directory and unpack it there. Then put my wiki-index.php and index.htm files into htdocs as well. Neither of these files will conflict with XAMPP, since it uses index.php as its main file. Start XAMPP from the command line (logged in as root) using: /opt/lampp/lampp start
Then fire up a browser and go first to:
This takes you to the main XAMPP page. Using the phpMyAdmin link, create a database called "wiki", then download simple.sql to the desktop.
Use the import tab to import simple.sql into the wiki database. After that, with your browser go to
This will take you to the school-wiki page with a search text box. Simply type in a key word, say "einstein", and you should get links relevant to your search word.
If there is a problem you may have to edit the "user" and "password" lines in wiki-index.php; try "root" for user and "" for pass. Otherwise configure your own MySQL user and pass in XAMPP, and amend wiki-index.php accordingly.