Search Engine Blog
Here is what's happening on Wallabyup (recent posts at the top of the page).
Map Working Well - October 2025
Map buttons and layers are working really well.
There are now quick tiles up to zoom level 20, and from zoom level 21 onwards (L21+) it switches to a vector database lookup. The database lookup is slower, but at zoom level 21+ you are only calling on the database for 1 granny flat (still quick).
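A minimal Python sketch of the zoom-level switch described above, assuming standard Web Mercator (slippy map) tile numbering. The `/tiles/{z}/{x}/{y}.png` pattern and the `/api/granny-flat` endpoint are hypothetical placeholders, not Wallabyup's actual routes.

```python
import math

TILE_MAX_ZOOM = 20  # tiles up to this zoom; vector database lookup from 21 up


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Standard Web Mercator (slippy map) tile x/y for a point at a given zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still map to a valid tile
    return min(n - 1, max(0, x)), min(n - 1, max(0, y))


def choose_source(zoom: int) -> str:
    """Pick the data source for a zoom level."""
    return "tile" if zoom <= TILE_MAX_ZOOM else "vector_db"


def request_for(lon: float, lat: float, zoom: int) -> str:
    """Build a request path; both URL patterns are hypothetical."""
    if choose_source(zoom) == "tile":
        x, y = lonlat_to_tile(lon, lat, zoom)
        return f"/tiles/{zoom}/{x}/{y}.png"
    return f"/api/granny-flat?lon={lon}&lat={lat}"
```

For example, `request_for(151.2, -33.87, 21)` goes to the database lookup, while the same point at zoom 20 resolves to a tile path.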
Todo: fix minor bugs (e.g. proposed granny flats that are too close to, or overlapping, an existing building). Then run the "granny flat tile generation script" for 8? hours to render all the granny flats from Newcastle to Wollongong.
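One simple way to catch the "too close or overlapping" bug is a minimum-gap test between footprints. The sketch below treats every footprint as an axis-aligned rectangle in metres and uses a hypothetical 1 m minimum gap; real building outlines are polygons and real setback rules vary, so treat this as an illustration only.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned footprint in metres (a simplification; real outlines are polygons)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def gap(a: Rect, b: Rect) -> float:
    """Shortest distance between two rectangles; 0.0 if they touch or overlap."""
    dx = max(b.x_min - a.x_max, a.x_min - b.x_max, 0.0)
    dy = max(b.y_min - a.y_max, a.y_min - b.y_max, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def placement_ok(flat: Rect, existing: list, min_gap_m: float = 1.0) -> bool:
    """Reject a proposed granny flat that overlaps or sits closer than
    min_gap_m (a hypothetical value) to any existing building."""
    return all(gap(flat, b) >= min_gap_m for b in existing)
```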
On Hold For A Bit - 26th April 2025
Putting the web-crawling spider on hold for a bit and taking a break for a couple of weeks. The site will stay online (search + map).
The granny flat map is coming along well; I just have to replace a CSS "transform scale" style with code-friendly, hard-coded x="0" y="0" values. After this change a "granny flat tile layer" will be in place on the map (which will be quicker than the current database lookup using vector graphics).
Logo Edit - 24th March 2025
It's based on the hand gesture I use when saying "wallaby up". I took a photo of my hand (imitating a wallaby paw) and traced it in Illustrator.
Slippy Map With Sydney Base Tiles Added - 1st November 2024
This Sydney slippy map will eventually be a property map where you can see floor plans for possible places to build.
It's just a base map of streets/parks/etc. at the moment but I needed to put something out there just to show what I have been working on for so long.
New Server - 12th September 2024
The public-facing web server has been upgraded with a quicker RAM, CPU, M.2 SSD and motherboard.
I've also switched from PC towers to open-shelf motherboards so the CPU fans run quieter. You can see from the Australian interest stats page that the Wallabyup spider/crawler bot did a lot of recrawls recently (updating existing pages in the database)... that's due to the backlog that built up while the backup server was online during the migration.
Downloads Page Added - 8th July 2024
You can now download 4 different databases from Wallabyup.
If you wanted to start your own search engine, for example, you could download the "All URLs" tab-separated CSV file, which has the URLs of about 70 million Australian web pages. You could then write a simple script to cycle through the rows and run multiple scripts at once.
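A minimal sketch of that idea, assuming the download is tab-separated with a header row containing a `url` column (the column name and the file name in the usage comment are assumptions). `crawl_all` runs a fetch function over many rows at once with a thread pool.

```python
import csv
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def read_urls(lines: Iterable[str], column: str = "url") -> list:
    """Pull one column out of tab-separated rows (the column name is an assumption)."""
    return [row[column] for row in csv.DictReader(lines, delimiter="\t")]


def fetch_status(url: str) -> int:
    """Example fetcher: the HTTP status code, or -1 on a network failure."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404
    except OSError:
        return -1  # DNS failure, refused connection, timeout, etc.


def crawl_all(urls: Iterable[str], fetch: Callable[[str], object], workers: int = 8) -> dict:
    """Run `fetch` over many URLs at once with a thread pool; returns {url: result}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))


# Usage (file name is hypothetical):
# with open("all-urls.tsv", newline="", encoding="utf-8") as f:
#     results = crawl_all(read_urls(f), fetch_status)
```

A real crawler built this way would also need to respect robots.txt and limit how often it hits each host.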
7,000 Australian Sites Deleted - 26th February 2024
I have been cleaning up and culling various troublesome websites (spammers, invalid, etc.) over the last couple of years, but this dump was bigger than usual (7,000 websites with 15 million pages dumped).
I ran a script that found websites with a lot of pages (e.g. 50,000 pages) but not many backlinks (e.g. 4 backlinks) and classed them as invalid (deleted from the Wallabyup index). Looking through the list, a lot of it was undesirable content; however, there were some websites I liked and didn't want to delete.
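The "big site, low backlinks" rule can be sketched as a simple threshold test. The 50,000-page and 4-backlink cut-offs below are just the example numbers from the paragraph above, and the `Site` record is a hypothetical stand-in for a row in the site database.

```python
from dataclasses import dataclass


@dataclass
class Site:
    domain: str
    pages: int
    backlinks: int


def is_big_site_low_backlinks(site: Site, min_pages: int = 50_000, max_backlinks: int = 4) -> bool:
    """True when a site has lots of pages but few backlinks (example thresholds)."""
    return site.pages >= min_pages and site.backlinks <= max_backlinks


def invalid_sites(sites: list) -> list:
    """Sites the rule would move to the invalid database."""
    return [s for s in sites if is_big_site_low_backlinks(s)]
```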
One example of a website I did not want to delete was Techbuy.com.au, which is a good electronics supplier (one I have used at times). It has over 54,000 pages online, but on closer inspection many of the pages were:
- "new products" pages going back as far as 24 years (year 2000),
- pages that grouped various terms but sometimes being empty searches, or
- other things like discontinued product pages.
So many pages but not many backlinks warranted an auto-classification as invalid.
If any website sees the message "Big site, low backlinks... so send to invalid database" in its Wal site profile, you will need to use a robots meta tag to stop WallabyupBot crawling so many pages (or delete some pages). After that, contact me and I will recrawl your site.
Bot Blocking By Retailers - 22nd January 2024
I have added a page about big box retailers blocking bots from indexing their product pages. Most retailers are blocking bots indiscriminately, which makes the Google search bottleneck even worse.
Small search engines, whether competing with Google or providing a new way to search, often can't email the retailer, as emails mentioning robots/indexing get flagged as SEO spam and either go straight to the trash or get an AI-generated "no-reply" response.
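To see how blanket blocking plays out, here is a sketch using Python's standard `urllib.robotparser` against a hypothetical retailer `robots.txt` that lets Googlebot in but disallows every other bot from product pages. The rules and URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical retailer robots.txt: Googlebot may crawl everything,
# every other bot is kept out of the product pages.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /products/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Would this robots.txt let `user_agent` fetch `url`?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `allowed(ROBOTS_TXT, "WallabyupBot", "https://shop.example/products/tv")` is `False` while the same call for `"Googlebot"` is `True`, which is exactly the kind of rule that leaves small search engines out.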
Will Find More Links - 27th November 2023
Links found by the Wallabyup bot will now be recorded whether they are follow (normal) links or nofollow links (nofollow meta tag used). The idea is that all links will be followed to discover more websites on the Australian internet. If the link is a plain old normal link, then link juice flows to the outlink (improving its social weighting in SERPs, i.e. search engine results pages); if the link is a nofollow link, it is excluded from the backlink tally (no effect on SERPs).
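A rough sketch of recording every link while only counting follow links as backlinks, using Python's standard `html.parser`. It checks both a link's `rel="nofollow"` attribute and a page-level `<meta name="robots" content="nofollow">`; which of these WallabyupBot actually inspects is not stated, so treating both as nofollow is an assumption.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects every <a href> plus whether it is marked nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []             # (href, has rel="nofollow")
        self.page_nofollow = False  # <meta name="robots" content="...nofollow...">

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            if "nofollow" in (a.get("content") or "").lower():
                self.page_nofollow = True
        elif tag == "a" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            self.links.append((a["href"], "nofollow" in rel))


def collect_links(html: str) -> list:
    """Return (href, is_nofollow) for every link; all are kept for discovering new sites."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(href, nofollow or parser.page_nofollow) for href, nofollow in parser.links]


def backlink_targets(links: list) -> list:
    """Only follow links pass link juice, so only they count towards backlinks."""
    return [href for href, nofollow in links if not nofollow]
```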
New Social Weighting + Spammers Culled - 20th September 2023
I have updated the scoring for the social weighting portion of the results points. Now an inner page's URL gets a smaller share of the total backlink points than a home page does (the home page gets more weighting/points).
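A toy version of the home-page weighting, assuming backlink points are simply scaled by a page-type weight. The 1.0 and 0.5 weights and the list of home-page paths are hypothetical; the post only says that home pages get more weighting than inner pages.

```python
from urllib.parse import urlparse

HOME_WEIGHT = 1.0   # hypothetical
INNER_WEIGHT = 0.5  # hypothetical
HOME_PATHS = {"", "/", "/index.html", "/index.php"}  # hypothetical list


def is_home_page(url: str) -> bool:
    return urlparse(url).path in HOME_PATHS


def social_points(url: str, backlinks: int) -> float:
    """Backlink points, scaled down for inner pages."""
    weight = HOME_WEIGHT if is_home_page(url) else INNER_WEIGHT
    return weight * backlinks
```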
I have also classed a bunch of websites that have thousands of pages and no (or just 1) backlinks as invalid, due to too many spammers trying to hog every category and suburb in the search engine results. If you have a legit reason to be reinstated, just email me.
AU Internet Mostly Done - 28th August 2023
It looks like the WallabyupBot has found most pages on the AU internet, as the "new pages to crawl" database is no longer maxed out. This means any new pages found will be crawled without restriction (as opposed to a month ago, when the database had a maximum limit and paused adding to a full database). Existing pages will continue to be recrawled and new sites will continue to be found.
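The "maxed-out" behaviour can be pictured as a capped queue that simply stops accepting new URLs while full. This is a minimal in-memory sketch with a hypothetical `CrawlQueue` class; the real "new pages to crawl" database is not described beyond having had a maximum limit.

```python
from collections import deque


class CrawlQueue:
    """'New pages to crawl' queue with an optional size cap (hypothetical class).

    While the queue is full, new URLs are skipped rather than stored, i.e. adding
    is paused; skipped URLs are not remembered, so they can be offered again later.
    """

    def __init__(self, max_size=None):
        self.max_size = max_size
        self._queue = deque()
        self._seen = set()

    def is_maxed_out(self) -> bool:
        return self.max_size is not None and len(self._queue) >= self.max_size

    def add(self, url: str) -> bool:
        if url in self._seen or self.is_maxed_out():
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def pop(self):
        return self._queue.popleft() if self._queue else None
```

With `max_size=None` the queue never fills, which corresponds to the "crawled without restriction" state.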
New Sites Focus - 10th August 2023
I have ramped up the 'pages crawled per day' and over the next 3 months I should find a lot more Australian websites (1,000 new .au sites per day)... so by the new year 95% of websites should be indexed. Then in the new year the bot will go deeper looking for the last 5% of websites to find.
New Quicker Server - May 2023
I've upgraded the server with new, quicker DDR5 RAM, more CPU cores, and a quicker M.2 SSD.
No More Dual Databases - New Queue System - New Pages Added - March 2023
I have stopped using a dual database setup with both MySQL and Apache Solr and am instead using Solr for the big databases and MySQL for the small databases. Things are running much smoother now with less processing, and hopefully my M.2 SSD storage device will last longer with fewer writes (and hopefully the processor and motherboard coils last longer too).
I have changed the way pages are crawled to stop sites on the same IP address or top-level domain from being hit too often. Now I have a 1-minute "freshBlock" which I fill with different sites, and the block is only crawled when there are enough unique IP addresses and unique top-level domain names, so that the same IP address doesn't get hit too often and has about a 5-second gap (on average) between crawls. I'm also doing a check every hour on a sample range of crawled sites to make sure that "pages hit on the same second" are less than 1% of traffic (rare).
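A minimal sketch of the freshBlock idea and the hourly same-second check. It assumes each candidate arrives as a `(url, ip, domain)` tuple and each logged hit as an `(ip, unix_timestamp)` pair; the function names and the "same IP within the same whole second" definition of a clash are assumptions.

```python
from collections import Counter


def build_fresh_block(candidates, block_size):
    """Fill one block with URLs whose IP addresses and domains are all different.

    candidates: iterable of (url, ip, domain). Returns the block, or None when
    there are not yet enough unique IPs/domains (the block waits instead of crawling).
    """
    block, ips, domains = [], set(), set()
    for url, ip, domain in candidates:
        if ip in ips or domain in domains:
            continue  # would hit the same server or site twice in one block
        block.append(url)
        ips.add(ip)
        domains.add(domain)
        if len(block) == block_size:
            return block
    return None


def same_second_ratio(hits):
    """Share of hits landing on the same IP in the same whole second as another hit.

    hits: iterable of (ip, unix_timestamp).
    """
    counts = Counter((ip, int(ts)) for ip, ts in hits)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for c in counts.values() if c > 1) / total


def hourly_check_ok(hits, limit=0.01):
    """The 'less than 1% of traffic' check on a sample of crawl logs."""
    return same_second_ratio(hits) < limit
```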
The backlog of recrawls has caught up so more new pages are being found.
Catching Up On Recrawls - January 2023
The bot has to catch up on recrawling existing pages in the Wallabyup index, so not many new pages are being added to the database. Hopefully over the next month the recrawl backlog will catch up and more new pages will be found in March.
Quicker Storage Drive - 22nd December 2022
I crashed another storage drive but this new replacement drive is quicker (faster page load times). Before I used a Samsung 980 Pro M.2 SSD but now I'm on the Kingston KC3000 M.2 SSD. The site was online (not much downtime) while I installed the new OS but the spider that crawls the web was offline for a while.
Other Aussie Search Engines - 1st December 2022
The only other search engine written by an Australian seems to be Bonzamate, written by Ben Boyter.
It's a site that doesn't appear to use social weighting/popular backlinks in rankings and only indexes .au domains.
I can't find Wallabyup in its index, and his sites are not in Wallabyup. So next time the Wallabyup bot crawls this page, Bonzamate should appear in Wal results if the query "Australian Search Engine" is searched.
More Sites, Better Results - 28th November 2022
I fixed a bug, so more new sites will be found now. Sites like Jetstar.com do not redirect the base URL to their HTTPS page.
I re-did anchor text weighting (on the index page) to prioritise sites with a higher score. I also re-did the site-profile.php page to prioritise sites with a lower score (spam backlinks are prioritised).
There was a bug in how the spider prioritised recrawls/inlinks/outlinks (disproportionately); now the spider prioritises recrawls so the database index is fresh.
New Algorithm - 14th November 2022
I have simplified the main database query to cut search times to 1/3rd of what it was but I still have a 1/2 second leak I need to investigate.
New Scoring Criteria - 7th November 2022
I have added a "keywords across site" column in the scoring calc. If the site has a theme matching your search query, it will get points. This is in addition to the points a site already gets when the query is in its domain name.
New Domain Name - 4th November 2022
.au domains (no .com/net/org needed) are now a thing so I did a switch-o-roo from wallabyup.com to wallabyup.au and also added a short URL wal1.au.
Down Time - 1st November 2022
I moved house and had 3 days downtime. Next time I move I will spend some money to duplicate setups so the new site can be tested and working before switching off the old site. Sorry to everyone who uses Wal... both of you are valued searchers.
Priorities - 11th October 2022
I'm focusing more on my housing for the disabled website rather than this site; however, I have some priorities I want to get done. Wal is being worked on 1 morning a week.
Words Across Site Score - 5th September 2022
I'm thinking of adding a keywords across website column to the scoring calculation. At the moment, if the domain name has the user's query in it then the site gets a ranking boost, but it would be better if the site also got a boost when, say, the query words appear in more than 20%? of its pages. This way sites with creatively named domains (rather than keyword domains) can get a boost too.
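A minimal sketch of a "words across site" score, assuming a site's pages are available as plain text and a page counts when it contains every query word. The 20% threshold is the tentative figure from the post; the boost value and function names are hypothetical.

```python
import re


def words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def site_keyword_share(pages: list, query: str) -> float:
    """Fraction of a site's pages that contain every word of the query."""
    q = words(query)
    if not pages or not q:
        return 0.0
    return sum(1 for text in pages if q <= words(text)) / len(pages)


def site_theme_boost(pages: list, query: str, threshold: float = 0.20, boost: float = 1.0) -> float:
    """Extra points when the query words run across more than `threshold` of the site's pages."""
    return boost if site_keyword_share(pages, query) > threshold else 0.0
```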
Edit: the words across site score is now operational.
Finding More Pages Focus - 17th June 2022
The bot has caught up to updating the recrawl backlog. Now all legit pages in the index have been crawled recently (the index is fresh). The bot will now focus on finding more Australian pages.
Recrawl Focus - 8th June 2022
Within 2 weeks the recrawl backlog should be caught up (to keep the database up to date with the latest content). After that, the bot will focus not on recrawls but on finding more pages from Australian sites, with an allowance for more internal pages from popular sites (popular meaning sites with more backlinks).
Recrawls Improved - 27th May 2022
The recrawl rate should be improved now (it's doubled). I should be able to recrawl 500K pages a day to keep the database up to date. Plus 100K invalid pages and 100K? new pages = 700K total page crawls per day. I might tinker with things and try to get to 1 million total page crawls a day.
Recrawls - 10th May 2022
I'm shifting the focus for the Wallabyup bot away from finding new sites to recrawling existing pages to keep the database fresh (culling 404ed pages that have gone offline from the database).
SiteHammer - 2nd May 2022
When a search result is classed as a spammer* it gets collapsed into a SiteHammer JavaScript toggle (you have to click to view the spammer's result). I changed the criteria from "if a spammer is found" to a calculation based on the ratio of legit backlinks to how many spammers a site links to.
For example, 2 websites might each link to thousands of websites, a small percentage of which are spammers. The legit website with lots of non-spam-flagged backlinks will not get SiteHammered, whereas the spammer with no backlinks does get SiteHammered.
* for things like 1,000 subdomains, linking to a known spammer, etc.
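A minimal sketch of a ratio-based SiteHammer test. The exact formula is not given in the post, so the "spam outlinks per legit backlink" ratio, the +1 smoothing and the 0.5 cut-off below are all hypothetical; the footnoted criteria (mass subdomains, etc.) are not modelled.

```python
def spam_link_ratio(spam_outlinks: int, legit_backlinks: int) -> float:
    """Spammers linked to per legit (non-spam-flagged) backlink; +1 avoids dividing by zero."""
    return spam_outlinks / (legit_backlinks + 1)


def should_sitehammer(spam_outlinks: int, legit_backlinks: int, max_ratio: float = 0.5) -> bool:
    """Collapse a result only if it links to spammers AND has too few legit backlinks to offset them."""
    return spam_outlinks > 0 and spam_link_ratio(spam_outlinks, legit_backlinks) > max_ratio
```

With these numbers, two sites that each link to 20 spammers get different outcomes: the one with 2,000 legit backlinks stays visible, the one with none is collapsed.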
Surrounding Suburbs List - April 2022
I've added a function where, if a suburb is detected in a searcher's query, it will list the surrounding suburbs as checkboxes (closest distance first). It's handy if you're searching for something but don't know the suburb name because it's new or a long Aboriginal name. Example: when your search query is "Camden", it suggests "Elderslie", "Narellan", etc. checkboxes.
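Sorting suburbs "closest distance first" can be done with the haversine great-circle distance between suburb centre points. The sketch assumes a `{name: (lat, lon)}` lookup; the coordinates in the usage comment are invented, not real suburb locations.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def surrounding_suburbs(name, suburbs, limit=5):
    """Other suburbs ordered closest first; suburbs maps name -> (lat, lon)."""
    lat, lon = suburbs[name]
    ranked = sorted(
        (haversine_km(lat, lon, s_lat, s_lon), other)
        for other, (s_lat, s_lon) in suburbs.items()
        if other != name
    )
    return [other for _, other in ranked[:limit]]


# Example with invented coordinates:
# surrounding_suburbs("Centre", {"Centre": (-34.0, 150.0), "Near": (-34.01, 150.0), "Far": (-34.5, 150.0)})
# -> ["Near", "Far"]
```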
Quicker Web Crawler - 29th March 2022
Some adjustments were made to the database (for quicker processing), and the web crawler recently recorded 700,000 pages in 1 day.
The total page count in the index is approaching 30 million, with most pages not changing (site admins host many static pages) and therefore not needing to be crawled too often. The pages that change (e.g. news home pages) will be crawled more regularly.
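One common way to crawl changing pages more often than static ones is an adaptive interval: shrink it when a page's content hash changes and grow it when it doesn't. This is a generic technique, not necessarily how Wallabyup schedules recrawls, and the 1 to 90 day bounds are hypothetical.

```python
import hashlib

MIN_DAYS, MAX_DAYS = 1, 90  # hypothetical bounds on the recrawl interval


def content_hash(html: str) -> str:
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def next_interval(current_days: float, changed: bool) -> float:
    """Halve the interval for pages that changed, double it for pages that didn't."""
    new = current_days / 2 if changed else current_days * 2
    return max(MIN_DAYS, min(MAX_DAYS, new))


def recrawl_update(old_hash, html: str, current_days: float):
    """Return (new_hash, next_interval_days) after a recrawl."""
    new_hash = content_hash(html)
    return new_hash, next_interval(current_days, new_hash != old_hash)
```

Static pages drift towards the maximum interval, freeing crawl budget for pages such as news home pages that keep changing.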
Thanks + Wallabyup Backup - 23rd March 2022
The site is getting more popular, with both visitors doubling the traffic for the month. Neil was one of the visitors and he emailed me a message of support. Thanks Neil for taking the time to contact me.
Also... the VPS is now online, but I'm just using it as a backup as my home setup is slightly quicker.
Now I have both:
1) a LAN home web server, and
2) a remote VPS web server
... to choose from.
Edit: the VPS didn't work out so it is no longer in use.
Synonyms Added - 4th March 2022
After over a week of trial-and-error coding, I have mostly improved the synonym suggestions.
My method is to:
- search the database using the searcher's query (e.g. "dog adopt"),
- then get valued words from the returned results (e.g. dogs, pet, adoption, home, rescue, contact, help, etc.),
- then, using both the searcher's query and the "results valued words", get the root word for each query word (e.g. "adopt" from "adoption"),
- then get suffixes using the root word (get dogs from root dog, adoption from root adopt... but NOT "home", "contact", "help", etc.),
- then cull low-quality words from the list (get the top 15 words and cull the rest),
- then do the main database query (with the searcher's query and suffixes),
- then cycle through the main database results and build a title/heading text block,
- then look for word combinations of the searchers query to find double words like "home loan" and "real estate",
- then return suffixes (checked checkboxes), and synonyms (unchecked checkboxes).
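A heavily simplified sketch of some of those steps (root words, suffix variants, the top-15 cull, and double-word detection). It swaps in naive suffix stripping for whatever root-word method is actually used, leaves out the database queries entirely, and treats the stopword list and the `expand_query`/`word_pairs` names as hypothetical.

```python
import re
from collections import Counter

SUFFIXES = ("ion", "ing", "s")  # naive suffix list (assumption)
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "our", "your", "us"}  # placeholder


def tokens(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())


def root(word: str) -> str:
    """Naive suffix stripping, not a real stemmer: adoption -> adopt, dogs -> dog."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def valued_words(result_texts: list, top_n: int = 15) -> list:
    """Most frequent non-stopwords in the first-pass results; everything past top_n is culled."""
    counts = Counter(w for text in result_texts for w in tokens(text) if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]


def expand_query(query: str, result_texts: list, top_n: int = 15):
    """Split valued words into suffix variants (checked) and synonyms (unchecked)."""
    q_words = tokens(query)
    q_roots = {root(w) for w in q_words}
    candidates = valued_words(result_texts, top_n)
    suffixes = [w for w in candidates if root(w) in q_roots and w not in q_words]
    synonyms = [w for w in candidates if root(w) not in q_roots]
    return suffixes, synonyms


def word_pairs(query: str, titles: list) -> list:
    """Adjacent query-word pairs that also appear side by side in result titles."""
    q = tokens(query)
    pairs = {f"{a} {b}" for a, b in zip(q, q[1:])}
    # " | " keeps a pair from matching across two different titles
    joined = " | ".join(" ".join(tokens(t)) for t in titles)
    text = f" {joined} "
    return sorted(p for p in pairs if f" {p} " in text)
```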
Page Load Times Fixed - 14th February 2022
Page load times were slow for a day but are now down to a fraction of a second.
Format Changed + Plus Searches + SiteHammer - 12th February 2022
All databases have been converted to Apache Solr full-text searching (previously just the main "page content" database was on Solr, with the rest on SQL). Now both SQL and Solr indexes run side by side (duplicated).
After a few days of running duplicate databases, there is less than a 1% row count difference between SQL and Solr, which means up to 1% of results are missing... in other words, I am working on finding out why some pages are missing between the 2 different versions.
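Comparing row counts only says how many rows differ; comparing the sets of document IDs says which ones. A minimal sketch, assuming you can export the primary keys from MySQL and the unique key field from Solr into two lists (the export step is not shown).

```python
def missing_between(sql_ids, solr_ids) -> dict:
    """IDs present in one store but not the other."""
    sql, solr = set(sql_ids), set(solr_ids)
    return {"missing_in_solr": sorted(sql - solr), "missing_in_sql": sorted(solr - sql)}


def row_count_gap(sql_count: int, solr_count: int) -> float:
    """Relative row-count difference (0.01 means 1%)."""
    return abs(sql_count - solr_count) / max(sql_count, solr_count, 1)
```

A small count gap can still hide larger mismatches (rows missing on both sides cancel out in the count), which is why the ID comparison is the more useful check.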
Plus searches have been added. Changing dog adopt Riverstone to dog adopt +Riverstone means the page must contain the word "Riverstone".
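A minimal sketch of how a plus search could filter pages: terms with a leading `+` are required, the rest are left for ranking. The function names are hypothetical. For what it's worth, Lucene/Solr's standard query syntax uses the same `+` prefix to mark a required clause, so a Solr-backed index may be able to pass such queries through more or less directly.

```python
import re


def parse_query(q: str):
    """Split a query into required (+word) and optional terms, lowercased."""
    required, optional = [], []
    for term in q.split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:].lower())
        else:
            optional.append(term.lower())
    return required, optional


def filter_pages(pages: list, q: str) -> list:
    """Keep only pages containing every required word; optional words are left for ranking."""
    required, _ = parse_query(q)
    kept = []
    for text in pages:
        page_words = set(re.findall(r"\w+", text.lower()))
        if all(w in page_words for w in required):
            kept.append(text)
    return kept
```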
SiteHammer bans (offending results collapsed and only displayable via a toggle) have been adjusted to be less aggressive. E.g. if 2 or more websites on the same IP are in the results, then the 2nd and later websites are SiteHammered (only the 1st result is shown). This SiteHammers a lot of legitimate 2nd-owned sites, including sites on my own network, but bans many more spammers.
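The same-IP rule boils down to "the first website seen on each IP stays visible; later, different websites on that IP get collapsed". A minimal sketch over a ranked list of `(domain, ip)` pairs; the data shape and function name are assumptions.

```python
def apply_same_ip_hammer(results: list) -> list:
    """results: ranked (domain, ip) pairs. Returns (domain, collapsed) pairs.

    The first website seen on an IP keeps all its results visible; any other
    website on the same IP is flagged for SiteHammer collapsing.
    """
    first_domain_for_ip = {}
    flagged = []
    for domain, ip in results:
        owner = first_domain_for_ip.setdefault(ip, domain)
        flagged.append((domain, owner != domain))
    return flagged
```

Shared hosting is why this catches legitimate sites: many unrelated websites can sit behind a single IP address.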