Using wget to recursively retrieve part of a website

January 9th, 2012

Sometimes it is useful to be able to download a set of pages from a website that is more or less closed under linking, that is, pages that mostly link to one another.

There are tools for doing this in Firefox.

I wanted a tool that could handle retries easily (my Firefox tool can’t as far as I can tell), and could produce a set of webpages that I could transfer easily to my itouch.

wget does this quite easily.

An appropriate command-line is:

wget -r -k -np -p -nc "$1"
# -r recursive
# -k convert links for local viewing
# -np don't ascend to the parent directory
# -p load page prerequisites (images etc)
# -nc don't download documents that are already there (could use -N to only download newer versions)

You can interrupt it, remove any partially-downloaded files, then run it again, and it will only download the files that haven’t been downloaded yet.

There are additional options available in wget to filter to specific names, directories, and so on.

Detecting memory leaks in C++

December 27th, 2011

We can easily detect memory leaks in C++ by overriding operator new and delete.

// copyright Hugh Perkins 2011
// you can freely use and distribute this code, as long as you preserve the above
// copyright notice for this module

// override operators new and delete, so we can check for memory leaks

#include <cstdlib>   // malloc, free, abort
#include <iostream>  // cout, endl
using namespace std;

int memoryallocated = 0;   // bytes currently allocated through operator new

const int maxallocations = 1000;
// addresses are stored as pointers: casting a pointer to int truncates it on 64-bit platforms
void *memoryaddresses[maxallocations + 1];
int sizes[maxallocations + 1];
bool initializedmemory = false;
int numallocates = 0;   // total number of calls to operator new

void *operator new( size_t size ) {
   if( !initializedmemory ) {
      //cout << "initializing memory " << endl;
      initializedmemory = true;
      for( int i = 0; i < maxallocations; i++ ) {
          memoryaddresses[i] = 0;
          sizes[i] = 0;
      }
   }
   //cout << "operator new( " << size << ")" << endl;
   memoryallocated += (int)size;
   numallocates++;
   void *p_mem = malloc(size);
   int i = 0;
   // record the allocation in the first free slot
   for( i = 0; i < maxallocations; i++ ) {
      if( memoryaddresses[i] == 0 ) {
         memoryaddresses[i] = p_mem;
         sizes[i] = (int)size;
         break;
      }
   }
   if( i == maxallocations ) {
      cout << "error: no space for more memory allocations" << endl;
      abort();
   }
   //sizebyaddress[(int)p_mem] = size;
   return p_mem;
}

void operator delete( void *p ) {
   int size = 0;
   // look up the size recorded for this address, and free its slot
   for( int i = 0; i < maxallocations; i++ ) {
      if( memoryaddresses[i] == p ) {
         size = sizes[i];
         memoryaddresses[i] = 0;
         sizes[i] = 0;
         break;
      }
   }
   //cout << "operator delete()" << endl;
   memoryallocated -= size;
   free( p );
}

class MemoryChecker {
public:
   MemoryChecker();
   ~MemoryChecker();
   int memory;
   int allocates;
};

MemoryChecker::MemoryChecker() {
   this->memory = memoryallocated;
   this->allocates = numallocates;
   //cout << "memorychecker" << endl;
}

MemoryChecker::~MemoryChecker() {
   //cout << "~memorychecker" << endl;
   int memoryused = memoryallocated - memory;
   int allocatesdelta = numallocates - allocates;
   if( memoryused == 0 ) {
      cout << "Memory allocates: " << allocatesdelta << endl;
   } else {
      cout << "ERROR: memory leaked: " << memoryused << " allocates: " << allocatesdelta << endl;
   }
}

Then, if we want to detect a leak, use like this:

int main() {
   MemoryChecker checker;
   {
      // possibly leaky code
   }
}

That’s it! If the code leaks, the checker prints a message starting with “ERROR: memory leaked” when it is destroyed at the end of main.
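
To see the idea in action, here is a minimal, self-contained sketch. It is a simplified variant rather than the code above: it counts live blocks instead of bytes, and it keeps no address table. The names g_live_blocks, g_sink, leaky, tidy, blocks_leaked_by and demo are all made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

// Number of blocks handed out by operator new and not yet deleted.
// (Hypothetical name, not part of the original code.)
static long g_live_blocks = 0;

// Replacement global operator new: count the block, then defer to malloc.
void *operator new(std::size_t size) {
   void *p = std::malloc(size ? size : 1);
   if (p == nullptr) throw std::bad_alloc();
   ++g_live_blocks;
   return p;
}

// Replacement global operator delete: uncount the block, then free it.
void operator delete(void *p) noexcept {
   if (p == nullptr) return;
   --g_live_blocks;
   std::free(p);
}

// Sized form (used by some compilers since C++14): forward to the form above.
void operator delete(void *p, std::size_t) noexcept {
   operator delete(p);
}

// Volatile sink, so the compiler cannot optimise the allocations away.
static int *volatile g_sink = nullptr;

// Allocates one int and never deletes it.
void leaky() { g_sink = new int(42); }

// Allocates one int and deletes it again.
void tidy() { int *p = new int(42); g_sink = p; delete p; }

// Run fn, and report how many blocks it left allocated.
long blocks_leaked_by(void (*fn)()) {
   long before = g_live_blocks;
   fn();
   return g_live_blocks - before;
}

// Usage example: check one clean function and one leaky function.
void demo() {
   assert(blocks_leaked_by(tidy) == 0);
   assert(blocks_leaked_by(leaky) == 1);
   std::printf("live blocks after demo: %ld\n", g_live_blocks);
}
```

Call demo() from main() to try it. Measuring a before/after difference, as MemoryChecker does, means allocations made elsewhere in the program before the check starts don’t show up as false leaks.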

Getting boost test to work on windows and with cmake

December 8th, 2011

This is most likely not the only way to get boost test to work on windows, but it is a way that works.

Boost version used

boost 1.48.0

Building boost

It’s best just to build what you need from boost. It’s not worth trying to build all of it, unless you have lots of time and disk-space.

To build just boost test, first run bootstrap, then run b2 using the following command-line:

b2 --with-test link=shared variant=debug threading=multi runtime-link=shared stage

Setting ‘link’ to ‘shared’ works very well. I couldn’t get boost test working on windows with ‘link’ set to ‘static’, plus cmake defaults to using link=shared.

That’s the hard bit done.

A gotcha: if you’re testing from the command line using cl, you must enable exceptions (for example with /EHsc), since they are not enabled by default; otherwise you will see a link error for the function boost::throw_exception().

Running cmake

Now, cmake is quite easy. In the CMakeLists.txt, you want some lines like:

find_package(Boost COMPONENTS unit_test_framework REQUIRED )
include_directories( ${Boost_INCLUDE_DIR} )
link_libraries( ${Boost_UNIT_TEST_FRAMEWORK_LIBRARY_DEBUG} )

When you run cmake-gui on Windows, the first ‘configure’ will fail with an error message asking you to set BOOST_ROOT. Add BOOST_ROOT using ‘add entry’, then rerun configure, and it should work ok.

Moving wordpress site to new domain

November 22nd, 2011

This is just a note to self really, so I don’t forget. Here is a summary of how to move a wordpress blog to a new domain: change wordpress domain name steps.

Going to copy and paste the contents here, in case the site goes down:

“0) prepare the new host information and new domain name address and the database host.

1) log in blog and go to Site Admin page, and in the Settings , change the WordPress Address(URL) and the Blog Address(URL) in the form and save it (when you done this, if you want to log in you should change the data in the database , it store in the table named wp_options).

2) log in phpmyadmin and select the current database, in the SQL form , you should copy the query string and paste it into the form and update the database.
a: For links and attachments in the content. the query string is :
UPDATE `wp_posts` SET `post_content` = replace(post_content, 'http://www.your-old-domain.com', 'http://www.your-new-domain.com');
b: For the posts permalinks stored in the database. (By the way , you may skip this step):
UPDATE `wp_posts` SET `guid` = replace(guid, 'http://www.your-old-domain.com', 'http://www.your-new-domain.com');
c: If you created the custom field in the post , you should copy this also:
UPDATE `wp_postmeta` SET `meta_value` = replace(meta_value, 'http://www.your-old-domain.com', 'http://www.your-new-domain.com');

3) export the current database and download to the local directory.

4) upload all files and import the exported database, if the database information is different , make sure check it correctly in wp-config.php in the blog root directory.

5) open browser and visit the new blog address , and there is it.”

Edit: actually, it’s more complicated than this: you probably need to update option_value in wp_options, and a few other tables too, but using the above SQL as a guideline it is fairly easy to change the others.

Finally got to ‘excellent’ karma in slashdot!

May 9th, 2011

Finally got to ‘excellent’ karma in slashdot!

http://slashdot.org/~hughperkins

Running avidemux on ubuntu without overheating a notebook

May 4th, 2011

I ran avidemux for a bit last night, to bake in some sub-titles. The core temperatures reached 80 degrees, and then the fan turned on to full, which was really loud, and didn’t sound very good for the fan.

Two methods were able to reduce the temperature:
- the best method was to install the cpu frequency scaling applet, and assign both cores to ‘powersave’
-- this reduced the frequency from 1.3GHz to 1.2GHz, which doesn’t sound like a lot, but which had a huge effect on temperature
- it is also possible to use schedtool to assign an application to a single processor core
-- actually, this was a lot less effective than using the frequency scaling
-- syntax: schedtool -a 0x1 pid
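
As far as I understand schedtool’s -a option, the value in the command above is a CPU affinity bitmask: bit n set means the process may run on core n, so a mask of 1 means core 0 only. Here is a small sketch that decodes such a mask. The helper cores_in_mask is made up for illustration, and is not part of schedtool.

```cpp
#include <vector>

// Hypothetical helper, for illustration only: list the core indices
// selected by an affinity bitmask (bit n set means core n is allowed).
std::vector<int> cores_in_mask(unsigned long mask) {
   std::vector<int> cores;
   for (int core = 0; mask != 0; ++core, mask >>= 1) {
      if (mask & 1UL) {
         cores.push_back(core);
      }
   }
   return cores;
}
```

For example, cores_in_mask(0x1) gives just core 0, and cores_in_mask(0x3) gives cores 0 and 1, which on a dual-core notebook would be the same as not restricting the process at all.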

When I am at the computer, I will simply use the frequency scaling, since that reduces the temperature significantly. If I’m leaving avidemux to run unattended, I will use both frequency scaling and schedtool to make sure the computer doesn’t burn out whilst I’m not looking.

Simple samba configuration to let windows users see your files at home

May 3rd, 2011

I wanted a flat-mate to be able to copy a file off my computer. We didn’t have a usb key with enough free disk space.

I didn’t want to mess around with opening shares on their computer. I didn’t want to be responsible for a security hole on their windows computer.

So I looked at setting up samba. Took a while!

There is a tool to help set it up, which is ‘system-config-samba’. It does make life easier.

It was far from obvious to me how to set up a simple read-only share with guest access. This post solved it for me:

- change security to ‘share’: “security = share”
- create a unix account, but don’t configure it as a samba account
- assign it as the guest account: guest account = guestaccountname
- create a share
- add ‘guest ok = yes’ to the share configuration
- and ‘browseable = yes’, ‘read only = yes’

After restarting smbd and nmbd, the workgroup, server, and share should be visible on the network to windows users, and not need any sort of authentication to navigate into, or view the files inside.

Upgraded from ubuntu lynx to meerkat: couldn’t tell the difference!

May 3rd, 2011

I finally got around to upgrading from lynx to meerkat. I figured that since meerkat has been out for 6 months now, it should be reasonably stable. And certainly I haven’t seen any issues with it yet.

On the other hand I haven’t seen any changes either!
- java is still 1.6.0_20. OK, I suppose that makes sense; it’s not like java 1.7 is out yet :-P
- firefox is still 3.6
- python is still 2.6
- openoffice is still ‘openoffice’, not ‘libreoffice’
- blender is still 2.49a

Nothing seemed different really! Maybe because I did an upgrade, rather than a fresh install? Sometimes I notice that fresh installs change quite a lot more things than upgrades.

Baking subtitles into a video

May 3rd, 2011

I wanted to bake chinese subtitles into a video.

I tried mencoder, but couldn’t get it working, despite a bunch of googling.

I discovered that avidemux is available for ubuntu, and works quite well. It’s in the ubuntu repositories.

It has a gui, and it’s really easy to use.

It can handle cropping, subtitles, and other format changes, in a really user-friendly fashion.

To add subtitles, one first needs to change the video output format from ‘copy’ to … something else.

Then, one can click on ‘filters’ and add subtitles.

Gotchas for me:
- if one is using an .srt file, one might need to remove the first subtitle, the ‘0’ subtitle, if it exists, otherwise a message like ‘subtitle format unknown’ appears
- I needed to change the font to /usr/share/fonts/truetype/arphic/uming.ttc , otherwise the chinese characters showed up as white squares

That’s about it. I wish I’d known about it before. It’s so much easier to use than fiddling around with ffmpeg or mencoder command-lines.

Network traffic statistical analyzer

May 3rd, 2011

Just putting this here so I can find it easily in the future.

I wanted a way of seeing how much traffic was going from my computer to other hosts. Actually, a room-mate was downloading a file from my computer, and I wanted an easy way of monitoring the progress, and seeing when it was done.

iftop did exactly what I wanted.

Thank you to noah’s packet sniffing page for the heads-up.

For each host, you can see outgoing bandwidth, incoming bandwidth, and a bar chart of the relative bandwidth to each host. The figures are pretty much real-time, updating every second or two. I didn’t want to have to run tcpdump and then process a dump file; I just wanted a realtime display of bandwidth to each host, and iftop does exactly that.