Google and ‘The Dead Past’

Earlier this year we saw the launch of Google’s Street View system – here – and with it came a plethora of complaints about its implications for privacy.  I was one of the happy complainants – Google had a view right through my house, showing people inside.  To be fair, their reaction was swift and the imagery was removed, but it was an invasion of privacy and I’ve yet to be convinced that there is any long term gain to be had from the system.  Yes, I’m aware of all the ‘well, you can see what a neighbourhood’s like before buying a house there’ arguments, but if you do all the checking for the largest investment you’ll ever make on the Internet then you deserve to find yourself living between a crack den and a student house.

Enough…step back and breathe…the title of this piece is ‘Google and “The Dead Past”’ – now what on Earth do I mean by that?

Science Fiction aficionados amongst you may recognise part of the title as coming from an old story by Isaac Asimov, in which a researcher develops a time viewer to look into the past, only to eventually realise that the past starts a fraction of a second ago – for all practical, human purposes, the past, to his machine, is identical to the present.  He’s accidentally invented the world’s finest surveillance machine.  As a character says at the end of the story – ‘Happy goldfish bowl to you, to me, to everyone, and may each of you fry in hell forever.’

Now, there’s a looooong way to go between Google and eternal damnation through surveillance, but as is often pointed out, the road to Hell is paved with good intentions and always starts with a single step.  Let’s do some of that old style extrapolation, though, and see what we’ve got coming up in our future.  Here are a few things that have been posited and talked about as being part of our online future, some of which are already here, some of which are extrapolations, all of which are technically feasible, if not yet politically acceptable.

  1. Decreased latency between changes in the online world and those changes turning up in Search Engines.  At the moment we might expect a day or so even on busy sites regularly trawled by search engines – a possible future might be that items get folded into search space within hours.  We’re also already heading towards Tweets being searchable – perhaps future APIs will allow combined searches of Facebook, Twitter and general webspace all in one shot?
  2. Use of ‘mechanical turk’ approaches that encourage people to use their spare time to classify images, scan online video, etc., tagging media that search engines currently can’t search in their raw form.  Imagine that being done in near real-time.  DARPA are already researching tools to extract context out of text and digitised speech; perhaps some degree of automated scanning of video will follow.  And it’s not outlandish to suggest that what might be useful for the military will sooner or later find its way into civilian online life.
  3. The possibilities inherent in IP Version 6 for a massively enlarged Internet Protocol addressing space make it easier than ever to ensure that everything that can have a separate IP address will have a separate IP address.  Combine that with the geolocation capabilities that come with reduced cost GPS chip sets – many phones now have GPS built in – and the tracking of devices (and their owners) in real time or near real time, sold to us as extensions of the social media experience, becomes a reality.
  4. The increasing usage of ‘Cloud’ computing, where everything about you is stored not on your computer or phone but on a ‘cloud’ storage system run by your phone company (T-Mobile?), software supplier (Microsoft?) or media seller (Amazon?), puts all your digital life into the network – where it can be scanned and examined in transit or in storage.
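
To put a number on the ‘massively enlarged’ addressing space in point 3, here’s a quick back-of-an-envelope sketch in Python.  The address totals come straight from the protocol definitions (32-bit versus 128-bit addresses); the world population figure is just my own round-number assumption for illustration.

```python
import ipaddress

# Total addresses in each protocol's full address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128

# ASSUMPTION: a round figure for world population, for illustration only.
WORLD_POPULATION = 7_000_000_000


def addresses_per_person(total, population=WORLD_POPULATION):
    """Whole addresses available per person if shared out evenly."""
    return total // population


print(f"IPv4: {ipv4_total:,} addresses, "
      f"{ipv4_total / WORLD_POPULATION:.2f} per person")
print(f"IPv6: {ipv6_total:.3e} addresses, "
      f"{addresses_per_person(ipv6_total):.3e} per person")
```

With IPv4 there isn’t even one address each to go round; with IPv6 there are enough for every phone, camera and fridge you will ever own, which is the point.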

Add to the technical advances the willingness of people to share their activities via social media (or eventually the commoditisation of their activity patterns and media interests, as ISPs and phone companies realise that people will give up a lot of privacy for cheaper connectivity) and we are perhaps heading towards the science fiction scenario described above.

If people were concerned about the impact of Street View on their lives – a single snapshot taken as a one off – imagine the possible impact of your real-life world being captured as a mosaic by different sources and then being rendered and made searchable by interconnected search tools.  A phone call positions you in one place, photographs taken on the same phone and geo-tagged by the software are sent to a searchable social media site and so identify who you were with and when.  You show up in other photos,  as a recipient of a call from another phone, and so on.  The other evening I was asked ‘Who doesn’t want to be tagged in these photos?’ – the new social nicety for people who are concerned over the privacy of their friends.   Sooner or later I’m certain that nicety will slip by the wayside, and it will be up to us to police our own image online.

A recent business enterprise where people are being asked to monitor CCTV cameras in their spare time  – Internet Eyes – may be regarded as distastefully intrusive, but I do wonder whether it’s the start of a whole range of ‘mechanical turk’ type activities where people are encouraged to act as high-tech lace-curtain twitchers.  That past is not looking as dead anymore.

Are you feeling spied on yet?  If not, I’m sure you soon will be.

Death of a celebrity

This weekend the singer Stephen Gately died at his residence in Majorca.  At the time of writing, the cause of death is unknown but suicide, foul play and drug abuse are not being suggested.  I was provoked into making this post by the reaction to the death that I noticed from various friends and acquaintances, who took the death quite hard but who also commented on the ‘gallows humour’ and apparent indifference of some people to the fellow’s passing.

Mr Gately was clearly well loved by friends, family and fans.  I have to say that he meant little to me – a passing acquaintance with his name on the news – but unfortunately those who live as celebs must die as celebs, and part of that is the sick jokes marking their passing.  Since the widespread uptake of email, and especially since the web, this sort of humour has followed celebrity death as quickly and inexorably as paparazzi photographers and ambulance chasing lawyers.  Before electronic media, one at least had to wait for the jokes to appear in the newspapers / magazines or be passed from people who’d heard them from a friend who in turn heard them from a guy who knew the gardener of the dead celeb.

It’s rarely anything personal – it’s a coping mechanism.  Perhaps some of the milder jokes even provide the 21st Century version of marking the death of someone by printing the borders of the newspapers in black.  As some of you will know I was Admin on Sheffield Forum for a couple of years.  How to handle posted ‘dead person humour’ was an ongoing problem.  I used to apply the rule of 24 – within the first 24 hours it’s not nice – after that, it happens.  It may not be nice but it’s a byproduct of being in the celebrity food chain.  When you stop swimming in the media seas, your body sinks and the local bottom dwellers come and dismember the body, so to say….

One comment made stuck with me; imagine going to bed at 33 years old and not waking up.  When I was a kid I lost a friend who died at age 11.  As a younger man I lost a friend who died at 21.  Every morning in the developing world people in their 30s don’t wake up because they’ve died in the night of malnutrition, AIDS, malaria, cholera.  At the risk of sounding callous, I’m afraid that death is not the preserve of the poor, the sick, the elderly and the nobodies in the world.  It’s pretty catholic in its tastes and can strike out at anyone – not just people who immediately surround us, and those of our modern pantheon of celebrities that our media choose to inform us are worthy of dying publicly.  Don’t get me wrong; I’m not hypocritical enough to claim that I feel the death of total strangers in the developing world at all in my life – I don’t – but neither am I willing to go into serious grief over a celebrity who I didn’t know from Adam and who didn’t even know I personally existed, except as part of a demographic.

I’m willing to admit to being sad at the deaths of three celebs in particular – John Peel, Joe Strummer and Johnny Cash.  I grew up with their music playing an important part in my life to varying degrees, so can empathise with people who’ve felt the loss of Mr Gately as a figure in their musical upbringing – and especially those who’ve actually met the fellow.  Whilst we can all reflect on John Donne’s words, ‘never send to know for whom the bell tolls; it tolls for thee’, it’s worth also reflecting on whether your feelings are genuinely inspired by the death, or inspired by the media scrum surrounding the death suggesting how we should feel.

Meanwhile, back in June…..

Originally a Facebook Note, June 8th 2009, after the EU Elections….

The problem with democracy is that sometimes it allows people to vote for folks that you personally don’t want to gain any sort of power. Unfortunately, that’s democracy for you. She can be a total bitch. FWIW, I voted for the Socialist Party inspired ‘No2EU – yes to democracy’.

There is an old Latin saying – Ut sit magna, tamen certe lenta ira deorum est – the wrath of the gods may be great, but it is slow – that we can perhaps borrow, replacing gods with people. To everyone saying how disgusted they are with their regions and their countrymen, stating that people are idiots, etc., I ask them to think about the following.

From 2003 (beginning of the Iraq war) through to this weekend, the major parties in the UK have singularly managed to ignore or disregard the concerns and criticisms of voters. The Government has thundered on, ignoring calls for inquiries and ignoring the fears and concerns of voters on a number of issues, whilst keeping their “heads down in the pig bin, saying ‘Keep on digging’”, in the words of Pink Floyd. I’ve lost count of the number of people and groups who’ve asked the Government to reconsider their policies on things like immigration, ID cards and personal privacy, civil rights and freedom of expression in the UK. Philip Pullman put it better than I can… http://www.joep.communityhost.org.uk/?p=71

We’ve recently had the financial debacle and then the sight of MPs ignoring repeated requests for over 18 months to release details of their expenses.

And people wonder why voters voted the way they did? After the Government and major parties have acted with such hubris and contempt?

To be honest, we’re lucky this morning that we don’t have a handful of BNP and far-right groups holding seats all over Europe.

Voters used the only power left to them to get the attention of their leaders – the one thing that, as yet, New Labour haven’t removed from us. The power of the ballot. And when people used it they no doubt considered how they’d been ignored, taken for granted, treated as idiots and generally regarded as sheep who would quite happily vote for the famous ‘rosette on a dog’ representing the major parties.

Guess what – they didn’t. They said ‘Listen. We will not go this way again with you. You repeatedly ignore our concerns. You treat us as children and with contempt. Listen. We’re going to take the one course of action that will get your attention. We will vote for the parties that you and all those with a vested interest in the current system don’t want us to vote for’.

And that’s what they’ve done. Vox populi – the voice of the people. Keep on ignoring that voice – so apparently quiet in Westminster and Islington and in the inner circles of New Labour and the other major parties – and this will keep happening.

My final question – what are WE going to do about it? Many of you will know I’m a Libertarian – I believe in small government, and maximum involvement of the people in that governance. Wearing badges and shouting slogans and signing petitions is not enough. Wherever you live there are going to be issues in YOUR community that need tackling – social and political issues that, left to fester for long enough, will provide more grist to the mill of those on the extreme right and left who want to remove freedoms from us all.

Get out there and start fixing YOUR community and YOUR society. Listen to the people who’ve said ‘Enough’s enough.’ Work with them to address those issues that they’re concerned about and maybe, just maybe, we can collectively remind all politicians that they’re there because we give them the permission to be there.

If you need to blame anyone, need to feel ashamed or disgusted with anyone – just look to Westminster. Briefly – don’t dwell on it. Then look back to wherever you are and start fixing this mess.

Wolfram Alpha – released too early or over-hyped?

In case you’re saying, “Wolfram what?”, here’s a little reading:

http://www.wolframalpha.com/

http://www.bbc.co.uk/blogs/technology/2009/05/does_wolfram_work.html

http://news.bbc.co.uk/1/hi/technology/8052798.stm

http://www.guardian.co.uk/news/blog/2009/may/18/wolfram-review-test-google-search

http://www.theregister.co.uk/2009/05/19/dziuba_wolfram/

http://www.theregister.co.uk/2009/03/17/wolfram_alpha/

http://www.theregister.co.uk/2009/05/18/wolfram_alpha/

 

OK – I’ll start by announcing a vested interest here.  I occasionally write software that attempts to make sense out of straight English questions and phrases, and then by cunning trickery makes the response from the program appear ‘sensible’ as well.  So I know something about how to make software appear smarter than it actually is.  And I’m afraid that at first glance I regard Wolfram Alpha as over-hyped, under-delivering and pretty much unsure of its position in the world.

But, the folks at Wolfram Research score highly for getting the coverage they’ve managed!

WA is described as a Computational Knowledge Engine, rather than a search engine.  However, its raison d’être is to answer questions, and nowadays any piece of software on the internet that does that is always going to be regarded by users as some sort of search engine, and the ‘Gold Standard’ against which all search engines tend to be judged is Google.  So, first question…

Is it fair to compare WA and Google?

Not really, and Wolfram himself acknowledges this.  WA is regarded by the company as a means of getting information out of the raw data to be found on the Web, and it does this by having what’s called ‘curated’ data – that is, Wolfram’s team manage the sources used for the data and also the presentation of the data.  This makes it very good at returning solid factual and mathematically oriented data in a human readable form.

Whereas Google will return you a list of pages that may be useful, WA will return data structured into a useful looking page of facts – no links, just the facts.  And a list of sources used to derive the information. The results displayed are said to be ‘computed’ by Wolfram Research, rather than just listed as is the case with a search engine.

Is it a dead end?

WA relies on curated data – that is, a massaging and manipulation process to get the existing web data into a format that is searchable by the WA algorithms and that is then also presentable in a suitable format for review.  This is likely to be a relatively labour intensive process.  Let’s see why…

In a perfect world, all web data would be tagged with ‘semantic tagging’ – basically additional information that makes the meaning of a web page more explicit.  Google, for all its cleverness, doesn’t have any idea about the meaning of web page content – just how well or poorly it’s connected to other web pages and what words and phrases appear within the page.  They do apply a bit of ‘secret sauce’ to attempt to get the results of your search closer to what you really want, assuming you want roughly the same as others who’ve searched the Google search space for the same thing.  Semantic tagging would allow a suitably written search engine to start building relationships between web pages based on real meaning.  Now, you might just see the start of a problem here…..

If a machine can’t derive meaning from a web page, then the Semantic tagging is going to have to be human driven.  So for such a tool to be useful we need to have some way of ensuring as much web data as possible would be tagged.  Or, start from tomorrow and say that every new page should be tagged, and write off the previous decade of web content.  You see the problem.
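
To make the difference concrete, here’s a toy sketch in Python.  Everything in it is made up for illustration – the Page class, the ‘topic’ tag and the example.org pages are my own assumptions, not how Google, WA or any real semantic web standard actually works.  It simply shows keyword matching next to matching on a human-supplied tag.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    text: str
    # Hypothetical human-supplied semantic tags, e.g. {"topic": "animal"}.
    tags: dict = field(default_factory=dict)


# Made-up pages: two tagged by a human, one older page left untagged.
PAGES = [
    Page("example.org/cats", "The jaguar hunts at night", {"topic": "animal"}),
    Page("example.org/cars", "The Jaguar has a V8 engine", {"topic": "car"}),
    Page("example.org/old", "My jaguar story"),
]


def keyword_search(pages, word):
    """Match on words alone: no idea what the page is actually about."""
    word = word.lower()
    return [p.url for p in pages if word in p.text.lower().split()]


def semantic_search(pages, word, topic):
    """Match on the word AND on the tagged meaning of the page."""
    word = word.lower()
    return [
        p.url
        for p in pages
        if word in p.text.lower().split() and p.tags.get("topic") == topic
    ]


print(keyword_search(PAGES, "jaguar"))             # all three pages
print(semantic_search(PAGES, "jaguar", "animal"))  # only the tagged animal page
```

Note what happens to the untagged page: the semantic search never finds it, whatever it’s about.  That’s the ‘write off the previous decade of web content’ problem in miniature.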

What the WA team have done is taken a set of data from the web, and massaged and standardised it into a format that their software can handle, then front-ended this system with a piece of software that makes a good stab at natural language processing to get the meaning of your question out of your phrase.  For example, typing in ‘Compare the weather in the UK and USA’ might cause the system to assume that you want comparative weather statistics for those two countries.  (BTW – it doesn’t, more on this later)

The bottom line here is that the data set has had to be manually created – something that is clearly not possible on a regular basis.  And a similar process would have to be carried out to get things semantically tagged.  And if we COULD come up with a piece of software that could do the semantic analysis of any piece of text on the web, then neither of these approaches would be needed anyway.

In a way, WA is a clever sleight of hand; but ultimately it’s a dead end that could potentially swallow up a lot of valuable effort.

Is it any good?

The million dollar question.  Back to my ‘Compare the weather in the UK and USA’ question.  The reason I picked this was that WA is supposed to have a front end capable of some understanding of the question, and weather data is amongst the curated data set.  I got a ‘Wolfram|Alpha isn’t sure what to do with your input’ response.  So, I simplified and gave WA: “Compare rainfall london washington” – same response.  I then went to Google and entered the same search.  And at the bottom of Page 1 found a link: http://www.skyscrapercity.com/showthread.php?t=349393 that had the figures of interest.  Now, and before anyone starts on me, I appreciate that the data that would have been provided by WA would have been checked and so would be accurate.  But I deliberately put a question to WA that I expected it should be able to answer if it was living up to the hype.

I then gave WA ‘rainfall london’ as a search and got some general information (not a lot) about London.  Giving ‘rainfall london’ to Google, I found links to little graphs coming out of my ears.  A similar search on ‘rainfall washington’ in Google gave me similar links to data on Washington rainfall.

WA failed the test, I’m afraid. 

Will it get better?

The smartness of any search tool depends upon the data and the algorithms.  As we’re relying on curated data here, improvements might come through modifications to data, but that might require considerable effort.  If the algorithms are ‘adaptive’ – i.e. they can learn whether answers they gave were good or bad – then there might be hope.  This would rely on a feedback mechanism from searchers to the software, basically saying ‘Yes’ or ‘No’.  If the algorithms have to be hand crafted – improvement is likely BUT there is the risk of over-fitting the algorithms to suit the questions that people have asked – not the general run of what MAY be asked.
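
For what it’s worth, the ‘Yes’ or ‘No’ feedback idea can be sketched in a few lines of Python.  This is purely my own toy illustration – I have no idea what Wolfram Research actually do, and the FeedbackRanker class and the example answers are invented for the purpose.

```python
from collections import defaultdict


class FeedbackRanker:
    """Toy 'adaptive' ranker: reorders answers using yes/no votes."""

    def __init__(self):
        # (query, answer) -> net votes: +1 per 'Yes', -1 per 'No'.
        self.scores = defaultdict(int)

    def record(self, query, answer, helpful):
        """Store one searcher's verdict on one answer to one query."""
        self.scores[(query, answer)] += 1 if helpful else -1

    def rank(self, query, candidates):
        """Best-voted answers first; ties keep their original order."""
        return sorted(
            candidates,
            key=lambda answer: self.scores.get((query, answer), 0),
            reverse=True,
        )


ranker = FeedbackRanker()
ranker.record("rainfall london", "general facts about London", helpful=False)
ranker.record("rainfall london", "monthly rainfall chart", helpful=True)

print(ranker.rank(
    "rainfall london",
    ["general facts about London", "monthly rainfall chart", "wiki link"],
))
```

Notice that the votes are stored against the exact query that was asked.  That’s the over-fitting risk in a nutshell: the tool gets better at the questions people have already asked, and learns nothing about the ones they haven’t.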

And time passes…

As it turned out, this post never moved from ‘Draft’ to ‘Published’ because of that thing called ‘Life’.  So, a month or two have passed, and I’ve decided to return to Wolfram Alpha and see what’s changed….

Given the current interest in the band Boyzone, I did a quick search.  WA pointed me to a Wiki entry – good – but nothing else.  Google pointed me to stacks of stuff.  Average rainfall in London got me some useful information about rainfall in the last week.  OK….back to one of my original questions ‘Compare rainfall London Washington’ – this time I got the London data with the Washington equivalent on it as well – sort of what I wanted.  Google was less helpful this time than back when I wrote this piece.

So…am I more impressed?  Maybe a little.  Do I feel it’s a dead end?  Probably, yes, except in very specific areas that might already be served by things like Google and Wiki anyway.

Do I have an alternative solution for the problem?

If I did, do you think I’d blog it here and expose myself to all that criticism? 🙂