Are you a ‘but’ man?

I was reminded earlier today, whilst reading a book called ‘Life 101’, of a useful piece of advice from one of the more under-rated personal development gurus of the mid 20th Century – Sergeant Ernest Bilko of the United States Army. Let’s listen to what he has to say on the topic of a three letter word…

You said, “but.” I’ve put my finger on the whole trouble. You’re a “but” man. Don’t say, “but.” That little word “but” is the difference between success and failure. Henry Ford said, “I’m going to invent the automobile,” and Arthur T. Flanken said, “But . . .”

And so it was, according to Bilko, that Ford remains in history whilst Flanken doesn’t even make the footnotes.

‘But’ is indeed one of the words in the English language that fills me with trepidation.  During my years in consulting, hearing someone agree with what you were proposing, and then adding the word ‘…but’ (complete with pause) to the end of a sentence was the equivalent of telling me that I was as likely to get cooperation as I was to win the Nobel Prize for Physics and Literature in the same year.

There are some occasions when it’s valuable to pull someone up short before they thunder off and implement some plan or other that at best can be described as ‘unwise’. And there are times when the use of ‘but’ can provide a useful reminder for folks that their master plan requires a few tweaks before it will work properly. But often ‘but’ is used as a prelude to a road-block.

Rather than ‘but’, I now try to use ‘and’ or ‘or’ – then rephrase the part of the sentence after the old ‘but’ to look towards solutions. For example:

I’d like to buy a new computer, but it costs too much.

becomes

I’d like to buy a new computer, and in order to give me time to save the extra money, I’ll put the purchase off for a month and see if I can do some overtime in the meantime to help raise the extra cash.

The first sentence becomes, in the but-less second sentence, an intention with a timescale and a partial solution to the problem of money.  As the guys at Honda say, ‘and’ is a great little word – it opens up opportunities for solutions, rather than closing things down.

Don’t be a but-nik!

The Brass Neck of Ordnance Survey

One of my interests is in GIS systems – Geographical Information Systems – and other aspects of computerised and online mapping. Thanks to Google Maps, it’s been possible for developers to create map-driven applications for nothing – Google allows access to their mapping infrastructure free for many applications, and it’s brilliant. To anyone who hasn’t taken a look or had a play, have a look at Google Maps and for you programming types out there, take a look at the Google Maps API.

Now, what really peeves me as a UK citizen is that our own Ordnance Survey – the folks who make maps – haven’t got any facility for getting hold of mapping data free of charge.  I am aware of a rather scrappy ‘trial set’ of data that is available for use with GIS systems, but honestly – the OS was traditionally funded by the UK Government and it is only in recent years that it has been spun off.  It should not be beyond the capabilities of the current Government – who’ve always whined about innovation and creativity being a driving force of British business – and the OS to make available a system similar to the Google Maps one using UK Centric OS data, at negligible cost to software developers and end users, to actually make it easier for the development of geographically based applications on the Web, on the Mobile Internet and on our desktops.

But it hasn’t happened yet.  And this morning I find out about the ‘Geovation’ project – a project to attempt to generate innovative ideas based on the use of geographical data and concepts.  Hey, it’s supported by the OS!  I can see nothing on the site that suggests that there’s any OS data available to play with – indeed I think the only data set mentioned is Google Maps!

To be honest, this is shaping up to be an astonishing lost opportunity for the Ordnance Survey – they could have leveraged this project by making data or even some sort of API available at a reasonable cost for small businesses  or zero cost for non-commercial development and research.  It doesn’t look like it’s going to happen – I get the impression they’re going to lurk around picking up good ideas from people and then take them back and see what money they can make from them.

I may be wrong on all counts – I genuinely and sincerely hope I am, and that there is a nice, cheap, API and full UK dataset out there waiting to support companies and individuals looking at the Geovation Challenge.  Why do I think there isn’t, though?

Wolfram Alpha – how not to make friends and influence people!

Hmmm…this is becoming WA corner recently – take a look at my previous piece here.  I was less than impressed with the technology and considered it either over-hyped or released too early in to the world.  However, I did hope that as time progressed there might be improvements in the results set returned and, more importantly from a developer point of view, an API published that would allow developers to build new applications to stretch and maybe improve WA.

So, this week an API was announced for Wolfram Alpha on the company’s Blog and I was pretty excited about the prospect of trying out a few things. Despite my grumbles about the results returned, I was hopeful that with a suitable API encouraging third party developments, the underlying technology and data sets at WA might see an improvement. My hopes survived for as long as it took me to start reading the small print – in particular this little document, the price list. Now, I’m aware that WA has cost money to develop, but to charge developers to make use of the API seems to be one of the dumbest and most counter-productive things they could do. There are some ‘pioneer grants’ available for the developers, but I get the impression that these are still likely to involve shelling out money.

Google do not charge developers for use of the API until you start using the API in ‘closed’ systems and with a large number of calls.  They certainly don’t charge you during the development cycle – they have more sense.

Now, let’s assume I wanted to develop an API based application for WA – what we in the trade call a ‘proof of concept’ model – i.e. something that proves whether or not the bright idea that we sketched out on the back of a beer-mat in the pub will actually work. How many requests might I get through to develop such an application? Well, the other day I wrote some code to retrieve data from a Postcode / Geocode system’s API. Now, this was a VERY simple application – send a Postcode, retrieve a list of addresses, send a code number, retrieve a full street address with map reference. Let’s say 2 calls to the remote API for something very straightforward. During code development and ‘in house’ testing I made about 30 or 40 API calls. Now, during more formal testing on the client site that’s going to increase somewhat – probably in to the low hundreds. And this is for a problem with a well defined structure, with a finite returnable answer set – i.e. lists of addresses, a single address or nothing at all, all in a set, predictable format.
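To make the two-call pattern concrete, here is a minimal sketch of that lookup with a call counter bolted on, which is a cheap way of finding out how many requests a development session really burns. Everything in it is an assumption for illustration: the class name, the method names and the canned data are made up, and the ‘remote API’ is just a dictionary standing in for the real service.

```python
from dataclasses import dataclass, field


@dataclass
class CountingPostcodeClient:
    """Stand-in for a remote postcode API that tallies every call made."""
    calls: int = 0
    # Hypothetical canned data: postcode -> {address id: (address, map ref)}
    _data: dict = field(default_factory=lambda: {
        "AB1 2CD": {
            101: ("1 Example Street, Exampletown", "SK 123 456"),
            102: ("2 Example Street, Exampletown", "SK 123 457"),
        }
    })

    def list_addresses(self, postcode):
        """Call 1: send a postcode, get back (id, address) pairs."""
        self.calls += 1
        matches = self._data.get(postcode, {})
        return [(aid, addr) for aid, (addr, _ref) in matches.items()]

    def get_address(self, postcode, address_id):
        """Call 2: send a code number, get the full address and map reference."""
        self.calls += 1
        return self._data.get(postcode, {}).get(address_id)


client = CountingPostcodeClient()
candidates = client.list_addresses("AB1 2CD")
full = client.get_address("AB1 2CD", candidates[0][0])
print(full, "- API calls so far:", client.calls)
```

Wrap the real client in the same way and the counter tells you, at the end of a day’s debugging, what that day would have cost on a pay-per-request API.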

By the very nature of the sort of problem that WA has been set up to deal with, the problems passed up via an API are unlikely to be as well defined and the results set returned is also unlikely to be as simple to deal with as my addresses. When I did some API work with Google for a client I found I was generating hundreds of API calls and responses during development, let alone testing. For WA, I’m looking at $60 for 1000 API requests, and $0.08 for each additional request beyond the thousand I initially pay for. Obviously, I can buy a bigger bundle, but the inference is clear – it ain’t gonna be cheap developing for the WA API.
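For anyone who wants to plug their own numbers in, the arithmetic can be sketched as below. The function name is made up, and it assumes the pricing works exactly as quoted above: $60 up front for the first 1000 requests whether you use them or not, then $0.08 per request after that. It works in cents to keep the sums exact.

```python
def wa_api_cost(requests, bundle_size=1000, bundle_cents=6000, overage_cents=8):
    """Cost in dollars for a number of requests, under the pricing quoted above.

    Assumes the bundle is paid in full even if fewer requests are used.
    """
    extra = max(0, requests - bundle_size)
    return (bundle_cents + extra * overage_cents) / 100


for n in (300, 1000, 1500):
    total = wa_api_cost(n)
    print(f"{n} requests: ${total:.2f} (${total / n:.3f} per request)")
```

A few hundred calls for a proof of concept still costs the full $60, and a development cycle that runs to 1500 calls comes to $100.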

API developments typically involve a learning curve for the API syntax and methods of use.  This is par for the course and to be expected.  However, when the API is interfacing to a curated data set like WA, we have an additional problem of whether the data set will actually contain the sort of data that we’re wanting to get back.  And whether it will be available in the sort of format we’re interested in.  And whether the curated data is timely compared to the data that is being made available through non-curated data sets like those available via Google – or other APIs, for that matter.  Clearly, if your problem space IS covered by WA and the data set WA has available contains what you want in the format in which you want it, then perhaps the API fee is worthwhile.  But for those developers wanting to try something new out, they’re most likely to look to free APIs to test their ideas, and spend time and energy working the wrinkles out in an environment that isn’t costing them pennies for the simplest query.

I’m afraid WA have dropped the ball big time here; by charging for ALL development use of the API they’ve alienated a large source of free development and critical expertise.  Look at how Google has benefited from the sheer number of developers doing ‘stuff’ with their various APIs.  Can you imagine that happening had they charged all the way?  Hardly likely. 

If WA were to make a limited ‘sandbox’ set of data available for developers via a free of charge API, that would at least allow the developers to get the wrinkles out of their code. The company could then charge for use of the ‘live’ WA datasets, and would have the additional advantage of the code being run against the live system being reasonably bug free. By charging from the first line of code written, they’re restricting the development of their own product and driving people in to the arms of Google, Amazon, Bing and the like. WA doesn’t appear to be offering a lot that is truly revolutionary; just a so-so natural language query interface against a curated data set. I doubt it will be long before third party developers start producing the same from Google.

Wolfram Alpha – too early released or over-hyped?

In case you’re saying, “Wolfram what?”, here’s a little reading:

http://www.wolframalpha.com/

http://www.bbc.co.uk/blogs/technology/2009/05/does_wolfram_work.html

http://news.bbc.co.uk/1/hi/technology/8052798.stm

http://www.guardian.co.uk/news/blog/2009/may/18/wolfram-review-test-google-search

http://www.theregister.co.uk/2009/05/19/dziuba_wolfram/

http://www.theregister.co.uk/2009/03/17/wolfram_alpha/

http://www.theregister.co.uk/2009/05/18/wolfram_alpha/

 

OK – I’ll start by announcing a vested interest here. I occasionally write software that attempts to make sense out of straight English questions and phrases, and then by cunning trickery makes the response from the program appear ‘sensible’ as well. So I know something about how to make software appear smarter than it actually is. And I’m afraid that at first glance I regard Wolfram Alpha as over-hyped, under-delivering and pretty much unsure of its position in the world.

But, the folks at Wolfram Research score highly for getting the coverage they’ve managed!

WA is described as a Computational Knowledge Engine, rather than a search engine. However, its raison d’être is to answer questions, and nowadays any piece of software on the internet that does that is always going to be regarded by users as some sort of search engine, and the ‘Gold Standard’ against which all search engines tend to be judged is Google. So, first question…

Is it fair to compare WA and Google?

Not really, and Wolfram himself acknowledges this. WA is regarded by the company as a means of getting information out of the raw data to be found on the Web, and it does this by having what’s called ‘curated’ data – that is, Wolfram’s team manage the sources used for the data and also the presentation of the data. This makes it very good at returning solid factual and mathematically oriented data in a human readable form.

Whereas Google will return you a list of pages that may be useful, WA will return data structured in to a useful looking page of facts – no links, just the facts. And a list of sources used to derive the information. The results displayed are said to be ‘computed’ by Wolfram Research, rather than just listed as is the case with a search engine.

Is it a dead end?

WA relies on curated data – that is, a massaging and manipulation process to get the existing web data in to a format that is searchable by the WA algorithms and that is then also presentable in a suitable format for review. This is likely to be a relatively labour intensive process. Let’s see why…

In a perfect world, all web data would be tagged with ‘semantic tagging’ – basically additional information that allows the meaning of a web page to be more explicitly obvious. Google, for all its cleverness, doesn’t have any idea about the meaning of web page content – just how well or poorly it’s connected to other web pages and what words and phrases appear within the page. They do apply a bit of ‘secret sauce’ to attempt to get the results of your search closer to what you really want, assuming you want roughly the same as others who’ve searched the Google search space for the same thing. Semantic tagging would allow a suitably written search engine to start building relationships between web pages based on real meaning. Now, you might just see the start of a problem here…..

If a machine can’t derive meaning from a web page, then the Semantic tagging is going to have to be human driven.  So for such a tool to be useful we need to have some way of ensuring as much web data as possible would be tagged.  Or, start from tomorrow and say that every new page should be tagged, and write off the previous decade of web content.  You see the problem.

What the WA team have done is taken a set of data from the web, and massaged and standardised it in to a format that their software can handle, then front-ended this system with a piece of software that makes a good stab at natural language processing to get the meaning of your question out of your phrase. For example, typing in ‘Compare the weather in the UK and USA’ might cause the system to assume that you want comparative weather statistics for those two countries. (BTW – it doesn’t, more on this later)

The bottom line here is that the data set has had to be manually created – something that is clearly not possible on a regular basis. And a similar process would have to be carried out to get things semantically tagged. And if we COULD come up with a piece of software that could do the semantic analysis of any piece of text on the web, then neither of these approaches would be needed anyway.

In a way, WA is a clever sleight of hand; but ultimately it’s a dead end that could potentially swallow up a lot of valuable effort.

Is it any good?

The million dollar question. Back to my ‘Compare the weather in the UK and US’ question. The reason I picked this was that WA is supposed to have a front end capable of some understanding of the question, and weather data is amongst the curated data set. I got a ‘Wolfram|Alpha isn’t sure what to do with your input.’ response. So, I simplified and gave WA: “Compare rainfall london washington” – same response. I then went to Google and entered the same search. And at the bottom of Page 1 found a link: http://www.skyscrapercity.com/showthread.php?t=349393 that had the figures of interest. Now, and before anyone starts on me, I appreciate that the data that would have been provided by WA would have been checked and so would be accurate. But I deliberately put a question to WA that I expected it should be able to answer if it was living up to the hype.

I then gave WA ‘rainfall london’ as a search and got some general information (not a lot) about London. Giving ‘rainfall london’ to Google, I found links to little graphs coming out of my ears. A similar search on ‘rainfall washington’ in Google gave me similar links to data on Washington rainfall.

WA failed the test, I’m afraid. 

Will it get better?

The smartness of any search tool depends upon the data and the algorithms. As we’re relying on curated data here, then improvements might come through modifications to data, but that might require considerable effort. If the algorithms are ‘adaptive’ – i.e. they can learn whether answers they gave were good or bad – then there might be hope. This would rely on a feedback mechanism from searchers to the software, basically saying ‘Yes’ or ‘No’. If the algorithms have to be hand crafted, improvement is likely BUT there is the risk of over-fitting the algorithms to suit the questions that people have asked – not the general space of what MAY be asked.
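As a rough idea of what the simplest possible ‘Yes’ / ‘No’ feedback loop might look like, here is a toy sketch. It is purely illustrative and makes no claim about how WA works internally: the class, its method names and the smoothing formula are all my own assumptions. Each candidate answer to a query collects votes, and the one with the best smoothed approval rate wins.

```python
from collections import defaultdict


class FeedbackScorer:
    """Toy yes/no feedback tally; not a description of any real engine."""

    def __init__(self):
        # (query, answer) -> [yes votes, no votes]
        self.votes = defaultdict(lambda: [0, 0])

    def record(self, query, answer, helpful):
        """Store one searcher's verdict on an answer."""
        self.votes[(query, answer)][0 if helpful else 1] += 1

    def score(self, query, answer):
        """Smoothed approval rate; an answer with no votes scores 0.5."""
        yes, no = self.votes[(query, answer)]
        return (yes + 1) / (yes + no + 2)

    def best(self, query, candidates):
        """Pick the candidate answer with the highest score."""
        return max(candidates, key=lambda a: self.score(query, a))


scorer = FeedbackScorer()
query = "compare rainfall london washington"
scorer.record(query, "rainfall table", helpful=True)
scorer.record(query, "rainfall table", helpful=True)
scorer.record(query, "general facts about London", helpful=False)
print(scorer.best(query, ["rainfall table", "general facts about London"]))
```

Note that this only learns about questions people have actually asked, which is exactly the over-fitting worry: it says nothing about the question nobody has typed in yet.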

And time passes…

As it turned out, this post never moved from ‘Draft’ to ‘Published’ because of that thing called ‘Life’.  So, a month or two have passed, and I’ve decided to return to Wolfram Alpha and see what’s changed….

Given the current interest in the band Boyzone, I did a quick search.  WA pointed me to a Wiki entry – good – but nothing else.  Google pointed me to stacks of stuff.  Average rainfall in London got me some useful information about rainfall in the last week.  OK….back to one of my original questions ‘Compare rainfall London Washington’ – this time I got the London data with the Washington equivalent on it as well – sort of what I wanted.  Google was less helpful this time than back when I wrote this piece.

So…am I more impressed? Maybe a little. Do I feel it’s a dead end? Probably, yes, except in very specific areas that might already be served by things like Google and Wiki anyway.

Do I have an alternative solution for the problem?

If I did, do you think I’d blog it here and expose myself to all that criticism? 🙂

You pays peanuts…..

And you get monkeys.

I assume most of us have heard this phrase. It’s become almost a mantra with me in my professional life because the last 6 months have exposed me to an interesting aspect of the freelance world that I’ve not been aware of until now; the fact that there are a Hell of a lot of people out there expecting a lot of work for next to nothing!

Allow me to elaborate…I get most of my work through ‘word of mouth’ – this has always been the way and after 20 odd years in IT it seems to have worked well. But I still like to chase the odd new client – after all, nothing wilts faster than laurels that have been sat on, as they say. In many ways, the availability of Internet web sites that allow people wishing work to be done to advertise their requirements for people like me to pick up the jobs should have made things easier, but it hasn’t.

In fact, I’m beginning to regard such sites as one of the worst things that has happened to ‘professional’ freelancers and contractors, because they have totally distorted the market. Don’t get me wrong; I’m a firm believer in market forces but these sites are actually pushing the markets for freelance development work to the brink of extinction. And this isn’t going to be a rant about out-sourcing…

My concern is that people are posting requests for work like the following:

“Develop a highly interactive and very aesthetic media review website. A good example is Yahoo! TV. The site is going to cater for commercial considerations i.e web ads. Want a site that would load fast as well.
Hence, beautiful but efficient. Must do the job. “

This is a real advert, tweaked for punctuation and spelling in two places. Now – this isn’t a hobby site, it’s not a charity. The poster is open about the fact that there will be advertising and that the site will be catering for ‘commercial considerations’. That’s the full ‘job brief’ against which people are expected to bid, by the way. Now, let’s assume that we can put something together like the Yahoo TV site, and ignore the content and imagery side of things for now. It’s got forums, photo galleries, all sorts of cute stuff. I wouldn’t even want to try tackling it – a wise man knows his limitations, after all. But I can guess the sort of development time – you’re looking at a minimum of 2-3 man-months here, I’d estimate.

And the suggested budget?  £250.  Yes, Two Hundred and Fifty Pounds.  No missing zeroes.

I cannot imagine the most desperate out sourcer being willing to work for that sort of money, let alone a programmer in the UK, US or Europe.

Oddly enough I came across this today:

http://technology.timesonline.co.uk/tol/news/tech_and_web/the_web/article5483244.ece?token=null&offset=0&page=1

An article in the Times dealing with Amazon’s ‘Turk’ project, which harnesses the available time of people to do online jobs of various sorts. Where you might be expected to work for a couple of pence an hour, if that.

Digital exploitation? You betcha. There are projects that rely on the good nature of people to get things done – projects where the bottom line is a better, publicly and freely available service, rather than profits to corporations who can already dictate terms to much of the online world.

Some years ago I was involved in film making and there was a very rich culture of ‘No-budget’ filming, where productions were put together with no budget except for the essentials of film stock or tape – everything else was borrowed, begged or blagged.  But part of the contract was that anyone involved would get a copy of the material for their own portfolio and an on-screen credit – ‘Credit and VHS’ – as well as being fed and watered on set.  This model could, of course, be exploited but rarely was, because the world of film making was relatively insular and someone pulling a fast one would immediately find it difficult to crew-up next time around.

Perhaps we need to start being similarly watchful in the information marketplace?