ASP.NET and Oracle – how to stay sane!

I’m currently doing some development work using ASP.NET against an Oracle database.  I have to say that I’ve had more frustrating development experiences, but most of those involved mainframe computers or…oh yes….Visual BASIC 6.0 against Oracle.  Just what is it about Oracle and Microsoft?  Gah!

Anyway – rant over.  In this piece I’d like to share a few useful tips for developing with ASP.NET and Oracle if you’re used to developing with ASP.NET and SQL Server.  There’s nothing magic here, and I’m no expert, but hopefully these pointers might assist anyone else in the position that I’ve found myself in!

Identity Fields

One thing that looks missing from Oracle at first glance is the 'Identity' field that is often used as a primary key field in SQL Server.  It IS possible to implement this in Oracle – one has to use what's called a 'Sequence' and either include a trigger on the ID field of the table so the sequence number is added automatically, or remember to add it via the INSERT command:

CREATE SEQUENCE table_seq
    MINVALUE 1
    MAXVALUE 999999999999999999999999999
    START WITH 1
    INCREMENT BY 1
    CACHE 20;

This generates a sequence called table_seq, starting at 1, incrementing by 1 each time, and going up to a VERY large number!  The CACHE 20 line tells Oracle to pre-allocate 20 sequence values in memory, which speeds up access to the sequence.  To use this sequence after creation, you can access it via an INSERT command as follows:

INSERT INTO datatable
(id, name)
VALUES
(table_seq.nextval, 'Joe Pritchard');

The 'id' field is the PK field of the table, and table_seq.nextval gets the next value from the sequence.  To create a truly 'auto-incrementing' PK field, you create a trigger on the table:

CREATE TRIGGER datatable_trigger
BEFORE INSERT ON datatable
FOR EACH ROW
BEGIN
    SELECT table_seq.nextval INTO :new.id FROM dual;
END;

Run this and then you can add a new row to the table without specifying the id field:

INSERT INTO datatable
(name)
VALUES
( 'Joe Pritchard');

Boolean Fields

Oracle doesn't support them.  The best approach I've found is to use an integer field (NUMBER(1) does the job) and treat 0 as false and 1 as true.  This then works well with ASP.NET checkboxes.  For example:

<asp:TemplateField HeaderText="Is Admin." SortExpression="IsAdministrator">
  <ItemTemplate>
    <asp:CheckBox runat="server" ID="IsAdministrator" Checked='<%# Bind("IsAdministrator") %>' />
  </ItemTemplate>
</asp:TemplateField>
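In code-behind, the same trick is just an explicit conversion between the 0/1 value and a bool.  Here's a minimal sketch – the column name "IsAdministrator" matches the markup above, while the checkbox name in the final comment is purely hypothetical:

using System;

public static class OracleBool
{
    // Oracle hands the 0/1 flag back as a numeric type (usually decimal),
    // so convert explicitly rather than casting straight to bool.
    public static bool ToBool(object value)
    {
        return value != DBNull.Value && Convert.ToInt32(value) != 0;
    }

    // Store true as 1 and false as 0 when writing back.
    public static int FromBool(bool value)
    {
        return value ? 1 : 0;
    }
}

// e.g. chkIsAdmin.Checked = OracleBool.ToBool(row["IsAdministrator"]);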

Don't forget the ProviderName

When setting up a SqlDataSource control, don't forget to specify the ProviderName as well as the ConnectionString.  If you forget, the error message you get is not exactly meaningful at first glance, referring as it does to Unicode!

 

<asp:SqlDataSource ID="SqlDataSource1" runat="server"
    ConnectionString="<%$ ConnectionStrings:ConnectionString %>"
    ProviderName="<%$ ConnectionStrings:ConnectionString.ProviderName %>" />

This also requires you to specify the providerName on the connection string entry in web.config:

<add name="ConnectionString"
     connectionString="Data Source=www.myserver.co.uk;User ID=jp;Password=test;Unicode=True"
     providerName="System.Data.OracleClient" />
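If you ever open the connection yourself in code-behind rather than through a SqlDataSource, the providerName is what tells ADO.NET which client library to use.  Here's a minimal sketch, assuming the web.config entry above (the class and method names are just for illustration):

using System.Configuration;
using System.Data.Common;

public static class OracleConnectionDemo
{
    public static void Open()
    {
        // Read the named entry from web.config, including its providerName.
        ConnectionStringSettings settings =
            ConfigurationManager.ConnectionStrings["ConnectionString"];

        // "System.Data.OracleClient" selects the Oracle provider factory -
        // omit the providerName and ADO.NET falls back to the SQL Server client.
        DbProviderFactory factory =
            DbProviderFactories.GetFactory(settings.ProviderName);

        using (DbConnection conn = factory.CreateConnection())
        {
            conn.ConnectionString = settings.ConnectionString;
            conn.Open();
            // ... run commands against Oracle here ...
        }
    }
}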

Watch table and field name lengths

This can be extremely frustrating.  And I mean extremely!  If you are likely to find yourself explicitly specifying the table name and the field name in a SELECT statement, for example, then the combined length MUST NOT exceed 30 characters (this includes the '.' separating table and field) – so keep table and field names as short as is practicable.

Quote marks around table and field names

When putting SQL statements together for use by SqlDataSource or other ASP.NET controls that use the OracleClient provider, don't forget to surround the Oracle field and table names with quotation marks (Oracle treats unquoted identifiers as upper case, so if your tables were created with quoted, lower-case names your queries must quote them to match):

SELECT "id", "name" FROM "names"

or

SELECT "names"."id", "names"."name" FROM "names"

Parameter Handling

If you are using parameters with a SqlDataSource control, don't forget that the OracleClient provider uses a colon instead of the '@' sign:

DeleteCommand='DELETE FROM "moad_agrippa_users" WHERE "UserID" = :UserID'

The other thing to note is that the parameter does not require quotation marks around it.
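The same applies if you run the command yourself with the OracleClient classes.  A minimal sketch, assuming the table and column from the DeleteCommand above and the web.config connection string entry named "ConnectionString":

using System.Configuration;
using System.Data.OracleClient;

public static class DeleteUserDemo
{
    public static void DeleteUser(decimal userId)
    {
        string connStr = ConfigurationManager
            .ConnectionStrings["ConnectionString"].ConnectionString;

        using (OracleConnection conn = new OracleConnection(connStr))
        using (OracleCommand cmd = conn.CreateCommand())
        {
            // Colon prefix in the SQL text; no quotes around the parameter itself.
            cmd.CommandText =
                "DELETE FROM \"moad_agrippa_users\" WHERE \"UserID\" = :UserID";
            cmd.Parameters.Add(new OracleParameter("UserID", userId));

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}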

I hope this piece has been useful – it will act as an aide-memoire for me the next time I come back to work on Oracle / ASP.NET sites!

Exclusion 2.0 – is daft jargon necessary?

I just came across this on my Twitter feed – a reference to a "'Future of the web' Turtle" at Open 09.  Yup – a turtle.  After some Googling about and learning more than I ever wanted to know about our green, aquatic co-travellers on Planet Earth, I eventually went to the Open 09 web site where I found the following:

“And in the true spirit of social media, the content of the sessions will be decided by the delegates contributing to what will happen on the day via the OPEN 09 blogs. The blogs are the virtual spaces where the themes for sessions – we’re calling them ‘Turtles’ – will be debated and decided. We’ll be adding more Turtles that focus on particular areas of the creative industries.”

Ahhh…that explained it.  A blog for a session / seminar.  Cool.  So why call them turtles?  There seems to be an increasing habit amongst the more bleeding-edge practitioners of web development of creating a new (and often meaningless) lexicon to describe what they do.

Sorry, guys, but this is the sort of meaningless jargonny media-waffle that just produces an exclusive air around a lot of these sorts of events.   My own impression is that the same people attend the round of conferences and seminars, chucking ideas around, hatching turtles, but rarely communicating what the Hell is actually happening to the rest of the world.

I earn my crust through web and software development.  As I said to a potential client / colleague yesterday – I’m a ‘meat and potatoes’ sort of developer.  My clients expect me to deliver reliable, working systems within budget that add value to their business.  For many businesses, Social Media is still something that swallows up their bandwidth rather than adds to the bottom line, and I’m not sure that this sort of jargon helps us get any sort of message across.

My view of jargon is that it's used by people of a shared culture to reduce the amount of communication necessary to get a particular concept over to their co-practitioners in an agreed form.  This fad simply makes it look like we're trying to keep these sorts of events as 'parties for the cool kids only' and that cannot be good.

Or that we’re trying to hide the fact we have nothing relevant to give businesses – which is even worse.

I was a twit not to Tweet!

Many moons ago I posted a piece on here – 'Am I a twit not to twitter'.  Well, I'll admit it.  Yes, I was a twit not to Tweet, and I'm happy to say that.  I can't argue with objective facts, so here are my brief thoughts on what converted me.  Just in case anyone wishes to follow me, I'm on Twitter, funnily enough, as JoePritchard.  Serious lack of imagination there but no excuse for missing me!

So, here are my hints and observations from a beginning Twit!  There are plenty of articles around with more detailed hints and tips of how to use Twitter, and I’m not going to re-hash what’s said elsewhere.  These observations are my personal thoughts and insights, for what they’re worth, as to how I found that Twitter could be useful.

 

Two Way Street

I think the first thing that I learned about Twitter (or rather had pointed out to me) was that it's a two-way street; if you want people to follow you, you need to follow people, and you need to have an idea of what you want to gain from Twitter.

Identify what you want

Apart from keeping up with your friends and colleagues, I’ve found Twitter invaluable for getting a good newsfeed from sites of interest.  In fact, I’ve found it a better proposition than RSS feeds.

Use a Twitter Client

When I first tried Twitter out, I used the web interface.  It didn't work well for me – so this time I decided to try out a couple of dedicated Twitter applications.  I have Twhirl and Tweetdeck installed and they've both made using Twitter on a regular basis much easier – I just leave them running quietly in the background, they update dynamically, and they make it a pleasure to Tweet.

Think of it as less intrusive MSN

I’ve actually used Twitter as a form of MSN with some people – it’s more spread out in time than a typical MSN conversation, more compact than Email and certainly doesn’t clutter my inbox with lots of short mails.

Use it for promotion

I’ve recently re-activated this Blog and integrated it with both Twitter and Facebook, and have been studying the referral logs to see where blog referrals are coming from.  There does appear to be a fair amount of traffic from Twitter.  A recent event I participated in – ActionForInvolvement’s Climatewalk – made significant use of Twitter in the run up to the event to promote it and encourage re-tweeting about the event.  Again, I gather that the results were well worthwhile!

If you need to, run multiple accounts

I was considering tweeting on behalf of my business from within my ‘personal’ Twitter account but I’ve decided to set up a separate account for the business.  The reason?  People following my business may not be very interested at all in everything else I do.  Let’s call it ‘brand protection’ – I want my business brand and my ‘JoePritchard’ brand to be different entities online.  Whilst folks who know me will know that I run ’em both, the separation will be useful for business connections who I really don’t want in my personal life – and vice versa!

Be picky in following and blocking

Spam has certainly increased on Twitter.  When someone follows me, I've got Twitter configured to mail me.  I always go and check out their profile, and then determine first of all whether to block or not.  Folks who look like spammers always get reported; if someone seems to be mainly peddling MLM or just looks 'dodgy' in terms of their content or places linked to – again, block 'em.  I can't understand why American High School kids of either sex think that I'd be interested in reports of their weekends drinking or shopping, especially when they don't bother completing any part of their profile – sorry guys, you get blocked.  I know this sounds arrogant of me, but I want followers who know me or who are interested in what I say or consider that I somehow add value for them.  If you are a US High School kid who IS interested in what I say, then let me know – but have something of interest to me on your profile, somewhere!  In return, when I follow, I want to be following people that I know, am interested in or who add value to my online life by introducing me to new stuff or ideas.  Twitter does seem to encourage the 'numbers game' in people.  I prefer quality.

And that’s that – I’m going to start using Twitter Lists shortly and will let you know how I get on.  And then there’s the API stuff….watch this space.

The Brass Neck of Ordnance Survey

One of my interests is in GIS – Geographical Information Systems – and other aspects of computerised and online mapping.  Thanks to Google Maps, it's been possible for developers to create map-driven applications for nothing – Google allows access to their mapping infrastructure free for many applications, and it's brilliant.  To anyone who hasn't taken a look or had a play, have a look at Google Maps and, for you programming types out there, take a look at the Google Maps API.

Now, what really peeves me as a UK citizen is that our own Ordnance Survey – the folks who make maps – haven't got any facility for getting hold of mapping data free of charge.  I am aware of a rather scrappy 'trial set' of data that is available for use with GIS systems, but honestly – the OS was traditionally funded by the UK Government and it is only in recent years that it has been spun off.  It should not be beyond the capabilities of the current Government – who've always whined about innovation and creativity being a driving force of British business – and the OS to make available a system similar to the Google Maps one, using UK-centric OS data, at negligible cost to software developers and end users, making it easier to develop geographically based applications on the Web, on the Mobile Internet and on our desktops.

But it hasn’t happened yet.  And this morning I find out about the ‘Geovation’ project – a project to attempt to generate innovative ideas based on the use of geographical data and concepts.  Hey, it’s supported by the OS!  I can see nothing on the site that suggests that there’s any OS data available to play with – indeed I think the only data set mentioned is Google Maps!

To be honest, this is shaping up to be an astonishing lost opportunity for the Ordnance Survey – they could have leveraged this project by making data or even some sort of API available at a reasonable cost for small businesses  or zero cost for non-commercial development and research.  It doesn’t look like it’s going to happen – I get the impression they’re going to lurk around picking up good ideas from people and then take them back and see what money they can make from them.

I may be wrong on all counts – I genuinely and sincerely hope I am, and that there is a nice, cheap, API and full UK dataset out there waiting to support companies and individuals looking at the Geovation Challenge.  Why do I think there isn’t, though?

Real Time Search – how important?

Well, both Microsoft and Google have stated that they're adding the capability to search Twitter feeds in real time to their search engines.  What does this mean to us mere mortals who tweet and search?

The example that I've seen given about the usefulness of Real Time Search (RTS) is to do with skiing – not a topic close to my heart, or one which I know much about.  My knowledge stops at things strapped to your feet and the requirement for snow…  Anyway, the example given is that you Google your favourite ski resort and alongside the normal search results returned by Google there would also be a number of relevant, recent Tweets that could, for example, include information about current conditions on the slopes.  The Tweets will appear based on their content or, if the Tweeter has set their account up accordingly, the location from which the Tweet has been made (geocoded Tweet).  On a purely technical basis, this is quite something.  The hamsters powering Google's servers will be running around in their wheels like crazy…

There has been an add-in available for a while for Firefox, using Greasemonkey, that does something similar, and the effect is pretty cool, although I'm yet to be convinced about the value of most Tweets in terms of conveying information meaningful to a lot of people, except in a few sets of circumstances.

As for the importance of this combination of Tweets and Search Engine results, it's pretty early in the game to tell, but I have my own concerns and thoughts on the issue that I'll share here.  And then in a few months' time I can come back and either pat myself on the back or quietly remove this post…

Privacy

A little while ago I published this item – 'Google and The Dead Past' – in which I commented on the convergence of search technologies – Search Engine, Twitter and Facebook being three data sources – and expressed a fear that we might be moving very slowly towards a form of voluntary surveillance society, where our regular use of Social Networks would result in much of our lives being available for review on search engines in near real time if we weren't careful.  Well, we now have Tweets being folded into the Search mix; I assume that it won't be long before Twitpics get included, and then, if Facebook open up their API to facilitate searching, my comments in that article are coming closer to reality!

Of course, just as with standard search engine management on a website, it is possible to exclude your tweets from this search.  Google have had a few gremlins with this, but they're getting there, and it's likely that, were they ever to join the party, Facebook would do the same thing.  Whether people would avail themselves of these tools is another matter.

Relevance

Just how the search engines' ranking systems will be applied to Tweets is an interesting question.  For example, Google's PageRank algorithm relies on many things, including links to a page, links from it, the nature of the links, etc., as well as content.  This is simply not going to work on Tweets, so it's safe to assume that some other form of relevance rating will be used.  And Bing will have something totally different – as will any other search engine involved in searching Tweets.  I am forced to wonder how relevant the results of Real Time Search will be.  Obviously it will improve with time, but so will the ability of spammers to game the system.

Perspective

Those of us old enough to remember the TV news reports of the Falklands War in 1982 will remember that events could happen in the South Atlantic a good few days before we saw them on the news.  By the time of the First Gulf War, CNN was reporting on events as they happened from its own reporters, and within hours from the wider military theatre of operations.  By the Second Gulf War, in 2003, there were journalists embedded with infantry units carrying satellite phones and digital cameras and literally reporting on ongoing fire-fights.  It's been said that the Falklands were reported from the point of view of the Government, the First Gulf War from the point of view of the generals and the Second Gulf War from the perspective of an infantry platoon leader or tank commander.

The result is that whilst the platoon leader point of view gives us immediacy, it allows no time for contemplation of wider issues.  And the immediate perspective of one person in a large news event, for example, can give a very distorted view.  I very much expect that Tweets in search results could easily give rise to 'firestorms' of rumour that flare up and then get corrected within minutes.  What impact this will have on news gathering, and on the general emotional health of people doing searches on news stories – seeing a view of the world that is built from the bottom up and changing every few minutes – I'm not sure.  Whilst this sort of immediate citizen journalism is great in theory, I'm not sure that it's good in practice; tweets available to all via Real Time Search might manipulate the news as much as report it.

So…Real Time Search important?  Conceivably yes – but perhaps in the wrong way.

A good time to upgrade WordPress!

I've just upgraded various blogs I look after – including my own – to WordPress 2.8.5.  This release is regarded as a 'hardening' release by WordPress themselves, and if you're reasonably up to date the upgrade is a piece of cake – the automatic installer does it all for you.

It might also be a good time to take a look at your WordPress setup in general.  Good practice with any website installation these days says that the less you have on a website, the fewer places there are for malware to hide, so one thing to do immediately is to remove any unused themes or plugins – use your FTP client to back them up if you can't lay your hands on your originals.  If you do decide to change theme or use the plugins again, just re-install them.  Whilst there are some nasties that can lurk in the 'Default' theme, it's probably best to leave that installed because it gives you a fallback position if a plugin breaks your custom theme.

If you have statistics running, take a good look at any 'spikes' in the page views.  I use the WordPress stats package and find it perfectly adequate for my needs – which is basically stroking my ego to see if people are reading what I write.  Looking at my page views, I noticed a spike over 3 days early last week – twice as many hits on the site as usual.  Unless you've recently done a push for readership, or have blogged on a matter of wide interest, this can indicate a compromise of your site – as I found.

The stats package also provides a list of the search terms used to reach the site.  Looking at things in more detail, I noticed that whilst the pages accessed were familiar to me, the search terms that were used to get there were most certainly not.  'Girlfriends boobs' is not something I tend to write about on this site!!  Those terms must have been on the site somewhere to generate the hits, so I took a look at the logs provided by my hosting company, and also wandered around my site with FTP.  Looking at the logs, I DID find evidence of some dodgy looking links being accessed, buried in a sub-directory inside the WordPress installation.  However, checking with FTP revealed nothing – I realised that my upgrade to 2.8.5 had wiped out the evidence.  I've not had any similar strange search terms showing up since then.

So – summing up:

  1. Keep upgraded.
  2. Remove anything you don’t need.
  3. Install some simple stats and watch Page Views for unexpected spikes.  Get a ‘feel’ for the normal sort of readership levels of your site.
  4. Keep an eye on search terms used to get to your blog.  If ‘odd’ search expressions turn up then start ferreting around. If you have a Google account, register your site with Google and keep an eye on unfollowable links, etc.  Learn what logs are available from your hosting provider and use them.

That’s my lesson for today on WordPress!  As for the upgrade – 2.8.5 works like a charm and has no bad habits that I can find!

Wolfram Alpha – how not to make friends and influence people!

Hmmm…this is becoming WA corner recently – take a look at my previous piece here.  I was less than impressed with the technology and considered it either over-hyped or released too early into the world.  However, I did hope that as time progressed there might be improvements in the results returned and, more importantly from a developer point of view, an API published that would allow developers to build new applications to stretch and maybe improve WA.

So, this week an API was announced for Wolfram Alpha on the company's Blog and I was pretty excited about the prospect of trying out a few things.  Despite my grumbles about the results returned, I was hopeful that with a suitable API encouraging third party developments, the underlying technology and data sets at WA might see an improvement.  My hopes survived for as long as it took me to start reading the small print – in particular this little document, the price list.  Now, I'm aware that WA has cost money to develop, but charging developers to make use of the API seems to be one of the dumbest and most counter-productive things they could do.  There are some 'pioneer grants' available for developers, but I get the impression that these are still likely to involve shelling out money.

Google do not charge developers for use of the API until you start using the API in ‘closed’ systems and with a large number of calls.  They certainly don’t charge you during the development cycle – they have more sense.

Now, let's assume I wanted to develop an API-based application for WA – what we in the trade call a 'proof of concept' model – i.e. something that proves whether or not the bright idea that we sketched out on the back of a beer-mat in the pub will actually work.  How many requests might I get through to develop such an application?  Well, the other day I wrote some code to retrieve data from a Postcode / Geocode system's API.  Now, this was a VERY simple application – send a postcode, retrieve a list of addresses, send a code number, retrieve a full street address with map reference.  Let's say 2 calls to the remote API for something very straightforward.  During code development and 'in house' testing I made about 30 or 40 API calls.  During more formal testing on the client site that's going to increase somewhat – probably into the low hundreds.  And this is for a problem with a well defined structure, with a finite returnable answer set – i.e. lists of addresses, a single address or nothing at all, all in a set, predictable format.

By the very nature of the sort of problem that WA has been set up to deal with, the problems passed up via an API are unlikely to be as well defined, and the results returned are also unlikely to be as simple to deal with as my addresses.  When I did some API work with Google for a client I found I was generating hundreds of API calls and responses during development, let alone testing.  For WA, I'm looking at $60 for 1,000 API requests, and $0.08 for each additional request beyond the thousand I initially pay for – so a development cycle that burned through, say, 2,000 calls would come to $140.  Obviously, I can buy a bigger bundle, but the inference is clear – it ain't gonna be cheap developing for the WA API.

API developments typically involve a learning curve for the API syntax and methods of use.  This is par for the course and to be expected.  However, when the API is interfacing to a curated data set like WA, we have an additional problem of whether the data set will actually contain the sort of data that we’re wanting to get back.  And whether it will be available in the sort of format we’re interested in.  And whether the curated data is timely compared to the data that is being made available through non-curated data sets like those available via Google – or other APIs, for that matter.  Clearly, if your problem space IS covered by WA and the data set WA has available contains what you want in the format in which you want it, then perhaps the API fee is worthwhile.  But for those developers wanting to try something new out, they’re most likely to look to free APIs to test their ideas, and spend time and energy working the wrinkles out in an environment that isn’t costing them pennies for the simplest query.

I’m afraid WA have dropped the ball big time here; by charging for ALL development use of the API they’ve alienated a large source of free development and critical expertise.  Look at how Google has benefited from the sheer number of developers doing ‘stuff’ with their various APIs.  Can you imagine that happening had they charged all the way?  Hardly likely. 

If WA were to make a limited 'sandbox' set of data available for developers via a free-of-charge API, that would at least allow the developers to get the wrinkles out of their code.  The company could then charge for use of the 'live' WA datasets, and would have the additional advantage of the code being run against the live system being reasonably bug-free.  By charging from the first line of code written, they're restricting the development of their own product and driving people into the arms of Google, Amazon, Bing and the like.  WA doesn't appear to be offering a lot that is truly revolutionary; a so-so natural language query interface against a curated data set.  I doubt it will be long before third party developers start producing the same from Google.

Google and ‘The Dead Past’

Earlier this year we saw the launch of Google's Street View system – here – and with it came a plethora of complaints about the invasion of privacy implications.  I was one of the happy complainants – Google had a view right through my house, showing people in the house.  To be honest, their reaction was swift and the imagery was removed, but it was an invasion of privacy and I'm still to be convinced that there is any long term gain to be obtained from the system.  Yes, I'm aware of all the 'well, you can see what a neighbourhood's like before buying a house there' arguments, but if you do all your checking out of the largest investment you'll ever make on the Internet then you deserve to find yourself living between a Crack Den and a student house.

Enough…step back and breathe…the title of this piece is ‘Google and ‘The Dead Past’ – now what on Earth do I mean by that?

Science fiction aficionados amongst you may recognise part of the title as coming from an old story by Isaac Asimov, in which a researcher develops a time viewer to look into the past, only to eventually realise that the past starts exactly a fraction of a second ago – for all practical, human purposes, the past, to his machine, is identical to the present.  He's accidentally invented the world's finest surveillance machine.  As a character says at the end of the story – 'Happy goldfish bowl to you, to me, to everyone, and may each of you fry in hell forever.'

Now, there's a looooong way to go between Google and eternal damnation through surveillance, but as is often pointed out, the road to Hell is paved with good intentions and always starts with a single step.  Let's do some of that old style extrapolation, though, and see what we've got coming up in our future.  Here are a few things that have been posited and talked about as being part of our online future, some of which are already here, some of which are extrapolations, all of which are technically feasible, if not yet politically acceptable.

  1. Decreased latency between changes in the online world and those changes turning up in Search Engines.  At the moment we might expect a day or so even on busy sites regularly trawled by search engines – a possible future might be that items get folded into search space within hours.  We're also already heading towards Tweets being searchable – perhaps future APIs will allow combined searches of Facebook, Twitter and general webspace all in one shot?
  2. Use of 'mechanical turk' approaches to encourage people to use their spare time to classify images, scan online video, etc., to tag media that are currently not searchable by search engines in their raw form.  Imagine that being done in near real time.  DARPA are already researching tools to extract context out of text and digitised speech; perhaps some degree of automated scanning of video will follow.  And it's not outlandish to suggest that what might be useful for the military will sooner or later find its way into civilian online life.
  3. The possibilities inherent in IP Version 6 for a massively enlarged Internet Protocol addressing space make it easier than ever to ensure that everything that can have a separate IP address will have a separate IP address.  Combine that with the geolocation capabilities that come with reduced cost GPS chip sets – many phones now have GPS built in – and the tracking of devices (and their owners) in real time or near real time, sold to us as extensions of the social media experience, becomes a reality.
  4. The increasing usage of 'Cloud' computing, where everything about you is stored not on your computer or phone but on a 'cloud' storage system run by your phone company (T-Mobile?), software supplier (Microsoft?) or media seller (Amazon?), puts all your digital life into the network – where it can be scanned and examined in transit or in storage.

Add to the technical advances the willingness of people to share their activities via Social Media (or eventually the commoditisation of their activity patterns and media interests, as ISPs and phone companies realise that people will give up a lot of privacy for cheaper connectivity) and we are perhaps heading towards the science fiction scenario described above.

If people were concerned about the impact of Street View on their lives – a single snapshot taken as a one off – imagine the possible impact of your real-life world being captured as a mosaic by different sources and then being rendered and made searchable by interconnected search tools.  A phone call positions you in one place, photographs taken on the same phone and geo-tagged by the software are sent to a searchable social media site and so identify who you were with and when.  You show up in other photos,  as a recipient of a call from another phone, and so on.  The other evening I was asked ‘Who doesn’t want to be tagged in these photos?’ – the new social nicety for people who are concerned over the privacy of their friends.   Sooner or later I’m certain that nicety will slip by the wayside, and it will be up to us to police our own image online.

A recent business enterprise where people are being asked to monitor CCTV cameras in their spare time  – Internet Eyes – may be regarded as distastefully intrusive, but I do wonder whether it’s the start of a whole range of ‘mechanical turk’ type activities where people are encouraged to act as high-tech lace-curtain twitchers.  That past is not looking as dead anymore.

Are you feeling spied on yet?  If not, I’m sure you soon will be.

Wolfram Alpha – too early released or over-hyped?

In case you’re saying, “Wolfram what?”, here’s a little reading:

http://www.wolframalpha.com/

http://www.bbc.co.uk/blogs/technology/2009/05/does_wolfram_work.html

http://news.bbc.co.uk/1/hi/technology/8052798.stm

http://www.guardian.co.uk/news/blog/2009/may/18/wolfram-review-test-google-search

http://www.theregister.co.uk/2009/05/19/dziuba_wolfram/

http://www.theregister.co.uk/2009/03/17/wolfram_alpha/

http://www.theregister.co.uk/2009/05/18/wolfram_alpha/

 

OK – I'll start by announcing a vested interest here.  I occasionally write software that attempts to make sense out of straight English questions and phrases, and then by cunning trickery makes the response from the program appear 'sensible' as well.  So I know something about how to make software appear smarter than it actually is.  And I'm afraid that at first glance I regard Wolfram Alpha as over-hyped, under-delivering and pretty much unsure of its position in the world.

But, the folks at Wolfram Research score highly for getting the coverage they’ve managed!

WA is described as a Computational Knowledge Engine, rather than a search engine.  However, its raison d'être is to answer questions, and nowadays any piece of software on the Internet that does that is always going to be regarded by users as some sort of search engine, and the 'Gold Standard' against which all search engines tend to be judged is Google.  So, first question…

Is it fair to compare WA and Google?

Not really, and Wolfram himself acknowledges this.  WA is regarded by the company as a means of getting information out of the raw data to be found on the Web, and it does this by having what's called 'curated' data – that is, Wolfram's team manage the sources used for the data and also the presentation of the data.  This makes it very good at returning solid factual and mathematically oriented data in a human readable form.

Whereas Google will return you a list of pages that may be useful, WA will return data structured into a useful looking page of facts – no links, just the facts.  And a list of sources used to derive the information.  The results displayed are said to be 'computed' by Wolfram Research, rather than just listed, as is the case with a search engine.

Is it a dead end?

WA relies on curated data – that is, a massaging and manipulation process to get the existing web data into a format that is searchable by the WA algorithms and that is then also presentable in a suitable format for review.  This is likely to be a relatively labour intensive process.  Let's see why…

In a perfect world, all web data would be tagged with 'semantic tagging' – basically additional information that allows the meaning of a web page to be more explicitly obvious.  Google, for all its cleverness, doesn't have any idea about the meaning of web page content – just how well or poorly it's connected to other web pages and what words and phrases appear within the page.  They do apply a bit of 'secret sauce' to attempt to get the results of your search closer to what you really want, assuming you want roughly the same as others who've searched the Google search space for the same thing.  Semantic tagging would allow a suitably written search engine to start building relationships between web pages based on real meaning.  Now, you might just see the start of a problem here…

If a machine can’t derive meaning from a web page, then the Semantic tagging is going to have to be human driven.  So for such a tool to be useful we need to have some way of ensuring as much web data as possible would be tagged.  Or, start from tomorrow and say that every new page should be tagged, and write off the previous decade of web content.  You see the problem.

What the WA team have done is taken a set of data from the web, massaged and standardised it into a format that their software can handle, then front-ended this system with a piece of software that makes a good stab at natural language processing to get the meaning of your question out of your phrase.  For example, typing in 'Compare the weather in the UK and USA' might cause the system to assume that you want comparative weather statistics for those two countries.  (BTW – it doesn't; more on this later.)

The bottom line here is that the data set has had to be manually created – something that is clearly not possible on a regular basis.  And a similar process would have to be carried out to get things semantically tagged.  And if we COULD come up with a piece of software that could do the semantic analysis of any piece of text on the web, then neither of these approaches would be needed anyway.

In a way, WA is a clever sleight of hand; but ultimately it’s a dead end that could potentially swallow up a lot of valuable effort.

Is it any good?

The million dollar question.  Back to my 'Compare the weather in the UK and US' question.  The reason I picked this was that WA is supposed to have a front end capable of some understanding of the question, and weather data is amongst the curated data set.  I got a 'Wolfram|Alpha isn't sure what to do with your input' response.  So, I simplified and gave WA "Compare rainfall london washington" – same response.  I then went to Google and entered the same search, and at the bottom of page 1 found a link – http://www.skyscrapercity.com/showthread.php?t=349393 – that had the figures of interest.  Now, and before anyone starts on me, I appreciate that the data that would have been provided by WA would have been checked and so would be accurate.  But I deliberately put a question to WA that I expected it should be able to answer if it was living up to the hype.

I then gave WA 'rainfall london' as a search and got some general information (not a lot) about London.  Giving 'rainfall london' to Google, I found links to little graphs coming out of my ears.  A similar search on 'rainfall washington' in Google gave me similar links to data on Washington rainfall.

WA failed the test, I’m afraid. 

Will it get better?

The smartness of any search tool depends upon the data and the algorithms.  As we're relying on curated data here, improvements might come through modifications to the data, but that might require considerable effort.  If the algorithms are 'adaptive' – i.e. they can learn whether answers they gave were good or bad – then there might be hope.  This would rely on a feedback mechanism from searchers to the software, basically saying 'Yes' or 'No'.  If the algorithms have to be hand crafted, improvement is likely BUT there is the risk of over-fitting the algorithms to suit the questions that people have asked – not the general searching of what MAY be asked.

And time passes…

As it turned out, this post never moved from ‘Draft’ to ‘Published’ because of that thing called ‘Life’.  So, a month or two have passed, and I’ve decided to return to Wolfram Alpha and see what’s changed….

Given the current interest in the band Boyzone, I did a quick search.  WA pointed me to a Wiki entry – good – but nothing else.  Google pointed me to stacks of stuff.  Average rainfall in London got me some useful information about rainfall in the last week.  OK….back to one of my original questions ‘Compare rainfall London Washington’ – this time I got the London data with the Washington equivalent on it as well – sort of what I wanted.  Google was less helpful this time than back when I wrote this piece.

So…am I more impressed?  Maybe a little.  Do I feel it's a dead end?  Probably, yes, except in very specific areas that might already be served by things like Google and Wiki anyway.

Do I have an alternative solution for the problem?

If I did, do you think I’d blog it here and expose myself to all that criticism? 🙂