Monday 19 March 2012

Suburb Lookup

Overview

Just about every contract I have worked on has put the suburb, state and postcode in separate fields, which are then checked against a database of valid suburbs. We need a simple entry field that takes part of the suburb, guesses the rest, and forces the user to select a valid value in a very easy manner.

The visual idea

Simply put, we want the user to be able to enter part of the suburb name and have the rest looked up automatically. For example, typing ‘Heath’ should offer ‘Heathcote NSW 2233’, ‘Heathcote VIC 3523’ and so on.



Of course, we also want to be able to enter a postcode / zipcode and get a list of suburbs. That way we can handle a phone call where the suburb is hard to make out but the postcode is clear.

The data

Fortunately the post office gives us the data to work with; please refer to the following websites:

Break down the solution

To make the parts of our solution testable, we must separate the visual components from the business logic. I will write a simple suburb, state and postcode / zipcode lookup routine in C# that loads its data from the CSV file provided by Australia Post.

Class for a suburb

using System;

namespace SuburbLookup
{
    /// <summary>
    /// simple data storage for the suburb state and postcode
    /// </summary>
    public class SuburbStatePostcode
    {
        /// <summary>
        /// Unique identifier so we can send small messages backwards and forwards
        /// </summary>
        public int Id { get; set; }

        /// <summary>
        /// Suburb part of the address
        /// </summary>
        public string Suburb { get; set; }

        /// <summary>
        /// State or territory code in uppercase, two or three characters (e.g. NT, NSW)
        /// </summary>
        public string State { get; set; }

        /// <summary>
        /// Postcode part; four digits in Australia, but kept as a string so leading zeros (e.g. 0850) survive
        /// </summary>
        public string Postcode { get; set; }

        /// <summary>
        /// Create an empty suburb state and postcode
        /// </summary>
        public SuburbStatePostcode()
        {
            this.Suburb = string.Empty;
            this.State = string.Empty;
            this.Postcode = string.Empty;
            this.Id = 0;
        }

        /// <summary>
        /// Display suburb state and postcode
        /// </summary>
        /// <returns>Text such as "Heathcote NSW 2233"</returns>
        public override string ToString()
        {
            return String.Format("{0} {1} {2}", this.Suburb, this.State, this.Postcode);
        }

    }
}

This is a simple class that stores the data I need and formats it in ToString, so the formatting lives in one place.
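Keeping Postcode as a string is more than a style choice: some Australian postcodes start with zero (the lookup code further down mentions 0850 in the Northern Territory). A minimal sketch, using a hypothetical helper class PostcodeText that is not part of the project, shows what a round trip through int does and how the "0000" format restores the four digits:

```csharp
using System.Globalization;

// Hypothetical helper, for illustration only: why Postcode is stored as a string.
public static class PostcodeText
{
    // Parsing to int and back drops leading zeros: "0850" becomes "850".
    public static string ViaInt(string postcode) =>
        int.Parse(postcode, CultureInfo.InvariantCulture).ToString(CultureInfo.InvariantCulture);

    // Padding to four digits restores the Australian form, as PostcodeList does with "0000".
    public static string Pad(int postcode) =>
        postcode.ToString("0000", CultureInfo.InvariantCulture);
}
```

ViaInt("0850") gives "850", while Pad(850) gives "0850", so storing the text as loaded avoids having to remember the padding everywhere the postcode is displayed or compared.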

Loading the data

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
 
namespace SuburbLookup
{
    /// <summary>
    /// Load an Australia post postcode CSV into memory
    /// </summary>
    public class AustPostLoad
    {
        /// <summary>
        /// Loaded data from Australia Post
        /// </summary>
        public List<SuburbStatePostcode> Suburbs { get; private set; }
 
        /// <summary>
        /// list of suburbs by postcode
        /// </summary>
        public Dictionary<string, List<SuburbStatePostcode>> ByPostcode { get; private set; }
 
        /// <summary>
        /// Load a CSV file from Australia post
        /// </summary>
        /// <param name="pFilename">Filename to load</param>
        public AustPostLoad( String pFilename)
        {
            ByPostcode = new Dictionary<string, List<SuburbStatePostcode>>();
            Suburbs = new List<SuburbStatePostcode>();
            const int postcodePosition = 0;
            const int suburbPosition = 1;
            const int statePosition = 2;
            const int categoryPosition = 9;
            using (var reader = new Com.StellmanGreene.CSVReader.CSVReader(new FileInfo(pFilename)))
            {
                var columns = reader.ReadRow();
                if (columns == null)
                    throw new Exception("Empty file from " + pFilename);
                // Check the header row so we notice if Australia Post changes the layout
                if (!("Pcode".Equals(columns[postcodePosition])
                  && "Locality".Equals(columns[suburbPosition])
                  && "State".Equals(columns[statePosition])
                  && "Category".Equals(columns[categoryPosition])))
                {
                    var message = "Columns do not match the Australia Post template: Pcode, Locality, State first and Category in column 10";
                    Utilities.Logger.Fatal(message);
                    throw new Exception(message);
                }
                Utilities.Logger.Debug("AustPostLoad: Loading postcode data");
                columns = reader.ReadRow();
 
                int count = 0;
                var punctuation = new Regex("['/*&\\-(.]");
                while (columns != null)
                {
                    count++;
                    string category = columns[categoryPosition].ToString();
                    if (!Match(category, "LVR"))
                    {
                        var record = new SuburbStatePostcode()
                        {
                            Id = count,
                            Postcode = columns[postcodePosition].ToString(),
                            Suburb = columns[suburbPosition].ToString(),
                            State = columns[statePosition].ToString(),
                        };
 
                        if(punctuation.IsMatch( record.Suburb))
                        {
                            Utilities.Logger.Debug(record.Suburb);
                        }
 
                        if (!ByPostcode.ContainsKey(record.Postcode))
                            ByPostcode.Add(record.Postcode, new List<SuburbStatePostcode>());
                        Suburbs.Add(record);
                        ByPostcode[record.Postcode].Add(record);
                    }
                    columns = reader.ReadRow();
                }
            }
        }
 
        /// <summary>
        /// Case insensitive match of string within another
        /// </summary>
        /// <param name="value">value to test for containment</param>
        /// <param name="key">value within string</param>
        /// <returns></returns>
        protected static bool Match(string value, string key)
        {
            return value.IndexOf(key, StringComparison.CurrentCultureIgnoreCase) >= 0;
        }
    }
}

This class loads only the parts of the file that I want. It checks the column names in the header row to make sure the format of the data has not changed, then loads each line. I have deliberately excluded ‘Large Volume Receiver’ (LVR) postcodes because they are not used for day-to-day mail. A smarter version could flag the Post Office Box suburbs so they are only offered for postal addresses, but that is probably not worth it for this implementation.
Note that if I were using this in a production system I would store the data in a SQL table.
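The loader above depends on the third-party Com.StellmanGreene CSVReader and on Utilities.Logger. To show the same rules (header check, LVR exclusion, grouping by postcode) in isolation, here is a self-contained sketch that works on lines of text. The class names PostcodeRow and PostcodeCsvSketch are hypothetical, and the naive comma split assumes no field contains a comma; the real file should still go through a proper CSV reader.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type for this sketch; mirrors SuburbStatePostcode.
public sealed class PostcodeRow
{
    public int Id { get; set; }
    public string Postcode { get; set; } = "";
    public string Suburb { get; set; } = "";
    public string State { get; set; } = "";
    public override string ToString() => $"{Suburb} {State} {Postcode}";
}

public static class PostcodeCsvSketch
{
    // Column positions as used by AustPostLoad.
    const int PostcodePos = 0, SuburbPos = 1, StatePos = 2, CategoryPos = 9;

    // Assumption: one record per line, optional double quotes, no commas inside fields.
    static string[] SplitLine(string line) =>
        line.Split(',').Select(f => f.Trim().Trim('"')).ToArray();

    public static Dictionary<string, List<PostcodeRow>> Load(IEnumerable<string> lines)
    {
        var byPostcode = new Dictionary<string, List<PostcodeRow>>();
        bool headerChecked = false;
        int id = 0;

        foreach (var line in lines)
        {
            var cols = SplitLine(line);
            if (!headerChecked)
            {
                // Refuse to load if the layout has changed.
                bool ok = cols.Length > CategoryPos
                    && cols[PostcodePos] == "Pcode" && cols[SuburbPos] == "Locality"
                    && cols[StatePos] == "State" && cols[CategoryPos] == "Category";
                if (!ok)
                    throw new FormatException("Header does not match Pcode, Locality, State ... Category");
                headerChecked = true;
                continue;
            }
            if (cols.Length <= CategoryPos)
                continue; // skip blank or short lines

            id++; // as in the original, LVR rows still consume an Id
            if (cols[CategoryPos].IndexOf("LVR", StringComparison.OrdinalIgnoreCase) >= 0)
                continue; // Large Volume Receivers are not offered to users

            var row = new PostcodeRow
            {
                Id = id, Postcode = cols[PostcodePos], Suburb = cols[SuburbPos], State = cols[StatePos]
            };
            if (!byPostcode.TryGetValue(row.Postcode, out var list))
            {
                list = new List<PostcodeRow>();
                byPostcode.Add(row.Postcode, list);
            }
            list.Add(row);
        }

        if (!headerChecked)
            throw new FormatException("Empty file");
        return byPostcode;
    }
}
```

Loading a header plus rows for Heathcote and Engadine in 2233 gives a single dictionary key, "2233", holding both records; any row whose Category contains LVR is skipped, and a file with the wrong header throws before anything is loaded.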

The interface to search


For this programming exercise I want to be able to swap in different search algorithms later. In production I would merge this base class into the implementation class to keep things simple.
using System;
using System.Collections.Generic;
 
namespace SuburbLookup
{
    /// <summary>
    /// Base class for all suburb lookup
    /// </summary>
    public abstract class SuburbBase
    {
        /// <summary>
        /// Class to return to the website for data entry on lookup
        /// </summary>
        public class Result
        {
            /// <summary>
            /// Unique reference to this suburb
            /// </summary>
            public int id { get; set; }
 
            /// <summary>
            /// All parts of the suburb state and postcode combined
            /// </summary>
            public string value { get; set; }
        }
        /// <summary>
        /// Search for a match based on data entered by the user, default 50 entries maximum
        /// </summary>
        /// <param name="text">Text entered by the user</param>
        /// <returns>List of matched suburbs</returns>
        public List<SuburbBase.Result> Search(String text)
        {
            return Search(text, 50);
        }
 
        /// <summary>
        /// search for a match based on data entered by the user
        /// </summary>
        /// <param name="text">Text entered by the user</param>
        /// <param name="count">Number of entries to return</param>
        /// <returns>List of matched suburbs</returns>
        public abstract List<SuburbBase.Result> Search(String text, int count);
 
        /// <summary>
        /// Get the search implementation selected in the settings
        /// </summary>
        /// <returns>The Lucene-based search if enabled, otherwise the simple text search</returns>
        public static SuburbBase GetSearch()
        {
            if (SettingsStatic.UseLucene)
            {
                return new SuburbLucene();
            }
            return new SuburbSimpleText();
        }
    }
}

The nested Result class returns a single line of text representing the suburb, state and postcode, plus an Id that identifies the record. The line of text is what is shown in the autocomplete list.
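The abstract base is essentially a strategy pattern: callers only use GetSearch() and Search(), so the search algorithm can change without touching them. A stripped-down sketch of that shape follows. LookupBase, ContainsLookup and PrefixLookup are hypothetical names, the preferPrefix flag stands in for SettingsStatic.UseLucene, and the prefix match is only a simple second strategy, not a model of how Lucene matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class LookupBase
{
    // Same default as the post's Search(text) overload.
    public const int DefaultLimit = 50;

    public List<string> Search(string text) => Search(text, DefaultLimit);

    public abstract List<string> Search(string text, int limit);

    // Hypothetical factory: 'preferPrefix' stands in for a setting such as SettingsStatic.UseLucene.
    public static LookupBase Create(bool preferPrefix, IReadOnlyList<string> lines) =>
        preferPrefix ? (LookupBase)new PrefixLookup(lines) : new ContainsLookup(lines);
}

public sealed class ContainsLookup : LookupBase
{
    private readonly IReadOnlyList<string> _lines;
    public ContainsLookup(IReadOnlyList<string> lines) { _lines = lines; }

    // Case-insensitive match anywhere in the line.
    public override List<string> Search(string text, int limit) =>
        _lines.Where(l => l.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0)
              .Take(limit)
              .ToList();
}

public sealed class PrefixLookup : LookupBase
{
    private readonly IReadOnlyList<string> _lines;
    public PrefixLookup(IReadOnlyList<string> lines) { _lines = lines; }

    // Case-insensitive match at the start of the line only.
    public override List<string> Search(string text, int limit) =>
        _lines.Where(l => l.StartsWith(text, StringComparison.OrdinalIgnoreCase))
              .Take(limit)
              .ToList();
}
```

With the lines "Heathcote NSW 2233", "Engadine NSW 2233" and "Heathcote VIC 3523", searching for "cote" finds both Heathcote entries through ContainsLookup and nothing through PrefixLookup; the caller code is identical either way, which is the point of the base class.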

Simple Text Lookup

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
 
namespace SuburbLookup
{
    /// <summary>
    /// Simple class to look up a suburb state and postcode based on text presented
    /// </summary>
    public class SuburbSimpleText : SuburbBase
    {
 
        /// <summary>
        /// List of addresses to look up using a simple approach
        /// </summary>
        static List<SuburbStatePostcode> Suburbs = new List<SuburbStatePostcode>();
 
        static Dictionary<string, List<SuburbStatePostcode>> ByPostcode = new Dictionary<string, List<SuburbStatePostcode>>();
 
        static SuburbSimpleText()
        {
            var filename = Path.Combine(Utilities.DataPath, "pc-full.csv");
            AustPostLoad load = new AustPostLoad(filename);
            Suburbs = load.Suburbs;
            ByPostcode = load.ByPostcode;
        }
 
        public SuburbSimpleText()
        {
        }
 
        public List<String> PostcodeList(int postcode)
        {
            var search = postcode.ToString("0000"); // Australian postcodes are 4 digits, e.g. 0850 is in the Northern Territory
            return Suburbs.Where(x => x.Postcode == search).Select(x => x.ToString()).ToList();
        }
 
        public override List<SuburbBase.Result> Search(String text, int limit)
        {
            int postcode;
            if (string.IsNullOrWhiteSpace(text))
                return new List<SuburbBase.Result>();
            if (text.Length < 5 && int.TryParse(text, out postcode))
            {
                // Digits only: return the suburbs of every postcode containing them, up to the limit
                return ByPostcode.Keys
                    .OrderBy(x => x)
                    .Where(x => Match(x, text))
                    .SelectMany(key => ByPostcode[key])
                    .Take(limit)
                    .Select(x => new SuburbBase.Result() { id = x.Id, value = x.ToString() })
                    .ToList();
            }
            else
            {
                // Otherwise match the text anywhere in the suburb name, up to the limit
                return Suburbs.Where(s => Match(s.Suburb, text))
                    .Take(limit)
                    .Select(x => new SuburbBase.Result() { id = x.Id, value = x.ToString() })
                    .ToList();
            }
        }
 
 
        /// <summary>
        /// Case insensitive match of string within another
        /// </summary>
        /// <param name="value">value to test for containment</param>
        /// <param name="key">value within string</param>
        /// <returns></returns>
        protected static bool Match(string value, string key)
        {
            return value.IndexOf(key, StringComparison.CurrentCultureIgnoreCase) >= 0;
        }
    }
}

The static constructor loads the complete table of suburbs and postcodes into memory the first time the class is used.
If no text is provided then nothing is returned; there is no sensible result for an empty search anyway.
Next it checks whether only a postcode (or part of one, up to four digits) has been entered, by checking the length and calling int.TryParse. If so, all suburbs for the matching postcodes are returned, so entering ‘2233’ returns ‘Heathcote NSW 2233’, ‘Engadine NSW 2233’ and so on.
If it is not a postcode, the text is matched anywhere within the suburb name (case insensitive), so entering ‘Heath’ returns ‘Heathcote NSW 2233’, ‘Heathcote VIC 3523’, etc.
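The search rules can be tried without the CSV file or the static constructor. This sketch uses hypothetical names (Place, SuburbSearchSketch) and an in-memory list, and it applies the limit with Take before building the display strings. One deliberate change of behaviour: the post's version matches digits anywhere in the postcode, so ‘23’ would match both 2233 and 3523, whereas this sketch only matches postcodes that start with the typed digits. That variation is my own, not the author's method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory record for this sketch.
public sealed class Place
{
    public string Suburb = "", State = "", Postcode = "";
    public override string ToString() => $"{Suburb} {State} {Postcode}";
}

public static class SuburbSearchSketch
{
    // Case-insensitive "contains", as in the post's Match helper.
    static bool Contains(string value, string key) =>
        value.IndexOf(key, StringComparison.CurrentCultureIgnoreCase) >= 0;

    public static List<string> Search(IEnumerable<Place> places, string text, int limit)
    {
        if (string.IsNullOrWhiteSpace(text) || limit < 1)
            return new List<string>();
        text = text.Trim();

        // One to four digits is treated as the start of a postcode.
        bool looksLikePostcode = text.Length <= 4 && text.All(char.IsDigit);

        IEnumerable<Place> matches = looksLikePostcode
            ? places.Where(p => p.Postcode.StartsWith(text, StringComparison.Ordinal))
                    .OrderBy(p => p.Postcode, StringComparer.Ordinal)
            : places.Where(p => Contains(p.Suburb, text));

        // Apply the limit before building display strings.
        return matches.Take(limit).Select(p => p.ToString()).ToList();
    }
}
```

With Heathcote NSW 2233, Engadine NSW 2233 and Heathcote VIC 3523 in the list, Search(places, "2233", 50) returns the two NSW suburbs in their original order, Search(places, "heath", 50) returns both Heathcotes, and Search(places, "heath", 1) returns only the first.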

Service for MS Ajax

using System.Collections.Generic;
using System.Linq;
using System.ServiceModel;
using System.ServiceModel.Activation;
using System.ServiceModel.Web;
 
namespace SuburbLookup
{
    [ServiceContract(Namespace = "http://oowaratah.blogger.com")]
    [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
    public class WebService
    {
        /// <summary>
        /// Extract a list of suburbs matching the text criteria  Used for the MS AutoCompleteExtender
        /// </summary>
        /// <param name="prefixText">Text criteria</param>
        /// <param name="count">maximum number of rows to return</param>
        /// <returns>List to display as selection</returns>
        [OperationContract]
        [WebInvoke(Method = "POST")]
        public List<string> GetSuburbs(string prefixText, int count) // , string contextKey )
        {
            var Suburbs = SuburbLookup.SuburbBase.GetSearch();
 
            if (count < 1) // ignore count specified online and use default
                return Suburbs.Search(prefixText).Select(x=>x.value).ToList();
            else
                return Suburbs.Search(prefixText, count).Select(x => x.value).ToList();
        }
    }
}
 
And the web.config:
 
  <system.serviceModel>
    <bindings>
      <webHttpBinding>
        <binding name="ServiceAccess" />
      </webHttpBinding>
    </bindings>
    <diagnostics>
      <endToEndTracing activityTracing="true" messageFlowTracing="true" />
    </diagnostics>
    <behaviors>
      <endpointBehaviors>
        <behavior name="AspNetAjaxBehavior">
          <enableWebScript />
        </behavior>
      </endpointBehaviors>
    </behaviors>
    <serviceHostingEnvironment aspNetCompatibilityEnabled="true" multipleSiteBindingsEnabled="true" />
    <services>
      <service name="SuburbLookup.WebService">
        <endpoint address="" behaviorConfiguration="AspNetAjaxBehavior" binding="webHttpBinding" bindingConfiguration="ServiceAccess" contract="SuburbLookup.WebService" />
      </service>
    </services>
  </system.serviceModel>

This returns a simple list of strings for the MS Ajax AutoCompleteExtender to display.

The UserControl

<%@ Control Language="C#" AutoEventWireup="true" CodeBehind="Suburb.ascx.cs" Inherits="SuburbLookup.Suburb" %>
 
<asp:TextBox ID="SuburbBox" runat="server" CssClass="Suburb" />
<ajaxToolkit:AutoCompleteExtender 
    runat="server" 
    ID="SuburbAutoComplete" 
    TargetControlID="SuburbBox"
    ServicePath="~/Service/WebService.svc"
    ServiceMethod="GetSuburbs"
    MinimumPrefixLength="2" 
    CompletionInterval="1000"
    EnableCaching="true"
    CompletionSetCount="20" 
    />
 
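The extender calls GetSuburbs on the web service with the partial suburb text once at least two characters have been typed (MinimumPrefixLength), asking for up to 20 suggestions (CompletionSetCount, which arrives at the service as count).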


That's it. You may use this code under a BSD license; any changes are welcome.