By Graham Asher on 14 July 2012
Full text searching is the ability to search for any word, phrase, or combination of words. In the case of CartoType maps, that means looking up map objects based on their names or other string attributes.
CartoType has long had a simple but fast search facility that finds either an exact match or names starting with given text, but that is not enough. Customers want to be able to search for any word in a name. This is particularly important in countries where nearly all street names start with a word meaning 'street': for example, 'calle' in Spain.
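To illustrate the difference between the two kinds of search, here is a minimal Python sketch. It is not CartoType code: the function names and the sample street names are made up for the example, and the real index is stored in the CTM1 file rather than in a dictionary.

```python
from collections import defaultdict

def prefix_search(names, text):
    """Old-style lookup: names that start with the given text."""
    text = text.lower()
    return [n for n in names if n.lower().startswith(text)]

def build_word_index(names):
    """Map each lower-cased word to the set of names containing it."""
    index = defaultdict(set)
    for name in names:
        for word in name.lower().split():
            index[word].add(name)
    return index

def word_search(index, word):
    """Full-text style lookup: names containing the word anywhere."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical sample data, for illustration only.
names = ["Calle Mayor", "Calle de Alcalá", "Plaza Mayor"]
index = build_word_index(names)

print(prefix_search(names, "mayor"))  # no name starts with 'mayor'
print(word_search(index, "mayor"))    # both names containing 'mayor'
```

A prefix search for 'mayor' finds nothing because every name starts with 'calle' or 'plaza', while the word index finds both names containing it.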
Over the next week or two we'll be releasing a new version of generate_map_data_type1, the tool for building CTM1 (map) files, which will be able to create a full-text index. In tandem with that there will be new API functions providing:
and, combined with these:
Naturally all the existing API functions will continue to work in exactly the same way.
These new API functions will also provide the basis of flexible address lookup (geocoding): the ability to specify any combination of street name, settlement name, postcode, house name or number, etc., and find the matching position on the map.
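As a rough idea of what 'any combination' means in practice, the following Python sketch filters map objects by whichever address fields the caller supplies. Everything in it is an assumption made for illustration: the function name `find_addresses`, the field names, and the sample objects with their placeholder positions are hypothetical and are not part of the CartoType API.

```python
def find_addresses(objects, **criteria):
    """Return objects whose string attributes contain every supplied criterion.

    Only the fields passed by the caller are checked, so any combination
    of street, settlement, postcode, etc. can be used.
    """
    def matches(obj):
        return all(
            value.lower() in str(obj.get(field, "")).lower()
            for field, value in criteria.items()
        )
    return [obj for obj in objects if matches(obj)]

# Hypothetical map objects; the positions are placeholders, not real coordinates.
objects = [
    {"street": "Calle Mayor", "settlement": "Madrid", "postcode": "28013", "position": (1.0, 2.0)},
    {"street": "Calle Mayor", "settlement": "Alicante", "postcode": "03002", "position": (3.0, 4.0)},
    {"street": "Plaza Mayor", "settlement": "Madrid", "postcode": "28012", "position": (5.0, 6.0)},
]

hits = find_addresses(objects, street="mayor", settlement="madrid")
positions = [obj["position"] for obj in hits]
```

A linear scan like this is only for clarity; a real lookup would use the full-text index to find candidates for each field.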
The new text index is also much faster to build than the old one. We've moved from a packed trie format built in N-squared time (or worse) to a simpler type of trie that is also compact and fast to search. The time taken to generate one unusually large CTM1 file was reduced fifty-fold.
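For readers unfamiliar with tries, here is a minimal sketch in Python of a plain, unpacked trie supporting exact and prefix lookup. It is not the format used in CTM1 files, which is not described here; the class and method names are invented for the example. It does show why building can be fast: each insertion costs time proportional to the length of the key, so the whole build is linear in the total amount of text.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> child node
        self.ids = []       # ids of objects whose key ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, object_id):
        """Add a key; cost is proportional to len(key)."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.ids.append(object_id)

    def _walk(self, text):
        """Follow text from the root; return the node reached, or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def find_exact(self, key):
        node = self._walk(key)
        return list(node.ids) if node is not None else []

    def find_prefix(self, prefix):
        """Collect ids of all keys that start with prefix."""
        node = self._walk(prefix)
        if node is None:
            return []
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.ids)
            stack.extend(current.children.values())
        return sorted(found)

# Hypothetical words; the object id is just the position in the list.
trie = Trie()
for object_id, word in enumerate(["mayor", "may", "maple", "plaza"]):
    trie.insert(word, object_id)
```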
Stop words are important in full text searching. These are words too common to be useful in a search, like 'the', or, in a map search, 'road', 'street', 'avenue', etc. We don't keep a fixed list of stop words. Instead, we leave out of the index any word that occurs too often: say, more than a thousand times. We do, however, index strings composed entirely of stop words, so you can still search for a street called 'The Avenue'. For the moment phrase searching ignores stop words, although we may revisit that decision.
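The frequency-based approach can be sketched as follows in Python. This is an illustration under stated assumptions, not the code in generate_map_data_type1: the function name, the in-memory dictionary, and the sample names are hypothetical, and the threshold is set to 2 rather than a thousand so that the tiny sample produces some stop words.

```python
from collections import Counter, defaultdict

def build_index(names, max_occurrences):
    """Index names by word, leaving out words that occur too often."""
    tokenized = {name: name.lower().split() for name in names}
    counts = Counter(word for words in tokenized.values() for word in words)
    stop_words = {word for word, count in counts.items() if count > max_occurrences}

    index = defaultdict(set)
    for name, words in tokenized.items():
        kept = [word for word in words if word not in stop_words]
        if kept:
            for word in kept:
                index[word].add(name)
        else:
            # Every word is a stop word: index the whole string instead,
            # so the name can still be found.
            index[" ".join(words)].add(name)
    return index, stop_words

# Hypothetical sample names, for illustration only.
names = ["The Avenue", "Park Avenue", "Station Avenue", "The Green", "The Mill"]
index, stop_words = build_index(names, max_occurrences=2)
```

Here 'the' and 'avenue' each occur three times and so become stop words. 'Park Avenue' is indexed under 'park' only, while 'The Avenue', which consists entirely of stop words, is indexed as the whole string 'the avenue'.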
Full text indexing and fuzzy matching now work properly. Address searching is next. Some preparatory work has already been done: by default generate_map_data_type1 now loads all the relevant address-related data into the map and converts it into string attributes on map objects.