Saturday, June 22, 2013

Solr 4.3.0 Spatial Search Setup for geonames database

One of my projects required spatial search over the geonames data to match cities across multiple local data sources. There were a few alternatives to deploying our own geo-mapping service, such as the geonames web service. However, our product use cases required relevancy tuning and boosting of certain geonames place types, so we decided to set up Solr.

To set up Solr 4.x for spatial search, I followed the instructions at http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 but still ran into issues. This post consolidates information collected from various forums to get spatial search working on Solr 4.3.

I made the following changes to my Solr 4.x installation:
  • The SOLR-2155 plugin is only for Solr 3.x and doesn't work on 4.3, because Solr 4.x provides its own spatial implementation. So disable the following entries in solrconfig.xml:
    • <queryParser name="gh_geofilt" class="solr2155.solr.search.SpatialGeoHashFilterQParser$Plugin" />
    • <valueSourceParser name="geodist" class="solr2155.solr.search.function.distance.HaversineConstFunction$HaversineValueSourceParser" />
    • If these are still enabled in Solr 4.x together with the 3.x field types, the core fails to load with this error: Unable to create core: collection1 org.apache.solr.common.SolrException: org/apache/lucene/queryParser/ParseException
  • Download jts-1.8.jar into solr-webapp/webapp/WEB-INF/lib. The JtsSpatialContextFactory referenced in the schema needs the JTS library.
  • If you plan to import data from MySQL, download the MySQL JDBC connector and copy its jar into solr-webapp/webapp/WEB-INF/lib:
    • mysql-connector-java-5.1.25.tar.gz
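After editing solrconfig.xml, a quick scripted check can confirm that no SOLR-2155 declarations are still active. Below is a minimal Python sketch, not part of Solr. The function name `active_solr2155_plugins` is hypothetical. It scans the file text with a regex for `queryParser`/`valueSourceParser` declarations whose class starts with `solr2155.` and ignores anything inside XML comments.

```python
import re

# Matches <queryParser .../> or <valueSourceParser .../> declarations whose
# class lives in the SOLR-2155 (Solr 3.x) plugin package "solr2155.".
_PLUGIN_RE = re.compile(
    r'<(queryParser|valueSourceParser)\b[^>]*\bclass="solr2155\.[^"]*"[^>]*>'
)
# XML comments, so that commented-out declarations count as disabled.
_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def active_solr2155_plugins(solrconfig_text: str) -> list[str]:
    """Return the names of SOLR-2155 plugin declarations still enabled."""
    uncommented = _COMMENT_RE.sub("", solrconfig_text)
    names = []
    for match in _PLUGIN_RE.finditer(uncommented):
        # Prefer the name="..." attribute; fall back to the element name.
        name = re.search(r'\bname="([^"]*)"', match.group(0))
        names.append(name.group(1) if name else match.group(1))
    return names
```

For example, reading solrconfig.xml (path depends on your install) and passing its contents to `active_solr2155_plugins` should return an empty list once both declarations are commented out or deleted.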

Schema definition for geonames:
    <field name="id"            type="string" indexed="true" stored="true" required="true" />
    <field name="basic_name"    type="text_general"   indexed="true" stored="true" required="true" omitNorms="false" />
    <field name="utf8_name"     type="text_general"   indexed="true" stored="true" />
    <field name="latitude"      type="double" indexed="true" stored="true" />
    <field name="longitude"     type="double" indexed="true" stored="true" />
    <field name="feature_class" type="string" indexed="true" stored="true" />
    <field name="feature_code"  type="string" indexed="true" stored="true" />
    <field name="country_code"  type="string" indexed="true" stored="true" />
    <field name="population"    type="long"  indexed="true" stored="true" />
    <field name="elevation"     type="int"   indexed="true" stored="true" />
    <field name="gtopo30"       type="int"   indexed="true" stored="true" />
    <field name="timezone"      type="string" indexed="true" stored="true" />
    <field name="date_modified" type="date"   indexed="true" stored="true" />
    <field name="latlon"        type="location_rpt" indexed="true" stored="true" />

......
    <fieldType name="location_rpt"   class="solr.SpatialRecursivePrefixTreeFieldType"
               spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
               distErrPct="0.025"
               maxDistErr="0.000009"
               units="degrees"
            />
......
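For a sense of scale, maxDistErr is given in degrees because of units="degrees". Converting it with a mean Earth radius of about 6371 km shows that it corresponds to roughly one meter of precision:

```latex
0.000009^\circ \times \frac{\pi}{180} \times 6371\ \text{km}
  \approx 1.0 \times 10^{-3}\ \text{km} \approx 1\ \text{m}
```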


Geonames database
To set up the geonames data, download the database (allCountries.txt, distributed as allCountries.zip) and import it into Solr.
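The post doesn't show the import itself. Below is one possible sketch in Python, assuming you load allCountries.txt directly rather than going through MySQL.

It relies on several assumptions:
- The 19 tab-separated columns follow the geonames dump readme.
- `basic_name` takes the ASCII name and `utf8_name` the UTF-8 name.
- The stock Solr 4.x `/update` handler accepts a JSON array of documents.

The helper names `to_solr_doc` and `post_batch` are hypothetical.

```python
import json
import urllib.request

# Column order of allCountries.txt per the geonames dump readme
# (tab-separated, one place per line, UTF-8). Column 16 ("dem") was
# called gtopo30 in older versions of the readme.
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1", "admin2", "admin3", "admin4", "population", "elevation",
    "dem", "timezone", "modification_date",
]


def to_solr_doc(line: str) -> dict:
    """Map one allCountries.txt row onto the schema fields defined above.

    Assumption: basic_name <- asciiname, utf8_name <- name.
    """
    row = dict(zip(GEONAMES_COLUMNS, line.rstrip("\n").split("\t")))
    doc = {
        "id": row["geonameid"],
        "basic_name": row["asciiname"],
        "utf8_name": row["name"],
        "latitude": float(row["latitude"]),
        "longitude": float(row["longitude"]),
        "feature_class": row["feature_class"],
        "feature_code": row["feature_code"],
        "country_code": row["country_code"],
        "population": int(row["population"] or 0),
        "timezone": row["timezone"],
        # Solr date fields expect ISO 8601 in UTC, e.g. 2011-04-19T00:00:00Z
        "date_modified": row["modification_date"] + "T00:00:00Z",
        # The location_rpt field takes a point as "lat,lon"
        "latlon": f'{row["latitude"]},{row["longitude"]}',
    }
    # elevation and dem/gtopo30 are optional int fields; skip blank values
    for src, dst in (("elevation", "elevation"), ("dem", "gtopo30")):
        if row.get(src):
            doc[dst] = int(row[src])
    return doc


def post_batch(docs: list, solr_url: str) -> None:
    """POST a batch of documents as a JSON array to a Solr update URL.

    solr_url is an assumption, e.g.
    http://localhost:8983/solr/collection1/update?commit=true
    """
    req = urllib.request.Request(
        solr_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
```

To use it, read allCountries.txt line by line with UTF-8 encoding, convert each line with `to_solr_doc`, and send documents in batches with `post_batch`. Committing only on the last batch avoids a commit per request.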



Testing the setup

Start the server.

Fire the query:    http://localhost:8983/solr/collection1/select/?fl=*,score&sort=score%20asc&q={!geofilt%20score=distance%20filter=true%20sfield=latlon%20pt=42.56667,1.48333%20d=1}&fq=feature_code:PPL

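Note: 'score=distance' makes each document's score its distance from pt (in degrees, because the field type uses units="degrees"). 'fl=*,score' returns that score, and 'sort=score asc' lists the nearest places first. The radius d=1 is in kilometers.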
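When the query is sent from code instead of a browser, the local-params syntax (`{!geofilt ...}`) must be URL-encoded. Below is a small Python sketch that rebuilds the query above with the standard library's `urlencode`. The function name `geofilt_query_url` and its parameters are hypothetical. The base URL assumes the default `collection1` core on port 8983, as in the example.

```python
from urllib.parse import urlencode


def geofilt_query_url(base_url, lat, lon, dist_km, feature_code="PPL"):
    """Build a distance-sorted geofilt query like the one above."""
    params = {
        # score=distance turns the relevancy score into the distance;
        # the doubled closing brace in the f-string emits a literal "}".
        "q": (
            "{!geofilt score=distance filter=true sfield=latlon "
            f"pt={lat},{lon} d={dist_km}}}"
        ),
        "fq": f"feature_code:{feature_code}",
        "fl": "*,score",
        "sort": "score asc",
    }
    return f"{base_url}/select?{urlencode(params)}"
```

For example, `geofilt_query_url("http://localhost:8983/solr/collection1", 42.56667, 1.48333, 1)` encodes the same parameters as the URL above. Spaces become `+` and braces become `%7B`/`%7D`.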

Query Response

<response>

<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">19</int>
<lst name="params">
<str name="sort">score asc</str>
<str name="fl">*,score</str>
<str name="q">
{!geofilt score=distance filter=true sfield=latlon pt=42.56667,1.48333 d=1}
</str>
<str name="fq">feature_code:PPL</str>
</lst>
</lst>
<result name="response" numFound="2" start="0" maxScore="0.005452368">
<doc>
<str name="id">3039896</str>
<str name="basic_name">Mas de Ribafeta</str>
<str name="utf8_name">Mas de Ribafeta</str>
<double name="latitude">42.56936</double>
<double name="longitude">1.48837</double>
<str name="feature_class">P</str>
<str name="feature_code">PPL</str>
<str name="country_code">AD</str>
<long name="population">0</long>
<int name="gtopo30">1437</int>
<str name="timezone">Europe/Andorra</str>
<date name="date_modified">2011-04-19T00:00:00Z</date>
<str name="latlon">42.56936,1.48837</str>
<long name="_version_">1437926149679218688</long>
<float name="score">0.0045844303</float>
</doc>
<doc>
<str name="id">3041519</str>
<str name="basic_name">Arinsal</str>
<str name="utf8_name">Arinsal</str>
<double name="latitude">42.57205</double>
<double name="longitude">1.48453</double>
<str name="feature_class">P</str>
<str name="feature_code">PPL</str>
<str name="country_code">AD</str>
<long name="population">1419</long>
<int name="gtopo30">1465</int>
<str name="timezone">Europe/Andorra</str>
<date name="date_modified">2010-01-29T00:00:00Z</date>
<str name="latlon">42.57205,1.48453</str>
<long name="_version_">1437926150245449732</long>
<float name="score">0.005452368</float>
</doc>
</result>
</response>
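The scores in this response are angular distances in degrees, not kilometers. As a sanity check, the Python sketch below recomputes them with the haversine formula. This assumes Solr's default spherical distance calculation behaves the same way.

Solr's values may differ in the last digits. This is probably because the indexed points are approximated by prefix-tree cells. The Earth radius constant is, to my knowledge, the mean radius Spatial4j uses, and is labeled as an assumption in the code.

```python
import math

# Assumed mean Earth radius in km (the value used by Spatial4j's
# DistanceUtils, to my knowledge).
EARTH_MEAN_RADIUS_KM = 6371.0087714


def haversine_degrees(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, as an angle in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


def degrees_to_km(deg):
    """Convert an angular distance in degrees to kilometers on the sphere."""
    return math.radians(deg) * EARTH_MEAN_RADIUS_KM
```

Applied to the query point and the two results, this gives about 0.00458 and 0.00545 degrees. Converted, that is roughly 0.5 km for Mas de Ribafeta and 0.6 km for Arinsal, both inside the d=1 km radius.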


