I need someone to develop a very simple proof of concept for a machine-learning application.
The input file contains the fields [address, addr_number, addr_drctn, addr_street, city, state, zip, dir, neighborhd, cross_st, cmplx, market_area, community, mapcol, mappage, maprow].
The reference file contains the fields [latitude, longitude]. The two files are related by the [address, city, state, zip] values.
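The join between the two files could be sketched as follows. This is a minimal illustration using pandas, and it assumes the reference file also carries the four key columns alongside latitude and longitude, that each key appears only once in the reference file, and that trimming whitespace and upper-casing is enough to make the keys match. The helper names and the toy rows are hypothetical.

```python
import pandas as pd

KEY = ["address", "city", "state", "zip"]

def normalize_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the join keys cast to trimmed, upper-cased strings."""
    out = df.copy()
    for col in KEY:
        out[col] = out[col].astype(str).str.strip().str.upper()
    return out

def join_inputs_to_reference(inputs: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Inner-join input rows to their latitude/longitude labels on the shared key."""
    return normalize_keys(inputs).merge(
        normalize_keys(reference)[KEY + ["latitude", "longitude"]],
        on=KEY,
        how="inner",
        validate="many_to_one",  # assumption: each key is unique in the reference file
    )

# Toy rows invented purely for illustration; they are not from the attached data.
inputs = pd.DataFrame({
    "address": ["12 N Main St ", "99 Oak Ave"],
    "city": ["Springfield", "Shelbyville"],
    "state": ["xx", "XX"],
    "zip": ["00001", "00002"],
})
reference = pd.DataFrame({
    "address": ["12 n main st"],
    "city": ["SPRINGFIELD"],
    "state": ["XX"],
    "zip": ["00001"],
    "latitude": [10.0],
    "longitude": [20.0],
})
joined = join_inputs_to_reference(inputs, reference)
```

When loading the real files, reading zip as a string (for example with `dtype=str` in `pandas.read_csv`) avoids losing leading zeros before the join.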
The ML algorithm must be trained to predict the reference latitude and longitude values from the fields in the input file. Given any held-out (or future) input, the trained algorithm should predict the latitude/longitude values with reasonable accuracy, within the limits of what the data makes possible.
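One way to make "reasonable accuracy" measurable is to report the great-circle distance between predicted and reference coordinates on the held-out rows. The sketch below uses the haversine formula with the standard library only; the choice of metric (median error in kilometres) and the function names are assumptions, not requirements from this posting.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def median_error_km(true_coords, pred_coords):
    """Median haversine error over paired (lat, lon) tuples."""
    return statistics.median(
        haversine_km(t[0], t[1], p[0], p[1]) for t, p in zip(true_coords, pred_coords)
    )
```

A median is less sensitive than a mean to a handful of badly mispredicted addresses, which is why it is used here; reporting both may be more informative.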
Proper feature engineering (FE) must be applied to the input fields, some of which contain variable-length strings. The FE approach must be pre-approved by the employer before development starts.
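As an illustration of what an FE proposal for the variable-length string fields might look like, the sketch below uses scikit-learn to combine character n-gram TF-IDF on addr_street, one-hot encoding on a few categorical fields, and median imputation on addr_number. Which fields to use, the n-gram range, and the assumption that addr_number is numeric are all illustrative choices that would still need approval; the toy rows are invented.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

def build_feature_encoder() -> ColumnTransformer:
    """Candidate encoder: each group of columns gets its own treatment."""
    return ColumnTransformer(
        transformers=[
            # Variable-length string: character n-grams tolerate abbreviations
            # such as "St" vs "Street". A scalar column name yields 1-D input.
            ("street_ngrams",
             TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
             "addr_street"),
            # Low-cardinality strings: one-hot, ignoring categories unseen in training.
            ("categories",
             OneHotEncoder(handle_unknown="ignore"),
             ["city", "zip", "addr_drctn"]),
            # Assumed numeric house number: fill gaps with the training median.
            ("house_number", SimpleImputer(strategy="median"), ["addr_number"]),
        ]
    )

# Toy rows invented for illustration only.
toy = pd.DataFrame({
    "addr_number": [12.0, 99.0, None],
    "addr_drctn": ["N", "", "S"],
    "addr_street": ["Main St", "Oak Avenue", "Main Street"],
    "city": ["Springfield", "Shelbyville", "Springfield"],
    "zip": ["00001", "00002", "00001"],
})
encoder = build_feature_encoder()
features = encoder.fit_transform(toy)
```

Because the encoder is fitted once and then reused, a future input with an unseen city or street still maps to a vector of the same width as the training features.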
You must solve the problem three times using three different ML algorithms, and provide a report comparing the relative performance and recommending which algorithm is best for this problem.
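The three-way comparison could be organised along these lines. The sketch assumes scikit-learn and picks ridge regression, k-nearest neighbours, and a random forest only as examples of three different open-source algorithms that accept a two-column (latitude, longitude) target; the data here is synthetic and stands in for the encoded features, and mean absolute error in raw degrees is a placeholder metric.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

def compare_models(X, y, random_state=0):
    """Fit three regressors on one split and return held-out MAE per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )
    candidates = {
        "ridge": Ridge(alpha=1.0),
        "knn": KNeighborsRegressor(n_neighbors=5),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=random_state),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)  # y has two columns: latitude, longitude
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores

# Synthetic stand-in data, not the project data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.column_stack([X[:, 0] + 0.1 * X[:, 1], X[:, 2] - 0.1 * X[:, 3]])

scores = compare_models(X, y)
best = min(scores, key=scores.get)  # lowest held-out error
```

For the actual report, a distance-based error in kilometres and cross-validation over several splits would likely give a fairer ranking than a single split.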
You may choose one of the following programming languages: Java, Python, R, C, C++, or Scala. Open-source ML libraries must be used (no hand-coded algorithms will be accepted).
Please see the attached zip file for details. The password is posted at [url removed, login to view]