
CTC #69a: Build Historical Listing Stats table

$30-100 USD

Cancelled
Posted almost 13 years ago


Paid on delivery
**Do you weave SQL in your dreams?** Here's a simple task: consolidate stats from one source into a table. But there's a challenge: the source is a view built from several enormous tables (think millions of rows), and the "easy" query would lock up the server for hours, if not days. You need the "mad skills" to live, breathe, and sleep SQL, so you can get under the hood, get dirty, and come up with a light, elegant, and highly efficient solution.

Our end goal is to be able to display the following data for any given IMPORT_FILE_DATE. (Please note: for *this* task, you will not be writing any web/user-interface code!)

[Mock-up of the target stats display omitted.]

*Conceptually,* building hist_listing_stats from idx_history_view is a pretty straightforward task. Each unique combination of IMPORT_FILE_DATE + AR_AREA in idx_history_view should yield a single corresponding row (i.e., only for this IMPORT_FILE_DATE and AR_AREA) in hist_listing_stats:

- periodType = PERIOD_DAY
- IMPORT_FILE_DATE = this IMPORT_FILE_DATE
- regionType = REGION_AREA
- region = this AR_AREA
- AVG_LP_LIST_PRICE = average of positive, non-NULL LP_LIST_PRICE
- NUM_LP_LIST_PRICE = count of positive, non-NULL LP_LIST_PRICE
- AVG_1BR_LIST_PRICE = average of positive, non-NULL LP_LIST_PRICE where BR_BEDROOMS = 1
- NUM_1BR_LIST_PRICE = count of positive, non-NULL LP_LIST_PRICE where BR_BEDROOMS = 1
- AVG_2BR_LIST_PRICE = average of positive, non-NULL LP_LIST_PRICE where BR_BEDROOMS = 2
- NUM_2BR_LIST_PRICE = count of positive, non-NULL LP_LIST_PRICE where BR_BEDROOMS = 2
- AVG_PRICE_PER_SQFT = average of (LP_LIST_PRICE / CTCX_SQUARE_FOOTAGE), where each of those values is positive and non-NULL
- NUM_PRICE_PER_SQFT = count of (LP_LIST_PRICE / CTCX_SQUARE_FOOTAGE), where each of those values is positive and non-NULL

Repeat the entire process using ZP_ZIP_CODE instead of AR_AREA, with regionType = REGION_ZIP. (We're planning on adding at least a third region type shortly, such as Neighborhood, so plan and modularize appropriately.) Most or all of the above should, of course, be calculated on the database server.

## Deliverables

There are two complicating factors:

1. There are about 11 million rows of data. Most of the work here is not in writing easy queries, but in having the finesse to structure the work so that it runs in a highly efficient fashion. The entire process should finish completely within one hour. The benchmarking hardware is a Rackspace Cloud Server VM with 512MB RAM; variability of VM timings is mitigated by using a "best of N tries" benchmarking approach.
2. idx_history_view is actually a view over several other tables. There is a strong concern that using the view directly would place far too much load on the database server. The software will very likely have to access the underlying tables directly, possibly with further optimization through indexing, etc. Developers are responsible for optimizing all aspects of the work so that the servers can run in the most efficient fashion possible.

It is critical that the software design take the above points into consideration. This software needs to handle skipped days (for the rare occasions when the server crashes, etc.), as well as the special case of the very first run in production, when there will be approximately one year's worth of "skipped" days.

The processing for one day should be fully self-contained within a stored procedure, invoked with the IMPORT_FILE_DATE to be processed.
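For illustration, here is a minimal sketch of the REGION_AREA pass as a single conditional-aggregation query, so the source is scanned only once per day. `p_import_file_date` stands in for the stored procedure's parameter, and the query targets idx_history_view for readability; as noted above, production code would very likely hit the underlying tables instead.

```sql
-- Sketch only. "x > 0" filters out NULLs as well as non-positive values,
-- since NULL > 0 evaluates to NULL and is ignored by AVG()/COUNT().
-- IFNULL() keeps the NOT NULL columns satisfied when a group has no
-- qualifying rows.
INSERT INTO hist_listing_stats
    (periodType, IMPORT_FILE_DATE, regionType, region,
     AVG_LP_LIST_PRICE, NUM_LP_LIST_PRICE,
     AVG_1BR_LIST_PRICE, NUM_1BR_LIST_PRICE,
     AVG_2BR_LIST_PRICE, NUM_2BR_LIST_PRICE,
     AVG_PRICE_PER_SQFT, NUM_PRICE_PER_SQFT)
SELECT
    'PERIOD_DAY', v.IMPORT_FILE_DATE, 'REGION_AREA', v.AR_AREA,
    IFNULL(AVG(IF(v.LP_LIST_PRICE > 0, v.LP_LIST_PRICE, NULL)), 0),
    COUNT(IF(v.LP_LIST_PRICE > 0, 1, NULL)),
    IFNULL(AVG(IF(v.BR_BEDROOMS = 1 AND v.LP_LIST_PRICE > 0,
                  v.LP_LIST_PRICE, NULL)), 0),
    COUNT(IF(v.BR_BEDROOMS = 1 AND v.LP_LIST_PRICE > 0, 1, NULL)),
    IFNULL(AVG(IF(v.BR_BEDROOMS = 2 AND v.LP_LIST_PRICE > 0,
                  v.LP_LIST_PRICE, NULL)), 0),
    COUNT(IF(v.BR_BEDROOMS = 2 AND v.LP_LIST_PRICE > 0, 1, NULL)),
    IFNULL(AVG(IF(v.LP_LIST_PRICE > 0 AND v.CTCX_SQUARE_FOOTAGE > 0,
                  v.LP_LIST_PRICE / v.CTCX_SQUARE_FOOTAGE, NULL)), 0),
    COUNT(IF(v.LP_LIST_PRICE > 0 AND v.CTCX_SQUARE_FOOTAGE > 0, 1, NULL))
FROM idx_history_view v
WHERE v.IMPORT_FILE_DATE = p_import_file_date
GROUP BY v.AR_AREA;
```

The REGION_ZIP pass is the same query with ZP_ZIP_CODE and 'REGION_ZIP' swapped in, which is also why factoring the region column out (e.g. generating the statement once per region type) keeps a third region type cheap to add.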
This procedure will have SET AUTOCOMMIT = 0, START TRANSACTION, and COMMIT statements, so one day's worth of updates is an all-or-nothing atomic operation, fully handled by MySQL, with no need for clunky explicit locks. At the end of its processing, the daily procedure will update a status table (e.g. a new `hist_listing_stats` column on `idx_file_commit`) to mark this day as done and prevent it from being re-processed in the future. The daily procedure itself should be invoked by another stored procedure, which is aware of the statuses and selects the next day to process. There shouldn't be a need for any PHP code anywhere in the entire update process. (A minimal skeleton of both procedures appears at the end of this brief.)

**Schema:**

```php
// Enumerations
define( "ENUM_PERIOD_TYPE_DAY", "'PERIOD_DAY'" );
define( "ENUM_REGION_AREA",     "'REGION_AREA'" );
define( "ENUM_REGION_ZIP",      "'REGION_ZIP'" );
```

```sql
CREATE TABLE hist_listing_stats (
  periodType enum('PERIOD_DAY') NOT NULL DEFAULT 'PERIOD_DAY'
    COMMENT 'Period covered by this row.',
  IMPORT_FILE_DATE date NOT NULL
    COMMENT 'Stats as of this data import date.',
  regionType enum('REGION_AREA', 'REGION_ZIP') NOT NULL
    COMMENT 'Type of region: area code or zip code.',
  region int unsigned NOT NULL
    COMMENT 'Area code or zip code.',
  AVG_LP_LIST_PRICE double NOT NULL
    COMMENT 'Average of positive, non-NULL listing prices in US$.',
  NUM_LP_LIST_PRICE int NOT NULL
    COMMENT 'Number of positive, non-NULL listings.',
  AVG_1BR_LIST_PRICE double NOT NULL
    COMMENT 'Average of positive, non-NULL 1br listing prices in US$.',
  NUM_1BR_LIST_PRICE int NOT NULL
    COMMENT 'Number of positive, non-NULL 1br listings.',
  AVG_2BR_LIST_PRICE double NOT NULL
    COMMENT 'Average of positive, non-NULL 2br listing prices in US$.',
  NUM_2BR_LIST_PRICE int NOT NULL
    COMMENT 'Number of positive, non-NULL 2br listings.',
  AVG_PRICE_PER_SQFT double NOT NULL
    COMMENT 'Average price per square foot in US$.',
  NUM_PRICE_PER_SQFT int NOT NULL
    COMMENT 'Number of price per square foot data points.',
  -- Composite key: the spec calls for one row per date/region combination,
  -- so IMPORT_FILE_DATE alone cannot be the primary key.
  PRIMARY KEY (periodType, IMPORT_FILE_DATE, regionType, region)
) ENGINE = InnoDB
  COMMENT 'Basic statistics on the state of (what was) current listings at various points in time.';
```

You will be expected to deliver on this project using two milestones:

**Development:** Prove you have a stand-alone working solution by presenting a live demonstration and/or some screenshots, and deliver full source code. This development can be on your own server if you wish, or we can grant you access to a clean Rackspace Cloud Linux server at our cost. If you use your own server, be aware that our database (and the corresponding database dump) is currently about five gigabytes, but can be compressed down to 100MB using LRZIP. (You must, of course, have LRZIP installed.)

**Integration:** You will be granted access to our integration server, where you will prove that your code works well with the rest of the system.

**Environment:** PHP 5.3.2, MySQL 5.1.41, Ubuntu 10.04 LTS.
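For illustration, a minimal skeleton of the two stored procedures described above. The procedure names are hypothetical, and the status flag assumes the new `hist_listing_stats` column on `idx_file_commit` suggested earlier:

```sql
DELIMITER //

-- Daily worker: one day's processing as a single all-or-nothing transaction.
CREATE PROCEDURE sp_build_hist_listing_stats_day(IN p_import_file_date DATE)
BEGIN
    SET AUTOCOMMIT = 0;
    START TRANSACTION;

    -- ... REGION_AREA and REGION_ZIP aggregation INSERTs for
    --     p_import_file_date go here (see the query sketch above) ...

    -- Mark this day as done so it is never re-processed.
    UPDATE idx_file_commit
       SET hist_listing_stats = 1
     WHERE IMPORT_FILE_DATE = p_import_file_date;

    COMMIT;
END //

-- Status-aware driver: always picks the oldest unprocessed day, which
-- transparently covers skipped days as well as the year-long backlog of
-- the very first production run.
CREATE PROCEDURE sp_build_hist_listing_stats()
BEGIN
    DECLARE v_day DATE;
    process_loop: LOOP
        SELECT MIN(IMPORT_FILE_DATE) INTO v_day
          FROM idx_file_commit
         WHERE hist_listing_stats = 0;
        IF v_day IS NULL THEN
            LEAVE process_loop;
        END IF;
        CALL sp_build_hist_listing_stats_day(v_day);
    END LOOP;
END //

DELIMITER ;
```

Each CALL commits independently, so a crash mid-backlog loses at most the day in flight; the next invocation of the driver simply resumes from the oldest unprocessed day.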
Project ID: 3363406

About the project

1 proposal
Remote project
Active 13 yrs ago

1 freelancer is bidding on average $60 USD for this job
$59.50 USD in 7 days
0.0 (2 reviews)

About the client

San Diego, United States
5.0 (295 reviews)
Payment method verified
Member since Aug 17, 2006
