Disk-backed log library

Closed. Posted Feb 22, 2010. Paid on delivery.

What the Project Owner requires:

This project is for the development of a log data structure library. We use the term "log" in the sense of a "log-structured file system" [1], which is distinct from the textual diagnostic or auditing logs that applications generate. A log is a container of elements that supports appending an element to the end of the log (its only write operation) and reading elements back in order or pseudo-order.

[1] [url removed]
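For illustration, the two operations described above might be captured by an interface like the following. `AppendLog`, `append`, and `InMemoryLog` are hypothetical names, not part of this specification; elements are plain `byte[]` for brevity, and the in-memory implementation only demonstrates the contract, with none of the durability the project requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical contract: append is the only write; reads go front to back.
interface AppendLog extends Iterable<byte[]> {
    /** Appends an element and returns its zero-based position in the log. */
    long append(byte[] element);
}

// In-memory stand-in used only to illustrate the contract (no disk I/O).
class InMemoryLog implements AppendLog {
    private final List<byte[]> elements = new ArrayList<>();

    @Override
    public long append(byte[] element) {
        elements.add(element.clone()); // defensive copy: the log owns its data
        return elements.size() - 1;
    }

    @Override
    public Iterator<byte[]> iterator() {
        // Read-only view, so callers cannot rewrite history through the iterator.
        return Collections.unmodifiableList(elements).iterator();
    }
}
```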

## Deliverables

The log constructor takes a directory name as input. The log may create whatever files it needs in that directory, subject to the file system's limit on the number of files per directory.
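One way to stay within a per-directory file limit, assuming the log splits its data into numbered segment files, is to place each segment in a fixed-depth tree of subdirectories derived from its number. In this sketch, `SegmentPaths`, `FANOUT`, `LEVELS`, and the file naming scheme are all hypothetical choices; the constructor would still need to create the directories (for example with `Files.createDirectories`) before writing a segment.

```java
import java.nio.file.Path;

final class SegmentPaths {
    // Hypothetical limits: at most FANOUT entries per directory, LEVELS directory levels.
    static final int FANOUT = 1000;
    static final int LEVELS = 3;

    /** Maps a segment number to a path under root so no directory exceeds FANOUT entries. */
    static Path pathFor(Path root, long segment) {
        if (segment < 0) throw new IllegalArgumentException("negative segment");
        long[] digits = new long[LEVELS + 1];
        long n = segment;
        // Decompose the segment number into base-FANOUT digits, most significant first.
        for (int i = LEVELS; i >= 0; i--) {
            digits[i] = n % FANOUT;
            n /= FANOUT;
        }
        if (n != 0) throw new IllegalArgumentException("segment number exceeds layout capacity");
        Path p = root;
        for (int i = 0; i < LEVELS; i++) {
            p = p.resolve(String.format("%03d", digits[i]));
        }
        // The last digit selects the file; the full number keeps file names self-describing.
        return p.resolve(String.format("%020d.seg", segment));
    }
}
```

With a fanout of 1,000 and three directory levels, no directory holds more than 1,000 entries, and the layout addresses up to 10^12 segments.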

The log can operate in two modes: disk-backed and buffered-disk-backed. (The mode may be a flag passed to the constructor, or the two modes may be separate implementations of the log interface.) In disk-backed mode, every write operation goes to disk: before a write returns successfully to the client application, its data has been written to disk and flushed, so a power failure cannot lose any log entry whose write has succeeded.
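A minimal sketch of a disk-backed append, assuming a single writer and a single segment file: the write loop drains the buffer (a channel write may be partial), then `FileChannel.force(true)` asks the operating system to push the data and file metadata to the storage device before the method returns. `DurableAppender` is a hypothetical name. `force` is only as strong as the operating system and drive honoring it, and making a newly created file's directory entry durable is a separate, platform-dependent step.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of a disk-backed append: write, then force to stable storage before returning.
final class DurableAppender implements AutoCloseable {
    private final FileChannel channel;

    DurableAppender(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        channel.position(channel.size()); // continue after any existing data
    }

    /** Appends bytes and returns their offset only after they are forced to disk. */
    long append(byte[] data) throws IOException {
        long offset = channel.position();
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            channel.write(buf); // a single write may be partial; loop until drained
        }
        channel.force(true); // true: also flush metadata such as the file length
        return offset;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```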

In buffered-disk-backed mode, a write operation buffers the data in memory and returns before it reaches the disk. Batching disk writes together lets this mode sustain higher write throughput at the cost of reliability: writes that have returned but have not yet been flushed can be lost on power failure.
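The buffered mode can be sketched by accumulating appends in memory and writing each batch with a single `force` call, so the cost of one sync is shared by many writes. `BufferedAppender` and its byte-count flush threshold are hypothetical; a real implementation would likely also flush on a timer and handle concurrent writers, which this single-threaded sketch does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Appends land in memory and reach the disk in batches; the unflushed tail is not durable.
final class BufferedAppender implements AutoCloseable {
    private final FileChannel channel;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final int batchBytes; // hypothetical flush threshold

    BufferedAppender(Path file, int batchBytes) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        channel.position(channel.size());
        this.batchBytes = batchBytes;
    }

    /** Returns as soon as the data is buffered; it is NOT yet durable. */
    void append(byte[] data) throws IOException {
        pending.write(data, 0, data.length);
        if (pending.size() >= batchBytes) {
            flush();
        }
    }

    /** Writes the whole batch, then forces once, amortizing the sync cost. */
    void flush() throws IOException {
        if (pending.size() == 0) return;
        ByteBuffer buf = ByteBuffer.wrap(pending.toByteArray());
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(true);
        pending.reset();
    }

    @Override
    public void close() throws IOException {
        try {
            flush();
        } finally {
            channel.close();
        }
    }
}
```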

The log's elements are binary large objects (BLOBs). The library provides a class representing this element type, with convenience methods for accessing an element as a Java byte array, a ByteBuffer, or an InputStream. The library supports streaming writes: for example, the client application can construct a BLOB from an InputStream and pass it to the log's write operation, which streams the InputStream's contents to disk before the write completes.
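A sketch of what the BLOB element type could look like, assuming it is backed either by an in-memory byte array or by a one-shot InputStream supplied by the client. The class name `Blob` and its methods are hypothetical. Because Java arrays and ByteBuffers are indexed by `int`, the byte-array and ByteBuffer accessors only suit BLOBs under about 2 GB; larger BLOBs would have to be consumed through the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical BLOB type for illustration; not an API defined by the specification.
final class Blob {
    private final byte[] bytes;  // content, for BLOBs built from memory (else null)
    private InputStream stream;  // one-shot source, for streaming BLOBs (null once consumed)

    private Blob(byte[] bytes, InputStream stream) {
        this.bytes = bytes;
        this.stream = stream;
    }

    static Blob of(byte[] content) {
        return new Blob(content.clone(), null);
    }

    static Blob fromStream(InputStream in) {
        return new Blob(null, Objects.requireNonNull(in, "in"));
    }

    /** Content as a stream. A streaming BLOB can be opened only once. */
    synchronized InputStream openStream() {
        if (bytes != null) return new ByteArrayInputStream(bytes);
        if (stream == null) throw new IllegalStateException("stream already consumed");
        InputStream s = stream;
        stream = null;
        return s;
    }

    /** Content as a new byte array; consumes a streaming BLOB. */
    byte[] toByteArray() throws IOException {
        if (bytes != null) return bytes.clone();
        try (InputStream in = openStream()) {
            return in.readAllBytes(); // Java 9+
        }
    }

    /** Content as a read-only ByteBuffer (same size limit as toByteArray). */
    ByteBuffer toByteBuffer() throws IOException {
        return ByteBuffer.wrap(toByteArray()).asReadOnlyBuffer();
    }
}
```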

Similarly, when reading elements back from the log, the client application can retrieve each element's size and request the element's BLOB, which it can then read as a byte array or as an InputStream.
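On the read side, here is a sketch assuming a hypothetical framing in which each element is stored as an 8-byte big-endian size followed by its payload. `RecordReader` is an illustrative name: it reports each element's size before the client reads anything, hands out a stream bounded to exactly that many bytes, and skips any unread remainder when the client advances.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical framing for this sketch only: [8-byte big-endian size][payload].
final class RecordReader {
    private final DataInputStream data;
    private Payload current;

    RecordReader(InputStream raw) {
        this.data = new DataInputStream(raw);
    }

    /** Moves to the next element and returns its size; throws EOFException at the end. */
    long nextSize() throws IOException {
        if (current != null) current.skipRest(); // skip whatever the client did not read
        long size = data.readLong();
        current = new Payload(data, size);
        return size;
    }

    /** The current element's content, exposing exactly nextSize() bytes. */
    InputStream payload() {
        if (current == null) throw new IllegalStateException("call nextSize() first");
        return current;
    }

    private static final class Payload extends FilterInputStream {
        private long remaining;

        Payload(InputStream in, long size) {
            super(in);
            this.remaining = size;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = in.read();
            if (b >= 0) remaining--;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long s = in.skip(Math.min(Math.max(n, 0), remaining));
            remaining -= s;
            return s;
        }

        @Override
        public int available() throws IOException {
            return (int) Math.min(in.available(), remaining);
        }

        @Override
        public void close() {
            // Deliberately a no-op: the underlying stream is shared with later elements.
        }

        void skipRest() throws IOException {
            while (remaining > 0) {
                if (skip(remaining) == 0 && read() < 0) { // skip may stall; fall back to read
                    throw new EOFException("truncated element");
                }
            }
        }
    }
}
```

Because the size is known before any payload is read, a client can decide per element whether to materialize it as a byte array or stream it.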

The client application may optionally pass in the expected SHA-1 hash of a BLOB, and the log checks that the hash matches the BLOB's content. Because the log cannot expect to read the same BLOB InputStream twice, it computes the hash while streaming the BLOB to disk. If the content does not match the hash the client supplied, the write operation fails and nothing is added to the log. The log always calculates and stores the SHA-1 hash of every successful write, whether or not the client supplied one, so that the hash can be inspected later during reading.
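The single-pass check can be done with the JDK's `DigestInputStream`, which updates a `MessageDigest` as bytes are read through it. In this sketch, `HashingCopier.copyAndVerify` is a hypothetical helper: it copies the BLOB to the destination while hashing it and throws on a mismatch. The caller is then responsible for discarding the bytes already written (for example, by truncating the file back to where the record started) so the failed write never becomes part of the log.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashingCopier {
    /**
     * Streams in to out once, returning the SHA-1 of the bytes copied. If
     * expectedSha1 is non-null and differs, throws IOException; the caller must
     * then discard whatever was already written for this record.
     */
    static byte[] copyAndVerify(InputStream in, OutputStream out, byte[] expectedSha1)
            throws IOException {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1"); // required on every Java platform
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // Every byte read through the DigestInputStream also updates the digest.
        DigestInputStream din = new DigestInputStream(in, sha1); // caller still owns 'in'
        byte[] buf = new byte[8192];
        int n;
        while ((n = din.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        byte[] actual = sha1.digest();
        if (expectedSha1 != null && !MessageDigest.isEqual(actual, expectedSha1)) {
            throw new IOException("SHA-1 mismatch: BLOB does not match the expected hash");
        }
        return actual;
    }
}
```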

The log should support arbitrarily large BLOBs, including BLOBs larger than 4 gigabytes, assuming the underlying file system supports such file sizes. The library should not restrict the maximum size of the log.
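The 4-gigabyte figure is where 32-bit sizes run out: an unsigned 32-bit length tops out at 2^32 - 1 = 4,294,967,295 bytes, and Java's signed `int` at 2^31 - 1 = 2,147,483,647. Element sizes and file offsets therefore need 64-bit fields. Here is a sketch of an 8-byte length field, where the field width and the `LengthField` name are hypothetical format choices:

```java
import java.nio.ByteBuffer;

// Sizes and offsets are kept as 64-bit longs; an int or 4-byte field would cap BLOBs near 2 or 4 GB.
final class LengthField {
    static final int BYTES = Long.BYTES; // 8-byte length field

    static byte[] encode(long length) {
        if (length < 0) throw new IllegalArgumentException("negative length");
        return ByteBuffer.allocate(BYTES).putLong(length).array(); // big-endian by default
    }

    static long decode(byte[] field) {
        return ByteBuffer.wrap(field).getLong();
    }
}
```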

The log file format must be extremely robust and fault-tolerant. To reiterate: in disk-backed mode, the log persists each write to disk before the write operation returns successfully, so a power failure must never lose data belonging to a write that has succeeded. When the log is re-opened, it performs a consistency check and ignores any partial writes left over from before.
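One possible consistency check on re-open, assuming a hypothetical framing of `[8-byte length][payload][8-byte CRC32 of length and payload]` in a single append-only file, is to scan records from the start, stop at the first record that is incomplete or fails its checksum, and truncate the file there so later appends start from a clean boundary. `LogRecovery` is an illustrative name; the payload is checksummed in bounded chunks so very large BLOBs never need to fit in memory.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical record framing: [8-byte length][payload][8-byte CRC32 of length+payload].
final class LogRecovery {
    static final int HEADER = Long.BYTES;
    static final int TRAILER = Long.BYTES;

    static byte[] encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER + payload.length + TRAILER);
        buf.putLong(payload.length).put(payload);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, HEADER + payload.length);
        return buf.putLong(crc.getValue()).array();
    }

    /** Keeps the longest prefix of complete, checksum-valid records; truncates the rest. */
    static long recover(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long size = ch.size();
            long pos = 0;
            long count = 0;
            ByteBuffer header = ByteBuffer.allocate(HEADER);
            ByteBuffer trailer = ByteBuffer.allocate(TRAILER);
            ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
            while (size - pos >= HEADER + TRAILER) {
                header.clear();
                readFully(ch, header, pos);
                long len = header.getLong(0);
                if (len < 0 || len > size - pos - HEADER - TRAILER) break; // torn or garbage length
                CRC32 crc = new CRC32();
                crc.update(header.array(), 0, HEADER);
                long p = pos + HEADER;
                long end = p + len;
                while (p < end) { // checksum the payload in bounded chunks
                    chunk.clear();
                    chunk.limit((int) Math.min(chunk.capacity(), end - p));
                    readFully(ch, chunk, p);
                    chunk.flip();
                    p += chunk.remaining();
                    crc.update(chunk); // consumes the chunk
                }
                trailer.clear();
                readFully(ch, trailer, end);
                if (trailer.getLong(0) != crc.getValue()) break; // corrupt or torn record
                pos = end + TRAILER;
                count++;
            }
            ch.truncate(pos); // drop the partial or corrupt tail
            ch.force(true);
            return count;
        }
    }

    private static void readFully(FileChannel ch, ByteBuffer dst, long position) throws IOException {
        while (dst.hasRemaining()) {
            int n = ch.read(dst, position);
            if (n < 0) throw new EOFException("unexpected end of file");
            position += n;
        }
    }
}
```

Truncating at the first bad record is the simple choice for a single writer, where a torn write can only affect the tail; a format that must also survive corruption in the middle of a file would need a way to resynchronize on the next valid record. CRC32 detects accidental corruption, not deliberate tampering.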

The log should also tolerate disk failures. The log's I/O subsystem must assume that the disk may persist any file system write only partially, stopping partway through, and the log file format must be designed with this requirement in mind.
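To test the log against this failure model, the I/O layer can be written against an `OutputStream` or channel abstraction so that a fault-injecting double can be substituted. The hypothetical `TornWriteStream` below persists only the first `budget` bytes it receives and then fails, mimicking a disk that stops halfway through a write.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Test double for the failure model: only the first 'budget' bytes reach the
// underlying "disk"; the write that crosses the budget is cut short and fails.
final class TornWriteStream extends FilterOutputStream {
    private long budget;

    TornWriteStream(OutputStream out, long budget) {
        super(out);
        this.budget = budget;
    }

    @Override
    public void write(int b) throws IOException {
        if (budget <= 0) throw new IOException("simulated disk failure");
        out.write(b);
        budget--;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        int n = (int) Math.min(len, Math.max(budget, 0));
        out.write(b, off, n); // persist only the prefix that fits in the budget
        budget -= n;
        if (n < len) {
            throw new IOException("simulated disk failure after " + n + " of " + len + " bytes");
        }
    }
}
```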

The log should be extensively unit-tested and functionally tested. Its test suite should include tests that simulate disk failure at arbitrary points during writes.
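Here is a sketch of a crash-point test. It builds a log image, cuts it at every byte offset to simulate a failure at that point, and checks that recovery keeps exactly the records that were completely written before the cut. To stand alone, it defines its own minimal framing (`[8-byte length][payload][8-byte CRC32]`). `CrashPointTest` and its methods are hypothetical, and in a real suite the loop would sit inside a unit test.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

final class CrashPointTest {
    // Minimal framing so the sketch stands alone: [8-byte length][payload][8-byte CRC32].
    static byte[] frame(byte[] payload) {
        ByteBuffer b = ByteBuffer.allocate(16 + payload.length);
        b.putLong(payload.length).put(payload);
        CRC32 crc = new CRC32();
        crc.update(b.array(), 0, 8 + payload.length);
        return b.putLong(crc.getValue()).array();
    }

    /** Number of complete, checksum-valid records at the start of image. */
    static int countValid(byte[] image) {
        ByteBuffer b = ByteBuffer.wrap(image);
        int count = 0;
        while (b.remaining() >= 16) {
            int start = b.position();
            long len = b.getLong();
            if (len < 0 || len > b.remaining() - 8) break; // record does not fit: torn
            CRC32 crc = new CRC32();
            crc.update(image, start, 8 + (int) len);
            b.position(start + 8 + (int) len);
            if (b.getLong() != crc.getValue()) break;      // checksum mismatch: corrupt
            count++;
        }
        return count;
    }

    /** Cuts the image at every offset; returns the first offset where recovery is wrong, or -1. */
    static int firstFailingCut(List<byte[]> payloads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> ends = new ArrayList<>(); // end offset of each record in the image
        for (byte[] p : payloads) {
            byte[] rec = frame(p);
            out.write(rec, 0, rec.length);
            ends.add(out.size());
        }
        byte[] image = out.toByteArray();
        for (int cut = 0; cut <= image.length; cut++) {
            int expected = 0;
            for (int end : ends) {
                if (end <= cut) expected++; // records fully written before the cut
            }
            if (countValid(Arrays.copyOf(image, cut)) != expected) return cut;
        }
        return -1;
    }
}
```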

Other requirements:

The log data structure should be delivered as Java source, together with an Ant build file ([url removed]) that builds the project.

The project should be extensively unit-tested and have high line and branch coverage.

This is work for hire: all source code written for the project, including its copyright, becomes the sole property of the Project Owner.

The project MAY NOT include any source code copyrighted by other parties, including open-source code. The project MAY, however, depend on existing open-source or free libraries.

Skills: Computer Security, Engineering, Java, Linux, MySQL, PHP, Project Management, Software Architecture, Software Testing, UNIX, Web Security

Project ID: #3199056

About the project

10 proposals. Remote project. Active Mar 15, 2010.

10 freelancers are bidding on average $2826 for this job

- dennisivw: $3060 USD in 14 days (105 reviews, rating 7.3)
- SteveCodon: $3400 USD in 14 days (16 reviews, rating 5.9)
- kraneware: $2720 USD in 14 days (9 reviews, rating 5.9)
- AxactSolutions: $3400 USD in 14 days (7 reviews, rating 3.8)
- Rampayoda: $3102.5 USD in 14 days (4 reviews, rating 4.7)
- aartak: $2040 USD in 14 days (3 reviews, rating 2.3)
- softvisionitsol: $2890 USD in 14 days (0 reviews, rating 0.0)
- saliljosh: $3400 USD in 14 days (0 reviews, rating 0.0)
- amitusaineu: $2550 USD in 14 days (0 reviews, rating 0.0)
- turtlemastervw: $1699.15 USD in 14 days (1 review, rating 0.0)