Lab 3: Basic File I/O

Original author: Munehiro Fukuda

Revisions 2019: Morris Bernstein

Background

The purpose of this lab is to familiarize yourself with low-level Unix and C standard library I/O operations.

Unix files are sequences of bytes, but the underlying hardware on which files are stored is typically block-based. This means that to read a single byte, an entire block must be read into the kernel. To write a single byte, the kernel must read the entire block, modify the byte, and write the entire block back.

Most programming languages include standard I/O routines as part of the core language or standard library. The I/O library functions typically provide high-level formatting to convert binary data into text, and buffering to make I/O operations more efficient.

The Unix unformatted read system call has analogs in the C standard library: fgetc (reads a single byte) and fread (reads a block of bytes).

Your mission is to pick a large file, such as this image (Icicles), and read the entire file using both read(2) and fgetc(3)/fread(3). See the relevant man pages.
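
As a rough illustration of the two approaches, here is a minimal sketch that reads a whole file once with read(2) and once with fread(3), counting the bytes seen. The helper names count_bytes_read and count_bytes_fread are hypothetical and are not taken from the provided skeleton; error handling is deliberately minimal (for instance, an interrupted read is simply treated as a failure).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: read the whole file with read(2), asking for
// chunk_size bytes per system call. Returns the total number of bytes
// read, or -1 if the file cannot be opened or a read fails.
long long count_bytes_read(const char* path, std::size_t chunk_size) {
  assert(chunk_size > 0);
  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;

  std::vector<char> buffer(chunk_size);
  long long total = 0;
  for (;;) {
    ssize_t n = read(fd, buffer.data(), buffer.size());
    if (n < 0) {  // error (including EINTR, which this sketch does not retry)
      close(fd);
      return -1;
    }
    if (n == 0) break;  // end of file
    total += n;
  }
  close(fd);
  return total;
}

// Hypothetical helper: the same loop using the C standard library.
// fread pulls from the FILE's internal buffer, so most calls do not
// result in a system call.
long long count_bytes_fread(const char* path, std::size_t chunk_size) {
  assert(chunk_size > 0);
  FILE* fp = std::fopen(path, "rb");
  if (fp == nullptr) return -1;

  std::vector<char> buffer(chunk_size);
  long long total = 0;
  std::size_t n = 0;
  while ((n = std::fread(buffer.data(), 1, buffer.size(), fp)) > 0) {
    total += static_cast<long long>(n);
  }
  bool failed = std::ferror(fp) != 0;
  std::fclose(fp);
  return failed ? -1 : total;
}
```

Both functions should report the same byte count for the same file; only the time taken is expected to differ as the chunk size changes.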

Compare performance for read sizes of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and 4096 bytes. Graph your data. Since the performance difference may be dramatic, you may have to resort to a semi-log plot.

Figure: System Calls vs. Standard Library Timing

A fairly-well-fleshed-out skeleton has been provided for you. Fill in the sections marked “TODO(lab)”.

Expected Output

Here is the output from a single run:


Unix system calls, chunk size 1 elapsed time:
	 1.339777231 seconds
	bytes read: 4264394
C standard library I/O fgetc elapsed time:
	 0.007076979 seconds
	bytes read: 4264394
Unix system calls, chunk size 2 elapsed time:
	 0.659083843 seconds
	bytes read: 4264394
C standard library I/O, block size 2 elapsed time:
	 0.021763086 seconds
	bytes read: 4264394
Unix system calls, chunk size 4 elapsed time:
	 0.331376076 seconds
	bytes read: 4264394
C standard library I/O, block size 4 elapsed time:
	 0.010421991 seconds
	bytes read: 4264394
Unix system calls, chunk size 8 elapsed time:
	 0.166327953 seconds
	bytes read: 4264394
C standard library I/O, block size 8 elapsed time:
	 0.005503893 seconds
	bytes read: 4264394
Unix system calls, chunk size 16 elapsed time:
	 0.082514048 seconds
	bytes read: 4264394
C standard library I/O, block size 16 elapsed time:
	 0.003057957 seconds
	bytes read: 4264394
Unix system calls, chunk size 32 elapsed time:
	 0.042083979 seconds
	bytes read: 4264394
C standard library I/O, block size 32 elapsed time:
	 0.001880884 seconds
	bytes read: 4264394
Unix system calls, chunk size 64 elapsed time:
	 0.021432161 seconds
	bytes read: 4264394
C standard library I/O, block size 64 elapsed time:
	 0.001292944 seconds
	bytes read: 4264394
Unix system calls, chunk size 128 elapsed time:
	 0.010902166 seconds
	bytes read: 4264394
C standard library I/O, block size 128 elapsed time:
	 0.000910044 seconds
	bytes read: 4264394
Unix system calls, chunk size 256 elapsed time:
	 0.005445957 seconds
	bytes read: 4264394
C standard library I/O, block size 256 elapsed time:
	 0.000805855 seconds
	bytes read: 4264394
Unix system calls, chunk size 512 elapsed time:
	 0.002906084 seconds
	bytes read: 4264394
C standard library I/O, block size 512 elapsed time:
	 0.000774145 seconds
	bytes read: 4264394
Unix system calls, chunk size 1024 elapsed time:
	 0.001536131 seconds
	bytes read: 4264394
C standard library I/O, block size 1024 elapsed time:
	 0.000572920 seconds
	bytes read: 4264394
Unix system calls, chunk size 2048 elapsed time:
	 0.000847101 seconds
	bytes read: 4264394
C standard library I/O, block size 2048 elapsed time:
	 0.000534058 seconds
	bytes read: 4264394
Unix system calls, chunk size 4096 elapsed time:
	 0.000493050 seconds
	bytes read: 4264394
C standard library I/O, block size 4096 elapsed time:
	 0.000526905 seconds
	bytes read: 4264394

Notes and Hints

Since you are collecting timing information, compile with a high compiler optimization level (-O3), although it shouldn't really make much of a difference in this case.

It may be helpful to write (or find someone to write for you) a script that will post-process your output into a suitable input format for a graphing program such as gnuplot.
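
One possible shape for such a post-processor is sketched below in C++ rather than a scripting language. It assumes the output format shown under Expected Output (a header line followed by a line of seconds) and treats the fgetc run as block size 1. The function name to_gnuplot_table and the column layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: convert the lab's timing output into one row per
// read size: "<bytes per call> <read(2) seconds> <stdio seconds>".
std::string to_gnuplot_table(std::istream& in) {
  // size -> (system call seconds, standard library seconds)
  std::map<int, std::pair<double, double>> rows;
  std::string line;
  int size = 0;
  bool is_syscall = false;
  bool have_header = false;

  while (std::getline(in, line)) {
    int n = 0;
    double seconds = 0.0;
    if (std::sscanf(line.c_str(), "Unix system calls, chunk size %d", &n) == 1) {
      size = n; is_syscall = true; have_header = true;
    } else if (std::sscanf(line.c_str(),
                           "C standard library I/O, block size %d", &n) == 1) {
      size = n; is_syscall = false; have_header = true;
    } else if (line.rfind("C standard library I/O fgetc", 0) == 0) {
      size = 1; is_syscall = false; have_header = true;  // fgetc: 1 byte
    } else if (have_header &&
               std::sscanf(line.c_str(), " %lf seconds", &seconds) == 1) {
      assert(size > 0);
      (is_syscall ? rows[size].first : rows[size].second) = seconds;
      have_header = false;  // ignore the "bytes read" line that follows
    }
  }

  std::ostringstream out;
  out << "# bytes_per_call read_seconds stdio_seconds\n";
  char row[96];
  for (const auto& entry : rows) {
    std::snprintf(row, sizeof row, "%d %.9f %.9f\n", entry.first,
                  entry.second.first, entry.second.second);
    out << row;
  }
  return out.str();
}
```

A small main that calls to_gnuplot_table(std::cin) and prints the result would turn this into a filter. In gnuplot, the resulting file can then be plotted with a logarithmic y axis (set logscale y), using columns 1:2 and 1:3 for the two curves.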

Your home directories on the Linux Lab machines are stored remotely on a fileserver, so the network transit time may dominate the performance (the experiment has not yet been conducted, so you will be charting new territory). To get accurate results, read a file from the local filesystem (e.g., some large binary executable).

RAII was used to collect timing information for the section of code we wish to time. The constructor collects the start time; the destructor collects the finish time and calculates the elapsed time.
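
The timing idiom described above might look roughly like the following. This is a sketch, not the skeleton's actual class: the name ScopedTimer, the use of std::chrono::steady_clock, the optional result pointer, and the print format are all assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical RAII timer: the constructor records the start time, and the
// destructor records the finish time and reports the elapsed time.
class ScopedTimer {
 public:
  // If result is non-null, the elapsed seconds are also stored there.
  explicit ScopedTimer(const char* label, double* result = nullptr)
      : label_(label),
        result_(result),
        start_(std::chrono::steady_clock::now()) {
    assert(label != nullptr);
  }

  ~ScopedTimer() {
    auto finish = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = finish - start_;
    if (result_ != nullptr) *result_ = elapsed.count();
    std::printf("%s elapsed time:\n\t%12.9f seconds\n", label_,
                elapsed.count());
  }

  // A timer measures one scope, so copying it makes no sense.
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  const char* label_;
  double* result_;
  std::chrono::steady_clock::time_point start_;
};
```

To time a section of code, wrap it in a block and declare a ScopedTimer as the first statement; the elapsed time is reported when the block exits, whether normally or through an early return.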

Although it didn't add anything to the code in this case, RAII is also used to avoid leaking a limited resource: open file descriptors.
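
For the file case, a minimal sketch of the same idiom is shown below. The class name FileDescriptor and its interface are hypothetical and may differ from what the skeleton does; the point is only that the destructor closes the descriptor on every exit path.

```cpp
#include <cassert>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a Unix file descriptor opened for reading.
class FileDescriptor {
 public:
  explicit FileDescriptor(const char* path) : fd_(open(path, O_RDONLY)) {}

  // Closing here means the descriptor cannot leak, even on an early return.
  ~FileDescriptor() {
    if (fd_ >= 0) close(fd_);
  }

  // Copying would lead to closing the same descriptor twice.
  FileDescriptor(const FileDescriptor&) = delete;
  FileDescriptor& operator=(const FileDescriptor&) = delete;

  bool is_open() const { return fd_ >= 0; }

  int get() const {
    assert(is_open());
    return fd_;
  }

 private:
  int fd_;
};
```

A caller would check is_open() after construction and then pass get() to read(2); no explicit close call is needed.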