Programming Assignment -- CS 7810

The following resources will help you learn the basics of Pthreads programming (details are also provided about MPI and TM). After experimenting with the toy example programs, create the following non-trivial shared-memory multi-threaded application (using Pthreads). It is due on Friday, February 27th, 2009.

You are being provided with the following template C program (here's the corresponding Java program -- you are welcome to write your program in Java as well, but don't have to). This program sets up a connection with a database server and issues MySQL queries to gather a large amount of stock price data. In essence, this template program performs only some of the I/O required of your program. It is up to you to take this data and populate your own data structures. In the template program, the function getTickers() shows you how to collect the names of each stock and the function getSingleStockData() shows you how to collect daily stock prices for a given stock.

  1. The first problem for you is to produce a multi-threaded version of the basic I/O program. You must fork off N threads that first read the list of stock names from the database server and populate your data structures. You must then fork off M threads that read daily stock prices for a year for each of these stock names and again populate your data structures. Experimentally determine the optimal values for N and M for some state-of-the-art multi-core system. Note that this problem deals with I/O parallelization with CPU threads, not computation parallelization.
  2. Now perform the following computation on the data. For each stock, compute a running average (over the last 5 days) of its stock price for each day of the year. For each day of the year, identify (and print) the stock that on that day has the highest percentage gain over its 5-day average. Break up this entire computation into Q threads such that performance is optimized. Note that this deals with computation parallelization, but largely involves only reads to shared variables. There is almost no read-write sharing.
  3. Finally, conjure up another computation on this database that does involve some non-trivial read-write sharing, i.e., you should acquire locks when accessing some critical section and there should be a producer-consumer relationship between some of the threads. Again, compute the optimal number of threads R.
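As a rough illustration of the partitioning in part 2, the sketch below splits the stocks into Q contiguous blocks and lets each thread compute the 5-day running average for its own block. It is only one possible approach, not the expected solution. The flat prices array, the avg_task struct, and the names avg_worker and compute_running_averages are all hypothetical and do not come from the template program. The sketch also assumes that the window covers the current day plus the previous four, and that the first few days use a shorter window. Because every thread writes to a disjoint slice of the output, no locks are needed.

```c
#include <pthread.h>
#include <stddef.h>

#define WINDOW 5        /* length of the running average, in days */
#define MAX_THREADS 64  /* arbitrary upper bound on Q for this sketch */

/* Hypothetical layout: prices[s * num_days + d] is stock s on day d. */
typedef struct {
    const double *prices;  /* shared input, only read by the workers */
    double *avg;           /* output, same layout as prices */
    size_t num_days;
    size_t first_stock;    /* this thread handles [first_stock, last_stock) */
    size_t last_stock;
} avg_task;

static void *avg_worker(void *arg) {
    avg_task *t = arg;
    for (size_t s = t->first_stock; s < t->last_stock; s++) {
        const double *p = t->prices + s * t->num_days;
        double *a = t->avg + s * t->num_days;
        double sum = 0.0;
        for (size_t d = 0; d < t->num_days; d++) {
            sum += p[d];                  /* the newest day enters the window */
            if (d >= WINDOW)
                sum -= p[d - WINDOW];     /* the oldest day leaves the window */
            size_t n = (d < WINDOW) ? d + 1 : WINDOW;
            a[d] = sum / (double)n;
        }
    }
    return NULL;
}

/* Runs q worker threads over num_stocks stocks. Returns 0 on success, -1 on error. */
int compute_running_averages(const double *prices, double *avg,
                             size_t num_stocks, size_t num_days, size_t q) {
    pthread_t tid[MAX_THREADS];
    avg_task task[MAX_THREADS];
    size_t started = 0;
    int rc = 0;

    if (q == 0 || q > MAX_THREADS)
        return -1;

    for (size_t i = 0; i < q; i++) {
        task[i].prices = prices;
        task[i].avg = avg;
        task[i].num_days = num_days;
        task[i].first_stock = i * num_stocks / q;        /* block partition */
        task[i].last_stock = (i + 1) * num_stocks / q;
        if (pthread_create(&tid[i], NULL, avg_worker, &task[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (size_t i = 0; i < started; i++)
        pthread_join(tid[i], NULL);       /* wait for every started worker */
    return rc;
}
```

Timing compute_running_averages for several values of q on the same data is one simple way to start the search for the best Q. The per-day search for the highest percentage gain could be partitioned by day in a similar fashion.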

What you need to submit back to me (email me the tarball if that is most convenient): the C programs (appropriately titled and commented), a README file that provides details on the application, how to run it, and any analysis that you may have carried out. Be detailed when discussing the performance observations and describing what problem you solved in part 3 and how it involved non-trivial synchronization.
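For readers unsure what the locking and producer-consumer relationship in part 3 might look like, here is a minimal sketch of a bounded queue protected by a mutex and two condition variables. It only illustrates the synchronization pattern. The work_queue type, the function names, and the idea of passing integer stock indices through the queue are assumptions made for this example, and the actual computation is left for you to choose.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 8  /* small capacity so the producer really blocks */

typedef struct {
    int items[QUEUE_CAP];
    size_t head, count;
    int closed;                      /* set once the producer is finished */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} work_queue;

void queue_init(work_queue *q) {
    q->head = q->count = 0;
    q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void queue_push(work_queue *q, int item) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)                 /* wait for free space */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns 1 and stores an item, or 0 once the queue is closed and drained. */
int queue_pop(work_queue *q, int *item) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)           /* wait for work */
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    *item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return 1;
}

void queue_close(work_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);        /* wake all waiting consumers */
    pthread_mutex_unlock(&q->lock);
}

typedef struct { work_queue *q; int n; } producer_arg;
typedef struct { work_queue *q; long sum; } consumer_arg;

static void *producer(void *arg) {
    producer_arg *p = arg;
    for (int i = 0; i < p->n; i++)
        queue_push(p->q, i);                      /* e.g. a stock index */
    queue_close(p->q);
    return NULL;
}

static void *consumer(void *arg) {
    consumer_arg *c = arg;
    int item;
    while (queue_pop(c->q, &item))
        c->sum += item;                           /* stand-in for real work */
    return NULL;
}

/* One producer, one consumer. Returns the sum of 0..n-1, or -1 on error. */
long run_demo(int n) {
    work_queue q;
    producer_arg pa = { &q, n };
    consumer_arg ca = { &q, 0 };
    pthread_t pt, ct;

    queue_init(&q);
    if (pthread_create(&ct, NULL, consumer, &ca) != 0)
        return -1;
    if (pthread_create(&pt, NULL, producer, &pa) != 0) {
        queue_close(&q);                          /* let the consumer exit */
        pthread_join(ct, NULL);
        return -1;
    }
    pthread_join(pt, NULL);
    pthread_join(ct, NULL);
    return ca.sum;
}
```

The while loops around pthread_cond_wait guard against spurious wakeups. With several consumers, any result they share (the sum here) would also need its own lock or a per-thread partial result that is merged after the joins.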


Resources:

Optional reading: