If you’re interested in the scansion of the texts, you can use the source code of the pages as data. Each syllable is tagged as a span, with classes for quantity (short/long), foot number (foot#), word number (word#), end of word (wordend), hemistich (hemi1/hemi2, i.e. before or after the main caesura), and end of foot (footend). Note that syllables that technically contain more than one word have to be treated as a single word.
Lines are divs, and may be tagged with classes: speech, speaker’s name, new paragraph (for formatting only).
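As a rough illustration of how the page markup could be read as data, here is a minimal Python sketch using only the standard library. It assumes that every div containing syllable spans is a verse line, and that class names look like `foot1` or `word3`; the exact spelling of the classes, the `ScansionParser` and `quantity_pattern` names, and the sample markup are all hypothetical, so check them against the real page source before relying on this.

```python
from html.parser import HTMLParser


class ScansionParser(HTMLParser):
    """Collect, for each <div>, the class list and text of every <span>."""

    def __init__(self):
        super().__init__()
        self.lines = []        # one dict per <div> encountered
        self._in_span = False  # True while inside a syllable <span>

    def handle_starttag(self, tag, attrs):
        # A missing or empty class attribute yields an empty list.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            self.lines.append({"classes": classes, "syllables": []})
        elif tag == "span" and self.lines:
            self.lines[-1]["syllables"].append({"classes": classes, "text": ""})
            self._in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False

    def handle_data(self, data):
        if self._in_span:
            self.lines[-1]["syllables"][-1]["text"] += data


def quantity_pattern(line):
    """Return '-' for each long syllable and 'u' for each short one."""
    marks = []
    for syllable in line["syllables"]:
        if "long" in syllable["classes"]:
            marks.append("-")
        elif "short" in syllable["classes"]:
            marks.append("u")
    return "".join(marks)


# Hypothetical markup with placeholder syllables, not copied from the site.
SAMPLE = (
    '<div class="speech">'
    '<span class="long foot1 word1 hemi1">aa</span>'
    '<span class="short foot1 word1 hemi1">b</span>'
    '<span class="short foot1 word1 wordend hemi1 footend">c</span>'
    '</div>'
)

parser = ScansionParser()
parser.feed(SAMPLE)
print(quantity_pattern(parser.lines[0]))
```

If the real pages wrap the verse lines in other divs, those wrappers will also appear in `parser.lines`, so you may want to keep only the entries whose `syllables` list is non-empty.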
I’ll be uploading CSV versions of the scansion. Here’s a zip archive with Iliad books 1-12: iliadcsv. These should open in your spreadsheet app of choice. Each row of the spreadsheet is a syllable. See the header row for info on the fields.
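If you would rather work with the files in code than in a spreadsheet, something like the following Python sketch may be a starting point. The column names in the sample (`line`, `syllable`, `quantity`) are invented for illustration only; the real names are whatever the header row of each file says, and the `load_syllables` and `tally` helpers are hypothetical.

```python
import csv
import io
from collections import Counter


def load_syllables(fileobj):
    """Read one row per syllable, keyed by whatever the header row contains."""
    return list(csv.DictReader(fileobj))


def tally(rows, column):
    """Count how often each value occurs in the given column."""
    return Counter(row[column] for row in rows)


# Invented sample data; the real header row will differ.
SAMPLE_CSV = "line,syllable,quantity\n1,1,long\n1,2,short\n1,3,short\n"

rows = load_syllables(io.StringIO(SAMPLE_CSV))
print(list(rows[0].keys()))      # the field names taken from the header row
print(tally(rows, "quantity"))   # how many long and short syllables
```

For a real file you would pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption on my part.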
All the Iliad scansion is available as a set of CSV files: IliadAllCSV. You can open them in a spreadsheet app, convert them to SQL, and so on.
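One possible way to do the SQL conversion is to load a file into SQLite with Python's standard library. This is only a sketch: the `csv_to_sqlite` helper, the table name `syllables`, and the sample columns are hypothetical, every column is stored as text, and it assumes each row has the same number of fields as the header row.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(fileobj, conn, table="syllables"):
    """Create a table from the CSV header row and insert every data row."""
    reader = csv.reader(fileobj)
    header = next(reader)
    # Quote identifiers so unusual column names are still accepted.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()


# Invented sample data; the real header row will differ.
SAMPLE_CSV = "line,syllable,quantity\n1,1,long\n1,2,short\n1,3,short\n"

conn = sqlite3.connect(":memory:")
csv_to_sqlite(io.StringIO(SAMPLE_CSV), conn)
query = 'SELECT COUNT(*) FROM syllables WHERE "quantity" = ?'
print(conn.execute(query, ("short",)).fetchone()[0])
```

To keep the database, connect to a file path instead of `":memory:"`.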
I would not recommend using these for caesura statistics: some of the caesuras do need correcting, but more generally, I’m of the view that locating one main caesura in each line is inappropriate. The main reason I have included them here is so that my work may be of use to students who will be required to do so. Some lines do have one natural break, but some have none, some have two (Kirk’s “threefolders”), and a few have three. Often these breaks are pretty self-evident, but sometimes the performer has to decide (for instance) between a sense break and a formula break. It’s not clear to me that there’s any way to make a scientific judgment in all cases, and I wouldn’t like these data to contribute to any circular arguments (i.e. arguments based on these data will only confirm and explain subjective choices made by me). I should note too that my observation of caesuras in performance is changing as I progress through the project and as my feel for the rhythm develops, so you should not look to my reading (at this point) for a consistent interpretation of pauses. I hope to have more to say on this later.