This README file contains the instructions for using the "SSPNet Conflict Corpus".

If you use the data, please cite the following article:

S. Kim, M. Filippone, F. Valente and A. Vinciarelli, "Predicting the Conflict Level in Television Political Debates: an Approach Based on Crowdsourcing, Nonverbal Communication and Gaussian Processes", Proceedings of ACM International Conference on Multimedia, pp. 793-796, 2012.

For any questions, please contact Alessandro Vinciarelli (vincia@dcs.gla.ac.uk).

-----

The Corpus includes the following material:

- videodata.zip
  Content: Archive of the 1430 video clips of the Corpus
  Format: zip archive including flv videos
  Size: 2.67 GB

- audiodata.zip
  Content: Audio channel of the 1430 Corpus clips
  Format: zip archive including wav recordings
  Size: 3.62 GB

- participantslist.csv
  Content: list of debate participants
  Format: csv
  Size: 13 kB

- manualdiarization.zip
  Content: Speaker segmentation of the Corpus clips (one file per clip)
  Format: csv
  Size: 372 kB

- conflictlevel.csv
  Content: conflict level of the Corpus clips (groundtruth)
  Format: csv
  Size: 32 kB

-----

Naming conventions

The clips are named in the following way:

xx-yy-zz_start_end.aaa

where

xx     year when the debate was televised (e.g., '06' corresponds to 2006)
yy     month when the debate was televised (e.g., '04' corresponds to April)
zz     day of the month when the debate was televised (e.g., '03' corresponds to the 3rd day of the month)
start  start time of the clip in the debate, in seconds (e.g., 318 means that the clip starts at second 318 of the debate televised on day zz of month yy in year xx)
end    end time of the clip in the debate, in seconds (same convention as the start time)
aaa    extension ('wav' for audio and 'flv' for video)

All clips that share the same prefix xx-yy-zz belong to the same debate. Clips with different prefixes xx-yy-zz belong to different debates.
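The naming convention above can be decoded programmatically. The following Python sketch is only an illustration, not part of the Corpus distribution: the function name parse_clip_name and the example file name are hypothetical, two-digit years are assumed to fall in the 2000s (as in the '06' example), and start and end are assumed to be whole numbers of seconds.

```python
import re
from typing import NamedTuple


class ClipName(NamedTuple):
    debate: str     # prefix xx-yy-zz, shared by all clips of one debate
    year: int       # assumption: two-digit years are in the 2000s
    month: int
    day: int
    start: int      # start time in seconds (assumed to be an integer)
    end: int        # end time in seconds (assumed to be an integer)
    extension: str  # 'wav', 'flv' or 'csv'


# xx-yy-zz_start_end.aaa
_CLIP_RE = re.compile(r"^(\d{2})-(\d{2})-(\d{2})_(\d+)_(\d+)\.(\w+)$")


def parse_clip_name(name: str) -> ClipName:
    """Split a clip file name into the fields described in the README."""
    match = _CLIP_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid clip name: {name!r}")
    xx, yy, zz, start, end, ext = match.groups()
    return ClipName(
        debate=f"{xx}-{yy}-{zz}",
        year=2000 + int(xx),
        month=int(yy),
        day=int(zz),
        start=int(start),
        end=int(end),
        extension=ext,
    )


# Hypothetical file name built from the example values in the README.
clip = parse_clip_name("06-04-03_318_348.wav")
print(clip.debate, clip.start, clip.end)
```

Grouping clips by the debate field recovers which clips belong to the same debate.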
-----

Format of 'participantslist.csv'

Column 1: speaker code in the format spk_nn (nn is an ID)
Column 2: name of the speaker in the format "firstname lastname"
Column 3: gender of the speaker (male or female)
Column 4: type of speaker (participant or moderator)
Column 5: debate where the speaker appears, in the format xx-yy-zz (see naming conventions above)

The columns after the fifth list further debates where the speaker appears (if any).

-----

Format of manual diarization csv files

Diarization files are named using the same conventions as the clips (see above), but have extension 'csv'.

Column 1: start time of the turn
Column 2: end time of the turn
Column 3: speaker code (see format of 'participantslist.csv')
Column 4: speaker code (only in case of overlapping speech)

-----

Format of 'conflictlevel.csv'

Groundtruth associated with each clip:

Column 1: name of the clip in the format xx-yy-zz_start_end.aaa (see naming conventions above)
Column 2: groundtruth conflict level (floating-point number)
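The three csv formats above could be read along the following lines. This is a minimal Python sketch under stated assumptions, not an official loader: the files are assumed to be comma-separated with no header row, the turn times are assumed to be numeric, and all function names and sample rows below are made up for illustration.

```python
import csv
import io
from typing import Dict, List, NamedTuple, Optional, TextIO


class Participant(NamedTuple):
    code: str            # spk_nn
    name: str
    gender: str          # male or female
    role: str            # participant or moderator
    debates: List[str]   # column 5 onwards, one xx-yy-zz prefix each


class Turn(NamedTuple):
    start: float
    end: float
    speaker: str
    overlap_speaker: Optional[str]  # column 4, only for overlapping speech


def read_participants(f: TextIO) -> List[Participant]:
    people = []
    for row in csv.reader(f):
        if len(row) < 5:
            continue  # skip blank or incomplete rows
        code, name, gender, role, *debates = (cell.strip() for cell in row)
        people.append(Participant(code, name, gender, role,
                                  [d for d in debates if d]))
    return people


def read_diarization(f: TextIO) -> List[Turn]:
    turns = []
    for row in csv.reader(f):
        if len(row) < 3:
            continue
        overlap = row[3].strip() if len(row) > 3 and row[3].strip() else None
        turns.append(Turn(float(row[0]), float(row[1]), row[2].strip(), overlap))
    return turns


def read_conflict_levels(f: TextIO) -> Dict[str, float]:
    # Maps clip name (column 1) to groundtruth conflict level (column 2).
    return {row[0].strip(): float(row[1]) for row in csv.reader(f) if len(row) >= 2}


# Made-up sample rows, only to show the expected shapes.
people = read_participants(io.StringIO("spk_01,Jane Doe,female,moderator,06-04-03,06-05-10\n"))
turns = read_diarization(io.StringIO("0.0,4.2,spk_01\n4.2,6.0,spk_02,spk_01\n"))
levels = read_conflict_levels(io.StringIO("06-04-03_318_348.wav,-1.5\n"))
```

With real data, each function would be called on a file opened with open(path, newline=""). If the files turn out to contain a header row, it has to be skipped before the numeric conversions.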