NOTE: THIS IS NOT A READY-TO-USE SCRIPT FOR TASKS OTHER THAN THE SETSWANA PROJECT. If you want to segment other speech data with this script, it will need some scripting work to adapt it to your needs.
AUTOSEG: a Praat script for automated speech data segmentation
Description
- Full Descriptive Name: Automatic Segmentation and Markup in Praat TextGrid for Setswana (Autoseg 1.0)
- Description: This script is designed specifically for the research task of the Setswana Project at Georgetown University (PI: Elizabeth Zsiga, NSF funded). It takes a sound file containing three (or in some cases two or four) repetitions of a word and automatically segments it into the individual repetitions, with all the segments and targeted phonemes marked up in a six-tier TextGrid created by the script. It also places the phoneme and word labels in the tier intervals, based on the file name (naming convention). The user needs to supply a set of duration values for the four targeted phonemes. These duration values are statistically derived from previously hand-marked data of 27 speakers of Setswana (Sengwato dialect) and are entered in the initial form by the user. For details, see the script. I am currently updating the script's code to accommodate more variation in data format.
- Updates: Complete documentation of updates is provided below.
Documentation
1. Basic Operations
Basic operation of this script is simple. Load all the sound files (each with 3 or 4 repetitions of a word in one of the four environments in Setswana) into the Praat Objects window, select them all, and run the script. When the script finishes, the TextGrids and the modified sound files (if any) are saved to a user-specified directory.
2. Parameter Settings
The script opens with an initial form for setting the basic parameters. Here are the statistically derived parameters for the four environments, based on the old files:
USER SETTINGS for the six numbers in the script's initial form (BASED ON OLD FILES):
-N, -ORE: default setting (use the numbers already in the form)
-ISO: 0, 0.783, 0, 0, 0.058, 0.212
-XO: 0.142, 0.396, 0.08, 0.062, 0.1, 0.104
The default settings are based on the N and ORE cases. However, using different settings for different environments means that the sound files must be selected and the script run in groups sharing the same environment (instead of running all of a speaker's files at once). This could be a lot of extra work.
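The per-environment workflow above can be illustrated with a small Python sketch (not part of the Praat script itself). It stores the six form values listed above and groups file names by environment, so that each group can be selected and run with its own settings. The function name, the file names, and the idea of supplying the environment of each file as a mapping are all assumptions made for illustration; the script's actual naming convention is not reproduced here.

```python
from collections import defaultdict

# Six form values per environment, copied from the settings listed above.
# None means "keep the defaults already in the form" (N and ORE cases).
PRESETS = {
    "N": None,
    "ORE": None,
    "ISO": (0, 0.783, 0, 0, 0.058, 0.212),
    "XO": (0.142, 0.396, 0.08, 0.062, 0.1, 0.104),
}


def group_by_environment(environment_of):
    """Group file names by environment.

    `environment_of` is a hypothetical mapping from file name to one of the
    keys of PRESETS; how the environment is read off a file name is left to
    the caller.
    """
    groups = defaultdict(list)
    for file_name, environment in environment_of.items():
        if environment not in PRESETS:
            raise ValueError(f"unknown environment: {environment!r}")
        groups[environment].append(file_name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = {"a.wav": "XO", "b.wav": "N", "c.wav": "XO"}
    for environment, names in group_by_environment(files).items():
        print(environment, PRESETS[environment], names)
```

Each resulting group would then be loaded, selected, and run as one batch with the corresponding six numbers entered in the form.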
Alternatively, one might use the default setting for all files and run the script on all of a speaker's files at once. This method saves a lot of time and energy. Even though the actual output will vary across environments, the time taken to hand-adjust a 1 s difference and a 0.2 s difference is not drastically different. Overall, using the default setting across environments may save time.
Another alternative would be to build the settings into the script so that the user does not have to differentiate settings across environments. However, due to inter-speaker differences in speaking rate, a model with high predictive power will be difficult to build. Updates may be available in the near future that incorporate a relative ranking of the parameter settings (instead of precise values).
3. Saving files
Because of its noise-detection/repair feature, the script operates on sound files and re-saves them. The choice of saving directory is therefore important, so that old sound files are not confused with modified ones. It is also important to keep track of which TextGrids you have hand-corrected and which you have not.
The simplest way to keep things clear is to save all the files to the directory in which the original sound files are located. This means:
(1) When running the script, enter the directory of the original sound files as your output directory;
(2) After you have hand-corrected the TextGrid, save it in the original sound file directory (the same directory) as well.
On PC, the TextGrid saved by the script is exactly the same type as the TextGrid saved by hand (after you have modified it). Therefore, the hand-corrected TextGrids will overwrite the script-generated TextGrids if they are saved in the same directory.
On Mac OS X, the TextGrid saved by the script is NOT the same type as the TextGrid saved by hand (after you have modified it). The former is of Kind "simple text file", whereas the latter is of Kind "Praat Annotation". Therefore, the hand-corrected file will not overwrite the script-generated file even if they are in the same directory. Instead, there will be two TextGrids, one of the "Praat Annotation" type and one of the "simple text file" type. This helps keep track of which TextGrids have been hand-corrected (two TextGrid files) and which have not (only the one TextGrid generated by the script). In the end, you can choose to delete all the script-generated TextGrids and keep all the hand-corrected ones.
Note: as pointed out by Paul Boersma, the creator of Praat, I accidentally inserted a space at the end of the line where the script saves the TextGrid file, which is what caused this problem on Mac in the first place (UNIX systems are sensitive to spaces and capitalization). This problem is solved in Update 9.
Updates
Update 9
- Date: 2/15/13
- Description: This update includes several minor improvements: (1) Distinguishes baka1, baka2, baka3 for individual repetitions (previously they were all labeled baka throughout). (2) Builds in some relative duration differences among different environments. (3) The TextGrid file is now saved as the same file type as when it is saved manually (on both Mac and PC).
Update 8
- Date: 2/6/13
- Description: Previous versions used an interval-number-checking algorithm (on tier 1) to decide whether the current sound file is noisy and how to repair it (by adding silence at the beginning). However, this approach can only deal with cases where the segmentation problem lies at the beginning of the file (the file being too noisy at the beginning). Two other scenarios that it could not handle are: (1) the file is noisy at the end; (2) the file is noisy at both the beginning and the end. (One other rare scenario, where the file is noisy in the middle, is currently not dealt with.) The interval-number-checking algorithm is therefore not good enough to deal with these three scenarios. In this update I rewrote the code of the re-segmentation procedures. Noisiness is now detected based on the observation that, in a correctly segmented TextGrid (tier 1), the first and last intervals should always be labeled as silence, i.e., "" in PSL. Three scenarios are handled by three similar procedures: (1) if the label of the first interval != "", add silence at the beginning (procedure ReSegment_B); (2) if the label of the last interval != "", add silence at the end (procedure ReSegment_E); (3) if both labels != "", add silence at both ends (procedure ReSegment_BE). Empirical tests on 800 files demonstrated the effectiveness and robustness of this method. In the Praat Info output, the number of intervals before and after re-segmentation is still printed for reference.
- Note to user: in the Praat Info window output while the script is running, a first/last interval label of "" (blank) is the desired state. Anything other than blank is a problem that the script will fix (and the label should become blank in the fixed TextGrid).
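The decision rule described in Update 8 can be summarized in a short Python sketch. This is only an illustration of the logic as stated above, not the script's Praat code: the function name is hypothetical, the non-blank label "sound" in the usage lines is a placeholder, and only the procedure names ReSegment_B, ReSegment_E, and ReSegment_BE come from the description.

```python
def choose_repair(first_label, last_label):
    """Return the name of the repair procedure suggested by the tier-1 labels.

    A correctly segmented file has "" (silence) as the label of both the
    first and the last interval; any other label marks that end as noisy.
    Returns None when no repair is needed.
    """
    noisy_start = first_label != ""
    noisy_end = last_label != ""
    if noisy_start and noisy_end:
        return "ReSegment_BE"  # add silence at both ends
    if noisy_start:
        return "ReSegment_B"   # add silence at the beginning
    if noisy_end:
        return "ReSegment_E"   # add silence at the end
    return None


if __name__ == "__main__":
    # "sound" is a placeholder for any non-blank label.
    print(choose_repair("", ""))            # no repair needed
    print(choose_repair("sound", ""))       # noisy beginning
    print(choose_repair("sound", "sound"))  # noisy at both ends
```

Checking the combined case first keeps the three branches mutually exclusive, which mirrors the three separate procedures named in the update.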
Update 7
- Date: 2/1/13
- Description: This version includes two updates: (1) Regressive segmentation for '-xo' cases: most cases with initial 'xo' are not properly segmented. The detected onset is always a little later (by less than 1 s) than the actual onset, because of the weak energy in the onset fricative noise. In this version, the script detects the 'xo' cases and regressively moves the onset 0.7 s backward. Empirical tests showed that this is a good strategy. (Note: we previously thought that the user settings of segment durations for 'xo' were much shorter than for the 'ore' and 'N' cases; this turned out not to be the main reason that 'xo' cases seem to have shorter initial-segment durations than the other two cases in the TextGrid created by the script. The main reason, instead, is that the detected onset of 'xo' is delayed relative to the actual onset of the fricative.) (2) Updated the source code for labeling so that an initial 'u' label is inserted on the segment tier for 'N' cases (un or um). Per the phonetic symbol input in Praat, this is labeled as \hs and appears as the 'u' sound.
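The regressive onset adjustment in (1) amounts to a simple shift, sketched here in Python for illustration. The function name and arguments are hypothetical, and clamping the result at 0 s (so that the onset never falls before the start of the file) is an assumption not stated in the description; only the 0.7 s shift for 'xo' cases comes from the text above.

```python
def adjust_onset(detected_onset, is_xo, shift=0.7):
    """Move the detected onset backward by `shift` seconds for 'xo' cases.

    Times are in seconds. Non-'xo' cases are returned unchanged. The result
    is clamped at 0.0 (an assumption) so it stays inside the file.
    """
    if not is_xo:
        return detected_onset
    return max(0.0, detected_onset - shift)


if __name__ == "__main__":
    print(adjust_onset(1.5, is_xo=True))   # onset moved 0.7 s earlier
    print(adjust_onset(1.5, is_xo=False))  # unchanged
    print(adjust_onset(0.3, is_xo=True))   # clamped at the file start
```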
Update 6
- Date: 1/31/13
- Description: This version includes important updates: (1) improved performance on issues of segmenting based on consonantal closure silence; (2) improved overall segmentation success rate by implementing a sound-file editing procedure. In general, if a sound file is detected as noisy (N[interval of t1]=2n) after the initial segmentation, the script deletes that segmented TextGrid, adds 1 s of silence at the beginning of the sound file, saves the new sound file, and performs the segmentation task again. In most cases the noise problem is solved and the resulting TextGrid is correct.
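The repair loop in (2) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the script's implementation: the sound is represented as a plain list of samples, and `segment` and `is_noisy` are hypothetical callables standing in for the script's segmentation step and noise check. Saving the new sound file is omitted.

```python
def prepend_silence(samples, sample_rate, seconds=1.0):
    """Return a new sample list with `seconds` of silence (zeros) in front."""
    n_silent = int(round(sample_rate * seconds))
    return [0.0] * n_silent + list(samples)


def segment_with_repair(samples, sample_rate, segment, is_noisy):
    """Segment once; if the result looks noisy, pad with 1 s of silence and retry.

    `segment(samples)` returns a segmentation result and `is_noisy(result)`
    reports whether it should be discarded. Both are hypothetical stand-ins.
    Returns the (possibly padded) samples together with the final result.
    """
    result = segment(samples)
    if is_noisy(result):
        # Discard the first result, pad the beginning, and segment again.
        samples = prepend_silence(samples, sample_rate)
        result = segment(samples)
    return samples, result


if __name__ == "__main__":
    # Toy stand-ins: the "segmentation" is just the sample count, and an
    # even count is treated as noisy.
    padded, result = segment_with_repair(
        [0.5, -0.5], sample_rate=3,
        segment=len, is_noisy=lambda count: count % 2 == 0,
    )
    print(len(padded), result)
```

Returning the padded samples alongside the result reflects the point made in the saving section: the sound file itself may be modified, so the modified version has to be kept together with its TextGrid.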
Update 5
- Date: 1/20/13
- Description: The Autoseg script has been maintained and updated since its first release, and thanks to constant user feedback the current version has come a long way since the initial version. The current version differs from file 12 in that it solves the problem where the segmentation is thrown off entirely by a consonantal closure within a word, so that one word is segmented into two words separated by the closure silence. The current version uses an updated intensity setting, so that a consonantal closure is no longer treated as a silence separating words and words are correctly segmented at inter-word silences.
Update 4
- Date: 1/13/13
- Description: Solved the file-name reading problem. Previous versions of the script could only read file names whose root word has four letters, such as baka or dila, and were thrown off by words like bZala. This version can read file names of any length and label the TextGrid correctly.
Update 3
- Date: 1/13/13
- Description: This version checks whether the file is noisy, in addition to performing the segmentation task outlined in file 9 and previous versions. Noisy files are defined as files containing non-speech noise (usually at the beginning of the sound file) that interferes with the script's segmentation. This version first runs a check on the files and prints the noisy file names in the Praat Info window so that they can be cleaned by hand.
Update 2
- Date: 1/6/13
- Description: Solved the saving directory/path problem. This version can save on both Windows and Mac OS X. The user needs to give the exact path of the intended folder for saving the TextGrid files. Run the script to see more instructions in the initial form.
Update 1
- Date: 11/28/12
- Description: Update on File6. This version works for any number of repetitions in the word list: 2, 3, 4, or any other number.