{"id":18736,"date":"2018-10-24T08:59:34","date_gmt":"2018-10-24T05:59:34","guid":{"rendered":"https:\/\/tentaclii.wordpress.com\/?p=18736"},"modified":"2023-02-09T20:41:44","modified_gmt":"2023-02-09T20:41:44","slug":"cats-and-dogs-as-an-automatic-audiobook","status":"publish","type":"post","link":"https:\/\/jurn.link\/tentaclii\/index.php\/2018\/10\/24\/cats-and-dogs-as-an-automatic-audiobook\/","title":{"rendered":"&#8220;Cats and Dogs&#8221; as an automatic audiobook"},"content":{"rendered":"<p>A small experiment, to demonstrate and pin down a workflow for a state-of-the-art &#8216;expressive audiobook&#8217; reading in 2018, done by affordable consumer text-to-speech software and voice.<\/p>\n<p><strong>Result:<\/strong> <a href=\"https:\/\/archive.org\/details\/cats-and-dogs-hpl-1926-tts\">The final audio file<\/a> (42 minutes).<\/p>\n<p><strong>Input text:<\/strong> a difficult one, the complex essay &#8220;Cats and Dogs&#8221; (1926) by H.P. Lovecraft. Pulp fiction, with simple sentences and obvious words, might work far better.  But this was a stress test.<\/p>\n<p><strong>Voice used:<\/strong> Ivona &#8216;Brian&#8217; (British English, 22kHz, about $50). &#8216;Brian&#8217; does not flow across words as smoothly and blandly as the default Windows 8 Microsoft Zira does. As a result Brian occasionally mis-emphasises words and slurs slightly, yet is far more expressive in an audiobook than Zira.<\/p>\n<p><strong>1.<\/strong> The text was read by &#8216;Brian&#8217; in the text-to-speech software TextAloud 4, with the text read out to a standard MP3 file. <\/p>\n<p>* Speed: Normal.<br \/>\n* Pitch: -5 (to deepen the voice slightly).<br \/>\n* Volume: 100% (perhaps too high; you might also try 70%).<br \/>\n* Pauses between sentences: 0.7 seconds (default in TextAloud is 0.5).<br \/>\n* Pauses between paragraphs: two seconds.<\/p>\n<p>(<em>Why not use the free Balabolka reader? 
Because it doesn&#8217;t offer pause adjustment.<\/em> <em>Update: it now offers <a href=\"https:\/\/www.jurn.link\/tentaclii\/oldimages\/example.jpg\">markup<\/a> to add pauses and pitch shifts<\/em>. Further update: Now you can also set universal pauses).<\/p>\n<p><strong>2.<\/strong> I loaded the resulting MP3 output file into the free audio editor Audacity.  An Equalisation filter was run to try to cut the 5kHz &#8211; 7kHz sibilance. The same preset tried to slightly boost 1kHz &#8211; 5kHz, for overall speech intelligibility.<\/p>\n<p><a href=\"https:\/\/www.jurn.link\/tentaclii\/oldimages\/sibil.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.jurn.link\/tentaclii\/oldimages\/sibil.jpg?w=529\" alt=\"\" width=\"529\" height=\"230\" class=\"aligncenter size-large wp-image-18737\" \/><\/a><\/p>\n<p><strong>3.<\/strong> The simple free <a href=\"http:\/\/www.digitalfishphones.com\/main.php?item=2&amp;subItem=5\">Spitfish De-esser<\/a> was then run inside Audacity, to further reduce sibilance. (Select All | Effect | Spitfish | Apply | Close). This runs far more quickly than Audacity&#8217;s native de-essing filter, as well as being simpler to control. You may have problems seeing the download button, so here is a direct <a href=\"http:\/\/www.digitalfishphones.com\/binaries\/the_fish_fillets_v1_1.zip\">.ZIP download<\/a>.<\/p>\n<p><a href=\"https:\/\/www.jurn.link\/tentaclii\/oldimages\/spitfish.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.jurn.link\/tentaclii\/oldimages\/spitfish.jpg\" alt=\"\" width=\"455\" height=\"323\" class=\"aligncenter size-full wp-image-18738\" \/><\/a><\/p>\n<p><strong>4.<\/strong> Ran the Effect | Limiter, using its default &#8216;Soft Limit&#8217; preset.<\/p>\n<p><strong>5.<\/strong> Added the Reverb filter, with its default &#8216;Voice I&#8217; preset.<\/p>\n<p><strong>6.<\/strong> Ran the Spitfish De-esser again, to make a final attempt to reduce the remaining sibilance.  
Same settings as before.<\/p>\n<p><strong>7.<\/strong> Saved as an MP3, 320kb\/s quality, resulting in a 50MB file for a 42-minute reading.<\/p>\n<p>Incidentally, it&#8217;s apparently possible to &#8220;chain&#8221; these steps (like a Photoshop Action) in Audacity, as a preset, and then play them back automatically.  I couldn&#8217;t find that option in my Audacity, but that&#8217;s perhaps because I have an older version.<\/p>\n<p><strong>Results:<\/strong><\/p>\n<p>The results were fairly listenable, and (once the raspy &#8216;synthetic voice sibilance&#8217; was reduced) definitely seem like an advance on previous robo-voices.  But the test result was certainly not ideal, due to the &#8216;Brian&#8217; voice&#8217;s unnatural, unexpected stresses placed on certain words and the slurring of others.  It&#8217;s rather like listening to a &#8216;sticky&#8217;\/&#8217;wobbly&#8217; old cassette tape from the 1980s, and becomes rather wearing after a while.  It can result in an aural equivalent of the motion-sickness that one encounters in many videogames.  <\/p>\n<p>Perhaps there may be some search-and-replace script that automatically tweaks a text so that &#8216;Brian&#8217; reads it better, but I couldn&#8217;t find one. Simple and immediate global fixes are:<\/p>\n<p>* Change Mr. and Mrs. 
to <em>Mister<\/em> and <em>Misses<\/em>.<br \/>\n* Change capitalised acronyms such as NASA to <em>Nasser<\/em>, or they will be said &#8216;En-Ay-Ess-Ay&#8217;.<br \/>\n* Change crunched-up hyphenation, such as &#8220;then-as you all know-he did something&#8221;, to &#8220;then &#8211; as you all know &#8211; he did something&#8221;.<\/p>\n<p>It also helps to have a <a href=\"http:\/\/www.customsolutions.us\/cleanup\/\">good Text Cleaner<\/a> utility running when you copy-paste your text into TextAloud, which will fix line-wrapping and other problems.<\/p>\n<p>There are of course various machine-learning services, such as Amazon Polly, which claim to offer smoother reading voices for text-to-speech.  But they appear to be for big-budget developers, are cloud-based, and it seems unlikely that owners such as Amazon will ever allow them to be unleashed on the making of long audiobooks (which would compete with Audible). What&#8217;s being tested above are the tools available to consumers for less than $100 total.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A small experiment, to demonstrate and pin down a workflow for a state-of-the-art &#8216;expressive audiobook&#8217; reading in 2018, done by &hellip;<\/p>\n<p><a href=\"https:\/\/jurn.link\/tentaclii\/index.php\/2018\/10\/24\/cats-and-dogs-as-an-automatic-audiobook\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[23],"tags":[],"class_list":["post-18736","post","type-post","status-publish","format-standard","hentry","category-podcasts-etc"],"_links":{"self":[{"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/posts\/18736","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/comments?post=18736"}],"version-history":[{"count":1,"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/posts\/18736\/revisions"}],"predecessor-version":[{"id":58537,"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/posts\/18736\/revisions\/58537"}],"wp:attachment":[{"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/media?parent=18736"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/categories?post=18736"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jurn.link\/tentaclii\/index.php\/wp-json\/wp\/v2\/tags?post=18736"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}