Integrating The Google Translate API Into a ColdFusion ... - lsntap

Integrating The Google Translate API Into a ColdFusion-based system

Scope of the work

ILAO uses a custom content management system (CMS). Our entire website platform, including the CMS, is written in ColdFusion. The first step in our Spanish language project was to build a mechanism that copies a piece of English content into a new piece of content, translating some fields while preserving the fields that should not be translated. To do this we integrated Google Translate and wrote a script that:

● Sends a piece of English content to Google for translation and creates a new piece of content from the translation
● Translates the English metadata that should be translated (title, description, keywords) and adds the translations to the new content
● Copies the remaining English metadata that shouldn’t be translated (problem codes, jurisdictions, site settings) to the new piece of content
● Adds a field that relates the English and Spanish content as equivalent pages, so that when a change is made in one, the content management team knows to update the corresponding content.
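The steps in this list can be sketched roughly as follows. This is an illustration in Python, not ILAO's ColdFusion script, and every field name in it (`title`, `body`, `problem_codes`, `equivalent_id`, and so on) is a hypothetical stand-in, since the CMS schema is not shown in this document. The `translate` argument is any callable that turns an English string into a Spanish one.

```python
from typing import Any, Callable, Dict

# Hypothetical field names; the real CMS schema is not shown in this document.
TRANSLATED_FIELDS = ("title", "description", "keywords", "body")
COPIED_FIELDS = ("problem_codes", "jurisdictions", "site_settings")


def make_spanish_copy(english: Dict[str, Any],
                      translate: Callable[[str], str]) -> Dict[str, Any]:
    """Build a new content record from an English one."""
    spanish: Dict[str, Any] = {"language": "es"}
    # Fields that should be translated go through the translate callable.
    for field in TRANSLATED_FIELDS:
        spanish[field] = translate(english[field])
    # Fields that should stay in English are copied unchanged.
    for field in COPIED_FIELDS:
        spanish[field] = english[field]
    # Record the English page as the equivalent of the new Spanish page.
    spanish["equivalent_id"] = english["id"]
    return spanish
```

Passing the translator in as an argument keeps the copy logic independent of the translation service, so it can be exercised with a stand-in function before any paid API call is made.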

In addition, we integrated Google Translate into the module that allows partner organizations to create a profile; the integration allows legal aid agencies to translate that profile. This profile information is then used in the legal referral section of AyudaLegalIL.org.

Integrating Google Translate

Setting up an account in Google

The Google Translate API is only available as a paid service. It currently costs $20 per million characters translated. Google does not seem to count HTML markup as part of the character count. For reference, ILAO spent less than $100 in year one, translating about 200 pieces of legal content.

To set up an account, create a project in the API console (https://code.google.com/apis/console) and turn on the Translate API. You’ll then be required to set up billing using a credit card. You can also set daily billable limits and API access limits from within the API console. Once billing is enabled, a key needs to be generated. This key must be passed to Google with each request. The key can be changed at any time by accessing the account on Google. (You can also set up access using OAuth 2.0.)

Last Updated: December 31, 2012

The Translation Function

The Google Translate API uses REST to accept requests and returns translations in JSON format. ILAO created a function in ColdFusion to pass our content to Google and get back the translated results. The function accepts four arguments: qString, which is the text we send to Google to translate; the source language, in ISO 639-1 format; the target language, in ISO 639-1 format; and the format of our text. We also stored our key from Google as a variable ("this.key") so that we can pass the key along with our request. A full list of ISO 639-1 codes can be found here: http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

Our function sends an HTTP request to Google as a POST with a GET method override, and Google returns a JSON object. We deserialize the JSON object, extract the translated text as a string, and then use it in the rest of our application.
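A minimal sketch of such a function, written in Python rather than ColdFusion, might look like the following. The endpoint is the one used in the sample requests in this document; the function names are hypothetical, and the `X-HTTP-Method-Override: GET` header is our assumption about how the "GET override" is expressed on the wire.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://www.googleapis.com/language/translate/v2"


def build_request(q, source, target, fmt, key):
    """Build a POST request that asks the API to treat it as a GET."""
    body = urllib.parse.urlencode({
        "key": key,
        "q": q,
        "source": source,   # ISO 639-1 code, e.g. "en"
        "target": target,   # ISO 639-1 code, e.g. "es"
        "format": fmt,      # "html" or "text"
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        # Assumed header name for the GET override.
        headers={"X-HTTP-Method-Override": "GET"},
        method="POST",
    )


def translate(q, source, target, fmt, key):
    """Send the request and return the first translated string."""
    request = build_request(q, source, target, fmt, key)
    with urllib.request.urlopen(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["data"]["translations"][0]["translatedText"]
```

Sending the text in the POST body avoids URL length limits, and `urlencode` takes care of escaping spaces, question marks, and markup in the text being translated.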


Character Count Issues

The Translate API cannot handle large text requests. Larger requests need to be broken into smaller chunks, sent to Google, and then concatenated back together. To accommodate issues with character count, we created a long text function that splits our text on common markers, to preserve as much context as possible. The function first splits the content on paragraph marks. If a resulting string is still too long, we break that string on line breaks. If it is still too long, we break on list element tags.

Note: while this code runs in ColdFusion, it makes use of the Java String class, which is why it is necessary to JavaCast the content variable as a Java String object.
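The splitting strategy described above can be sketched as follows. This is a Python illustration, not the ColdFusion/Java code itself; the limit of 2000 characters and the exact tags treated as paragraph, line-break, and list markers are assumptions chosen for the example.

```python
import re

# Hypothetical limit; check the API documentation for the real maximum.
MAX_CHARS = 2000

# Assumed markers, tried in order. Each pattern matches the empty position
# just after a tag, so the tag stays attached to the text before it.
SPLIT_PATTERNS = [
    r"(?<=</p>)",                          # paragraph marks
    r"(?<=<br>)|(?<=<br/>)|(?<=<br />)",   # line breaks
    r"(?<=</li>)",                         # list element tags
]


def split_for_translation(text, max_chars=MAX_CHARS, level=0):
    """Split text into pieces of at most max_chars where the markers allow."""
    if len(text) <= max_chars or level >= len(SPLIT_PATTERNS):
        # Short enough, or no markers left to try (piece may stay oversized).
        return [text]
    pieces = re.split(SPLIT_PATTERNS[level], text, flags=re.IGNORECASE)
    chunks = []
    for piece in pieces:
        if piece:
            # A piece that is still too long is split on the next marker.
            chunks.extend(split_for_translation(piece, max_chars, level + 1))
    return chunks


def translate_long_text(text, translate, max_chars=MAX_CHARS):
    """Translate each chunk and concatenate the results in order."""
    return "".join(translate(chunk)
                   for chunk in split_for_translation(text, max_chars))
```

Because the split points are zero-width, joining the chunks reproduces the original text exactly, so no markup is lost between requests. A piece with none of the markers is passed through unsplit, which a production version would need to handle separately.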


<cfargument name="format" type="string" required="false">




Sample API Request and Translations

Below is a sample request of a question and answer pair from one of our pieces of content. Our system sent the following to Google Translate; this example has no HTML markup:

https://www.googleapis.com/language/translate/v2?key={ourKey}&format=html&source=en&target=es&q=What will get decided in a divorce case?

And Google Translate returned:

{
  "data": {
    "translations": [
      {
        "translatedText": "¿Qué se decidió en un caso de divorcio?"
      }
    ]
  }
}

Our system sent the following partial answer, which contains Flash HTML markup, to Google Translate:

https://www.googleapis.com/language/translate/v2?key={ourKey}&format=html&source=en&target=es&q=

Divorce consists of four main areas:

  • Dissolution of Marriage: This is the legal term for divorce. The Court will end your marriage and all the legal benefits that are a part of your marriage.

And Google Translate returned:

{
  "data": {
    "translations": [
      {
        "translatedText": "\u003cFONT COLOR=\"#000000\" LETTERSPACING=\"0\" KERNING=\"1\"\u003e\u003cB\u003eDisolución del Matrimonio:\u003c/b\u003e \u003cFONT KERNING=\"0\"\u003eEste es el término legal para el divorcio. La Corte pondrá fin a su matrimonio y todos los beneficios legales que son parte de su matrimonio.\u003c/font\u003e\u003c/font\u003e \u003c/ LI\u003e"
      }
    ]
  }
}

Note that the returned text contains many backslashes (\) and codes that begin with \u00. JSON uses the backslash to escape certain characters, such as double quotation marks. The \u00 codes are Unicode escape sequences for characters such as < (\u003c) and > (\u003e), which are encoded when the text is exchanged between systems.
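These escapes disappear once the response is deserialized. The short Python sketch below uses an abbreviated version of the response above (shortened here for illustration, not an actual API response) to show a standard JSON parser turning the `\"` and `\u003c` sequences back into ordinary characters.

```python
import json

# Abbreviated response in the same shape as the sample above.
raw = r'{"data": {"translations": [{"translatedText": "\u003cFONT COLOR=\"#000000\"\u003e\u003cB\u003eDisolución del Matrimonio:\u003c/b\u003e\u003c/font\u003e"}]}}'

# json.loads resolves both the backslash escapes and the \u00XX codes.
text = json.loads(raw)["data"]["translations"][0]["translatedText"]
print(text)
# prints: <FONT COLOR="#000000"><B>Disolución del Matrimonio:</b></font>
```

No manual search-and-replace on the escape codes is needed, as long as the response is parsed as JSON before the translated text is stored.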

Content System Screenshots

A standard English article's Content Information Screen (Content ID 8609).


After our single-click translation to Spanish, the page is mirrored with the title, description, keywords, and content translated and all other elements kept in English. For the content, all HTML markup is preserved. An automatic equivalent page link is generated. In addition, all problem code and jurisdiction settings are also copied.


Organization profiles using Google Translate

Below is a screenshot of the screen that organization administrators use to create foreign language descriptions. The organization administrator can pick a language from the drop-down, and the description box is updated with the translation of their English description. If they are editing an existing description, that description displays instead of the Google-translated version of the English description. Following any update, an email notification is automatically generated and sent to ILAO's Spanish Content and Outreach Coordinator so that the translation can be reviewed.


For Further Information

For further information, please contact ILAO’s Director of Technology Development, Gwen Daniels, at [email protected] or ILAO’s Program Director, Teri Ross, at [email protected].
