Translating text with Azure cognitive services
Some time ago, I used the Bing Translator API to help create localization for some of our products. As Microsoft recently retired the Data Market used to provide this service, it was high time to migrate to the replacement Cognitive Services API hosted on Azure. This article covers the basics of using Azure cognitive services to translate text with simple HTTP requests.
Getting started
I'm going to assume you've already signed up for the Text Translation Cognitive Services API. If you haven't, you can find a step-by-step guide on the API documentation site. Just as with the original version, there's a free tier where you can translate 2 million characters per month.
Once you have created your API service, display the Keys page and copy one of the keys for use in your application (it doesn't matter which one you choose).
Remember that these keys should be kept secret. Don't paste them in screenshots as I have above (unless you regenerated the key after taking the screenshot!), don't commit them to public code repositories - treat them as any other password. "Keep it secret, keep it safe".
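One way to keep the key out of source code is to read it from an environment variable at startup. The following is only a minimal sketch: the ApiKeyProvider class and the TRANSLATOR_API_KEY variable name are my own assumptions for illustration, not something the service requires.

```csharp
using System;

// Hypothetical helper: loads the API key from the environment rather than
// embedding it in source code or configuration that might be committed.
public static class ApiKeyProvider
{
    // Assumed variable name; use whatever suits your deployment.
    public const string VariableName = "TRANSLATOR_API_KEY";

    public static string GetApiKey()
    {
        string key = Environment.GetEnvironmentVariable(VariableName);

        if (string.IsNullOrEmpty(key))
        {
            throw new InvalidOperationException("Environment variable " + VariableName + " is not set.");
        }

        return key;
    }
}
```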
Creating a login token
The first thing we need to do is generate an authentication token. We do this by sending a POST request to Microsoft's authentication API along with a custom Ocp-Apim-Subscription-Key header that contains the API key we copied earlier.
Note: When using the HttpWebRequest object, you must set ContentLength to zero even though we're not actually sending any body content. If the Content-Length header isn't present, the authentication server responds with 411 (Length Required), which HttpWebRequest surfaces as a WebException.
Assuming we have passed a valid API key, the response body will contain a token we can use with subsequent requests.
Tokens are only valid for 10 minutes and it is recommended that you renew them after 8 or so minutes. For this reason, I store the time at which the token should be renewed (8 minutes after it was issued) so that future requests can compare it against the current time and automatically renew the token if required.
private string _authorizationKey;
private string _authorizationToken;
private DateTime _timestampWhenTokenExpires;

private void RefreshToken()
{
  HttpWebRequest request;

  if (string.IsNullOrEmpty(_authorizationKey))
  {
    throw new InvalidOperationException("Authorization key not set.");
  }

  request = WebRequest.CreateHttp("https://api.cognitive.microsoft.com/sts/v1.0/issueToken");
  request.Method = WebRequestMethods.Http.Post;
  request.Headers.Add("Ocp-Apim-Subscription-Key", _authorizationKey);
  request.ContentLength = 0; // Must be set to avoid 411 response

  using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
  {
    _authorizationToken = this.GetResponseString(response);
    _timestampWhenTokenExpires = DateTime.UtcNow.AddMinutes(8);
  }
}
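The listings in this article call two helpers that aren't shown: GetResponseString and CheckToken. The following is a rough sketch of what they might look like. It reflects my assumptions about their shape rather than the code in the sample download, and the TokenHelpers class, IsRenewalDue and ReadAllText names are hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helpers; names and structure are assumptions for illustration.
public static class TokenHelpers
{
    // True when no token is held yet or the stored renewal time has passed.
    public static bool IsRenewalDue(string token, DateTime renewAtUtc, DateTime nowUtc)
    {
        return string.IsNullOrEmpty(token) || nowUtc >= renewAtUtc;
    }

    // Reads an entire stream as UTF-8 text.
    public static string ReadAllText(Stream stream)
    {
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Reads the body of a response as text, e.g. the token returned by issueToken.
    public static string GetResponseString(WebResponse response)
    {
        using (Stream stream = response.GetResponseStream())
        {
            return ReadAllText(stream);
        }
    }
}
```

With something like this in place, CheckToken would amount to calling RefreshToken whenever IsRenewalDue(_authorizationToken, _timestampWhenTokenExpires, DateTime.UtcNow) returns true.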
Using the token
For all subsequent requests in this article, we'll be sending the token with the request. This is done via the Authorization header, which needs to be set to the string Bearer <TOKEN>.
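As every remaining call needs this header, it could be set in one place. A small sketch, assuming the same HttpWebRequest approach used throughout this article; the RequestFactory class and CreateAuthorizedRequest method are hypothetical names.

```csharp
using System.Net;

// Hypothetical factory that applies the bearer token to each new request.
public static class RequestFactory
{
    public static HttpWebRequest CreateAuthorizedRequest(string url, string token)
    {
        HttpWebRequest request = WebRequest.CreateHttp(url);

        // The header value is the literal word "Bearer", a space, then the token
        request.Headers.Add("Authorization", "Bearer " + token);

        return request;
    }
}
```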
Getting available languages
The translation API can translate a reasonable range of languages (including, for some reason, Klingon), but it can't translate all languages. Therefore, if you're building a solution that uses the translation API, it's probably a good idea to find out which languages are available. This can be done by calling the GetLanguagesForTranslate service method.
Rather annoyingly, the translation API doesn't use straightforward JSON objects but instead the ancient XML serialization dialect (it appears to be a WCF service rather than the newer Web API), which seems an odd choice in this day and age of easily consumed JSON services. Still, at least it means I can create a self-contained example project without needing external packages.
First we create the HttpWebRequest object and assign our Authorization header. Next, we set the value of the Accept header to application/xml. The API call actually seems to ignore this header and always returns XML regardless, but at least if it changes in future to support multiple outputs, our existing code is explicit about what it wants.
The body of the response contains an XML document similar to the following:
<ArrayOfstring xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
  <string>af</string>
  <string>ar</string>
  <string>bn</string>
  <!-- SNIP -->
  <string>ur</string>
  <string>vi</string>
  <string>cy</string>
</ArrayOfstring>
You could parse it yourself, but I usually don't like the overhead of having to work with namespaced XML documents. Fortunately, I can just use the DataContractSerializer to parse it for me.
In order to use the DataContractSerializer class, you need to have a reference to System.Runtime.Serialization in your project.
public string[] GetLanguages()
{
  HttpWebRequest request;
  string[] results;

  this.CheckToken();

  request = WebRequest.CreateHttp("https://api.microsofttranslator.com/v2/http.svc/GetLanguagesForTranslate");
  request.Headers.Add("Authorization", "Bearer " + _authorizationToken);
  request.Accept = "application/xml";

  using (WebResponse response = request.GetResponse())
  {
    using (Stream stream = response.GetResponseStream())
    {
      results = ((List<string>)new DataContractSerializer(typeof(List<string>)).ReadObject(stream)).ToArray();
    }
  }

  return results;
}
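The deserialization step in the listing above can be tried without calling the service at all. The sketch below feeds a hand-written ArrayOfstring document to DataContractSerializer via a MemoryStream; the LanguageListParser class is a hypothetical name, and I'm deserializing to string[] directly rather than List<string>, which uses the same ArrayOfstring contract.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical helper for experimenting with the response format offline.
public static class LanguageListParser
{
    public static string[] Parse(string xml)
    {
        // string[] maps to the ArrayOfstring element in the Arrays namespace
        DataContractSerializer serializer = new DataContractSerializer(typeof(string[]));

        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (string[])serializer.ReadObject(stream);
        }
    }
}
```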
Getting language names
The previous section obtains a list of ISO language codes, but you would probably want to present something more friendly to end users. We can obtain localized language names via the GetLanguageNames method.
This time we need to perform a POST and include a custom body containing the language codes we wish to retrieve friendly names for, along with a query string argument that specifies which language to use for the friendly names.
The body should be XML similar to the following. This is identical to the output of the GetLanguagesForTranslate call above.
<ArrayOfstring xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
  <string>af</string>
  <string>ar</string>
  <string>bn</string>
  <!-- SNIP -->
  <string>ur</string>
  <string>vi</string>
  <string>cy</string>
</ArrayOfstring>
The response body will be a string array where each element contains the friendly language name of the matching element from the request body. The following example is a sample of the output when German (de) friendly names are requested.
<ArrayOfstring xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
  <string>Afrikaans</string>
  <string>Arabisch</string>
  <string>Bangla</string>
  <!-- SNIP -->
  <string>Urdu</string>
  <string>Vietnamesisch</string>
  <string>Walisisch</string>
</ArrayOfstring>
Previously we used the DataContractSerializer to deserialize the response body, and we can use the same class to serialize the request body too. We also have to specify the Content-Type of the data we're transmitting and, of course, make sure we include the locale query string argument in the posted URI.
If you forget to set the Content-Type header then, according to the documentation, you'd probably expect it to return 400 (Bad Request). Somewhat curiously, it returns 200 (OK) with a 500-esque HTML error message in the body. So don't forget to set the content type!
public string[] GetLocalizedLanguageNames(string locale, string[] languages)
{
  HttpWebRequest request;
  string[] results;
  DataContractSerializer serializer;

  this.CheckToken();

  serializer = new DataContractSerializer(typeof(string[]));

  request = WebRequest.CreateHttp("https://api.microsofttranslator.com/v2/http.svc/GetLanguageNames?locale=" + locale);
  request.Headers.Add("Authorization", "Bearer " + _authorizationToken);
  request.Accept = "application/xml";
  request.ContentType = "application/xml"; // must be set to avoid invalid 200 response
  request.Method = WebRequestMethods.Http.Post;

  using (Stream stream = request.GetRequestStream())
  {
    serializer.WriteObject(stream, languages);
  }

  using (WebResponse response = request.GetResponse())
  {
    using (Stream stream = response.GetResponseStream())
    {
      results = (string[])serializer.ReadObject(stream);
    }
  }

  return results;
}
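To see what WriteObject actually puts in the request body, the same serializer can write to a MemoryStream instead of the request stream. This is just an inspection aid, not part of the article's client; the LanguageListWriter class is a hypothetical name. The serializer may add extra namespace declarations to the root element, so the output won't necessarily match the hand-written sample character for character.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical helper that renders the request body as a string for inspection.
public static class LanguageListWriter
{
    public static string ToXml(string[] languages)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(string[]));

        using (MemoryStream stream = new MemoryStream())
        {
            // Same call the listing above makes against the request stream
            serializer.WriteObject(stream, languages);

            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```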
Translating phrases
The final piece of the puzzle is to actually translate a string. We can do this using the Translate service method, which is simple enough to use: you pass the text, source language and output language as query string parameters, and the translation is returned in the response body as an XML string.
You can also specify a category for the translation. I believe this is for use with Microsoft's Translator Hub, so I haven't yet experimented with this parameter.
The following example is the response returned when requesting a translation of Hello World! from English (en) to German (de).
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">Hallo Welt!</string>
The request is similar to the other examples in this article. The only point to note is that, as the text query string argument will contain user-entered content, I'm encoding it using Uri.EscapeDataString to account for any special characters.
public string Translate(string text, string from, string to)
{
  HttpWebRequest request;
  string result;
  string queryString;

  this.CheckToken();

  queryString = string.Concat("text=", Uri.EscapeDataString(text), "&from=", from, "&to=", to);

  request = WebRequest.CreateHttp("https://api.microsofttranslator.com/v2/http.svc/Translate?" + queryString);
  request.Headers.Add("Authorization", "Bearer " + _authorizationToken);
  request.Accept = "application/xml";

  using (WebResponse response = request.GetResponse())
  {
    using (Stream stream = response.GetResponseStream())
    {
      // _stringDataContractSerializer is a field not shown in this listing;
      // presumably a shared DataContractSerializer created for typeof(string)
      result = (string)_stringDataContractSerializer.ReadObject(stream);
    }
  }

  return result;
}
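The two non-network parts of this call, building the query string and reading the response, can be exercised in isolation. The sketch below does so; the TranslateHelpers class and its method names are hypothetical, and unlike the listing above it also escapes the from and to values, which is my own precaution rather than something the article does.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical helpers for testing the Translate call without a network round trip.
public static class TranslateHelpers
{
    private const string BaseUrl = "https://api.microsofttranslator.com/v2/http.svc/Translate";

    // Builds the request URI, percent-encoding each query string value.
    public static string BuildUrl(string text, string from, string to)
    {
        return string.Concat(BaseUrl,
            "?text=", Uri.EscapeDataString(text),
            "&from=", Uri.EscapeDataString(from),
            "&to=", Uri.EscapeDataString(to));
    }

    // Extracts the translated text from the <string> element in the response.
    public static string ParseResponse(string xml)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(string));

        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (string)serializer.ReadObject(stream);
        }
    }
}
```

Without the escaping, text containing an ampersand would be split into a separate (and meaningless) query string argument.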
Other API methods
The GetLanguagesForTranslate, GetLanguageNames and Translate API methods above cover the basics of using the translation services. The service API does offer additional functionality, such as the ability to translate multiple strings at once, to return multiple translations for a single string, or even to try to detect the language of a piece of text. These are for use in more advanced scenarios than what I'm currently interested in, so I haven't looked further into these methods.
Sample application
The code samples in this article are both overly verbose (lots of duplicate setup and processing code) and functionally lacking (no checking of status codes or handling of errors). The download sample accompanying this article includes a more robust TranslationClient class that can be easily used to add the basics of the translation APIs to your own applications.
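As a rough idea of what basic status checking might involve with HttpWebRequest: GetResponse throws a WebException for non-success status codes, and the exception's Response property carries the status when the server did reply. The sketch below turns that into a readable message. The ErrorHandling class and Describe method are hypothetical names, and this is not how the TranslationClient in the download is implemented. Note that it wouldn't catch the 200 (OK) response with an HTML error body described earlier.

```csharp
using System;
using System.Net;

// Hypothetical helper that summarises a failed request.
public static class ErrorHandling
{
    public static string Describe(WebException ex)
    {
        HttpWebResponse response = ex.Response as HttpWebResponse;

        if (response == null)
        {
            // No HTTP response at all, e.g. DNS or connection failure
            return "No HTTP response (" + ex.Status + ")";
        }

        return "HTTP " + (int)response.StatusCode + " (" + response.StatusDescription + ")";
    }
}
```

A caller would wrap request.GetResponse() in a try block and pass any caught WebException to Describe.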
Note that unlike most of my other articles/samples, this one won't run out of the box. The keys seen in the application and screenshots have been revoked, and you'll need to substitute the ones you received when you created your service using the Azure Portal.
Update History
- 2017-05-05 - First published
- 2020-11-22 - Updated formatting
Downloads
Filename | Description | Version | Release Date
---|---|---|---
AzureTranslationDemo.zip | Sample project for the Translating text with Azure cognitive services blog post. | | 05/05/2017