But unfortunately that's not what happens in the real world. Instead of nice composable services you usually get web sites targeted at people not machines. That means you need to automate what would typically be browser conversations with various websites to get the data you need. And then you need to deal with the format of the resulting data. If you're lucky, the data may be structured in a comma separated value (CSV) text file. But undoubtedly you'll have to parse unstructured text representations of a report or get the data out of an excel spreadsheet or even a PDF. It's not pretty.
But lets forget that unpleasantness for the moment and deal with the first problem you'll encounter in trying to automate a browser conversation, getting by the various authentication mechanisms. Let's look at BASIC authentication first. It's pretty common and well supported in the Java-based Jakarta Commons HttpClient library and the Ruby Net::HTTP Standard Library.
BASIC Authentication
To start things off I created a simple Java-based "Dynamic Web Project" using the Eclipse Web Tools Project (WTP) plugins. To keep things simple I created a servlet that returns a string. Then I configured the application to protect the url for that servlet with basic authentication in web.xml. Then in Tomcat's server.xml I modified the context element for my web app to include a reference to the default Tomcat in-memory user database:
<Context docBase="MyWebApp" path="/MyWebApp" [...]With that in place, and the server running we're able to write a couple of methods to access the servlet. In Java:
<Realm className="org.apache.catalina.realm.UserDatabaseRealm"
debug="0" resourceName="UserDatabase"/>
</Context>
public static void basicAuthDemo()In this example I limited HttpClient's default authentication mechanism to BASIC. I know what my target system uses so why complicate matters with DIGEST or NTLM? Then it was a simple matter of defining the credentials and telling HttpClient to automatically use them and then executing the Http GET method. It looks surprisingly similar in Ruby:
throws HttpException, IOException{
HttpClient client = new HttpClient();
List<String> authPrefs = new ArrayList<String>();
authPrefs.add(AuthPolicy.BASIC);
client.getParams().setParameter(
AuthPolicy.AUTH_SCHEME_PRIORITY, authPrefs);
client.getState().setCredentials(
new AuthScope("localhost", 8080, "localhost:8080"),
new UsernamePasswordCredentials("tomcat", "tomcat")
);
GetMethod get = new GetMethod(
"http://localhost:8080/MyWebApp/myservlet");
get.setDoAuthentication(true);
client.executeMethod(get);
System.out.println(get.getResponseBodyAsString());
get.releaseConnection();
}
def basic_auth_demoThe difference between the two implementations is that the Java HttpClient is doing some housekeeping for you. You define a scope for your authentication and as long as you GetMethod is configured to do authentication it will automatically pick up any necessary credentials from the HttpClient instance. In Ruby you need to set the credentials on the GetMethod explicitly.
url = URI.parse('http://localhost:8080/MyWebApp/myservlet')
get = Net::HTTP::Get.new(url.path)
get.basic_auth('tomcat','tomcat')
response = Net::HTTP.new(url.host, url.port).start do |http|
http.request(get)
end
puts response.body
end
FORM based authentication
The next most common method is form-based authentication. When you make a request for a web resource, the response contains a cookie that identifies your session on the server. If that session indicates that you haven't been authenticated yet, then you're redirected to a form to enter your id and password. You fill in the values and then submit the form. Now assuming you entered the right credentials your session on the server will indicate that you're authenticated and every subsequent request (which includes the cookie to identify your now authenticated session) will execute normally. There are variations on this theme that may add more than one cookie so just be sure to capture the cookies and continue to submit them on every request in your conversation.
In order to test this I modified the web.xml file for my Java web app:
<login-config>and added the requisite JSP pages. The login.jsp contains a form that looks like this:
<auth-method>FORM</auth-method>
<form-login-config>
<form-login-page>/login.jsp</form-login-page>
<form-error-page>/login-error.jsp</form-error-page>
</form-login-config>
</login-config>
<form method="POST" action="j_security_check">So the Java code to access the servlet using form based authentication looks like this:
Username:<input type="text" name="j_username"><br/>
Password:<input type="password" name="j_password"><br/>
<input type=submit value="Login">
</form>
public static void formAuthDemo()The Ruby code looks like this:
throws IOException, HttpException {
HttpClient client = new HttpClient();
// make the initial get to get the JSESSION cookie
GetMethod get = new GetMethod(
"http://localhost:8080/MyWebApp/myservlet");
client.executeMethod(get);
get.releaseConnection();
// authorize
PostMethod post = new PostMethod(
"http://localhost:8080/MyWebApp/j_security_check");
NameValuePair[] data = {
new NameValuePair("j_username", "tomcat"),
new NameValuePair("j_password", "tomcat")
};
post.setRequestBody(data);
client.executeMethod(post);
post.releaseConnection();
//resubmit the original request
client.executeMethod(get);
String response = get.getResponseBodyAsString();
get.releaseConnection();
System.out.println(response);
}
def form_auth_demoAgain, the two implementations are remarkably similar. The biggest difference is that the Java HttpClient library is again doing the housekeeping, by tracking and automatically resubmitting the cookies for you. In the Ruby code you have to fetch the cookie yourself from the response header and set the HTTP header for all future requests.
res = Net::HTTP.new('localhost', 8080).start do |http|
#make the initial get to get the JSESSION cookie
get = Net::HTTP::Get.new('/MyWebApp/myservlet')
response = http.request(get)
cookie = response.response['set-cookie'].split(';')[0]
#authorize
post = Net::HTTP::Post.new('/MyWebApp/j_security_check')
post.set_form_data({'j_username'=>'tomcat', 'j_password'=>'tomcat'})
post['Cookie'] = cookie
http.request(post)
#resubmit the original request
get['Cookie'] = cookie
response = http.request(get)
puts response.body
end
end
So there you go, you're past the website authentication and are ready to make whatever requests you need to get the data you require.
5 comments:
You can find more information on using Ruby for HTTP-Requests here:
http://www.juretta.com/log/2006/08/13/ruby_net_http_and_open-uri/
Thanks. I was able to extract what I needed to get my specific cookie problem solved.
hi,
Im stuck with a prob. I want to download some data through yahoo. It says that data will be downloaded only if u r logged in to yahoo acnt. How can we do this using a java prog.
I ve written the code for downloading. But how to show them that I am logged in through an account??
Can you please help me. please mail me at saurabh.chawdhary[AT]gmail.com`
Thank you for posting this article, it helped me complete at task at my job.
How would you do form based authentication from a Cocoa/Obj-C based client?
Post a Comment