A URL (Uniform Resource Locator), sometimes informally called a web address, identifies a resource on the Internet, such as a web page or an FTP location.
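To make the definition concrete, here is a minimal sketch that splits a URL into its parts without touching the network. The class name `UrlParts` and the helper `parts` are made up for illustration; `java.net.URI` is used only for parsing.

```java
import java.net.URI;

public class UrlParts {
    // Hypothetical helper: returns { scheme, host, path } for a URL string.
    static String[] parts(String spec) {
        URI uri = URI.create(spec);
        return new String[] { uri.getScheme(), uri.getHost(), uri.getPath() };
    }

    public static void main(String[] args) {
        String[] p = parts("http://www.smallcpp.cn/archives.html");
        System.out.println("scheme = " + p[0]); // http
        System.out.println("host   = " + p[1]); // www.smallcpp.cn
        System.out.println("path   = " + p[2]); // /archives.html
    }
}
```

The scheme says how to reach the resource, the host says where it lives, and the path names the resource on that host.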
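import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
// Open a connection to the page and read the whole response body into memory.
URL url = new URL("http://www.smallcpp.cn/archives.html");
URLConnection uc = url.openConnection();
InputStream is = uc.getInputStream();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] buffer = new byte[1024 * 8];
int len = 0;
// read() returns -1 once the end of the stream is reached.
while ((len = is.read(buffer)) != -1) {
    bos.write(buffer, 0, len);
}
// toString() decodes with the platform default charset; pass the page's charset explicitly if it differs.
System.out.println(bos.toString());
bos.close();
is.close();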
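If the response is uncompressed text, such as a page whose Content-Type is text/html, it can be printed directly. If the page was gzip-compressed (Content-Encoding: gzip), it has to be decompressed first.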
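import java.io.*;
import java.net.*;
import java.util.zip.GZIPInputStream;
String url = "http://www.baidu.com";
URL cumtURL = new URL(url);
HttpURLConnection cumtConnection = (HttpURLConnection) cumtURL.openConnection();
cumtConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13");
// Ask for gzip explicitly; otherwise the server may send an uncompressed body.
cumtConnection.setRequestProperty("Accept-Encoding", "gzip");
String encoding = cumtConnection.getContentEncoding();
System.out.println(encoding);
// Decompress only when the server actually used gzip; GZIPInputStream throws on a non-gzip body.
InputStream urlStream = cumtConnection.getInputStream();
if ("gzip".equalsIgnoreCase(encoding)) {
    urlStream = new GZIPInputStream(urlStream);
}
// The charset must match the page's actual encoding; "gb2312" is the value used in this example.
BufferedReader reader = new BufferedReader(new InputStreamReader(urlStream, "gb2312"));
String line = "";
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();
urlStream.close();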
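The gzip handling above can also be tried without a network connection. The sketch below is only an illustration under stated assumptions: the class `GzipBody` and its helpers `decode`, `readAll`, `gzip`, and `roundTrip` are hypothetical names, and the "server response" is simulated by compressing a string locally. The point it shows is that the stream is wrapped in `GZIPInputStream` only when the Content-Encoding value says gzip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBody {
    // Wraps the raw stream only when the Content-Encoding header value is gzip.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads a stream to the end with the same buffer loop used earlier.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buffer = new byte[8 * 1024];
        int len;
        while ((len = in.read(buffer)) != -1) {
            bos.write(buffer, 0, len);
        }
        return bos.toByteArray();
    }

    // Compresses bytes locally to stand in for a gzip-encoded response body.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Simulates a response with the given Content-Encoding and returns the decoded text.
    static String roundTrip(String text, String contentEncoding) {
        try {
            byte[] body = text.getBytes(StandardCharsets.UTF_8);
            if ("gzip".equalsIgnoreCase(contentEncoding)) {
                body = gzip(body);
            }
            try (InputStream in = decode(new ByteArrayInputStream(body), contentEncoding)) {
                return new String(readAll(in), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<html>hello</html>", "gzip"));
        System.out.println(roundTrip("<html>hello</html>", null));
    }
}
```

Both calls in `main` print the same text, because the compressed body is decompressed and the uncompressed body is passed through untouched.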
// Ask the server for only the bytes from startIndex to endIndex (both inclusive).
conn.setRequestProperty("Range", "bytes=" + startIndex + "-" + endIndex);
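One common use of the Range header is splitting a download into several blocks, each fetched with its own request. The following sketch only computes the header values and does not open any connection. The class `RangeSplit`, the methods `rangeHeader` and `split`, and the choice to give the last block the remainder are assumptions made for illustration; it also assumes `totalLength` is at least `parts`.

```java
public class RangeSplit {
    // Builds the Range header value; both indexes are inclusive byte offsets.
    static String rangeHeader(long startIndex, long endIndex) {
        return "bytes=" + startIndex + "-" + endIndex;
    }

    // Splits totalLength bytes into `parts` contiguous blocks.
    // The last block absorbs any remainder (assumes totalLength >= parts).
    static String[] split(long totalLength, int parts) {
        String[] headers = new String[parts];
        long blockSize = totalLength / parts;
        for (int i = 0; i < parts; i++) {
            long startIndex = i * blockSize;
            long endIndex = (i == parts - 1) ? totalLength - 1 : (i + 1) * blockSize - 1;
            headers[i] = rangeHeader(startIndex, endIndex);
        }
        return headers;
    }

    public static void main(String[] args) {
        // A 10-byte resource split into 3 blocks: bytes=0-2, bytes=3-5, bytes=6-9
        for (String header : split(10, 3)) {
            System.out.println(header);
        }
    }
}
```

A server that honors the header answers with status 206 (Partial Content) and only the requested bytes; a server that ignores it returns the whole body with status 200, so the status code is worth checking before writing a block to disk.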