
Building a HTTP Proxy

Editor: rootadmin

Frequently Asked Questions
Download the testing script
Download the sample Makefile

Contents

1 Building a HTTP Proxy
  1.1 Overview
  1.2 Introduction: The Hypertext Transfer Protocol
    1.2.1 HTTP Proxies
  1.3 Assignment Details
    1.3.1 The Basics
    1.3.2 Listening
    1.3.3 Parsing the URL
    1.3.4 Getting Data from the Remote Server
    1.3.5 Returning Data to the Client
    1.3.6 Testing Your Proxy
  1.4 Configuring a Web Browser to Use a Proxy
    1.4.1 A Caveat
    1.4.2 Firefox
      1.4.2.1 Configuring Firefox to use HTTP/1.0
    1.4.3 Internet Explorer
  1.5 Socket Programming
  1.6 Grading
    1.6.1 A Note on Network Programming
    1.6.2 Possible Extensions
      1.6.2.1 Content Transformation
      1.6.2.2 Caching
      1.6.2.3 Link Prefetch
      1.6.2.4 Other Possible Extensions

Overview

In this assignment, you will implement a simple web proxy that passes requests and data between a web client and a web server. This will give you a chance to get to know one of the most popular application protocols on the Internet, the Hypertext Transfer Protocol (HTTP) version 1.0, and give you an introduction to the Berkeley sockets API. When you're done with the assignment, you should be able to configure your web browser to use your personal proxy server as a web proxy.

Introduction: The Hypertext Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is the protocol used for communication on the web. That is, it is the protocol that defines how your web browser requests resources from a web server and how the server responds. For simplicity, in this assignment we will be dealing only with version 1.0 of the HTTP protocol, defined in detail in RFC 1945. You should read through this RFC and refer back to it when deciding on the behavior of your proxy.

HTTP communications happen in the form of transactions. A transaction consists of a client sending a request to a server and then reading the response. Request and response messages share a common basic format:

  • An initial line (a request or response line, as defined below)
  • Zero or more header lines
  • A blank line (CRLF)
  • An optional message body

For most common HTTP transactions, the protocol boils down to a relatively simple series of steps (important sections of RFC 1945 are in parentheses):

1. A client creates a connection to the server.
2. The client issues a request by sending a line of text to the server. This request line consists of an HTTP method (most often GET, but POST, PUT, and others are possible), a request URI (like a URL), and the protocol version that the client wants to use (HTTP/1.0). The message body of the initial request is typically empty. (5.1-5.2, 8.1-8.3, 10, D.1)
3. The server sends a response message, with its initial line consisting of a status line, indicating if the request was successful. The status line consists of the HTTP version (HTTP/1.0), a response status code (a numerical value that indicates whether or not the request was completed successfully), and a reason phrase, an English-language message providing a description of the status code. Just as with the request message, there can be as many or as few header fields in the response as the server wants to return. Following the CRLF field separator, the message body contains the data requested by the client in the event of a successful request. (6.1-6.2, 9.1-9.5, 10)
4. Once the server has returned the response to the client, it closes the connection.
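The request line in step 2 has a fixed three-part shape, so it can be picked apart with a single sscanf call. The following is only a sketch: the struct request_line type, the parse_request_line name, and the buffer sizes are assumptions made for illustration, not part of the assignment.

```c
#include <stdio.h>

/* Hypothetical container for the three parts of a request line. */
struct request_line {
    char method[16];
    char uri[2048];
    int  major;   /* the "1" in HTTP/1.0 */
    int  minor;   /* the "0" in HTTP/1.0 */
};

/* Split "METHOD URI HTTP/x.y" into its parts.
 * Returns 0 on success, -1 if the line does not have that shape. */
int parse_request_line(const char *line, struct request_line *out)
{
    char extra;
    /* The trailing " %c" skips the line terminator and only matches if
     * something unexpected follows the version, so a well-formed line
     * yields exactly four conversions. */
    int n = sscanf(line, "%15s %2047s HTTP/%d.%d %c",
                   out->method, out->uri, &out->major, &out->minor, &extra);
    return n == 4 ? 0 : -1;
}
```

Note that this sketch rejects a version-less HTTP/0.9 style request such as "GET /"; whether your proxy should accept those is a design decision to make against the RFC.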

It's fairly easy to see this process in action without using a web browser. From a Unix prompt, type:

telnet www.yahoo.com 80

This opens a TCP connection to the server at www.yahoo.com listening on port 80, the default HTTP port. You should see something like this:

Trying 209.131.36.158...
Connected to www.yahoo.com (209.131.36.158).
Escape character is '^]'.

Type the following:

GET / HTTP/1.0

and hit enter twice. You should see something like the following:

HTTP/1.1 200 OK
Date: Fri, 10 Nov 2006 20:31:19 GMT
Connection: close
Content-Type: text/html; charset=utf-8

<html><head><title>Yahoo!</title>
(More HTML follows)

There may be some additional pieces of header information as well, such as cookie settings or instructions to the browser or proxy on caching behavior. What you are seeing is exactly what your web browser sees when it goes to the Yahoo home page: the HTTP status line, the header fields, and finally the HTTP message body, consisting of the HTML that your browser interprets to create a web page.
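A program reading such a response usually needs two things from it: the status code on the first line and the point where the headers end and the body begins. A minimal sketch follows, assuming the response has been read into a NUL-terminated buffer; the function names parse_status_code and find_body are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Pull the numeric status code out of a status line such as
 * "HTTP/1.1 200 OK". Returns the code, or -1 if the line is malformed. */
int parse_status_code(const char *response)
{
    int major, minor, code;

    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    if (code < 100 || code > 999)
        return -1;
    return code;
}

/* Return a pointer to the first byte of the message body, i.e. the byte
 * after the blank line that ends the headers, or NULL if the blank line
 * has not been received yet. */
const char *find_body(const char *response)
{
    const char *sep = strstr(response, "\r\n\r\n");
    return sep != NULL ? sep + 4 : NULL;
}
```

Because strstr stops at the first NUL byte, this only works for locating the header boundary; the body itself may be binary data and should be handled by length, not as a C string.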

HTTP Proxies

Ordinarily, HTTP is a client-server protocol. The client (usually your web browser) communicates directly with the server (the web server software). However, in some circumstances it may be useful to introduce an intermediate entity called a proxy. Conceptually, the proxy sits between the client and the server. In the simplest case, instead of sending requests directly to the server, the client sends all its requests to the proxy. The proxy then opens a connection to the server and passes on the client's request. The proxy receives the reply from the server and then sends that reply back to the client. Notice that the proxy is essentially acting as both an HTTP client (to the remote server) and an HTTP server (to the initial client).

Why use a proxy? There are a few possible reasons:

  • Performance: By saving a copy of the pages that it fetches, a proxy can reduce the need to create connections to remote servers. This can reduce the overall delay involved in retrieving a page, particularly if a server is remote or under heavy load.
  • Content Filtering and Transformation: While in the simplest case the proxy merely fetches a resource without inspecting it, there is nothing that says that a proxy is limited to blindly fetching and serving files. The proxy can inspect the requested URL and selectively block access to certain domains, reformat web pages (for instance, by stripping out images to make a page easier to display on a handheld or other limited-resource client), or perform other transformations and filtering.
  • Privacy: Normally, web servers log all incoming requests for resources. This information typically includes at least the IP address of the client, the browser or other client program that they are using (called the User-Agent), the date and time, and the requested file. If a client does not wish to have this personally identifiable information recorded, routing HTTP requests through a proxy is one solution. All requests coming from clients using the same proxy appear to come from the IP address and User-Agent of the proxy itself, rather than the individual clients. If a number of clients use the same proxy (say, an entire business or university), it becomes much harder to link a particular HTTP transaction to a single computer or individual.

Links:

RFC 1945 The Hypertext Transfer Protocol, version 1.0

Assignment Details

The Basics

Your first task is to build a basic web proxy capable of accepting HTTP requests, making requests from remote servers, and returning data to a client.

This assignment can be completed in either ANSI C or C++. It should compile and run without errors on the FC 010 cluster, producing a binary called proxy that takes as its first argument the port to listen on. Don't use a hard-coded port number.

You shouldn't assume that your server will be running on a particular IP address, or that clients will be coming from a pre-determined IP.

Listening

When your proxy starts, the first thing that it will need to do is establish a socket connection that it can use to listen for incoming connections. Your proxy should listen on the port specified from the command line, and wait for incoming client connections.
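The socket, bind, and listen calls needed for this step can be sketched as follows. This is one possible arrangement under stated assumptions, not the required design: the open_listener name and the backlog of 16 are arbitrary choices made for this example, and the SO_REUSEADDR option is an optional convenience.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on every local
 * address at the given port. Returns the descriptor, or -1 on error. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int yes = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Allow the proxy to be restarted on the same port right away. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* no fixed local IP */
    addr.sin_port = htons(port);               /* network byte order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In a full program the port would come from the first command-line argument, and the returned descriptor would be passed to accept in a loop, one client connection per iteration.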

Once a client has connected, the proxy should read data from the client and then check for a properly formatted HTTP request. An invalid request from the client should be answered with an appropriate error code.

Parsing the URL

Once the proxy sees a valid HTTP request, it will need to parse the requested URL. The proxy needs at most three pieces of information: the requested host, the port, and the requested path. See the url(7) manual page for more info.

Getting Data from the Remote Server
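Connecting to the remote server starts from the host, port, and path contained in the requested URL. A minimal sketch of that split is shown here; the struct url_parts type, the parse_url name, and the buffer sizes are assumptions for illustration, and the sketch handles only lowercase "http://" URLs without user information or IPv6 literals.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the pieces of an absolute http URL. */
struct url_parts {
    char host[256];
    int  port;
    char path[2048];
};

/* Split "http://host[:port][/path]" into host, port and path.
 * Returns 0 on success, -1 if the URL is not usable. */
int parse_url(const char *url, struct url_parts *out)
{
    const char *p, *host_end, *colon;
    size_t host_len;

    if (strncmp(url, "http://", 7) != 0)
        return -1;
    p = url + 7;

    /* host[:port] runs up to the first '/' or the end of the string. */
    host_end = strchr(p, '/');
    if (host_end == NULL)
        host_end = p + strlen(p);

    colon = memchr(p, ':', (size_t)(host_end - p));
    host_len = (size_t)((colon != NULL ? colon : host_end) - p);
    if (host_len == 0 || host_len >= sizeof out->host)
        return -1;
    memcpy(out->host, p, host_len);
    out->host[host_len] = '\0';

    /* Default to port 80 when the URL does not name one. */
    out->port = 80;
    if (colon != NULL) {
        out->port = atoi(colon + 1);
        if (out->port <= 0 || out->port > 65535)
            return -1;
    }

    /* A URL with no path refers to the root resource. */
    if (*host_end == '\0')
        strcpy(out->path, "/");
    else if (strlen(host_end) < sizeof out->path)
        strcpy(out->path, host_end);
    else
        return -1;
    return 0;
}
```

For example, "http://www.google.com" yields the host www.google.com, port 80, and path "/".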

Once the proxy has parsed the URL, it can make a connection to the requested host (using the appropriate remote port, or the default of 80 if none is specified) and send an HTTP request for the appropriate file. The proxy then sends the HTTP request that it received from the client to the remote server.

Returning Data to the Client
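With HTTP/1.0 the server marks the end of its response by closing the connection, so moving the response along can be as simple as copying bytes from one descriptor to the other until read returns 0. The sketch below assumes blocking descriptors; write_all and relay are hypothetical helper names, not functions the assignment provides.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything readable from from_fd to to_fd until end of file.
 * Returns the number of bytes copied, or -1 on error. */
long relay(int from_fd, int to_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(from_fd, buf, sizeof buf);
        if (n == 0)
            return total;      /* peer closed: the response is complete */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(to_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

The loop never treats the data as a string, so binary content such as images passes through unchanged.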

After the response from the remote server is received, the proxy should send the response message to the client via the appropriate socket. Once the transaction is complete, the proxy should close the connection.

Testing Your Proxy

Run your proxy with the following command:

./proxy <port>, where port is the port number that the proxy should listen on. As a basic test of functionality, try requesting a page using telnet:

telnet localhost <port>
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
GET http://www.google.com HTTP/1.0

If your proxy is working correctly, the headers and HTML of the Google homepage should be displayed on your terminal screen.

For a slightly more complex test, you can configure your web browser to use your proxy server as its web proxy. See the section below for details.

Configuring a Web Browser to Use a Proxy

A Caveat

If you write a single-threaded proxy server, you will probably see some problems when you use your proxy with a standard web browser. Because a web browser like Firefox or IE issues multiple HTTP requests for each URL you request (for instance, to download images and other embedded content), a single-threaded proxy will likely miss some requests, resulting in missing images or other minor errors. That's OK. You are not required to use threading in this assignment. As long as your proxy works correctly for a simple HTML document (like, for instance, this assignment page) and follows the RFC, you can still receive all the points for this assignment.

Firefox

Version 2.0:

1. Select Tools->Options from the menu.
2. Click on the 'Advanced' icon in the Options dialog.
3. Select the 'Network' tab, and click on 'Settings' in the 'Connections' area.
4. Select 'Manual Proxy Configuration' from the options available. In the boxes, enter the hostname and port where your proxy program is running.

Earlier Versions:

Select Edit->Preferences from the menu. On the 'General' tab, click 'Connection Settings'. Select 'Manual Proxy Configuration' and enter the hostname and port where your proxy is running.

To stop using the proxy server, select 'Direct connection to the Internet' in the connection settings dialog.

Configuring Firefox to use HTTP/1.0

Because Firefox defaults to using HTTP/1.1 and your proxy speaks HTTP/1.0, there are a couple of minor changes that need to be made to Firefox's configuration. Fortunately, Firefox is smart enough to know when it is connecting through a proxy, and has a few special configuration keys that can be used to tweak the browser's behavior.

1. Type 'about:config' in the address bar.
2. In the search/filter bar, type 'network.http.proxy'.
3. You should see three keys: network.http.proxy.keepalive, network.http.proxy.pipelining, and network.http.proxy.version.
4. Set keepalive to false.
5. Set version to 1.0.
6. Make sure that pipelining is set to false.

Internet Explorer

Take a look at this page for complete instructions on enabling a proxy for various versions of Internet Explorer.

You should also do the following to make Internet Explorer work in an HTTP 1.0 compatible mode with your proxy:

1. Under Internet Options, select the 'Advanced' tab.
2. Scroll down to HTTP 1.1 Settings.
3. Uncheck 'Use HTTP 1.1 through proxy connections'.

Socket Programming

In order to build your proxy you will need to learn and become comfortable programming sockets. The Berkeley sockets library is the standard method of creating network systems on Unix. There are a number of functions that you will need to use for this assignment:

Parsing addresses:

  • inet_addr: Convert a dotted quad IP address (such as 36.56.0.150) into a 32-bit address.
  • gethostbyname: Convert a hostname (such as argus.stanford.edu) into a 32-bit address.
  • getservbyname: Find the port number associated with a particular service, such as FTP.

Setting up a connection:

  • socket: Get a descriptor to a socket of the given type.
  • connect: Connect to a peer on a given socket.
  • getsockname: Get the local address of a socket.

Creating a server socket:

  • bind: Assign an address to a socket.
  • listen: Tell a socket to listen for incoming connections.
  • accept: Accept an incoming connection.

Communicating over the connection:

  • read / write: Read and write data to a socket descriptor.
  • htons, htonl / ntohs, ntohl: Convert between host and network byte orders (and vice versa) for 16-bit and 32-bit values.
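Several of these functions meet when an address structure is filled in for connect or bind. The sketch below shows the byte-order handling for an IPv4 address; make_ipv4_addr is a hypothetical helper name, and INADDR_NONE is the error value that inet_addr returns on common Unix systems.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in an IPv4 socket address from a dotted quad
 * string and a port given in host byte order.
 * Returns 0 on success, -1 if the string is not a valid dotted quad. */
int make_ipv4_addr(const char *dotted_quad, unsigned short port,
                   struct sockaddr_in *out)
{
    in_addr_t ip = inet_addr(dotted_quad);

    /* inet_addr cannot distinguish an error from 255.255.255.255. */
    if (ip == INADDR_NONE)
        return -1;

    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_addr.s_addr = ip;     /* already in network byte order */
    out->sin_port = htons(port);   /* must be converted explicitly */
    return 0;
}
```

Calling make_ipv4_addr("36.56.0.150", 80, &addr) would prepare addr for a connect call to port 80 on that host.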

You can find the details of these functions in the Unix man pages (most of them are in section 2) and in the Stevens Unix Network Programming book, particularly chapters 3 and 4. Other sections you may want to browse include the client-server example system in chapter 5 (you will need to write both client and server code for this assignment) and the name and address conversion functions in chapter 9.

Links:

Guide to Network Programming Using Sockets
An Introduction to Sockets Programming
HTTP Made Really Easy: A Practical Guide to Writing Clients and Servers

Grading

You should submit your completed proxy by the date posted on the course website to iavramop at princeton dot edu. You will need to submit a tarball file containing the following:

  • All of the source code for your proxy
  • A Makefile that builds your proxy
  • A README file describing your code and the design decisions that you made, as described in the project guidelines page

If you don't know how to create a tarball (tar archive), take a look at the sample Makefile at the top of the page, or run man tar.

Your proxy will be graded out of ten points, with the following criteria:

  • Your assignment should create a binary named proxy that will compile and run on the FC 010 cluster. The first command line argument should be the port that the proxy will listen on.
  • Your proxy should run silently: any status messages or diagnostic output should be off by default.
  • You can complete the assignment in either ANSI C or C++.
  • Your proxy should work with both Firefox 2.0 and Internet Explorer 6.
  • We'll first check that your proxy works correctly with a small number of major web pages, using the same script that we've given you to test your proxy. If your proxy passes all of these 'public' tests, you will get 7 of the possible points.
  • We'll then check a number of additional URLs and transactions that you will not know in advance. If your proxy passes all of these tests, you get two additional points. These tests will check the overall robustness of your proxy, and how you handle certain edge cases. This may include sending your proxy incorrectly formed HTTP requests, large transfers, less common HTTP methods, etc.
  • Well written (good abstraction, error checking, readability) and well commented code will get one additional point, for a total of 10.
  • The first student to submit a proxy that scores a perfect 10 will win a prize!

There will also be some sort of prize for the best extension to the proxy. Adding an extension will not change your grade. Take a look below for some hints about possible extensions that you can add to the proxy.

As mentioned above, you are not required to implement a multi-threaded proxy for this assignment. If you write a single-threaded proxy, you may see errors when using your proxy with a standard web browser, but that's OK. As long as your proxy works correctly for single HTTP transactions (for instance, try telnetting to the port the proxy is running on and requesting a single HTML document) you can still receive all the possible points for this assignment.

A Note on Network Programming

Writing code that will interact with other programs on the Internet is a little different than just writing something for your own use. The general guideline often given for network programs is: be lenient about what you accept, but strict about what you send. That is, even if a client doesn't do exactly the right thing, you should make a best effort to process their request if it is possible to easily figure out their intent. On the other hand, you should ensure that anything that you send out conforms to the published protocols as closely as possible. If an incoming request has a single field out of whack (such as sending you a request using HTTP 0.9 or 1.1), uses non-standard line terminators (some clients only send \r instead of the standard \r\n), or does something you don't quite expect with HTTP headers, you should still handle the request rather than dropping the request. Pay attention to parts of the RFC that specify areas where not all clients may conform exactly to what you expect. We'll be looking for this kind of interoperability in both the second round of tests that we run and in the style portion of your grade.
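One small instance of being lenient on input is accepting any of the line terminators mentioned above. A minimal sketch, assuming each request or header line has already been read into its own NUL-terminated buffer; chomp_line is a name made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Strip one trailing line terminator from line, whether it is the
 * standard "\r\n" or a bare "\n" or "\r". Returns the new length. */
size_t chomp_line(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        len--;
    if (len > 0 && line[len - 1] == '\r')
        len--;
    line[len] = '\0';
    return len;
}
```

Whatever the client sent, the lines your proxy writes to the remote server should still end in the standard \r\n.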

When in doubt, try to follow the behavior specified in RFC 1945. Also, check the FAQ for more specific guidelines.

Possible Extensions

While it may not be obvious at first, proxies are very flexible tools that can serve a number of different purposes on the web. Common uses for proxies include giving performance boosts to dial-up users (through caching and pre-fetching), privacy protection (through anonymous proxies), content filtering and blocking (used in many "NetNanny"-type applications), and content transformation.

Sample Proxy Applications:

  • Anonymizer - A privacy protection/anonymous browsing service.
  • Foxy - A filtering web proxy.
  • Google Web Accelerator - The latest of a number of 'accelerators'.

You can implement any of the following extensions (or some other extension that you've created yourself) as part of the contest we'll be running along with this assignment.

Content Transformation

Content transformation is the process of a proxy inserting, removing, or changing the contents of a resource requested from a remote server. After the resource has been retrieved from the server, the proxy is free to do whatever it would like to the content. Since the data returned from a web server is usually just text, this means that we can change the page almost any way we want: add or remove dirty words, change the text to Pig-Latin, rotate the images on the page 90 degrees, etc.

Caching

Caching is one of the most common performance enhancements that web proxies implement. Caching takes advantage of the fact that most pages on the web don't change that often, and that any page that you visit once you (or someone else using the same proxy) are likely to visit again. A caching proxy server saves a copy of the files that it retrieves from remote servers. When another request comes in for the same resource, it returns the saved (or cached) copy instead of creating a new connection to a remote server. This saves a modest amount of time and CPU if the remote server is nearby and lightly trafficked, but can create more significant savings in the case of a more distant server or a remote server that is overloaded (it can also help reduce the load on heavily trafficked servers).

Caching introduces a few new complexities as well. First of all, a great deal of web content is dynamically generated, and as such shouldn't really be cached. Second, we need to decide how long to keep pages around in our cache. If the timeout is set too short, we negate most of the advantages of having a caching proxy. If the timeout is set too long, the client may end up looking at pages that are outdated or irrelevant.

There are a few steps to implementing caching behavior for your web proxy:

1. First, alter your proxy so that you can specify a timeout value (probably in seconds) on the command line.
2. Second, you'll need to alter how your proxy retrieves pages. It should now check to see if a page exists in the cache before retrieving a page from a remote server. If there is a valid cached copy of the page, that should be presented to the client instead of creating a new server connection.
3. Finally, you will need to somehow implement cache expiration. The timing does not need to be exact (i.e. it's okay if a page is still in your cache after the timeout has expired, but it's not okay to serve a cached page after its timeout has expired), but you want to ensure that pages that are older than the user-set timeout are not served from the cache.
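One way to meet the expiration rule is to check a page's age at lookup time and drop it lazily, so a stale page may linger in memory but is never served. The sketch below is an assumption-laden illustration: the fixed table of 64 slots, the struct cache_entry layout, and the cache_store and cache_lookup names are all invented here, and the current time is passed in as a parameter so the logic can be exercised without waiting.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 64   /* arbitrary size chosen for this sketch */

struct cache_entry {
    char   url[2048];
    char  *body;         /* malloc'd copy of the full response */
    size_t body_len;
    time_t fetched_at;
    int    in_use;
};

static struct cache_entry cache[CACHE_SLOTS];

/* Save a copy of a response. Returns 0 on success, -1 if it cannot be stored. */
int cache_store(const char *url, const char *body, size_t len, time_t now)
{
    int i, slot = -1;
    char *copy;

    if (strlen(url) >= sizeof cache[0].url)
        return -1;
    /* Reuse the slot already holding this URL, else the first free one. */
    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].in_use && strcmp(cache[i].url, url) == 0) {
            slot = i;
            break;
        }
        if (!cache[i].in_use && slot < 0)
            slot = i;
    }
    if (slot < 0)
        return -1;           /* table full; a real cache would evict */

    copy = malloc(len > 0 ? len : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, body, len);

    free(cache[slot].body);
    strcpy(cache[slot].url, url);
    cache[slot].body = copy;
    cache[slot].body_len = len;
    cache[slot].fetched_at = now;
    cache[slot].in_use = 1;
    return 0;
}

/* Return the entry for url if it is no older than timeout_secs, else NULL.
 * An expired entry is discarded here instead of by a background timer. */
const struct cache_entry *cache_lookup(const char *url, time_t now,
                                       long timeout_secs)
{
    int i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &cache[i];
        if (!e->in_use || strcmp(e->url, url) != 0)
            continue;
        if (difftime(now, e->fetched_at) > (double)timeout_secs) {
            free(e->body);
            e->body = NULL;
            e->in_use = 0;
            return NULL;
        }
        return e;
    }
    return NULL;
}
```

In the proxy itself, now would be time(NULL) and timeout_secs the value given on the command line. A linear scan is fine for a handful of pages; a hash table would be the natural next step.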

Link Prefetch

Building on top of your caching and content transformation code, the last piece of functionality that you will implement is called link prefetching. The idea behind link prefetching is simple: if a user asks for a particular page, the odds are that he or she will next request a page linked from that page. Link prefetching uses this information to attempt to speed up browsing by parsing requested pages for links, and then fetching the linked pages in the background. The pages fetched from the links are stored in the cache, ready to be served to the client when they are requested without the client having to wait around for the remote server to be contacted.
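The parsing half of link prefetching can start out very simply. The sketch below scans a page for double-quoted href attributes and returns the absolute http:// links one at a time; next_link is a hypothetical helper, and the sketch deliberately ignores relative links, single-quoted or unquoted attributes, and uppercase HREF, all of which a more complete parser would need to handle.

```c
#include <string.h>

/* Find the next href="http://..." value at or after *cursor in a
 * NUL-terminated HTML buffer. On success the link is copied into out,
 * *cursor is advanced past it, and 1 is returned. Returns 0 when no
 * further link is found. */
int next_link(const char **cursor, char *out, size_t out_size)
{
    const char *p = *cursor;

    while ((p = strstr(p, "href=\"")) != NULL) {
        const char *start = p + 6;            /* first byte after href=" */
        const char *end = strchr(start, '"'); /* closing quote */
        size_t len;

        if (end == NULL)
            break;
        len = (size_t)(end - start);
        p = end + 1;

        /* Keep only absolute http links that fit in the output buffer. */
        if (len < out_size && strncmp(start, "http://", 7) == 0) {
            memcpy(out, start, len);
            out[len] = '\0';
            *cursor = p;
            return 1;
        }
    }
    *cursor += strlen(*cursor);   /* nothing left to scan */
    return 0;
}
```

Calling next_link in a loop until it returns 0 yields each candidate URL in page order, ready to be fetched and added to the cache.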

Parsing and fetching links can take an appreciable amount of time, especially for a page with a lot of links. For this reason, if you haven't already, at this stage you should make your proxy into a multi-threaded application. One thread should remain dedicated to the tasks that you have already implemented: reading requests from the client and serving pages from either the cache or a remote server. In a separate thread, the proxy will parse a page and extract the HTTP links, request those links from the remote server, and add them to the cache.

Other Possible Extensions

  • HTTP 1.1 Support
  • HTTP Connection Keep-Alive
