Basic Web Architecture

The basic web architecture is two-tiered: a web client displays information content, and a web server transfers information to the client. This architecture depends on three key standards: HTML for encoding document content, URLs for naming remote information objects in a global namespace, and HTTP for staging the transfer.

Hypertext Markup Language (HTML): the common language for representing hypertext documents on the Web. HTML had its first public release as HTML 0.0 in 1990, became the Internet draft HTML 1.0 in 1993, and was followed by HTML 2.0 in 1994. The September 22, 1995 draft of the HTML 2.0 specification has been approved as a standard by the IETF Application Area HTML Working Group. HTML 3.0 and Netscape HTML are competing next generations of HTML 2.0. Proposed features in HTML 3.0 include forms, style sheets, mathematical markup, and text flow around figures. For more detailed information, see the HTML Reference Manual.

HTML is an application of the Standard Generalized Markup Language (SGML, ISO 8879), an international standard approved in 1986, which specifies a formal meta-language for defining document markup systems. An SGML Document Type Definition (DTD) specifies valid tag element names and attributes. HTML consists of embedded content delimited by hierarchical, case-insensitive start and end tag names; a start tag may contain embedded element attributes. These attributes may be required, optional, or empty. In addition, documents can be linked to one another or internally by establishing source and target anchor points.

Many HTML documents are the result of manual authoring or of word-processor-to-HTML converters, but several WYSIWYG HTML editors now support styles; see the listing at W3C and the Internet Survey Tools section on HTML authoring. HTML files are viewed using a WWW browser client (software), the primary user interface to the Web.
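As a rough illustration of the tag structure described above, the following sketch walks a small HTML fragment with Python's standard `html.parser` module and records each start tag, its nesting depth, its attributes, and any anchor targets. The class name `TagOutline`, the `SAMPLE` document, and the example.org address are assumptions made up for this example; the sketch also assumes every element in the input has an explicit end tag.

```python
from html.parser import HTMLParser


class TagOutline(HTMLParser):
    """Record start tags, their attributes, and hyperlink targets."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # (nesting depth, tag name, attribute dict)
        self.links = []    # HREF values found on anchor elements

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case,
        # whatever case the author typed.
        attributes = dict(attrs)
        self.outline.append((self.depth, tag, attributes))
        if tag == "a" and attributes.get("href") is not None:
            self.links.append(attributes["href"])
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes well-formed input where every start tag is closed.
        self.depth -= 1


# Hypothetical document mixing upper- and lower-case tag names.
SAMPLE = """<HTML><HEAD><TITLE>Demo</TITLE></HEAD>
<BODY><H1>Hello</H1>
<UL COMPACT><LI>An item</LI></UL>
<P>See <A HREF="http://example.org/next.html">the next page</A>
or jump to <a href="#top">the top</a>.</P>
</BODY></HTML>"""

parser = TagOutline()
parser.feed(SAMPLE)
for depth, tag, attributes in parser.outline:
    print("  " * depth + tag, attributes)
print("links:", parser.links)
```

The `COMPACT` attribute has no value, so the parser reports it as `None`; this is one way an "empty" attribute can show up in practice. The first link points to another document and the second to an anchor inside the same one.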
HTML allows for embedding of images, sounds, video streams, form fields, and simple text formatting. References to other objects, called hyperlinks, are embedded using URLs (see below). When an object is selected by a hyperlink, the browser takes an action based on the URL’s type, e.g., retrieve a file, connect to another Web site and display an HTML file stored there, or launch an application such as an e-mail or newsgroup reader.

Universal Resource Identifier (URI): an IETF addressing scheme for objects in the WWW (“if it’s out there, we can point at it”). There are two types of URIs: Universal Resource Names (URNs) and Universal Resource Locators (URLs). The IETF publishes a URI specification and a separate URL specification. URLs are location dependent and contain four distinct parts: the protocol type, the machine name, the directory path, and the file name. There are several kinds of URLs: file URLs, FTP URLs, Gopher URLs, News URLs, and HTTP URLs. URLs may be relative to a directory or may be offsets into a document. Arguments to CGI programs (see below) may be embedded in URLs after the ? character.

Hypertext Transfer Protocol (HTTP): an application-level network protocol for the WWW. Tim Berners-Lee, father of the Web, describes it as a “generic stateless object-oriented protocol.” Stateless means that neither the client nor the server stores information about the state of the other side of an ongoing connection. Statelessness helps scalability but is not necessarily efficient: HTTP sets up a new connection for each request, which is undesirable in situations that require sessions or transactions. In HTTP, commands (request methods) can be associated with particular types of network objects (files, documents, network services).
Commands are provided for establishing a TCP/IP connection to a WWW server; sending a request to the server (containing a method to be applied to a specific network object, the object’s identifier, and the HTTP protocol version, followed by information encoded in a header style); returning a response from the server to the client (consisting of three parts: a status line, a response header, and response data); and closing the connection. HTTP supports dynamic data representation through client-server negotiation: the requesting client specifies which MIME content types it can accept (more on this below), and the server responds with one of these. All WWW clients can handle text/plain and text/html. HTTP/1.0 Internet Draft 05 (the seventh release of HTTP/1.0) is targeted as an Internet Informational RFC. The next immediate version of HTTP is HTTP/1.1 Internet Draft 01.
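To make the URL parts and the request/response exchange more concrete, here is a minimal sketch in Python that needs no network access. It splits a URL into the four parts named earlier, builds the text of an HTTP/1.0 GET request with an Accept header, and parses a canned response into status line, header, and data. The helper names (`split_url`, `build_request`, `parse_response`), the example.org URL, and the canned response are all assumptions invented for illustration, not part of any specification.

```python
import posixpath
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into protocol, machine, directory path, and file name."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "machine": parts.hostname,
        "directory": directory,
        "file": filename,
        "query": parts.query,  # CGI arguments after the "?" character
    }


def build_request(url, accept=("text/html", "text/plain")):
    """Build the text of an HTTP/1.0 GET request for the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.0",        # method, object identifier, version
        "Accept: " + ", ".join(accept),  # MIME types the client can handle
        "",                              # blank line ends the header
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw response into status line, header fields, and data."""
    head, _, data = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"version": version, "status": int(code), "reason": reason,
            "headers": headers, "data": data}


# Hypothetical URL and server reply, used only to exercise the helpers.
URL = "http://www.example.org/docs/guide/index.html?lang=en"
CANNED = ("HTTP/1.0 200 OK\r\n"
          "Content-Type: text/html\r\n"
          "\r\n"
          "<P>Hello, Web!</P>")

print(split_url(URL))
print(build_request(URL))
reply = parse_response(CANNED)
print(reply["status"], reply["headers"]["content-type"], reply["data"])
```

In a real exchange the request text would be written to a TCP connection opened to the machine named in the URL, and the connection would be closed after the response arrives. The server's Content-Type here is one of the types the client listed in Accept, which is the negotiation outcome the text describes.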
