Chapter 2. Java Syntax from the Ground Up

This chapter is a terse but comprehensive introduction to Java syntax. It is written primarily for readers who are new to the language, but have at least some previous programming experience. Determined novices with no prior programming experience may also find it useful. If you already know Java, you should find it a useful language reference. In previous editions of this book, this chapter was written explicitly for C and C++ programmers making the transition to Java. It has been rewritten for this edition to make it more generally useful, but it still contains comparisons to C and C++ for the benefit of programmers coming from those languages.[1]

[1] Readers who want even more thorough coverage of the Java language should consider The Java Programming Language, Second Edition, by Ken Arnold and James Gosling (the creator of Java) (Addison Wesley Longman). And hard-core readers may want to go straight to the primary source: The Java Language Specification, by James Gosling, Bill Joy, and Guy Steele (Addison Wesley Longman). This specification is available in printed book form, but is also freely available for download from Sun's web site at http://java.sun.com/docs/books/jls/. I found both documents quite helpful while writing this chapter.

This chapter documents the syntax of Java programs by starting at the very lowest level of Java syntax and building from there, covering increasingly higher orders of structure. It covers:

The characters used to write Java programs and the encoding of those characters.
Data types, literal values, identifiers, and other tokens that comprise a Java program.
The operators used in Java to group individual tokens into larger expressions.
Statements, which group expressions and other statements to form logical chunks of Java code.
Methods (also called functions, procedures, or subroutines), which are named collections of Java statements that can be invoked by other Java code.
Classes, which are collections of methods and fields. Classes are the central program element in Java and form the basis for object-oriented programming. Chapter 3, "Object-Oriented Programming in Java", is devoted entirely to a discussion of classes and objects.
Packages, which are collections of related classes.
Java programs, which consist of one or more interacting classes that may be drawn from one or more packages.

The syntax of most programming languages is complex, and Java is no exception. In general, it is not possible to document all elements of a language without referring to other elements that have not yet been discussed. For example, it is not really possible to explain in a meaningful way the operators and statements supported by Java without referring to objects. But it is also not possible to document objects thoroughly without referring to the operators and statements of the language. The process of learning Java, or any language, is therefore an iterative one. If you are new to Java (or a Java-style programming language), you may find that you benefit greatly from working through this chapter and the next twice, so that you can grasp the interrelated concepts.

2.1. The Unicode Character Set

Java programs are written using the Unicode character set. Unlike the 7-bit ASCII encoding, which is useful only for English, and the 8-bit ISO Latin-1 encoding, which is useful only for major Western European languages, the 16-bit Unicode encoding can represent virtually every written language in common use on the planet. Very few text editors support Unicode, however, and in practice, most Java programs are written in plain ASCII. 16-bit Unicode characters are typically written to files using an encoding known as UTF-8, which converts the 16-bit characters into a stream of bytes. The format is designed so that plain ASCII and Latin-1 text are valid UTF-8 byte streams. Thus, you can simply write plain ASCII programs, and they will work as valid Unicode.

If you want to embed a Unicode character within a Java program that is written in plain ASCII, use the special Unicode escape sequence \uxxxx. That is, a backslash and a lowercase u, followed by four hexadecimal characters. For example, \u0020 is the space character, and \u3c00 is the character π. You can use Unicode characters anywhere in a Java program, including comments and variable names.

Chapter 2. Java Syntax from the Ground Up

Contents:

2.1. The Unicode Character Set