GCC: which -march?

Some notes on individual -march=cpu-type values. i686: when used with -march, the Pentium Pro instruction set is used, so the code runs on all i686 family chips. pentium-m: used by Centrino notebooks. For several older chips the only remark is that no scheduling is implemented for them.

-mtune=cpu-type: Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. The choices for cpu-type are the same as for -march. In addition, -mtune supports two extra choices for cpu-type: generic and intel.

generic: If you do not know exactly what CPU the users of your application will have, you should use this option. As new processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, code generation controlled by this option will change to reflect the processors that are most common at the time that version of GCC is released. There is no -march=generic option, because -march indicates the instruction set the compiler can use, and no generic instruction set applies to all processors. In contrast, -mtune indicates the processor or, in this case, collection of processors for which the code is optimized.

intel: Produce code optimized for the most current Intel processors, which are Haswell and Silvermont for this version of GCC. If you want your application to perform better on both Haswell and Silvermont, you should use this option. As new Intel processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, code generation controlled by this option will change to reflect the most current Intel processors at the time that version of GCC is released.

-mfpmath=unit: Generate floating-point arithmetic for the selected unit.

The choices for unit are:

387: Use the standard 387 floating-point coprocessor present on the majority of chips and emulated otherwise. Code compiled with this option runs almost everywhere. The temporary results are computed in 80-bit precision instead of the precision specified by the type, resulting in slightly different results compared to most other chips. See -ffloat-store for a more detailed description.

sse: Use scalar floating-point instructions present in the SSE instruction set. The earlier version of the SSE instruction set supports only single-precision arithmetic, so double and extended-precision arithmetic are still done using the 387. A later version, present only in Pentium 4 and AMD x86-64 chips, supports double-precision arithmetic too. For the x86-64 compiler, these extensions are enabled by default. The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80 bits.
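The difference between 387 and SSE arithmetic described above can be observed from C through the standard FLT_EVAL_METHOD macro in <float.h>. The following is a minimal sketch, not part of the GCC manual; the helper names describe_eval_method and current_eval_method are made up for illustration. On 32-bit x86 with 387 arithmetic GCC typically reports 2, and with SSE2 arithmetic it typically reports 0.

```c
#include <float.h>

/* Map a FLT_EVAL_METHOD value (C99, <float.h>) to a short description.
   Hypothetical helper, for illustration only. */
const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "operations are evaluated in the precision of their type";
    case 1:  return "float and double operations are evaluated as double";
    case 2:  return "all operations are evaluated as long double";
    default: return "indeterminate or implementation-defined";
    }
}

/* Description of the evaluation method this translation unit was built with. */
const char *current_eval_method(void)
{
    return describe_eval_method(FLT_EVAL_METHOD);
}
```

Printing current_eval_method() from a small test program built once with -mfpmath=387 and once with -mfpmath=sse is a quick way to see which behavior a given set of flags selects.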

The sse unit is the default choice for the x86-64 compiler, for Darwin x86-32 targets, and for x86-32 targets with the SSE2 instruction set when -ffast-math is enabled.

sse,387 (also accepted as sse+387 or both): Attempt to utilize both instruction sets at once. This effectively doubles the number of available registers and, on chips with separate execution units for 387 and SSE, the execution resources too. Use this option with care, as it is still experimental: the GCC register allocator does not model separate functional units well, resulting in unstable performance.

-masm=dialect: Output assembly instructions using the selected dialect. This also affects which dialect is used for basic asm (see Basic Asm) and extended asm (see Extended Asm).

-mieee-fp, -mno-ieee-fp: Control whether or not the compiler uses IEEE floating-point comparisons. These correctly handle the case where the result of a comparison is unordered.

-msoft-float: Generate output containing library calls for floating point. Warning: the requisite libraries are not part of GCC. You must make your own arrangements to provide suitable library functions for cross-compilation. On machines where a function returns floating-point results in the 80387 register stack, some floating-point opcodes may be emitted even if -msoft-float is used.

-mno-fp-ret-in-387: The usual calling convention has functions return values of types float and double in an FPU register, even if there is no FPU. The idea is that the operating system should emulate an FPU. The option -mno-fp-ret-in-387 causes such values to be returned in ordinary CPU registers instead.

-mno-fancy-math-387: Some 387 emulators do not support the sin, cos and sqrt instructions for the 387. Specify this option to avoid generating those instructions. This option is overridden when -march indicates that the target CPU always has an FPU and so the instruction does not need emulation. These instructions are not generated unless you also use the -funsafe-math-optimizations switch.

-malign-double, -mno-align-double: Control whether GCC aligns double, long double, and long long variables on a two-word boundary or a one-word boundary. Aligning double variables on a two-word boundary produces code that runs somewhat faster on a Pentium at the expense of more memory. Warning: if you use the -malign-double switch, structures containing the above types are aligned differently than the published application binary interface specifications for x86-32 and are not binary compatible with structures in code compiled without that switch.
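The binary-compatibility warning for -malign-double comes down to structure layout. The sketch below is illustrative only; struct sample and the two helper functions are hypothetical names. On 32-bit x86 with the default ABI the double member is expected at offset 4, while with -malign-double (or on x86-64) it is expected at offset 8, which changes the structure size as well.

```c
#include <stddef.h>

/* A structure whose layout depends on the alignment chosen for double. */
struct sample {
    char   tag;
    double value;
};

/* Offset of the double member: padding after 'tag' reveals its alignment. */
size_t double_member_offset(void)
{
    return offsetof(struct sample, value);
}

/* Total size of the structure, including any trailing padding. */
size_t sample_size(void)
{
    return sizeof(struct sample);
}
```

Two object files that disagree about these numbers cannot safely exchange a struct sample, which is why the switch has to be applied consistently to everything that shares such types.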

-m96bit-long-double, -m128bit-long-double: These switches control the size of the long double type. The x86-32 application binary interface specifies the size to be 96 bits, so -m96bit-long-double is the default in 32-bit mode.

Modern architectures (Pentium and newer) prefer long double to be aligned to an 8- or 16-byte boundary. In arrays or structures conforming to the ABI, this is not possible. So specifying -m128bit-long-double aligns long double to a 16-byte boundary by padding the long double with an additional 32-bit zero. In the x86-64 compiler, -m128bit-long-double is the default choice, as its ABI specifies that long double is aligned on a 16-byte boundary. Notice that neither of these options enables any extra precision over the x87 standard of 80 bits for a long double.

Warning: if you override the default value for your target ABI, this changes the size of structures and arrays containing long double variables, as well as modifying the function calling convention for functions taking long double. Hence they are not binary-compatible with code compiled without that switch.

-mlong-double-64, -mlong-double-80, -mlong-double-128: These switches also control the size of the long double type. A size of 64 bits makes the long double type equivalent to the double type. This is the default for the 32-bit Bionic C library.
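The distinction between storage size and precision made above can be checked directly. This is a small sketch with hypothetical helper names: the storage size and alignment are what the long double switches change, while LDBL_MANT_DIG from <float.h> reports the actual precision (64 significand bits for the x87 80-bit format, the same as DBL_MANT_DIG when long double is made equivalent to double).

```c
#include <float.h>
#include <stddef.h>

/* Bytes occupied by a long double, padding included. */
size_t long_double_storage_bytes(void)
{
    return sizeof(long double);
}

/* Required alignment of long double (C11 _Alignof). */
size_t long_double_alignment(void)
{
    return _Alignof(long double);
}

/* Number of significand bits, which padding never increases. */
int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}
```

Comparing the three values across builds with different long double switches shows the storage size and alignment moving while the mantissa width stays at the x87 value, except when a 64-bit long double is selected.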

-malign-data=type: Control how GCC aligns variables.

-mlarge-data-threshold=threshold: This value must be the same across all objects linked into the binary, and defaults to 65535.

-mrtd: Use a different function-calling convention, in which functions that take a fixed number of arguments return with the ret num instruction, which pops their arguments while returning. This saves one instruction in the caller since there is no need to pop the arguments there.

You can specify that an individual function is called with this calling sequence with the function attribute stdcall. You can also override the -mrtd option by using the function attribute cdecl. See Function Attributes.

Warning: this calling convention is incompatible with the one normally used on Unix, so you cannot use it if you need to call libraries compiled with the Unix compiler. Also, you must provide function prototypes for all functions that take variable numbers of arguments (including printf); otherwise incorrect code is generated for calls to those functions. In addition, seriously incorrect code results if you call a function with too many arguments. (Normally, extra arguments are harmlessly ignored.)

CFLAGS and CXXFLAGS are among the environment variables conventionally used to pass compiler options to a build system. See the GNU make info page for a list of some of the commonly used variables in this category.

They can be used to decrease the amount of debug messages for a program, increase error warning levels and, of course, to optimize the code produced. The GCC manual maintains a complete list of available options and their purposes.

Variables set in Portage's make.conf file will be exported to the environment of programs invoked by portage, so that all packages will be compiled using these options as a base. Almost every system should be configured in this manner. Don't set them arbitrarily. Individual packages further modify these options, either in the ebuild or in the build system itself, to generate the final set of flags used when invoking the compiler.

Being aware of the risks involved, take a look at some sane, safe optimizations. These will stand you in good stead and will endear you to developers the next time a problem is reported on Bugzilla. Remember: aggressive flags can ruin code!

The goal is code that works correctly while also being lean and fast. Sometimes these conditions are mutually exclusive, so this guide will stick to combinations known to work well. Ideally, they are the best available for any CPU architecture. For informational purposes, aggressive flag use will be covered later. Not every option listed in the GCC manual (there are hundreds) will be discussed, but the basic, most common flags will be reviewed.

The first and most important option is -march. This tells the compiler what code it should produce for the system's processor architecture (or arch); it tells GCC that it should produce code for a certain kind of CPU.

Different CPUs have different capabilities, support different instruction sets, and have different ways of executing code. The -march flag will instruct the compiler to produce specific code for the system's CPU, with all its capabilities, features, instruction sets, quirks, and so on, provided the source code is prepared to use them. For instance, to benefit from AVX instructions, the source code needs to be adapted to support them. The reason this kind of transformation isn't enabled at -O2 is that it doesn't always improve code: it can make code slower as well, and it usually makes the code larger. It really depends on the loop in question.
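One way source code is "prepared" for an instruction set is by testing the macros GCC predefines when that instruction set is enabled, such as __SSE2__ and __AVX__. The sketch below is only illustrative, and the function names are made up; it reports at run time which of those macros were defined when the file was compiled, which depends on the -march (or -msse2, -mavx) flags in effect.

```c
/* Return 1 if this file was compiled with SSE2 code generation enabled. */
int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Return 1 if this file was compiled with AVX code generation enabled. */
int built_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}
```

Real code usually places an AVX-specific implementation in the #ifdef branch and a portable fallback in the #else branch, so the same source builds for any -march value.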

To get more details, including the -march and -mtune values, two commands can be used.

-march=native: When this flag is used, GCC will attempt to detect the processor and automatically set appropriate flags for it. However, this should not be used when intending to compile packages for different CPUs!

Also available are the -mtune and -mcpu flags. These flags are normally only used when there is no available -march option; certain processor architectures may require -mtune or even -mcpu. Unfortunately, GCC's behavior isn't very consistent with how each flag behaves from one architecture to the next.

Consider using -mtune when generating code for older CPUs such as i386 and i486. Do not use -mcpu on x86 or x86-64 systems, as it is deprecated for those architectures.

Again, GCC's behavior and flag naming are not consistent across architectures, so be sure to check the GCC manual to determine which one should be used.

Next up is the -O option. It controls the overall level of optimization. Raising it makes compilation take more time and use much more memory, especially at the higher levels. With the exception of -O0, the -O settings each activate several additional flags, so be sure to read the GCC manual's chapter on optimization options to learn which flags are activated at each -O level, as well as some explanations as to what they do.
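Whether any -O level was in effect for a given file can be checked from the source itself, because GCC predefines __OPTIMIZE__ in optimizing compilations and additionally __OPTIMIZE_SIZE__ when optimizing for size. A minimal sketch, with hypothetical function names:

```c
/* Return 1 if this file was compiled with optimization enabled (not -O0). */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Return 1 if this file was compiled for size (for example with -Os). */
int built_for_size(void)
{
#ifdef __OPTIMIZE_SIZE__
    return 1;
#else
    return 0;
#endif
}
```

This only shows that optimization was on, not which individual flags a level activated; for that, the optimization chapter of the GCC manual remains the reference.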

As previously mentioned, -O2 is the recommended optimization level. If package compilation fails while not using -O2, try rebuilding with that option.

A common flag is -pipe. This flag has no effect on the generated code, but it makes the compilation process faster. It tells the compiler to use pipes instead of temporary files during the different stages of compilation, which uses more memory. On systems with low memory, GCC might get killed.

For -march, you are just talking about available instructions and instruction sets. Any version of GCC knows about some set of instruction sets, usually corresponding to the newest arch it knows about.

It can also query the instruction sets supported by the current CPU. So I think we can say gcc is doing a reasonable thing on the -march side of things. That leaves -mtune. We know the right answer there, however, only with the benefit of hindsight: Skylake performs very much like Broadwell (which performs essentially identically to Haswell before it), so Broadwell is a good tune for Skylake.
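The point that the instruction sets of the current CPU can be queried also applies to ordinary programs, through a different mechanism than -march=native: GCC's x86 builtins __builtin_cpu_init and __builtin_cpu_supports perform the check at run time. The following is a sketch under that assumption, with a made-up wrapper name; on non-x86 targets it simply reports 0.

```c
/* Return 1 if the CPU running this program supports AVX2, else 0.
   Uses GCC's x86 CPU-detection builtins; other targets always get 0. */
int cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                          /* populate CPU feature data */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}
```

Unlike the -march value baked in at compile time, this answer can differ from machine to machine, which is what makes it useful for binaries that have to run on CPUs the build host does not resemble.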

So even though I originally thought this was really dumb, I can see the logic. Usually you want to specify -march, since the difference there is huge (newer instruction sets), and -mtune comes along for the ride.

I can somewhat understand the choice of compiler-default behaviors, but I also expect it might wander a bit between versions. This should not matter for most folk, for most problems, but if you are working on a problem targeted at a specific processor, this stuff matters.

It was just easier to let GCC figure things out instead of specifying the actual values, and it worked, so why bother? The reason I had to change the code base was virtual machines.

If the compiler does not know the actual architecture (you mentioned that broadwell is not correct, just close enough), how is it going to know that tuning for broadwell is more appropriate than tuning generic? Because apparently it is not a broadwell. It seems consistent to me to apply generic tuning for a CPU about which the compiler does not yet have enough details. It cannot just assume that broadwell tuning is the best choice for all future broadwell successor CPUs.

It is not wrong, but I would argue that it is not possible to infer this behaviour from the documentation. So the net result is a surprise, and surprises are not good.

This is one of the longest-running threads in compiler development, and this is a great post: the key question asked, some valuable introspection tools, and the general state of things explained. Maybe I need to pony up some open source development effort.

I find it to be more of a broad documentation wording issue than a bug per se. Where the documentation says cpu-type, it means exactly cpu-type, not attribute-option. The wording is confusing, but correct.
