DateTimeFormatter handling Sep / Sept short months [duplicate]


The Answer by ser7605325 is correct.

Caching month names

The Comment by Andreas suggests a more efficient approach by caching all the possible month names ahead of time. Let’s explore some code for that.

Map

We can populate a Map with all the possible month names as keys, and the corresponding Month enum object as the values.

Uppercase/lowercase is likely to be a problem in matching the text. Java 26 brings a Comparator built to eliminate case distinctions without applying locale or language specific conversions, per rules documented by the Unicode Consortium. The comparator is String.UNICODE_CASEFOLD_ORDER. See JDK-8365675 Add String Unicode Case-Folding Support.

Comparator

We can use that Comparator to organize the elements of a SequencedMap such as TreeMap.

To populate the map, we loop through every Locale. For each of those, we loop through each Month enum object. For each of those, we loop through both the FULL and FULL_STANDALONE objects of TextStyle.

private static SequencedMap < String, Month > makeMap ( )
{
    TreeMap < String, Month > map = new TreeMap <>( String.UNICODE_CASEFOLD_ORDER );
    for ( Locale locale : Locale.getAvailableLocales() )
    {
        for ( Month month : Month.values() )
        {
            for ( TextStyle style : Set.of( TextStyle.FULL_STANDALONE , TextStyle.FULL ) )
            {
                map.put( month.getDisplayName( style , locale ) , month );
            }
        }
    }
    return map;
}

In early-access of Java 26, we end up with 2,535 entries in our map.

For earlier versions of Java, you could add the ICU4J library to your project to access its case-folding features. Or you could try String.CASE_INSENSITIVE_ORDER as the Comparator, but I do not know if all the possible localized month names will work well.
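As a rough sketch of that fallback for earlier versions of Java, the same map can be built with String.CASE_INSENSITIVE_ORDER. The class name MonthNamesFallback and the short, hand-picked list of locales are assumptions made for illustration. This has not been checked against every localized month name, so treat it as a starting point only.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical class name, for illustration only.
class MonthNamesFallback {

    // Builds a case-insensitive name-to-Month map for the given locales.
    // Uses String.CASE_INSENSITIVE_ORDER instead of the Java 26 case-folding comparator.
    static NavigableMap<String, Month> makeMap(List<Locale> locales) {
        TreeMap<String, Month> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Locale locale : locales) {
            for (Month month : Month.values()) {
                for (TextStyle style : List.of(TextStyle.FULL_STANDALONE, TextStyle.FULL)) {
                    map.put(month.getDisplayName(style, locale), month);
                }
            }
        }
        return Collections.unmodifiableNavigableMap(map);
    }

    public static void main(String[] args) {
        // Assumption: only English, German, and Spanish names are needed here.
        NavigableMap<String, Month> map = makeMap(List.of(
                Locale.ENGLISH, Locale.GERMAN, Locale.forLanguageTag("es")));
        System.out.println(map.get("MAY"));
        System.out.println(map.get("dezember"));
        System.out.println(map.get("JULIO"));
    }
}
```

Passing List.of( Locale.getAvailableLocales() ) instead of the short list would cover every locale, as in the Java 26 version above.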

Next we need a method to look up a localized month name, and return a Month enum object.

Well, actually, we do not need a method. We could just call get on the pre-populated map. The same comparator we used to populate the map, String.UNICODE_CASEFOLD_ORDER, is used by the get method to find a match regardless of case.

But our TreeMap is modifiable. So let's wrap our TreeMap to make it unmodifiable. We change the last line of that method shown above.

return Collections.unmodifiableSequencedMap( map );

Let's store that map as a public static final.

public static final SequencedMap < String, Month > nameToMonthMap = ParsingMonthName.makeMap();

Usage:

IO.println( nameToMonthMap.get( "May" ) );
IO.println( nameToMonthMap.get( "März" ) );
IO.println( nameToMonthMap.get( "Julio" ) );

When run:

MAY
MARCH
JULY

Keep in mind that the map returns null if we pass some unknown or invalid month name.
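One way to make that null hard to overlook is to wrap the lookup in an Optional. The following is a minimal sketch. The names MonthLookup and findMonth are hypothetical, and the small English-only map is a stand-in for the full map built above.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
class MonthLookup {

    // Stand-in for the full map: English month names only, case-insensitive keys.
    static final Map<String, Month> NAMES = englishNames();

    private static Map<String, Month> englishNames() {
        Map<String, Month> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Month month : Month.values()) {
            map.put(month.getDisplayName(TextStyle.FULL, Locale.ENGLISH), month);
        }
        return map;
    }

    // Turns the map's null-for-missing result into an empty Optional.
    static Optional<Month> findMonth(String name) {
        if (name == null) {
            return Optional.empty(); // A TreeMap with a comparator rejects null keys.
        }
        return Optional.ofNullable(NAMES.get(name.strip()));
    }

    public static void main(String[] args) {
        System.out.println(findMonth("may"));    // Optional[MAY]
        System.out.println(findMonth("Smarch")); // Optional.empty
    }
}
```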

Let's get back to the full Question: How to parse these different date representations.

Let's strip each input of any extraneous characters.

String input = theInput.trim().strip();

First we notice that the month-name and year combos use a SPACE as the delimiter. No other inputs contain a SPACE, so we can discriminate based on that character.

We split the input on the SPACE character. The first part is assumed to be a localized month name, so we look it up in our static map shown above. Then we parse the year number as a Year object. We combine the two to get a YearMonth object, and return the date of the first day of that month.

If anything goes wrong along the way, we return an empty Optional. Using Optional forces the calling programmer to deal with the very real possibility of invalid input resulting in an impossible/empty value.

// If input contains a SPACE, we assume it is a localized month name followed by a year.
if ( input.contains( " " ) )
{
    final String[] split = input.split( " " );
    final String monthNameInput = split[ 0 ];
    final String yearInput = split[ 1 ];
    Month month = nameToMonthMap.get( monthNameInput );
    if ( Objects.isNull( month ) )
    {
        return Optional.empty();
    }
    Year year;
    try
    {
        year = Year.of( Integer.parseInt( yearInput ) );
    }
    catch ( NumberFormatException | DateTimeException e )  // DateTimeException: year outside the range supported by Year.
    {
        return Optional.empty();
    }
    YearMonth yearMonth = YearMonth.of( year.getValue() , month );
    return Optional.of( yearMonth.atDay( 1 ) );
}

For the rest of the inputs, we can discriminate by length of the string input.

long inputLength = input.codePoints().count();
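Counting code points rather than calling length() matters only when the input contains characters outside the Basic Multilingual Plane, which Java stores as two char values each. A small sketch of the difference follows. The class name LengthDemo and the sample strings are made up for illustration.

```java
// Hypothetical demo class, for illustration only.
class LengthDemo {

    // Number of Unicode code points, as used in the line above.
    static long codePointLength(String s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        String ascii = "2012";
        // "2012" followed by U+1F600, written as a surrogate pair.
        String withSupplementary = "2012\uD83D\uDE00";

        System.out.println(ascii.length() + " vs " + codePointLength(ascii));                         // 4 vs 4
        System.out.println(withSupplementary.length() + " vs " + codePointLength(withSupplementary)); // 6 vs 5
    }
}
```

For the plain ASCII inputs in the Question, both counts agree.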

We notice that only the year-alone input is four characters long. So we can discriminate by the length being four. We parse as a Year object, trapping for DateTimeParseException. If that parsing succeeds, we return the first date of that year.

if ( inputLength == 4 )
{
    try
    {
        return Optional.of( Year.parse( input ).atDay( 1 ) );
    }
    catch ( DateTimeParseException e )
    {
        return Optional.empty();
    }
}

The next longest length is ten characters. We see that two kinds of input are ten characters long: the integer numbers, and the date strings in standard ISO 8601 format (YYYY-MM-DD).

If we detect any HYPHEN-MINUS characters, we know the input is the ISO 8601 date string. We can directly parse as a LocalDate object, because the java.time classes use ISO 8601 formats by default when parsing/generating text.

Presumably the integer numbers represent a count of whole seconds since the first moment of 1970 in UTC, 1970-01-01T00:00Z. So we need two steps: (a) parse the string input as a long value, and (b) interpret that long as a java.time.Instant object. To get a date, we make an OffsetDateTime object using the ZoneOffset.UTC constant, an offset from UTC of zero hours-minutes-seconds. Choosing this offset is a gross assumption, because you did not indicate how to interpret that input. Perhaps you intended a time zone but neglected to specify it?

if ( inputLength == 10 )
{
    if ( input.contains( "-" ) )  // If HYPHEN-MINUS, must be a date in standard ISO 8601 format of YYYY-MM-DD.
    {
        try
        {
            return Optional.of( LocalDate.parse( input ) );
        }
        catch ( DateTimeParseException e )
        {
            return Optional.empty();
        }
    }
    else  // Else no HYPHEN-MINUS, so must be an integer number.
    {
        try
        {
            return Optional.of( Instant.ofEpochSecond( Long.parseLong( input ) ).atOffset( ZoneOffset.UTC ).toLocalDate() );
        }
        catch ( NumberFormatException | DateTimeException e )
        {
            return Optional.empty();
        }
    }
}
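To see how much the choice of offset or zone matters, here is a small sketch that converts the same count of seconds to a date in two different ways. The class name EpochSecondsToDate is hypothetical, and Europe/Berlin is purely an illustrative guess, not something stated in the Question.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class, for illustration only.
class EpochSecondsToDate {

    // Interprets a count of seconds since 1970-01-01T00:00Z as a date in the given zone.
    static LocalDate dateIn(long epochSeconds, ZoneId zone) {
        return Instant.ofEpochSecond(epochSeconds).atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        long input = 1_370_037_600L; // First example input from the Question.

        // The moment is 2013-05-31T22:00Z, so the date depends on where you stand.
        System.out.println(dateIn(input, ZoneOffset.UTC));             // 2013-05-31
        System.out.println(dateIn(input, ZoneId.of("Europe/Berlin"))); // 2013-06-01
    }
}
```

If the data was produced as midnight in some particular time zone, passing that zone rather than UTC would yield the first of the month.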

The last length is nineteen characters. This input would be a date with time-of-day in standard ISO 8601 format. The T in the middle is a delimiter between the date portion and the time portion. So we could split on that T, and report the date portion. But I would recommend being more thorough by actually parsing the entire input to be sure it matches our expectations. We can parse as a LocalDateTime object, then extract a LocalDate object.

if ( inputLength == 19 )
{
    try
    {
        return Optional.of( LocalDateTime.parse( input ).toLocalDate() );
    }
    catch ( DateTimeParseException e )
    {
        return Optional.empty();
    }
}
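To illustrate why parsing the entire input is the safer choice, this sketch compares the two approaches on an input with a nonsensical time-of-day. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical demo class, for illustration only.
class DateTimeStrictness {

    // Shortcut: looks only at the text before the first 'T'.
    static Optional<LocalDate> dateBySplitting(String input) {
        try {
            return Optional.of(LocalDate.parse(input.split("T", 2)[0]));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Thorough: the whole input must be a valid ISO 8601 date-time.
    static Optional<LocalDate> dateByFullParse(String input) {
        try {
            return Optional.of(LocalDateTime.parse(input).toLocalDate());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String suspicious = "2014-03-01T99:99:99"; // Nineteen characters, invalid time-of-day.
        System.out.println(dateBySplitting(suspicious)); // Optional[2014-03-01]
        System.out.println(dateByFullParse(suspicious)); // Optional.empty
    }
}
```

The shortcut silently accepts the bad input, while the full parse rejects it.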

Putting that all together, we get this:

package work.basil.example;

import java.time.*;
import java.time.format.DateTimeParseException;
import java.time.format.TextStyle;
import java.util.*;

public class ParsingMonthName
{
    public static final SequencedMap < String, Month > nameToMonthMap = ParsingMonthName.makeMap();

    private static SequencedMap < String, Month > makeMap ( )
    {
        TreeMap < String, Month > map = new TreeMap <>( String.UNICODE_CASEFOLD_ORDER );
        for ( Locale locale : Locale.getAvailableLocales() )
        {
            for ( Month month : Month.values() )
            {
                for ( TextStyle style : Set.of( TextStyle.FULL_STANDALONE , TextStyle.FULL ) )
                {
                    map.put( month.getDisplayName( style , locale ) , month );
                }
            }
        }
        return Collections.unmodifiableSequencedMap( map );
    }

    public static Optional < LocalDate > parseMysteryInput ( final String theInput )
    {
        // Remove any extraneous characters.
        final String input = theInput.trim().strip();

        // If input contains a SPACE, we assume it is a localized month name followed by a year.
        if ( input.contains( " " ) )
        {
            final String[] split = input.split( " " );
            final String monthNameInput = split[ 0 ];
            final String yearInput = split[ 1 ];
            Month month = nameToMonthMap.get( monthNameInput );
            if ( Objects.isNull( month ) )
            {
                return Optional.empty();
            }
            Year year;
            try
            {
                year = Year.of( Integer.parseInt( yearInput ) );
            }
            catch ( NumberFormatException | DateTimeException e )  // DateTimeException: year outside the range supported by Year.
            {
                return Optional.empty();
            }
            YearMonth yearMonth = YearMonth.of( year.getValue() , month );
            return Optional.of( yearMonth.atDay( 1 ) );
        }

        final long inputLength = input.codePoints().count();

        if ( inputLength == 4 )
        {
            try
            {
                return Optional.of( Year.parse( input ).atDay( 1 ) );
            }
            catch ( DateTimeParseException e )
            {
                return Optional.empty();
            }
        }

        if ( inputLength == 10 )
        {
            if ( input.contains( "-" ) )  // If HYPHEN-MINUS, must be a date in standard ISO 8601 format of YYYY-MM-DD.
            {
                try
                {
                    return Optional.of( LocalDate.parse( input ) );
                }
                catch ( DateTimeParseException e )
                {
                    return Optional.empty();
                }
            }
            else  // Else no HYPHEN-MINUS, so must be an integer number.
            {
                try
                {
                    return Optional.of( Instant.ofEpochSecond( Long.parseLong( input ) ).atOffset( ZoneOffset.UTC ).toLocalDate() );
                }
                catch ( NumberFormatException | DateTimeException e )
                {
                    return Optional.empty();
                }
            }
        }

        if ( inputLength == 19 )
        {
            try
            {
                return Optional.of( LocalDateTime.parse( input ).toLocalDate() );
            }
            catch ( DateTimeParseException e )
            {
                return Optional.empty();
            }
        }

        // If reaching this point, we encountered unexpected input.
        return Optional.empty();
    }
}

Let's try it with your example inputs.

final String inputs = """
        1370037600
        1385852400
        1356994800
        2014-03-01T00:00:00
        2013-06-01T00:00:00
        2012-01-01
        2012
        May 2012
        März 2010
        Julio 2009
        """;
inputs
        .lines()
        .map( ParsingMonthName :: parseMysteryInput )
        .forEach( IO :: println );

When run:

Optional[2013-05-31]
Optional[2013-11-30]
Optional[2012-12-31]
Optional[2014-03-01]
Optional[2013-06-01]
Optional[2012-01-01]
Optional[2012-01-01]
Optional[2012-05-01]
Optional[2010-03-01]
Optional[2009-07-01]
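In real code the caller would do more than print the Optional. Here is a minimal sketch of handling both outcomes. The class HandlingOptionalResults and its simplified parse method are hypothetical stand-ins that understand only ISO 8601 dates. Substitute the parseMysteryInput method shown above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class, for illustration only.
class HandlingOptionalResults {

    // Simplified stand-in for parseMysteryInput: handles only YYYY-MM-DD.
    static Optional<LocalDate> parse(String input) {
        try {
            return Optional.of(LocalDate.parse(input.strip()));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Collects the inputs that could not be parsed, for logging or reporting.
    static List<String> rejected(List<String> inputs) {
        List<String> failures = new ArrayList<>();
        for (String input : inputs) {
            if (!parse(input).isPresent()) {
                failures.add(input);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("2012-01-01", "not a date");
        for (String input : inputs) {
            parse(input).ifPresentOrElse(
                    date -> System.out.println("Parsed: " + date),
                    () -> System.out.println("Unparseable: " + input));
        }
        System.out.println("Rejected: " + rejected(inputs));
    }
}
```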

ISO 8601

Educate the people who produced this data mess about the virtues of using only standard ISO 8601 formats for exchanging date-time values textually.

The java.time classes use ISO 8601 formats by default when parsing/generating text.
