QCon SF 2009: Don Box & Amanda Laucher, Codename "M": Language, Data, and Modeling, Oh My!
These are my unedited notes from Don Box & Amanda Laucher's talk about Codename "M": Language, Data, and Modeling, Oh My!
- [Don has promised not to tweet. That's a good start.]
- Interesting similarities between M and MPS (JetBrains' Meta Programming System)
- M is a language for data; one of the most interesting sources of data is a sequence of Unicode code points, i.e. text
- Great support for text processing is perceived as critical
- Example: extract some data from text (tweets)
- Intellipad: the default environment for writing grammar files; this is where tool support shows up first, then in Visual Studio
M code for a language definition (a function from text to something else, declaratively specified):
module QCon { language Simple { syntax Main = empty; } }
Main is the rule that is the entry point
- Open file in three-pane (or rather, four-pane) mode
- Panes show the source file, the grammar and the output; an empty file yields Main []
- Amanda makes a great assistant ;-)
- Non-empty file produces errors
- Changing "empty" to "any": a file with a single character works, anything longer produces errors; changing it to any+ makes it validate again
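The progression from empty to any to any+ can be approximated with Python regular expressions that must match the whole input. This is a hypothetical sketch, not how the M toolchain works; it assumes M's any matches exactly one character (code point), which is what "." does in Python with re.DOTALL.

```python
import re

# Hypothetical approximations of the three versions of `syntax Main` from
# the demo. fullmatch() mirrors the rule having to consume the whole file;
# re.DOTALL lets "." match newlines too, like M's `any`.
MAIN_RULES = {
    "empty": re.compile(r"", re.DOTALL),   # syntax Main = empty;
    "any": re.compile(r".", re.DOTALL),    # syntax Main = any;
    "any+": re.compile(r".+", re.DOTALL),  # syntax Main = any+;
}


def validates(rule: str, text: str) -> bool:
    """Return True if the whole input text matches the chosen Main rule."""
    return MAIN_RULES[rule].fullmatch(text) is not None


if __name__ == "__main__":
    for rule in MAIN_RULES:
        print(rule, [validates(rule, s) for s in ("", "x", "xy")])
```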
Simple language for interpreting tweets:
module QCon {
    language Simple {
        syntax Main = Tweet;
        syntax Tweet = Content*;
        syntax Content = RawText | Handle | HashTag;
        token RawText = (any - ("#"|"@"))+;
        token Handle = "@" Name;
        token HashTag = "#" Name;
        token Name = (any - ("@"|"#"|" "))+;
    }
}
regular expressions at the token layer, context-free grammar at the syntax level
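To make the split between the two levels concrete, here is a rough Python analogue of the tweet grammar: each token rule becomes a regular expression, and the syntax rule Tweet = Content* becomes a loop that repeatedly picks one Content alternative. It's an illustrative sketch only, assuming the first character decides which token applies (true for this grammar, since "@", "#" and everything else start different tokens); it says nothing about how the M parser is actually implemented.

```python
import re

# Hypothetical regex versions of the token rules in the tweet grammar.
NAME = r"[^@# ]+"                      # token Name = (any - ("@"|"#"|" "))+
TOKEN = re.compile(
    rf"(?P<Handle>@{NAME})"            # token Handle = "@" Name
    rf"|(?P<HashTag>#{NAME})"          # token HashTag = "#" Name
    r"|(?P<RawText>[^#@]+)"            # token RawText = (any - ("#"|"@"))+
)


def tokenize(tweet: str) -> list[tuple[str, str]]:
    """Split a tweet into (rule name, matched text) pairs, like Content*."""
    pos, result = 0, []
    while pos < len(tweet):
        m = TOKEN.match(tweet, pos)
        if m is None:
            # e.g. a lone "@" followed by a space matches no Content rule
            raise ValueError(f"no token matches at offset {pos}")
        result.append((m.lastgroup, m.group()))
        pos = m.end()
    return result
```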
- Amanda: "Only crap languages make you define something before you have to use it"
- Discussion between Josh, Amanda and Don about whether or not the grammar is correct
- Good point about interactive grammar development using the three-pane editor
- Pattern names are used to display the data in a structured way
- Adding @Classification["Keyword"] syntax-colors the source
- M is structurally typed
- Data in M consists of lists and records
- Generating the right-hand side:
- Intellipad crashes! [Boom!] :-)
- The spec for M is licensed under the Microsoft Open Specification Promise (OSP), which makes people as happy as they can be working with Microsoft
- There is a JavaScript-based implementation of parts of the language, offering a subset of the three-pane mode
- Toolchain is written in C#, parser written in M, more and more compilers written in M
Intellipad crashes! Again! [Boom!] :-)
module QCon {
    language Simple {
        syntax Main = v:Tweet => v;
        syntax Tweet = v:Content* => v;
        syntax Content = v:RawText => v | v:Handle => v | v:HashTag => v;
        token RawText = (any - ("#"|"@"))+;
        token Handle = "@" v:Name => v;
        token HashTag = "#" v:Name => v;
        token Name = (any - ("@"|"#"|" "))+;
    }
}
This bubbles up the actual content; the result is just a list of the extracted strings.
module QCon {
    language Simple {
        syntax Main = v:Tweet => v;
        syntax Tweet = v:Content* => v;
        syntax Content = v:RawText => v | v:Handle => v | v:HashTag => v;
        token RawText = v:(any - ("#"|"@"))+ => { Kind => "RawText", Text => v };
        token Handle = "@" v:Name => { Kind => "Handle", Name => v };
        token HashTag = "#" v:Name => { Kind => "HashTag", Topic => v };
        token Name = (any - ("@"|"#"|" "))+;
    }
}
Question from Josh: Isn't this mixing lexing and production rules? Don: The goal is to have no difference, but the two can be pulled apart
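A rough Python analogue of the two projection styles: projecting => v everywhere yields a flat list of strings, while the record projections yield one { Kind, Text/Name/Topic } record per token. This is a hypothetical sketch that reuses a regex approximation of the token rules; parse_strings and parse_records are illustrative names, not part of the M runtime.

```python
import re

# Hypothetical regex version of the token rules; the named groups capture
# only the Name part, like `"@" v:Name => ...` in the grammar.
TOKEN = re.compile(
    r"@(?P<Handle>[^@# ]+)"     # token Handle = "@" v:Name
    r"|#(?P<HashTag>[^@# ]+)"   # token HashTag = "#" v:Name
    r"|(?P<RawText>[^#@]+)"     # token RawText = v:(any - ("#"|"@"))+
)

# Field name used by each rule's record projection in the talk's grammar.
FIELD = {"RawText": "Text", "Handle": "Name", "HashTag": "Topic"}


def _matches(tweet: str):
    """Yield (rule name, captured value) for each Content in the tweet."""
    pos = 0
    while pos < len(tweet):
        m = TOKEN.match(tweet, pos)
        if m is None:
            raise ValueError(f"no token matches at offset {pos}")
        yield m.lastgroup, m.group(m.lastgroup)
        pos = m.end()


def parse_strings(tweet: str) -> list[str]:
    """Like projecting `=> v` everywhere: just the extracted strings."""
    return [value for _, value in _matches(tweet)]


def parse_records(tweet: str) -> list[dict[str, str]]:
    """Like `=> { Kind => ..., Text/Name/Topic => v }`: one record per token."""
    return [{"Kind": kind, FIELD[kind]: value} for kind, value in _matches(tweet)]
```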
- Next: Consuming stuff
- Using TDD on Don's tweets
- VS project includes the language grammar defined earlier
- Amanda: Can one debug a grammar? Don: Will be answered later
- M runtime can be hosted inside a C# program
- Internally, they recently stopped using .NET 3.5/VS 2008 and now work exclusively on .NET 4 and VS 2010
- The language definition is included in the program binary
Showing off some code:
var language = Language.load(/* MImage */ typeof(Program).Assembly, "QCon", "Simple"); // yields runtime that can parse a program
dynamic result = language.ParseString(input);
bool hasHashTag = false;
foreach (var content in result)
{
    hasHashTag = content.Kind == "HashTag";
    if (hasHashTag) break;
}
AssertEqual(true, hasHashTag);
Demo is actually working
- AssertEqual changed to Assert.IsTrue to get around an exception being thrown
- Question from Ian Robinson: Q. Can LINQ be used to walk the result? A. Not yet, as dynamic and LINQ don't mix yet
- Not going to write any more C# types, has written all that are in him ;-)
"This will fail!"
var language = Language.load(/* MImage */ typeof(Program).Assembly, "QCon", "Simple"); // yields runtime that can parse a program
dynamic result = language.ParseString(input);
var query = (from content in ((IEnumerable)result).OfType<dynamic>()
             where content.Kind == "HashTag"
             select content).Any();
Assert.IsTrue(query);
It failed indeed.
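Both C# checks ask the same question: does any record in the result have Kind == "HashTag"? A Python sketch of the two styles, assuming the parse result is a list of dicts shaped like the records projected by the grammar (the sample data is made up for illustration, and the function names are hypothetical):

```python
# Hypothetical parse result shape: records like { Kind => ..., Topic => ... }.
Record = dict[str, str]


def has_hashtag_loop(result: list[Record]) -> bool:
    """Imperative version, like the foreach loop in the first demo."""
    has_hash_tag = False
    for content in result:
        has_hash_tag = content["Kind"] == "HashTag"
        if has_hash_tag:
            break
    return has_hash_tag


def has_hashtag_query(result: list[Record]) -> bool:
    """Declarative version, like the LINQ `where ... select ... Any()` query."""
    return any(content["Kind"] == "HashTag" for content in result)


if __name__ == "__main__":
    sample = [
        {"Kind": "RawText", "Text": "hi "},
        {"Kind": "HashTag", "Topic": "qcon"},
    ]
    print(has_hashtag_loop(sample), has_hashtag_query(sample))
```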
- "This talk is not about integration of two features I don't work on"
- M is optionally typed, structurally typed
- optional typing:
syntax Bob = any* : Integer32 => 42;
- only partially plumbed in the current version
Example for structural typing (not really working yet):
module Ola {
    type HashTagRec = {
        Kind : Text where value == "HashTag";
        Topic : Text;
    }
}
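One way to picture structural typing with constraints, and optional typing, is to treat a type as a predicate over values: a value belongs to a type if it has the right shape and satisfies the where clause, regardless of any declared name. The Python sketch below is only an analogy under that assumption; is_integer32, is_hashtag_rec and ascribe are hypothetical helpers, not part of M, and ignoring extra record fields is a choice made for the sketch.

```python
from typing import Any, Callable, Optional

Predicate = Callable[[Any], bool]


def is_integer32(value: Any) -> bool:
    """Roughly Integer32: an integer within the signed 32-bit range."""
    return (isinstance(value, int) and not isinstance(value, bool)
            and -2**31 <= value <= 2**31 - 1)


def is_hashtag_rec(value: Any) -> bool:
    """Roughly HashTagRec: a record whose Kind equals "HashTag" and whose
    Topic is text. Only the shape is checked (extra fields are ignored,
    an assumption of this sketch)."""
    return (isinstance(value, dict)
            and value.get("Kind") == "HashTag"
            and isinstance(value.get("Topic"), str))


def ascribe(value: Any, type_check: Optional[Predicate] = None) -> Any:
    """Optional typing: without an ascription the value passes through
    unchecked; with one, it must conform or a TypeError is raised."""
    if type_check is not None and not type_check(value):
        raise TypeError(f"{value!r} does not conform to {type_check.__name__}")
    return value
```

In this analogy, ascribe(42, is_integer32) plays the role of syntax Bob = any* : Integer32 => 42;, and leaving out the second argument corresponds to omitting the type.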
Rudimentary grammar debugging: set breakpoints in the input source text; a fourth pane shows up, displaying the matching stack
- Syntactic and semantic editing support
- Semantics relies on hooks that are not there yet
- M is built using an M Grammar
- Language completion for M is built using C#
- Disambiguators for GLR need to be written in C#
- One of the benchmarks used to evaluate the language is XML: there is a grammar for XML, which was briefly demoed
- The MPS guys have a different religion: Microsoft believes people have a text editor they love
<?magnum PI?> :-)
- Comment from Martin Fowler: The main difference from classic tools such as ANTLR is the dynamic approach; no code generation is needed
- Comment from Ola Bini: Hadn't seen the type checking and debugging before; now he's impressed